On the evolution of recurrent neural systems
DOI: https://doi.org/10.20535/SRIT.2308-8893.2024.4.06
Keywords: recurrent neural networks, transformer technology, KANs
Abstract
The paper considers the evolution of neural network architectures, first of the recurrent type and then those built on attention mechanisms. It traces how the approaches changed and how developers' experience was enriched along the way. Notably, the networks themselves learn to capture the developers' intentions and, in effect, compensate for errors and shortcomings in the technologies and architectures. Replacing neurons with new active elements has expanded the scope of connectionist networks and led to the emergence of new structures, Kolmogorov–Arnold Networks (KANs), which may become serious competitors to networks built from artificial neurons.