On the evolution of recurrent neural systems

Authors

Gennadii Abramov, Ivan Gushchin, Tetiana Sirenka

DOI:

https://doi.org/10.20535/SRIT.2308-8893.2024.4.06

Keywords:

recurrent neural networks, transformer technology, KANs

Abstract

The paper considers the evolution of neural network architectures, first of the recurrent type and then of those built on attention mechanisms. It shows how the approaches changed and how the developers' experience was enriched along the way. Importantly, the neural networks themselves learn to capture the developers' intentions and, in effect, to correct errors and flaws in the underlying technologies and architectures. Replacing neurons with new active elements has broadened the scope of connectionist networks and led to the emergence of new structures, Kolmogorov–Arnold Networks (KANs), which may become serious competitors to networks built from artificial neurons.
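
For readers unfamiliar with the term, the sketch below illustrates the general idea behind KANs as described in the referenced Liu et al. (2024) preprint: instead of fixed activation functions at the nodes, every edge carries its own learnable univariate function, and each node simply sums the incoming edge values. This is only an illustrative toy, not the implementation discussed in the paper; the class name ToyKANLayer, the polynomial basis (the original work uses B-splines), and all parameter choices are assumptions made for the example.

import numpy as np

class ToyKANLayer:
    """Maps x in R^d_in to y in R^d_out via y_j = sum_i phi_ij(x_i)."""

    def __init__(self, d_in, d_out, degree=3, seed=0):
        rng = np.random.default_rng(seed)
        # One coefficient vector per edge: coeffs[i, j, k] weights x_i**k on edge (i, j).
        self.coeffs = 0.1 * rng.standard_normal((d_in, d_out, degree + 1))

    def __call__(self, x):
        # x has shape (batch, d_in); build the basis terms x_i**k for k = 0..degree.
        powers = np.stack([x ** k for k in range(self.coeffs.shape[-1])], axis=-1)
        # phi_ij(x_i) = sum_k coeffs[i, j, k] * x_i**k; then sum the edge values over i.
        edge_values = np.einsum('bik,ijk->bij', powers, self.coeffs)
        return edge_values.sum(axis=1)

layer = ToyKANLayer(d_in=4, d_out=2)
print(layer(np.ones((3, 4))).shape)  # -> (3, 2)

The contrast with a conventional layer is that the learnable objects here are the edge functions themselves rather than scalar weights feeding a fixed nonlinearity.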

Author Biographies

Gennadii Abramov, Kherson State Maritime Academy, Kherson

Candidate of Physical and Mathematical Sciences (Ph.D.), Associate Professor at the Navigation Department of Kherson State Maritime Academy, Kherson, Ukraine.

Ivan Gushchin, V. N. Karazin Kharkiv National University, Kharkiv

Senior lecturer at the Department of Artificial Intelligence and Software of V. N. Karazin Kharkiv National University, Kharkiv, Ukraine.

Tetiana Sirenka, V. N. Karazin Kharkiv National University, Kharkiv

Graduate student at V. N. Karazin Kharkiv National University, Kharkiv, Ukraine.

References

S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.

J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling. 2014. Available: https://arxiv.org/pdf/1412.3555

I.V. Gushchin, O.V. Kirychok, and V.M. Kuklin, Introduction to the methods of organization and optimization of neural networks: a study guide. Kharkiv: V. N. Karazin Kharkiv National University, 2021, 152 p.

E. Charniak, Introduction to deep learning. Cambridge, MA: The MIT Press, 2019, 192 p.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate. 2014. Available: https://arxiv.org/abs/1409.0473

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” International Conference on Machine Learning, pp. 1139–1147, 2013.

A. Vaswani et al., “Attention is all you need,” Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.

E.A. Nadaraya, “On estimating regression,” Theory of Probability & Its Applications, vol. 9, no. 1, pp. 141–142, 1964. doi: https://doi.org/10.1137/1109020

A.P. Parikh, O. Täckström, D. Das, and J. Uszkoreit, A decomposable attention model for natural language inference. 2016. Available: https://arxiv.org/pdf/1606.01933

V. Gushchin, V.M. Kuklin, O.V. Mishin, and O.V. Pryimak, Modeling of physical processes using CUDA technology. Kharkiv: V. N. Karazin Kharkiv National University, 2017, 116 p.

Z. Liu et al., KAN: Kolmogorov-Arnold Networks. 2024. doi: https://doi.org/10.48550/arXiv.2404.19756

Published

2024-12-25

Issue

Section

Methods, models, and technologies of artificial intelligence in system analysis and control