Adaptive hybrid activation function for deep neural networks
Keywords: adaptive hybrid activation function, double-stage parameter tuning process, deep neural networks
An adaptive hybrid activation function (AHAF) is proposed that combines the properties of rectifier units and squashing functions. It can be used as a drop-in replacement for the ReLU, SiL and Swish activations in deep neural networks and can evolve into any of these functions during training. Its effectiveness was evaluated on an image classification task using the Fashion-MNIST and CIFAR-10 datasets. The evaluation shows that neural networks with AHAF activations achieve better classification accuracy than their base implementations that use ReLU and SiL. A double-stage parameter tuning process for training neural networks with AHAF is also proposed. The approach is simple to implement and provides high performance for the neural network training process.
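The abstract does not state the functional form of AHAF. As a rough illustration of how a single parametric activation could cover both SiL-like and ReLU-like behaviour, the sketch below assumes the form f(x) = beta * x * sigmoid(gamma * x). This form, and the parameter names `beta` and `gamma`, are assumptions made for the example and are not taken from the text above. Under this assumption, beta = gamma = 1 gives SiL, a trainable gamma gives Swish, and a large gamma approaches ReLU.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ahaf(x: float, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Assumed AHAF form: beta * x * sigmoid(gamma * x).

    beta scales the output amplitude and gamma sets the steepness of the
    sigmoid gate. Both would be trainable in a real network; here they
    are plain arguments so the limiting cases are easy to inspect.
    """
    return beta * x * sigmoid(gamma * x)


def silu(x: float) -> float:
    """SiL / SiLU reference: x * sigmoid(x)."""
    return x * sigmoid(x)


def relu(x: float) -> float:
    """ReLU reference: max(0, x)."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(
            f"x={x:+.1f}  "
            f"ahaf(1,1)={ahaf(x):+.6f}  silu={silu(x):+.6f}  "
            f"ahaf(1,50)={ahaf(x, 1.0, 50.0):+.6f}  relu={relu(x):+.6f}"
        )
```

In a framework such as PyTorch, `beta` and `gamma` would typically be registered as learnable parameters of the activation module, so that training can move the function between these regimes.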