Efficient evaluation of machine learning models: a unified metric balancing performance and cost
DOI: https://doi.org/10.20535/SRIT.2308-8893.2026.1.10

Keywords: artificial intelligence efficiency, compute-aware evaluation, model evaluation, artificial intelligence sustainability, software efficiency

Abstract
This paper introduces a novel, unified metric for evaluating the efficiency of machine learning, deep learning, and artificial intelligence models by balancing predictive performance and execution cost. Existing metrics typically isolate performance or execution measures (e.g., FLOPs, latency, energy), failing to capture the inherent trade-off between resource constraints and predictive capability in a single formula. The proposed formula incorporates a tunable trade-off factor and hard constraints on performance and cost, allowing principled comparison across models and deployment settings. Our formulation generalizes prior heuristics and demonstrates clear interpretability, scalability, and hardware awareness.
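A minimal illustrative sketch of a metric of this kind is given below. The paper's exact formulation is not reproduced in this abstract, so the performance score P, cost C, trade-off factor alpha, and hard thresholds p_min and c_max used here are assumptions chosen only to show how a tunable trade-off combined with hard constraints can yield a single comparable score.

```python
# Illustrative sketch only: not the paper's exact formula. Assumed ingredients:
# a performance score P (e.g., accuracy in [0, 1]), an execution cost C
# (e.g., latency, energy, or FLOPs), a trade-off factor alpha, and hard
# constraints P >= p_min and C <= c_max.

def efficiency_score(performance: float,
                     cost: float,
                     alpha: float = 0.5,
                     p_min: float = 0.0,
                     c_max: float = float("inf")) -> float:
    """Return a single efficiency value balancing performance against cost.

    Models violating either hard constraint receive a score of 0, so they are
    never preferred over a feasible model regardless of the trade-off factor.
    """
    if performance < p_min or cost > c_max:
        return 0.0
    # Weighted ratio: alpha -> 1 emphasizes performance, alpha -> 0 emphasizes cost.
    return performance ** alpha / cost ** (1.0 - alpha)


# Example: compare two hypothetical models at alpha = 0.5.
print(efficiency_score(performance=0.76, cost=120.0))  # small, fast model
print(efficiency_score(performance=0.81, cost=900.0))  # large, slow model
```

Under these assumed settings the smaller model scores higher despite its lower accuracy, which is the kind of deployment-aware comparison the proposed metric is intended to support.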