Is evaluation based on accuracy of classification algorithms misleading? An approach to model validation using Bayes error rate

Kazemzadeh Gharechopogh, Hossein; Mohammadpour, Adel

doi:10.22124/jmm.2025.28744.2555

Document Type : Research Article

Authors

Faculty of Mathematics and Computer Science, Amirkabir University of Technology

Abstract

Researchers have long regarded model accuracy as the primary metric for evaluating
the performance of classification algorithms. The current evaluation approach, which relies solely
on model accuracy, often leads to inappropriate evaluation of classifiers, regardless of the dataset’s
separability and complexity. This limitation underscores the need for a new and more comprehensive
method. We argue that accuracy-based evaluation can be misleading, even when considering
measures of data separability and complexity. We compare the error rates of well-known classifiers
on Gaussian-generated datasets and show that, paradoxically, many algorithms’ observed errors are
lower than that of the theoretical optimal classifier, leading to an overestimation of their performance.
We consider a model invalid if its error rate is lower than the optimal classifier error, known as the
Bayes error rate. To identify such invalid models, we introduce a procedure and propose an algorithm
for model validation based on the Bayes error rate.
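
The abstract does not spell out the validation procedure, so the following is only a minimal sketch of the underlying idea, not the authors' algorithm. It assumes the simplest Gaussian setting: two classes with equal priors, univariate normal class-conditional densities, and a common standard deviation, for which the Bayes error rate has the closed form Φ(−|μ₁ − μ₀| / (2σ)). The function names, the threshold classifier, and the simulation settings are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist


def bayes_error_equal_var(mu0, mu1, sigma):
    """Bayes error for two equal-prior classes N(mu0, sigma^2) and N(mu1, sigma^2).

    Assumption of this sketch: univariate data, equal priors, common sigma.
    """
    return NormalDist().cdf(-abs(mu1 - mu0) / (2.0 * sigma))


def observed_error(threshold, mu0, mu1, sigma, n, seed=0):
    """Empirical error of the rule 'predict class 1 if x > threshold'.

    Draws n test points per class; assumes mu0 < mu1 so the rule makes sense.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        errors += rng.gauss(mu0, sigma) > threshold    # class-0 point misclassified
        errors += rng.gauss(mu1, sigma) <= threshold   # class-1 point misclassified
    return errors / (2 * n)


def is_invalid(err, bayes_err):
    """Flag a reported error rate that falls below the Bayes error rate."""
    return err < bayes_err


if __name__ == "__main__":
    mu0, mu1, sigma = 0.0, 2.0, 1.0
    b = bayes_error_equal_var(mu0, mu1, sigma)
    # The optimal threshold for this setting is the midpoint of the two means.
    e = observed_error((mu0 + mu1) / 2.0, mu0, mu1, sigma, n=50)
    print(f"Bayes error: {b:.4f}, observed error: {e:.4f}, invalid: {is_invalid(e, b)}")
```

With a small test set, even the optimal rule can show an observed error below the Bayes error purely through sampling variation, which is the kind of overestimated performance the abstract describes.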

Main Subjects

Integral equations

Volume 13, Issue 4
December 2025
Pages 917-927
