Dimension reduction by identifying and removing redundant variables using copula function

Document Type: Research Article

Authors

1 Department of Statistics, Islamic Azad University, North Tehran Branch, Tehran, Iran

2 Department of Mathematics, Islamic Azad University, Shahriar Branch, Shahriar, Iran

Abstract

Rapid developments in science and engineering are producing ever larger amounts of data, and analyzing such big data raises numerous problems. Dimensionality reduction can accelerate the analysis and can even improve the results without losing useful information. A copula is an appropriate model of dependence for comparing multivariate distributions and for detecting relationships among variables. This study therefore uses a copula to identify redundant (noisy) variables and remove them from the original data. The resulting method is then compared with principal component analysis (PCA) to show its superiority.
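The abstract does not specify which copula or which dependence measure the authors use. The sketch below is therefore only an illustration of the general idea, not the authors' method. It estimates the empirical copula through rank-based pseudo-observations and measures pairwise dependence with Spearman's rho, which depends on a joint distribution only through its copula. It then applies a greedy filter that drops any variable strongly dependent on one already kept. The names `split_redundant` and `spearman_rho`, the left-to-right scan order, and the threshold of 0.9 are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def pseudo_observations(column: Sequence[float]) -> List[float]:
    """Rank-transform a sample to (0, 1): u_i = rank_i / (n + 1).

    Tied values share their average rank. The pseudo-observations are the
    usual nonparametric stand-in for the unknown copula margins.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no dependence information
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Spearman's rho: Pearson correlation of the pseudo-observations.

    Spearman's rho depends on the joint distribution only through its copula,
    so it is unchanged by strictly increasing transforms of either variable.
    """
    return _pearson(pseudo_observations(x), pseudo_observations(y))


def split_redundant(
    rows: List[List[float]], threshold: float = 0.9
) -> Tuple[List[int], List[int]]:
    """Hypothetical greedy filter: scan columns left to right and drop a
    column when |rho| with any already-kept column reaches `threshold`.

    Returns (kept column indices, dropped column indices).
    """
    columns = [[row[j] for row in rows] for j in range(len(rows[0]))]
    kept: List[int] = []
    dropped: List[int] = []
    for j, col in enumerate(columns):
        if any(abs(spearman_rho(col, columns[k])) >= threshold for k in kept):
            dropped.append(j)
        else:
            kept.append(j)
    return kept, dropped


if __name__ == "__main__":
    x2 = [3, 7, 1, 8, 2, 6, 4, 5]
    rows = [[i, i ** 3, x2[i - 1]] for i in range(1, 9)]
    print(split_redundant(rows))  # ([0, 2], [1])
```

PCA replaces the inputs with linear combinations of all of them. A filter of this kind instead keeps a subset of the original variables, so the retained features remain directly interpretable. Because the dependence measure is rank-based, the column `i ** 3` is flagged as redundant with `i` even though the Pearson correlation between the two is below 1.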
