METHODOLOGICAL FOUNDATIONS FOR BUILDING ROBUST MACHINE LEARNING MODELS USING PROBABILISTIC AND STATISTICAL METHODS
Keywords:
machine learning, statistical learning theory, Bayesian inference, stochastic gradient descent, regularization, attention mechanism
Abstract
This paper analyzes the mathematical foundations of machine learning through the lens of probability theory, mathematical statistics, and high-dimensional geometry. In contrast to the empirical approach that often dominates applied research, the study demonstrates that key learning algorithms, from classical regression to modern transformer architectures, follow rigorously from fundamental statistical principles such as maximum likelihood estimation, Bayesian inference, and concentration of measure. The work includes derivations of objective functions and gradients from first principles, geometric interpretations of regularization and dimensionality reduction, and an analysis of stochastic optimization methods. Particular attention is paid to model robustness in high-dimensional input spaces and to the theoretical justification of the manifold hypothesis.
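As a minimal illustration of the "loss functions from maximum likelihood" idea the abstract refers to (a generic sketch, not taken from the paper itself): for a linear model with fixed-variance Gaussian noise, the estimate that minimizes mean squared error is exactly the maximum-likelihood estimate, which can be checked numerically.

```python
import numpy as np

# Synthetic linear-regression data with Gaussian noise (all values illustrative).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
w_true = np.array([1.5, -2.0])
y = X @ w_true + rng.normal(scale=0.3, size=200)

# Closed-form least-squares solution, i.e. the minimizer of the MSE.
w_mse, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian negative log-likelihood with sigma held fixed: up to additive
# constants it is proportional to the sum of squared residuals, so it shares
# the same minimizer as the MSE.
def nll_grad(w, sigma=0.3):
    """Gradient of the Gaussian negative log-likelihood w.r.t. the weights."""
    return -X.T @ (y - X @ w) / sigma**2

# At the least-squares solution the likelihood gradient vanishes: the MLE and
# the MSE minimizer coincide.
print(w_mse)                      # close to the true coefficients [1.5, -2.0]
print(np.abs(nll_grad(w_mse)).max())  # numerically zero
```

The same correspondence extends to other pairings the abstract alludes to, e.g. cross-entropy loss arising as the negative log-likelihood of a Bernoulli or categorical model.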
License
Copyright (c) 2026 A. K. Chiryagov, S. V. Koryakin, K. R. Karabakirov

This work is licensed under a Creative Commons Attribution 4.0 International License.
