APPLICATION OF MACHINE LEARNING METHODS FOR FORECASTING AIR QUALITY IN THE CITY OF BISHKEK

Authors

  • А.Э. Бекбоева, Kyrgyz-German Institute of Applied Informatics
  • А.Л. Лайлиева, Kyrgyz-German Institute of Applied Informatics

Keywords:

PM2.5, air quality, machine learning, Random Forest, Gradient Boosting, Logistic Regression, stacking models, Bishkek, AQI forecasting

Abstract

Fine particulate matter (PM2.5) pollution remains one of the most acute environmental problems in Central Asian cities. Bishkek, which lies in a basin and is prone to temperature inversions, records extreme pollution levels every winter. This paper analyzes the dynamics of PM2.5 concentrations and the Air Quality Index (AQI) for 2019–2024 using monitoring data from the U.S. Embassy in Bishkek. Three machine learning methods (Random Forest, Gradient Boosting, and Logistic Regression) and a stacking ensemble that combines them are applied to forecast the AQI and classify pollution levels. The models are compared on regression metrics (R², RMSE, MAE) and classification metrics (Accuracy, Precision, Recall, F1-score). The stacking approach yields more robust and accurate forecasts than the baseline models. The study confirms that machine learning algorithms can be integrated effectively into an environmental monitoring and air-quality forecasting system.
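
For readers unfamiliar with the comparison metrics named in the abstract, the following is a minimal sketch of how they can be computed. It is not the authors' code: the function names, the sample AQI values, and the binary "unhealthy day" labels are all illustrative assumptions, and a multi-class pollution-level task would additionally require averaging Precision, Recall, and F1 across classes.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                   # undefined if y_true is constant
    return r2, rmse, mae


def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Made-up AQI values, for illustration only (not data from the study).
aqi_true = [50, 100, 150, 200]
aqi_pred = [60, 90, 160, 190]
r2, rmse, mae = regression_metrics(aqi_true, aqi_pred)

# Made-up labels: 1 = "unhealthy" day, 0 = any other day.
label_true = [1, 0, 1, 1, 0, 0]
label_pred = [1, 0, 0, 1, 1, 0]
acc, prec, rec, f1 = binary_classification_metrics(label_true, label_pred)

print(f"R2={r2:.3f} RMSE={rmse:.1f} MAE={mae:.1f}")
print(f"Accuracy={acc:.3f} Precision={prec:.3f} Recall={rec:.3f} F1={f1:.3f}")
```

R², RMSE, and MAE apply to the numeric AQI forecast, while Accuracy, Precision, Recall, and F1-score apply to the predicted pollution-level classes, which is why the two groups are computed separately here.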

Published

2026-01-19

Section

INFORMATION TECHNOLOGY AND INFORMATION PROCESSING