Machine Learning–Based Downscaling of GLDAS Surface Soil Moisture and Performance Evaluation Across Climatic Regions of Iran

Document Type: Research Article

Authors

1 Department of Water Engineering, Aburaihan Campus, University of Tehran, Iran

2 Faculty of Interdisciplinary Science and Technology, University of Tehran, Iran

3 Department of Civil Engineering, SR.C., Islamic Azad University, Tehran, Iran

4 Department of Water Engineering, Aburaihan Campus, University of Tehran

Abstract

Research Topic: Because in-situ soil moisture observations are scarce and reanalysis products have coarse spatial resolution, downscaling GLDAS surface soil moisture (SSM) with machine learning is essential.
 
Objective: This study evaluates the performance of machine learning algorithms for downscaling GLDAS SSM across Iran’s diverse climatic zones.
 
Method: Random Forest (RF), XGBoost, CatBoost, and LightGBM algorithms were employed for the downscaling process. Model inputs comprised station-based climatic variables (minimum and maximum temperature, precipitation, and evaporation) and spatial attributes at a monthly scale over a 31-year period. The models were trained on GLDAS surface soil moisture values and produced downscaled soil moisture at a higher spatial resolution as output. In-situ soil moisture observations were used exclusively for independent validation. The dataset was partitioned chronologically into training (80%) and testing (20%) sets.
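The chronological 80/20 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the field names (`station`, `month`, `gldas_ssm`), the synthetic values, and the choice to cut on unique months rather than on individual rows are all assumptions made for the example.

```python
from datetime import date


def chronological_split(records, train_fraction=0.8):
    """Split records into train/test sets by time, without shuffling.

    Assumption: each record is a dict with a "month" key (datetime.date).
    The cut is made on unique months, so every station's sample for a
    given month lands on the same side of the split.
    """
    months = sorted({r["month"] for r in records})
    cut = int(len(months) * train_fraction)
    train_months = set(months[:cut])
    train = [r for r in records if r["month"] in train_months]
    test = [r for r in records if r["month"] not in train_months]
    return train, test


# Synthetic data (hypothetical): 2 stations x 10 months.
records = [
    {"station": s, "month": date(2000, m, 1), "gldas_ssm": 20.0 + m}
    for s in ("A", "B")
    for m in range(1, 11)
]

train, test = chronological_split(records)
print(len(train), len(test))
```

Splitting by time rather than at random keeps every test month strictly later than every training month, which avoids leaking future information into the fitted model.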
 
Results: CatBoost demonstrated a strong ability to capture nonlinear soil moisture patterns, achieving coefficients of determination exceeding 0.73 across all climatic zones considered in this study. Although model accuracy varied depending on climatic characteristics and the spatial distribution of reference data, CatBoost was identified as an efficient algorithm for soil moisture prediction over the study area due to its high generalization capability and satisfactory performance in independent national-scale validation (R² = 0.607, RMSE = 4.286, Bias = 2.131).
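The three validation scores reported above (R², RMSE, Bias) can be computed as in the sketch below. The exact definitions used in the study are not stated in the abstract, so this example assumes R² = 1 − SS_res/SS_tot and Bias = mean(predicted − observed); the function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return R^2, RMSE, and mean bias for paired observations and predictions.

    Assumed definitions (not confirmed by the source):
      R^2  = 1 - SS_res / SS_tot
      RMSE = sqrt(mean((obs - pred)^2))
      Bias = mean(pred - obs), positive when the model overestimates
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(p - o for o, p in zip(observed, predicted)) / n,
    }


# Hypothetical values: predictions are offset from observations by +1.
obs = [10.0, 12.0, 14.0, 16.0]
pred = [11.0, 13.0, 15.0, 17.0]
metrics = evaluate(obs, pred)
print(metrics)
```

In this toy case the predictions track the observations perfectly apart from a constant offset, so the error shows up entirely as bias; reporting Bias alongside RMSE makes that kind of systematic over- or underestimation visible.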
 
Conclusions: The findings indicate that machine learning frameworks, particularly CatBoost, offer a reliable approach for downscaling GLDAS SSM. Given its high generalization capability across varying hydro-climatic conditions, CatBoost is recommended for enhancing drought monitoring and water resource management in data-scarce regions.



Volume 12, Issue 4
December 2026
Pages 982-1004
  • Receive Date: 14 October 2025
  • Revise Date: 04 November 2025
  • Accept Date: 16 December 2025
  • First Publish Date: 22 December 2025
  • Publish Date: 22 December 2025