Development of a machine learning model for precipitation forecasting in Kenya

Mulinge, Damaris M.; Kihoro, John M.; Madila, Shadrack S.; Angolo, Shem M.

dc.contributor.author	Mulinge, Damaris M.
dc.contributor.author	Kihoro, John M.
dc.contributor.author	Madila, Shadrack S.
dc.contributor.author	Angolo, Shem M.
dc.date.accessioned	2025-10-15T05:48:56Z
dc.date.available	2025-10-15T05:48:56Z
dc.date.issued	2025
dc.identifier.citation	Mulinge, D. M., Kihoro, J. M., Madila, S. S., & Angolo, S. M. (2025). Development of a machine learning model for precipitation forecasting in Kenya. Global Journal of Engineering and Technology Advances, 24(03), 043-050.	en_US
dc.identifier.uri	http://repository.mocu.ac.tz/xmlui/handle/123456789/2039
dc.description	Global Journal of Engineering and Technology Advances, 2025, 24(03), 043-050	en_US
dc.description.abstract	Accurate precipitation forecasting is important for mitigating the impacts of climate variability in Kenya, where erratic rainfall events considerably affect agriculture, water control and disaster preparedness. Traditional methods such as ARIMA (Autoregressive Integrated Moving Average) and NWP (Numerical Weather Prediction) have been shown to struggle with complex weather patterns due to linearity assumptions, high computational demands and limited spatial resolution. This research develops and evaluates an XGBoost-based machine learning model to enhance both long-term and short-term precipitation predictions. The study uses a 20-year weather dataset (2004-2024) of 7300 daily records sourced from the online Visual Crossing Weather Data service. Key features include temperature, humidity, wind speed, lagged precipitation (lags 1-7), rolling means and seasonal encoding to capture the bimodal rainfall pattern of March-May and October-December. Data processing involved min-max normalization to the 0-1 range, feature selection, sine/cosine transformations for seasonal patterns and temperature-humidity interactions for connective modelling processes. The dataset was split 80% for training and 20% for testing, using a temporal split (≤ 2020 for training, > 2020 for testing) that maintains the chronological order of the data. The initial attempts performed poorly, with a low R² = 0.066 and a high RMSE = 1.06, which led to a shift to XGBoost binary classification predicting the likelihood of rain/no-rain tomorrow. Bayesian optimization and GridSearchCV hyperparameter tuning were applied, together with adjustment of the default 0.5 decision threshold to improve rain-class sensitivity. Evaluated with classification metrics, the model achieved 76.76% accuracy, 70.14% precision, 33.36% recall, 45.12% F1-score and a ROC-AUC of 0.75. After tuning, reducing the threshold to 0.3 to capture missed rainfall events gave 73% accuracy, no-rain precision and recall of 81%, 53% rain precision, 54% recall and a 54% F1-score. The temperature-humidity interaction was the top predictor in feature importance. The results contribute to improved precipitation prediction accuracy, supporting decision making in agriculture, water resource management and early disaster preparedness in Kenya’s climate-vulnerable regions.	en_US
dc.language.iso	en	en_US
dc.publisher	Global Journal of Engineering and Technology Advances	en_US
dc.relation.ispartofseries	Vol. 24;No. 3
dc.subject	Machine Learning	en_US
dc.subject	Climate	en_US
dc.subject	Variability	en_US
dc.subject	Precipitation forecasting	en_US
dc.subject	XGBoost	en_US
dc.subject	Binary	en_US
dc.title	Development of a machine learning model for precipitation forecasting in Kenya	en_US
dc.type	Article	en_US
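
The abstract mentions sine/cosine seasonal encoding, lagged precipitation features, min-max normalization and a lowered decision threshold. The sketch below illustrates those ideas with the Python standard library only. It is not the authors' code: the function names, the month-based encoding and the toy probabilities and labels are assumptions made for illustration, and no XGBoost model is trained here.

```python
import math

def cyclic_month(month):
    """Encode a month (1-12) as (sin, cos) so December and January sit close together."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

def lagged(values, max_lag=7):
    """For each day t >= max_lag, return [v[t-1], ..., v[t-max_lag]]."""
    return [[values[t - k] for k in range(1, max_lag + 1)]
            for t in range(max_lag, len(values))]

def min_max(values):
    """Scale values to the 0-1 range; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]

def rain_metrics(probs, labels, threshold):
    """Precision, recall and F1 for the rain class at a given probability cut-off."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up predicted rain probabilities and true labels (NOT the paper's data).
probs = [0.9, 0.6, 0.45, 0.35, 0.32, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    for threshold in (0.5, 0.3):
        p, r, f1 = rain_metrics(probs, labels, threshold)
        print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, lowering the cut-off from 0.5 to 0.3 raises rain recall and lowers rain precision, which is the same direction of trade-off the abstract reports when the threshold is reduced to capture missed rainfall events.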