This project builds a machine learning system to predict public transportation delays using a noisy real-world dataset. It includes data cleaning, preprocessing, feature engineering, and exploratory data analysis to understand delay patterns.
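As a rough illustration of the cleaning and feature engineering steps, the sketch below turns raw trip records into a delay target and two time-based features, dropping rows whose timestamps cannot be parsed. The field names (`scheduled`, `actual`) and the chosen features are assumptions made for this example, not the project's actual schema, and it uses only the standard library rather than Pandas to stay self-contained.

```python
from datetime import datetime


def delay_features(row):
    """Return engineered features for one trip record, or None if the row is unusable.

    The keys "scheduled" and "actual" are hypothetical column names.
    """
    try:
        scheduled = datetime.fromisoformat(row["scheduled"].strip())
        actual = datetime.fromisoformat(row["actual"].strip())
    except (KeyError, ValueError, AttributeError):
        # Missing key, malformed timestamp, or a non-string value such as None.
        return None
    return {
        "delay_min": (actual - scheduled).total_seconds() / 60.0,  # target variable
        "hour": scheduled.hour,                                    # time-of-day feature
        "is_weekend": scheduled.weekday() >= 5,                    # Saturday=5, Sunday=6
    }


# Made-up records standing in for a noisy dataset.
rows = [
    {"scheduled": "2024-03-02 08:00:00", "actual": "2024-03-02 08:07:30"},
    {"scheduled": "not a time", "actual": "2024-03-02 09:00:00"},
    {"scheduled": "2024-03-04 17:30:00", "actual": None},
]

# Keep only the rows that could be parsed.
features = [f for f in map(delay_features, rows) if f is not None]
print(features)
```

Only the first record survives here; the other two are discarded because of a malformed and a missing timestamp, which is one simple way to handle noise before modeling.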
Several regression models, including Linear Regression, Random Forest, and Gradient Boosting, are trained and compared on error metrics including MAE, MSE, RMSE, and R². SHAP is used to explain model predictions and to analyze feature importance.
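To make the evaluation metrics concrete, here is a minimal sketch that computes MAE, MSE, RMSE, and R² from true and predicted delays in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions; in practice the same quantities are available from `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # same units as the target

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    # R² is undefined when the target is constant; this sketch returns NaN then.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up delays in minutes, for illustration only.
y_true = [2.0, 5.0, 9.0, 12.0]
y_pred = [3.0, 5.0, 7.0, 13.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE and RMSE are both expressed in the target's units (minutes of delay here), while RMSE penalizes large errors more heavily; R² reports the share of variance in the target that the predictions account for.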
Technologies used: Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, SHAP, and Jupyter Notebook.