CharityML – Income Prediction for Donor Targeting (Supervised Machine Learning – Binary Classification)
Project Details

Objective
Built a predictive model to help a non-profit identify likely donors by predicting whether an individual's annual income exceeds $50,000, using the 1994 UCI Adult Census dataset.

Dataset
- 32,561 records (about 26,900 usable after cleaning)
- Class imbalance: about 25.6% of individuals earn more than $50K
- 14 raw features (6 numerical, 8 categorical), expanded to about 103 features after encoding

Key Technical Achievements

Data Preparation & Cleaning
- Removed duplicates, converted missing values (' ?') to NaN, and standardized target labels
- One-hot encoded categorical variables
- Applied MinMax scaling to continuous features

Exploratory Data Analysis
- Identified strong predictors early: capital-gain, age, education level, marital status
- Visualized feature importance and class distribution

Modeling & Optimization
- Compared 4 algorithms: GaussianNB, DecisionTree, AdaBoost, RandomForest
- Used GridSearchCV for hyperparameter tuning
- Optimized for the Fβ score with β = 0.5 (F0.5), which weights precision more heavily than recall (critical for donor targeting)

Final Model Performance (best configuration, full feature set)
- Accuracy: 85.6%
- F0.5: 72.4%

Feature Importance & Model Simplification
Top 5 most predictive features (Random Forest):
- age
- capital-gain
- education-num
- hours-per-week
- marital-status_Married-civ-spouse

Model trained on only these 5 features:
- Accuracy: 84.1% (-1.5 percentage points)
- F0.5: 68.4% (-4.0 percentage points)
- Training time reduced by about 45–50%

Business & Modeling Takeaways
- Capital-gain is by far the strongest single signal
- Marital status (especially "Married-civ-spouse") is a surprisingly powerful proxy for higher household income
- The 5-feature model retains about 95% of the full model's F0.5 score (68.4% vs. 72.4%), an excellent trade-off between performance and simplicity
- The precision-focused evaluation metric (F0.5) fits the real-world donor outreach use case, where contacting unlikely donors wastes resources

Technologies & Skills Demonstrated
- Python, pandas, scikit-learn
- Data cleaning & preprocessing pipelines
- One-hot encoding + feature scaling
- Model comparison & hyperparameter tuning (GridSearchCV)
- Feature importance analysis & dimensionality reduction
- Handling class imbalance & appropriate metric selection (Fβ score)
- End-to-end ML workflow in Jupyter Notebook
- Clear business interpretation & trade-off analysis
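
To illustrate why an Fβ score with β = 0.5 favors precision, here is a minimal pure-Python sketch of the standard Fβ formula. The two precision/recall pairs are hypothetical values chosen for illustration; they are not results from this project.

```python
def fbeta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta > 1 weights recall more.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Two hypothetical classifiers with mirrored precision and recall
# (illustrative numbers only, not taken from the project).
precise_model = fbeta(precision=0.80, recall=0.50)  # ≈ 0.714
broad_model = fbeta(precision=0.50, recall=0.80)    # ≈ 0.541

print(f"F0.5, high precision: {precise_model:.3f}")
print(f"F0.5, high recall:    {broad_model:.3f}")
```

With β = 1 both hypothetical models would score the same, so the gap above comes entirely from the extra weight that β = 0.5 puts on precision.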

Published: two weeks ago
Freelancer: Catherine Gaballah, AI Engineer