Course: CIE 317/417 Machine Learning
Institution: Zewail City of Science, Technology and Innovation
A comprehensive Machine Learning system that analyzes educational data and predicts student performance. Leveraging a dataset of 20,000+ student records, the system identifies key factors influencing academic success and provides predictive insights through three core tasks:
- Regression: Predicting exact `final_score` (0-100)
- Binary Classification: Determining `pass_fail` status
- Multiclass Classification: Predicting specific `final_grade` (A, B, C, D, F)
An interactive Streamlit Dashboard visualizes model performance and enables real-time predictions.
| Name | Student ID |
|---|---|
| Mohammed Ali Sadek | 202200594 |
| Ahmed Amgad | 202200393 |
| Abdulrahman Madgy | 202200341 |
| SalahDin Ahmed Rezk | 202201079 |
Source: Term_Project_Dataset_20K.csv
- Size: 20,000+ samples
- Features: 40 input variables across 4 categories
- Target Variables: `final_score`, `final_grade`, `pass_fail`
| Category | Examples |
|---|---|
| Demographic | Age, Gender, Parent Income, Sibling Count |
| Academic History | Previous GPA, High School Grade, Attendance Rate |
| Behavioral | Study Hours, Participation, Alcohol Consumption |
| Psychological | Stress Level, Motivation, Anxiety, Sleep Hours |
- Distribution analysis of grades
- Correlation matrices identifying relationships (e.g., Study Time vs. Score)
- Outlier detection and visualization
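The exploratory steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names `study_hours` and `final_score`, the injected outlier, and the 1.5 × IQR rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV; column names are hypothetical.
rng = np.random.default_rng(0)
study_hours = rng.uniform(0, 30, size=200)
final_score = np.clip(40 + 1.5 * study_hours + rng.normal(0, 5, size=200), 0, 100)
df = pd.DataFrame({"study_hours": study_hours, "final_score": final_score})
df.loc[0, "study_hours"] = 95.0  # inject one obvious outlier for the demo

# Pearson correlation matrix between all numeric columns.
corr = df.corr()


def iqr_outliers(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


mask = iqr_outliers(df["study_hours"])
print(corr.round(2))
print("outlier rows:", df.index[mask].tolist())
```

On the real dataset the same two calls would be applied to the numeric columns loaded from `Term_Project_Dataset_20K.csv`; the IQR multiplier is a convention and may need adjusting per feature.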
- Imputation: Handling missing values in numerical and categorical columns
- Encoding: One-Hot Encoding for nominal features (e.g., Gender)
- Balancing: SMOTE (Synthetic Minority Over-sampling Technique) for class imbalance
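As a rough sketch of how these preprocessing steps might fit together, the block below imputes and one-hot encodes a toy frame with scikit-learn, then illustrates the interpolation idea behind SMOTE in plain NumPy. The column names, the median and most-frequent imputation strategies, and the helper `smote_like` are assumptions for illustration only. In practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used instead of the hand-written helper.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy frame; column names are hypothetical stand-ins for the real dataset.
df = pd.DataFrame({
    "study_hours": [10.0, np.nan, 4.0, 7.0],
    "gender": ["F", "M", np.nan, "F"],
})

preprocess = ColumnTransformer(
    [
        # Numerical columns: fill gaps with the median (assumed strategy).
        ("num", SimpleImputer(strategy="median"), ["study_hours"]),
        # Nominal columns: fill with the mode, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), ["gender"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(df)
print(X)


def smote_like(X_min, n_new, k=2, seed=0):
    """Simplified SMOTE idea: interpolate between a minority sample
    and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(dists)[1:k + 1]  # index 0 is the point itself
        j = rng.choice(neighbors)
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)


X_minority = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]])
synthetic = smote_like(X_minority, n_new=5)
print(synthetic.round(2))
```

Oversampling is usually applied to the training split only, so that synthetic points never leak into validation or test data.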
- Training multiple classical ML models
- Hyperparameter tuning via GridSearchCV/RandomizedSearchCV
- Cross-validation for robust performance estimation
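The tuning loop described above might look something like the sketch below. It uses synthetic data from `make_classification`, and the parameter grid, the F1 scoring choice, and the 3-fold split are illustrative assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic binary problem standing in for the pass/fail task.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; the real search space is not documented here.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=cv,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV F1:", round(search.best_score_, 3))
# With scoring="f1", score() reports F1 on the held-out split.
print("held-out F1:", round(search.score(X_test, y_test), 3))
```

`RandomizedSearchCV` takes the same estimator and a `param_distributions` argument, and is typically preferred when the grid is too large to search exhaustively.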
- Regression Metrics: RMSE, MAE, R² Score
- Classification Metrics: Accuracy, Precision, Recall, F1-Score
- Visualizations: Confusion Matrices, ROC Curves, Feature Importance
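To make the metric definitions concrete, here is a small dependency-free sketch that computes them by hand. The function names and the toy values are made up for the example. In the project itself these numbers would more likely come from `sklearn.metrics`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy scores (regression) and toy pass/fail labels (classification).
reg = regression_metrics([50, 60, 70, 80], [52, 58, 73, 79])
clf = binary_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
print(reg)  # mae is 2.0 and r2 is 0.964 for these values
print(clf)  # precision, recall and f1 are all 0.75 here
```

For the multiclass `final_grade` task, precision, recall, and F1 are computed per class and then averaged (for example macro or weighted averaging).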
- Linear Regression
- Ridge & Lasso Regression
- Random Forest Regressor
- Support Vector Regressor (SVR)
- Gradient Boosting Regressor
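One plausible way to compare the regressors listed above is to score each with the same cross-validation split, as sketched below. The data is synthetic, the hyperparameters are arbitrary defaults chosen for the example, and scaling before SVR is an assumption about good practice rather than something stated for this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression problem standing in for final_score prediction.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    # SVR is sensitive to feature scale, so it is wrapped with a scaler.
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

rmse = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so RMSE is exposed as a negative value.
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    rmse[name] = -scores.mean()

# Print models from lowest to highest cross-validated RMSE.
for name, value in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name:<18} RMSE = {value:.2f}")
```

The ranking on this synthetic, mostly linear data says nothing about the ranking on the student dataset; the point is only the comparison loop.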
- Logistic Regression
- Random Forest Classifier
- Gradient Boosting Classifier
- XGBoost Classifier
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Python 3.8 or higher
- pip package manager
Clone the repository and install the dependencies:

    git clone https://github.com/aboalis/student-performance-prediction.git
    cd student-performance-prediction
    pip install -r requirements.txt

To view the complete analysis and training process:

    jupyter notebook notebooks/Final_Project.ipynb

For interactive predictions and visualizations:

    streamlit run app.py

The dashboard will open in your browser at http://localhost:8501
student-performance-prediction/
│
├── data/
│   └── Term_Project_Dataset_20K.csv   # Primary dataset
│
├── notebooks/
│   └── Final_Project.ipynb            # Main analysis & training notebook
│
├── models/                            # Saved trained models (generated)
│
├── app.py                             # Streamlit dashboard application
├── requirements.txt                   # Python dependencies
├── README.md                          # Project documentation
└── .gitignore                         # Git ignore file
- Previous GPA - Strongest predictor of academic success
- Study Hours per Week - Strong positive correlation with final scores
- Attendance Rate - Critical factor in pass/fail outcomes
- Sleep Hours - Significant impact on cognitive performance
- Best Regression Model: Random Forest Regressor
- Best Binary Classifier: XGBoost
- Best Multiclass Classifier: Gradient Boosting
- Non-linear relationships between features favor ensemble methods (Random Forest, XGBoost)
- SMOTE balancing significantly improved minority class predictions (F grades)
- Behavioral factors (study time, participation) outweigh demographic factors in importance
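A comparison of feature categories like the one above could be produced by summing a tree model's feature importances per category. The sketch below shows the mechanics only. The column names and the category mapping are hypothetical, and the synthetic target is deliberately built from the behavioral columns, so the printed ranking is a consequence of the toy setup and does not reproduce the project's result.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Hypothetical feature columns, two per category.
X = pd.DataFrame({
    "study_hours": rng.uniform(0, 30, n),
    "participation": rng.uniform(0, 10, n),
    "age": rng.integers(17, 25, n),
    "sibling_count": rng.integers(0, 5, n),
})

# Synthetic target driven by the behavioral columns only (by construction).
y = 40 + 1.2 * X["study_hours"] + 2.0 * X["participation"] + rng.normal(0, 3, n)

category = {
    "study_hours": "Behavioral",
    "participation": "Behavioral",
    "age": "Demographic",
    "sibling_count": "Demographic",
}

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1 across features; group them by category.
importances = pd.Series(model.feature_importances_, index=X.columns)
by_category = importances.groupby(category).sum().sort_values(ascending=False)
print(by_category.round(3))
```

Impurity-based importances can be biased toward high-cardinality features, so permutation importance is often checked alongside them.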
- Deep Learning models (Neural Networks) for comparison
- Feature engineering with polynomial features
- Real-time data integration with student information systems
- Mobile application deployment
- Explainable AI (SHAP values) for model interpretability
This project is licensed under the MIT License - see the LICENSE file for details.
Instructor: Dr. Ahmed Tolba
Tools & Libraries:
- Python, Scikit-Learn, XGBoost
- Pandas, NumPy, Matplotlib, Seaborn
- Streamlit, Jupyter Notebook
- Google Colab
Project Date: Fall 2024
Last Updated: January 2025