We built a complete, end-to-end Network Intrusion Detection System (NIDS) on the CIC-IDS-2017 dataset from the Canadian Institute for Cybersecurity, focused on detecting malicious traffic in large-scale, realistic network data.
What this project covers:
• Processing and feature engineering on 2.8M+ network flow records
• Advanced data cleaning: duplicate removal, redundant feature elimination, handling missing and infinite values
• Attack label engineering (grouping the dataset's 15 original labels, 14 attack types plus benign traffic, into meaningful classes)
• Handling severe class imbalance using downsampling and cost-sensitive learning
• Training and benchmarking multiple models:
  ◦ Naive Bayes
  ◦ Logistic Regression
  ◦ KNN
  ◦ SVM
  ◦ Random Forest
  ◦ Feedforward Neural Networks (with and without class weighting)
• In-depth evaluation using accuracy, precision, recall, F1-score, confusion matrices, and learning curves
• Hardware-aware comparison (training time, memory usage, CPU utilization)
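The cleaning steps listed above can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual code: it assumes each flow is a list of values with a label column named `Label`, and the helper names (`is_bad`, `clean_flows`) are hypothetical. Rows with missing or infinite feature values are dropped first, so NaN values, which never compare equal, cannot defeat the duplicate check. Duplicate rows go next, then constant columns and columns that exactly copy one already kept.

```python
import math


def is_bad(value):
    """True for a missing (None/NaN) or infinite value; other values pass."""
    return value is None or (
        isinstance(value, float) and (math.isnan(value) or math.isinf(value))
    )


def clean_flows(columns, rows, label_col="Label"):
    """Return (kept_columns, cleaned_rows) for flow records stored as lists."""
    label_idx = columns.index(label_col)

    # 1. Drop rows with a missing or infinite value in any feature column.
    rows = [
        r for r in rows
        if not any(is_bad(v) for i, v in enumerate(r) if i != label_idx)
    ]

    # 2. Drop exact duplicate rows, keeping the first occurrence.
    seen_rows, unique_rows = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen_rows:
            seen_rows.add(key)
            unique_rows.append(r)

    # 3. Drop redundant features: constant columns and exact copies
    #    of a column that was already kept. The label column is always kept.
    kept, seen_cols = [], set()
    for i in range(len(columns)):
        if i == label_idx:
            kept.append(i)
            continue
        values = tuple(r[i] for r in unique_rows)
        if len(set(values)) <= 1 or values in seen_cols:
            continue
        seen_cols.add(values)
        kept.append(i)

    return [columns[i] for i in kept], [[r[i] for i in kept] for r in unique_rows]
```

With pandas, the same steps map roughly onto `DataFrame.replace` (turning ±inf into NaN), `dropna()`, `drop_duplicates()`, and dropping columns whose `nunique()` is 1.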
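A sketch of the label grouping follows. The post does not say which classes were used, so the grouping below is a hypothetical example, as are the names `GROUP_RULES` and `group_label`. It uses Benign, DoS/DDoS, PortScan, Brute Force, Web Attack, Bot, and a "Rare" bucket for Infiltration and Heartbleed. Matching on substrings rather than exact strings is a defensive choice, assuming raw labels may carry extra whitespace or encoding artifacts. The "Web Attack" labels are a common example when the CSVs are read with the wrong encoding.

```python
from collections import Counter

# Hypothetical grouping: the first rule whose key appears in the label wins.
# "DoS" also matches "DDoS", so both land in the same class.
GROUP_RULES = [
    ("BENIGN", "Benign"),
    ("DoS", "DoS/DDoS"),
    ("PortScan", "PortScan"),
    ("Patator", "Brute Force"),   # FTP-Patator, SSH-Patator
    ("Web Attack", "Web Attack"),
    ("Bot", "Bot"),
    ("Infiltration", "Rare"),
    ("Heartbleed", "Rare"),
]


def group_label(raw_label):
    """Map a raw CIC-IDS-2017 label to a coarser (hypothetical) class."""
    label = raw_label.strip()
    for key, group in GROUP_RULES:
        if key in label:
            return group
    # Fail loudly so an unexpected label is never silently mislabeled.
    raise ValueError(f"unmapped label: {raw_label!r}")


def group_counts(raw_labels):
    """Count grouped classes, e.g. to inspect imbalance after grouping."""
    return Counter(group_label(label) for label in raw_labels)
```

Raising on unknown labels, instead of falling back to an "Other" class, makes a typo or encoding surprise visible during preprocessing.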
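Downsampling and cost-sensitive learning can be combined as in the sketch below. It is a minimal standard-library illustration under stated assumptions, and the post does not state which cap or weighting scheme the project used. `downsample` is a hypothetical helper that keeps at most `cap` randomly chosen rows per class. `balanced_class_weights` uses the n_samples / (n_classes * n_c) heuristic that scikit-learn applies for `class_weight="balanced"`.

```python
import random
from collections import Counter, defaultdict


def downsample(rows, labels, cap, seed=0):
    """Keep at most `cap` randomly chosen rows per class; smaller classes are kept whole."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    out_rows, out_labels = [], []
    for label, items in by_class.items():
        kept = items if len(items) <= cap else rng.sample(items, cap)
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight class c by n_samples / (n_classes * count_c): rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}
```

The weights scale each sample's contribution to the training loss. In scikit-learn, many classifiers (including logistic regression, SVMs, and random forests) accept a `class_weight` argument directly.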
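Because the classes are heavily imbalanced, per-class precision, recall, and F1 say more than accuracy alone. The sketch below uses only the standard library and hypothetical function names. It builds a confusion matrix and derives per-class metrics plus macro-averaged F1. The post does not say which F1 averaging was used, so macro averaging here is an assumption.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_report(y_true, y_pred, classes):
    """Return (accuracy, macro_f1, {class: precision/recall/f1/support})."""
    cm = confusion_matrix(y_true, y_pred, classes)
    report = {}
    for i, c in enumerate(classes):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(len(classes))) - tp  # predicted c, was not c
        fn = sum(cm[i]) - tp                                   # was c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    accuracy = sum(cm[i][i] for i in range(len(classes))) / len(y_true)
    macro_f1 = sum(r["f1"] for r in report.values()) / len(classes)
    return accuracy, macro_f1, report
```

For example, predicting "Benign" for all 100 flows of a 98/2 Benign/Bot split gives 98% accuracy. The same predictions give 0 recall on Bot and a macro F1 of about 0.49, which is the failure mode the per-class view exposes.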
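The hardware-aware comparison can be approximated with the standard library, as sketched below. This is an illustration under assumptions, not the project's measurement setup. `profile_training` is a hypothetical helper. "CPU utilization" is approximated as process CPU time divided by wall-clock time, where values above 1.0 mean more than one core was busy on average. Peak memory comes from `tracemalloc`, which only sees allocations made through Python's allocator (or reported to it), so it can undercount memory used by native extensions.

```python
import time
import tracemalloc


def profile_training(train_fn, *args, **kwargs):
    """Run train_fn once; return (result, stats) with time, CPU, and peak traced memory."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process, all threads
    try:
        result = train_fn(*args, **kwargs)
    finally:
        cpu_s = time.process_time() - cpu_start
        wall_s = time.perf_counter() - wall_start
        _, peak_bytes = tracemalloc.get_traced_memory()  # (current, peak)
        tracemalloc.stop()
    stats = {
        "wall_s": wall_s,
        "cpu_s": cpu_s,
        # Proxy for CPU utilization: > 1.0 means several cores were busy on average.
        "cpu_utilization": cpu_s / wall_s if wall_s > 0 else 0.0,
        "peak_mem_mb": peak_bytes / 1e6,
    }
    return result, stats
```

Each model's training call could be wrapped the same way, e.g. `profile_training(model.fit, X_train, y_train)`. The resulting `stats` dictionaries can then be tabulated side by side.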
Key outcome:
Random Forest achieved the best overall performance (~99% accuracy and F1-score) while generalizing well at a moderate computational cost, which makes it a strong candidate for real-world IDS deployment.