forked from aditikilledar/spark-stream
# Big Data Project, ML w/ Spark, 5th Semester
## !!! DUE 6th-7th December.
## Timeline
- [x] [8 Nov] Choosing a Dataset
- [x] [11 Nov] Figure out stream.py
- [x] [25 Nov] Size of Batch
- [x] [24 Nov] Decide Preprocessing Techniques
- [x] [26 Nov] Implement the Chosen Preprocessing Techniques
- [x] [4 Dec] Reasoning for Preprocessing Steps
- [x] [4 Dec] Decide which 3 Models
- [x] [4 Dec] Implement at least 3 classifiers on data
- [x] [6 Dec] Analyse each classifier and work out which one performs best on each type of data
- [x] [6 Dec] Hyperparameter Tuning
- [x] [6 Dec] Training Batch Size Tuning
- [x] [6 Dec] Predict Classification of Test Data Using Model
- [x] [6 Dec] Metrics to Evaluate Classifier
- [x] [6 Dec] Compute Performance of the Classifier Using the Chosen Metric
- [x] [6 Dec] Clustering
- [ ] [6 Dec] Add Project Report to the repository
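
The metric steps above ("Metrics to Evaluate Classifier", "Compute Performance of the Classifier") can be sketched roughly as below. This is only an illustration, not the project's actual code: it assumes binary labels, assumes predictions arrive one batch at a time, and the names `update_counts` and `summarize` are hypothetical. The file does not say which metric was chosen, so the sketch reports accuracy, precision, recall and F1 together.

```python
from collections import Counter


def update_counts(counts, y_true, y_pred, positive=1):
    """Add one batch of binary predictions to the running confusion counts."""
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, recall and F1 from the running counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


# Hypothetical data: two small batches of (true labels, predicted labels).
counts = Counter()
update_counts(counts, [1, 0, 1, 1], [1, 0, 0, 1])
update_counts(counts, [0, 0, 1, 0], [0, 1, 1, 0])
metrics = summarize(counts)
print(metrics)
```

Keeping only the four running counts means the metrics can be recomputed after every streamed batch without storing earlier predictions.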
## Helpful Links
- Spark Streaming Programming Guide: https://spark.apache.org/docs/latest/streaming-programming-guide.html
- Spark Streaming over TCP (Stack Overflow): https://stackoverflow.com/questions/33214988/spark-streaming-over-tcp