Week 2

Table of Contents

Day 1
Day 2
Day 3

The goals of this week are to:

understand the overall workflow of a machine learning project
use scikit-learn to implement a supervised classifier for your project
evaluate your approach on your labeled dataset

Day 1

Today we will extract some features from our data and perform an initial classification experiment.

See the starter notebook: https://github.com/tapilab/elevate-osna-starter/tree/master/notebooks/W2L1.ipynb
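
As a rough illustration of what an initial experiment can look like, here is a minimal sketch. It is not the contents of the starter notebook: the toy `texts` and `labels`, the 75/25 split, and the default `CountVectorizer` and `LogisticRegression` settings are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data; replace with your own labeled tweets.
pos = ["love this great movie", "great fun happy day",
       "love happy great time", "fun great love it"]
neg = ["hate this awful movie", "awful boring sad day",
       "hate sad awful time", "boring awful hate it"]
texts = pos * 5 + neg * 5
labels = [1] * 20 + [0] * 20

# Hold out a quarter of the data for evaluation.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0)

# Learn the vocabulary on the training split only, then reuse it for the test split.
vec = CountVectorizer()
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

clf = LogisticRegression()
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"features: {X_train.shape[1]}, accuracy: {accuracy:.3f}")
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary from leaking into the features.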

Day 2

Continue working on your notebook from the last lab. Do the following:

Use CountVectorizer to create a matrix of all terms. Experiment with the following to see the effect on accuracy:
- min_df: [1,2,5,10]
- max_df: [1.0, .95, .8] (use floats: an integer 1 is read as a document count, not a fraction)
- ngram_range: [(1,1), (1,2), (1,3)]
Experiment with different regularization for LogisticRegression
- C: [.1, 1, 5, 10]
- penalty: [l1, l2]
Summarize your results with a table for each setting, like this:

C	Accuracy
.1	xxx
1	xxx

Vary one parameter at a time, while using the defaults for the rest. Let the defaults be (min_df=2, max_df=1.0, ngram_range=(1,1), C=1, penalty=l2).
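
One way to organize the one-parameter-at-a-time experiments is sketched below. The helper names `evaluate` and `sweep`, the toy data, and the use of 5-fold cross-validation are assumptions for illustration. The sketch also sets `solver="liblinear"`, because scikit-learn's default `lbfgs` solver does not support the l1 penalty.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; replace with your own labeled tweets.
pos = ["love this great movie", "great fun happy day",
       "love happy great time", "fun great love it"]
neg = ["hate this awful movie", "awful boring sad day",
       "hate sad awful time", "boring awful hate it"]
texts = pos * 5 + neg * 5
labels = [1] * 20 + [0] * 20

# Defaults from the assignment; max_df is a float so it means "fraction of documents".
DEFAULTS = {"min_df": 2, "max_df": 1.0, "ngram_range": (1, 1),
            "C": 1, "penalty": "l2"}

def evaluate(texts, labels, **overrides):
    """Mean cross-validated accuracy with the given defaults overridden."""
    p = {**DEFAULTS, **overrides}
    vec = CountVectorizer(min_df=p["min_df"], max_df=p["max_df"],
                          ngram_range=p["ngram_range"])
    # liblinear handles both l1 and l2 penalties (assumed choice, not from the assignment).
    clf = LogisticRegression(C=p["C"], penalty=p["penalty"], solver="liblinear")
    scores = cross_val_score(make_pipeline(vec, clf), texts, labels,
                             cv=5, scoring="accuracy")
    return scores.mean()

def sweep(name, values):
    """Vary a single parameter, keeping every other setting at its default."""
    return [(v, evaluate(texts, labels, **{name: v})) for v in values]

rows = sweep("C", [.1, 1, 5, 10])
print("C\tAccuracy")
for value, acc in rows:
    print(f"{value}\t{acc:.3f}")
```

The same `sweep` call works for the other settings, for example `sweep("ngram_range", [(1, 1), (1, 2), (1, 3)])`. Wrapping the vectorizer and classifier in a pipeline means the vocabulary is rebuilt inside each fold.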

Day 3

Today we will (1) determine the best version of the classifier that we can find, (2) fit the classifier, (3) load it in the web app, (4) classify the tweets.
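
Steps (2) through (4) can be sketched as follows. This is only one possible approach: saving the fitted model with `pickle`, the file name `clf.pkl`, the toy data, and the example tweets are all assumptions, and the web app's own loading code may differ.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; replace with your own labeled tweets.
pos = ["love this great movie", "great fun happy day",
       "love happy great time", "fun great love it"]
neg = ["hate this awful movie", "awful boring sad day",
       "hate sad awful time", "boring awful hate it"]
texts = pos * 5 + neg * 5
labels = [1] * 20 + [0] * 20

# (2) Fit the chosen configuration on all labeled data.
# Keeping the vectorizer and classifier in one pipeline means a single
# object has to be saved and loaded.
pipe = make_pipeline(CountVectorizer(min_df=2, max_df=1.0),
                     LogisticRegression(C=1))
pipe.fit(texts, labels)

# Save the fitted pipeline (hypothetical path).
path = os.path.join(tempfile.gettempdir(), "clf.pkl")
with open(path, "wb") as f:
    pickle.dump(pipe, f)

# (3) In the web app, load the saved pipeline.
with open(path, "rb") as f:
    loaded = pickle.load(f)

# (4) Classify new, unlabeled tweets (made-up examples).
new_tweets = ["what a great happy day", "awful boring movie"]
predictions = loaded.predict(new_tweets)
print(list(zip(new_tweets, predictions)))
```

Pickled scikit-learn models should be loaded with the same scikit-learn version that saved them, and only from files you trust.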
