Initial commit

tulane-cmps6730 · Mar 7, 2024 · commit fb0d32d

Showing 157 changed files with 33,166 additions and 0 deletions.
diff --git a/.editorconfig b/.editorconfig
@@ -0,0 +1,21 @@
+# http://editorconfig.org
+
+root = true
+
+[*]
+indent_style = space
+indent_size = 4
+trim_trailing_whitespace = true
+insert_final_newline = true
+charset = utf-8
+end_of_line = lf
+
+[*.bat]
+indent_style = tab
+end_of_line = crlf
+
+[LICENSE]
+insert_final_newline = false
+
+[Makefile]
+indent_style = tab
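EditorConfig applies every section whose glob matches a file, and properties from later sections override earlier ones. The sketch below approximates that resolution for the settings above. It uses Python's `fnmatch.fnmatchcase` as a stand-in for EditorConfig's own glob rules; that is an assumption, since the two differ for path separators and brace patterns, which this file does not use. The `resolve` helper is hypothetical.

```python
from fnmatch import fnmatchcase

# Sections of the .editorconfig above, in file order (globs without the brackets).
SECTIONS = [
    ("*", {
        "indent_style": "space",
        "indent_size": "4",
        "trim_trailing_whitespace": "true",
        "insert_final_newline": "true",
        "charset": "utf-8",
        "end_of_line": "lf",
    }),
    ("*.bat", {"indent_style": "tab", "end_of_line": "crlf"}),
    ("LICENSE", {"insert_final_newline": "false"}),
    ("Makefile", {"indent_style": "tab"}),
]


def resolve(filename):
    """Merge the properties of every matching section; later sections win."""
    props = {}
    for glob, section_props in SECTIONS:
        # fnmatchcase approximates EditorConfig globs for bare file names.
        if fnmatchcase(filename, glob):
            props.update(section_props)
    return props


print(resolve("Makefile")["indent_style"])  # tab
print(resolve("setup.py")["indent_style"])  # space
```

Note that `Makefile` still inherits `indent_size = 4` from `[*]`; only `indent_style` is overridden.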
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,108 @@
+# latex
+*.aux
+*.bbl
+*.blg
+*.out
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# dotenv
+.env
+
+# virtualenv
+.venv
+venv/
+ENV/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
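Many of these entries are shell-style globs; for example, `*.py[cod]` covers `.pyc`, `.pyo`, and `.pyd` files. The sketch below checks bare file names against a few of the file-name patterns using Python's `fnmatch`. It is only an approximation: real `.gitignore` matching also handles directory patterns such as `__pycache__/`, anchored patterns such as `/site`, and `!` negation, none of which are modeled here. The `is_ignored` helper is hypothetical.

```python
from fnmatch import fnmatchcase

# A few file-name patterns copied from the .gitignore above.
PATTERNS = ["*.aux", "*.py[cod]", "*$py.class", "*.so", "*.egg", ".env", "*.log"]


def is_ignored(filename):
    """Return True if the bare file name matches any of PATTERNS."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


for name in ["module.pyc", "module.pyo", "module.py", "Foo$py.class", "report.aux"]:
    print(name, is_ignored(name))
```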
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "docs/reveal.js"]
+	path = docs/reveal.js
+	url = https://github.com/hakimel/reveal.js
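`.gitmodules` uses Git's config syntax, which for a simple file like this one is close enough to INI that Python's `configparser` can read it (an assumption; Git's format has features `configparser` does not support). This sketch extracts each submodule's path and URL.

```python
import configparser

# Contents of the .gitmodules above; option lines are tab-indented.
GITMODULES = """\
[submodule "docs/reveal.js"]
\tpath = docs/reveal.js
\turl = https://github.com/hakimel/reveal.js
"""

parser = configparser.ConfigParser()
parser.read_string(GITMODULES)

for section in parser.sections():
    # Section names keep their quotes, e.g.: submodule "docs/reveal.js"
    print(section, parser[section]["path"], parser[section]["url"])
```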
diff --git a/.travis.yml b/.travis.yml
@@ -0,0 +1,16 @@
+# Config file for automatic testing at travis-ci.org
+
+language: python
+python:
+  - 3.6
+  - 3.5
+  - 3.4
+  - 2.7
+
+# Command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
+install: pip install -U tox-travis
+
+# Command to run tests, e.g. python setup.py test
+script: tox
+
+
diff --git a/GettingStarted.md b/GettingStarted.md
@@ -0,0 +1,137 @@
+# Getting Started
+
+This document will help you become familiar with the project starter code so that you can adapt it to your project.
+
+You should have been assigned a project repository for your work. In the examples below, we'll assume the repository is called <https://github.com/tulane-cmps6730/sample-project>. This is where all your code will live. 
+
+## Installation
+
+Your repository has been set up with a lot of starter code so you can get up and running quickly. To use it, do the following:
+
+1. Make sure you've completed all the course **Background Resources** listed on the [README](https://github.com/tulane-cmps6730/sample-project/blob/main/README.md).
+2. Clone your repo: `git clone https://github.com/tulane-cmps6730/sample-project` [use your project's repository name]
+3. Start a [virtual environment](https://virtualenv.pypa.io/en/stable/).
+   - First, make sure you have virtualenv installed: `pip install virtualenv`
+   - Next, outside of the team repository, create a new virtual environment folder with `virtualenv nlp-virtual`.
+   - Activate your virtual environment with `source nlp-virtual/bin/activate`.
+   - Now, when you install Python packages, they will be saved in your `nlp-virtual` folder, so they won't conflict with the rest of your system.
+4. Install your project code:
+```
+cd sample-project   # enter your project repository folder
+pip install -r requirements.txt
+python setup.py develop # install the code. 
+```
+
+This may take a while, as all dependencies listed in the `requirements.txt` file will also be installed. Because the `develop` command (rather than `install`) installs the package in editable mode, any changes you make to your code are reflected automatically without reinstalling anything.
+
+**Windows users**: if you're having trouble, try reading [this guide](http://timmyreilly.azurewebsites.net/python-flask-windows-development-environment-setup/). You will likely need to:
+- install virtualenvwrapper-win with `pip install virtualenvwrapper-win`
+- instead of `virtualenv nlp-virtual` above, run `mkvirtualenv nlp-virtual`
+- other students have also had luck creating an environment with `py -3 -m venv env` and then activating it with `env\Scripts\activate`
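Before installing, it can help to confirm that the virtual environment from step 3 is actually active. Below is a minimal standard-library sketch: Python's `venv` module (and `virtualenv` 20 and later) make `sys.prefix` differ from `sys.base_prefix`, while older `virtualenv` releases set `sys.real_prefix` instead. The function name `in_virtualenv` is hypothetical and not part of the starter code.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to be running inside a virtual environment."""
    # Older virtualenv releases (< 20) record the original prefix here.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv and modern virtualenv leave base_prefix pointing at the system Python.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the environment activated; it should report `True`.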
+
+5. If everything worked properly, you should now be able to run your project's command-line tool by typing:  
+```
+nlp --help
+```
+which should print
+```
+Usage: nlp [OPTIONS] COMMAND [ARGS]...
+
+  Console script for nlp.
+
+Options:
+  --help  Show this message and exit.
+
+Commands:
+  dl-data  Download training/testing data.
+  stats    Read the data files and print interesting statistics.
+  train    Train a classifier and save it.
+  web      Launch the flask web app
+```
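The help text above looks like the output of the Click library, which the starter code appears to use (an assumption; check your package's command-line module). To show how one `nlp` command dispatches to its subcommands, here is a standard-library sketch using `argparse` instead; the handler functions are hypothetical placeholders, not the project's code.

```python
import argparse


# Hypothetical placeholder handlers; the real project defines its own.
def dl_data(args):
    return "downloading data"


def stats(args):
    return "printing statistics"


def train(args):
    return "training classifier"


def web(args):
    return "launching web app"


def build_parser():
    parser = argparse.ArgumentParser(prog="nlp", description="Console script for nlp.")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, func, help_text in [
        ("dl-data", dl_data, "Download training/testing data."),
        ("stats", stats, "Read the data files and print interesting statistics."),
        ("train", train, "Train a classifier and save it."),
        ("web", web, "Launch the flask web app"),
    ]:
        sub = subparsers.add_parser(name, help=help_text)
        sub.set_defaults(func=func)  # remember which handler this subcommand calls
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.func(args))
```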
+
+## Running the sample project
+
+The sample project trains a very simple classifier to predict if a news headline comes from a liberal or conservative news source. Your project should be much more involved and interesting! This example is just to demonstrate the key steps required in the project. To run it:
+
+1. `nlp dl-data`: This downloads the training data from Dropbox and saves it to `~/.nlp/headlines.csv`. The file looks like this:
+```
+partisan,title
+1,"Democrats Still Can’t Accept Results Of The 2016 Election, Says John Davidson"
+0,How Conservative Media Outlets Are Covering Trump’s Impeachment
+1,Donald Trump at March for Life: ‘Every Child Is a Sacred Gift from God'
+```
+Here, `partisan` is the class label (1: conservative, 0: liberal), and `title` is the headline. 
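The first title is quoted because it contains a comma. A minimal standard-library sketch of reading rows in this format with `csv.DictReader`, using the three sample lines above inline rather than the real `~/.nlp/headlines.csv`:

```python
import csv
import io

# The sample rows shown above; the real file lives at ~/.nlp/headlines.csv.
SAMPLE = """partisan,title
1,"Democrats Still Can’t Accept Results Of The 2016 Election, Says John Davidson"
0,How Conservative Media Outlets Are Covering Trump’s Impeachment
1,Donald Trump at March for Life: ‘Every Child Is a Sacred Gift from God'
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    label = int(row["partisan"])  # 1 = conservative, 0 = liberal
    print(label, row["title"])
```

For the real file, open it with `open(path, newline="", encoding="utf-8")`, as the `csv` documentation recommends passing `newline=""` (the UTF-8 encoding is an assumption).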
+
+2. `nlp stats`: This computes some simple statistics over the data:
+```
+48187 rows
+label counts:
+1    27660
+0    20527
+Name: partisan, dtype: int64
+```
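The `Name: partisan, dtype: int64` line suggests the counts come from pandas, most likely `value_counts()` on the `partisan` column. The same tally can be sketched with `collections.Counter`; the label list here is a tiny hypothetical stand-in for the real column.

```python
from collections import Counter

labels = [1, 0, 1, 1, 0]  # hypothetical stand-in for the partisan column

counts = Counter(labels)
print(len(labels), "rows")
print("label counts:")
for label, count in counts.most_common():  # most frequent label first
    print(label, count)
```

With the real counts, conservative headlines make up 27660 / 48187 ≈ 57.4% of the data, which is the accuracy a classifier would get by always predicting the majority class.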
+
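+3. `nlp train`: This trains a classifier, reports cross-validated precision, recall, F1, and accuracy along with the features that have the largest coefficients for each class, and saves the classifier to `~/.nlp/clf.pkl`: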
+
+```
+              precision    recall  f1-score   support
+
+           0       0.67      0.72      0.70     20527
+           1       0.78      0.74      0.76     27660
+
+    accuracy                           0.73     48187
+   macro avg       0.73      0.73      0.73     48187
+weighted avg       0.74      0.73      0.73     48187
+
+top coef for conservative
+               nolte  3.47
+             shapiro  3.07
+              pollak  3.01
+                8217  2.94
+           flashback  2.80
+  illegal immigrants  2.67
+              klavan  2.54
+                curl  2.37
+          fact check  2.30
+                 fnc  2.30
+
+
+top coef for liberal
+           explained  -4.04
+           headlines  -2.04
+                 x27  -2.02
+             staying  -1.99
+              savior  -1.95
+       controversial  -1.92
+        biden denies  -1.88
+  conservative media  -1.88
+         schiff says  -1.87
+           announced  -1.82
+```
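In a linear classifier such as the logistic regression the web app reports loading, each n-gram feature gets one coefficient: large positive values push a headline toward class 1 (conservative) and large negative values toward class 0 (liberal). A sketch of how the two "top coef" lists can be produced from feature names and coefficients; the `top_coefficients` helper and the toy values are hypothetical, and the project's own code may differ.

```python
def top_coefficients(feature_names, coefs, k=3):
    """Return the k most positive and the k most negative (feature, coef) pairs."""
    pairs = sorted(zip(feature_names, coefs), key=lambda pair: pair[1])
    most_negative = pairs[:k]        # strongest evidence for class 0
    most_positive = pairs[::-1][:k]  # strongest evidence for class 1
    return most_positive, most_negative


# Hypothetical toy values, not the project's fitted model.
names = ["alpha", "beta", "gamma", "delta", "epsilon"]
weights = [2.5, -1.0, 0.3, -3.2, 1.1]

pos, neg = top_coefficients(names, weights, k=2)
print("top coef for conservative")
for name, w in pos:
    print(f"{name:>20} {w:5.2f}")
print("top coef for liberal")
for name, w in neg:
    print(f"{name:>20} {w:5.2f}")
```

With scikit-learn, the inputs would typically come from the vectorizer's `get_feature_names_out()` and the binary model's `coef_[0]`.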
+
+4. `nlp web`: This launches a Flask web server to demo the classifier. 
+
+```
+read clf LogisticRegression(C=1, class_weight='balanced', max_iter=1000)
+read vec CountVectorizer(binary=True, min_df=5, ngram_range=(1, 3), stop_words='english')
+ * Serving Flask app "nlp.app" (lazy loading)
+ * Environment: production
+   WARNING: This is a development server. Do not use it in a production deployment.
+   Use a production WSGI server instead.
+ * Debug mode: on
+ * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
+ * Restarting with stat
+read clf LogisticRegression(C=1, class_weight='balanced', max_iter=1000)
+read vec CountVectorizer(binary=True, min_df=5, ngram_range=(1, 3), stop_words='english')
+ * Debugger is active!
+ * Debugger PIN: 128-371-422
+```
+
+If you open your web browser and go to `http://localhost:5000/` (the server listens on `0.0.0.0`, i.e. all network interfaces, port 5000), you should see something like:
+
+![web.png](web.png)
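The `read clf` and `read vec` lines show that the web app loads a previously trained classifier and vectorizer from disk; each line appears twice because Flask's debug-mode reloader (`Restarting with stat`) starts a second process that loads the app again. Since `nlp train` saves the classifier to `~/.nlp/clf.pkl`, here is a minimal sketch of such a save/load round trip with `pickle`, where a plain dict stands in for the fitted model and a temporary directory stands in for `~/.nlp` (both assumptions; `save_model` and `load_model` are hypothetical names).

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Pickle a fitted model to path."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a pickled model; only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "clf.pkl"                  # stands in for ~/.nlp/clf.pkl
    model = {"C": 1, "class_weight": "balanced"}  # hypothetical stand-in object
    save_model(model, path)
    restored = load_model(path)
    print(restored == model)
```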
+
+
+
+**Tips:**
+- Some web browsers cache the page, which can make it hard to see your updates. You may need to force a refresh that bypasses the cache (e.g., see [here for Chrome](https://superuser.com/questions/89809/how-to-force-refresh-without-cache-in-google-chrome)).
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,22 @@
+MIT License
+
+Copyright (c) 2019, A Student
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1,12 @@
+include CONTRIBUTING.rst
+include HISTORY.rst
+include LICENSE
+include README.rst
+include nlp/app/templates/*.html
+include nlp/app/static/*
+
+recursive-include tests *
+recursive-exclude * __pycache__
+recursive-exclude * *.py[co]
+
+recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif *.html *.css
diff --git a/Makefile b/Makefile
@@ -0,0 +1,69 @@
+.PHONY: clean clean-test clean-pyc clean-build help
+.DEFAULT_GOAL := help
+
+define BROWSER_PYSCRIPT
+import os, webbrowser, sys
+
+try:
+	from urllib import pathname2url
+except ImportError:
+	from urllib.request import pathname2url
+
+webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1])))
+endef
+export BROWSER_PYSCRIPT
+
+define PRINT_HELP_PYSCRIPT
+import re, sys
+
+for line in sys.stdin:
+	match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line)
+	if match:
+		target, help = match.groups()
+		print("%-20s %s" % (target, help))
+endef
+export PRINT_HELP_PYSCRIPT
+
+BROWSER := python -c "$$BROWSER_PYSCRIPT"
+
+help:
+	@python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)
+
+clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts
+
+clean-build: ## remove build artifacts
+	rm -fr build/
+	rm -fr dist/
+	rm -fr .eggs/
+	find . -name '*.egg-info' -exec rm -fr {} +
+	find . -name '*.egg' -exec rm -f {} +
+
+clean-pyc: ## remove Python file artifacts
+	find . -name '*.pyc' -exec rm -f {} +
+	find . -name '*.pyo' -exec rm -f {} +
+	find . -name '*~' -exec rm -f {} +
+	find . -name '__pycache__' -exec rm -fr {} +
+
+clean-test: ## remove test and coverage artifacts
+	rm -fr .tox/
+	rm -f .coverage
+	rm -fr htmlcov/
+	rm -fr .pytest_cache
+
+lint: ## check style with flake8
+	flake8 nlp tests
+
+test: ## run tests quickly with the default Python
+	python setup.py test
+
+test-all: ## run tests on every Python version with tox
+	tox
+
+coverage: ## check code coverage quickly with the default Python
+	coverage run --source nlp setup.py test
+	coverage report -m
+	coverage html
+	$(BROWSER) htmlcov/index.html
+
+install: clean ## install the package to the active Python's site-packages
+	python setup.py install
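Because `.DEFAULT_GOAL := help`, running a bare `make` pipes this Makefile through `PRINT_HELP_PYSCRIPT`, which lists every target that carries a `## description` comment. Below is the same regular expression applied to a few lines copied from the Makefile, written as plain Python (Make's `$$` escape becomes a single `$`).

```python
import re

# A few lines copied from the Makefile above.
MAKEFILE_LINES = [
    ".PHONY: clean clean-test clean-pyc clean-build help",
    "clean-build: ## remove build artifacts",
    "lint: ## check style with flake8",
    "help:",
    "install: clean ## install the package to the active Python's site-packages",
]

entries = []
for line in MAKEFILE_LINES:
    # Same pattern as PRINT_HELP_PYSCRIPT: target name, colon, anything, then "## help text".
    match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$', line)
    if match:
        entries.append(match.groups())

for target, help_text in entries:
    print("%-20s %s" % (target, help_text))
```

`.PHONY` is skipped because `.` is not in the target character class, and `help:` is skipped because it has no `##` comment.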