
Text Mining 2021/22

Binder

Amsterdam University College -- Text Mining -- Winter/Spring 2022.

Contents

You can use the Hello World notebooks to check that everything is working.

| Week | Topic | Materials |
|---|---|---|
| 1 | Introduction and Python refresher | slides + notebooks 1, 2, 3, 4, 5 |
| 2 | Introduction to NLP and NLP pipelines | slides + notebook |
| 3 | Language modelling | slides + notebooks 1, 2 |
| 4 | Vector space semantics | slides + notebook |
| 5 | Word embeddings | slides + notebook |
| 6 | Machine learning fundamentals | slides + notebook |
| 7 | Text classification | slides + notebook (Scikit-learn), notebook (PyTorch) |
| 8 | RNNs and NER | slides + notebook |
| 9 | Recommender systems | slides + notebook |
| 10 | Creating annotated corpora, Web scraping and APIs | slides + notebook |
| 11 | Sentiment analysis | slides + notebook |
| 12 | Clustering and topic modelling | slides + notebook |
| 13 | XAI and Bias in Word Embeddings | Selected contents from this course - slides |
| 14 | Fairness and Text Mining for Humanities | slides |

External materials

Neural networks

Tutorials

Group projects

See the projects folder for more information.

Ongoing projects

Age prediction: https://github.com/d-hagen/TM_project
News headlines: https://github.com/norahahr/TMproject
Greek Mythology: https://github.com/RianneAr/Text_Mining_Project
Lyrics formation: https://github.com/Claudio-creis/TEXT-MINING-PROJECT
Movie recommender: https://github.com/XiaoxuanLu/Movie_recommender_system
Marvel vs DC sentiment analysis: https://github.com/TomR2022/Text-Mining-Group-Project
Sentiment analysis and stocks: https://github.com/joshbrook/TM-Project-2022
Loanwords and their sentiment: https://github.com/DanielFM0/TM-Group-Project

Project outcomes

Set-up

Running in the cloud (Google Colab)

  1. Fork the repository to your GitHub account: go to https://github.com/bloemj/AUC_TMCI_2022 and click Fork.
  2. Get updates (from time to time): in your fork on the GitHub website, click "Fetch upstream".
  3. Launch notebooks by going to Google Colab (https://colab.research.google.com/) and loading them through the "Open Notebook" window. Enter the GitHub URL of your own fork of the course materials so that you can save your changes. Click "Open notebook in new tab" to run the notebook.
  4. To save your changes, choose "Save a copy in GitHub" and accept the suggested location. Note that plain "Save" will not work, and changes are not saved automatically. Saving to GitHub also fails if you skipped step 1 and are loading my version of the repository directly.

Running on your own system

  1. Clone the repository locally: git clone https://github.com/bloemj/AUC_TMCI_2022.git
  2. Get updates (from time to time): git pull
  3. Create a conda environment: conda create -n myenv python=3.7 anaconda (where myenv is the environment name)
  4. Activate it: conda activate myenv
  5. Install packages (see the requirements.txt file), e.g. conda install pandas
  6. Launch a Jupyter notebook: jupyter notebook
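If you want to check step 5 in one go rather than package by package, the sketch below compares the names listed in requirements.txt with the packages installed in the active environment. It is a hypothetical helper, not part of the course materials: `parse_requirements` and `missing_packages` are illustrative names, and it assumes requirements.txt lists one package per line in the usual pip or conda style. It uses `importlib.metadata`, which requires Python 3.8 or later; in the Python 3.7 environment created above you would need the `importlib_metadata` backport instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirements(text):
    """Return the package names listed in requirements.txt-style text.

    Hypothetical helper: skips blank lines, comments and pip options
    (lines starting with "-"), and strips version specifiers, extras
    and environment markers.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue
        # The name ends at the first character that cannot belong to it,
        # e.g. "==", ">=", "=", "[", ";" or whitespace.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed package metadata in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the package is not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, running `print(missing_packages(parse_requirements(open("requirements.txt").read())))` from the repository root prints an empty list when every listed package is installed. Checking package metadata rather than importing each name avoids mismatches such as scikit-learn, which is imported as `sklearn`.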

Alternatively, use Binder (link above).

A more detailed guide to setting up your environment, with multiple options.

Credits

  • Giovanni Colavizza, who ran the previous year's edition of this course.
  • Michael Repplinger, who ran the 2018/19 edition and Gianluca Lebani, who ran the 2017/18 edition.
  • Giovanni Colavizza and Matteo Romanello, Applied Data Analysis course for the Oxford Digital Humanities Summer School DOI
  • James Hetherington and Giovanni Colavizza, Research Software Engineering with Python

License

Everything in this repository which is not already attributed to someone else is released under CC BY 4.0.