Welcome to the NLP course (Winter 2024) repository.
This repository is part of a Natural Language Processing (NLP) course and includes four assignments that cover various aspects of NLP, from text preprocessing to advanced model fine-tuning. Additionally, it contains course materials and a comprehensive source book to aid your learning journey.
To get started with the assignments and the provided resources, follow these steps:
- Clone this repository:
  git clone https://github.com/AlirezaSaei1/NLP-Assignments.git
  cd NLP-Assignments
- You are ready to go!
Title: Research and Preprocessing Steps
Description: This assignment includes a summary of the paper "Abstractive Summarization Guided by Latent Hierarchical Document Structure." Additionally, it covers preprocessing steps for English and Farsi, followed by a spell checker implementation.
Files:
- Assignment1/Research/Research.pdf: Summary of the paper.
- Assignment1/Codes/Preprocessing_Fa.py: Preprocessing steps for Farsi.
- Assignment1/Codes/Preprocessing_Eng.py: Preprocessing steps for English.
- Assignment1/Codes/SpellChecker.py: Spell checker code.
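To give a feel for what a spell checker does, here is a minimal sketch. It is not the code in SpellChecker.py: the edit-distance approach, the function names `edit_distance` and `suggest`, and the tiny vocabulary are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry closest to `word` (first one wins on ties)."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))


# Hypothetical vocabulary, for illustration only.
vocabulary = ["language", "model", "token"]
correction = suggest("langauge", vocabulary)
```

A real spell checker would likely also weigh word frequency, so that common words are preferred over rare ones at the same distance.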
Title: Autofill and POS Tagging
Description: This assignment involves creating an autofill feature using n-gram modeling on the Digikala comments dataset. It also includes a Part-of-Speech (POS) tagger.
Files:
- Assignment2/Codes/AutoFiller.py: Autofill implementation using n-gram modeling.
- Assignment2/Codes/POS_Tagging.py: POS tagger code.
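The idea behind n-gram autofill can be sketched with a bigram model: count which words follow each word, then suggest the most frequent continuations. This is an illustrative sketch only, not the code in AutoFiller.py. The function names `train_bigrams` and `autofill` and the toy English corpus are assumptions; the assignment itself uses the Digikala comments dataset.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for each word, how often every other word directly follows it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization
        for current_word, next_word in zip(tokens, tokens[1:]):
            following[current_word][next_word] += 1
    return following


def autofill(following, word, k=3):
    """Return up to k of the most frequent continuations seen after `word`."""
    counts = following.get(word, Counter())  # unseen words yield no suggestions
    return [next_word for next_word, _ in counts.most_common(k)]


# Hypothetical toy corpus, standing in for real product comments.
corpus = ["the phone is great", "the phone is slow", "the phone works"]
model = train_bigrams(corpus)
suggestions = autofill(model, "phone")
```

Longer contexts (trigrams and up) follow the same pattern, with a tuple of preceding words as the key instead of a single word.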
Title: Sentiment Analysis using RNNs
Description: This assignment focuses on sentiment analysis using Recurrent Neural Networks (RNNs). Both SimpleRNN and LSTM architectures are utilized to analyze sentiments in the given dataset.
Files:
- Assignment3/Codes/Sentiment_Analysis.py: Sentiment analysis using SimpleRNN.
Title: Fine-Tuning wav2vec 2.0
Description: This assignment involves fine-tuning the wav2vec 2.0 model (XLSR-53) on the Persian portion of the Mozilla Common Voice dataset.
Files:
- Assignment4/Codes/ASR_fa_v1.py: Code for fine-tuning wav2vec 2.0 (v1).
- Assignment4/Codes/ASR_fa_v2.py: Code for fine-tuning wav2vec 2.0 (v2, main).
The course slides give an overview of the topics covered in the course, taught by Dr. Baradaran. They are available in the Course Slides directory.
A presentation on LangChain and Retrieval-Augmented Generation (RAG), prepared with the help of DeepLearning.AI, is included in the Presentation directory. It contains the report, the PowerPoint, and the code.
The Source folder contains Jurafsky's NLP book, a valuable resource for understanding the theoretical foundations of NLP.
Feel free to explore each assignment and utilize the additional resources to enhance your learning experience. If you have any questions or need further assistance, please reach out.
Happy Learning!