Natural Language Processing (NLP) Course Repository

Welcome to the NLP course (Winter 2024) repository.

Overview

This repository is part of a Natural Language Processing (NLP) course and includes four assignments that cover various aspects of NLP, from text preprocessing to fine-tuning a speech recognition model. It also contains the course slides, a presentation, and a reference textbook.

Setup

To get started with the assignments and the provided resources, follow these steps:

  1. Clone this repository:
    git clone https://github.com/AlirezaSaei1/NLP-Assignments.git
    cd NLP-Assignments
  2. You are ready to go!

Assignments

Assignment 1

Title: Research and Preprocessing Steps

Description: This assignment includes a summary of the paper "Abstractive Summarization Guided by Latent Hierarchical Document Structure." Additionally, it covers preprocessing steps for English and Farsi, followed by a spell checker implementation.

Files:

  • Assignment1/Research/Research.pdf: Summary of the paper.
  • Assignment1/Codes/Preprocessing_Fa.py: Preprocessing steps for Farsi.
  • Assignment1/Codes/Preprocessing_Eng.py: Preprocessing steps for English.
  • Assignment1/Codes/SpellChecker.py: Spell checker code.
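
As an illustration of what a spell checker can look like, here is a minimal sketch based on Levenshtein edit distance. It is not the code in SpellChecker.py; the function names `edit_distance` and `correct`, the `max_distance` threshold, and the toy vocabulary are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: set, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if nothing is close.

    Assumes a non-empty vocabulary; ties are broken alphabetically.
    """
    if word in vocabulary:
        return word
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), v))
    return best if edit_distance(word, best) <= max_distance else word


# Toy vocabulary, made up for this example.
vocab = {"language", "learning", "model"}
print(correct("langauge", vocab))
```

This scans the whole vocabulary for every lookup, which is fine for a small word list but slow for a large one; real spell checkers usually generate or index candidates first.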

Assignment 2

Title: Autofill and POS Tagging

Description: This assignment involves creating an autofill feature using n-gram modeling on the Digikala comments dataset. It also includes a Part-of-Speech (POS) tagger.

Files:

  • Assignment2/Codes/AutoFiller.py: Autofill implementation using n-gram modeling.
  • Assignment2/Codes/POS_Tagging.py: POS tagger code.
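
The idea behind n-gram autofill can be sketched in a few lines: count which word follows each context of n-1 words, then suggest the most frequent followers. The snippet below is a simplified illustration, not the code in AutoFiller.py; `train_ngrams`, `suggest`, and the English toy corpus are hypothetical stand-ins (the assignment itself uses the Digikala comments dataset).

```python
from collections import Counter, defaultdict


def train_ngrams(sentences, n=2):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()  # naive whitespace tokenization
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            model[context][tokens[i + n - 1]] += 1
    return model


def suggest(model, text, n=2, k=3):
    """Return up to k most frequent next words for the last n-1 words of text."""
    tokens = text.split()
    context = tuple(tokens[-(n - 1):]) if n > 1 else ()
    followers = model.get(context, Counter())
    return [word for word, _ in followers.most_common(k)]


# Toy corpus, made up for this example.
corpus = [
    "the phone is great",
    "the phone is expensive",
    "the camera is great",
]
model = train_ngrams(corpus, n=2)
print(suggest(model, "the phone is"))
```

An unseen context simply returns an empty list here; a fuller implementation would typically back off to a shorter context or apply smoothing.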

Assignment 3

Title: Sentiment Analysis using RNNs

Description: This assignment focuses on sentiment analysis using Recurrent Neural Networks (RNNs). Both SimpleRNN and LSTM architectures are used to classify sentiment in the given dataset.

Files:

  • Assignment3/Codes/Sentiment_Analysis.py: Sentiment analysis using SimpleRNN.

Assignment 4

Title: Fine-Tuning wav2vec 2.0

Description: This assignment involves fine-tuning the wav2vec 2.0 model (XLSR-53) on the Persian portion of the Mozilla Common Voice dataset.

Files:

  • Assignment4/Codes/ASR_fa_v1.py: Code for fine-tuning wav2vec 2.0 (v1).
  • Assignment4/Codes/ASR_fa_v2.py: Code for fine-tuning wav2vec 2.0 (v2, the main version).

Additional Resources

Course Slides

The course slides cover the topics of the course taught by Dr. Baradaran. They are available in the Course Slides directory.

Presentation

The Presentation directory contains a presentation on LangChain and Retrieval-Augmented Generation (RAG), including the report, PowerPoint slides, and code. It was prepared with the help of DeepLearning.AI.

Source Folder

The Source folder contains Jurafsky's NLP book, a valuable resource for understanding the theoretical foundations of NLP.


Feel free to explore each assignment and utilize the additional resources to enhance your learning experience. If you have any questions or need further assistance, please reach out.

Happy Learning!

