Document Tag Generator

Introduction

✨ Problem

The project website of the Department of Computer Engineering currently lists nearly 150 projects. These projects are categorized only by batch and by subject. Some projects have tags, but some of those tags are not relevant to the project, and other projects have no tags at all. Users can currently search for projects by keyword, but those keywords are derived only from the project descriptions.

Project Website

✨ Our goal

Our goal is to generate relevant tags for each project based on its description and other relevant data available on its project page.

✨ Solution

Our plan is to build an ML model that generates relevant tags. The data needed to train the model is retrieved from the project pages and the project repositories. The website's API is used to get the details of each project, including the links to its repository and project page. A scraping tool will then be used to extract the data from the project pages.
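A minimal sketch of the data-collection step, assuming the website API returns a JSON object keyed by project, where each entry has `title`, `repo_url` and `page_url` fields. That schema and these field names are hypothetical, not the API's documented format. Only the standard library is used: `json` reads the API response, and an `html.parser.HTMLParser` subclass stands in for the scraping tool by pulling the visible text out of a project page.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


def extract_project_links(api_json: str) -> list[dict]:
    """Collect the title, repository URL and project-page URL of every project.

    Assumes a HYPOTHETICAL response shape:
    {"<project-key>": {"title": ..., "repo_url": ..., "page_url": ...}, ...}
    """
    projects = json.loads(api_json)
    return [
        {
            "key": key,
            "title": info.get("title", ""),
            "repo_url": info.get("repo_url"),
            "page_url": info.get("page_url"),
        }
        for key, info in projects.items()
    ]


class _VisibleTextParser(HTMLParser):
    """Collects text nodes while skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def page_text(html: str) -> str:
    """Return the visible text of a scraped project page as one string."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

In practice the JSON string would come from the API (for example `requests.get(url).text`, since `requests` is already a dependency), and `page_text` would be applied to the HTML downloaded from each `page_url`. A dedicated scraping tool can replace the parser class; it is used here only to keep the sketch dependency-free.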

Local Installation

The site is built with Jekyll and hosted on GitHub Pages.

Fork the repository and clone it to your local machine.
Follow the build instructions below to install the dependencies needed to run Jekyll on your local machine.

Build Instructions

gem install just-the-docs
gem install jekyll-sitemap
bundle exec just-the-docs rake search:init

Note: If you face any dependency or version issues, follow the instructions in this link to upgrade or downgrade the Ruby version.

The current Ruby version is 2.7.1.

rbenv install 2.7.1
rbenv global 2.7.1

For the API scripts, install the following additional Python package and run the script:

pip install requests
cd ./python_scripts/
python3 stat_script.py

Architecture

___

The frontend and backend of the department projects website are already implemented. In the current implementation, users can search for projects using tags, but tagging is done with a simple algorithm that only checks whether the project description contains the search tag. Our goal is to implement a machine learning model that tags projects more accurately.
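The baseline described above can be sketched in a few lines. This illustrates the idea (case-insensitive substring matching against a list of known tags) and is not the website's actual code; `baseline_tags` and the example tags are hypothetical. The example also shows the weakness of the approach: the tag `java` matches a JavaScript project.

```python
from __future__ import annotations


def baseline_tags(description: str, known_tags: list[str]) -> list[str]:
    """Return each known tag that occurs anywhere in the description.

    Matching is a plain case-insensitive substring test, so a tag can match
    inside an unrelated word (e.g. "java" inside "javascript").
    """
    text = description.lower()
    return [tag for tag in known_tags if tag.lower() in text]


if __name__ == "__main__":
    desc = "A JavaScript dashboard for IoT sensor data"
    print(baseline_tags(desc, ["java", "iot", "machine learning"]))
    # ['java', 'iot']
```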

To train the machine learning model, we need a dataset that contains the details of the projects. We plan to build this dataset from the project descriptions, project repositories, and project pages, and to use it to train an ML model that tags the projects on the department website more accurately.
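As a rough illustration of how the merged text could yield tags, the sketch below combines the three sources into one document per project and ranks each project's terms by TF-IDF. TF-IDF keyword ranking is a simple unsupervised stand-in chosen for this example, not the ML model the project trains; the function names, the stop-word list and `top_k` are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of",
              "on", "or", "the", "this", "to", "using", "with"}


def build_document(description: str, repo_readme: str, page_text: str) -> str:
    """Merge the three data sources of one project into a single text."""
    return " ".join(part for part in (description, repo_readme, page_text) if part)


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens, with stop words removed."""
    return [w for w in re.findall(r"[a-z][a-z0-9+#]*", text.lower())
            if w not in STOP_WORDS]


def tfidf_tags(documents: dict[str, str], top_k: int = 3) -> dict[str, list[str]]:
    """Return the top_k highest-scoring TF-IDF terms of each project."""
    tokenized = {pid: tokenize(text) for pid, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many projects each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    tags = {}
    for pid, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            # Term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        tags[pid] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return tags
```

TF-IDF favours terms that are frequent in one project but rare across the collection, so words shared by many projects rank lower. A trained model could replace `tfidf_tags` while keeping the same input (one merged document per project) and output (a tag list per project).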

Once the ML model is implemented, it will be integrated with the backend of the department website. The model will then be run to generate tags, which will be stored in a JSON file in the backend repository.
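A minimal sketch of the storage step, assuming the tags file is a single JSON object mapping each project key to its list of tags. Both the file layout and the helper names are assumptions; the README does not specify the format.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_tags_file(tags: dict[str, list[str]], path: str | Path) -> None:
    """Write {project_key: [tag, ...]} as JSON, de-duplicated and sorted.

    Sorting keys and tags keeps the file stable between runs, so a commit
    only shows the tags that actually changed.
    """
    data = {key: sorted(set(values)) for key, values in tags.items()}
    text = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)
    Path(path).write_text(text + "\n", encoding="utf-8")


def read_tags_file(path: str | Path) -> dict[str, list[str]]:
    """Load the tags file written by write_tags_file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```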

The backend of the department projects website is accessible through an API, which provides an endpoint for retrieving the JSON file that contains all projects and their generated tags. When a user searches for projects by tag, the relevant projects are looked up in this tags file and shown to the user.
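One way the lookup could work is an inverted index built from the tags file, mapping each tag to the projects that carry it. This sketch assumes a `{project_key: [tag, ...]}` schema; the function names and the choice to require every queried tag (AND semantics) are hypothetical design decisions, not the backend's documented behaviour.

```python
from __future__ import annotations

from collections import defaultdict


def build_tag_index(tags: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert {project: [tag, ...]} into {tag: {project, ...}} (tags lower-cased)."""
    index: dict[str, set[str]] = defaultdict(set)
    for project, project_tags in tags.items():
        for tag in project_tags:
            index[tag.lower()].add(project)
    return dict(index)


def search(index: dict[str, set[str]], query_tags: list[str]) -> list[str]:
    """Return, sorted, the projects that carry every queried tag."""
    if not query_tags:
        return []
    matches = [index.get(tag.lower(), set()) for tag in query_tags]
    return sorted(set.intersection(*matches))
```

Building the index once per tags-file update makes each search a few dictionary lookups instead of a scan over every project.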

When a new project is added to the department projects website, the ML model needs to be run again to update the tags file. GitHub Actions can be used to automate this.
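When the API lists a project that is missing from the tags file, only that project needs to be tagged. The sketch below shows such an incremental update; `tag_project` is a hypothetical callable standing in for the trained model, and the `{project_key: [tag, ...]}` schema is assumed. A GitHub Actions job could run a script built around this function and commit the tags file only when `new_keys` is non-empty.

```python
from __future__ import annotations

from typing import Callable, Iterable


def update_tags(
    api_keys: Iterable[str],
    existing: dict[str, list[str]],
    tag_project: Callable[[str], list[str]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Tag projects that are new since the last run and drop removed ones.

    Returns the updated mapping and the sorted list of newly tagged keys.
    """
    current = set(api_keys)
    new_keys = sorted(current - set(existing))
    # Keep tags of projects that still exist; drop projects no longer listed.
    updated = {key: tags for key, tags in existing.items() if key in current}
    for key in new_keys:
        updated[key] = tag_project(key)  # hypothetical model call
    return updated, new_keys
```

A full re-run, which also picks up edited pages and repositories, would call the model for every key instead of only the new ones.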

Since project pages and project repositories are updated regularly, we plan to run the ML model weekly.

Sample Tag Cloud

✨ Our Team

E/17/100 - Gunathilaka R.M.S.M

E/17/246 - Perera K.S.D

E/17/284 - Rathnayaka R.L.D.A.S

Links

ShenalPerera/e17-co328-Document-Tag-Generator
