This project was built as part of a Digital Humanities course at Ben-Gurion University (BGU), with the assistance of the Ministry of Justice, to help investigate and identify the distribution of topics in laws across different periods of time.
The project uses an unsupervised machine learning algorithm, LDA (Latent Dirichlet Allocation), which excels at partitioning a corpus into topics and determining how much each topic contributes to a specific document in the corpus. This, however, was not the obvious choice, since the laws have already been classified in the 'National Legislative Database' (מאגר החקיקה הלאומי) and I could have used the classification suggested there. But letting an ML algorithm classify on its own can potentially yield insightful results that could not have been obtained otherwise (e.g., a classification that groups 'religion' laws together with 'criminal justice' laws might suggest some overlap in how the government handles these matters). I used the method of 'distant reading' (קריאה מרחוק) and some other NLP tools (a tagger and a lemmatizer) to compose the corpus.
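To make the idea concrete, here is a minimal, self-contained sketch of LDA topic modeling, assuming gensim (the notebook in Step 2 implements the real pipeline; the toy documents below are illustrative only):

# Minimal LDA sketch (assumes gensim; not the project's exact pipeline).
from gensim import corpora
from gensim.models import LdaModel

# Each document is a list of lemmatized tokens, as produced in Step 1.
docs = [
    ["law", "court", "judge"],
    ["tax", "income", "budget"],
    ["court", "appeal", "judge", "tax"],
]

dictionary = corpora.Dictionary(docs)               # token <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]  # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=42)

# Per-document topic distribution: how much each topic
# "contributes" to each document in the corpus.
for i, bow in enumerate(corpus):
    print(i, lda.get_document_topics(bow))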
The project consists of 3 main steps:
- Pre-Processing: Parsing the akn dataset, lemmatizing and tagging each word in each sentence (using YAP), and then combining everything into one large list that serves as the corpus for phase 2, the Topic Modeling (see the lemmatization sketch after this list).
- Topic Modeling: Cleaning the corpus by pinpointing the stopwords, trying various options for grouping words, choosing the optimal model, and assigning a topic title to each group of words.
- Visualize Data: Running the web application that shows the topic distribution for custom periods of time.
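For the Pre-Processing step, a hypothetical sketch of the lemmatization call, assuming YAP is running locally as an HTTP server (the endpoint, payload shape, and lattice column layout below are assumptions; verify them against your YAP setup):

# Hypothetical sketch: lemmatize one Hebrew sentence with a locally
# running YAP server. Endpoint, payload, and lattice layout are assumptions.
import json
import requests

def lemmatize(sentence):
    resp = requests.post(
        "http://localhost:8000/yap/heb/joint",       # assumed joint MA+MD endpoint
        headers={"Content-Type": "application/json"},
        data=json.dumps({"text": sentence + "  "}),  # YAP reportedly expects trailing whitespace
    )
    resp.raise_for_status()
    # The disambiguated lattice is CoNLL-like: one tab-separated row per
    # morpheme, with the lemma assumed to be in the fourth column.
    rows = resp.json()["md_lattice"].strip().split("\n")
    return [row.split("\t")[3] for row in rows]

print(lemmatize("גנן גידל דגן בגן"))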
./
├── Step_2-Topic_Modeling
│ ├── Topic Modeling(LDA).ipynb
│ ├── output
│ │ ├── dominant_topic_df.csv
│ │ ├── optimal_lda_model.pk
│ │ ├── topic_distribution_df.csv
│ │ └── topics_divisions.csv
│ └── resources
│ ├── id2topic.txt
│ ├── laws_corpus_lemmatized.json
│ ├── laws_index.json
│ ├── laws_words_count.csv
│ └── stopwords.txt
├── Step_3-Visualize_Data
│ ├── api
│ │ ├── api.py
│ │ └── rules
│ │ └── rules.py
│ ├── package.json
│ ├── public
│ │ ├── index.html
│ │ ├── manifest.json
│ │ └── robots.txt
│ ├── src
│ │ ├── App.css
│ │ ├── App.js
│ │ ├── components
│ │ │ ├── NivoBarChart.js
│ │ │ ├── Sidebar.css
│ │ │ └── Sidebar.js
│ │ ├── index.css
│ │ └── index.js
│ └── yarn.lock
└── requirements.txt
9 directories, 25 files
- Install the Python requirements:
pip install -r requirements.txt
- Install the Web-App dependencies: navigate to the Step_3-Visualize_Data directory and run:
yarn install
- Pre-Processing: This step has already been done for you. The pre-processed dataset is called laws_corpus_lemmatized.json.
- Topic Modeling: The Topic Modeling(LDA).ipynb notebook contains very detailed instructions on how to perform the topic modeling; run the notebook and go through each step thoroughly (for further explanations, see the README for part 2).
- Visualize Data: To visualize the data, navigate to Step_3-Visualize_Data and run the following commands, each in its own terminal and in this order:
yarn start-api
yarn start
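If you want to inspect the Step 2 artifacts outside the notebook, here is a hedged sketch; it assumes optimal_lda_model.pk is a gensim LdaModel written with pickle.dump and that the output CSVs load directly with pandas (column names are whatever the notebook wrote):

# Sketch for inspecting the Step 2 outputs (assumes a pickled gensim
# LdaModel and standard CSV layouts; adjust paths/columns as needed).
import pickle
import pandas as pd

with open("Step_2-Topic_Modeling/output/optimal_lda_model.pk", "rb") as f:
    lda = pickle.load(f)

# Top words per topic, as learned by the model.
for topic_id, words in lda.show_topics(num_topics=lda.num_topics, num_words=8):
    print(topic_id, words)

# Per-law dominant topic and full topic distributions produced by the notebook.
dominant = pd.read_csv("Step_2-Topic_Modeling/output/dominant_topic_df.csv")
distribution = pd.read_csv("Step_2-Topic_Modeling/output/topic_distribution_df.csv")
print(dominant.head())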