CMPS 6730 Philosophy Natural Language Processing Project

Overview

This repository contains Bobby Becker's project for CMPS 4730/6730 at Tulane University, which applies NLP techniques to analyze and interpret major philosophical texts. The project uses Latent Dirichlet Allocation (LDA) and Word2Vec models to explore thematic connections within and between 43 works of philosophy, including works by Plato, Aristotle, Marx, Nietzsche, Kant, and others.

Project Artifacts

1. `final_project.ipynb`

This Jupyter notebook showcases the data analysis of the project. Here's what it contains:

Latent Dirichlet Allocation (LDA)

Text Preparation: Each of the 43 philosophical works is preprocessed to remove stopwords and other non-informative text elements. This clean text is then tokenized.
LDA Processing: The LDA model is applied to the tokenized text to extract key themes, each represented by four words. This thematic extraction helps in understanding the central topics discussed in each work.
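
The text preparation step can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the `STOPWORDS` set and the `prepare_text` helper are hypothetical, and the notebook may rely on NLTK's tokenizer and stopword list instead.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# the notebook's actual list is not reproduced here.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "that", "it"}


def prepare_text(raw_text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


sample = "The unexamined life is not worth living, and the soul is immortal."
tokens = prepare_text(sample)
print(tokens)
```

Token lists of this kind are what an LDA implementation such as Gensim's `LdaModel` consumes once they have been converted to a bag-of-words corpus.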

Word2Vec

Vector Training: Post-LDA, a Word2Vec model is trained on the corpus to generate word vectors for the identified thematic words.
Vector Averaging: For each text, the vectors of its four theme words are averaged to create a single vector that represents the overall thematic essence of the text.
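
As a rough sketch of the averaging step, the snippet below takes the element-wise mean of four word vectors. The 3-dimensional `toy_embeddings` are made up for illustration (real Word2Vec vectors are much larger), and skipping words that are missing from the vocabulary is an assumption, not something the notebook is confirmed to do.

```python
def average_vectors(vectors):
    """Return the element-wise mean of a list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
toy_embeddings = {
    "justice": [1.0, 0.0, 2.0],
    "soul": [0.0, 2.0, 2.0],
    "city": [2.0, 2.0, 0.0],
    "virtue": [1.0, 0.0, 0.0],
}
theme_words = ["justice", "soul", "city", "virtue"]

# Assumption: theme words absent from the vocabulary are simply skipped.
known = [toy_embeddings[w] for w in theme_words if w in toy_embeddings]
text_vector = average_vectors(known)
print(text_vector)
```

With a trained Gensim model, the lookup `toy_embeddings[w]` would correspond to `model.wv[w]`.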

Principal Component Analysis (PCA)

Dimensionality Reduction: The high-dimensional vectors are reduced to 2D and 3D using PCA, which creates a visual representation of the vector similarities between the philosophical works. The works are plotted against each other, against the word vectors representing each philosopher, and against the word vectors representing the philosophical themes.

2. `flask.py` and `plato_matcher_online.py`

The web component uses Flask for an interactive interface that allows the user to make queries into Plato's texts. This can be viewed as a potential application of the research portion of this project.

Workflow

Text Segmentation: First, the works of Plato are loaded, tokenized, and segmented into 500 parts.
Latent Dirichlet Allocation (LDA): Then, using LDA, we analyze each part of the text and generate 6 words to represent that passage.
Vector Representation: We then load a Word2Vec model trained on all 43 philosophical works used in the Jupyter notebook. The vectors of the 6 words generated for each passage are averaged to create a single vector representing that passage.
User Interaction: Users submit text through the web interface, which goes through the same process as the passages of Plato: 6 words are generated by LDA to represent the user input, and those 6 words are vectorized and averaged to create a vector representing the user's input.
Similarity Calculation: We then calculate cosine similarity between the user's vector and each passage's vector to find the best match.
Text Refinement and Citation: Once the most relevant passage is identified, a GPT-3.5 model is used to identify and rewrite the most important portion of the passage and provide a citation to the user.
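
The segmentation and similarity steps above can be sketched in plain Python. The helper names (`segment`, `cosine_similarity`, `best_match`) are hypothetical, and splitting into contiguous chunks of near-equal size is an assumption about how the 500 parts are formed; the example uses 3 parts and 2-dimensional vectors to stay readable.

```python
import math


def segment(tokens, n_parts):
    """Split a token list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(tokens), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one leftover token.
        end = start + size + (1 if i < extra else 0)
        parts.append(tokens[start:end])
        start = end
    return parts


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_vector, passage_vectors):
    """Return (index, score) of the passage vector most similar to the query."""
    scores = [cosine_similarity(query_vector, v) for v in passage_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


parts = segment(list(range(10)), 3)

# Hypothetical passage vectors and user-input vector.
passage_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index, score = best_match([2.0, 2.1], passage_vectors)
print(index, round(score, 4))
```

Because cosine similarity ignores vector length, a short user query and a long passage can still match closely as long as their averaged vectors point in a similar direction.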

Getting Started

Prerequisites

Python 3.8+
Flask
Gensim
NLTK
scikit-learn
OpenAI API key

Installation

    git clone https://github.com/yourusername/cmps-6730-nlp-project.git
    cd cmps-6730-nlp-project
    pip install -r requirements.txt

OpenAI Key

Put your OpenAI API key at the top of the `plato_matcher_online.py` file.

Running the Application

Run `python flask.py`, then navigate to http://127.0.0.1:5000/ in your web browser.

Example Usage:

Question:

Answer:

Question:

Answer:

