generated from tulane-cmps6730/sample-project
Commit 7741544 (1 parent: 3936f35)
Showing 12 changed files with 134 additions and 89 deletions.
@@ -1,6 +1,8 @@
 ---
 layout: slide
-title: "NLP Project"
+title: "Using Natural Language Processing to Identify Unfair Clauses in Terms and Conditions Documents"
 ---

-Use the right arrow to begin!
+**Authors:** Jonathan Sears, Nick Radwin
+**Institution:** Tulane University
+**Emails:** [email protected], [email protected]
@@ -1,19 +1,9 @@
 ---
 layout: slide
-title: "Equations and Tables"
+title: "Introduction"
 ---

-Here is an inline equation: $\sum_{i=1}^n i = ?$
+## Introduction

-And a block one:
-
-$$e = mc^2$$
-
-
-Here is a table:
-
-| header 1 | header 2 |
-|----------|----------|
-| value 1  | value 2  |
-| value 3  | value 4  |
+
+Despite their ubiquity, terms and conditions are seldom read by users, leaving them unaware of potentially exploitative or unfair clauses. Our project aims to bring these hidden clauses to light with a sentence-level text classifier that labels each clause as either exploitative (1) or non-exploitative (0). We based these labels on the categories outlined in a prior paper, which we discuss shortly.
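For concreteness, here is a minimal sketch of the binary labelling scheme this slide describes, assuming each sentence arrives tagged with one of the nine unfairness subcategories or with no tag at all; the category names and helper function are illustrative placeholders, not the project's actual code.

```python
# Sketch: collapse per-sentence unfairness tags into a binary label.
# Any unfairness subcategory -> 1 (exploitative); no tag -> 0 (non-exploitative).
# Category names below are illustrative placeholders, not the exact tag set.
UNFAIR_CATEGORIES = {
    "arbitration", "unilateral_change", "content_removal",
    "jurisdiction", "choice_of_law", "limitation_of_liability",
    "unilateral_termination", "contract_by_using",
}

def to_binary_label(category):
    """Map a sentence's (possibly missing) unfairness tag to 0/1."""
    return int(category in UNFAIR_CATEGORIES)

examples = [
    ("We may terminate your account at any time without notice.", "unilateral_termination"),
    ("You can change your password from the settings page.", None),
]
labels = [to_binary_label(tag) for _, tag in examples]  # -> [1, 0]
```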
@@ -1,13 +1,9 @@
 ---
 layout: slide
-title: "Images"
+title: "Related Work"
 ---
+Our experiments are primarily based on **CLAUDETTE**, a research project conducted at Stanford in 2018.
+
+They ultimately used an ensemble method, combining SVMs with LSTMs and CNNs, to achieve accuracy and F1 scores above 0.8. This was our target for this project.

-Two ways to add an image.
-
-Note that the image is in the assets/img folder.
-
-<img src="{{ site.baseurl }}/assets/img/tulane.png" width="50%">
-
-![tulane](assets/img/tulane.png)
+![claudette](assets/img/claudette.png)
@@ -0,0 +1,12 @@
+---
+layout: slide
+title: "Approach"
+---
+
+We employed multiple machine learning approaches to address the challenge of identifying unfair clauses:
+- **BERT models:** Utilized for their deep contextual representations.
+- **Bag of Words (BoW):** Simplified text representation focusing on term frequencies.
+- **Support Vector Machine (SVM):** Tested for its capability to establish a clear decision boundary.
+- **Convolutional Neural Network (CNN):** Explored for its pattern recognition capabilities within text data.
+- **Gradient Boosting Machine (GBM):** Chosen for its robustness and iterative improvement on classification tasks.
+- **Hybrid BERT/BoW model:** An attempt to combine the strengths of BERT and BoW models.
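To make one of the simpler approaches in the list above concrete, here is a minimal sketch of a Bag of Words + linear SVM baseline using scikit-learn; the toy sentences, split, and hyperparameters are placeholders, not the project's actual configuration.

```python
# Sketch: BoW term-frequency features feeding a linear SVM (1 = unfair, 0 = fair).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder data; in the project these come from the labeled terms-and-conditions corpus.
sentences = [
    "We may change these terms at any time without notice.",
    "Any dispute will be resolved by binding arbitration.",
    "You can export your data from the settings page.",
    "This page describes how to contact customer support.",
]
labels = [1, 1, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.5, random_state=0, stratify=labels
)

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram term frequencies
    LinearSVC(C=1.0),                     # linear decision boundary over BoW features
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```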
This file was deleted.
@@ -0,0 +1,7 @@
+---
+layout: slide
+title: "Dataset and Metrics"
+---
+- **Dataset:** Consisted of 100 labeled terms and conditions documents, with each sentence categorized as either fair or as one of nine subcategories of unfair.
+- **Binary Classification:** Simplified from multiple classes to two (fair and unfair) to address the dataset's imbalance (92% of sentences are fair).
+- **Evaluation Metrics:** Precision, recall, and F1 score, with models trained on an evenly distributed sample for a fair performance evaluation.
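As a sketch of how the evenly distributed training sample and the evaluation metrics described above might be wired up, assuming the labeled sentences live in a pandas DataFrame with `sentence` and `label` columns (the column names and the downsampling strategy are assumptions, not the project's exact procedure):

```python
# Sketch: downsample the majority class so fair (0) and unfair (1) are evenly
# represented in training, then score predictions with precision/recall/F1.
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

def balance(df, label_col="label", seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    n = df[label_col].value_counts().min()
    return (
        df.groupby(label_col, group_keys=False)
          .apply(lambda g: g.sample(n=n, random_state=seed))
          .reset_index(drop=True)
    )

df = pd.DataFrame({
    "sentence": ["clause a", "clause b", "clause c", "clause d", "clause e"],
    "label":    [0, 0, 0, 0, 1],   # imbalanced toy data: mostly fair
})
train_df = balance(df)             # one fair + one unfair sentence

# y_true / y_pred would come from a held-out test split in the real pipeline.
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 0, 1]
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```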
@@ -0,0 +1,6 @@
+---
+layout: slide
+title: "Experiments"
+---
+We originally experimented with the more complex BERT representation of the text. The thinking was that BERT encodings would capture a better understanding of the text, both semantically and contextually. We experimented with many different methods of fine-tuning BERT, including fine-tuning a single classifier layer on top of the pooled output.
+However, we were unable to produce results near those of CLAUDETTE, with our best variants of the fine-tuned BERT model unable to crack an F1 score of 0.6.
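For reference, a minimal sketch of the kind of fine-tuning described above, a single classification head on top of BERT's pooled output, using Hugging Face `transformers`; the checkpoint, learning rate, and toy batch are illustrative assumptions, not the exact configuration we ran.

```python
# Sketch: fine-tune BERT with a 2-class classification head on the pooled output.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2   # linear classifier over the pooled [CLS] representation
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy batch; the real loop iterates over the balanced clause dataset for several epochs.
batch = tokenizer(
    ["We may terminate your account at any time.", "You may export your data at no cost."],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
labels = torch.tensor([1, 0])           # 1 = unfair, 0 = fair

model.train()
outputs = model(**batch, labels=labels)  # returns loss and logits
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```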
Empty file.
Empty file.