This is a web application that analyzes text with a simple language model. It calculates the perplexity and burstiness score of the input text and uses them to estimate whether the text was likely generated by a language model. It also plots the most common words and the repeated words in the text.
Before running the code, make sure to install the necessary dependencies: pip install -r requirements.txt
- Run the application with Streamlit: streamlit run app.py
- Enter the text you want to analyze in the text area provided.
- Click the "Analyze" button to perform the analysis.
- The application displays the perplexity, the burstiness score, and a verdict on whether the text is likely machine-generated.
The code uses the following libraries:
- NLTK: For text preprocessing, n-grams, and frequency distribution.
- Matplotlib: For generating the plots.
- Streamlit: For the web interface.
The code is organized into the following functions:
- The preprocess_text function tokenizes the input text, removes stopwords and punctuation, and converts the tokens to lowercase.
- The plot_most_common_words function calculates the frequency distribution of words and plots the 10 most common words.
- The plot_repeated_words function identifies the words that appear more than once in the text and plots their frequencies.
- The calculate_perplexity function calculates the perplexity of the text using an n-gram language model.
- The calculate_burstiness function calculates the burstiness score of the text.
- The is_generated_text function checks if the perplexity and burstiness score indicate that the text is likely generated by a language model.
- The main function creates the Streamlit application interface, handles the button click event, and performs the text analysis.
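The preprocessing and word-counting steps described above can be sketched in plain Python. This is a minimal, standard-library-only approximation, not the code in app.py: the regular-expression tokenizer and the small STOPWORDS set are stand-ins for NLTK's tokenizer and stopword list, and the helper names most_common_words and repeated_words are hypothetical. They compute the counts that the two plotting functions would then draw with Matplotlib.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword set standing in for NLTK's English list.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "on", "the", "to"}


def preprocess_text(text: str) -> list[str]:
    """Lowercase the text, keep word-like tokens, and drop stopwords.

    The regex keeps letters, digits, and apostrophes, so punctuation is discarded.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def most_common_words(tokens: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent tokens with their counts (data for a bar chart)."""
    return Counter(tokens).most_common(n)


def repeated_words(tokens: list[str]) -> dict[str, int]:
    """Return only the tokens that occur more than once, with their counts."""
    return {word: count for word, count in Counter(tokens).items() if count > 1}


if __name__ == "__main__":
    tokens = preprocess_text("The cat sat on the mat. The cat!")
    print(tokens)                        # ['cat', 'sat', 'mat', 'cat']
    print(most_common_words(tokens, 1))  # [('cat', 2)]
    print(repeated_words(tokens))        # {'cat': 2}
```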
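The scoring steps can be sketched the same way, under stated assumptions. The perplexity function assumes a bigram model with add-one (Laplace) smoothing that is trained and scored on the same tokens; the README only says "an n-gram language model", so the order and the smoothing are guesses. Burstiness is not defined here either, so the sketch swaps in the Goh-Barabási coefficient B = (σ − μ)/(σ + μ) computed over per-word counts, which is −1 when every word appears once and rises toward 1 as a few words dominate. The cutoffs in is_generated_text are required keyword parameters rather than defaults because the app's actual thresholds are not documented.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def calculate_perplexity(tokens: list[str]) -> float:
    """Bigram perplexity with add-one smoothing, trained and scored on the same tokens."""
    if len(tokens) < 2:
        return float("inf")  # no bigrams to score
    history_counts = Counter(tokens[:-1])  # how often each word precedes another word
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Laplace estimate: (C(prev, word) + 1) / (C(prev) + V)
        prob = (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


def calculate_burstiness(tokens: list[str]) -> float:
    """Assumed definition: Goh-Barabasi (sigma - mu) / (sigma + mu) over per-word counts."""
    counts = list(Counter(tokens).values())
    if not counts:
        return 0.0
    mu = mean(counts)
    sigma = pstdev(counts)
    return (sigma - mu) / (sigma + mu)  # mu >= 1, so the denominator is positive


def is_generated_text(
    perplexity: float,
    burstiness: float,
    *,
    perplexity_threshold: float,
    burstiness_threshold: float,
) -> bool:
    """Flag text as likely generated when both scores fall below the chosen cutoffs."""
    return perplexity < perplexity_threshold and burstiness < burstiness_threshold


if __name__ == "__main__":
    tokens = ["cat", "sat", "mat", "cat", "sat"]
    ppl = calculate_perplexity(tokens)
    burst = calculate_burstiness(tokens)
    # Hypothetical cutoffs; real values would need calibrating on labelled text.
    print(ppl, burst, is_generated_text(ppl, burst, perplexity_threshold=5.0, burstiness_threshold=0.0))
```

Treating low perplexity together with low burstiness as a sign of generated text follows the common heuristic that model output is more predictable and more uniform than human writing; it is a rough signal, not a reliable classifier.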