A common challenge in AI tasks such as intent detection, sentiment analysis, slot filling, and recommendation is the limited availability of datasets for model training. The objective of this project is to create synthetic datasets that mimic realistic human-written text, to aid in training, designing, and evaluating such AI systems.
Given a large dataset of real e-commerce reviews, we need to generate synthetic reviews.
Amazon Reviews dataset: https://amazon-reviews-2023.github.io/main.html
For fine-tuning the model, a subset ("Supplements/Vitamins") of the data has been used. Download link: https://drive.google.com/file/d/1o9IvevRbxKagdE-Op1BJRl8iJUc-0kJ4/view?usp=sharing
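As a minimal loading sketch, assuming the downloaded subset is a JSON Lines file with one review object per line (the file name and field names below are hypothetical; adjust them to the actual download):

```python
import json

# Hypothetical local path for the downloaded Supplements/Vitamins subset.
DATA_PATH = "supplements_vitamins_reviews.jsonl"

def load_reviews(path):
    """Load one review dict per JSON line."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

reviews = load_reviews(DATA_PATH)
print(f"Loaded {len(reviews)} reviews")
print(reviews[0])  # inspect the available fields (e.g., title, text, rating)
```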
The pre-trained GPT-2 (117M) model has been fine-tuned on the real reviews (a fine-tuning sketch follows the list below).
• Number of epochs = 30; time taken: 2 hrs 10 mins (on an NVIDIA Tesla P100 GPU).
• Few-shot learning was also used: for every product, we take two sample reviews with the same product title from the training data and use them to perform 2-shot prompting (see the prompting sketch below).
• We use a classifier to check whether the synthetic data is detectable. After generating the synthetic data, we mix it with the real data, label both (real = 0, synthetic = 1), and train a BERT model to classify them. BERT was able to classify them quite well, so the synthetic texts were identifiable (see the classifier sketch below).
• We compare the term frequency distributions of the real and synthetic data. The distributions of the top 10 most frequent terms are broadly similar, though they do not match closely (see the term-frequency sketch below).
• We also check semantic similarity. The mean real-synthetic semantic similarity score is 0.1878, which suggests the synthetic reviews bear some resemblance to the real reviews, but not a strong one (see the similarity sketch below).
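The training script itself is not shown above; the following is a minimal fine-tuning sketch using the Hugging Face Transformers Trainer, assuming `review_texts` is a list of raw review strings from the subset (e.g., the `text` field of the loaded records). The batch size and sequence length are illustrative choices, not values from the actual run:

```python
from datasets import Dataset
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

# "gpt2" is the 117M-parameter GPT-2 checkpoint.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = Dataset.from_dict({"text": review_texts}).map(
    tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="gpt2-reviews",
    num_train_epochs=30,            # matches the run reported above
    per_device_train_batch_size=8,  # assumption; tune to GPU memory
    save_strategy="epoch",
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    # mlm=False selects causal language modeling, which is what GPT-2 needs.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("gpt2-reviews-final")
```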
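A sketch of the 2-shot prompting step, assuming each training record has `title` and `text` fields and that every product title appears at least twice in the training data; the prompt template is illustrative, as the actual template is not documented here:

```python
import random
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-reviews-final")

def two_shot_prompt(product_title, train_reviews):
    """Build a 2-shot prompt from two training reviews of the same product."""
    same_product = [r for r in train_reviews if r["title"] == product_title]
    shots = random.sample(same_product, 2)  # assumes >= 2 reviews per product
    prompt = ""
    for shot in shots:
        prompt += f"Product: {product_title}\nReview: {shot['text']}\n\n"
    prompt += f"Product: {product_title}\nReview:"
    return prompt

prompt = two_shot_prompt("Vitamin C 1000mg", train_reviews)  # hypothetical title
out = generator(prompt, max_new_tokens=120, do_sample=True, top_p=0.95)
synthetic_review = out[0]["generated_text"][len(prompt):].strip()
print(synthetic_review)
```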
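A sketch of the detectability check with a BERT classifier, assuming `real_texts` and `synthetic_texts` are lists of review strings. The labeling convention (real = 0, synthetic = 1) follows the description above, while the epoch count and split ratio are assumptions:

```python
import numpy as np
from datasets import Dataset
from sklearn.metrics import accuracy_score
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

texts = real_texts + synthetic_texts
labels = [0] * len(real_texts) + [1] * len(synthetic_texts)  # real=0, synthetic=1

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=256),
    batched=True).train_test_split(test_size=0.2, seed=42)

def compute_metrics(eval_pred):
    logits, label_ids = eval_pred
    return {"accuracy": accuracy_score(label_ids, np.argmax(logits, axis=-1))}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-detector", num_train_epochs=3),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # high accuracy => synthetic reviews are identifiable
```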
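A sketch of the term-frequency comparison; the stopword list and tokenization below are simplified placeholders for whatever preprocessing was actually used:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a fuller list (e.g., from NLTK) is typical.
STOPWORDS = {"the", "and", "a", "to", "of", "i", "it", "is", "this", "for"}

def top_terms(texts, k=10):
    """Return the k most frequent non-stopword terms with relative frequencies."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in re.findall(r"[a-z']+", text.lower())
                      if w not in STOPWORDS)
    total = sum(counts.values())
    return [(term, round(n / total, 4)) for term, n in counts.most_common(k)]

print("real:     ", top_terms(real_texts))
print("synthetic:", top_terms(synthetic_texts))
```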
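The report does not state how the 0.1878 score was computed; one common approach, sketched below, is to embed both sets with a Sentence-Transformers model (the model name here is an assumption) and average the pairwise cosine similarities:

```python
from sentence_transformers import SentenceTransformer, util

# Assumed embedding model and pairing scheme: mean of all pairwise
# cosine similarities between real and synthetic embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")

real_emb = model.encode(real_texts, convert_to_tensor=True,
                        normalize_embeddings=True)
syn_emb = model.encode(synthetic_texts, convert_to_tensor=True,
                       normalize_embeddings=True)

sim_matrix = util.cos_sim(real_emb, syn_emb)  # shape: |real| x |synthetic|
print(f"mean real-synthetic similarity: {sim_matrix.mean().item():.4f}")
```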
We need to work further on our synthetic data preparation strategy.
The fine-tuned GPT-2 117M model can be downloaded from the link below.
https://drive.google.com/file/d/1aNz5GdHRR1fPKhs5pAHxQzqkQlYy88_u/view?usp=sharing