KiranPrasath-26/picturebook.ai

picturebook.ai

Todo

  1. Extract Data
  2. Train GPT2
  3. Build an API for GPT2 and Diffusers

Process involved:

Data Extraction

The dataset is drawn from the following sources:

  1. For English, we extracted the data from the Project Gutenberg website.
    • We used the dataset by mateibejan to extract the txt files.
    • We took a subset of the books listed in that dataset.
  2. For Tamil, we extracted the data from Siruvarmalar and used the Oscar/unshuffled_deduplicated_ta dataset to add more to the corpus and for pretraining.
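
Raw Project Gutenberg txt files usually wrap the book text in license and header material, delimited by `*** START OF ... ***` and `*** END OF ... ***` marker lines. The sketch below shows one way such files could be cleaned before training. It is an illustration under assumptions, not the code used in this repository: the function names `strip_gutenberg_boilerplate` and `normalize_whitespace` are hypothetical, and the marker patterns cover only the common "THE/THIS PROJECT GUTENBERG EBOOK" wording.

```python
import re

# Marker lines commonly found in Project Gutenberg plain-text files.
# Assumption: only the usual "THE/THIS PROJECT GUTENBERG EBOOK" wording is handled.
_START = re.compile(
    r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END = re.compile(
    r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END markers (hypothetical helper).

    If a marker is missing, the corresponding end of the file is used instead.
    """
    start = _START.search(text)
    end = _END.search(text)
    body_start = start.end() if start else 0
    body_end = end.start() if end and end.start() >= body_start else len(text)
    return text[body_start:body_end].strip()


def normalize_whitespace(text: str) -> str:
    """Join hard-wrapped lines within a paragraph and keep one blank line between paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

A cleaned book could then be produced with `normalize_whitespace(strip_gutenberg_boilerplate(raw_text))`, where `raw_text` is the content of one txt file.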
