
An algorithm that generates conversation-history datasets for fine-tuning your own custom LLM/chatbot


Ryotess/ChatGen


ChatGen: An Algorithm to Generate Conversation History for Training LLMs/Chatbots

ChatGen helps you generate your own custom conversation dataset, including conversation history, for training an LLM or chatbot.
Explore the docs »

Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Instruction
  4. Contributing
  5. License
  6. Contact

About The Project

(back to top)

Built With

Python

(back to top)

Getting Started

Please follow the instructions to install and set up the environment for this project.

Installation

  1. Clone the repo
    git clone https://github.com/Ryotess/ChatGen.git
  2. Move into the project directory
    cd ./ChatGen
  3. Install the dependencies
    pip install -r requirements.txt
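
After installing, it can be useful to confirm that the package resolves before running anything else. The following is a minimal sketch, assuming you run it from the repository root; the module names are taken from the quick start imports, while the helper `missing_modules` is hypothetical and not part of ChatGen.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first,
            # which raises ImportError if the parent is absent or broken.
            found = find_spec(name) is not None
        except ImportError:
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names as used in the quick start of this README.
    wanted = ["chatgen", "chatgen.chat_algo", "chatgen.data_loader"]
    print("Missing modules:", missing_modules(wanted))
```

An empty list suggests the environment is ready; any listed name points to an installation or working-directory problem.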

(back to top)

Quick Start

Here is a quick start demo based on the sample dataset. If you want to use this project with your own dataset, please read the instruction.

# Import packages
from pathlib import Path
from chatgen.chat_algo import ChatAlgo
from chatgen.data_loader import load_xlsx, create_input_data
# Set data paths
input_file = "./dataset/sample_dataset.xlsx" # sample dataset
sheet_name = 'QuestionAskingMerge' # the sheet to use
output_file = "./output/conversations.json" # output path
# Make sure the ./output directory exists before saving
Path(output_file).parent.mkdir(parents=True, exist_ok=True)
# Load the raw data and create the input data
data = load_xlsx(input_file, sheet_name)
input_data = create_input_data(data)
# Create the conversation dataset
chat_algo = ChatAlgo(input_data) # initialization
chat_algo.create_chat_history() # generate
# Save to JSON
chat_algo.to_json(output_file)
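
To check the result of the quick start, you can load the saved file and look at its top-level shape. This is a minimal sketch that assumes only that `to_json` writes valid JSON; it makes no assumption about the structure ChatGen produces, and `summarize_json` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return the top-level type name and size of a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Only lists and dicts have a meaningful size at the top level.
    size = len(data) if isinstance(data, (list, dict)) else None
    return {"type": type(data).__name__, "size": size}


if __name__ == "__main__":
    output_file = Path("./output/conversations.json")  # path from the quick start
    if output_file.exists():
        print(summarize_json(output_file))
    else:
        print(f"{output_file} not found; run the quick start first.")
```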

(back to top)

Instruction

Here we provide a brief introduction to the algorithm design of this project, along with a sample demo.
Please open the instruction and follow the steps to gain a more comprehensive understanding of this project.
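
For readers new to the idea of a conversation dataset "with history", the sketch below shows one generic way such records can be shaped: each sample carries the earlier turns alongside the current question and answer. This is an illustration only and is not ChatGen's actual algorithm; the function `build_history_samples` and the field names `history`, `question`, and `answer` are assumptions made for this example.

```python
def build_history_samples(turns):
    """Build one sample per turn, each with all earlier turns as history.

    `turns` is a list of (question, answer) tuples in conversation order.
    The record layout here is hypothetical, not ChatGen's output format.
    """
    samples = []
    history = []
    for question, answer in turns:
        samples.append({
            "history": list(history),  # copy, so later turns do not leak in
            "question": question,
            "answer": answer,
        })
        history.append((question, answer))
    return samples


if __name__ == "__main__":
    demo_turns = [
        ("What is ChatGen?", "A tool for building conversation datasets."),
        ("What does it output?", "A JSON file."),
    ]
    for sample in build_history_samples(demo_turns):
        print(sample)
```

The first sample has an empty history, and each later sample sees every turn before it. See the linked instruction for the design the project actually uses.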

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

E-Mail: [email protected]

Project Link: https://github.com/Ryotess/ChatGen

(back to top)
