This is an algorithm that helps you generate your own custom conversation dataset, including conversation history, for training an LLM/chatbot.
Please follow the instructions to install and set up the environment for this project.
- Clone the repo
git clone https://github.com/Ryotess/ChatGen.git
- Move to the project
cd ./ChatGen
- Install the dependencies
pip install -r requirements.txt
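To confirm the environment is ready before running the demo, a small check like the one below may help. This is a sketch under stated assumptions: it only verifies that top-level modules can be located, it assumes you run it from the repository root so that the `chatgen` package is on the path, and the module list is an assumption you should extend with the packages named in requirements.txt. The helper name `find_missing` is hypothetical.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that cannot be found on the current path.

    Only top-level names are checked: a dotted name such as
    "chatgen.chat_algo" would import its parent package, so pass "chatgen".
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "chatgen" is the package in this repo; add the top-level
    # module names from requirements.txt to this list as needed.
    missing = find_missing(["chatgen"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("Environment looks ready.")
```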
Here is a quick-start demo based on the sample dataset. If you want to use this project on your own dataset, please read the instructions.
# Import packages
import os
from chatgen.chat_algo import ChatAlgo
from chatgen.data_loader import load_xlsx, create_input_data
# Set data path
input_file = "./dataset/sample_dataset.xlsx" # sample dataset
sheet_name = 'QuestionAskingMerge' # the sheet to use
output_file = "./output/conversations.json" # output path
os.makedirs(os.path.dirname(output_file), exist_ok=True) # create ./output if it does not exist yet
# Load raw data & Create input data
data = load_xlsx(input_file, sheet_name)
input_data = create_input_data(data)
# Create conversation dataset
chat_algo = ChatAlgo(input_data) # initialization
chat_algo.create_chat_history() # generate
# save to JSON
chat_algo.to_json(output_file)
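After the demo finishes, you may want to take a quick look at what was written. The exact layout of conversations.json is not described here, so the sketch below makes no assumptions about its schema: it only reports the top-level type, its size, and the first item. It also assumes the file is UTF-8 encoded. The helper names `summarize` and `summarize_json` are hypothetical.

```python
import json
from pathlib import Path


def summarize(data, preview=1):
    """Describe the top-level structure of already-loaded JSON data."""
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "preview": data[:preview]}
    if isinstance(data, dict):
        keys = list(data)[:preview]
        return {"type": "dict", "size": len(data), "preview": {k: data[k] for k in keys}}
    # Scalars (str, int, ...) have no meaningful size.
    return {"type": type(data).__name__, "size": None, "preview": data}


def summarize_json(path, preview=1):
    """Load a JSON file (assumed UTF-8) and summarize its top-level structure."""
    with Path(path).open(encoding="utf-8") as f:
        return summarize(json.load(f), preview)


if __name__ == "__main__":
    output_file = Path("./output/conversations.json")
    if output_file.exists():
        print(json.dumps(summarize_json(output_file), indent=2, ensure_ascii=False))
    else:
        print(f"{output_file} not found; run the quick-start demo first.")
```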
Here we provide brief instructions on the algorithm design of this project, along with a sample demo.
Please open the instructions and follow the steps to get a more comprehensive understanding of this project.
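The actual algorithm is described in the project's instructions. As a rough, hypothetical illustration of what a "conversation with history" training sample can look like, the sketch below turns an ordered list of question/answer turns into one sample per turn, each carrying the preceding turns as its history. The field names (`history`, `question`, `answer`), the `max_history` cap, and the example dialog are made up for illustration and are not ChatGen's output format.

```python
from typing import Dict, List, Tuple


def build_samples(turns: List[Tuple[str, str]], max_history: int = 3) -> List[Dict]:
    """Create one sample per turn, each with up to `max_history` previous turns.

    Hypothetical schema for illustration only; not ChatGen's actual format.
    """
    samples = []
    for i, (question, answer) in enumerate(turns):
        # Keep only the most recent `max_history` turns before the current one.
        start = max(0, i - max_history)
        history = [list(turn) for turn in turns[start:i]]
        samples.append({"history": history, "question": question, "answer": answer})
    return samples


if __name__ == "__main__":
    # Made-up dialog, used only to show the shape of the samples.
    dialog = [
        ("Hi, what can you do?", "I can answer questions about your orders."),
        ("Where is my order?", "It shipped yesterday."),
        ("When will it arrive?", "It should arrive within three days."),
    ]
    for sample in build_samples(dialog, max_history=2):
        print(sample)
```

Capping the history length keeps each sample bounded in size, which matters when the samples are later fed to a model with a limited context window.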
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
E-Mail: [email protected]
Project Link: https://github.com/Ryotess/ChatGen