LLM Fine-Tuning API

This repository contains scripts for scraping data and performing preprocessing tasks necessary for fine-tuning large language models (LLMs). It is designed to provide a flexible and scalable foundation for future development of an API for fine-tuning LLMs.

Features

Data Scraping: Scripts to scrape data from various sources defined in a configuration file. Data Preprocessing: Functions to clean and preprocess scraped data. Extensible Design: Modular structure to easily add more functionalities in the future.

Project Structure

.github
- workflows
  - workflow.yml
scripts
- api
- automation
- data_annotation
- data_collection
  - create_raw_database.py
  - facebook_scrapper.py
  - scrapper.py
  - utils.py
- data_preprocessing
  - clean_text.py
  - create_preprocessed_database.py
  - database_operations.py
  - preprocess.py
- create_database.py
- schema.sql
.gitignore
news_sources.json
requirements.txt

Installation

To get started with this project, clone the repository and install the required dependencies.

git clone https://github.com/10Academy-chortB/LLM_fine_tunning_API.git
cd LLM_fine_tunning_API
pip install -r requirements.txt

Ensure you have the necessary environment setup, including Python 3.8+ and any other dependencies specified in the requirements.txt file.

Usage

Configure Data Sources: Update the config_json.json file with the information about the websites and sources you want to scrape. Below is an example configuration:

Run Data Scraping: Execute the scrape.py script to scrape data from the configured sources.

python scripts/data_collection/scrape.py

Run Data Preprocessing: Execute the preprocess.py script to preprocess the scraped data.

python scripts/data_preprocessing/preprocess.py

Contributing

We welcome contributions to this project. To contribute:

Fork the repository.

Create a new branch for your feature or bugfix. Make your changes and commit them with descriptive messages. Push your changes to your fork. Open a pull request to the main repository. Please ensure your code adheres to the project's coding standards and includes relevant tests.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
.github/workflows		.github/workflows
airflow/dags		airflow/dags
data		data
frontend		frontend
logs/scheduler		logs/scheduler
notebooks		notebooks
screenshots		screenshots
scripts		scripts
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py
sources.json		sources.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Fine-Tuning API

Table of Contents

Features

Project Structure

Installation

Usage

Contributing

Fork the repository.

License

About

Releases

Packages

Languages

License

mistir-nigusse/LLM_fine_tunning_API

Folders and files

Latest commit

History

Repository files navigation

LLM Fine-Tuning API

Table of Contents

Features

Project Structure

Installation

Usage

Contributing

Fork the repository.

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages