🎓 LibLearner

Train your own T5 model on any codebase with powerful code extraction and processing tools.

🎯 Purpose

LibLearner is a toolkit for building custom T5 models trained on a specific codebase. It extracts, processes, and prepares code from local directories and GitHub repositories, producing structured output for fine-tuning code understanding and generation models.

✨ Key Features

🔍 Universal Code Extraction: Process any codebase or GitHub repository
📊 Multi-Format Support: Handle Python, JavaScript, Jupyter notebooks, and more
🤖 T5 Training Ready: Structured output perfect for fine-tuning T5 models
🔄 Batch Processing: Process entire repositories or directories at once
📈 Rich Analysis: Extract functions, classes, docstrings, and more
🌐 GitHub Integration: Direct processing of GitHub repositories

🚀 Quick Start

# Install LibLearner
pip install -e .

# Process a GitHub repository
process_files https://github.com/user/repo -o ./training_data

# Process local files
process_files ./my_codebase -o ./training_data

# Extract with detailed logging
process_files -v ./src --ignore-dirs tests,docs

🛠️ Core Tools

1. File Processor (`process_files`)

Our main utility for code extraction and analysis:

process_files [-h] [-o OUTPUT] [--ignore-dirs [DIRS...]] [-v] [--temp-dir DIR] input_paths...

Features:

🔄 Batch processing of files and directories
🌐 Direct GitHub repository processing
📊 CSV output for structured data
⚙️ Configurable processing options
📝 Comprehensive error reporting
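
To make the batch-processing and CSV-output ideas above concrete, here is a minimal sketch of walking a directory, skipping ignored folders, and writing one CSV row per file. It is not LibLearner's implementation: the function names (`iter_source_files`, `write_manifest`) and the column names are assumptions chosen for illustration.

```python
import csv
import os
from pathlib import Path
from typing import Iterable, Iterator


def iter_source_files(root: Path, ignore_dirs: Iterable[str] = ()) -> Iterator[Path]:
    """Yield every file under root, skipping directories named in ignore_dirs."""
    ignored = set(ignore_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored)
        for name in sorted(filenames):
            yield Path(dirpath) / name


def write_manifest(root: Path, out_csv: Path, ignore_dirs: Iterable[str] = ()) -> int:
    """Write one CSV row per file (hypothetical columns) and return the row count.

    Keep out_csv outside root, otherwise the manifest lists itself.
    """
    rows = 0
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "extension", "size_bytes"])
        for path in iter_source_files(root, ignore_dirs):
            rel = path.relative_to(root).as_posix()
            writer.writerow([rel, path.suffix, path.stat().st_size])
            rows += 1
    return rows
```

Pruning `dirnames` in place is what keeps the walk from entering ignored directories at all, which matters for large folders such as `tests` or `docs`.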

2. Function Extractor (`extract_functions`)

Specialized tool for function-level extraction:

extract_functions path/to/code -o output_dir
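
As an illustration of what function-level extraction can look like for Python sources, the sketch below uses the standard-library `ast` module. It is an assumption about the general technique, not a description of how `extract_functions` works internally; `list_functions` and the record keys are hypothetical names.

```python
import ast
from typing import Any, Dict, List


def list_functions(source: str) -> List[Dict[str, Any]]:
    """Return name, line number, docstring, and source text for each function."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        # Covers plain functions, methods, and async functions.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "name": node.name,
                "lineno": node.lineno,
                "docstring": ast.get_docstring(node),
                "source": ast.get_source_segment(source, node),
            })
    # ast.walk is breadth-first, so sort to get source order.
    records.sort(key=lambda record: record["lineno"])
    return records
```

Each record pairs a function's code with its docstring, which is one natural shape for later training examples.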

3. Extension Scout (`scout_extensions`)

Analyze file types in your codebase:

scout_extensions path/to/directory --sort
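
A file-type survey of this kind can be approximated in a few lines. The sketch below counts files by extension and sorts the result; `count_extensions` and the `"(none)"` label for extensionless files are assumptions for illustration, not the actual behavior of `scout_extensions`.

```python
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def count_extensions(root: Path) -> List[Tuple[str, int]]:
    """Count files under root by lowercase extension, most common first."""
    counts = Counter(
        path.suffix.lower() or "(none)"
        for path in root.rglob("*")
        if path.is_file()
    )
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Running a survey like this first shows which extensions dominate a codebase before deciding what to process.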

📝 Supported File Types

| Type | Extension | Features |
| --- | --- | --- |
| Python | `.py` | Functions, classes, type hints, docstrings |
| JavaScript | `.js` | Functions, classes, JSDoc, ES6+ syntax |
| Jupyter | `.ipynb` | Code cells, markdown, outputs |
| Markdown | `.md` | Headers, code blocks, documentation |
| YAML | `.yml`, `.yaml` | Configurations, schemas |
| JSON | `.json` | Data structures, configs |
| Shell | `.sh` | Scripts, commands |
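
To show what extracting code cells from a `.ipynb` file involves, here is a minimal sketch that reads the notebook's JSON directly. It assumes the nbformat v4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`); `notebook_code_cells` is a hypothetical helper, not part of LibLearner.

```python
import json
from typing import List


def notebook_code_cells(notebook_json: str) -> List[str]:
    """Return the source of each code cell in an nbformat v4 notebook."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows source to be a string or a list of line strings.
        cells.append(source if isinstance(source, str) else "".join(source))
    return cells
```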

🔧 Processors

LibLearner includes specialized processors for each file type:

Current Processors

✅ Python Processor: Full language feature support
✅ JavaScript Processor: Modern JS/ES6+ analysis
✅ Jupyter Processor: Notebook analysis
✅ Markdown Processor: Documentation parsing
✅ YAML Processor: Configuration analysis
✅ MDX Processor: JSX in Markdown support
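
One common way to organize per-file-type processors is a registry keyed by extension. The sketch below illustrates that pattern under stated assumptions: the `register` decorator, the `process` dispatcher, and both placeholder processors are hypothetical and do not reflect LibLearner's actual classes.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

Processor = Callable[[str], dict]
_REGISTRY: Dict[str, Processor] = {}


def register(*extensions: str) -> Callable[[Processor], Processor]:
    """Decorator that maps one or more file extensions to a processor function."""
    def decorator(func: Processor) -> Processor:
        for ext in extensions:
            _REGISTRY[ext.lower()] = func
        return func
    return decorator


@register(".yml", ".yaml")
def process_yaml(text: str) -> dict:
    # Placeholder: a real processor would parse the YAML; this only counts lines.
    return {"kind": "yaml", "lines": len(text.splitlines())}


@register(".md")
def process_markdown(text: str) -> dict:
    # Placeholder: collect lines that start with '#' as headers.
    headers = [
        line.lstrip("#").strip()
        for line in text.splitlines()
        if line.startswith("#")
    ]
    return {"kind": "markdown", "headers": headers}


def process(filename: str, text: str) -> Optional[dict]:
    """Dispatch to the processor registered for the file's extension, if any."""
    processor = _REGISTRY.get(Path(filename).suffix.lower())
    return processor(text) if processor else None
```

With this layout, supporting a new file type means adding one decorated function, and unsupported extensions return `None` instead of raising.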

Coming Soon

🚧 RST Processor: Documentation processing
📋 JSONL Processor: Streaming data handling
📋 TypeScript Processor: Static typing support

🎯 T5 Training Pipeline

1. Extract Code

process_files your/codebase -o training_data

2. Prepare Dataset

prepare_t5_dataset training_data -o t5_ready

3. Train Model

train_t5_model t5_ready -o trained_model
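
T5 is a text-to-text model, so the dataset step amounts to turning extracted rows into input/target string pairs. The sketch below shows one way that could look, pairing each function's code with its docstring. The CSV column names (`source`, `docstring`), the `summarize: ` task prefix, and the JSONL output are all assumptions for illustration; the format that `prepare_t5_dataset` actually reads and writes is not documented here.

```python
import csv
import io
import json
from typing import Iterable, Iterator, TextIO


def rows_to_pairs(csv_file: TextIO, prefix: str = "summarize: ") -> Iterator[dict]:
    """Yield input/target pairs from CSV rows that have both code and a docstring."""
    for row in csv.DictReader(csv_file):
        code = (row.get("source") or "").strip()        # hypothetical column
        doc = (row.get("docstring") or "").strip()      # hypothetical column
        if code and doc:  # skip rows that cannot form a supervised pair
            yield {"input_text": prefix + code, "target_text": doc}


def write_jsonl(pairs: Iterable[dict], out: TextIO) -> int:
    """Write one JSON object per line and return the number of pairs written."""
    count = 0
    for pair in pairs:
        out.write(json.dumps(pair) + "\n")
        count += 1
    return count
```

Rows without a docstring are dropped because they give the model no target to learn from.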

📚 Documentation

For detailed documentation, visit our documentation site.

🤝 Contributing

We welcome contributions! See our Contributing Guide for details.

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.
