OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment

OntoAligner is a Python library that simplifies ontology alignment and matching for researchers, practitioners, and developers. Its modular architecture covers lightweight fuzzy matching as well as retrieval-based, LLM-based, and retrieval-augmented generation (RAG) aligners, all accessible through a common pipeline interface.

🧪 Installation

You can install OntoAligner from PyPI using pip:

pip install ontoaligner

Alternatively, to install the latest version directly from source, run:

git clone git@github.com:sciknoworg/OntoAligner.git
pip install ./OntoAligner
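
To verify the installation, a quick check like the following should work (it assumes the package exposes a __version__ attribute, which may differ between releases):

python -c "import ontoaligner; print(ontoaligner.__version__)"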

📚 Documentation

Comprehensive documentation for OntoAligner, including detailed guides and examples, is available at ontoaligner.readthedocs.io.

Tutorials

| Example | Tutorial | Script |
|---|---|---|
| Lightweight | 📚 Fuzzy Matching | 📝 Code |
| Retrieval | 📚 Retrieval Aligner | 📝 Code |
| Large Language Models | 📚 Large Language Models Aligner | 📝 Code |
| Retrieval Augmented Generation | 📚 Retrieval Augmented Generation | 📝 Code |
| FewShot | 📚 FewShot RAG | 📝 Code |
| In-Context Vectors Learning | 📚 In-Context Vectors RAG | 📝 Code |

🚀 Quick Tour

Below is a step-by-step example of using the Retrieval-Augmented Generation (RAG) approach for ontology matching:

from ontoaligner.ontology import MaterialInformationMatOntoOMDataset
from ontoaligner.utils import metrics, xmlify
from ontoaligner.ontology_matchers import MistralLLMBERTRetrieverRAG
from ontoaligner.encoder import ConceptParentRAGEncoder
from ontoaligner.postprocess import rag_hybrid_postprocessor

# Step 1: Initialize the task object for the MaterialInformation-MatOnto dataset
task = MaterialInformationMatOntoOMDataset()
print("Test Task:", task)

# Step 2: Load source and target ontologies along with reference matchings
dataset = task.collect(
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml"
)

# Step 3: Encode the source and target ontologies
encoder_model = ConceptParentRAGEncoder()
encoded_ontology = encoder_model(source=dataset['source'], target=dataset['target'])

# Step 4: Define configuration for retriever and LLM
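# Parameter notes (based on standard retriever/LLM semantics, not OntoAligner-specific docs):
# - top_k: number of candidate target concepts the retriever keeps per source concept.
# - max_length / max_new_tokens: bounds on the LLM prompt length and the generated answer.
# - batch_size: number of prompts processed per batch on the given device.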
retriever_config = {"device": "cuda", "top_k": 5}
llm_config = {"device": "cuda", "max_length": 300, "max_new_tokens": 10, "batch_size": 15}

# Step 5: Initialize the RAG-based ontology matcher and generate predictions
model = MistralLLMBERTRetrieverRAG(retriever_config=retriever_config, llm_config=llm_config)
predicts = model.generate(input_data=encoded_ontology)

# Step 6: Apply hybrid postprocessing
hybrid_matchings, hybrid_configs = rag_hybrid_postprocessor(predicts=predicts,
                                                            ir_score_threshold=0.1,
                                                            llm_confidence_th=0.8)

evaluation = metrics.evaluation_report(predicts=hybrid_matchings, references=dataset['reference'])
print("Hybrid Matching Evaluation Report:", evaluation)

# Step 7: Convert matchings to XML format and save the XML representation
xml_str = xmlify.xml_alignment_generator(matchings=hybrid_matchings)
with open("matchings.xml", "w", encoding="utf-8") as xml_file:
    xml_file.write(xml_str)
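
Before (or instead of) writing the XML file, the postprocessed output can be inspected directly. The sketch below assumes hybrid_matchings is a list of correspondence records and hybrid_configs holds the thresholds that were applied; the exact fields depend on the postprocessor output:

# Inspect the postprocessed matchings (assumes a list of correspondence records)
print("Number of hybrid matchings:", len(hybrid_matchings))
for matching in hybrid_matchings[:3]:
    print(matching)
print("Applied postprocessor configuration:", hybrid_configs)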

Alternatively, the same RAG workflow can be run end-to-end through the ontology alignment pipeline:

import ontoaligner

pipeline = ontoaligner.OntoAlignerPipeline(
    task_class=ontoaligner.ontology.MaterialInformationMatOntoOMDataset,
    source_ontology_path="assets/MI-MatOnto/mi_ontology.xml",
    target_ontology_path="assets/MI-MatOnto/matonto_ontology.xml",
    reference_matching_path="assets/MI-MatOnto/matchings.xml",
)

matchings, evaluation = pipeline(
    method="rag",
    encoder_model=ontoaligner.encoder.ConceptRAGEncoder(),
    model_class=ontoaligner.ontology_matchers.MistralLLMBERTRetrieverRAG,
    postprocessor=ontoaligner.postprocess.rag_hybrid_postprocessor,
    llm_path='mistralai/Mistral-7B-v0.3',
    retriever_path='all-MiniLM-L6-v2',
    llm_threshold=0.5,
    ir_threshold=0.7,
    top_k=5,
    max_length=512,
    max_new_tokens=10,
    device='cuda',
    batch_size=32,
    return_matching=True,
    evaluate=True
)

print("Matching Evaluation Report:", evaluation)

⭐ Contribution

We welcome contributions to enhance OntoAligner and make it even better! Please review our contribution guidelines in CONTRIBUTING.md before getting started. Your support is greatly appreciated.

If you encounter any issues or have questions, please submit them in the GitHub issues tracker.

💡 Acknowledgements

If you use OntoAligner in your work or research, please cite the following:

@software{babaei_giglou_ontoaligner_2024,
  author       = {Hamed Babaei Giglou and Jennifer D'Souza and Oliver Karras and S{\"o}ren Auer},
  title        = {OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment},
  version      = {1.0.0},
  year         = {2024},
  url          = {https://github.com/sciknoworg/OntoAligner},
}

This software is licensed under the Apache 2.0 License.
