WIP: Add schema matcher plugin #13

Draft
wants to merge 5 commits into
base: main
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -7,7 +7,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/) and this p

## [Unreleased]

TODO: add at least one Added, Changed, Deprecated, Removed, Fixed or Security section
### Added

- Added `SchemaMatcherPlugin` interface.


## [4.5.0] 2024-01-10
52 changes: 52 additions & 0 deletions cmem_plugin_base/dataintegration/ai/schema_matcher.py
@@ -0,0 +1,52 @@
"""All classes related to schema matcher plugins.
WARNING: All classes in this file are preliminary and might be changed."""

from cmem_plugin_base.dataintegration.plugins import PluginBase


class MatchingSchema:
    """The schema that is used by schema matchers."""

    def write_rdf_file(self, path: str, lang: str) -> None:
        """Write this schema to an RDF file.

        :param path: The target file path.
        :param lang: The RDF format. Usually, either "N-TRIPLE" or "TURTLE".
        """
        # Implementation provided by DataIntegration


class Correspondence:
    """Candidate match between two properties from two schemata."""

    def __init__(self, source: str, target: str, confidence: float):
        self.source = source
        self.target = target
        self.confidence = confidence

    def __str__(self):
        """Convert to a string representation"""
        return f"{self.source} - {self.target} ({self.confidence})"


class Alignment:
    """Set of correspondences between two schemata."""

    def __init__(self, matches: list[Correspondence]):
        self.matches = matches


class SchemaMatcherPlugin(PluginBase):
    """
    A schema matcher aligns a source dataset with target vocabularies.
    """

    def match(self, source: MatchingSchema, target: MatchingSchema) -> Alignment:
        """
        Aligns a source dataset with target vocabularies.

        :param source: Source schema
        :param target: Target schema

        :return: The generated alignment.
        """
3 changes: 3 additions & 0 deletions cmem_plugin_base/dataintegration/description.py
@@ -14,6 +14,7 @@
    PluginContextParameterType,
)
from cmem_plugin_base.dataintegration.utils import generate_id
from cmem_plugin_base.dataintegration.ai.schema_matcher import SchemaMatcherPlugin


class Icon:
@@ -129,6 +130,8 @@ def __init__(  # pylint: disable=too-many-arguments
            self.plugin_type = "WorkflowPlugin"
        elif issubclass(plugin_class, TransformPlugin):
            self.plugin_type = "TransformPlugin"
        elif issubclass(plugin_class, SchemaMatcherPlugin):
            self.plugin_type = "SchemaMatcherPlugin"
        else:
            raise ValueError(
                f"Class {plugin_class.__name__} does not implement a supported "
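
For readers following the `description.py` hunk, the dispatch it extends can be sketched in isolation. This is a minimal illustration, not the code in the PR: the four classes are empty stand-ins for the real `cmem_plugin_base` classes, `resolve_plugin_type` is a hypothetical helper name, and the error message is invented because the original is cut off by the hunk.

```python
class PluginBase:
    """Stand-in for cmem_plugin_base.dataintegration.plugins.PluginBase."""


class WorkflowPlugin(PluginBase):
    """Stand-in for the real WorkflowPlugin."""


class TransformPlugin(PluginBase):
    """Stand-in for the real TransformPlugin."""


class SchemaMatcherPlugin(PluginBase):
    """Stand-in for the interface added in this PR."""


def resolve_plugin_type(plugin_class: type) -> str:
    """Return the plugin type label for a plugin class.

    Hypothetical helper: mirrors the if/elif chain in the hunk, with the
    new SchemaMatcherPlugin branch checked after the two existing ones.
    """
    if issubclass(plugin_class, WorkflowPlugin):
        return "WorkflowPlugin"
    if issubclass(plugin_class, TransformPlugin):
        return "TransformPlugin"
    if issubclass(plugin_class, SchemaMatcherPlugin):
        return "SchemaMatcherPlugin"
    # Assumed wording; the real message is truncated in the diff.
    raise ValueError(f"Class {plugin_class.__name__} is not a supported plugin type.")


class MyMatcher(SchemaMatcherPlugin):
    """Example subclass used only to exercise the dispatch."""


if __name__ == "__main__":
    print(resolve_plugin_type(MyMatcher))
```

Because the check uses `issubclass`, any class deriving from `SchemaMatcherPlugin` is labeled accordingly, while a class that only derives from `PluginBase` still ends in the `ValueError` branch.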
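
To make the new `SchemaMatcherPlugin` interface from `schema_matcher.py` more concrete, here is a rough sketch of what an implementation might look like. Everything beyond the class and method names in the diff is an assumption: the `MatchingSchema` stand-in and its RDF layout (one `rdf:Property` triple per property) are invented so the example runs on its own, and `NameSimilarityMatcher`, `read_properties`, `local_name`, and the `threshold` parameter are hypothetical. A real plugin would subclass `SchemaMatcherPlugin` and receive schemas from DataIntegration.

```python
import difflib
import os
import tempfile

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"


class MatchingSchema:
    """Stand-in only: the real class is implemented by DataIntegration."""

    def __init__(self, properties: list[str]):
        self.properties = properties  # hypothetical: list of property IRIs

    def write_rdf_file(self, path: str, lang: str) -> None:
        # Assumed layout: one rdf:Property triple per property, N-Triples only.
        with open(path, "w", encoding="utf-8") as file:
            for iri in self.properties:
                file.write(f"<{iri}> <{RDF_TYPE}> <{RDF_PROPERTY}> .\n")


class Correspondence:
    """Same shape as the class in the diff."""

    def __init__(self, source: str, target: str, confidence: float):
        self.source = source
        self.target = target
        self.confidence = confidence

    def __str__(self) -> str:
        return f"{self.source} - {self.target} ({self.confidence})"


class Alignment:
    """Same shape as the class in the diff."""

    def __init__(self, matches: list[Correspondence]):
        self.matches = matches


def read_properties(schema: MatchingSchema) -> list[str]:
    """Dump the schema to a temporary N-Triples file and collect subject IRIs."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "schema.nt")
        schema.write_rdf_file(path, "N-TRIPLE")
        with open(path, encoding="utf-8") as file:
            subjects = {
                line.split(" ", 1)[0].strip("<>") for line in file if line.startswith("<")
            }
    return sorted(subjects)


def local_name(iri: str) -> str:
    """Lower-cased last path or fragment segment of an IRI."""
    return iri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1].lower()


class NameSimilarityMatcher:  # a real plugin would subclass SchemaMatcherPlugin
    """Naive matcher: compares property local names by string similarity."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold

    def match(self, source: MatchingSchema, target: MatchingSchema) -> Alignment:
        targets = read_properties(target)
        matches = []
        for src in read_properties(source):
            for tgt in targets:
                score = difflib.SequenceMatcher(
                    None, local_name(src), local_name(tgt)
                ).ratio()
                if score >= self.threshold:
                    matches.append(Correspondence(src, tgt, round(score, 2)))
        return Alignment(matches)


if __name__ == "__main__":
    result = NameSimilarityMatcher().match(
        MatchingSchema(["http://example.org/src/firstName"]),
        MatchingSchema(["http://xmlns.com/foaf/0.1/firstName"]),
    )
    for correspondence in result.matches:
        print(correspondence)
```

The sketch goes through `write_rdf_file` rather than reading attributes directly because that is the only method the diff exposes on `MatchingSchema`. A production matcher would likely use a proper RDF parser instead of splitting lines.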