
Basic Overview

Distribution Matching is an instance-based ontology matching technique: two terminologies can be aligned by comparing the distributions of the values recorded for each of their terms.

This technique is applied to the alignment of healthcare information systems, where laboratory results tend to have the same distribution of values regardless of the geographical site where the analysis was performed.

It has been applied to the alignment of the MIMIC-III dataset with the data warehouse of the Lille University Hospital.
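As a toy illustration of the principle (not dmatch's implementation), the Python sketch below represents each term by the list of values observed for it and maps each source term to the target term whose values are distributed most similarly. Similarity is measured here with the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distribution functions. The term codes and values are made up, and `match_terms` is a hypothetical helper.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    # The ECDF difference only changes at observed values, so checking those points is enough.
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def match_terms(source, target):
    """Hypothetical helper: map each source term to the target term with the closest distribution."""
    return {
        s: min(target, key=lambda t: ks_statistic(values, target[t]))
        for s, values in source.items()
    }


# Made-up values: the same two lab tests, coded differently at two sites.
site_a = {"glucose": [90, 95, 100, 110, 120], "sodium": [137, 138, 139, 140, 142]}
site_b = {"NA": [136, 139, 140, 141, 143], "GLU": [92, 98, 101, 105, 115]}
print(match_terms(site_a, site_b))  # {'glucose': 'GLU', 'sodium': 'NA'}
```

Nothing in the matching step looks at the term codes themselves; the correspondence is recovered from the values alone, which is what makes the approach usable across sites with unrelated terminologies.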

Installation

As the project is still under development, we recommend installing it in editable mode.

git clone https://github.com/mcrts/dmatch dmatch
pip install -e dmatch/

This will download the package source code into a dmatch directory, and install it in editable mode.

mkdir my_workspace
cd my_workspace
dmatch init .

This will create a workspace and initialize it. The initialization procedure copies two files:
- connections.cfg, in which you can define your own connection strings;
- _connectors.py, in which you can write your own queries to extract data from your data sources. A MimicConnector class is provided as a working example.
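The exact layout of connections.cfg and the interface expected from classes in _connectors.py are defined by dmatch and are not described here. Purely to illustrate what a connector does, the sketch below reads a connection string from an INI-style file with Python's `configparser` and runs a query that returns (term, value) pairs. The `EXAMPLE` section, the `url` key, the `lab_results` table and the `ExampleConnector` class are all hypothetical; an in-memory SQLite database stands in for a real data source.

```python
import configparser
import sqlite3

# Hypothetical INI-style content; the real connections.cfg layout may differ.
CONFIG_TEXT = """
[EXAMPLE]
url = :memory:
"""


def read_connection_string(config_text, name):
    """Return the (hypothetical) 'url' entry of section `name`."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser[name]["url"]


class ExampleConnector:
    """Hypothetical connector: extracts (term, value) pairs from one data source."""

    def __init__(self, connection_string):
        self.conn = sqlite3.connect(connection_string)

    def fetch_values(self):
        # Site-specific extraction logic (joins, unit filters, ...) would live in this query.
        cursor = self.conn.execute(
            "SELECT term, value FROM lab_results WHERE value IS NOT NULL ORDER BY rowid"
        )
        return list(cursor)


connector = ExampleConnector(read_connection_string(CONFIG_TEXT, "EXAMPLE"))
# Populate the stand-in database with a few made-up rows.
connector.conn.execute("CREATE TABLE lab_results (term TEXT, value REAL)")
connector.conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?)",
    [("glucose", 95.0), ("sodium", 140.0), ("sodium", None)],
)
print(connector.fetch_values())  # [('glucose', 95.0), ('sodium', 140.0)]
```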

Resources

The resources directory contains several files:
- mimic3filter.csv, a filter file that keeps only a specific set of terms;
- mimic3-mimic3reference.csv, a reference file for training purposes;
- mimic3-mimic3.pipeline, a model that performs the distribution matching alignment.
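The column layout of the filter file is not documented in this README. Assuming, for illustration only, a CSV with a header row and one term per line in a hypothetical `term` column, applying a filter amounts to discarding every observation whose term is not listed:

```python
import csv
import io


def load_filter(csv_file):
    """Read the set of terms to keep from a CSV with a hypothetical 'term' column."""
    return {row["term"] for row in csv.DictReader(csv_file)}


def apply_filter(rows, keep):
    """Keep only the (term, value) pairs whose term appears in the filter."""
    return [(term, value) for term, value in rows if term in keep]


filter_csv = io.StringIO("term\nglucose\nsodium\n")  # stand-in for a filter file
keep = load_filter(filter_csv)
rows = [("glucose", 95.0), ("heart_rate", 72.0), ("sodium", 140.0)]
print(apply_filter(rows, keep))  # [('glucose', 95.0), ('sodium', 140.0)]
```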

Command Line Interface

The package provides a command-line tool, dmatch, to perform the alignment.

dmatch align \
    MIMIC3 \
    MIMIC3 \
    resources/mimic3-mimic3.pipeline \
    mimic3-mimic3 \
    --filter1 resources/mimic3_filter.csv \
    --filter2 resources/mimic3_filter.csv

This aligns the MIMIC3 data source with itself; all results are stored in the mimic3-mimic3 directory.

dmatch prepare \
    MIMIC3 \
    MIMIC3 \
    mimic3-mimic3 \
    --filter1 resources/mimic3_filter.csv \
    --filter2 resources/mimic3_filter.csv

This prepares a dataset of all candidate correspondences between the two data sources and evaluates each correspondence with several measures, such as the Hellinger distance and the Kolmogorov-Smirnov test. Combined with the provided reference file, this produces a dataset ready for training a decision model.
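To make the two measures concrete, the sketch below scores every candidate pair of terms and labels it from a set of known correspondences. The Hellinger distance is computed between histograms over shared bins, H(P, Q) = sqrt(½ Σᵢ (√pᵢ − √qᵢ)²), which is 0 for identical and 1 for disjoint distributions; the Kolmogorov-Smirnov statistic is the largest gap between the two empirical CDFs. The 10 equal-width bins, the representation of the reference as a set of (source, target) pairs, and the `score_pair` and `training_rows` helpers are assumptions for illustration; dmatch's own binning, measures and output columns may differ.

```python
import math
from bisect import bisect_right


def histogram(values, lo, hi, bins):
    """Relative frequencies of `values` over `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls into the last bin instead of overflowing.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q)) / 2)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (largest gap between empirical CDFs)."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def score_pair(a, b, bins=10):
    """Hypothetical feature row for one candidate correspondence."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if hi == lo:
        hi = lo + 1.0  # all values identical: any non-zero bin width works
    p, q = histogram(a, lo, hi, bins), histogram(b, lo, hi, bins)
    return {"hellinger": hellinger(p, q), "ks": ks_statistic(a, b)}


def training_rows(source, target, reference):
    """Score every (source term, target term) pair and label it from the reference set."""
    rows = []
    for s, a in source.items():
        for t, b in target.items():
            row = {"source": s, "target": t, **score_pair(a, b)}
            row["is_match"] = (s, t) in reference
            rows.append(row)
    return rows


# Made-up values and a hypothetical reference of known correspondences.
site_a = {"glucose": [90, 95, 100, 110, 120], "sodium": [137, 138, 139, 140, 142]}
site_b = {"GLU": [92, 98, 101, 105, 115], "NA": [136, 139, 140, 141, 143]}
reference = {("glucose", "GLU"), ("sodium", "NA")}
for row in training_rows(site_a, site_b, reference):
    print(row)
```

Each printed row pairs the measures (the features) with the `is_match` label (the target), which is the shape of data a decision model can be trained on.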

A lower-level command, dmatch-tools, is also available for finer control over the process.

dmatch-tools index MIMIC3 mimic3A
dmatch-tools index MIMIC3 mimic3B
dmatch-tools preprocess mimic3A --filter resources/mimic3_filter.csv
dmatch-tools preprocess mimic3B --filter resources/mimic3_filter.csv
dmatch-tools prepare mimic3A mimic3B mimic3-mimic3
dmatch-tools score mimic3-mimic3
dmatch-tools match mimic3-mimic3 resources/mimic3-mimic3.pipeline
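The sequence above can be scripted, for example to rerun it with other sources or filters. This Python sketch only reproduces the commands and arguments shown in this README and assumes no other dmatch-tools options; the helper names are hypothetical. `run_steps` prints each command and executes it only when `dry_run` is False, stopping at the first command that fails.

```python
import shlex
import subprocess


def dmatch_tools_commands(source_a, source_b, name_a, name_b, filter_a, filter_b, out, pipeline):
    """Build the dmatch-tools steps: index, preprocess, prepare, score, match."""
    tool = "dmatch-tools"
    return [
        [tool, "index", source_a, name_a],
        [tool, "index", source_b, name_b],
        [tool, "preprocess", name_a, "--filter", filter_a],
        [tool, "preprocess", name_b, "--filter", filter_b],
        [tool, "prepare", name_a, name_b, out],
        [tool, "score", out],
        [tool, "match", out, pipeline],
    ]


def run_steps(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (check=True stops on failure)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = dmatch_tools_commands(
    "MIMIC3", "MIMIC3", "mimic3A", "mimic3B",
    "resources/mimic3_filter.csv", "resources/mimic3_filter.csv",
    "mimic3-mimic3", "resources/mimic3-mimic3.pipeline",
)
run_steps(commands)  # dry run: prints the seven commands listed above
```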
