Advanced usage
AdeBC edited this page Feb 3, 2021
Construct a biome ontology from `microbiomes.txt`
```shell
expert construct -i microbiomes.txt -o ontology.pkl
# Also equivalent to
expert construct --input microbiomes.txt --output ontology.pkl
```
- Input: the `microbiomes.txt` file, which lists the path from the "root" node to each leaf node of the biome ontology, one colon-separated path per line:
```
root:Environmental:Terrestrial:Soil
root:Environmental:Terrestrial:Soil:Agricultural
root:Environmental:Terrestrial:Soil:Boreal_forest
root:Environmental:Terrestrial:Soil:Contaminated
root:Environmental:Terrestrial:Soil:Crop
root:Environmental:Terrestrial:Soil:Crop:Agricultural_land
root:Environmental:Terrestrial:Soil:Desert
root:Environmental:Terrestrial:Soil:Forest_soil
root:Environmental:Terrestrial:Soil:Grasslands
root:Environmental:Terrestrial:Soil:Loam:Agricultural
root:Environmental:Terrestrial:Soil:Permafrost
root:Environmental:Terrestrial:Soil:Sand
root:Environmental:Terrestrial:Soil:Tropical_rainforest
root:Environmental:Terrestrial:Soil:Uranium_contaminated
root:Environmental:Terrestrial:Soil:Wetlands
root:Host-associated:Plants:Rhizosphere:Soil
```
- Output: constructed biome ontology (pickle format, non-human-readable).
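Because each line is a colon-separated path, every prefix of a path is itself an ontology node, and the number of fields gives its layer. The shell sketch below enumerates those nodes. `list_nodes` is a hypothetical helper (not part of EXPERT), and it assumes `root` is layer 1 so that `root:Environmental` is layer 2, which is consistent with the `layer-2` to `layer-6` files that `expert search` writes further down this page. It only illustrates the path format and does not reproduce what `expert construct` does internally.

```shell
#!/bin/sh
# list_nodes (hypothetical helper): read root-to-leaf paths such as
# "root:Environmental:Terrestrial:Soil" and print each distinct node once,
# prefixed with its layer. ASSUMPTION: "root" is layer 1, so
# "root:Environmental" is layer 2. Reads the given files, or stdin if none.
list_nodes() {
  awk -F: '{
    node = $1
    for (i = 2; i <= NF; i++) {
      node = node ":" $i
      if (!(node in seen)) {      # print each node only the first time it appears
        seen[node] = 1
        print "layer-" i "\t" node
      }
    }
  }' "$@"
}

# Demo on three paths from the example above.
# On a real file: list_nodes microbiomes.txt
printf '%s\n' \
  'root:Environmental:Terrestrial:Soil' \
  'root:Environmental:Terrestrial:Soil:Crop:Agricultural_land' \
  'root:Host-associated:Plants:Rhizosphere:Soil' | list_nodes
```

On the three demo paths it prints nine nodes spread over layers 2 to 6.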
Map the source environments of samples to the biome ontology
```shell
expert map --to-otlg -t ontology.pkl -i mapper.csv -o labels.h5
# Also equivalent to
expert map --to-otlg --otlg ontology.pkl --input mapper.csv --output labels.h5
```
- Input: the mapper file, which gives the biome source (`Env`) of each sample (`SampleID`):
Index | Env | SampleID |
---|---|---|
0 | root:Engineered:Wastewater | ERR2260442 |
1 | root:Engineered:Wastewater | SRR980322 |
2 | root:Engineered:Wastewater | ERR2985272 |
3 | root:Engineered:Wastewater | ERR2814648 |
4 | root:Engineered:Wastewater | ERR2985275 |
- Output: the labels for samples in each layer of the biome ontology (HDF format, non-human-readable).
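The abundance-table paths used in the convert example below encode each sample's biome in the parent directory name and its run accession at the end of the file name (`MGYS00005146-ERR1690680.tsv`). If your data are organized the same way, a mapper file can be derived from that path list. In the sketch below, `paths_to_mapper` is a hypothetical helper, and the CSV layout (an unnamed index column followed by `Env` and `SampleID`) is an assumption inferred from the table above; check it against what `expert map` accepts before relying on it.

```shell
#!/bin/sh
# paths_to_mapper (hypothetical helper): turn abundance-table paths of the form
#   <dataset dir>/<biome>/<study>-<run>.tsv
# into mapper rows. ASSUMED layout: a header ",Env,SampleID" followed by
# "index,Env,SampleID" rows, mirroring the table above.
paths_to_mapper() {
  awk -F/ '
    BEGIN { n = 0; print ",Env,SampleID" }
    NF >= 2 {
      env = $(NF - 1)              # parent directory name = biome path
      sample = $NF                 # file name, e.g. MGYS00005146-ERR1690680.tsv
      sub(/\.tsv$/, "", sample)    # drop the extension
      sub(/^.*-/, "", sample)      # keep the text after the last "-" (run accession)
      print n "," env "," sample
      n++
    }' "$@"
}

# Demo on two paths from the convert example below.
# On real data: paths_to_mapper countMatrices.txt > mapper.csv
printf '%s\n' \
  'datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1690680.tsv' \
  'datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00000513-ERR986792.tsv' | paths_to_mapper
```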
Convert input data to a genus-level count matrix
```shell
expert convert -i countMatrices.txt -o countMatrix.h5 --in-cm
# Also equivalent to
expert convert --input countMatrices.txt --output countMatrix.h5 --in-cm
```
- Input: a text file listing the paths to the input count matrix files / abundance tables, one per line:
```
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1690680.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1689675.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00000513-ERR986792.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1691198.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00001704-ERR1905845.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1689214.tsv
datasets/soil_dataset/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1689910.tsv
```
- Output: the converted genus-level count matrix (HDF format, non-human-readable).
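A path list in this format can be generated with `find`. The sketch below assumes the tables end in `.tsv` and live under a single dataset directory such as `datasets/soil_dataset`; `list_tables` is a hypothetical helper name, and the demo builds a throw-away directory so that it runs anywhere.

```shell
#!/bin/sh
# list_tables (hypothetical helper): print the path of every .tsv abundance
# table under a dataset directory, one per line, in a stable (C-locale) order.
list_tables() {
  find "$1" -type f -name '*.tsv' | LC_ALL=C sort
}

# Demo on a throw-away directory laid out like <dataset dir>/<biome>/<file>.tsv.
# On real data: list_tables datasets/soil_dataset > countMatrices.txt
demo=$(mktemp -d)
mkdir -p "$demo/root:Host-associated:Plants:Rhizosphere:Soil"
touch "$demo/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00005146-ERR1690680.tsv" \
      "$demo/root:Host-associated:Plants:Rhizosphere:Soil/MGYS00000513-ERR986792.tsv"
list_tables "$demo"
rm -rf "$demo"
```

Because `find` echoes the directory argument as given, a relative argument produces relative paths like those in the list above.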
Build an EXPERT model from scratch and train it
```shell
expert train -i countMatrix.h5 -l labels.h5 -t ontology.pkl -o model
# Also equivalent to
expert train --input countMatrix.h5 --labels labels.h5 --otlg ontology.pkl --output model
```
- Input: the biome ontology, the converted genus-level count matrix, and the labels of the samples in the count matrix.
- Output: the trained model.
Build an EXPERT model through transfer learning
```shell
expert transfer -i countMatrix.h5 -l labels.h5 -t ontology.pkl -o model
# Also equivalent to
expert transfer --input countMatrix.h5 --labels labels.h5 --otlg ontology.pkl --output model
```
- Input: the biome ontology, the converted genus-level count matrix, and the labels of the samples in the count matrix.
- Output: the trained model.
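The preparation and training steps so far can be chained in a small script. This sketch uses only the commands and options shown above, with the same file names. By default it just prints each command (a dry run) so you can review it, and `DRY_RUN=0` executes them. `DRY_RUN`, `MODE` and the `run`/`pipeline` functions are illustrative names, not EXPERT options.

```shell
#!/bin/sh
# Sketch: chain the construct, map, convert and train (or transfer) steps above,
# using only the commands and options shown on this page. DRY_RUN, MODE, run and
# pipeline are illustrative names, not EXPERT options.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 (default): only print the commands; 0: execute them
MODE=${MODE:-train}     # "train" builds a model from scratch; "transfer" uses expert transfer

case "$MODE" in
  train|transfer) ;;
  *) echo "MODE must be train or transfer" >&2; exit 1 ;;
esac

# Print a command, and run it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = 0 ]; then
    "$@"
  fi
}

pipeline() {
  run expert construct -i microbiomes.txt -o ontology.pkl
  run expert map --to-otlg -t ontology.pkl -i mapper.csv -o labels.h5
  run expert convert -i countMatrices.txt -o countMatrix.h5 --in-cm
  run expert "$MODE" -i countMatrix.h5 -l labels.h5 -t ontology.pkl -o model
}

pipeline
```

Saved as, say, `pipeline.sh`, it can be reviewed with `sh pipeline.sh` and then executed with `DRY_RUN=0 sh pipeline.sh` (or `DRY_RUN=0 MODE=transfer sh pipeline.sh`). Because of `set -e`, the script stops at the first command that fails.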
Estimate source contributions with a trained model
```shell
expert search -i countMatrix.h5 -o searchResult -m model
# Also equivalent to
expert search --input countMatrix.h5 --output searchResult --model model
```
- Input: the converted genus-level count matrix and a trained model (`-m`).
- Output: the search result, one CSV file per ontology layer:
```
searchResult
├── layer-2.csv
├── layer-3.csv
├── layer-4.csv
├── layer-5.csv
└── layer-6.csv
```
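Take `layer-2.csv` as an example. Each row is a sample, each column is a candidate source at layer 2, and each value is the estimated contribution of that source to the sample: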
SampleID | root:Engineered | root:Environmental | root:Host-associated | root:Mixed | Unknown |
---|---|---|---|---|---|
ERR2278752 | 0.0041427016 | 0.26372418 | 0.68632126 | 0.00040003657 | 0.045411825 |
ERR2278753 | 0.002841179 | 0.07928896 | 0.91735524 | 0.00051463145 | 0.0 |
ERR2666855 | 0.0006751048 | 0.0021803565 | 0.9970531 | 9.1493675e-05 | 0.0 |
ERR2666860 | 0.0005227786 | 0.013902989 | 0.98542625 | 0.00014803928 | 0.0 |
ERR2666881 | 0.0009569057 | 0.0023957777 | 0.9965403 | 0.00010694566 | 0.0 |
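Each row's contributions sum to about 1 (for ERR2278752: 0.0041 + 0.2637 + 0.6863 + 0.0004 + 0.0454 ≈ 1.0), so the column with the largest value is the most likely source at that layer. The sketch below prints it for every sample. `top_source` is a hypothetical helper, and it assumes `layer-N.csv` is comma-separated, with source names in the header row and sample IDs in the first column, as the table suggests.

```shell
#!/bin/sh
# top_source (hypothetical helper): for each sample in a layer-N.csv file, print
# the sample ID, the source column with the highest contribution, and that value.
# ASSUMPTION: comma-separated, header row = source names, first column = sample ID.
top_source() {
  awk -F, '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
    {
      best = 2
      for (i = 3; i <= NF; i++) if ($i + 0 > $best + 0) best = i
      print $1 "\t" name[best] "\t" $best
    }' "$@"
}

# Demo on two rows of the table above.
# On real output: top_source searchResult/layer-2.csv
printf '%s\n' \
  ',root:Engineered,root:Environmental,root:Host-associated,root:Mixed,Unknown' \
  'ERR2278752,0.0041427016,0.26372418,0.68632126,0.00040003657,0.045411825' \
  'ERR2666855,0.0006751048,0.0021803565,0.9970531,9.1493675e-05,0.0' | top_source
```

On the two demo rows it reports `root:Host-associated` for both samples. Adding `+ 0` forces numeric comparison, so values in exponent notation such as `9.1493675e-05` compare correctly.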
Evaluate the search results against the known labels
```shell
expert evaluate -i searchResultFolder -l labels.h5 -o EvaluationReport -p NUMProcesses
# Also equivalent to
expert evaluate --input searchResultFolder --labels labels.h5 --output EvaluationReport --processors NUMProcesses
```
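- Input: the multi-layer labels and the search results (source contributions) for the samples.
- Output: a label-based evaluation report, with a CSV per layer (`layer-N.csv`) and a directory per layer holding one CSV per biome: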
```
EvaluationReport
├── layer-2
│   └── root:Host-associated.csv
├── layer-2.csv
├── layer-3
│   └── root:Host-associated:Human.csv
├── layer-3.csv
├── layer-4
│   ├── root:Host-associated:Human:Circulatory_system.csv
│   ├── root:Host-associated:Human:Digestive_system.csv
│   ├── root:Host-associated:Human:Lympathic_system.csv
│   ├── root:Host-associated:Human:Reproductive_system.csv
│   ├── root:Host-associated:Human:Respiratory_system.csv
│   └── root:Host-associated:Human:Skin.csv
├── layer-4.csv
├── layer-5
│   ├── root:Host-associated:Human:Circulatory_system:Blood.csv
│   ├── ...
│   └── root:Host-associated:Human:Respiratory_system:Pulmonary_system.csv
├── layer-5.csv
├── layer-6
│   ├── root:Host-associated:Human:Digestive_system:Large_intestine:Fecal.csv
│   ├── ...
│   └── root:Host-associated:Human:Respiratory_system:Pulmonary_system:Sputum.csv
└── layer-6.csv
```
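In the example table below, ROC-AUC and F-max repeat in every row of a per-biome file, so reading one row per file is enough for an overview of a whole layer. The sketch below collects those two values for every biome in one layer directory, locating the columns by header name. `summarize_layer` is a hypothetical helper, and it assumes comma-separated files with a header row containing `ROC-AUC` and `F-max`.

```shell
#!/bin/sh
# summarize_layer (hypothetical helper): for every per-biome CSV in one layer
# directory of the evaluation report, print the biome name with the ROC-AUC and
# F-max values from its first data row. ASSUMPTION: comma-separated files whose
# header row contains "ROC-AUC" and "F-max"; columns are found by name.
summarize_layer() {
  for f in "$1"/*.csv; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    biome=$(basename "$f" .csv)
    awk -F, -v biome="$biome" '
      NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
      NR == 2 {
        r = col["ROC-AUC"]; m = col["F-max"]
        if (r == "" || m == "") print biome "\tNA\tNA"   # columns not found
        else print biome "\t" $r "\t" $m
        exit
      }' "$f"
  done
}

# Demo on a throw-away layer directory holding one file with the first row of
# the Skin table below. On a real report: summarize_layer EvaluationReport/layer-4
demo=$(mktemp -d)
printf '%s\n' \
  't,TN,FP,FN,TP,Acc,Sn,Sp,TPR,FPR,Rc,Pr,F1,ROC-AUC,F-max' \
  '0.0,0,47688,0,4847,0.0923,1.0,0.0,1.0,1.0,1.0,0.0923,0.1689,0.9951,0.9374' \
  > "$demo/root:Host-associated:Human:Skin.csv"
summarize_layer "$demo"
rm -rf "$demo"
```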
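Take `layer-4/root:Host-associated:Human:Skin.csv` as an example. Each row corresponds to a threshold `t` on the predicted contribution. The columns are the confusion-matrix counts (TN, FP, FN, TP) and the derived accuracy (Acc), sensitivity (Sn), specificity (Sp), true and false positive rates (TPR, FPR), recall (Rc), precision (Pr) and F1 score. ROC-AUC and F-max are summary values over all thresholds, so they repeat in every row.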
t | TN | FP | FN | TP | Acc | Sn | Sp | TPR | FPR | Rc | Pr | F1 | ROC-AUC | F-max |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.0 | 0 | 47688 | 0 | 4847 | 0.0923 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0923 | 0.1689 | 0.9951 | 0.9374 |
0.01 | 44794 | 2893 | 30 | 4816 | 0.9444 | 0.9938 | 0.9393 | 0.9938 | 0.0607 | 0.9938 | 0.6247 | 0.7672 | 0.9951 | 0.9374 |
0.02 | 45545 | 2142 | 44 | 4802 | 0.9584 | 0.9909 | 0.9551 | 0.9909 | 0.0449 | 0.9909 | 0.6915 | 0.8146 | 0.9951 | 0.9374 |
0.03 | 45934 | 1753 | 59 | 4787 | 0.9655 | 0.9878 | 0.9632 | 0.9878 | 0.0368 | 0.9878 | 0.732 | 0.8409 | 0.9951 | 0.9374 |
0.04 | 46228 | 1459 | 73 | 4773 | 0.9708 | 0.9849 | 0.9694 | 0.9849 | 0.0306 | 0.9849 | 0.7659 | 0.8617 | 0.9951 | 0.9374 |
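The rate columns follow from the four counts in the same row through the standard confusion-matrix definitions: Acc = (TP + TN) / (TP + TN + FP + FN), Sn = TPR = Rc = TP / (TP + FN), Sp = TN / (TN + FP), FPR = 1 - Sp, Pr = TP / (TP + FP), and F1 = 2·Pr·Rc / (Pr + Rc). The awk sketch below recomputes them so that a report can be sanity-checked. It illustrates these definitions and is not EXPERT's own evaluation code; `metrics_from_counts` is a hypothetical name.

```shell
#!/bin/sh
# metrics_from_counts (hypothetical helper): recompute the rate columns from
# TN, FP, FN, TP with the standard confusion-matrix definitions.
# Input: whitespace-separated "t TN FP FN TP" lines.
metrics_from_counts() {
  LC_ALL=C awk '
    function ratio(a, b) { return b == 0 ? 0 : a / b }   # avoid division by zero
    {
      t = $1; tn = $2; fp = $3; fn = $4; tp = $5
      acc = ratio(tp + tn, tp + tn + fp + fn)
      sn  = ratio(tp, tp + fn)                  # sensitivity = TPR = recall
      sp  = ratio(tn, tn + fp)                  # specificity; FPR = 1 - Sp
      pr  = ratio(tp, tp + fp)                  # precision
      f1  = ratio(2 * tp, 2 * tp + fp + fn)     # same as 2*Pr*Rc/(Pr+Rc)
      printf "t=%s Acc=%.4f Sn=%.4f Sp=%.4f FPR=%.4f Pr=%.4f F1=%.4f\n", t, acc, sn, sp, 1 - sp, pr, f1
    }' "$@"
}

# Demo: the t = 0.01 row of the Skin table above.
echo '0.01 44794 2893 30 4816' | metrics_from_counts
```

For the `t = 0.01` row it prints `Acc=0.9444 Sn=0.9938 Sp=0.9393 FPR=0.0607 Pr=0.6247 F1=0.7672`, matching the table. To feed it a report file, assuming its CSV columns start with `t,TN,FP,FN,TP`, pipe `awk -F, 'NR > 1 { print $1, $2, $3, $4, $5 }' EvaluationReport/layer-4/root:Host-associated:Human:Skin.csv` into `metrics_from_counts`.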
Run the program with the `-h` option to see a detailed description of the work modes and options.