Sources (not included in the repo):
- PubMed abstracts
- GenBank
- DSMZ
- CIRM
Named entity recognition (taxon names and habitat mentions), normalization (habitats), and relation extraction (lives in) are performed with openminted/UC-AS-C.
make -n output/food-microbiome-habitats
prepare-food-microbiome.py: main script for building the table. It requires the following resources:
- resources/taxa+id_Bacteria.txt: synonym map of bacteria taxa.
- predictions/PubMed_lives-in.txt, predictions/genbank_mappings.txt, predictions/dsmz-taxon-habitat-mappings.txt, predictions/CIRM_08022018.txt: taxon-habitat predictions.
- etc/habitat-focus.txt: identifier and label for habitat focus concepts.
Goal: filter the bacterial samples and split them into several files.
make -n resources/biosample-split
resources/biosample_set.xml: complete BioSample as downloaded from NCBI.
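The filter-and-split step could be sketched roughly as follows. This is an illustration only, not the rule the Makefile actually runs. It assumes that each BioSample element has a Description/Organism child with a taxonomy_id attribute, and that a set of bacterial taxon identifiers is available from elsewhere. The names split_bacteria_samples, bacteria_ids and chunk_size, and the output file naming, are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_bacteria_samples(xml_path, out_dir, bacteria_ids, chunk_size=1000):
    """Stream a BioSample dump and write bacterial samples to numbered files.

    bacteria_ids is an assumed set of NCBI taxon identifiers (as strings).
    Returns the list of file paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, chunk = [], []

    def flush():
        # Write the pending samples to the next numbered file.
        if not chunk:
            return
        path = str(out_dir / f"biosample-{len(written):04d}.xml")
        root = ET.Element("BioSampleSet")
        root.extend(chunk)
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
        chunk.clear()

    context = ET.iterparse(xml_path, events=("start", "end"))
    _, source_root = next(context)  # first event is the start of the root
    for event, elem in context:
        if event != "end" or elem.tag != "BioSample":
            continue
        organism = elem.find("Description/Organism")
        if organism is not None and organism.get("taxonomy_id") in bacteria_ids:
            chunk.append(elem)
            if len(chunk) >= chunk_size:
                flush()
        source_root.clear()  # detach processed samples to bound memory use
    flush()
    return written
```

Streaming with iterparse keeps memory use bounded, which matters because the complete BioSample file is large.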
Goal: inventory of all Attributes used in BioSample entries.
make -n output/biosample-attributes-count.txt
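An attribute inventory of this kind can be approximated with a streaming counter. The sketch below is not the repository's implementation. It assumes that attributes appear as Attribute elements carrying an attribute_name and, optionally, a harmonized_name, and it prefers the harmonized name when present. The function name count_attributes is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_attributes(xml_source):
    """Count how often each attribute name occurs in a BioSample XML source."""
    counts = Counter()
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "Attribute":
            # Assumption: prefer the harmonized name, fall back to the raw one.
            name = elem.get("harmonized_name") or elem.get("attribute_name")
            if name:
                counts[name] += 1
        elif elem.tag == "BioSample":
            elem.clear()  # free the sample's children once it is fully read
    return counts
```

Calling counts.most_common() on the result lists the attribute names from most to least frequent, which is one plausible layout for a count file.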
Goal: evaluate the amount of habitat information found in BioSample.
make -n output/biosample-table.txt
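One way to estimate how much habitat information BioSample holds is to count the samples that carry at least one informative habitat-like attribute. The sketch below makes several assumptions that do not come from this repository: which attribute names count as habitat information (HABITAT_ATTRIBUTES), which values are treated as empty placeholders (PLACEHOLDERS), and the function name habitat_coverage.

```python
import xml.etree.ElementTree as ET

# Assumed attribute names that describe a habitat; adjust to the real inventory.
HABITAT_ATTRIBUTES = frozenset({"isolation_source", "host"})
# Assumed placeholder values that carry no habitat information.
PLACEHOLDERS = frozenset(
    {"", "missing", "not applicable", "not collected", "not provided", "unknown"}
)


def habitat_coverage(xml_source, habitat_attributes=HABITAT_ATTRIBUTES):
    """Return (total samples, samples with at least one informative habitat value)."""
    total = with_habitat = 0
    for _, sample in ET.iterparse(xml_source, events=("end",)):
        if sample.tag != "BioSample":
            continue
        total += 1
        for attr in sample.iter("Attribute"):
            name = attr.get("harmonized_name") or attr.get("attribute_name")
            value = (attr.text or "").strip().lower()
            if name in habitat_attributes and value not in PLACEHOLDERS:
                with_habitat += 1
                break  # one informative attribute is enough for this sample
        sample.clear()  # free the sample's children once counted
    return total, with_habitat
```

Dividing the second number by the first gives the fraction of samples with usable habitat information under these assumptions.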