The OncoTree Mapping tool was developed to facilitate the mapping of OncoTree codes between different OncoTree release versions. Below you can find basic instructions in the "Running the tool" section, and a detailed walkthrough in the "Tutorial" section.
As of January 2020, the sun has set on Python 2.X. If you are still using Python 2.X, we encourage you to use Python 3.6 and above, but here is the last version of the code to work with Python 2.X. In the future, only Python 3.X versions of this tool will be supported.
Click here to download the script: oncotree_to_oncotree.py
The OncoTree Mapping tool can be run with the following command:
python3 <path/to/scripts/oncotree_to_oncotree.py> --source-file <path/to/source/file> --target-file <path/to/target/file> --source-version <source_oncotree_version> --target-version <target_oncotree_version>
Options
-i | --source-file: This is the source clinical file path. It must contain ONCOTREE_CODE in the file header and it must contain OncoTree codes corresponding to the <source_oncotree_version>. Read more about the cBioPortal clinical file format here.
-o | --target-file: This is the path to the target clinical file that will be generated. It will contain OncoTree codes mapped from <source_oncotree_version> to <target_oncotree_version>.
-s | --source-version: This is the source OncoTree version. The OncoTree codes in the source file must correspond to this version.
-t | --target-version: This is the target OncoTree version that the script will attempt to map the source file OncoTree codes to.
The list of available OncoTree versions is viewable here or in the dropdown menu of the OncoTree home page.
Note: the source file should not contain embedded line breaks within any single cell in the table, such as those created by using the keyboard combinations Alt-Enter or Command-Option-Enter while editing a cell in Microsoft Excel.
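The two input requirements above (an ONCOTREE_CODE column in the header, and no embedded line breaks inside a cell) can be checked before running the tool. The following is a minimal sketch, not part of oncotree_to_oncotree.py: the function name `check_source_table` is hypothetical, and it assumes a tab-delimited file in which any leading lines starting with `#` are metadata and a cell with a line break was exported inside double quotes.

```python
import csv
import io


def check_source_table(handle):
    """Return a list of problems found in a tab-delimited clinical table."""
    problems = []
    header = None
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if header is None:
            # Assumption: leading lines that start with "#" are metadata, not the header.
            if row and row[0].startswith("#"):
                continue
            header = row
            if "ONCOTREE_CODE" not in header:
                problems.append("ONCOTREE_CODE is missing from the file header")
            continue
        # A quoted cell that spans physical lines is parsed as one cell containing "\n".
        if any("\n" in cell or "\r" in cell for cell in row):
            problems.append(
                f"embedded line break in the record ending at line {reader.line_num}"
            )
    if header is None:
        problems.append("no header line found")
    return problems


# Usage with an in-memory file; the data record has a cell with a line break.
example = io.StringIO('SAMPLE_ID\tONCOTREE_CODE\tNOTES\nS1\tALL\t"first\nsecond"\n')
for problem in check_source_table(example):
    print(problem)
```

When checking a file on disk, open it with `open(path, newline="")` so that the `csv` module sees the original line endings.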
For a detailed walkthrough of running the tool, see the "Tutorial" section below.
The OncoTree Mapping tool will automatically replace the value in the ONCOTREE_CODE column with the mapped code if available. The tool will also add a new column called ONCOTREE_CODE_OPTIONS containing suggestions for OncoTree codes if one or more nodes could not be directly mapped. The ONCOTREE_CODE_OPTIONS column formats its suggestions differently depending on the mapping results. Possible suggestion formats and corresponding examples are shown below.
Unambiguous direct mappings occur when an OncoTree code maps directly to a single code in the target version. In this case, the ONCOTREE_CODE_OPTIONS column will be left blank, and the mapped code will be automatically placed in the ONCOTREE_CODE column. Unambiguous direct mappings are checked for addition of more granular nodes; to see how this may affect the ONCOTREE_CODE_OPTIONS column formatting, please refer to the subsection below "4. More Granular Nodes Introduced".
Ambiguous direct mappings occur when an OncoTree code maps to multiple codes in the target version. The ONCOTREE_CODE_OPTIONS column formats the output as follows:
'Source Code' -> {'Code 1', 'Code 2', 'Code 3', } e.g. ALL -> {TLL, BLL}
Example: Schema describing a case where the revoked OncoTree node ALL is mapped to multiple nodes.
In oncotree_2018_05_01, ALL had two children: TALL and BALL. In release oncotree_2018_06_01, the ALL node was discontinued, the TALL node was renamed TLL, and the BALL node was renamed BLL.
The ONCOTREE_CODE_OPTIONS column would be shown as follows:
ALL -> {TLL, BLL}
Ambiguous direct mappings are also checked for addition of more granular nodes; to see how this may affect the ONCOTREE_CODE_OPTIONS column formatting, please refer to the subsection below "4. More Granular Nodes Introduced".
No direct mappings occur when the source OncoTree code is unrelated to any OncoTree code in the target version. One such possibility is mapping a newly introduced OncoTree code backwards in time. In this case, the tool finds the closest set of neighbors (e.g., parents and children) that are mappable in the target version. The ONCOTREE_CODE_OPTIONS column returns the set with the keyword Neighbors as follows:
'Source Code' -> Neighbors {'Code 1', 'Code 2', 'Code 3', } e.g. UPA -> Neighbors {BLADDER}
Example: Schema describing a case where new OncoTree node UPA cannot be directly mapped backwards to a node.
In oncotree_2019_03_01, UPA was added to the OncoTree as a child node of BLADDER. Because UPA did not exist in the previous version oncotree_2018_05_01 and did not replace any existing node, the tool uses the surrounding nodes when mapping backwards. In this case, the parent node BLADDER is returned as the closest match.
The ONCOTREE_CODE_OPTIONS column would be shown as follows:
UPA -> Neighbors {BLADDER}
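The UPA example can be pictured with a small sketch. This is an illustration only and is not how oncotree_to_oncotree.py is implemented: the dictionaries, the function names `closest_ancestor` and `format_neighbors`, and the toy tree are all hypothetical, and the sketch only walks up through parents, whereas the tool also considers children.

```python
# Hypothetical toy data: the parent of each node in the source version.
SOURCE_PARENT = {"UPA": "BLADDER", "BLADDER": "TISSUE", "TISSUE": None}
# Hypothetical set of codes that exist in the target version.
TARGET_CODES = {"BLADDER", "TISSUE"}


def closest_ancestor(code, parent_of, target_codes):
    """Walk up the source tree until a node that exists in the target version is found."""
    current = parent_of.get(code)
    while current is not None:
        if current in target_codes:
            return current
        current = parent_of.get(current)
    return None


def format_neighbors(code, neighbors):
    """Build a string in the 'Source Code' -> Neighbors {...} layout."""
    return code + " -> Neighbors {" + ", ".join(sorted(neighbors)) + "}"


ancestor = closest_ancestor("UPA", SOURCE_PARENT, TARGET_CODES)
print(format_neighbors("UPA", {ancestor}))
```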
In certain cases, the target version can also introduce nodes with more specific descriptions. When this occurs, the tool will add the string "more granular choices introduced" to the existing text in the ONCOTREE_CODE_OPTIONS column as follows:
'Source Code' -> {'Code 1', }, more granular choices introduced e.g. TALL -> {TLL}, more granular choices introduced
Example: Schema describing a case where OncoTree node TALL is mapped to a node with more granular children.
In oncotree_2019_03_01, TALL was a leaf node with no children. In release oncotree_2019_06_01, TLL was introduced as a replacement for TALL with additional children ETPLL and NKCLL.
The ONCOTREE_CODE_OPTIONS column would be shown as follows:
TALL -> {TLL}, more granular choices introduced
An invalid source OncoTree code means the provided code cannot be found in the source version. In such a case, mapping cannot be attempted and the ONCOTREE_CODE_OPTIONS column displays the following:
'Source Code' -> ???, OncoTree code not in source OncoTree version
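The suggestion formats described above are regular enough to be read programmatically, for example when triaging a large output file. The sketch below is one possible reader, not part of the tool: the function name `classify_options`, the `kind` labels, and the regular expression are assumptions based only on the example strings shown in this section.

```python
import re

# Matches "CODE -> {A, B}" and "CODE -> Neighbors {A}", with an optional granular suffix.
OPTIONS_PATTERN = re.compile(
    r"^(?P<source>\S+) -> (?P<neighbors>Neighbors )?\{(?P<codes>[^}]*)\}"
    r"(?P<granular>, more granular choices introduced)?$"
)


def classify_options(value):
    """Classify one ONCOTREE_CODE_OPTIONS cell into a small dictionary."""
    value = value.strip()
    result = {"kind": "unrecognized", "source": None, "codes": [], "granular": False}
    if not value:
        result["kind"] = "blank"  # nothing to resolve manually
        return result
    if value.endswith("OncoTree code not in source OncoTree version"):
        result["kind"] = "invalid"
        result["source"] = value.split(" -> ")[0]
        return result
    match = OPTIONS_PATTERN.match(value)
    if match is None:
        return result  # some other message, left for a human to read
    result["kind"] = "neighbors" if match.group("neighbors") else "choices"
    result["source"] = match.group("source")
    result["codes"] = [c.strip() for c in match.group("codes").split(",") if c.strip()]
    result["granular"] = match.group("granular") is not None
    return result


for cell in ("ALL -> {TLL, BLL}", "UPA -> Neighbors {BLADDER}",
             "TALL -> {TLL}, more granular choices introduced"):
    print(classify_options(cell))
```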
The following tutorial will guide the user through using the oncotree_to_oncotree.py tool. The tutorial will go through the expected output to highlight specific mapping cases. Additionally, the tutorial will cross-reference the output with the generated mapping summary to demonstrate how it can be used to aid in manual selection of unresolved nodes.
Download the sample input file (data_clinical_sample.txt) from here.
Download oncotree_to_oncotree.py from here.
Run the following command from the command line:
python oncotree_to_oncotree.py -i data_clinical_sample.txt -o data_clinical_sample_remapped.txt -s oncotree_2018_03_01 -t oncotree_2019_03_01
The tool will output two files: data_clinical_sample_remapped.txt and data_clinical_sample_remapped_summary.html.
For your reference, you can see the expected output files here and here.
Examine data_clinical_sample_remapped.txt; the first five columns of the file should look as follows:
PATIENT_ID | SAMPLE_ID | AGE_AT_SEQ_REPORT | ONCOTREE_CODE | ONCOTREE_CODE_OPTIONS |
---|---|---|---|---|
P1 | S1 | 41 | | ALL -> {BLL,TLL}, more granular choices introduced |
P2 | S2 | 60 | | BALL -> {BLL}, more granular choices introduced |
P3 | S3 | <18 | | TALL -> {TLL}, more granular choices introduced |
P4 | S4 | 71 | PTCL | |
P5 | S5 | 64 | PTCL | |
P6 | S6 | 36 | | CHL -> {CHL}, more granular choices introduced |
P7 | S7 | 63 | | SpCC -> ???, OncoTree code not in source OncoTree version |
P8 | S8 | 63 | | MCL -> {MCL}, more granular choices introduced |
P9 | S9 | 73 | | HGNEE -> ???, OncoTree code not in source OncoTree version |
P10 | S10 | 52 | | ONCOTREE_CODE column blank : use a valid OncoTree code or "NA" |
P11 | S11 | 77 | NA | |
P12 | S12 | 87 | | TNKL -> {MTNN}, more granular choices introduced |
P13 | S13 | 79 | | HIST -> {HDCN}, more granular choices introduced |
P14 | S14 | 53 | CLLSLL | |
P15 | S15 | 69 | CLLSLL | |
P16 | S16 | 65 | | LEUK -> {MNM}, more granular choices introduced |
P17 | S17 | 66 | MYCF | |
P18 | S18 | 66 | RBL | |
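In a larger file, the rows that still need attention are the ones with a non-empty ONCOTREE_CODE_OPTIONS cell, as in the table above. A minimal sketch for listing them is shown below; the function name `rows_needing_review` is hypothetical, and the code assumes a tab-delimited file with a SAMPLE_ID column in which any lines starting with `#` are metadata to be skipped.

```python
import csv
import io


def rows_needing_review(handle):
    """Return (SAMPLE_ID, options) pairs for rows with a non-empty options cell."""
    # Assumption: lines starting with "#" are metadata and precede the header.
    data_lines = (line for line in handle if not line.startswith("#"))
    pending = []
    for row in csv.DictReader(data_lines, delimiter="\t"):
        options = (row.get("ONCOTREE_CODE_OPTIONS") or "").strip()
        if options:
            pending.append((row["SAMPLE_ID"], options))
    return pending


# Usage with two rows copied from the expected output.
sample = io.StringIO(
    "PATIENT_ID\tSAMPLE_ID\tAGE_AT_SEQ_REPORT\tONCOTREE_CODE\tONCOTREE_CODE_OPTIONS\n"
    "P1\tS1\t41\t\tALL -> {BLL,TLL}, more granular choices introduced\n"
    "P4\tS4\t71\tPTCL\t\n"
)
for sample_id, options in rows_needing_review(sample):
    print(sample_id, "needs review:", options)
```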
Using the values in the ONCOTREE_CODE_OPTIONS column as a guide, manually select and place an OncoTree code in the ONCOTREE_CODE column. For additional information, refer to the summary file data_clinical_sample_remapped_summary.html. Repeat for all rows in the output file. Several examples are shown below.
SAMPLE_ID | ONCOTREE_CODE | ONCOTREE_CODE_OPTIONS |
---|---|---|
S1 | | ALL -> {BLL,TLL}, more granular choices introduced |
Source OncoTree code ALL maps directly to codes BLL and TLL. Users should place either BLL or TLL in the ONCOTREE_CODE column. The ONCOTREE_CODE_OPTIONS column also notes that more granular choices were introduced; as such, users can use the summary file for additional guidance.
Searching the summary file by source code brings up the entry for ALL. The entry provides a link to the closest shared parent node LNM; users can choose more granular nodes by referencing the tree provided there.
SAMPLE_ID | ONCOTREE_CODE | ONCOTREE_CODE_OPTIONS |
---|---|---|
S2 | | BALL -> {BLL}, more granular choices introduced |
Source OncoTree code BALL maps directly to BLL. Users should place BLL in the ONCOTREE_CODE column. However, similar to sample 1, the ONCOTREE_CODE_OPTIONS column indicates there are more granular choices available. Users can follow the same steps as above and use the summary file to select a more granular node.
SAMPLE_ID | ONCOTREE_CODE | ONCOTREE_CODE_OPTIONS |
---|---|---|
S4 | PTCL | |
No additional resolution is needed; the previous OncoTree code was already automatically mapped to PTCL and placed in the ONCOTREE_CODE column. ONCOTREE_CODE_OPTIONS is empty because no manual selections were necessary.
SAMPLE_ID | ONCOTREE_CODE | ONCOTREE_CODE_OPTIONS |
---|---|---|
S9 | | HGNEE -> ???, OncoTree code not in source OncoTree version |
Source OncoTree code HGNEE was not found in the source OncoTree version and therefore could not be mapped. Users can either assign a valid source OncoTree code (and rerun the script) or remove the sample.
After filling in the ONCOTREE_CODE column with an OncoTree code for each sample, use an editor (e.g., Microsoft Excel or vim) to trim off the ONCOTREE_CODE_OPTIONS column. The resulting file will be a new data_clinical_sample.txt file with all codes mapped to the target version. The first four columns of the final result are shown below:
PATIENT_ID | SAMPLE_ID | AGE_AT_SEQ_REPORT | ONCOTREE_CODE |
---|---|---|---|
P1 | S1 | 41 | BLL |
P2 | S2 | 60 | BLL |
P3 | S3 | <18 | TLL |
P4 | S4 | 71 | PTCL |
P5 | S5 | 64 | PTCL |
P6 | S6 | 36 | CHL |
P7 | S7 | 63 | SPCC |
P8 | S8 | 63 | MCL |
P10 | S10 | 52 | NA |
P11 | S11 | 77 | NA |
P12 | S12 | 87 | MTNN |
P13 | S13 | 79 | HDCN |
P14 | S14 | 53 | CLLSLL |
P15 | S15 | 69 | CLLSLL |
P16 | S16 | 65 | MNM |
P17 | S17 | 66 | MYCF |
P18 | S18 | 66 | RBL |
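Trimming the ONCOTREE_CODE_OPTIONS column can also be scripted instead of done in an editor. The following is a minimal sketch under stated assumptions: the function name `drop_options_column` is hypothetical, the file is tab-delimited, and any metadata lines starting with `#` have the same column layout as the header, so the same column position is removed from them.

```python
import csv
import io


def drop_options_column(source, destination, column="ONCOTREE_CODE_OPTIONS"):
    """Copy a tab-delimited table from source to destination without one column."""
    rows = list(csv.reader(source, delimiter="\t"))
    # The header is taken to be the first line that does not start with "#".
    header = next(row for row in rows if row and not row[0].startswith("#"))
    index = header.index(column)  # raises ValueError if the column is absent
    writer = csv.writer(destination, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row[:index] + row[index + 1:])


# Usage with an in-memory table; real files would be opened with newline="".
remapped = io.StringIO(
    "SAMPLE_ID\tONCOTREE_CODE\tONCOTREE_CODE_OPTIONS\n"
    "S1\tBLL\tALL -> {BLL,TLL}, more granular choices introduced\n"
)
trimmed = io.StringIO()
drop_options_column(remapped, trimmed)
print(trimmed.getvalue())
```

The sketch does not check that every row has an ONCOTREE_CODE value; that manual review step still has to be completed first.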
The Ontology Mapping tool was developed to facilitate mapping between different cancer classification systems. We currently support mapping between the OncoTree, ICD-O, NCIt, UMLS and HemeOnc systems. The tool and the mapping file can be found here; the mapping file does not need to be downloaded to run the tool.
The Ontology Mapping tool runs on Python 3 and requires the pandas and requests libraries. These libraries can be installed using:
pip3 install pandas
pip3 install requests
The Ontology Mapping tool can be run with the following command:
python <path/to/scripts/ontology_to_ontology_mapping_tool.py> --source-file <path/to/source/file> --target-file <path/to/target/file> --source-code <source_ontology_code> --target-code <target_ontology_code>
Options
i | --source-file: This is the source file path. The source file must contain one of ONCOTREE_CODE, NCIT_CODE, UMLS_CODE, ICDO_TOPOGRAPHY_CODE, ICDO_MORPHOLOGY_CODE or HEMEONC_CODE in the file header and it must contain codes corresponding to the Ontology System.
o | --target-file: This is the path to the target file that will be generated. It will contain ontologies mapped from the source code in <source-file> to <target-code>.
s | --source-code: This is the source ontology code in <source-file>. It must be one of ONCOTREE_CODE, NCIT_CODE, UMLS_CODE, ICDO_TOPOGRAPHY_CODE, ICDO_MORPHOLOGY_CODE or HEMEONC_CODE.
t | --target-code: This is the target ontology code that the script will attempt to map the source file ontology code to. It must be one of ONCOTREE_CODE, NCIT_CODE, UMLS_CODE, ICDO_TOPOGRAPHY_CODE, ICDO_MORPHOLOGY_CODE or HEMEONC_CODE.
Note
- The source file should be tab delimited and should contain one of the ontology codes ONCOTREE_CODE, NCIT_CODE, UMLS_CODE, ICDO_TOPOGRAPHY_CODE, ICDO_MORPHOLOGY_CODE or HEMEONC_CODE in the file header.
- We currently allow mapping from only one ontology to one other ontology at a time. In the future, we plan to extend the tool to support mapping to multiple ontology systems.
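The constraints in the options and notes above can be checked before invoking the tool. The sketch below is not part of ontology_to_ontology_mapping_tool.py; the function name `check_ontology_inputs` is hypothetical, and it only tests what this section states: both codes come from the supported list, and the source code appears in a tab-delimited header.

```python
SUPPORTED_CODES = (
    "ONCOTREE_CODE",
    "NCIT_CODE",
    "UMLS_CODE",
    "ICDO_TOPOGRAPHY_CODE",
    "ICDO_MORPHOLOGY_CODE",
    "HEMEONC_CODE",
)


def check_ontology_inputs(header_line, source_code, target_code):
    """Return a list of problems with the header line and the two code arguments."""
    problems = []
    for label, value in (("source", source_code), ("target", target_code)):
        if value not in SUPPORTED_CODES:
            problems.append(f"{label} code {value!r} is not a supported ontology code")
    # The source file is expected to be tab delimited.
    columns = header_line.rstrip("\r\n").split("\t")
    if source_code not in columns:
        problems.append(f"{source_code!r} was not found in the file header")
    return problems


print(check_ontology_inputs("SAMPLE_ID\tONCOTREE_CODE\n", "ONCOTREE_CODE", "NCIT_CODE"))
```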