diff --git a/docs/genome_annotation.rst b/docs/genome_annotation.rst
new file mode 100644
index 0000000..2912748
--- /dev/null
+++ b/docs/genome_annotation.rst
@@ -0,0 +1,75 @@
+.. _genome_annotation:
+
+Annotated Genome Data
+----------------------
+
+Overview
+.........
+
+Generate JSON-LD files for annotated genes from a given GFF3 file. Currently GFF3 files from ENSEMBL and NCBI are supported.
+
+Each JSON-LD file will contain:
+
+- GeneAnnotation objects
+- 1 GenomeAnnotation object
+- 1 GenomeAssembly object
+- 1 OrganismTaxon object
+- 1 Checksum object
+
+Command Line
+.............
+
+``bkbit gff2jsonld``
+,,,,,,,,,,,,,,,,,,,,,
+
+    .. code-block:: bash
+
+        $ bkbit gff2jsonld [OPTIONS] GFF3_URL
+
+Options
+,,,,,,,,
+
+    ``-a, --assembly_accession``
+        ID assigned to the genomic assembly used in the GFF3 file.
+        **Note: Must be provided when using ENSEMBL GFF3 files**
+
+    ``-s, --assembly_strain``
+        Specific strain of the organism associated with the GFF3 file.
+
+    ``-l, --log_level``
+        Logging level.
+
+        Default:
+            WARNING
+        Options:
+            DEBUG | INFO | WARNING | ERROR | CRITICAL
+
+    ``-f, --log_to_file``
+        Log to a file instead of the console.
+
+        Default:
+            FALSE
+
+Arguments
+,,,,,,,,,,,
+
+    ``GFF3_URL``
+        URL to the GFF3 file.
+
+Examples
+.........
+
+Example 1: NCBI GFF3 file
+,,,,,,,,,,,,,,,,,,,,,,,,,,
+
+.. code-block:: bash
+
+    $ bkbit gff2jsonld 'https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9823/106/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.gff.gz' > output.jsonld
+
+
+Example 2: ENSEMBL GFF3 file
+,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
+
+.. code-block:: bash
+
+    $ bkbit gff2jsonld -a 'GCF_003339765.1' 'https://ftp.ensembl.org/pub/release-104/gff3/macaca_mulatta/Macaca_mulatta.Mmul_10.104.gff3.gz' > output.jsonld
diff --git a/docs/index.rst b/docs/index.rst
index 921af7c..1028ac3 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -14,10 +14,12 @@ This package contains tools to use the BICAN Knowledgebase Data Models.
    install
 
 .. toctree::
-   :maxdepth: 2
-   :caption: USAGE
+   :maxdepth: 4
+   :caption: DATA TRANSLATORS
 
-   data_translators
+   specimen_file_manifest
+   specimen_metadata
+   genome_annotation
 
 .. toctree::
    :maxdepth: 1
@@ -25,11 +27,9 @@ This package contains tools to use the BICAN Knowledgebase Data Models.
 
    modules
 
-
-
 Indices and tables
 ==================
 
 * :ref:`genindex`
 * :ref:`modindex`
-* :ref:`search`
\ No newline at end of file
+.. * :ref:`search`
diff --git a/docs/specimen_file_manifest.rst b/docs/specimen_file_manifest.rst
new file mode 100644
index 0000000..1e6507d
--- /dev/null
+++ b/docs/specimen_file_manifest.rst
@@ -0,0 +1,4 @@
+.. _specimen_file_manifest:
+
+Specimen File Manifest
+----------------------
diff --git a/docs/data_translators.rst b/docs/specimen_metadata.rst
similarity index 61%
rename from docs/data_translators.rst
rename to docs/specimen_metadata.rst
index e7f4a28..c262a67 100644
--- a/docs/data_translators.rst
+++ b/docs/specimen_metadata.rst
@@ -1,107 +1,55 @@
-.. _datatranslators:
+.. _specimen_metadata:
 
-Data Translators
-======
-
-Annotated Genome Data
+Specimen Metadata
 ----------------------
 
-Generate JSON-LD files for annotated genes from a given GFF3 file. Currently GFF3 files from ENSEMBL and NCBI are supported.
-Each JSON-LD file will contain:
+Overview
+.........
 
-- GeneAnnotation objects
-- 1 GenomeAnnotation object
-- 1 GenomeAssembly object
-- 1 OrganismTaxon object
-- 1 Checksum object
+Generate JSON-LD files for specimens, subjects, and their respective ancestors or descendants. Data is retrieved from the `BICAN Specimen Portal `_.
 
 Command Line
 .............
 
-``bkbit gff2jsonld``
+``bkbit specimen2jsonld``
 ,,,,,,,,,,,,,,,,,,,,,
 
-    .. code-block:: bash
-
-        $ bkbit gff2jsonld [OPTIONS] GFF3_URL
-
-Options
-,,,,,,,,
-
-    ``-a, --assembly_accession``
-        ID assigned to the genomic assembly used in the GFF3 file.
-        **Note: Must be provided when using ENSEMBL GFF3 files**
-
-    ``-s, --assembly_strain``
-        Specific strain of the organism associated with the GFF3 file.
-
-    ``-l, --log_level``
-        Logging level.
-
-        Default:
-            WARNING
-        Options:
-            DEBUG | INFO | WARNING | ERROR | CRITICIAL
-
-    ``-f, --log_to_file``
-        Log to a file instead of the console.
-
-        Default:
-            FALSE
-
-Arguments
-,,,,,,,,
-
-    ``GFF3_URL``
-        URL to the GFF3 file.
-
-Examples
-.........
-
-Example 1: NCBI GFF3 file
-,,,,,,,,,,,,,,,,,,,,,,,,,,
-
 .. code-block:: bash
 
-    $ bkbit gff2jsonld 'https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9823/106/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.gff.gz' > output.jsonld
-
+    $ bkbit specimen2jsonld [OPTIONS] NHASH_ID_OR_FILE
 
-Example 2: ENSEMBL GFF3 file
-,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
-
-.. code-block:: bash
-
-    $ bkbit gff2jsonld -a 'GCF_003339765.1' 'https://ftp.ensembl.org/pub/release-104/gff3/macaca_mulatta/Macaca_mulatta.Mmul_10.104.gff3.gz' > output.jsonld
+**Options**
 
     ``-d, --decendants``
         A boolean flag that, when provided, generates BICAN objects for the given NHASH_ID and all of its descendants.
         If this flag is not set (DEFAULT), then the ancestors will be processed.
 
-Specimen Data
------------------------
-Generate JSON-LD files for specimens, subjects, and their repective ancestors or descendants. Data is retrieved from the `BICAN Specimen Portal `_.
+**Arguments**
 
-Command Line
-.............
+    ``NHASH_ID_OR_FILE``
+        The NHASH_ID of the specimen or a file containing a list of NHASH_IDs.
+        If a file is provided, the file should contain one NHASH_ID per line.
 
-``bkbit specimen2jsonld``
+``filemanifest2jsonld``
 ,,,,,,,,,,,,,,,,,,,,,
 
-    .. code-block:: bash
+.. code-block:: bash
 
-        $ bkbit specimen2jsonld [OPTIONS] NHASH_ID_OR_FILE
+    $ bkbit specimen2jsonld [OPTIONS] NHASH_ID_OR_FILE
 
-Options
-,,,,,,,,
+**Options**
 
     ``-d, --decendants``
         A boolean flag that, when provided, generates BICAN objects for the given NHASH_ID and all of its descendants.
         If this flag is not set (DEFAULT), then the ancestors will be processed.
 
-Arguments
-,,,,,,,,
+**Arguments**
 
     ``NHASH_ID_OR_FILE``
         The NHASH_ID of the specimen or a file containing a list of NHASH_IDs.
         If a file is provided, the file should contain one NHASH_ID per line.
+
 
 Environment Variables
 .............
@@ -174,8 +122,3 @@ Example 4: Parse a file containing record(s) and their respective descendants
     DO-WFFF3774.jsonld
     DO-RMRL6873.jsonld
 
-Structured Anatomical Data
----------------------------- 
-
-
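The ``NHASH_ID_OR_FILE`` argument of ``bkbit specimen2jsonld`` accepts either a single NHASH_ID or a file that lists one NHASH_ID per line. The following is a minimal sketch of preparing such a file, under stated assumptions: the file name ``nhash_ids.txt`` is hypothetical, the two IDs are borrowed from the ``DO-WFFF3774.jsonld`` and ``DO-RMRL6873.jsonld`` names in the Example 4 listing and are assumed (not verified) to be valid NHASH_IDs, and the actual ``bkbit`` calls are left as comments because they need the variables from the Environment Variables section to be set.

```bash
#!/usr/bin/env bash
# Sketch: build an input file for `bkbit specimen2jsonld NHASH_ID_OR_FILE`.
set -euo pipefail

# Hypothetical file name; any path can be passed as NHASH_ID_OR_FILE.
ids_file="nhash_ids.txt"

# One NHASH_ID per line. These IDs come from the Example 4 output listing
# and are assumed, not verified, to be valid NHASH_IDs.
printf '%s\n' 'DO-WFFF3774' 'DO-RMRL6873' > "$ids_file"

# Keep the file strictly one ID per line: reject empty or whitespace-only lines.
if grep -q '^[[:space:]]*$' "$ids_file"; then
  echo "error: $ids_file contains blank lines" >&2
  exit 1
fi

# `tr` strips the padding some `wc` implementations add.
count=$(wc -l < "$ids_file" | tr -d ' ')
echo "Wrote $count NHASH_IDs to $ids_file"

# With the Specimen Portal environment variables set, the file is passed
# in place of a single NHASH_ID:
#   bkbit specimen2jsonld "$ids_file"      # ancestors (default)
#   bkbit specimen2jsonld -d "$ids_file"   # descendants
```

Validating the file before the call keeps a malformed list from surfacing later as a Specimen Portal lookup error for an empty ID.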