added more documentation as well as reformatted the files
puja-trivedi committed Oct 4, 2024
1 parent e18846e commit 02c6d57
Showing 4 changed files with 106 additions and 84 deletions.
75 changes: 75 additions & 0 deletions docs/genome_annotation.rst
@@ -0,0 +1,75 @@
.. _genome_annotation:

Annotated Genome Data
----------------------

Overview
.........

Generate JSON-LD files for annotated genes from a given GFF3 file. Currently GFF3 files from ENSEMBL and NCBI are supported.

Each JSON-LD file will contain:

- GeneAnnotation objects
- 1 GenomeAnnotation object
- 1 GenomeAssembly object
- 1 OrganismTaxon object
- 1 Checksum object
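
To check that a generated file contains the objects listed above, the output can be summarised by object type. The sketch below is illustrative only. It assumes the objects sit in a top-level ``@graph`` list and store their type under ``category`` or ``@type``; this page confirms neither assumption. ``count_object_types`` is a hypothetical helper and not part of bkbit.

.. code-block:: python

    import json
    from collections import Counter


    def count_object_types(document):
        """Count objects per type in a parsed JSON-LD document."""
        # Assumption: objects live in a top-level "@graph" list.
        objects = document.get("@graph", [])
        counts = Counter()
        for obj in objects:
            # Assumption: the type is stored under "category" or "@type".
            type_value = obj.get("category") or obj.get("@type") or "unknown"
            if isinstance(type_value, list):
                # A type may be given as a list; count each entry.
                counts.update(type_value)
            else:
                counts[type_value] += 1
        return counts


    if __name__ == "__main__":
        # Hypothetical miniature document, not real bkbit output.
        sample = json.loads(
            '{"@graph": [{"category": "GeneAnnotation"},'
            ' {"category": "GeneAnnotation"},'
            ' {"category": "GenomeAnnotation"}]}'
        )
        for type_name, count in sorted(count_object_types(sample).items()):
            print(type_name, count)

For a real file, load it with ``json.load`` and pass the result to the helper. If the type field has a different name, adjust the lookup accordingly.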

Command Line
.............

``bkbit gff2jsonld``
,,,,,,,,,,,,,,,,,,,,,

.. code-block:: bash

    $ bkbit gff2jsonld [OPTIONS] GFF3_URL

Options
,,,,,,,,

``-a, --assembly_accession``
    ID assigned to the genomic assembly used in the GFF3 file.
    **Note: Must be provided when using ENSEMBL GFF3 files.**

``-s, --assembly_strain``
    Specific strain of the organism associated with the GFF3 file.

``-l, --log_level``
    Logging level.

    Default:
        WARNING
    Options:
        DEBUG | INFO | WARNING | ERROR | CRITICAL

``-f, --log_to_file``
    Log to a file instead of the console.

    Default:
        FALSE

Arguments
,,,,,,,,,,,

``GFF3_URL``
    URL to the GFF3 file.
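
When the command is driven from a script, the options and argument above can be assembled into an argument list before the call is made. The following is a minimal sketch, not bkbit's own API: ``build_gff2jsonld_command`` is a hypothetical helper, the ENSEMBL check is a simple host-name heuristic, the log levels are assumed to follow the standard Python logging names, and ``--log_to_file`` is assumed to be a switch that takes no value.

.. code-block:: python

    import shlex

    # Assumption: the accepted levels match the standard Python logging names.
    LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


    def build_gff2jsonld_command(gff3_url, assembly_accession=None,
                                 assembly_strain=None, log_level=None,
                                 log_to_file=False):
        """Return the argument list for a ``bkbit gff2jsonld`` call."""
        # Heuristic (assumption): URLs on an ensembl.org host are ENSEMBL files,
        # which need an assembly accession according to the option notes.
        if "ensembl.org" in gff3_url and not assembly_accession:
            raise ValueError("ENSEMBL GFF3 files require --assembly_accession")
        if log_level is not None and log_level not in LOG_LEVELS:
            raise ValueError("unknown log level: " + log_level)

        command = ["bkbit", "gff2jsonld"]
        if assembly_accession:
            command += ["--assembly_accession", assembly_accession]
        if assembly_strain:
            command += ["--assembly_strain", assembly_strain]
        if log_level:
            command += ["--log_level", log_level]
        if log_to_file:
            # Assumption: this option is a flag without a value.
            command.append("--log_to_file")
        command.append(gff3_url)
        return command


    if __name__ == "__main__":
        command = build_gff2jsonld_command(
            "https://ftp.ensembl.org/pub/release-104/gff3/macaca_mulatta/"
            "Macaca_mulatta.Mmul_10.104.gff3.gz",
            assembly_accession="GCF_003339765.1",
        )
        # Print a shell-quoted version; the command is not executed here.
        print(shlex.join(command))

The returned list can be passed to ``subprocess.run`` with ``stdout`` redirected to a file, which mirrors the ``> output.jsonld`` redirection used on the command line.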

Examples
.........

Example 1: NCBI GFF3 file
,,,,,,,,,,,,,,,,,,,,,,,,,,

.. code-block:: bash

    $ bkbit gff2jsonld 'https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9823/106/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.gff.gz' > output.jsonld

Example 2: ENSEMBL GFF3 file
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

.. code-block:: bash

    $ bkbit gff2jsonld -a 'GCF_003339765.1' 'https://ftp.ensembl.org/pub/release-104/gff3/macaca_mulatta/Macaca_mulatta.Mmul_10.104.gff3.gz' > output.jsonld
12 changes: 6 additions & 6 deletions docs/index.rst
@@ -14,22 +14,22 @@ This package contains tools to use the BICAN Knowledgebase Data Models.
install

.. toctree::
:maxdepth: 2
:caption: USAGE
:maxdepth: 4
:caption: DATA TRANSLATORS

data_translators
specimen_file_manifest
specimen_metadata
genome_annotation

.. toctree::
:maxdepth: 1
:caption: REFERENCE

modules



Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. * :ref:`search`
4 changes: 4 additions & 0 deletions docs/specimen_file_manifest.rst
@@ -0,0 +1,4 @@
.. _specimen_file_manifest:

Specimen File Manifest
----------------------
99 changes: 21 additions & 78 deletions docs/data_translators.rst → docs/specimen_metadata.rst
@@ -1,107 +1,55 @@
.. _datatranslators:
.. _specimen_metadata:

Data Translators
======

Annotated Genome Data
Specimen Metadata
----------------------
Generate JSON-LD files for annotated genes from a given GFF3 file. Currently GFF3 files from ENSEMBL and NCBI are supported.

Each JSON-LD file will contain:
Overview
.........

- GeneAnnotation objects
- 1 GenomeAnnotation object
- 1 GenomeAssembly object
- 1 OrganismTaxon object
- 1 Checksum object
Generate JSON-LD files for specimens, subjects, and their respective ancestors or descendants. Data is retrieved from the `BICAN Specimen Portal <https://brain-specimenportal.org/>`_.

Command Line
.............

``bkbit gff2jsonld``
``bkbit specimen2jsonld``
,,,,,,,,,,,,,,,,,,,,,,,,,

.. code-block:: bash
$ bkbit gff2jsonld [OPTIONS] GFF3_URL
Options
,,,,,,,,

``-a, --assembly_accession``
ID assigned to the genomic assembly used in the GFF3 file.
**Note: Must be provided when using ENSEMBL GFF3 files**

``-s, --assembly_strain``
Specific strain of the organism associated with the GFF3 file.

``-l, --log_level``
Logging level.

Default:
WARNING
Options:
DEBUG | INFO | WARNING | ERROR | CRITICAL

``-f, --log_to_file``
Log to a file instead of the console.

Default:
FALSE

Arguments
,,,,,,,,

``GFF3_URL``
URL to the GFF3 file.

Examples
.........

Example 1: NCBI GFF3 file
,,,,,,,,,,,,,,,,,,,,,,,,,,

.. code-block:: bash
$ bkbit gff2jsonld 'https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9823/106/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.gff.gz' > output.jsonld
$ bkbit specimen2jsonld [OPTIONS] NHASH_ID_OR_FILE
Example 2: ENSEMBL GFF3 file
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

.. code-block:: bash
$ bkbit gff2jsonld -a 'GCF_003339765.1' 'https://ftp.ensembl.org/pub/release-104/gff3/macaca_mulatta/Macaca_mulatta.Mmul_10.104.gff3.gz' > output.jsonld
**Options**

``-d, --decendants``
A boolean flag that, when provided, generates BICAN objects for the given NHASH_ID and all of its descendants.
If this flag is not set (DEFAULT), then the ancestors will be processed.

Specimen Data
----------------------
Generate JSON-LD files for specimens, subjects, and their respective ancestors or descendants. Data is retrieved from the `BICAN Specimen Portal <https://brain-specimenportal.org/>`_.
**Arguments**

Command Line
.............
``NHASH_ID_OR_FILE``
The NHASH_ID of the specimen or a file containing a list of NHASH_IDs.
If a file is provided, the file should contain one NHASH_ID per line.

``bkbit specimen2jsonld``
``filemanifest2jsonld``
,,,,,,,,,,,,,,,,,,,,,,,

.. code-block:: bash
.. code-block:: bash
$ bkbit specimen2jsonld [OPTIONS] NHASH_ID_OR_FILE
$ bkbit specimen2jsonld [OPTIONS] NHASH_ID_OR_FILE
Options
,,,,,,,,
**Options**

``-d, --decendants``
A boolean flag that, when provided, generates BICAN objects for the given NHASH_ID and all of its descendants.
If this flag is not set (DEFAULT), then the ancestors will be processed.

Arguments
,,,,,,,,
**Arguments**

``NHASH_ID_OR_FILE``
The NHASH_ID of the specimen or a file containing a list of NHASH_IDs.
If a file is provided, the file should contain one NHASH_ID per line.
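
The one-ID-per-line file format described above is simple to produce and to check from a script. The sketch below is a hedged illustration rather than bkbit's own parsing logic: ``parse_nhash_ids`` and ``build_specimen2jsonld_command`` are hypothetical helpers, skipping blank lines is an assumption, and the two IDs are used only as sample values. Only the short ``-d`` form of the flag is used.

.. code-block:: python

    def parse_nhash_ids(text):
        """Return the NHASH_IDs found in ``text``, one per non-empty line."""
        ids = []
        for line in text.splitlines():
            stripped = line.strip()
            # Assumption: blank lines carry no ID and can be skipped.
            if stripped:
                ids.append(stripped)
        return ids


    def build_specimen2jsonld_command(nhash_id_or_file, descendants=False):
        """Return the argument list for a ``bkbit specimen2jsonld`` call."""
        command = ["bkbit", "specimen2jsonld"]
        if descendants:
            # Without this flag the ancestors are processed (the default).
            command.append("-d")
        command.append(nhash_id_or_file)
        return command


    if __name__ == "__main__":
        # Sample file content with one NHASH_ID per line.
        content = "DO-WFFF3774\nDO-RMRL6873\n"
        print(parse_nhash_ids(content))
        # Hypothetical file name; the command is only printed, not run.
        print(build_specimen2jsonld_command("nhash_ids.txt", descendants=True))

Parsing the file up front makes it easy to spot empty or malformed lines before the IDs are handed to the command.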


Environment Variables
.....................

@@ -174,8 +122,3 @@ Example 4: Parse a file containing record(s) and their respective descendants
DO-WFFF3774.jsonld
DO-RMRL6873.jsonld
Structured Anatomical Data
----------------------------


