Commit

further documentation changes
glass-ships committed Apr 23, 2024
1 parent aafb61e commit 004332a
Showing 4 changed files with 41 additions and 38 deletions.
2 changes: 1 addition & 1 deletion docs/Ingests/index.md
@@ -1,5 +1,5 @@
<sub>
-(For CLI usage, see the [CLI commands](./CLI.md) page.)
+(For CLI usage, see the [CLI commands](../Usage/CLI.md) page.)
</sub>

Koza is designed to process and transform existing data into a target csv/json/jsonl format.
62 changes: 33 additions & 29 deletions docs/Ingests/source_config.md
@@ -4,35 +4,39 @@ This YAML file sets properties for the ingest of a single file type from a within

## Source Configuration Properties

-| **Required properties** | |
-| --- | --- |
-| `name` | Name of the data ingest, as `<data source>_<type_of_ingest>`, <br/>ex. `hpoa_gene_to_disease` |
-| `files` | List of files to process |
-| | |
-| **Optional properties** | |
-| `file_archive` | Path to a file archive containing the file(s) to process <br/> Supported archive formats: zip, gzip |
-| `format` | Format of the data file(s) (CSV or JSON) |
-| `sssom_config` | Configures usage of SSSOM mapping files |
-| `depends_on` | List of map config files to use |
-| `metadata` | Metadata for the source, either a list of properties,<br/>or path to a `metadata.yaml` |
-| `transform_code` | Path to a python file to transform the data |
-| `transform_mode` | How to process the transform file |
-| `global_table` | Path to a global translation table file |
-| `local_table` | Path to a local translation table file |
-| `field_type_map` | Dict of field names and their type (using the FieldType enum) |
-| `filters` | List of filters to apply |
-| `json_path` | Path within JSON object containing data to process |
-| `required_properties` | List of properties that must be present in output (JSON only) |
-| | |
-| **CSV-Specific Properties** | |
-| `delimiter` | Delimiter for csv files (**Required for CSV format**) |
-| **Optional CSV Properties** | |
-| `columns` | List of columns to include in output (CSV only) |
-| `header` | Header row index for csv files |
-| `header_delimiter` | Delimiter for header in csv files |
-| `header_prefix` | Prefix for header in csv files |
-| `comment_char` | Comment character for csv files |
-| `skip_blank_lines` | Skip blank lines in csv files |
+| **Required properties** | |
+| --- | --- |
+| `name` | Name of the data ingest, as `<data source>_<type_of_ingest>`, <br/>ex. `hpoa_gene_to_disease` |
+| `files` | List of files to process |
+| | |
+| `node_properties` | List of node properties to include in output |
+| `edge_properties` | List of edge properties to include in output |
+| **Note** | Either node or edge properties (or both) must be defined in the primary config yaml for your transform |
+| | |
+| **Optional properties** | |
+| `file_archive` | Path to a file archive containing the file(s) to process <br/> Supported archive formats: zip, gzip |
+| `format` | Format of the data file(s) (CSV or JSON) |
+| `sssom_config` | Configures usage of SSSOM mapping files |
+| `depends_on` | List of map config files to use |
+| `metadata` | Metadata for the source, either a list of properties,<br/>or path to a `metadata.yaml` |
+| `transform_code` | Path to a python file to transform the data |
+| `transform_mode` | How to process the transform file |
+| `global_table` | Path to a global translation table file |
+| `local_table` | Path to a local translation table file |
+| `field_type_map` | Dict of field names and their type (using the FieldType enum) |
+| `filters` | List of filters to apply |
+| `json_path` | Path within JSON object containing data to process |
+| `required_properties` | List of properties that must be present in output (JSON only) |
+| | |
+| **CSV-Specific Properties** | |
+| `delimiter` | Delimiter for csv files (**Required for CSV format**) |
+| **Optional CSV Properties** | |
+| `columns` | List of columns to include in output (CSV only) |
+| `header` | Header row index for csv files |
+| `header_delimiter` | Delimiter for header in csv files |
+| `header_prefix` | Prefix for header in csv files |
+| `comment_char` | Comment character for csv files |
+| `skip_blank_lines` | Skip blank lines in csv files |

## Metadata Properties

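For orientation, here is a minimal sketch of a source config using the properties from the table added above. The file names, column names, and values are hypothetical placeholders, not taken from this commit:

```yaml
name: 'hpoa_gene_to_disease'        # <data source>_<type_of_ingest>
format: 'csv'
delimiter: '\t'                     # required for CSV format
files:
  - './data/genes_to_disease.tsv'   # hypothetical input file

# Either node_properties, edge_properties, or both must be defined
node_properties:
  - 'id'
  - 'category'
edge_properties:
  - 'id'
  - 'subject'
  - 'predicate'
  - 'object'
  - 'category'

columns:
  - 'gene_id'
  - 'disease_id'
  - 'association_type'

transform_code: './gene_to_disease_transform.py'   # hypothetical transform script
```
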
2 changes: 1 addition & 1 deletion docs/index.md
@@ -25,7 +25,7 @@ See the [Ingests](./Ingests/index.md) page for information on how to configure i

Koza can be used as a Python library, or via the command line.
[CLI commands](./Usage/CLI.md) are available for validating and transforming data.
-See the [API](./Usage/API.md) page for information on using Koza as a library.
+See the [Module](./Usage/Module.md) page for information on using Koza as a library.

Koza also includes some examples to help you get started (see `koza/examples`).
### Basic Examples
13 changes: 6 additions & 7 deletions src/koza/model/config/source_config.py
@@ -120,14 +120,14 @@ class DatasetDescription:
"""

# id: Optional[str] = None # Can uncomment when we have a standard
name: Optional[str] = None # If empty use source name
name: Optional[str] = None # If empty use source name
ingest_title: Optional[str] = None # Title of source of data, map to biolink name
ingest_url: Optional[str] = None # URL to source of data, maps to biolink iri
description: Optional[str] = None # Description of the data/ingest
ingest_url: Optional[str] = None # URL to source of data, maps to biolink iri
description: Optional[str] = None # Description of the data/ingest
# source: Optional[str] = None # Possibly replaced with provided_by
provided_by: Optional[str] = None # <data source>_<type_of_ingest>, ex. hpoa_gene_to_disease
provided_by: Optional[str] = None # <data source>_<type_of_ingest>, ex. hpoa_gene_to_disease
# license: Optional[str] = None # Possibly redundant, same as rights
rights: Optional[str] = None # License information for the data source
rights: Optional[str] = None # License information for the data source


@dataclass(config=PYDANTIC_CONFIG)
@@ -291,8 +291,7 @@ def __post_init__(self):
@dataclass(config=PYDANTIC_CONFIG)
class PrimaryFileConfig(SourceConfig):
"""
node_properties and edge_properties are used for configuring
the KGX writer
node_properties and edge_properties are used for configuring the KGX writer
"""

node_properties: Optional[List[str]] = None
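The `DatasetDescription` fields above map onto the `metadata` block of a source config (or a standalone `metadata.yaml`). A hedged sketch, assuming the fields are supplied inline under a `metadata` key as the property table allows; all values are placeholders:

```yaml
metadata:
  name: 'HPOA Gene to Disease'                # if empty, the source name is used
  ingest_title: 'Human Phenotype Ontology Annotations'  # maps to biolink name
  ingest_url: 'https://example.org/hpoa'      # maps to biolink iri
  description: 'Gene to disease associations parsed from HPOA'
  provided_by: 'hpoa_gene_to_disease'         # <data source>_<type_of_ingest>
  rights: 'https://example.org/hpoa/license'  # license information for the data source
```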
