tree2tabular

===============

Purpose:

Convert a tree structure to a tabular format.
Wrapper around treelib with a few additional features geared towards easier data modelling for analytics engineers.

Abstract:

Tree structures are often used to represent hierarchical data, but they are not always easy to work with. This script converts a tree structure defined in a yaml file to a tabular format.
The script can also generate a unique id for each node in the tree.
It reads from a yaml file and writes either a csv file or an updated yaml file that includes the ids.

Usage:

The yaml file should have the structure described below:

Hierarchy: # Always start the yaml file with this line
    name: category # This is the name of the dimension used as column for the tabular data
    id_generation: uuid # Possible values for generating node ids: 'name' -> use the name, 'incremental' -> generate integers, 'uuid' -> generate a uuid, 'error' -> raise an error if no id is provided
    # Keep this structure for the nodes
    childs:
        - name: subcategory
          childs:
              - name: subsubcategory
                childs:
                    - name: subsubsubcategory
        - name: subcategory2
          childs:
              - name: subsubcategory2

In Python, write:

from tree2tabular import TreeBuilder
fn = 'my_tree.yaml'
tree = TreeBuilder.from_yaml(fn)
tree.to_csv('my_tree.csv')

Output: automatically generated ids and the tree in a tabular structure:

TXT_CATEGORY_LVL1	TXT_CATEGORY_LVL2	TXT_CATEGORY_LVL3	DIM_CATEGORY_LVL1	DIM_CATEGORY_LVL2	DIM_CATEGORY_LVL3
subcategory	subsubcategory	subsubsubcategory	7690c4	163eed	6d0573
subcategory2	subsubcategory2	subsubcategory2	3860c7	e7921e	e7921e

Other methods

Re-use as a dataframe:

df = tree.to_dataframe()

Export a new yaml file with ids:

tree.to_yaml('my_tree_with_ids.yaml')

Export a parent-child table:

df = tree.to_parent_child(use_names=True)

output:

parent	child
Grandma	Mom
Mom	Son
Mom	Daughter
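
For illustration, a parent-child table like the one above can be derived from a nested structure with a simple recursive walk. This is a minimal sketch in plain Python, not the library's implementation: the function name `parent_child_pairs` and the `family` dict are assumptions that mirror the yaml layout (name, childs).

```python
def parent_child_pairs(node):
    """Return (parent_name, child_name) pairs in depth-first order."""
    pairs = []
    # `childs` may be missing or null for a node without children
    for child in node.get("childs") or []:
        pairs.append((node["name"], child["name"]))
        pairs.extend(parent_child_pairs(child))
    return pairs


# Hypothetical tree matching the example output above
family = {
    "name": "Grandma",
    "childs": [
        {"name": "Mom", "childs": [{"name": "Son"}, {"name": "Daughter"}]},
    ],
}

for parent, child in parent_child_pairs(family):
    print(parent, child, sep="\t")
```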

Usage requirements:

Yaml structure

Always start with the keyword 'Hierarchy'.
Provide the following parameters under Hierarchy: name, id_generation, childs.
Each node can have three properties: name, id, childs.
- The childs property is a list of nodes; it can be null if the node has no children.
- The id property is optional. If it is not provided, it is generated according to the id_generation parameter. The value 0 is reserved for the root node.
- The name property is mandatory.

name parameter

At the top of the hierarchy: the name of the dimension, used to build the column names of the tabular data
- e.g. name: category will generate the columns DIM_CATEGORY_LVL1, DIM_CATEGORY_LVL2, etc.
Inside the hierarchy: the name of the node

id_generation parameter

uuid : generate a unique id for each node that has no id
name : use the name of the node as its id if no id is provided
error : raise an error if a node has no id
incremental : generate an incremental id for each node that has no id; expects only integer values
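
The four modes can be pictured with the following sketch. It is not taken from the library: `make_id_generator` and `assign_ids` are hypothetical helpers, the nodes are plain dicts with the same keys as the yaml file, and the exact id format the library produces (for example the length of a generated uuid) is not assumed here.

```python
import itertools
import uuid


def make_id_generator(mode):
    """Return a function mapping a node name to a new id (hypothetical helper)."""
    counter = itertools.count(1)  # start at 1 because 0 is reserved for the root node

    def generate(name):
        if mode == "uuid":
            return uuid.uuid4().hex
        if mode == "name":
            return name
        if mode == "incremental":
            return next(counter)
        if mode == "error":
            raise ValueError(f"node {name!r} has no id")
        raise ValueError(f"unknown id_generation mode: {mode!r}")

    return generate


def assign_ids(node, generate):
    """Fill in missing ids in place; ids already provided are kept.

    Simplification: this sketch does not check generated ids against
    ids that were provided by hand.
    """
    if node.get("id") is None:
        node["id"] = generate(node["name"])
    for child in node.get("childs") or []:
        assign_ids(child, generate)
    return node


tree = {"name": "a", "id": 10, "childs": [{"name": "b"}, {"name": "c"}]}
assign_ids(tree, make_id_generator("incremental"))
print([tree["id"]] + [child["id"] for child in tree["childs"]])
```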

Output structure

Creation of column names using the `name` parameter

The output is a tabular structure with the following columns: DIM_CATEGORY_LVL1, DIM_CATEGORY_LVL2, etc. and TXT_CATEGORY_LVL1, TXT_CATEGORY_LVL2, etc.
The DIM_ columns contain the ids of the nodes.
The TXT_ columns contain the names of the nodes.
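
As a small illustration of this naming scheme, the column names can be built from the dimension name and the depth of the tree. The helper `column_names` is hypothetical, and the TXT-before-DIM ordering is an assumption taken from the sample output shown earlier.

```python
def column_names(dimension, depth):
    """Build TXT_/DIM_ column names for levels 1..depth (hypothetical helper)."""
    upper = dimension.upper()
    txt = [f"TXT_{upper}_LVL{level}" for level in range(1, depth + 1)]
    dim = [f"DIM_{upper}_LVL{level}" for level in range(1, depth + 1)]
    return txt + dim


print(column_names("category", 3))
```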

The higher the level, the deeper the node

Level 1 corresponds to the top of the hierarchy; the higher the level, the deeper the node is in the hierarchy.
The primary key of the table is the DIM_ column with the highest level, which has the finest granularity.
There are no blanks: if a node has no child, the DIM_ and TXT_ columns of the deeper levels are filled with the id and name of that node.
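
The rules above can be sketched in plain Python as follows. This is an assumption about the behaviour, not the library's code: `leaf_paths` and `to_rows` are hypothetical helpers, one row is produced per leaf node, and shorter paths are padded by repeating the leaf. The sample nodes reuse the names and ids from the output table shown earlier.

```python
def leaf_paths(nodes, path=()):
    """Yield one root-to-leaf path of (name, id) tuples per leaf node."""
    for node in nodes:
        current = path + ((node["name"], node.get("id")),)
        children = node.get("childs") or []
        if children:
            yield from leaf_paths(children, current)
        else:
            yield current


def to_rows(nodes):
    """Return rows of [names..., ids...], padded to the depth of the tree."""
    paths = list(leaf_paths(nodes))
    depth = max(len(p) for p in paths)
    rows = []
    for p in paths:
        # Repeat the leaf so that deeper levels are never blank
        padded = p + (p[-1],) * (depth - len(p))
        names = [name for name, _ in padded]
        ids = [node_id for _, node_id in padded]
        rows.append(names + ids)
    return rows


nodes = [
    {"name": "subcategory", "id": "7690c4", "childs": [
        {"name": "subsubcategory", "id": "163eed", "childs": [
            {"name": "subsubsubcategory", "id": "6d0573"},
        ]},
    ]},
    {"name": "subcategory2", "id": "3860c7", "childs": [
        {"name": "subsubcategory2", "id": "e7921e"},
    ]},
]

for row in to_rows(nodes):
    print("\t".join(row))
```

The second row shows the padding: subsubcategory2 sits at level 2, so its name and id are repeated at level 3.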

Example

You can find examples in the tests > demos* folders.
