Add properties: is_fork and parent #366

alextanski · 2019-07-16T08:20:12Z

While working on streamlining DP workflows, I was looking for a reliable way to tell a dataset fork from its main/parent dataset (the upstream dataset, in Crunch terms) and to return the parent's id when working with its fork.

I've already confirmed with Alessandro that we can use the dataset/{id}/actions/upstream_delta catalog (see https://docs.crunch.io/feature-guide/feature-versioning.html?highlight=fork). That leads to two handy functions (they should be properties if added to the scrunch dataset object):

To return the parent id if we are working on a fork:

import pycrunch.lemonpy

def parent(dataset):
    """
    Return the parent's (upstream) dataset id if the passed dataset is a fork,
    otherwise None.
    """
    try:
        # A ClientError here is taken to mean the dataset has no upstream.
        upstream = dataset.resource.actions.upstream_delta
    except pycrunch.lemonpy.ClientError:
        return None
    return upstream.value.upstream.id

To simply test if we are working on a fork:

def is_fork(dataset):
    """
    Test if the passed dataset is a fork.
    """
    return parent(dataset) is not None
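
As a rough sketch of the property form, the two functions could hang off a dataset wrapper like this. Everything not shown above is an assumption made for illustration: ForkAwareDataset is a hypothetical class (not the actual scrunch dataset class), the local ClientError stands in for pycrunch.lemonpy.ClientError so the snippet runs without a Crunch connection, and make_fake_resource only imitates the attribute chain used above.

```python
from types import SimpleNamespace


class ClientError(Exception):
    """Stand-in for pycrunch.lemonpy.ClientError (assumption, for offline use)."""


class ForkAwareDataset:
    """Hypothetical wrapper showing parent and is_fork as properties."""

    def __init__(self, resource):
        self.resource = resource

    @property
    def parent(self):
        """Id of the upstream dataset, or None if this is not a fork."""
        try:
            upstream = self.resource.actions.upstream_delta
        except ClientError:
            return None
        return upstream.value.upstream.id

    @property
    def is_fork(self):
        """True if the dataset has an upstream (parent) dataset."""
        return self.parent is not None


class _NoUpstreamActions:
    """Fake actions object that fails the way a non-fork is assumed to."""

    @property
    def upstream_delta(self):
        raise ClientError("no upstream for this dataset")


def make_fake_resource(parent_id=None):
    """Build a fake resource; illustrative only, not a pycrunch API."""
    if parent_id is None:
        return SimpleNamespace(actions=_NoUpstreamActions())
    upstream = SimpleNamespace(id=parent_id)
    delta = SimpleNamespace(value=SimpleNamespace(upstream=upstream))
    return SimpleNamespace(actions=SimpleNamespace(upstream_delta=delta))


fork = ForkAwareDataset(make_fake_resource(parent_id="abc123"))
main = ForkAwareDataset(make_fake_resource())
print(fork.is_fork, fork.parent)
print(main.is_fork, main.parent)
```

In scrunch itself the properties would presumably sit on the existing dataset object and catch the real pycrunch.lemonpy.ClientError; the fake resource is only there to exercise both branches.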

Adding these two would help greatly. What do you think?

alextanski added the enhancement label Jul 16, 2019

alextanski added this to the Wishlist milestone Jul 16, 2019
