Thanks for the generator-based approach of bonobo. We have hit several limitations with bonobo, most of which I could work around. The most recent one, however, looks to me like an important feature that I have not seen implemented natively in any ETL tool yet.
We are processing JSON documents read from JSON Lines files. These documents have the following structure:
However, we need to process the sub-documents in the items array/list.
Imagine we have one node that adds the date to each sub-document and another node that adds an id based on the full document's name field and the sub-document's position in the array/list.
Simply put, we could (and will) do this:
from datetime import datetime

# doc is one parsed JSON document with a "name" field and an "items" list
for i, item in enumerate(doc["items"]):
    item["date"] = datetime.now()
    item["id"] = doc["name"] + str(i)
But it would be much more valuable if we could separate these responsibilities into different nodes.
The ultimate goal is to be able to loop through the nodes of a chain based on the number of items in the document.
Of course, I am not talking about bonobo inspecting the data, but about it offering a step-in/step-out approach, similar to the visitor pattern, to control looping (more generally, to control the flow of a chain or node from a different node's point of view).
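To make the idea concrete, here is a minimal sketch in plain Python of what such a step-in/step-out wrapper might look like. It does not use any bonobo API. The names `loop_over`, `add_date` and `add_id`, the node signature `(item, parent, index)`, and the sample document are all assumptions made for illustration only.

```python
from datetime import datetime


def add_date(item, parent, index):
    # Node 1: only responsible for stamping the sub-document with a date.
    item["date"] = datetime.now()
    return item


def add_id(item, parent, index):
    # Node 2: only responsible for deriving an id from the parent
    # document's name and the sub-document's position.
    item["id"] = parent["name"] + str(index)
    return item


def loop_over(key, *nodes):
    """Hypothetical wrapper: runs `nodes` once per sub-document in doc[key]."""

    def node(doc):
        # Step in: visit every sub-document of the parent document.
        for index, item in enumerate(doc[key]):
            for inner in nodes:
                item = inner(item, doc, index)
            doc[key][index] = item
        # Step out: pass the full document on to the rest of the chain.
        yield doc

    return node


# Hypothetical sample document; the real structure may differ.
doc = {"name": "report", "items": [{"value": 1}, {"value": 2}]}
process_items = loop_over("items", add_date, add_id)
result = next(process_items(doc))
```

The inner nodes never iterate themselves. The wrapper decides how often they run, which is the kind of flow control from another node's point of view that is being asked for.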
Hi, a little feedback.
I have implemented this (at least for our needs, and incompatible with the bonobo "library") in both sync and async variants.
It is only a straightforward chain without any branching (but chains could actually be nested).
The more I look into this, the more I believe the basic approach of assuming some "graph" is too academic.
Please have a look at GStreamer, where sources and sinks are used to redirect data flow.
In some cases (e.g. grouping, counting, ...) a sink needs to know when the last element has been sent, so that its adjacent source can emit the computed result.
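As a rough illustration of that last point, the sketch below uses an explicit end-of-stream marker so that a counting sink only emits its result after the last element has arrived. This is plain Python, not bonobo or GStreamer code. The `END` marker, `source`, `counting_sink` and the sample data are hypothetical names chosen for this example.

```python
from collections import Counter

# Hypothetical end-of-stream marker, sent once after the last element.
END = object()


def source(docs):
    # Emit every document, then signal that nothing more will follow.
    yield from docs
    yield END


def counting_sink(stream, key):
    # Aggregate silently while elements arrive.
    counts = Counter()
    for element in stream:
        if element is END:
            break
        counts[element[key]] += 1
    # Only now, after the last element, can the result be emitted.
    yield dict(counts)


docs = [{"kind": "a"}, {"kind": "b"}, {"kind": "a"}]
result = list(counting_sink(source(docs), "kind"))
```

With plain generators the end of iteration could serve as the signal, but an explicit marker also works when elements travel through queues or async channels, where "no more data" is otherwise not observable.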