"A arbitrarily complex chain of processes that manipulate data where the output data of one process becomes the input to the next."
"A process to take raw data and transform in a way that usable by the entire organization."
"Data Processing Pipeline is a collection of instructions to read, transform or write data that is designed to be executed by a data processing engine."
A data pipeline can have these characteristics:
- 1 or more data inputs.
- 1 or more data outputs.
- Optional filtering.
- Optional transformation, including schema changes (adding or removing fields) and transforming the format.
- Optional aggregation, including group by, joins, and statistics.
- Other robustness features.
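The characteristics above can be sketched as a minimal pipeline. This is a hypothetical example (the record fields and function name are invented, and no specific processing engine is assumed): one input, filtering, a schema-changing transformation, and a group-by aggregation producing one output.

```python
from collections import defaultdict

# Data input: raw event records (hypothetical schema).
raw_events = [
    {"user": "a", "amount": 10, "debug": True},
    {"user": "b", "amount": 25, "debug": False},
    {"user": "a", "amount": 5, "debug": False},
]

def run_pipeline(events):
    # Filtering: drop debug records.
    filtered = (e for e in events if not e["debug"])
    # Transformation (schema change): remove the "debug" field.
    transformed = ({"user": e["user"], "amount": e["amount"]} for e in filtered)
    # Aggregation: group by user and sum amounts.
    totals = defaultdict(int)
    for e in transformed:
        totals[e["user"]] += e["amount"]
    # Data output: aggregated totals.
    return dict(totals)

print(run_pipeline(raw_events))  # {'b': 25, 'a': 5}
```

In a real engine the same read-filter-transform-aggregate-write stages would run distributed and with robustness features (retries, checkpointing), but the shape of the pipeline is the same.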
A company likely needs a data pipeline if it does any of the following:
- Generates, relies on, stores, or maintains large amounts of data, or data from multiple sources.
- Requires real-time or highly sophisticated data analysis.
- Stores data in the cloud.
Most of the companies you interface with on a daily basis — and probably your own — would benefit from a data pipeline.
Backlinks:
list from [[Data Pipelines]] AND -"Changelog"