Glossary
Tenant
A group of users who have access to a distinct set of projects. Users in tenant A cannot see or access projects in tenant B.
Project
A grouping of one or more pipelines, which presumably have a shared context or purpose.
Entity type
A data definition in the form of an Avro schema. Entity types will sometimes describe things in the real world (e.g. a person, household or vaccine), but they are sometimes more arbitrary (e.g. entity types that are generated by creating an identity contract). An entity type can be used in multiple pipelines within the same project. It is also theoretically possible for entity types to be shared between projects or even tenants, but this is not currently supported (except by copying and pasting schemas).
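For illustration, an entity type describing a person might be defined by an Avro schema like the one below; the field names are invented for this example.

```json
{
  "name": "Person",
  "type": "record",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": ["null", "int"]}
  ]
}
```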
Pipeline
A pipeline consists of an input data definition (in the form of an Avro schema) and zero or more contracts.
Input data
Data that enters a pipeline. Each pipeline has a fixed input data definition, in the form of an Avro schema. Input data consists of submissions.
Submission
A data object that is submitted to a predefined URL in order to become input for a pipeline. In order for a submission to become input data, it must conform to the pipeline’s input data definition.
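As a sketch, a submission could be posted with any HTTP client; the host, path and mapping set ID below are invented placeholders (the real submission URL is provided by the Aether instance, as described under the submission URL entry).

```python
# A hedged sketch of posting a submission with the requests library.
# The host, path and mapping set ID are invented placeholders.
import requests

submission = {"first_name": "Brian", "age": 35}

response = requests.post(
    "https://aether.example.com/submit/<mapping-set-id>/",
    json=submission,
)
# A submission that does not conform to the pipeline's input data
# definition would be rejected here.
response.raise_for_status()
```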
Contract
One or more entity types plus some mapping rules that define how the input data definition is mapped onto the entity type(s).
Entity
The output of a contract: a data object that conforms to an entity type.
Extraction
The process of running submissions through mapping rules to create entities.
Identity contract
A contract that produces entities that are identical to the input data.
Mapping rule
A mapping rule defines the source of the datum that will be used to populate a single field in an entity type. This definition is usually a reference to a single field in the input data definition (hence the name: a rule that defines how input data is mapped onto entity types), but it can also be a function call (e.g. #!uuid). Mapping rules are written in JSONPath.
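As a rough sketch of how extraction might apply mapping rules to a submission: the rule format (source JSONPath, destination entity field) and the use of the jsonpath-ng library here are illustrative assumptions, not Aether's actual implementation.

```python
# A rough sketch of extraction: applying mapping rules to a
# submission to build an entity. Rule format and library choice
# are assumptions for illustration.
from jsonpath_ng import parse

submission = {"first_name": "Brian", "age": 35}

# Each rule pairs a source (a JSONPath into the submission) with the
# entity type field it populates.
rules = [
    ("$.first_name", "name"),
    ("$.age", "age"),
]

entity = {}
for source, destination in rules:
    matches = parse(source).find(submission)
    if matches:
        entity[destination] = matches[0].value

print(entity)  # {'name': 'Brian', 'age': 35}
```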
Mapping set
Internal term for a pipeline.
Submission URL
The URL to which submissions must be posted in order to become input data for a pipeline. Either the submission URL or the submitted data must contain a mapping set ID.
Schema decorator
Extra information that is added to an entity type to provide context within a project. Contains information such as whether or not fields are mandatory. Can also add an aetherMaskingLevel to define the privacy level of individual fields.
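As a sketch, a decorated field might look like the following; the field itself and the value of aetherMaskingLevel are invented for illustration.

```json
{
  "name": "date_of_birth",
  "type": ["null", "string"],
  "aetherMaskingLevel": "private"
}
```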
Producer
The producer takes every entity as it is created and inserts it into a Kafka topic that corresponds to its entity type.
Kafka
Kafka is a software platform for handling real-time data feeds.
Topic
Data feeds in Kafka are separated into topics. The producer creates a separate topic for each project / entity type combination (i.e. for each schema decorator). Each topic will only contain entities of a single entity type.
Consumer
A consumer reads entities from Kafka and feeds them to an external destination (e.g. CKAN, Elasticsearch, a relational database).
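A minimal sketch of the idea, using the kafka-python client: the broker address, topic name and JSON deserialization are assumptions for illustration, and Aether's real consumers are driven by a configuration as described in the next entry.

```python
# A minimal consumer sketch using the kafka-python client. Broker
# address, topic name and JSON deserialization are assumptions.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "my-project.Person",  # hypothetical project / entity type topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    entity = message.value
    # Forward the entity to an external destination here, e.g. CKAN,
    # Elasticsearch or a relational database.
    print(entity)
```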
Consumer configuration
A discrete set of instructions that tell a consumer what to do with data from a given topic (i.e. entities of a given entity type). The configuration might specify the location of the external destination, a “masking emit level” that defines the privacy level of the emitted data, or arbitrary filtering of entities based on field values (e.g. only emit entities with the name “Brian”).
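As a sketch, a configuration for a hypothetical Elasticsearch consumer might look like this. Every key and value below is invented to illustrate the kinds of settings described above, not an actual consumer's configuration schema.

```json
{
  "topic": "my-project.Person",
  "destination": "http://elasticsearch.example.com:9200/person-index",
  "masking_emit_level": "public",
  "filter": {"field": "name", "value": "Brian"}
}
```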