We appreciate your contribution to this amazing project! Any form of engagement is welcome, including but not limited to:
- feature request
- documentation wording
- bug report
- roadmap suggestion
- ...and so on!
Please refer to the community contributing section for more details.
Before diving into the details of your first PR, please familiarize yourself with the project structure of 🔮 Instill Core.
In 💧 Instill VDP, a pipeline is a DAG (Directed Acyclic Graph) consisting of multiple components.
```mermaid
flowchart LR
    s[Trigger] --> c1[OpenAI Component]
    c1 --> c2[Stability AI Component]
    c1 --> c3[MySQL Component]
    c1 --> e[Response]
    c2 --> e
```
There are different types of components: Generic, AI, Data, Application, and Operator.
Note:
- AI, Data, and Application components are used by the pipeline to interact with an external service; you may need to introduce its setup details in the component `setup` properties.
- In order to prevent private keys from being unintentionally leaked when sharing a pipeline, the `setup` properties only take a reference to a secret (e.g., `${secrets.my-secret}`).
- You can create secrets from the console settings or through an API call.
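For instance, a component's `setup` block in a recipe might reference a secret like this (a minimal sketch; the `api-key` property and `my-openai-key` secret name are hypothetical):

```json
{
  "setup": {
    "api-key": "${secrets.my-openai-key}"
  }
}
```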
Generic components serve as the foundational elements that support other components within a pipeline to execute complex or combined tasks. For instance, an Iterator component processes each element of an array by applying an operation determined by a collection of nested components.
AI components play a crucial role in transforming unstructured data into formats that are easy to interpret and analyze, thereby facilitating the extraction of valuable insights. These components integrate with AI models from various providers, whether it's the primary Instill Model or those from third-party AI vendors. They are defined and initialized in the `ai` package.
Data components play a crucial role in establishing connections with remote data sources, such as IoT devices (e.g., IP cameras), cloud storage services (e.g., GCP Cloud Storage, AWS S3), data warehouses, or vector databases (e.g., Pinecone). These connectors act as the bridge between VDP and various external data sources. Their primary function is to enable seamless data exchange, enhancing Instill VDP's capability to work with diverse data sources effectively. They are defined and initialized in the `data` package.
Application components are used to seamlessly integrate various 3rd-party application services. They are defined and initialized in the `application` package.
Operator components perform data transformations inside the pipeline. They are defined and initialized in the `operator` package.
A pipeline recipe specifies how components are configured and how they are interconnected.
Recipes are represented by a JSON object:
```json
{
  "component": {
    "<component-id>": {
      "type": "<component-definition-id>",
      "task": "<task>",
      "input": {
        // values for the input fields
      },
      "condition": "<condition>", // conditional statement to execute or bypass the component
      "setup": {
        // setup specification values, optional
      }
    }
  },
  "variable": {
    // pipeline input fields
  },
  "output": {
    // pipeline output fields
  }
}
```
You can see an example recipe in the component development guide below.
```mermaid
sequenceDiagram
    participant u as User
    participant gw as api-gateway
    participant p as pipeline-backend
    participant db as pipeline-db

    u ->> gw: POST /users/<user>/pipelines
    gw ->> p: forward
    p ->> db: Store pipeline and its recipe
```
When a pipeline is triggered, the DAG will be computed in order to execute components in topological order.
```mermaid
sequenceDiagram
    participant u as User
    participant gw as api-gateway
    participant p as pipeline-backend
    participant db as pipeline-db
    participant c as component

    u ->> gw: POST /users/<user>/pipelines/<pipeline-id>/trigger
    gw ->> p: forward
    p ->> db: Get recipe
    db ->> p: Recipe
    loop over topological order of components
        p ->> c: ExecuteWithValidation
    end
```
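To illustrate that ordering (a self-contained sketch of Kahn's algorithm, not the actual `pipeline-backend` implementation), a minimal topological sort over component dependencies could look like this:

```go
package main

import "fmt"

// topoOrder returns an execution order for components, given a map from
// each component ID to the IDs of its upstream dependencies. This is an
// illustrative sketch of Kahn's algorithm, not pipeline-backend code.
func topoOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	downstream := map[string][]string{}
	for id, ups := range deps {
		indegree[id] += 0 // make sure every node is registered
		for _, up := range ups {
			indegree[up] += 0
			indegree[id]++
			downstream[up] = append(downstream[up], id)
		}
	}

	// Start with the components that have no upstream dependencies.
	queue := []string{}
	for id, d := range indegree {
		if d == 0 {
			queue = append(queue, id)
		}
	}

	order := []string{}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		// Executing a component unblocks its downstream components.
		for _, next := range downstream[id] {
			indegree[next]--
			if indegree[next] == 0 {
				queue = append(queue, next)
			}
		}
	}

	if len(order) != len(indegree) {
		return nil, fmt.Errorf("recipe graph contains a cycle")
	}
	return order, nil
}

func main() {
	// The DAG from the flowchart above: the OpenAI component feeds the
	// Stability AI and MySQL components.
	order, err := topoOrder(map[string][]string{
		"openai-0":    {},
		"stability-0": {"openai-0"},
		"mysql-0":     {"openai-0"},
	})
	fmt.Println(order, err)
}
```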
This section will guide you through the steps to contribute a new component. You'll add and test an operator that takes a string `target` as input and returns a `"Hello, ${target}!"` string as the component output.
In order to add a new component, you need to:
- Define the component configuration. This will determine the tasks that can be performed by the component and their input and output parameters. The console frontend will use the configuration files to render the component in the pipeline editor page.
- Implement the component interfaces so `pipeline-backend` can execute the component without knowing its implementation details.
- Initialize the component, i.e., include the implementation of the component interfaces as a dependency in the `pipeline-backend` execution.
Start by cloning this repository:
```bash
$ git clone https://github.com/instill-ai/component
```
Although all the development will be done in this repository, if you want to see your component in action, you'll need to build VDP locally. First, launch the latest version of 🔮 Instill Core suite. Then, build and launch 💧 Instill VDP backend with your local changes.
If you want to know more, you can refer to the documentation in these repositories, which explains in detail how to set up the development environment. In short, here's what we'll need to do for this guide:
```bash
$ git clone https://github.com/instill-ai/instill-core && cd instill-core
$ make latest PROFILE=all
$ git clone https://github.com/instill-ai/pipeline-backend && cd pipeline-backend
$ make build
```
`component` is a dependency of `pipeline-backend`, so, in order to take your changes into account, you need to reference them:
```bash
$ go mod edit -replace="github.com/instill-ai/component=../component"
```
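After this, the `go.mod` file in `pipeline-backend` will contain a replace directive pointing to your local checkout:

```
replace github.com/instill-ai/component => ../component
```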
Then, mount the `component` directory when running the `pipeline-backend` container. Add the `-v $(PWD)/../component:/component` option to `make dev` in the Makefile:
```makefile
dev: ## Run dev container
	@docker compose ls -q | grep -q "instill-core" && true || \
		(echo "Error: Run \"make latest PROFILE=pipeline\" in vdp repository (https://github.com/instill-ai/instill-core) in your local machine first." && exit 1)
	@docker inspect --type container ${SERVICE_NAME} >/dev/null 2>&1 && echo "A container named ${SERVICE_NAME} is already running." || \
		echo "Run dev container ${SERVICE_NAME}. To stop it, run \"make stop\"."
	@docker run -d --rm \
		-v $(PWD):/${SERVICE_NAME} \
		-v $(PWD)/../component:/component \
		-p ${SERVICE_PORT}:${SERVICE_PORT} \
		--network instill-network \
		--name ${SERVICE_NAME} \
		instill/${SERVICE_NAME}:dev >/dev/null 2>&1
```
Two processes must know about the new component: `main` and `worker`. You'll need to stop their 🔮 Instill Core version before running the local one.
```bash
$ docker rm -f pipeline-backend pipeline-backend-worker
$ make dev
$ docker exec -d pipeline-backend go run ./cmd/worker # run without -d in a separate terminal if you want to access the logs
$ docker exec pipeline-backend go run ./cmd/main
```
```bash
$ cd $WORKSPACE/component
$ mkdir -p operator/hello/v0 && cd $_
```
Components are isolated in their own packages under `ai`, `data`, `application`, or `operator`. The package is versioned so that, if a breaking change needs to be introduced (e.g., supporting a new major version in a vendor API), existing pipelines using the previous version of the component can keep being triggered.
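This versioning lets `pipeline-backend` import two major versions of the same component side by side. As a sketch, assuming a `json` operator that has shipped a breaking `v1` release (the `v1` package is hypothetical here):

```go
import (
	jsonv0 "github.com/instill-ai/component/operator/json/v0"
	jsonv1 "github.com/instill-ai/component/operator/json/v1"
)
```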
At the end of this guide, this will be the structure of the package:
```
operator/hello/v0
├── .compogen
│   └── extra-bottom.mdx
├── assets
│   └── hello.svg
├── config
│   ├── definition.json
│   └── tasks.json
├── main.go
├── operator_test.go
└── README.mdx
```
Create a `config` directory and add the files `definition.json`, `tasks.json`, and `setup.json` (optional). Together, these files define the behavior of the component.
The `definition.json` file describes the high-level information of the component:
```json
{
  "id": "hello",
  "uid": "e05d3d71-779c-45f8-904d-e90a050ca3b2",
  "title": "Hello",
  "description": "'Hello, world' operator used as a template for adding components",
  "spec": {},
  "availableTasks": [
    "TASK_GREET"
  ],
  "documentationUrl": "https://www.instill.tech/docs/component/operator/hello",
  "icon": "assets/hello.svg",
  "version": "0.1.0",
  "sourceUrl": "https://github.com/instill-ai/component/blob/main/operator/hello/v0",
  "releaseStage": "RELEASE_STAGE_ALPHA",
  "public": true
}
```
This file defines the component properties:
- `id` is the ID of the component. It must be unique.
- `uid` is a UUID string that must not already be taken by another component. Once it is set, it must not change.
- `title` is the display title of the component.
- `description` is a short sentence describing the purpose of the component. It should be written in the imperative tense.
- `spec` contains the parameters required to configure the component that are independent from its tasks, e.g., the API token of a vendor. In general, only AI, Data, or Application components need such parameters.
- `availableTasks` defines the tasks the component can perform.
  - When a component is created in a pipeline, one of the tasks has to be selected, i.e., a configured component can only execute one task.
  - Task configurations are defined in `tasks.json`.
- `documentationUrl` points to the official documentation of the component.
- `icon` is the local path to the icon that will be displayed in the console when creating the component. If left blank, a placeholder icon will be shown.
- `version` must be a SemVer string. It is encouraged to keep a tidy version history.
- `sourceUrl` points to the codebase that implements the component. This will be used by the documentation generation tool and will also be part of the component definition list endpoint.
- `releaseStage` describes the release stage of the component. Unimplemented stages (`RELEASE_STAGE_COMING_SOON` or `RELEASE_STAGE_OPEN_FOR_CONTRIBUTION`) will hide the component from the console (i.e., it can't be used in pipelines), but it will appear in the component definition list endpoint.
- `public` indicates whether the component is visible to the public.
The `tasks.json` file describes the task details of the component. The key should be in the format `TASK_NAME`.
```json
{
  "TASK_GREET": {
    "instillShortDescription": "Greet someone / something",
    "title": "Greet",
    "input": {
      "description": "Input",
      "instillUIOrder": 0,
      "properties": {
        "target": {
          "instillUIOrder": 0,
          "description": "The target of the greeting",
          "instillAcceptFormats": [
            "string"
          ],
          "instillUpstreamTypes": [
            "value",
            "reference",
            "template"
          ],
          "instillUIMultiline": true,
          "title": "Greeting target",
          "type": "string"
        }
      },
      "required": [
        "target"
      ],
      "title": "Input",
      "type": "object"
    },
    "output": {
      "description": "The greeting sentence",
      "instillUIOrder": 0,
      "properties": {
        "greeting": {
          "description": "A greeting sentence addressed to the target",
          "instillUIOrder": 0,
          "required": [],
          "title": "Greeting",
          "type": "string",
          "instillFormat": "string"
        }
      },
      "required": [
        "greeting"
      ],
      "title": "Output",
      "type": "object"
    }
  }
}
```
This file defines the input and output schema of each task:
Properties within a Task
- `title` is used by the console to provide the title of the task in the component.
- `description` and `instillShortDescription` are used by the console to provide a description of the task in the component. If `instillShortDescription` does not exist, it will be the same as `description`.
- `input` is a JSON Schema that describes the input of the task.
- `output` is a JSON Schema that describes the output of the task.
Properties within `input` and `output` Objects
- `required` indicates whether the property is required.
- `type` describes the JSON type of this field, which could be `integer`, `number`, `boolean`, `string`, `array`, or `object`.
- `title` is used by the console to provide the title of the property in the component.
- `description` is used by the console to provide information about this task in the component.
- `instillShortDescription` is a concise version of `description`, used to fit smaller spaces such as a component form field. If this value is empty, the `description` value will be used.
- `instillUIOrder` defines the order in which the properties will be rendered in the component.
- `instillUIMultiline` indicates whether the text field in the component is multiline.
Properties within `input` Objects
- `instillEditOnNodeFields` determines whether this field will appear at the forefront of the component. Optional properties can be set in the advanced configuration.
- `instillAcceptFormats` is an array indicating the data types of acceptable input fields. It should be an array of Instill Format.
- `instillUpstreamTypes` defines how an input property can be set: as a direct value, a reference to another value in the pipeline, or a combination of both (e.g., `${variable.name}` or `my dear ${variable.name}`).
- `instillSecret` indicates the data must reference secrets and cannot be used in plaintext.
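Putting these together, an input property for a hypothetical vendor API key might be declared as follows (an illustrative sketch, not taken from an existing component):

```json
{
  "api-key": {
    "title": "API Key",
    "description": "The key used to authenticate against the vendor API",
    "instillSecret": true,
    "instillAcceptFormats": [
      "string"
    ],
    "instillUpstreamTypes": [
      "reference"
    ],
    "instillUIOrder": 0,
    "type": "string"
  }
}
```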
Properties within `output` Objects
- `instillFormat` indicates the data type of the output field, which should be one of `number`, `integer`, `string`, `object`, `boolean`, or a MIME type. Please refer to Instill Format for more details.
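For instance, a task that returns an image might declare its output property along these lines (an illustrative sketch; the property name and description are hypothetical):

```json
{
  "image": {
    "title": "Image",
    "description": "The generated image",
    "instillFormat": "image/png",
    "instillUIOrder": 0,
    "type": "string"
  }
}
```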
See the example recipe to understand how these fields map to the recipe of a pipeline when configured to use this operator.
For components that need to set up some configuration before execution, such as the `api-key` required by the component, `setup.json` can be used to describe these configurations. The format is the same as the `input` objects in `tasks.json`.
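As a minimal sketch, a hypothetical `setup.json` with a single `api-key` property might look like this (property names and descriptions are illustrative):

```json
{
  "properties": {
    "api-key": {
      "description": "Fill in your vendor API key.",
      "instillSecret": true,
      "instillUIOrder": 0,
      "title": "API Key",
      "type": "string"
    }
  },
  "required": [
    "api-key"
  ],
  "title": "Setup",
  "type": "object"
}
```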
The pipeline communicates with components through the `IComponent` interface, defined in the `base` package. This package also defines base implementations for these interfaces, so the `hello` component will only need to override the following methods:
- `CreateExecution(ComponentExecution) (IExecution, error)` will return an implementation of the `IExecution` interface. A base execution implementation is passed in order to define only the behaviour of the `Execute` method.
- `Execute(context.Context, []*base.Job) error` is the most important function in the component. All the data manipulation will take place here.
Paste the following code into a `main.go` file in `operator/hello/v0`:
```go
package hello

import (
	"context"
	"fmt"
	"sync"

	_ "embed"

	"github.com/instill-ai/component/base"
)

const (
	taskGreet = "TASK_GREET"
)

var (
	//go:embed config/definition.json
	definitionJSON []byte
	//go:embed config/tasks.json
	tasksJSON []byte

	once sync.Once
	comp *component
)

type component struct {
	base.Component
}

type execution struct {
	base.ComponentExecution
}

// Init returns an implementation of IComponent that implements the greeting
// task.
func Init(bc base.Component) *component {
	once.Do(func() {
		comp = &component{Component: bc}
		err := comp.LoadDefinition(definitionJSON, nil, tasksJSON, nil)
		if err != nil {
			panic(err)
		}
	})
	return comp
}

func (c *component) CreateExecution(x base.ComponentExecution) (base.IExecution, error) {
	if x.Task != taskGreet {
		return nil, fmt.Errorf("unsupported task")
	}

	return &execution{ComponentExecution: x}, nil
}

func (e *execution) Execute(ctx context.Context, jobs []*base.Job) error {
	return nil
}
```
The `hello` operator created in the previous section doesn't implement any logic. This section will add the greeting logic to the `Execute` method. Let's modify the following methods:
```go
// structpb is now used in this file: add
// "google.golang.org/protobuf/types/known/structpb" to the imports.
type execution struct {
	base.ComponentExecution

	execute func(*structpb.Struct) (*structpb.Struct, error)
}

func (c *component) CreateExecution(x base.ComponentExecution) (base.IExecution, error) {
	e := &execution{ComponentExecution: x}

	// A simple if statement would be enough in a component with a single
	// task. If the number of tasks grows, here is where the execution task
	// would be selected.
	switch x.Task {
	case taskGreet:
		e.execute = e.greet
	default:
		return nil, fmt.Errorf("unsupported task")
	}

	return e, nil
}

func (e *execution) Execute(ctx context.Context, jobs []*base.Job) error {
	// An execution might take several inputs. One result will be returned
	// for each one of them, containing the execution output for that set of
	// parameters.
	for _, job := range jobs {
		input, err := job.Input.Read(ctx)
		if err != nil {
			return err
		}

		output, err := e.execute(input)
		if err != nil {
			return err
		}

		err = job.Output.Write(ctx, output)
		if err != nil {
			return err
		}
	}

	return nil
}

func (e *execution) greet(in *structpb.Struct) (*structpb.Struct, error) {
	out := new(structpb.Struct)

	target := in.Fields["target"].GetStringValue()
	greeting := "Hello, " + target + "!"

	out.Fields = map[string]*structpb.Value{
		"greeting": structpb.NewStringValue(greeting),
	}

	return out, nil
}
```
The `errmsg` package allows us to attach messages to our errors.
```go
func (e *execution) greet(in *structpb.Struct) (*structpb.Struct, error) {
	out := new(structpb.Struct)

	greetee := in.Fields["target"].GetStringValue()
	if greetee == "Voldemort" {
		return nil, errmsg.AddMessage(fmt.Errorf("invalid greetee"), "He-Who-Must-Not-Be-Named can't be greeted.")
	}

	greeting := "Hello, " + greetee + "!"
	out.Fields = map[string]*structpb.Value{
		"greeting": structpb.NewStringValue(greeting),
	}

	return out, nil
}
```
The middleware in `pipeline-backend` will capture these messages in order to return human-friendly errors to the API clients and console users.
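As a rough sketch of the idea (the `errmsg.Message` accessor and import path are assumptions here, not guaranteed API):

```go
package main

import (
	"fmt"

	"github.com/instill-ai/x/errmsg" // assumed import path
)

// clientMessage sketches what such a middleware does: it returns the
// end-user message attached via errmsg.AddMessage, falling back to a
// generic one. errmsg.Message is assumed to exist with this signature.
func clientMessage(err error) string {
	if msg := errmsg.Message(err); msg != "" {
		return msg
	}
	return "An unexpected error occurred."
}

func main() {
	err := errmsg.AddMessage(fmt.Errorf("invalid greetee"), "He-Who-Must-Not-Be-Named can't be greeted.")
	fmt.Println(clientMessage(err))
}
```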
Before testing your component in 💧 Instill VDP, we can unit test its behaviour. The following code covers the newly added logic by replicating how the `pipeline-backend` workers execute the component logic:
```go
package hello

import (
	"context"
	"testing"

	"go.uber.org/zap"
	"google.golang.org/protobuf/types/known/structpb"

	qt "github.com/frankban/quicktest"

	"github.com/instill-ai/component/base"
)

func TestOperator_Execute(t *testing.T) {
	c := qt.New(t)
	ctx := context.Background()

	bc := base.Component{Logger: zap.NewNop()}
	comp := Init(bc)

	c.Run("ok - greet", func(c *qt.C) {
		exec, err := comp.CreateExecution(base.ComponentExecution{
			Component: comp,
			Task:      taskGreet,
		})
		c.Assert(err, qt.IsNil)

		pbIn, err := structpb.NewStruct(map[string]any{"target": "bolero-wombat"})
		c.Assert(err, qt.IsNil)

		ir, ow, eh, job := base.GenerateMockJob(c)
		ir.ReadMock.Return(pbIn, nil)
		ow.WriteMock.Optional().Set(func(ctx context.Context, output *structpb.Struct) (err error) {
			// Check the greeting in the output.
			greeting := output.Fields["greeting"].GetStringValue()
			c.Check(greeting, qt.Equals, "Hello, bolero-wombat!")
			return nil
		})
		eh.ErrorMock.Optional()

		err = exec.Execute(ctx, []*base.Job{job})
		c.Assert(err, qt.IsNil)
	})
}

func TestOperator_CreateExecution(t *testing.T) {
	c := qt.New(t)

	bc := base.Component{Logger: zap.NewNop()}
	comp := Init(bc)

	c.Run("nok - unsupported task", func(c *qt.C) {
		task := "FOOBAR"

		_, err := comp.CreateExecution(base.ComponentExecution{
			Component: comp,
			Task:      task,
		})
		c.Check(err, qt.ErrorMatches, "unsupported task")
	})
}
```
The last step before being able to use the component in 💧 Instill VDP is loading the `hello` operator. This is done in the `Init` function in `component.go`:
```go
package operator

import (
	// ...
	"github.com/instill-ai/component/operator/hello/v0"
)

// ...

func Init(logger *zap.Logger) *Store {
	baseComp := base.Component{Logger: logger}

	once.Do(func() {
		store = &Store{
			componentUIDMap: map[uuid.UUID]*component{},
			componentIDMap:  map[string]*component{},
		}
		// ...
		store.Import(hello.Init(baseComp))
	})

	return store
}
```
Re-run your local `pipeline-backend` build:
```bash
$ make stop && make dev
$ docker exec -d pipeline-backend go run ./cmd/worker # run without -d in a separate terminal if you want to access the logs
$ docker exec pipeline-backend go run ./cmd/main
```
Head to the console at http://localhost:3000/ (the default password is `password`) and create a pipeline.
- In the trigger component, add a `who` text field.
- Create a hello operator and reference the trigger input field by adding `${trigger.who}` to the `target` field.
- In the response component, add a `greeting` output value that references the hello output by introducing `${hello-0.output.greeting}`.
If you introduce a `Wombat` string value in the trigger component and run the pipeline, you should see `Hello, Wombat!` in the response.
The created pipeline will have the following recipe:
```json
{
  "variable": {
    "who": {
      "title": "Who",
      "description": "Who should be greeted?",
      "instillFormat": "string",
      "instillUIOrder": 0,
      "instillUIMultiline": false
    }
  },
  "output": {
    "greeting": {
      "title": "Greeting",
      "description": "",
      "value": "${hello-0.output.greeting}",
      "instillUIOrder": 0
    }
  },
  "component": {
    "hello-0": {
      "type": "hello",
      "task": "TASK_GREET",
      "input": {
        "target": "${variable.who}"
      },
      "condition": ""
    }
  }
}
```
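You can also trigger the pipeline programmatically through the route shown in the earlier sequence diagram. The host, port, API version prefix, and payload shape below are assumptions for illustration; check the API reference for the exact contract:

```bash
$ curl -X POST 'http://localhost:8080/v1beta/users/<user>/pipelines/<pipeline-id>/trigger' \
    -H 'Content-Type: application/json' \
    -d '{"inputs": [{"who": "Wombat"}]}'
```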
Documentation helps users integrate the component in their pipelines. A good component definition will have clear names for its fields, which will also contain useful descriptions. The information described in `definition.json` and `tasks.json` is enough to understand how a component should be used, and `compogen` is a tool that parses the component configuration and builds a `README.mdx` file displaying this information in a human-readable way. To generate the document, just add the following line at the top of `operator/hello/v0/main.go`:
```go
//go:generate compogen readme ./config ./README.mdx
```
Then, go to the base of the `component` repository and run:
```bash
$ make build-doc && make gen-doc
```
The documentation of the component can be extended with the `--extraContents` flag:
```bash
$ mkdir -p operator/hello/v0/.compogen
$ echo '### Final words
Thank you for reading!' > operator/hello/v0/.compogen/extra-bottom.mdx
```
```go
//go:generate compogen readme ./config ./README.mdx --extraContents bottom=.compogen/extra-bottom.mdx
```
Check `compogen`'s README for more information.
The version of a component is useful to track its evolution and to set expectations about its stability. When the interface of a component (defined by its configuration files) changes, its version should change following the Semantic Versioning guidelines.
- Patch versions are intended for bug fixes.
- Minor versions are intended for backwards-compatible changes, e.g., a new task or a new input field with a default value.
- Major versions are intended for backwards-incompatible changes.
  - At this point, since there might be pipelines using the previous version, a new package MUST be created. E.g., `operator/json/v0` -> `operator/json/v1`.
- Build and pre-release labels are discouraged, as components are shipped as part of 💧 Instill VDP and they aren't likely to need such fine-grained version control.
It is recommended to start a component at `v0.1.0`. A major version 0 is intended for rapid development.
The `releaseStage` property in `definition.json` indicates the stability of a component.
- A component skeleton (with only the minimal configuration files and a dummy implementation of the interfaces) may use the Coming Soon or Open For Contribution stages in order to communicate publicly about upcoming components. The major and minor versions in this case MUST be 0.
- Alpha pre-releases are used in initial implementations, intended to gather feedback and issues from early adopters. Breaking changes are acceptable at this stage.
- Beta pre-releases are intended for stable components that don't expect breaking changes.
- General availability indicates production readiness. Broad adoption of the beta version in production indicates readiness for the transition to GA.
The typical version and release stage evolution of a component might look like this:
| Version | Release Stage         |
| ------- | --------------------- |
| 0.1.0   | `RELEASE_STAGE_ALPHA` |
| 0.1.1   | `RELEASE_STAGE_ALPHA` |
| 0.1.2   | `RELEASE_STAGE_ALPHA` |
| 0.2.0   | `RELEASE_STAGE_ALPHA` |
| 0.2.1   | `RELEASE_STAGE_ALPHA` |
| 0.3.0   | `RELEASE_STAGE_BETA`  |
| 0.3.1   | `RELEASE_STAGE_BETA`  |
| 0.4.0   | `RELEASE_STAGE_BETA`  |
| 1.0.0   | `RELEASE_STAGE_GA`    |
Please take these general guidelines into consideration when you are sending a PR:
- Fork the Repository: Begin by forking the repository to your GitHub account.
- Create a New Branch: Create a new branch to house your work. Use a clear and descriptive name, like `<your-github-username>/<what-your-pr-about>`.
- Make and Commit Changes: Implement your changes and commit them. We encourage you to follow these best practices for commits to ensure an efficient review process (see the example commit message after this list):
  - Adhere to the conventional commits guidelines for meaningful commit messages.
  - Follow the 7 rules of commit messages for well-structured and informative commits.
  - Rearrange commits to squash trivial changes together, if possible. Utilize git rebase for this purpose.
- Push to Your Branch: Push your branch to your GitHub repository: `git push origin feat/<your-feature-name>`.
- Open a Pull Request: Initiate a pull request to our repository. Our team will review your changes and collaborate with you on any necessary refinements.
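For example, a commit message that satisfies both sets of guidelines might look like:

```
feat(operator): add hello greeting operator

Add a hello operator that greets a configurable target, serving as a
template for new component contributions.
```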
When you are ready to send a PR, we recommend first opening a draft one. This will trigger a number of test workflows that run a thorough test suite on multiple platforms. Once the tests have passed, you can mark the PR as ready for review to notify the codebase owners. We appreciate your effort to pass the integration tests, which ensure the sanity of your PR with respect to the entire scope of 🔮 Instill Core.
Your contributions make a difference. Let's build something amazing together!