Pipelines

Pipelines are responsible for the whole text generation process, from uploading data, through generating text, to delivering the results.

Each pipeline is completely isolated from the others. Each can have entirely different data, rules, and results, but you can never reference data or blueprints from another pipeline.

A pipeline consists of:

Data Pools

Data pools help you organize different types of data that should not be mixed. If you only have one type of data, for example products, you only need one data pool. Learn more

Each data pool has its own upload storage, preprocessor and objects storage.

Uploads

The first stop of your (raw) data. Learn more

Preprocessor

Clean and transform your raw uploads with a preprocessor. Learn more

Objects

The data you uploaded and preprocessed, in the form of objects. Learn more

Fanout

Control how each data object is rendered. Learn more

Blueprints

Define rules for text generation. Learn more

Results

See the status of generation and its results. Learn more

TODO API, Settings

Studio vs Cockpit

While pipelines seem similar to projects in Cockpit, they are much more powerful and don't require you to think about collections or to worry about exactly what data to upload. In Studio, you should end up with far fewer pipelines than you had projects in Cockpit.

Data Upload Flow

The data flow in a pipeline is as follows:

  1. Uploads: You upload your data to the pipeline. This data is stored in the uploads storage.
  2. Preprocessor: The preprocessor is run on each upload. It transforms the data into a format that is ready for text generation.
  3. Objects: The preprocessed data is stored in the objects storage. Each object represents a single unit of data that can be used for text generation.
  4. Autogenerate: If autogenerate is enabled, each added or updated object automatically starts a text generation flow.
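The upload flow can be sketched in a few lines of Python. This is a minimal illustration only; the function names, field names, and storage structures below are hypothetical and not part of the Studio API:

```python
# Hypothetical sketch of the upload flow: uploads -> preprocessor -> objects.
# None of these names are actual Studio API calls.

def preprocess(upload: dict) -> dict:
    """Transform one raw upload into an object ready for text generation."""
    return {
        "id": upload["sku"],
        "name": upload["product_name"].strip().title(),
        "price": float(upload["price"]),
    }

# 1. Uploads: raw data lands in the uploads storage.
uploads_storage = [
    {"sku": "A-100", "product_name": "  red running shoe ", "price": "59.90"},
]

# 2. Preprocessor: runs on each upload.
# 3. Objects: the cleaned results are stored in the objects storage.
objects_storage = {obj["id"]: obj for obj in map(preprocess, uploads_storage)}

print(objects_storage["A-100"]["name"])  # -> Red Running Shoe
```

Each object is keyed by a stable identifier here, so a re-upload of the same SKU updates the existing object rather than creating a duplicate.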

Text Generation Flow

When you manually initiate text generation by pressing "Generate All", or when autogenerate is enabled, the following steps are taken:

  1. Fanout: The fanout script is run for each object that has requested a render. The fanout script returns a list of render job definitions.
  2. Blueprints: For each render job, the specified blueprint is run with the specified language.
  3. Storage: The generated text is stored in the results storage.
  4. Delivery: The generated text is delivered to the user via webhook, if one is specified in the job config.

Through all these steps, you can monitor the progress and results in the results section of the pipeline.
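The four steps above can be sketched as follows. Again, this is only an illustration: the shape of a render job definition, the function names, and the in-memory storages are assumptions, not the actual Studio API:

```python
# Hypothetical sketch of the text generation flow:
# fanout -> blueprints -> storage -> delivery.

def fanout(obj: dict) -> list[dict]:
    """Return one render job definition per desired text variant."""
    return [
        {"object_id": obj["id"], "blueprint": "product_description", "language": lang}
        for lang in ("en", "de")
    ]

def run_blueprint(job: dict, obj: dict) -> str:
    """Stand-in for running a blueprint in the requested language."""
    return f"[{job['language']}] Text for {obj['name']}"

obj = {"id": "A-100", "name": "Red Running Shoe"}

results_storage = []
for job in fanout(obj):                            # 1. Fanout
    text = run_blueprint(job, obj)                 # 2. Blueprints
    results_storage.append({**job, "text": text})  # 3. Storage
    # 4. Delivery: POST each result to the webhook, if one is
    #    specified in the job config (omitted in this sketch).

print(len(results_storage))  # -> 2
```

Note that fanout decides how many texts one object produces: returning two job definitions here yields one English and one German result for the same object.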