Flows are configurations for delivering data to customers. In a Flow you specify a Collection's schedule, data destination, and collection/push timing.

The Flow configuration points to a Collection, but may also be filtered down to certain sets of data based on the parameters the source was run with.

Listing Flows

Flows can be listed by clicking the "Flows" menu item on the project sidebar.

You can filter and sort your Flows.

Creating a Data Flow

You can create a Data Flow by clicking the plus icon in the toolbar of the Data Flow list page.

When setting up a Data Flow you set:

  • a name

  • Flow Slug / ID

  • Type (SIMPLE or LEGACY)

  • Collection to execute

  • Source Filter by parameter value (SIMPLE Flows only)

  • S3 Configuration / Chunks

  • a cron expression for when to start the collection (in UTC)

  • the number of hours within which to aim to collect the data

  • the number of hours within which to aim to push the data

  • the number of hours after which to consider the delivery closed

  • whether or not a flow is active

A data flow can consist of just a single collection.
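To make the timing settings concrete, here is a sketch of the delivery timeline for a hypothetical Flow that starts daily at 06:00 UTC (cron expression `0 6 * * *`). The 12/18/24-hour values are assumptions for illustration, not product defaults:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values: the start time and hour settings below are
# assumptions for illustration, not product defaults.
start = datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc)  # cron "0 6 * * *"
collect_within = 12  # hours within which to aim to collect the data
push_within = 18     # hours within which to aim to push the data
close_after = 24     # hours after which the delivery is considered closed

collect_deadline = start + timedelta(hours=collect_within)
push_deadline = start + timedelta(hours=push_within)
close_time = start + timedelta(hours=close_after)

print("collect by:", collect_deadline.isoformat())  # 2024-01-15T18:00:00+00:00
print("push by:", push_deadline.isoformat())        # 2024-01-16T00:00:00+00:00
print("closes at:", close_time.isoformat())         # 2024-01-16T06:00:00+00:00
```

All three deadlines are offsets from the same cron start time, which is why the close setting should be the largest of the three.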

Simple Flow

A Simple Flow is managed by Workbench, meaning that your sources will be run for you. If a schedule is tied to the Flow, then we will run all of the sources according to the cron schedule. Otherwise, you can manually run the flow from the Flow Home Page.

If you choose a Simple Flow, you will have the option to define an S3 bucket as the source of your inputs, which is similar to the S3 Configuration on a Destination. You also have variables available that you can use for the parameters on the S3 buckets.

Each time you change the input URL, the inputs will be shuffled.

S3 Input and chunk configuration

[Figure: Flow S3 configuration]
Available Template Variables for S3:
  • :org

  • :project

  • :source

  • :flow

Available date/time Variables for S3:
  • :Y

  • :YYYY

  • :M

  • :MM

  • :W

  • :WW

  • :w and :ww — for a week that starts on Sunday instead of the ISO-standard Monday used by :W and :WW

  • :D

  • :DD

  • :H

  • :HH

  • :m

  • :mm

  • :s

  • :ss
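The exact expansion rules are not spelled out here, but a plausible sketch of how such a template might be rendered (token names taken from the lists above; the zero-padding of the two-letter forms and the substitution order are assumptions) is:

```python
import re
from datetime import datetime, timezone

def render_s3_path(template, org, project, source, flow, when):
    # Hypothetical substitution: the exact semantics are an assumption,
    # not taken from the product itself.
    iso_week = when.isocalendar()[1]        # :W / :WW — ISO week (Monday start)
    sunday_week = int(when.strftime("%U"))  # :w / :ww — week starting on Sunday
    values = {
        ":org": org, ":project": project, ":source": source, ":flow": flow,
        ":YYYY": f"{when.year:04d}", ":Y": str(when.year),
        ":MM": f"{when.month:02d}", ":M": str(when.month),
        ":WW": f"{iso_week:02d}", ":W": str(iso_week),
        ":ww": f"{sunday_week:02d}", ":w": str(sunday_week),
        ":DD": f"{when.day:02d}", ":D": str(when.day),
        ":HH": f"{when.hour:02d}", ":H": str(when.hour),
        ":mm": f"{when.minute:02d}", ":m": str(when.minute),
        ":ss": f"{when.second:02d}", ":s": str(when.second),
    }
    # Try longer tokens first so that :YYYY is not consumed by :Y, etc.
    pattern = re.compile(
        "|".join(sorted(map(re.escape, values), key=len, reverse=True)))
    return pattern.sub(lambda m: values[m.group(0)], template)

when = datetime(2024, 3, 5, 7, 4, 9, tzinfo=timezone.utc)
path = render_s3_path("s3://bucket/:org/:project/:YYYY/:MM/:DD/data.csv",
                      "acme", "prices", "src-1", "daily", when)
print(path)  # s3://bucket/acme/prices/2024/03/05/data.csv
```

Note that matching longer tokens first matters: a naive left-to-right replacement of :Y before :YYYY would corrupt the path.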


If you choose to chunk your snapshots, each source’s run will be divided into the specified number of chunks, which are scheduled sequentially. Chunking is configured by:

  • the number of chunks

  • the hours to collect the chunk

  • the hours to push the chunk data
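A rough sketch of what sequential chunk scheduling implies (the exact scheduler behavior is an assumption): each chunk's collection window begins when the previous chunk's window ends.

```python
from datetime import datetime, timedelta, timezone

def chunk_windows(start, num_chunks, collect_hours, push_hours):
    """Sketch of sequential chunking: each chunk's collection window
    begins when the previous chunk's collection window ends."""
    windows = []
    begin = start
    for i in range(num_chunks):
        collect_end = begin + timedelta(hours=collect_hours)
        push_end = collect_end + timedelta(hours=push_hours)
        windows.append((i + 1, begin, collect_end, push_end))
        begin = collect_end  # next chunk starts when this collection ends
    return windows

start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
for n, begin, collect_end, push_end in chunk_windows(start, 3, 4, 2):
    print(f"chunk {n}: collect {begin:%H:%M}-{collect_end:%H:%M}, "
          f"push by {push_end:%H:%M}")
# chunk 1: collect 00:00-04:00, push by 06:00
# chunk 2: collect 04:00-08:00, push by 10:00
# chunk 3: collect 08:00-12:00, push by 14:00
```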

Source Filters

You can filter the Sources included in your Flow by filtering with Source Parameters.

[Figure: Flow source filter]
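Conceptually, a parameter filter selects only the sources whose run parameters match the filter values. A minimal sketch, with invented source IDs and parameter names:

```python
# Hypothetical source runs; "country" and "category" are invented
# parameter names used only for illustration.
sources = [
    {"id": "src-1", "params": {"country": "US", "category": "shoes"}},
    {"id": "src-2", "params": {"country": "UK", "category": "shoes"}},
    {"id": "src-3", "params": {"country": "US", "category": "bags"}},
]

def matches(source, param_filter):
    # A source is included if every filtered parameter has a matching value.
    return all(source["params"].get(k) == v for k, v in param_filter.items())

selected = [s["id"] for s in sources if matches(s, {"country": "US"})]
print(selected)  # ['src-1', 'src-3']
```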

Legacy Flow

A Legacy Flow defines the start and end of a delivery. All sources that are run during this time will be included in the open delivery. Legacy Flows must be run from the Source Snapshots List page or from the legacy platform. All schedules are managed by the user or by the legacy application.

You can only have one open delivery at a time for a Legacy Flow, and an open delivery cannot be cancelled.

[Figure: new Legacy Flow]
You cannot schedule a Flow to run more than once an hour.
It is on the roadmap to bring chaining into Workbench as a particular type of data flow.

We create a delivery object in the app automatically when the data flow starts. This tracks the completion and quality of the delivery overall.

Currently the schedules need to be set within app.import.io. It is planned for the starting of data collection to be done within Workbench based upon this configuration.

When snapshots are either started or imported, we automatically link the snapshot to an active delivery if there is one applicable.

We will NOT automatically push data unless it is part of a flow.

When the data flow should be closed, we move the delivery to a CLOSED state, and any pending data collection is cancelled.

You can mark a Data Flow as inactive to remove it from action.

Deleting Flows

Flows cannot currently be deleted.