Deploying Sources

As of Extractor Studio CLI version 1, Extractor Developers can deploy Extractors directly to Sources on Workbench.


Before deploying Sources from an Extractor library, you must have the following:

User Token

Before running any CLI command beginning with source:deploy, you must have a Workbench User Token configured on your CLI.

Once you have your token, configure it by running:

import-io config

Org Config

Before you can deploy any Sources to a Collection, the Collection and its ID must first be registered in the Org’s config.yaml file under the Org’s directory within the Extractor library.

In the example below, the Collection with the ID 49ba21e0-5cc5-43f4-8f7d-40420907ffcf is configured with the alias product_details for the Org. product_details does not have to be the Collection slug. You could call it foo if you’d like.

legacyAccountId: 8d17b99a-9f09-4e50-9e3c-175d24cfcb1f
  product_details: 49ba21e0-5cc5-43f4-8f7d-40420907ffcf

Create the Source

All of the import-io source:deploy commands deploy to existing Sources on Workbench. They will not create a Source on Workbench for you; a command fails if the --source slug you provide does not exist.

Creating a Source on Workbench does not require an Extractor ID. You may create a Source with this field empty if you need to.


All of the following examples use the import-io source:deploy command, which is defined as:

import-io source:deploy --org <org slug> --source <source slug> --collection <collection slug> --prefix <prefix> --robot <robot>

Before going through the examples, it’s important to know which flags are available on this command and what each one does:

  • --prefix (-p): Directory prefix for the extractor.yaml files you wish to deploy

    • Uses the prefix as a glob pattern to search for all matching extractor.yaml files

    • For each extractor.yaml file found, builds and deploys it according to the other flags, if provided (collection and source slugs)

    • Multiple prefixes can be provided

    • If no prefix is provided, defaults to "." or "all"

  • --org (-o): Org you wish to deploy to

    • If no Org is provided, deploys to all Orgs

    • Is required when --collection is specified

  • --collection: Collection you wish to deploy to

    • If not provided, defaults to all Collections in each extractor.yaml

    • If provided, an --org flag must be provided

  • --source (-s): Source slug to deploy to

    • If not provided, deploys to all Sources registered for that extractor.yaml (within a Collection)

      • For example, the same Extractor may be deployed to 3 Sources within a Collection. Omitting the --source flag would deploy a new Extractor version to all 3 Sources.

  • --robot (-r): Robot filter

    • If an extractor.yaml’s robot does not match the --robot filter, its build and deployment are skipped.

    • Helps when applying a patch or bug fix to all sources that implement a robot.

    • Allows a broader prefix to be used
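The flag rules above can be sketched as a small shell helper. This is only an illustration: build_deploy_cmd is a hypothetical function, not part of the import-io CLI, and it prints the command for review rather than running it. It uses only the flags listed above and enforces the one stated dependency (--collection requires --org).

```shell
# Hypothetical helper (not part of import-io): prints a source:deploy
# command built from the given values. An empty argument means the
# corresponding flag is omitted.
# Usage: build_deploy_cmd ORG SOURCE COLLECTION PREFIX ROBOT
build_deploy_cmd() {
  _org=$1
  _source=$2
  _collection=$3
  _prefix=$4
  _robot=$5

  # Rule from the flag list: --collection requires --org.
  if [ -n "$_collection" ] && [ -z "$_org" ]; then
    echo "error: --collection requires --org" >&2
    return 1
  fi

  _cmd="import-io source:deploy"
  if [ -n "$_org" ]; then _cmd="$_cmd --org $_org"; fi
  if [ -n "$_source" ]; then _cmd="$_cmd --source $_source"; fi
  if [ -n "$_collection" ]; then _cmd="$_cmd --collection $_collection"; fi
  if [ -n "$_prefix" ]; then _cmd="$_cmd --prefix $_prefix"; fi
  if [ -n "$_robot" ]; then _cmd="$_cmd --robot $_robot"; fi

  echo "$_cmd"
}

# Prints:
# import-io source:deploy --org my_org --source store_1 --collection products --prefix products/store_1
build_deploy_cmd my_org store_1 products products/store_1 ""
```

Printing the command first makes it easy to confirm how broad a deploy will be before running it, since every omitted flag widens the set of Extractors and Sources affected.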

Deploying a Source

Deploying a single Source can be achieved with a command similar to:

import-io source:deploy --org <org slug> --source <source slug> --collection <collection slug> --prefix <prefix>

The following example deploys the Source with the slug store_1 to the products Collection of the my_org Organization, searching the directory prefix products/store_1 for extractor.yaml files:

import-io source:deploy --org my_org --source store_1 --collection products --prefix products/store_1

As the command runs, it first builds the assets and caches them in the dist directory of the library (which should be gitignored). After the assets are built, it deploys to the Source.

As a result, the orgs/my_org/products/store_1/extractor.yaml file will look something like:

robot: product/details
      - store_1

The Source slug, store_1, is saved in the extractor.yaml. This helps us keep track of which Sources an Extractor is deployed to.

Deploying all Sources for a Collection

IMPORTANT: The Extractor library does not automatically know which extractor.yaml files are deployed to which Sources on Workbench. An extractor.yaml file must contain a reference to the Collection and Source before the CLI will try deploying there. Register the Source by editing the extractor.yaml file directly or by running the command with the --source flag as shown above.

In the above example we deployed a single Source to a Collection. What if the same Extractor were deployed to multiple Sources within that Collection? For instance, you might want two separate Sources whose Source Parameters dictate different behavior.

We can do that. Take the following example extractor.yaml file:

robot: product/details
      - store_1
      - store_2

This Extractor is deployed to two Sources within the same Collection. We can deploy to both of them with:

import-io source:deploy --org my_org --collection products --prefix products/store_1

This is nearly the same as the first example, except that we dropped the --source flag. The absence of this flag indicates that we want to deploy to all Sources within the specified Collection.

We could take this a step further and remove the --prefix flag:

import-io source:deploy --org my_org --collection product_details

This will deploy all Extractors that are tied to Sources within the product_details Collection for the my_org Organization.

Deploying all Sources that use this Robot

What if we had a bug in a Robot, or a change in behavior, that needed to be applied to all of our Extractors right away? We can do that by providing the --robot flag:

import-io source:deploy --robot product/details

This will deploy every Extractor that implements the product/details robot to all Sources registered in its extractor.yaml, though it might take a while in a large library.

The --robot flag can be combined with any of the other source:deploy flags to further filter which Extractors get deployed.
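Before running a broad --robot deploy, it can help to preview which extractor.yaml files declare that robot. The sketch below is an assumption-laden approximation, not the CLI's own logic: it assumes the robot is declared on its own line as "robot: <name>", as in the examples above, and list_extractors_for_robot is a hypothetical helper built only on standard find, grep, and sort.

```shell
# Hypothetical helper (not part of import-io): lists every extractor.yaml
# under ROOT that contains the exact line "robot: ROBOT".
# Usage: list_extractors_for_robot ROOT ROBOT
list_extractors_for_robot() {
  # -F treats the pattern as a fixed string, -x requires a whole-line match,
  # -l prints only the names of matching files.
  find "$1" -type f -name extractor.yaml \
    -exec grep -lxF "robot: $2" {} + | sort
}

# Example (run from the root of the Extractor library, assuming
# extractors live under orgs/ as in the path shown earlier):
#   list_extractors_for_robot orgs product/details
```

The CLI's actual matching may differ from this exact-line comparison, so treat the output as a rough estimate of the deploy's scope.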