Deploying Sources

With version 2 or higher of the Extractor Studio CLI, Source Engineers can:

  • Deploy extractors directly to a source

  • Start crawl runs for a source (each run creates a snapshot)

New functionality will only be added to version 2 and higher.

Configure

Before you deploy your source, complete the following prerequisites:

User Token

Before running any of the CLI commands beginning with source: (for example, source:deploy), your DOC User Token must be configured locally.

Once you have your token, configure it by running:

import-io config

Create the Source

All of the import-io source:[action] commands deploy to existing sources in DOC. They will not create a source for you, and they fail if the --source slug you provide does not exist.

Creating a source in DOC does not require an Extractor ID. If you are deploying your extractor to a source for the first time, an extractor is created and its ID is added to the specified source.

Deploying a Source

The import-io source:deploy command will:

  • Tag and push to the git repo (the remote must be named “origin”). Tag format: deploy/${org}/${collectionSlug}/${sourceSlug}/${new Date().getTime()}/${path}

  • Update or create the runtime configuration

  • Update or create the policy

  • Update or create the extractor

  • Update the sample inputs linked to the extractor
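
To make the tag format in the first step concrete, here is a minimal shell sketch that assembles a name of the same shape. It is an illustration only, not the CLI's own code: the function name build_deploy_tag and all of the values (acme, products, example-shop, the path) are hypothetical placeholders. The timestamp segment mirrors new Date().getTime(), which is milliseconds since the Unix epoch.

```shell
#!/bin/sh
# Sketch only: assemble a tag name in the documented shape
#   deploy/<org>/<collectionSlug>/<sourceSlug>/<timestamp>/<path>
# build_deploy_tag is a hypothetical helper, not part of import-io.
build_deploy_tag() {
  local org="$1" collection_slug="$2" source_slug="$3" timestamp_ms="$4" extractor_path="$5"
  printf 'deploy/%s/%s/%s/%s/%s\n' \
    "$org" "$collection_slug" "$source_slug" "$timestamp_ms" "$extractor_path"
}

# date +%s returns seconds, so multiply by 1000 to approximate milliseconds.
now_ms=$(( $(date +%s) * 1000 ))

# Placeholder values; substitute your own slugs and extractor path.
build_deploy_tag "acme" "products" "example-shop" "$now_ms" "extractors/example-shop"
```

Because the timestamp changes on every call, each deploy yields a distinct tag name, which is presumably why it is part of the format.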

import-io source:deploy --help
Deploy a source and update the sample inputs

USAGE
  $ import-io source:deploy

OPTIONS
  -c, --collection=collection  (required) collection to deploy to
  -e, --path=path              (required) path to extractor directory
  -h, --help                   show CLI help
  -o, --org=org                (required) org slug
  -p, --project=project        (required) project to deploy to
  -s, --source=source          (required) source slug
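
As an illustration of how the required options fit together, the sketch below builds a source:deploy command from the flags listed in the help output above. The slugs and path are hypothetical placeholders, and the run wrapper with its DRY_RUN variable is an assumption added for this example (it is not part of import-io); by default it only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own.
ORG="acme"
PROJECT="retail"
COLLECTION="products"
SOURCE_SLUG="example-shop"
EXTRACTOR_PATH="extractors/example-shop"

# Hypothetical helper: print the command unless DRY_RUN=0, then execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# All five options below are marked (required) in the help output.
run import-io source:deploy \
  --org="$ORG" \
  --project="$PROJECT" \
  --collection="$COLLECTION" \
  --source="$SOURCE_SLUG" \
  --path="$EXTRACTOR_PATH"
```

Setting DRY_RUN=0 in the environment makes the wrapper execute the command instead of printing it.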

This command will fail if:

  • You don’t have the required permissions to deploy an extractor version

  • Your latest changes have not been committed to your current branch
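
The git-related failure can be caught before deploying. The following is a rough pre-flight sketch using standard git commands; preflight_check is a hypothetical helper, not something the CLI provides. It assumes that "not committed" can be approximated by a non-empty git status --porcelain, which also flags untracked files and so may be stricter than the CLI's own check. It also verifies the "origin" remote mentioned earlier. It cannot check DOC permissions.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the repository has an "origin" remote
# and no uncommitted (or untracked) changes. Does not check DOC permissions.
preflight_check() {
  local repo_dir="${1:-.}"
  if ! git -C "$repo_dir" remote get-url origin >/dev/null 2>&1; then
    echo "no 'origin' remote configured" >&2
    return 1
  fi
  # --porcelain prints one line per changed or untracked file; empty means clean.
  if [ -n "$(git -C "$repo_dir" status --porcelain)" ]; then
    echo "uncommitted changes present; commit them before deploying" >&2
    return 1
  fi
  return 0
}

if preflight_check .; then
  echo "working tree looks ready to deploy"
else
  echo "fix the issues above before running source:deploy"
fi
```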

Running a Source

To run an extractor, run its source, which in turn runs the underlying extractor. If --deploy is true (the default), the extractor is updated before a run starts with the sample inputs. After a run starts successfully, a link to the newly created snapshot is printed in your terminal.

import-io source:run --help
Run an extractor (creates a snapshot)

USAGE
  $ import-io source:run

OPTIONS
  -c, --collection=collection  (required) collection to deploy to
  -d, --deploy                 deploy before running, if false will run the currently saved extractor
  -e, --path=path              (required) path to extractor directory
  -h, --help                   show CLI help
  -o, --org=org                (required) org slug
  -p, --project=project        (required) project to deploy to
  -s, --source=source          (required) source slug
  -w, --wait                   whether or not to wait until the crawl run completes
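
As a sketch of a typical invocation, the helper below composes a source:run command line that waits for the crawl run to complete, using only flags from the help output above. The function source_run_command and every value passed to it are hypothetical. The help text does not show how to pass a false value for --deploy, so the example leaves that flag at its default.

```shell
#!/bin/sh
# Hypothetical helper: print a source:run command line, or fail if any of the
# five required values (org, project, collection, source, path) is missing.
source_run_command() {
  local org="$1" project="$2" collection="$3" source_slug="$4" extractor_path="$5"
  local value
  for value in "$org" "$project" "$collection" "$source_slug" "$extractor_path"; do
    if [ -z "$value" ]; then
      echo "org, project, collection, source and path are all required" >&2
      return 1
    fi
  done
  printf 'import-io source:run --org=%s --project=%s --collection=%s --source=%s --path=%s --wait\n' \
    "$org" "$project" "$collection" "$source_slug" "$extractor_path"
}

# Placeholder values; substitute your own slugs and extractor path.
source_run_command "acme" "retail" "products" "example-shop" "extractors/example-shop"
```

The function only prints the command, so it can be reviewed before being pasted into a terminal.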