Manage Cell Routing in DOC

Overview

Crawl runs belonging to a source or flow in DOC are started using the Run API by default. To opt into cell routing, define an extraction cell at the system level in DOC. An extraction cell captures the information required to construct the cell route endpoint (run.{cell number}.{environment}.{availability zone}.import.internal).
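As a minimal sketch of how the endpoint pattern above could be assembled, the following Python fills in the three placeholders. The ExtractionCell class, its field names, and the example values are hypothetical and are not taken from DOC's actual schema; only the endpoint pattern itself comes from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionCell:
    """Hypothetical model of an extraction cell (field names are assumptions)."""
    cell_number: int
    environment: str         # value format is an assumption, e.g. "staging"
    availability_zone: str   # value format is an assumption, e.g. "us-east-1a"
    active: bool = True      # only active cells are considered for use


def cell_route_endpoint(cell: ExtractionCell) -> str:
    # Follows the documented pattern:
    # run.{cell number}.{environment}.{availability zone}.import.internal
    return (
        f"run.{cell.cell_number}.{cell.environment}."
        f"{cell.availability_zone}.import.internal"
    )


if __name__ == "__main__":
    print(cell_route_endpoint(ExtractionCell(3, "staging", "us-east-1a")))
```

The exact casing and format of the environment and availability zone segments are not specified here, so the example values should be treated as placeholders.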

An extraction cell must be marked as active to be considered for use.

Use

A source or flow will use a defined extraction cell if either of the following conditions is met:

  1. An Env Cell Route corresponding to the environment of the source’s or flow’s project is present.

  2. A Project Cell Route is defined for the source’s or flow’s project. (Optionally, a project cell route can specify a domain, which is matched against the domain parameter on a source.)

Permissions

To have a read-only view of extraction cells, env cell routes, and project cell routes, a user must have the SUPER_ADMIN user role. Users with the DEVELOPER role (which should be limited to engineering) can create and edit all three.
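The role rules above can be summarised in a short sketch. The function names are hypothetical, and the sketch assumes that DEVELOPER's create/edit rights also imply the ability to view, which this page does not state explicitly.

```python
from enum import Enum
from typing import Set


class Role(Enum):
    SUPER_ADMIN = "SUPER_ADMIN"
    DEVELOPER = "DEVELOPER"


def can_view_cell_routing(roles: Set[Role]) -> bool:
    # SUPER_ADMIN grants a read-only view.
    # Assumption: DEVELOPER can also view, since it can create and edit.
    return Role.SUPER_ADMIN in roles or Role.DEVELOPER in roles


def can_edit_cell_routing(roles: Set[Role]) -> bool:
    # Only DEVELOPER can create or edit extraction cells,
    # env cell routes, and project cell routes.
    return Role.DEVELOPER in roles
```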

Env Cell Routes

Env Cell Routes are defined at the system level and are associated with a project environment (DEV, STAGING, PRODUCTION). Only one env cell route can be defined per environment, and env cell routes can be deleted via the API if necessary. Once an env cell route is created, every project (system-wide) with the same environment will use that extraction cell when starting sources or flows. Env cell routes rank lower than project cell routes: if a project cell route is defined, it is used instead of the env cell route.

[Image: envcellroutemodal]

Env Cell Routes can be viewed from the extraction cells page by expanding an extraction cell. Env Cell Routes belonging to an inactive cell are disregarded when a crawl run is started from DOC.

[Image: extractioncellshome]

Project Cell Routes

Linking an extraction cell at the project level is the most specific way to link and takes precedence over an env cell route. At the project level, a cell route can be defined for the whole project or for a specific domain; a domain-specific route takes precedence over a project-wide one. Domain refers to a collection parameter that is defined on a source. For instance, a project cell route could route all doom.import.io sources, and all other sources would use whatever default routing is available. Leaving domain blank means that all flows and sources in the project will use the defined project cell route.
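The precedence described on this page (domain-specific project route, then project-wide route, then env cell route, then the default Run API) can be sketched as follows. All class and function names are hypothetical and do not reflect DOC's real implementation. The sketch also assumes that a route pointing at an inactive cell is skipped so that resolution falls through to the next level; this page only states that inactive cells are not considered for use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    endpoint: str
    active: bool


@dataclass(frozen=True)
class ProjectCellRoute:
    project_id: str
    cell: Cell
    domain: Optional[str] = None  # None means the route covers the whole project


@dataclass(frozen=True)
class EnvCellRoute:
    environment: str  # DEV, STAGING, or PRODUCTION
    cell: Cell


def resolve_cell(
    project_id: str,
    environment: str,
    source_domain: Optional[str],
    project_routes: List[ProjectCellRoute],
    env_routes: List[EnvCellRoute],
) -> Optional[Cell]:
    """Return the cell to route to, or None to use the default Run API."""
    # Keep only this project's routes whose cell is active.
    candidates = [
        r for r in project_routes
        if r.project_id == project_id and r.cell.active
    ]
    # 1. A domain-specific project route is the most specific match.
    if source_domain is not None:
        for route in candidates:
            if route.domain == source_domain:
                return route.cell
    # 2. A project-wide route (blank domain) comes next.
    for route in candidates:
        if route.domain is None:
            return route.cell
    # 3. The env cell route for the project's environment ranks lowest.
    for env_route in env_routes:
        if env_route.environment == environment and env_route.cell.active:
            return env_route.cell
    # 4. No route applies: the crawl run starts through the Run API as usual.
    return None
```

With a route for doom.import.io and a project-wide route on the same project, a source whose domain is doom.import.io resolves to the first cell, and every other source in the project resolves to the second.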

[Image: newprojectcell]

Cell Route Utilization

If a crawl run uses a particular cell route, the route is listed on the snapshot and captured in the snapshot’s crawl run body (see the cell endpoint on the snapshot below).

[Image: snapshotcellroute]