Extractor information

GET /extractors/:extractorId

Request

Path Parameters

    extractorId string required

Query Parameters

    _apikey string required

    You can find your API key in the Import.io dashboard under User Settings.
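
As a minimal sketch of how the path and query parameters above fit together, the helper below assembles the request URL. The host in `BASE_URL` is a placeholder assumption (this page does not state the API host), and the function name `extractor_info_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Placeholder assumption: this page does not state the API host.
BASE_URL = "https://api.example.com"


def extractor_info_url(extractor_id: str, api_key: str, base_url: str = BASE_URL) -> str:
    """Build the GET /extractors/:extractorId URL with the _apikey query parameter."""
    if not extractor_id:
        raise ValueError("extractorId is required")
    if not api_key:
        raise ValueError("_apikey is required")
    # Percent-encode the path segment so reserved characters cannot alter the path.
    path = f"/extractors/{quote(extractor_id, safe='')}"
    query = urlencode({"_apikey": api_key})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    print(extractor_info_url("00000000-0000-0000-0000-000000000000", "YOUR_API_KEY"))
```

The resulting URL can be passed to any HTTP client as a GET request; because the key travels in the query string, it may be worth keeping it out of logs.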

Responses

Extractor information retrieved successfully

Schema

    guid uuid

    Unique identifier for the extractor.

    _meta

    object

    [internal] The metadata of the object

    timestamp int64 required

    The timestamp of the last change to the object

    creationTimestamp int64

    The timestamp of the object's creation

    patchTimestamp int64

    The timestamp of the last patch to the object

    objectGuid uuid

    The identifier of the object

    lastEditorGuid uuid

    The identifier of the last editor of the object

    ownerGuid uuid

    The identifier of the owner of the object

    creatorGuid uuid

    The identifier of the creator of the object

    orgGuid uuid

    The identifier of the organization the object belongs to

    deleted boolean

    The flag indicating if the object is deleted

    name string

    The extractor's name.

    tags string[]

    Tags associated with the extractor, used to start many extractors at once.

    fields

    object[]

    List of data fields the extractor is configured to retrieve.

  • Array [

    id uuid

    Unique identifier for the data field within the extractor configuration.

    name string

    Name of the data field targeted for extraction (e.g., 'Price', 'Image').

    type string

    Type of data this field represents (e.g., IMAGE, TEXT), guiding the extraction process.

    captureLink boolean

    Specifies if hyperlinks associated with this field should also be captured during extraction.

    downloadContent string deprecated

    Specifies if the extractor should download this field content.

  • ]

    latestConfigId uuid

    The last saved config.

    policyId uuid

    The associated policy.

    notifyMe boolean

    Whether the user is notified of new CrawlRuns.

    schemaId int32

    The schema used by this project.

    training uuid

    [internal] An attachment for the serialized JSON of the project's extraction state.

    urlList uuid

    [internal] An attachment for the line-separated list of URLs for this extractor.

    inputs uuid

    [internal] An attachment for the line separated JSON of inputs for this extractor.

    nextCrawlRunId uuid

    The ID of the next crawl run that has been submitted.

    archived boolean

    Flag to suppress display of extractor.

    parentExtractorGuid uuid

    GUID of the parent extractor used to generate URLs.

    parentReportGuid uuid

    GUID of the parent report used to generate URLs.

    parentTriggered boolean

    If set, extractor should be run when the parent extractor finishes.

    urlColumnId uuid

    URL column ID.

    isChained boolean

    Whether the extractor is chained.

    crawlRunDiffConfigId uuid deprecated

    Crawl run diff config.

    createDiff boolean

    If true, generate a diff report when this Extractor is run.

    createDataReport boolean

    If true, generate a data report when this Extractor is run.

    duplicateOfExtractorId uuid

    The ID of the extractor that this extractor is a duplicate of.

    maxRowsCrawlCount int32

    Maximum number of rows that this extractor can crawl; currently a chainsaw-only feature.

    authUrl string

    Authentication URL.

    residential boolean deprecated

    If true, crawl runs using this extractor should use a residential proxy pool.

    honorRobots boolean deprecated

    If true, crawl runs will honor rules in /robots.txt.

    iso3Country string

    An ISO 3 digit country code for the Proxy.

    proxyType string

    Proxy pool type (e.g., DATA_CENTER, RESIDENTIAL).

    webhooks

    object[]

    Webhooks to call when extractor completes.

  • Array [

    url string required

    The URL of the webhook

    headers

    object

    The headers to use for webhook notifications

    property name* object

    The headers to use for webhook notifications

    payload string

    The pre-configured payload to send with each notification

  • ]

    chainingConfig

    object[]

    Chain configuration for interaction extractors.

  • Array [

    parentInputType string required

    The type of the parent

    parentInputSubtype string[]

    A list of subtypes of the parent.

    parentInputGuid uuid required

    The ID of the parent extractor or report

    parentTriggered boolean

    Whether the extractor is triggered by its parent

    mapping

    object

    required

    The mapping for the chaining source

    property name*

    SourceColumn

    The mapping for the chaining source

    type string

    Extractor field type of source column

    columnId uuid

    ColumnId of source

    value object

    Optional override value for source column

  • ]

    addInteractiveInputsToOutput boolean

    If true and the extractor uses interactive inputs, the inputs will be copied to the output.

    metricsNotificationSettings

    object[]

    [internal] Notification Setting for Crawl Run Metrics.

  • Array [

    value double

    operation string

    Possible values: [LESS_THAN, GREATER_THAN, EQUALS, LESS_THAN_OR_EQUALS, GREATER_THAN_OR_EQUALS]

    period string

    Possible values: [AVERAGE_30_DAYS, LAST_CRAWL_RUN]

  • ]

    credentialsGuid uuid

    Pointer to encrypted credentials.
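
To illustrate how a client might read the schema above, the following sketch pulls the configured data fields out of a decoded response body and checks each chainingConfig entry for the keys the schema marks as required. The helper names are hypothetical, and the sample payload is invented for illustration (including the `parentInputType` value); it is not real API output.

```python
from typing import Any

# Keys the schema marks as required on each chainingConfig entry.
REQUIRED_CHAINING_KEYS = ("parentInputType", "parentInputGuid", "mapping")


def summarize_fields(extractor: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, type) pairs for each configured data field."""
    return [
        (field.get("name", ""), field.get("type", ""))
        for field in extractor.get("fields") or []
    ]


def missing_chaining_keys(extractor: dict[str, Any]) -> list[list[str]]:
    """For each chainingConfig entry, list the required keys it lacks."""
    return [
        [key for key in REQUIRED_CHAINING_KEYS if key not in entry]
        for entry in extractor.get("chainingConfig") or []
    ]


if __name__ == "__main__":
    # Invented sample payload; values are placeholders, not real API output.
    sample = {
        "name": "Example extractor",
        "fields": [
            {"name": "Price", "type": "TEXT", "captureLink": False},
            {"name": "Image", "type": "IMAGE", "captureLink": True},
        ],
        "chainingConfig": [
            {"parentInputType": "EXTRACTOR", "mapping": {}},  # lacks parentInputGuid
        ],
    }
    print(summarize_fields(sample))
    print(missing_chaining_keys(sample))
```

Both helpers treat absent or null arrays as empty, since only a few properties in the schema are marked required.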
