List of crawlruns for an extractor

GET /extractors/:extractorId/crawlruns

Returns the crawl runs recorded for the specified extractor.

Request

Path Parameters

    extractorId uuid required

Query Parameters

    _apikey string required

    You can find your API key in the Import.io dashboard under User Settings.

    _perpage integer

    Number of items to return per page

    _page integer

    Page number to return (default is 1)

    _sort string

    Field name to sort by.

    To sort by created time, use _sort=meta_created_at.

    _sortDirection string

    Possible values: [ASC, DESC]

    Sort order.

    Default is DESC.
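
As an illustration of how the path and query parameters above fit together, here is a minimal Python sketch that assembles the request URL. The base URL `https://api.example.com` and the helper name `build_crawlruns_url` are assumptions for this example; this page does not state the API host, so substitute the one for your account.

```python
from urllib.parse import quote, urlencode

# Hypothetical host: this page does not state the real API base URL.
BASE_URL = "https://api.example.com"


def build_crawlruns_url(extractor_id, api_key, per_page=None, page=None,
                        sort=None, sort_direction=None, base_url=BASE_URL):
    """Build the GET URL for listing the crawl runs of one extractor."""
    # _sortDirection only accepts the two documented values.
    if sort_direction is not None and sort_direction not in ("ASC", "DESC"):
        raise ValueError("_sortDirection must be ASC or DESC")

    # _apikey is required; the remaining parameters are sent only when set.
    params = {"_apikey": api_key}
    optional = {
        "_perpage": per_page,
        "_page": page,
        "_sort": sort,
        "_sortDirection": sort_direction,
    }
    params.update({k: v for k, v in optional.items() if v is not None})

    path = f"/extractors/{quote(extractor_id, safe='')}/crawlruns"
    return f"{base_url}{path}?{urlencode(params)}"
```

The resulting string can be passed to any HTTP client as a GET request. Omitted optional parameters are left out of the query string so that the server-side defaults (page 1, DESC) apply.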

Responses

List of crawlruns retrieved successfully

Schema

  • Array [

  • guid uuid

    Unique identifier for the crawl run.

    _meta

    object

    [internal] The metadata of the object

    timestamp int64 required

    The timestamp of the last change to the object

    creationTimestamp int64

    The timestamp of the object's creation

    patchTimestamp int64

    The timestamp of the last patch to the object

    objectGuid uuid

    The identifier of the object

    lastEditorGuid uuid

    The identifier of the last editor of the object

    ownerGuid uuid

    The identifier of the owner of the object

    creatorGuid uuid

    The identifier of the creator of the object

    orgGuid uuid

    The identifier of the organization the object belongs to

    deleted boolean

    The flag indicating if the object is deleted

    runtimeConfigId uuid

    [internal] Identifier for the runtime configuration used during the crawl run.

    extractorId uuid

    Identifier of the extractor that was executed.

    policyId uuid

    [internal] Policy used by this run. If null, the default policy is used.

    previousRunId uuid

    Chained crawl run that this run is based on. If null, this is the first run.

    startedAt date-time

    Timestamp marking the start of the crawl run, in milliseconds since Unix epoch.

    stoppedAt date-time

    Timestamp marking the end of the crawl run, in milliseconds since Unix epoch.

    inProgress int32

    Total number of urls in progress.

    requeued int32

    Total number of urls requeued.

    totalUrlCount int32

    Total number of urls to process.

    successUrlCount int32

    Number of urls processed successfully so far.

    failedUrlCount int32

    Number of failed urls processed so far.

    rowCount int32

    Number of rows returned so far.

    screenCaptureCount int32

    Number of screen captures taken.

    htmlExtractionCount int32

    Number of HTML extractions performed.

    deniedByRobotsCount int32 deprecated

    Number of urls denied by robots.txt.

    redactedRowCount int32 deprecated

    Number of rows with redacted PII.

    queryCount int32

    Number of queries we used to process this run.

    proxyUsage int64

    Amount of data in bytes transferred via premium proxy during this run.

    state string

    Possible values: [PENDING, STARTED, CANCELLED, FINISHED, FAILED]

    State of this run, e.g. STARTED, FINISHED, FAILED.

    urlListId uuid

    [internal] Id of Extractor urlList attachment when CrawlRun created

    inputsId uuid

    [internal] Id of Extractor inputs attachment when CrawlRun created

    triggerEvent string

    Possible values: [SCHEDULED, ADHOC]

    [internal] Event that triggered this run

    errorType string

    An enumerated error type for a failed run.

    errorMessage string

    Error message for a failed run.

    json uuid

    [internal] Attachment containing json of the rows.

    csv uuid deprecated

    [internal] Attachment containing csv of the rows.

    xlsx uuid deprecated

    [internal] Attachment containing xlsx of the rows.

    log uuid deprecated

    [internal] Attachment containing log of the url results processed.

    diffId uuid

    [internal] Attachment containing newline-delimited JSON detailing the diff between this run and the previous one.

    sample uuid deprecated

    [internal] Attachment containing json array of sample of rows for previewing the data.

    files uuid

    [internal] Attachment containing a zip file that contains assets downloaded as part of the crawl.

    downloadSummary

    object

    A summary of the downloaded assets for this crawl.

    totalFiles int64

    The number of files downloaded

    totalSizeBytes int64

    The total size of the files downloaded in bytes

    crawlRunInputsId uuid

    [internal] An attachment for the line separated JSON of inputs for this crawl run. Utilized in the API case, where inputs change run to run.

    crawlRunUrlListId uuid

    [internal] An attachment for line separated list of urls for this crawl run. Utilized in the API case, where url lists change run to run.

    crawlRunStats uuid

    [internal] An attachment for quality metrics for the crawl run.

    webhooks

    object[]

    Webhooks to call when run completes, overrides any extractor settings.

  • Array [

  • url string required

    The url of the webhook

    headers

    object

    The headers to use for webhook notifications

    property name* object

    The headers to use for webhook notifications

    payload string

    The pre-configured payload to send with each notification

  • ]

  • jsonFile string

    [internal]

    csvFile string

    [internal]

    xlsxFile string

    [internal]

    logFile string

    [internal]

    sampleFile string

    [internal]

    archive string

    [internal]

    inputUri string

    [internal]

    notificationUri string

    [internal]

  • ]
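
To show how the schema above might be consumed, the following Python sketch reduces each crawl run in a response to a short progress summary. It assumes the response body is a JSON array of objects with the field names listed above. The helper name `summarize_run`, the choice of which states count as terminal, and the sample body are all assumptions made for illustration, not real API output.

```python
import json

# Assumption: a run in one of these states will not change any further.
TERMINAL_STATES = {"CANCELLED", "FINISHED", "FAILED"}


def summarize_run(run):
    """Reduce one crawl run object to a few progress fields."""
    total = run.get("totalUrlCount") or 0
    # Urls count as handled once they have either succeeded or failed.
    handled = (run.get("successUrlCount") or 0) + (run.get("failedUrlCount") or 0)
    return {
        "guid": run.get("guid"),
        "state": run.get("state"),
        "finished": run.get("state") in TERMINAL_STATES,
        # Fraction of urls handled, or None when the total is unknown or zero.
        "progress": handled / total if total else None,
        "rowCount": run.get("rowCount") or 0,
    }


# Made-up sample body for illustration only.
sample_body = """
[
  {"guid": "run-1", "state": "STARTED", "totalUrlCount": 10,
   "successUrlCount": 4, "failedUrlCount": 1, "rowCount": 40},
  {"guid": "run-2", "state": "FINISHED", "totalUrlCount": 0, "rowCount": 0}
]
"""

summaries = [summarize_run(run) for run in json.loads(sample_body)]
for summary in summaries:
    print(summary)
```

Fields are read with `.get()` because only some schema fields are marked required, so a given run may omit the counters.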