List of crawlruns for an extractor

GET /extractors/:extractorId/crawlruns

Request

Path Parameters

extractorId uuid required

Query Parameters

_apikey string required

You can find your API key in the Import.io dashboard under User Settings

_perpage integer

Number of items to return per page

_page integer

Page number to return (default is 1)

_sort string

Field name to sort by.

To sort by created time, use _sort=meta_created_at.

_sortDirection string

Possible values: [ASC, DESC]

Sort order.

Default is DESC.
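
The path and query parameters above can be combined into a request URL roughly as follows. This is a minimal sketch, not an official client: the base URL is a placeholder (the page does not state the API host), the function names are hypothetical, and the rule that an empty array marks the last page is an assumption.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder host: the page does not say where the API is served from.
BASE_URL = "https://api.example.com"


def build_crawlruns_url(extractor_id, api_key, page=1, per_page=None,
                        sort=None, sort_direction=None, base_url=BASE_URL):
    """Build the GET URL for listing an extractor's crawl runs."""
    if sort_direction is not None and sort_direction not in ("ASC", "DESC"):
        raise ValueError("_sortDirection must be ASC or DESC")
    params = {"_apikey": api_key, "_page": page}
    # Optional parameters are only sent when the caller sets them.
    if per_page is not None:
        params["_perpage"] = per_page
    if sort is not None:
        params["_sort"] = sort
    if sort_direction is not None:
        params["_sortDirection"] = sort_direction
    path = f"/extractors/{quote(extractor_id, safe='')}/crawlruns"
    return f"{base_url}{path}?{urlencode(params)}"


def http_fetch(url):
    """Fetch a URL and decode the JSON body (expected to be an array)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_crawlruns(fetch_page, max_pages=100):
    """Yield crawl runs page by page.

    fetch_page(page_number) must return the decoded JSON array for that page.
    Assumption: an empty array means there are no more results.
    """
    for page in range(1, max_pages + 1):
        runs = fetch_page(page)
        if not runs:
            return
        yield from runs
```

A caller would typically pass `lambda page: http_fetch(build_crawlruns_url(extractor_id, api_key, page=page, sort="meta_created_at"))` as `fetch_page`. Keeping the fetch step injectable makes the paging logic easy to exercise without network access.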

Responses

List of crawlruns retrieved successfully

application/json

Schema

Array [

guid uuid

Unique identifier for the crawl run.

_meta

object

[internal] The metadata of the object

timestamp int64 required

The timestamp of the last change to the object

creationTimestamp int64

The timestamp of the object's creation

patchTimestamp int64

The timestamp of the last patch to the object

objectGuid uuid

The identifier of the object

lastEditorGuid uuid

The identifier of the last editor of the object

ownerGuid uuid

The identifier of the owner of the object

creatorGuid uuid

The identifier of the creator of the object

orgGuid uuid

The identifier of the organization the object belongs to

deleted boolean

The flag indicating if the object is deleted

runtimeConfigId uuid

[internal] Identifier for the runtime configuration used during the crawl run.

extractorId uuid

Identifier of the extractor that was executed.

policyId uuid

[internal] Policy used by this run. If null, the default policy is used.

previousRunId uuid

Chained crawl run that this run is based on. If null, this is the first run.

startedAt date-time

Timestamp marking the start of the crawl run, in milliseconds since Unix epoch.

stoppedAt date-time

Timestamp marking the end of the crawl run, in milliseconds since Unix epoch.

inProgress int32

Total number of urls in progress.

requeued int32

Total number of urls requeued.

totalUrlCount int32

Total number of urls to process.

successUrlCount int32

Number of urls successfully processed so far.

failedUrlCount int32

Number of failed urls processed so far.

rowCount int32

Number of rows returned so far.

screenCaptureCount int32

Number of screen captures taken.

htmlExtractionCount int32

Number of html extractions performed.

deniedByRobotsCount int32 deprecated

Number of urls denied by robots.txt.

redactedRowCount int32 deprecated

Number of rows with redacted PII.

queryCount int32

Number of queries used to process this run.

proxyUsage int64

Amount of data in bytes transferred via premium proxy during this run.

state string

Possible values: [PENDING, STARTED, CANCELLED, FINISHED, FAILED]

State of this run, e.g. STARTED, FINISHED, FAILED.

urlListId uuid

[internal] Id of the Extractor urlList attachment when the CrawlRun was created

inputsId uuid

[internal] Id of the Extractor inputs attachment when the CrawlRun was created

triggerEvent string

Possible values: [SCHEDULED, ADHOC]

[internal] Event that triggered this run

errorType string

An enumerated error type for a failed run.

errorMessage string

Error message for a failed run.

json uuid

[internal] Attachment containing json of the rows.

csv uuid deprecated

[internal] Attachment containing csv of the rows.

xlsx uuid deprecated

[internal] Attachment containing xlsx of the rows.

log uuid deprecated

[internal] Attachment containing log of the url results processed.

diffId uuid

[internal] Attachment containing newline-delimited json detailing the diff between this run and the previous one.

sample uuid deprecated

[internal] Attachment containing json array of sample of rows for previewing the data.

files uuid

[internal] Attachment containing a zip file that contains assets downloaded as part of the crawl.

downloadSummary

object

A summary of the downloaded assets for this crawl.

totalFiles int64

The number of files downloaded

totalSizeBytes int64

The total size of the files downloaded in bytes

crawlRunInputsId uuid

[internal] An attachment for the line separated JSON of inputs for this crawl run. Utilized in the API case, where inputs change run to run.

crawlRunUrlListId uuid

[internal] An attachment for line separated list of urls for this crawl run. Utilized in the API case, where url lists change run to run.

crawlRunStats uuid

[internal] An attachment for quality metrics for the crawl run.

webhooks

object[]

Webhooks to call when run completes, overrides any extractor settings.

Array [

url string required

The url of the webhook

headers

object

The headers to use for webhook notifications

property name* object

The headers to use for webhook notifications

payload string

The pre-configured payload to send with each notification

]

jsonFile string

[internal]

csvFile string

[internal]

xlsxFile string

[internal]

logFile string

[internal]

sampleFile string

[internal]

archive string

[internal]

inputUri string

[internal]

notificationUri string

[internal]

]

[
  {
    "guid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "_meta": {
      "timestamp": 0,
      "creationTimestamp": 0,
      "patchTimestamp": 0,
      "objectGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "lastEditorGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "ownerGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "creatorGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "orgGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "deleted": true
    },
    "runtimeConfigId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "extractorId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "policyId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "previousRunId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "startedAt": "2024-06-19T07:12:35.855Z",
    "stoppedAt": "2024-06-19T07:12:35.855Z",
    "inProgress": 0,
    "requeued": 0,
    "totalUrlCount": 0,
    "successUrlCount": 0,
    "failedUrlCount": 0,
    "rowCount": 0,
    "screenCaptureCount": 0,
    "htmlExtractionCount": 0,
    "queryCount": 0,
    "proxyUsage": 0,
    "state": "PENDING",
    "urlListId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "inputsId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "triggerEvent": "SCHEDULED",
    "errorType": "string",
    "errorMessage": "string",
    "json": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "diffId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "files": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "downloadSummary": {
      "totalFiles": 0,
      "totalSizeBytes": 0
    },
    "crawlRunInputsId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "crawlRunUrlListId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "crawlRunStats": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "webhooks": [
      {
        "url": "string",
        "headers": {},
        "payload": "string"
      }
    ],
    "jsonFile": "string",
    "csvFile": "string",
    "xlsxFile": "string",
    "logFile": "string",
    "sampleFile": "string",
    "archive": "string",
    "inputUri": "string",
    "notificationUri": "string"
  }
]
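
One way to turn an item of this array into a short status summary is sketched below. Several points are assumptions rather than documented behaviour: that CANCELLED, FINISHED and FAILED are the final states, that progress can be estimated as processed urls (successful plus failed) over `totalUrlCount`, and the helper names themselves. The schema types `startedAt` and `stoppedAt` as date-time and the example shows an ISO 8601 string, while the field descriptions mention milliseconds since Unix epoch, so the sketch accepts both forms.

```python
from datetime import datetime, timezone

# Assumption: these states are final. The page lists the possible values
# but does not say which of them are terminal.
TERMINAL_STATES = {"CANCELLED", "FINISHED", "FAILED"}


def parse_run_time(value):
    """Parse startedAt/stoppedAt into a timezone-aware datetime.

    Accepts an ISO 8601 string (as in the example) or epoch milliseconds
    (as in the field descriptions). Returns None when the value is absent.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_run(run):
    """Reduce one crawl run object to a few derived status fields."""
    total = run.get("totalUrlCount") or 0
    processed = (run.get("successUrlCount") or 0) + (run.get("failedUrlCount") or 0)
    started = parse_run_time(run.get("startedAt"))
    stopped = parse_run_time(run.get("stoppedAt"))
    duration = (stopped - started).total_seconds() if started and stopped else None
    return {
        "guid": run.get("guid"),
        "state": run.get("state"),
        "finished": run.get("state") in TERMINAL_STATES,
        # Estimated share of urls already handled; 0.0 when the total is unknown.
        "progress": processed / total if total else 0.0,
        "duration_seconds": duration,
    }
```

Using `.get()` throughout keeps the helper tolerant of fields that are null or missing, which seems likely for runs that are still pending.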

Unauthorized

application/json

Schema

code int

Internal error code.

error string deprecated

A message containing a brief description of the error.

message string

A message containing a brief description of the error.

{
  "code": 0,
  "message": "string"
}

Not found

application/json

Schema

code int

Internal error code.

error string deprecated

A message containing a brief description of the error.

message string

A message containing a brief description of the error.

{
  "code": 0,
  "message": "string"
}
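
Both error responses share the same shape, so a client can read them with one helper. The sketch below prefers `message` and falls back to the deprecated `error` field; the function name and the fallback text are hypothetical.

```python
import json


def error_message(body):
    """Format an error response body as "[code] message".

    Prefers `message`; falls back to the deprecated `error` field in case
    an older response still uses it.
    """
    payload = json.loads(body)
    code = payload.get("code")
    text = payload.get("message") or payload.get("error") or "unknown error"
    return f"[{code}] {text}"
```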
