Crawlrun information
GET /crawlruns/:crawlrunId
Request
Path Parameters
Query Parameters
You can find your API key in the Import.io dashboard under User Settings
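As a rough illustration, the request URL might be assembled as follows. The host `https://api.example.invalid` and the query parameter name `_apikey` are assumptions made for this sketch; this page states neither, so substitute the values that apply to your account.

```python
from urllib.parse import quote, urlencode

# Assumption: placeholder host. The real base URL is not given on this page.
BASE_URL = "https://api.example.invalid"


def build_crawlrun_url(crawlrun_id: str, api_key: str, key_param: str = "_apikey") -> str:
    """Build the URL for GET /crawlruns/:crawlrunId.

    `key_param` is a hypothetical name for the API key query parameter.
    """
    # Percent-encode the identifier so it is safe to place in the path.
    path = f"/crawlruns/{quote(crawlrun_id, safe='')}"
    query = urlencode({key_param: api_key})
    return f"{BASE_URL}{path}?{query}"


if __name__ == "__main__":
    print(build_crawlrun_url("3fa85f64-5717-4562-b3fc-2c963f66afa6", "YOUR_API_KEY"))
```

The resulting URL can be passed to any HTTP client as a plain GET request.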
Responses
- 200
- 401
- 404
Crawlrun information retrieved successfully
- application/json
- Schema
- Example (from schema)
Schema
Unique identifier for the crawl run.
_meta
object
[internal] The metadata of the object
The timestamp of the last change to the object
The timestamp of the object's creation
The timestamp of the last patch to the object
The identifier of the object
The identifier of the last editor of the object
The identifier of the owner of the object
The identifier of the creator of the object
The identifier of the organization the object belongs to
The flag indicating if the object is deleted
[internal] Identifier for the runtime configuration used during the crawl run.
Identifier of the extractor that was executed.
[internal] Policy used by this run. If null, the default policy is used.
Chained crawl run that this run is based on. If null, this is the first run.
Timestamp marking the start of the crawl run, in milliseconds since Unix epoch.
Timestamp marking the end of the crawl run, in milliseconds since Unix epoch.
Total number of urls in progress.
Total number of urls requeued.
Total number of urls to process.
Number of urls processed successfully so far.
Number of failed urls processed so far.
Number of rows returned so far.
Number of screen captures taken.
Number of HTML extractions performed.
Number of urls denied by robots.txt.
Number of rows with redacted PII.
Number of queries we used to process this run.
Amount of data in bytes transferred via premium proxy during this run.
Possible values: [PENDING, STARTED, CANCELLED, FINISHED, FAILED]
State of this run, e.g. STARTED, FINISHED, FAILED.
[internal] Id of Extractor urlList attachment when CrawlRun created
[internal] Id of Extractor inputs attachment when CrawlRun created
Possible values: [SCHEDULED, ADHOC]
[internal] Event that triggered this run
An enumerated error type for a failed run.
Error message for a failed run.
[internal] Attachment containing json of the rows.
[internal] Attachment containing csv of the rows.
[internal] Attachment containing xlsx of the rows.
[internal] Attachment containing log of the url results processed.
[internal] Attachment containing newline-delimited JSON detailing the diff between this run and the previous one.
[internal] Attachment containing json array of sample of rows for previewing the data.
[internal] Attachment containing a zip file that contains assets downloaded as part of the crawl.
downloadSummary
object
A summary of the downloaded assets for this crawl.
The number of files downloaded
The total size of the files downloaded in bytes
[internal] An attachment for the line-separated JSON of inputs for this crawl run. Used in the API case, where inputs change from run to run.
[internal] An attachment for the line-separated list of urls for this crawl run. Used in the API case, where url lists change from run to run.
[internal] An attachment for quality metrics for the crawl run.
webhooks
object[]
Webhooks to call when run completes, overrides any extractor settings.
The url of the webhook
headers
object
The headers to use for webhook notifications
The pre-configured payload to send with each notification
[internal]
[internal]
[internal]
[internal]
[internal]
[internal]
[internal]
[internal]
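The state values listed in the schema above lend themselves to a small helper for clients that poll this endpoint. This is only a sketch: treating CANCELLED, FINISHED and FAILED as final states is an assumption, and the field names `state`, `errorType` and `errorMessage` are taken from the example response on this page.

```python
from enum import Enum
from typing import Optional


class CrawlRunState(str, Enum):
    PENDING = "PENDING"
    STARTED = "STARTED"
    CANCELLED = "CANCELLED"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


# Assumption: a run in one of these states will not change state again.
TERMINAL_STATES = frozenset(
    {CrawlRunState.CANCELLED, CrawlRunState.FINISHED, CrawlRunState.FAILED}
)


def is_terminal(crawlrun: dict) -> bool:
    """Return True when the run has reached a state assumed to be final."""
    return CrawlRunState(crawlrun["state"]) in TERMINAL_STATES


def failure_reason(crawlrun: dict) -> Optional[str]:
    """Combine errorType and errorMessage for a failed run, else return None."""
    if CrawlRunState(crawlrun["state"]) is not CrawlRunState.FAILED:
        return None
    error_type = crawlrun.get("errorType") or "unknown"
    message = crawlrun.get("errorMessage") or ""
    return f"{error_type}: {message}"
```

A polling loop could call `is_terminal` on each response and stop once it returns True. An unrecognised state string raises `ValueError`, which makes new enum values visible instead of silently ignoring them.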
{
  "guid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "_meta": {
    "timestamp": 0,
    "creationTimestamp": 0,
    "patchTimestamp": 0,
    "objectGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "lastEditorGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "ownerGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "creatorGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "orgGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "deleted": true
  },
  "runtimeConfigId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "extractorId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "policyId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "previousRunId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "startedAt": "2024-06-19T07:12:35.867Z",
  "stoppedAt": "2024-06-19T07:12:35.867Z",
  "inProgress": 0,
  "requeued": 0,
  "totalUrlCount": 0,
  "successUrlCount": 0,
  "failedUrlCount": 0,
  "rowCount": 0,
  "screenCaptureCount": 0,
  "htmlExtractionCount": 0,
  "queryCount": 0,
  "proxyUsage": 0,
  "state": "PENDING",
  "urlListId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "inputsId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "triggerEvent": "SCHEDULED",
  "errorType": "string",
  "errorMessage": "string",
  "json": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "diffId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "files": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "downloadSummary": {
    "totalFiles": 0,
    "totalSizeBytes": 0
  },
  "crawlRunInputsId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "crawlRunUrlListId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "crawlRunStats": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "webhooks": [
    {
      "url": "string",
      "headers": {},
      "payload": "string"
    }
  ],
  "jsonFile": "string",
  "csvFile": "string",
  "xlsxFile": "string",
  "logFile": "string",
  "sampleFile": "string",
  "archive": "string",
  "inputUri": "string",
  "notificationUri": "string"
}
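Using the field names from the example above, a client might derive a progress summary and parse the run timestamps roughly as sketched below. Two assumptions apply: that `totalUrlCount` is the right denominator for `successUrlCount` plus `failedUrlCount`, and that timestamps may arrive in either form, since the schema describes `startedAt` and `stoppedAt` as milliseconds since the Unix epoch while the generated example shows ISO 8601 strings. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Union


def parse_timestamp(value: Union[int, float, str]) -> datetime:
    """Parse a timestamp given as epoch milliseconds or as an ISO 8601 string."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    # Older Python versions do not accept a trailing "Z", so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_progress(crawlrun: dict) -> dict:
    """Summarise how many urls have been handled so far."""
    total = crawlrun.get("totalUrlCount", 0)
    succeeded = crawlrun.get("successUrlCount", 0)
    failed = crawlrun.get("failedUrlCount", 0)
    processed = succeeded + failed
    return {
        "processed": processed,
        "remaining": max(total - processed, 0),
        # Guard against division by zero for runs with no urls counted yet.
        "fraction_done": processed / total if total else 0.0,
    }
```

Subtracting two parsed timestamps gives a `timedelta` for the run duration once `stoppedAt` is populated.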
Unauthorized
- application/json
- Schema
- Example (from schema)
Schema
Internal error code.
A message containing a brief description of the error.
{
  "code": 0,
  "message": "string"
}
Not found
- application/json
- Schema
- Example (from schema)
Schema
Internal error code.
A message containing a brief description of the error.
{
  "code": 0,
  "message": "string"
}
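The three documented responses could be handled on the client side along these lines. This is a sketch, not a definitive client: `CrawlRunApiError` and `parse_response` are hypothetical names, and the code assumes that every non-200 response carries the `code` and `message` body shown above.

```python
import json


class CrawlRunApiError(Exception):
    """Raised for non-200 responses such as 401 Unauthorized or 404 Not found."""

    def __init__(self, status: int, code, message):
        super().__init__(f"HTTP {status} (code {code}): {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_response(status: int, body: str) -> dict:
    """Return the crawl run object on 200, otherwise raise CrawlRunApiError."""
    payload = json.loads(body)
    if status == 200:
        return payload
    # Assumption: error bodies follow the {"code": ..., "message": ...} schema.
    raise CrawlRunApiError(status, payload.get("code"), payload.get("message"))
```

Callers can branch on `error.status` to tell an invalid API key (401) apart from an unknown crawl run identifier (404).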