Extractor information
GET /extractors/:extractorId
Request
Path Parameters
Query Parameters
You can find your API key in the Import.io dashboard under User Settings.
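As a minimal sketch of the request described above, the following Python builds the endpoint URL and issues the GET with the standard library only. The host `https://api.example.com` and the query parameter name `apikey` are placeholders assumed for illustration, since this page states neither; substitute the values documented for your account.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not stated on this page): the API host and the name of the
# query parameter that carries the API key. Replace both with real values.
BASE_URL = "https://api.example.com"
API_KEY_PARAM = "apikey"


def build_extractor_url(extractor_id: str, api_key: str) -> str:
    """Return the URL for GET /extractors/:extractorId."""
    path = "/extractors/" + urllib.parse.quote(extractor_id, safe="")
    query = urllib.parse.urlencode({API_KEY_PARAM: api_key})
    return f"{BASE_URL}{path}?{query}"


def get_extractor(extractor_id: str, api_key: str) -> dict:
    """Fetch the extractor and decode the JSON body.

    urllib raises urllib.error.HTTPError for non-2xx responses such as
    401 and 404, so callers should be prepared to catch it.
    """
    request = urllib.request.Request(
        build_extractor_url(extractor_id, api_key),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Percent-encoding the path parameter and URL-encoding the key keeps the request well-formed even if either value contains reserved characters.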
Responses
- 200
- 401
- 404
Extractor information retrieved successfully
- application/json
- Schema
- Example (from schema)
Schema
Unique identifier for the extractor.
_meta
object
[internal] The metadata of the object
The timestamp of the last change to the object
The timestamp of the object's creation
The timestamp of the last patch to the object
The identifier of the object
The identifier of the last editor of the object
The identifier of the owner of the object
The identifier of the creator of the object
The identifier of the organization the object belongs to
The flag indicating if the object is deleted
The extractor's name.
List of tags associated with the extractor to be used for starting many extractors at once.
fields
object[]
List of data fields the extractor is configured to retrieve.
Unique identifier for the data field within the extractor configuration.
Name of the data field targeted for extraction (e.g., 'Price', 'Image').
Type of data this field represents (e.g., IMAGE, TEXT), guiding the extraction process.
Specifies if hyperlinks associated with this field should also be captured during extraction.
Specifies if the extractor should download this field content.
The last saved config.
The associated policy.
Flag indicating whether the user is notified of new CrawlRuns.
The schema used by this project.
[internal] An attachment for the serialized JSON of the project's extraction state.
[internal] An attachment for the line separated list of URLs for this extractor.
[internal] An attachment for the line separated JSON of inputs for this extractor.
The ID of the next crawl run that has been submitted.
Flag to suppress display of extractor.
Parent extractor Guid to generate URLs.
Parent Report Guid to generate URLs.
If set, extractor should be run when the parent extractor finishes.
Url column Id.
Is extractor chained.
Crawl run diff config.
If true, generate a diff report when this Extractor is run.
If true, generate a data report when this Extractor is run.
Is extractor duplicate.
Maximum number of rows that this extractor can crawl; currently a chainsaw-only feature.
Authentication url.
If true, crawl runs using this extractor should use a residential proxy pool.
If true, crawl runs will honor rules in /robots.txt.
An ISO 3 digit country code for the Proxy.
Proxy Pool Type (e.g., DATA_CENTER, RESIDENTIAL).
webhooks
object[]
Webhooks to call when extractor completes.
The url of the webhook
headers
object
The headers to use for webhook notifications
The pre-configured payload to send with each notification
chainingConfig
object[]
Chain configuration for interaction extractors.
The type of the parent
A list of subtypes of the parent.
The id of the parent extractor or report
Is the extractor parent triggered
mapping
object
required
The mapping for the chaining source
property name*
SourceColumn
The mapping for the chaining source
Extractor field type of source column
ColumnId of source
Optional override value for source column
If true and the extractor uses interactive inputs, the inputs will be copied to the output.
metricsNotificationSettings
object[]
[internal] Notification Setting for Crawl Run Metrics.
Possible values: [LESS_THAN, GREATER_THAN, EQUALS, LESS_THAN_OR_EQUALS, GREATER_THAN_OR_EQUALS]
Possible values: [AVERAGE_30_DAYS, LAST_CRAWL_RUN]
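To illustrate how the comparison enum values listed above might be applied, here is a rough sketch. It assumes that `operation` compares an observed crawl-run metric (left operand) against the configured `value` (right operand); this page does not describe that behavior, and the setting is marked internal, so treat `threshold_met` and its operand order as hypothetical.

```python
import operator
from enum import Enum


class Operation(Enum):
    LESS_THAN = "LESS_THAN"
    GREATER_THAN = "GREATER_THAN"
    EQUALS = "EQUALS"
    LESS_THAN_OR_EQUALS = "LESS_THAN_OR_EQUALS"
    GREATER_THAN_OR_EQUALS = "GREATER_THAN_OR_EQUALS"


# Map each documented enum value to a Python comparison function.
_COMPARATORS = {
    Operation.LESS_THAN: operator.lt,
    Operation.GREATER_THAN: operator.gt,
    Operation.EQUALS: operator.eq,
    Operation.LESS_THAN_OR_EQUALS: operator.le,
    Operation.GREATER_THAN_OR_EQUALS: operator.ge,
}


def threshold_met(observed: float, setting: dict) -> bool:
    """Assumed semantics: return observed <operation> setting["value"]."""
    op = Operation(setting["operation"])  # raises ValueError if unknown
    return _COMPARATORS[op](observed, setting["value"])
```

Parsing the string through the enum first means an unrecognized operation fails loudly instead of being silently ignored.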
Pointer to encrypted credentials.
{
  "guid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "_meta": {
    "timestamp": 0,
    "creationTimestamp": 0,
    "patchTimestamp": 0,
    "objectGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "lastEditorGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "ownerGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "creatorGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "orgGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "deleted": true
  },
  "name": "string",
  "tags": [
    "string"
  ],
  "fields": [
    {
      "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "name": "string",
      "type": "string",
      "captureLink": true
    }
  ],
  "latestConfigId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "policyId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "notifyMe": true,
  "schemaId": 0,
  "training": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "urlList": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "inputs": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "nextCrawlRunId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "archived": true,
  "parentExtractorGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "parentReportGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "parentTriggered": true,
  "urlColumnId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "isChained": true,
  "createDiff": true,
  "createDataReport": true,
  "duplicateOfExtractorId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "maxRowsCrawlCount": 0,
  "authUrl": "string",
  "iso3Country": "string",
  "proxyType": "string",
  "webhooks": [
    {
      "url": "string",
      "headers": {},
      "payload": "string"
    }
  ],
  "chainingConfig": [
    {
      "parentInputType": "string",
      "parentInputSubtype": [
        "string"
      ],
      "parentInputGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "parentTriggered": true,
      "mapping": {}
    }
  ],
  "addInteractiveInputsToOutput": true,
  "metricsNotificationSettings": [
    {
      "value": 0,
      "operation": "LESS_THAN",
      "period": "AVERAGE_30_DAYS"
    }
  ],
  "credentialsGuid": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}
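The example response above can be consumed along these lines. This is a sketch only: `summarize_extractor` is a hypothetical helper, the sample is a trimmed copy of the schema example (its values are placeholders, not real data), and the code assumes any key may be absent from a real response.

```python
import json

# Trimmed copy of the example response; all values are schema placeholders.
sample = json.loads("""
{
  "guid": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "_meta": {"creationTimestamp": 0, "deleted": true},
  "name": "string",
  "tags": ["string"],
  "fields": [
    {"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "name": "string",
     "type": "string", "captureLink": true}
  ],
  "archived": true
}
""")


def summarize_extractor(extractor: dict) -> dict:
    """Collect a few commonly inspected properties, tolerating absent keys."""
    meta = extractor.get("_meta", {})
    fields = extractor.get("fields", [])
    return {
        "guid": extractor.get("guid"),
        "name": extractor.get("name"),
        "field_names": [f.get("name") for f in fields],
        # Names of fields whose associated hyperlinks are also captured.
        "link_fields": [f.get("name") for f in fields if f.get("captureLink")],
        "archived": bool(extractor.get("archived", False)),
        "deleted": bool(meta.get("deleted", False)),
    }


summary = summarize_extractor(sample)
print(summary["field_names"])
```

Using `.get` with defaults keeps the helper from failing when optional properties are omitted.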
Unauthorized
- application/json
- Schema
- Example (from schema)
Schema
Internal error code.
A message containing a brief description of the error.
{
  "code": 0,
  "message": "string"
}
Not found
- application/json
- Schema
- Example (from schema)
Schema
Internal error code.
A message containing a brief description of the error.
{
  "code": 0,
  "message": "string"
}
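Both the 401 and 404 responses share the `code` and `message` body shown above, so a client can decode them the same way. The following is one possible sketch; `ApiError` and `parse_error` are hypothetical names, and the fallback for a non-JSON body is an assumption rather than documented behavior.

```python
import json


class ApiError(Exception):
    """Error decoded from a 401 or 404 response body."""

    def __init__(self, status: int, code, message: str):
        super().__init__(f"HTTP {status} (code {code}): {message}")
        self.status = status    # HTTP status of the response
        self.code = code        # "code" from the body, or None if absent
        self.message = message  # "message" from the body, or "" if absent


def parse_error(status: int, body: bytes) -> ApiError:
    """Build an ApiError from a raw response body, tolerating bad JSON."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ApiError(status, payload.get("code"), payload.get("message", ""))


err = parse_error(404, b'{"code": 0, "message": "string"}')
print(err)
```

Keeping the HTTP status alongside the internal `code` lets callers distinguish an authentication problem (401) from a missing extractor (404) without inspecting the message text.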