Adding a List of URLs

Managing an Extractor's URL List

From the Inputs tab of an Extractor, you can manage the list of URLs that the Extractor uses when it starts a crawl run. You can add URLs manually, import them from a file, or extract them from other pages with Chained Extractors.

Elements of the Inputs View

  1. Input source: Dropdown to set whether the Extractor uses URLs from an explicit list of URLs provided or URLs extracted by another Extractor.
  2. Clear All: Removes all the URLs from the list to start over.
  3. Remove Duplicate Rows: Removes any duplicate URLs from the list.
  4. Cleanup URLs: Removes invalid URLs and empty rows from the list.
  5. Download Inputs: Downloads the list of URLs in CSV, Excel, JSON, or NDJSON format.
  6. Import Inputs: Imports a list of URLs from a CSV or Excel (XLSX) file.
  7. Generate URLs: Opens the URL generator to create URLs from an example URL.
  8. Add input row: Adds a blank row to the list of inputs.
  9. Reset to saved inputs: Resets the URL list to the saved inputs.
  10. List view: Shows all of the URLs currently added.
  11. TextBox: Manually add URLs by inputting them in the textbox.
  12. Save: Saves any changes made to the URL list. When you add, remove, or update URLs, the changes are not saved until you click Save.
  13. Run Inputs: Starts a new crawl run. If you have unsaved changes, this button will be disabled until you save your changes.
  14. Total Inputs: Displays a count of the URLs in the list. This is also the number of queries a crawl run will use with that list of URLs. If screen capture is enabled, the total number of queries is doubled.
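
If you prepare a URL list outside the Inputs view before importing it, the following sketch shows roughly what Remove Duplicate Rows, Cleanup URLs, and Total Inputs do. It is an illustration only, not the product's implementation. The function names (`is_valid_url`, `clean_urls`, `estimated_queries`) are hypothetical, and the validity rule (an http or https scheme plus a host) is an assumption, since the exact rule the Extractor applies is not described here.

```python
from urllib.parse import urlsplit


def is_valid_url(url: str) -> bool:
    """Assumed validity rule: http(s) scheme plus a non-empty host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def clean_urls(rows):
    """Drop empty rows, invalid URLs, and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for row in rows:
        url = row.strip()
        if not url or not is_valid_url(url):
            continue  # mirrors Cleanup URLs (under the assumed rule)
        if url in seen:
            continue  # mirrors Remove Duplicate Rows
        seen.add(url)
        cleaned.append(url)
    return cleaned


def estimated_queries(url_count: int, screen_capture: bool = False) -> int:
    """One query per URL, doubled when screen capture is enabled."""
    return url_count * 2 if screen_capture else url_count


rows = [
    "https://example.com/page/1",
    "",
    "https://example.com/page/1",
    "not a url",
    "https://example.com/page/2",
]
urls = clean_urls(rows)
print(urls)                                               # ['https://example.com/page/1', 'https://example.com/page/2']
print(estimated_queries(len(urls)))                       # 2
print(estimated_queries(len(urls), screen_capture=True))  # 4
```

With the five sample rows, two URLs remain after cleanup, so a crawl run would use two queries, or four with screen capture enabled.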