Here at Import.io, we use some unique (and not-so-unique) terminology to describe our product. Before you dive into our world, here are some useful definitions to keep in mind.
If you already have a solid understanding of our terminology, visit our tutorial for building your first extractor to start extracting data from Yelp.
In Import.io, you build an extractor to select the data you want from a type of web page. For example, you can build an extractor to get the Product ID, Name, and Price from a product page; another extractor to get the Reviewer, Product, Name, and Rating from a review page; and a third extractor to get a list of Product Names on a product listings page.
Training is the act of creating and editing an extractor within Import.io. To train an extractor, you load in an example URL, and then select the data you want, using either Point-and-Click or our advanced tools like XPath and Page Interaction. When you run your extractor in the future, it uses your training to determine just what data you want.
Details and Listings Pages
Details and listings pages are distinct page structures that we run into so often, we decided to list and detail them here!
A details page is a page that contains the data for an individual object, such as a product or business page. An extraction on one details page returns a single row of data.
A listings page is a page that has a list of identically structured items. Think of Google's search results, which contain a list of page titles, URLs, and descriptions. An extraction on a listings page returns multiple rows of data, one row for each item in the list.
Chaining occurs when one extractor gets the list of URLs it crawls from a second extractor's output. When this happens, we say the extractor receiving URLs is chained to the extractor providing the URLs.
A classic example is chaining an extractor trained on a product details page to a list of products output by a second extractor. This outputs one row for each product listed on the listings page, with each product's data stored across the cells in its row.
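The flow of data in a chained pair of extractors can be sketched in a few lines of Python. This is only an illustration of the idea, not Import.io code: the function names, the example.com URLs, and the fake page data are all hypothetical stand-ins for what real extractors would return.

```python
# Illustrative sketch only: every name and URL here is a hypothetical
# stand-in, not part of any Import.io API.

Row = dict[str, str]

# A fake "website": one listings page that links to three details pages.
FAKE_LISTINGS: dict[str, list[str]] = {
    "https://example.com/products": [
        "https://example.com/products/1",
        "https://example.com/products/2",
        "https://example.com/products/3",
    ]
}
FAKE_DETAILS: dict[str, Row] = {
    "https://example.com/products/1": {"product_id": "1", "name": "Lamp", "price": "19.99"},
    "https://example.com/products/2": {"product_id": "2", "name": "Desk", "price": "149.00"},
    "https://example.com/products/3": {"product_id": "3", "name": "Chair", "price": "89.50"},
}


def run_listings_extractor(url: str) -> list[Row]:
    """Pretend listings extractor: returns one row per item on the page."""
    return [{"url": product_url} for product_url in FAKE_LISTINGS[url]]


def run_details_extractor(url: str) -> Row:
    """Pretend details extractor: returns a single row for one page."""
    return dict(FAKE_DETAILS[url])


def run_chained(listings_url: str) -> list[Row]:
    """Feed the URLs output by the listings extractor into the details extractor."""
    listing_rows = run_listings_extractor(listings_url)
    detail_urls = [row["url"] for row in listing_rows]
    return [run_details_extractor(url) for url in detail_urls]


rows = run_chained("https://example.com/products")
for row in rows:
    print(row)
```

Here the details extractor is the one chained to the listings extractor: it never sees the listings page itself, only the URLs that page produced.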
Import.io tracks your usage through the number of queries you use. The easiest way to think of a query is as one page or URL. For example, if you run through 50 product pages, that counts as 50 queries. For interactive extractors, each set of inputs counts as one query, so two searches that use two different dates count as two queries.
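As a rough way to estimate usage before a run, the counting rule above can be written out as follows. This is a minimal sketch under the assumption that every URL and every set of inputs counts exactly once; the helper names are hypothetical, and actual billing is determined by Import.io, not by this code.

```python
# Hypothetical helpers for estimating query usage; not an Import.io API.

def count_url_queries(urls: list[str]) -> int:
    """One page or URL is one query."""
    return len(urls)


def count_interactive_queries(input_sets: list[dict[str, str]]) -> int:
    """For interactive extractors, one set of inputs is one query."""
    return len(input_sets)


product_pages = [f"https://example.com/products/{i}" for i in range(50)]
searches = [{"date": "2024-01-01"}, {"date": "2024-01-02"}]

print(count_url_queries(product_pages))     # 50
print(count_interactive_queries(searches))  # 2
```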
Each run of your extractor is called a crawl run. Each crawl run takes a set of URLs as input (plus inputs for interactive extractors) and returns structured data as output in Excel, CSV, or JSON format.
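To make "structured data" concrete, here is a small sketch of the same rows written out as CSV and as JSON using only the Python standard library. The rows and column names are made up for illustration, and the exact layout of Import.io's own export files may differ.

```python
import csv
import io
import json

# Made-up rows standing in for the output of one crawl run.
crawl_rows = [
    {"url": "https://example.com/products/1", "name": "Lamp", "price": "19.99"},
    {"url": "https://example.com/products/2", "name": "Desk", "price": "149.00"},
]


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows as CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict[str, str]]) -> str:
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


print(to_csv(crawl_rows))
print(to_json(crawl_rows))
```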
A Report in Import.io is the byproduct of one or more crawl runs. Currently, you can generate Data, Change, and/or Comparison Reports. Data Reports allow you to filter your column outputs. Change Reports allow you to monitor changes between two crawl runs. Comparison Reports allow you to compare data between two or more extractors. All reports can be published and shared via a Report Portal.
A Data Report allows you to select which columns from an Extractor's crawl runs to include and adds some basic styling. Generated Data Reports can be shared to your Import.io Portal, which allows other stakeholders to view them without having access to your account.
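Conceptually, the column selection in a Data Report is a projection over the rows of a crawl run. The sketch below shows that idea only; the function name and sample rows are hypothetical, and it does not reproduce the styling or sharing features.

```python
# Hypothetical sketch of column selection; not how Import.io implements it.

def select_columns(rows: list[dict[str, str]], columns: list[str]) -> list[dict[str, str]]:
    """Keep only the chosen columns, in the order given; missing values become ''."""
    return [{col: row.get(col, "") for col in columns} for row in rows]


run_rows = [
    {"product_id": "1", "name": "Lamp", "price": "19.99", "sku": "L-01"},
    {"product_id": "2", "name": "Desk", "price": "149.00", "sku": "D-07"},
]

report_rows = select_columns(run_rows, ["name", "price"])
print(report_rows)
```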
A Change Report allows you to monitor changes that occur between the latest crawl run for an Extractor and the previous crawl run. Generated Change Reports can be shared to your Import.io Portal, which allows other stakeholders to view them without having access to your account.
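The kind of comparison a Change Report performs can be approximated as a diff between two lists of rows. The sketch below assumes each row can be identified by its "url" column, which is an assumption made for this example rather than a documented rule, and the function name is hypothetical.

```python
# Hypothetical sketch of comparing two crawl runs; not an Import.io API.

def diff_runs(previous: list[dict[str, str]],
              latest: list[dict[str, str]],
              key: str = "url") -> dict[str, list]:
    """Report rows added, removed, or changed between two crawl runs."""
    prev_by_key = {row[key]: row for row in previous}
    latest_by_key = {row[key]: row for row in latest}

    added = [row for k, row in latest_by_key.items() if k not in prev_by_key]
    removed = [row for k, row in prev_by_key.items() if k not in latest_by_key]

    changed = []
    for k, new_row in latest_by_key.items():
        old_row = prev_by_key.get(k)
        if old_row is not None and old_row != new_row:
            # List the names of the columns whose values differ.
            fields = sorted(
                f for f in set(old_row) | set(new_row)
                if old_row.get(f) != new_row.get(f)
            )
            changed.append({"key": k, "fields": fields})

    return {"added": added, "removed": removed, "changed": changed}


previous_run = [
    {"url": "https://example.com/products/1", "price": "19.99"},
    {"url": "https://example.com/products/2", "price": "149.00"},
]
latest_run = [
    {"url": "https://example.com/products/1", "price": "17.99"},
    {"url": "https://example.com/products/3", "price": "89.50"},
]

report = diff_runs(previous_run, latest_run)
print(report)
```

In this sample, product 1 changed price, product 2 disappeared, and product 3 is new.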