Infinite Scroll and Pagination URLs That Don't Change

What is "infinite scroll"?

Many webpages require you to scroll the browser window or click a Load More button to see more content. Some webpages, such as search results, seem to go on forever, hence the term "infinite scroll".

What's the problem?

Infinite scroll can be tricky to handle because the URL in the address bar generally stays the same (it doesn't change, even when you're viewing a different page of content). Websites structure this behavior in different ways. Although it is not always possible to work around the problem, the key is to find the underlying URL pattern for the different pages, even when that pattern is not visible in the page URL.
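As a rough illustration of how to spot that pattern, the Python sketch below compares two request URLs captured on successive page loads and reports which query parameters changed between them. The example.com URLs and the function name changed_params are hypothetical; in practice you would paste in two request URLs copied from DevTools (as described in the steps below).

```python
from urllib.parse import urlsplit, parse_qsl


def changed_params(url_a: str, url_b: str) -> dict:
    """Return the query parameters whose values differ between two URLs."""
    # keep_blank_values=True so empty parameters (e.g. "x=") are still compared
    a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    keys = a.keys() | b.keys()
    # Map each differing key to its (old, new) value pair
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


# Hypothetical request URLs for the second and third pages of a listing
page2 = "http://www.example.com/api/items?sort=rank&page=2&zip=94566"
page3 = "http://www.example.com/api/items?sort=rank&page=3&zip=94566"
print(changed_params(page2, page3))  # {'page': ('2', '3')}
```

If only one parameter changes and its value goes up by one, that parameter is very likely the page number.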

Working through an example: Staples category

The http://www.staples.com/Notebooks-Pads/cat_CG3783 webpage uses infinite scroll at first and then shows a LOAD MORE button farther down the page.

To find the pagination URL hidden behind the infinite scroll, perform the following steps:

Step 1. Opening Chrome DevTools

To find the underlying URL, Import.io recommends using Google Chrome.

Open Chrome and navigate to the Staples webpage.
Right-click on the page and select Inspect. The DevTools inspector appears.

Step 2. Clearing the Network tab

Click the Network tab.
Click the Clear icon (next to the red circle near the upper left of the inspector window) to clear any existing activity.

Step 3. Locating the second page of content

Click the XHR tab (under the Filter search box) to view the XHR requests.
Scroll down the page, letting more content load, until the LOAD MORE button appears.
Click LOAD MORE. XHR requests appear in the inspector.

Step 4. Viewing the XHR request header

Hover over the items in the Name column and locate http://www.staples.com/asgard-node/v1/nad/staplesus/deals/html/BI1431814?rank=1&supercategory=&onlybopis=false&pagenum=2&zipcode=94566&productTile=secondaryDealsData.
Select this item. Several tabs appear to the right of the Name list.
Click the Headers tab.

Step 5. Identifying the page component of the URL

In the Request URL, notice pagenum=2. The pagenum parameter holds the number of the page being displayed. Navigating to this URL directly takes you straight to the second page of content in the underlying data.
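If you want to confirm the page parameter programmatically, a minimal Python sketch like the following reads pagenum from the request URL found in step 4. The function name page_number is hypothetical, and the sketch assumes the page number is a plain integer query parameter, as it is in this example.

```python
from urllib.parse import urlsplit, parse_qs

# Request URL copied from the Headers tab in step 4
REQUEST_URL = (
    "http://www.staples.com/asgard-node/v1/nad/staplesus/deals/html/BI1431814"
    "?rank=1&supercategory=&onlybopis=false&pagenum=2&zipcode=94566"
    "&productTile=secondaryDealsData"
)


def page_number(url: str, param: str = "pagenum") -> int:
    """Return the integer value of the pagination parameter in a URL."""
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        raise ValueError(f"{param!r} not found in {url}")
    return int(values[0])


print(page_number(REQUEST_URL))  # 2
```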

Now you know how the website actually paginates, and therefore how to build your extractor.

Step 6. Creating your extractor

Create your extractor using the following URL: http://www.staples.com/asgard-node/v1/nad/staplesus/deals/html/BI1431814?rank=1&supercategory=&onlybopis=false&pagenum=2&zipcode=94566&productTile=secondaryDealsData

Step 7. Adding the rest of the URLs and running the extractor

From the dashboard, click the Settings tab.
Add the rest of the URLs, changing only the pagenum value for each page (for example, pagenum=3, pagenum=4, and so forth).
Save and run your extractor.
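Rather than typing each URL by hand, you could generate the list with a short Python script like the sketch below, which starts from the request URL found in step 4 and changes only pagenum. The function name page_urls and the page range 3 to 5 are assumptions for illustration; check how many pages the category actually has before choosing the last page number.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Request URL from step 4 (page 2)
BASE_URL = (
    "http://www.staples.com/asgard-node/v1/nad/staplesus/deals/html/BI1431814"
    "?rank=1&supercategory=&onlybopis=false&pagenum=2&zipcode=94566"
    "&productTile=secondaryDealsData"
)


def page_urls(base_url: str, first: int, last: int, param: str = "pagenum") -> list[str]:
    """Return one URL per page from first to last, changing only the page parameter."""
    parts = urlsplit(base_url)
    # keep_blank_values=True keeps empty parameters such as "supercategory="
    query = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for n in range(first, last + 1):
        new_query = [(k, str(n) if k == param else v) for k, v in query]
        urls.append(urlunsplit(parts._replace(query=urlencode(new_query))))
    return urls


for url in page_urls(BASE_URL, 3, 5):
    print(url)
```

Rebuilding the query from the parsed pairs keeps every other parameter, including the empty supercategory value, in its original order, so each generated URL differs from the original only in its pagenum value.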

What if I get a JSON response?

If a JSON response appears when you paste a URL in step 6 or step 7, add the following OPT parameter to the end of each URL:

#[!opt!]{"type":"json"}[/!opt!]
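If you are preparing many URLs at once, a small Python helper such as the hypothetical with_json_opt below can append that suffix to each one. It is a plain string operation that assumes the suffix must appear exactly as shown above, at the very end of the URL; the example.com URLs are placeholders.

```python
# OPT parameter exactly as given above
OPT_SUFFIX = '#[!opt!]{"type":"json"}[/!opt!]'


def with_json_opt(url: str) -> str:
    """Append the JSON OPT parameter unless the URL already ends with it."""
    return url if url.endswith(OPT_SUFFIX) else url + OPT_SUFFIX


urls = [
    "http://www.example.com/api/items?page=1",
    "http://www.example.com/api/items?page=2",
]
for url in map(with_json_opt, urls):
    print(url)
```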

What if I run into a POST request or another issue?

If, in step 4, the XHR request uses the POST method (or you run into another issue), contact support@import.io for assistance.
