Using the URL Generator
The URL generator is the quickest way to create multiple URLs that follow a common pattern. The following examples show how URLs might vary by category, search term, or page number.
Categories:
http://aviation.stackexchange.com/questions/tagged/engine
http://aviation.stackexchange.com/questions/tagged/weather
Search terms:
http://stackexchange.com/search?q=monkey
http://stackexchange.com/search?q=horse
Pages:
http://stackexchange.com/ (displays page 1)
http://stackexchange.com/?page=2
http://stackexchange.com/?page=3
http://aviation.stackexchange.com/questions/tagged/engine?page=4
Accessing the URL generator
- Click the Settings tab of the extractor.
- Click the Generate URLs button.
The following window will appear:
Parameter types
The URL generator supports two types of parameter values: Range of Numbers and List of Values.
Range of Numbers
The URL generator allows for numeric values. The following example demonstrates how to use the Range of Numbers parameter type to generate a list of URLs by varying the page number.
- Click the Edit button to add the URL you want to parameterize. For instance, enter https://www.kiva.org/lend?page=1 in the text box.
- Highlight the variable portion of the URL, that is, the text that follows the equals sign. In this case, the 1 represents the page number. Once highlighted, the 1 changes to Parameter-1 and the options for the parameter appear.
- Select Range of numbers from the dropdown list to specify the parameter type.
- For the range box values, enter 1 and 100. This will generate pages from 1 to 100.
- Set step to 1. The step box specifies the value to add to each number when creating the list. For example, with a range starting at 1, setting the step to 5 generates every fifth page: 1, 6, 11, and so on.
- When you are satisfied with the generated URLs, click Add to list and click Save to save the extractor.
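For readers who want to see what the Range of numbers setting produces, the following is a minimal Python sketch of the same expansion. It is not the URL generator's own code. The function name urls_from_range and the {page} placeholder are hypothetical, and the sketch assumes the end of the range is inclusive, as the "1 to 100" example suggests.

```python
def urls_from_range(template, start, end, step=1):
    """Fill the {page} placeholder with each number from start to end.

    Assumption: the end value is inclusive, matching "pages from 1 to 100".
    """
    return [template.format(page=n) for n in range(start, end + 1, step)]


# Hypothetical template: {page} marks the highlighted parameter.
template = "https://www.kiva.org/lend?page={page}"

urls = urls_from_range(template, 1, 100, step=1)
print(len(urls))   # 100
print(urls[0])     # https://www.kiva.org/lend?page=1
print(urls[-1])    # https://www.kiva.org/lend?page=100

# A step of 5 adds 5 to each number: 1, 6, 11, ...
every_fifth = urls_from_range(template, 1, 100, step=5)
print(every_fifth[:3])
```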
List of Values
The URL generator allows for non-numeric values. The following example demonstrates how to use the List of values parameter type to generate a list of URLs by varying the restaurant location.
- Enter a URL in the URL generator URL box. For example, enter https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA in the box.
- Highlight the variable portion of the URL, in this case the text that follows find_loc=. Here, San+Francisco%2C+CA is the restaurant location. Once highlighted, San+Francisco%2C+CA changes to Parameter-1 and the options for the parameter appear.
- Now that the value is a parameter, alter the parameter value to access multiple pages. This example alters the parameter to retrieve one page of Yelp restaurant results for each of San Francisco, Los Gatos, and London.
- Select List of values from the dropdown list to specify the parameter type.
- Enter a comma-separated list of values in the box. For this example, enter San+Francisco%2C+CA,Los+Gatos%2C+CA,London%2C+UK in the box. The number of generated URLs changes to 3 and the list of URLs appears in the URL preview box.
- When you are happy with the generated URLs, click Add to list.
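The List of values expansion can be sketched in Python as shown here. This is an illustration under stated assumptions, not the tool's implementation: the function name urls_from_values and the {location} placeholder are hypothetical. Note that a plain split on commas works only because the commas inside each value are already encoded as %2C.

```python
def urls_from_values(template, values_csv, placeholder="{location}"):
    """Substitute each comma-separated value for the placeholder.

    Assumption: literal commas inside a value are URL-encoded (%2C),
    so splitting on "," separates the values cleanly.
    """
    values = [v.strip() for v in values_csv.split(",") if v.strip()]
    return [template.replace(placeholder, v) for v in values]


# Hypothetical template: {location} marks the highlighted parameter.
template = "https://www.yelp.com/search?find_desc=Restaurants&find_loc={location}"
values_csv = "San+Francisco%2C+CA,Los+Gatos%2C+CA,London%2C+UK"

urls = urls_from_values(template, values_csv)
print(len(urls))  # 3
for url in urls:
    print(url)
```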
Specifying multiple parameters
URLs can contain more than one parameter. To specify additional parameters in the URL generator, repeat the highlighting process. As you alter the parameter values, the list of URLs changes in the URL preview box underneath. The URL list contains all combinations of the parameters. For example, in the following screenshot, parameters specify three cities, three cuisine types, and five pages of each city-cuisine combination for a total of 45 URLs.
When you are satisfied with the generated URLs, click the Add to list button.
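The "all combinations" behavior amounts to a Cartesian product of the parameter values. The sketch below reproduces the 3 x 3 x 5 = 45 count in Python. The cuisine names and the page query parameter are hypothetical stand-ins chosen for illustration; they are not taken from the screenshot and may not match how the target site actually paginates.

```python
from itertools import product

cities = ["San+Francisco%2C+CA", "Los+Gatos%2C+CA", "London%2C+UK"]
cuisines = ["Pizza", "Sushi", "Tacos"]   # hypothetical cuisine values
pages = range(1, 6)                      # pages 1 through 5

# Hypothetical template: "page" is an illustrative parameter name only.
template = (
    "https://www.yelp.com/search"
    "?find_desc={cuisine}&find_loc={city}&page={page}"
)

# product() yields every (city, cuisine, page) combination.
urls = [
    template.format(cuisine=cuisine, city=city, page=page)
    for city, cuisine, page in product(cities, cuisines, pages)
]

print(len(urls))  # 45
print(urls[0])
```

Because the total is the product of the list sizes, adding one more value to any parameter increases the URL count by the product of the other two.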
Removing parameters
To remove a parameter, click the X to the left of the parameter definition.
Editing the URL
To change the URL, click Edit at any time.