Collection Settings

Global Output

collection global output settings

Blank row for empty page

This setting adds rows to the internal and external snapshot output when a page of results for an input has no output rows, either because an error occurred or because no data was extracted.

error rows

Custom output

If you have selected "custom" output in the destination, this is where you configure how that output is generated at publish time.

custom output settings

Custom output uses jq to take the snapshot as JSON and convert it to another format, such as JSON, CSV or TSV. The conversion always runs in streaming mode, so the expression is applied to each record separately and you cannot use "slurp".

To develop a jq expression, download the JSON output for a snapshot and iterate on the expression locally until it produces what you need. For example:

head -n 1 ~/Desktop/vs.json \
| jq -rc '[._url, .site, .event_name, .event_date, (._input | fromjson | .ProdId), (._input | fromjson | .Token), ._pageTimestamp] | @tsv'

jq is always run with the -r (raw output) and -c (compact output) flags.
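
If you do not have a snapshot to hand, the same workflow can be tried against a small sample file. The following is a minimal sketch: the two records and their values are made up for illustration (only the field names _url, site and _input come from the example above), and it assumes jq is installed locally.

```shell
# Two hypothetical snapshot records, one JSON object per line.
# The values are invented; only the field names mirror the example above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"_url":"https://example.com/a","site":"example","_input":"{\"ProdId\":\"p1\",\"Token\":\"t1\"}"}
{"_url":"https://example.com/b","site":"example","_input":"{\"ProdId\":\"p2\",\"Token\":\"t2\"}"}
EOF

# The expression is evaluated once per record, so it cannot refer to other rows.
# @csv is used here in place of @tsv to produce comma-separated output.
csv_output=$(jq -rc '[._url, .site, (._input | fromjson | .ProdId)] | @csv' "$sample")
echo "$csv_output"
# "https://example.com/a","example","p1"
# "https://example.com/b","example","p2"
```

Because each record is converted on its own, anything that needs the whole snapshot at once (a header row, totals, sorting) cannot be expressed in the expression itself.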

Data store

Provision table for collection

Here you can add a table to the datastore.

INFO: All historical data that has passed QA is immediately available to query.

add table datastore

A schema is created in your database for each project, named after the project slug.

Within that schema, a table is created for the collection data, named after the collection slug.
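
To illustrate how the schema and table names combine, here is a minimal sketch that builds a query string from the two slugs. The slugs my-project and my-collection are hypothetical, and it is an assumption that the slugs are used verbatim as identifiers; check the table name shown for your collection before relying on it.

```shell
# Hypothetical slugs; substitute the ones for your own project and collection.
project_slug="my-project"
collection_slug="my-collection"

# Double-quote both identifiers so that slugs containing hyphens remain valid SQL.
query=$(printf 'SELECT * FROM "%s"."%s" LIMIT 10;' "$project_slug" "$collection_slug")
echo "$query"
# SELECT * FROM "my-project"."my-collection" LIMIT 10;
```

The resulting statement can then be run in whichever SQL client you use to connect to the data store.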

Nested data (arrays and objects) is handled using the Redshift SQL extensions.

Get table name

datastore collection info