Quick start
Creating a project
Create a new project by clicking the + icon on your projects page.
The readme supports markdown syntax and should provide context for the team members involved in the project.
It is important to pick good slugs because they cannot be changed later. For example, they are used within the data lake and as the database schema name.

Creating a destination
A destination is where data that passes QA is published.
To create a new destination, click the + icon on the destinations page.
Data is only published to a destination if the destination is marked as active at the time the data passes QA.
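The publishing rule above can be sketched in a few lines of Python. This is an illustration only: the `Destination` class, its `active` flag, and the `destinations_to_publish` helper are hypothetical names, not part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    # Hypothetical stand-in for a destination record.
    name: str
    active: bool


def destinations_to_publish(destinations):
    """Return the destinations that receive data at the moment QA passes."""
    # Only destinations marked active at this moment are included.
    return [d for d in destinations if d.active]


linked = [
    Destination("warehouse", active=True),
    Destination("archive", active=False),
]
print([d.name for d in destinations_to_publish(linked)])  # ['warehouse']
```

Under this reading, activating a destination after the data has passed QA would not publish that data to it retroactively.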
Creating a schema
A schema defines the output format of the data.
To create a new schema, click the + icon on the schemas page.

You can see a preview on the right of what the data would look like as JSON.

Currently, if you use a nested schema, you must push data in via the Avro import method.
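To make "nested" concrete, the sketch below writes a schema in Avro's JSON schema notation and checks whether any field holds a record. The field names and the `contains_record` and `is_nested` helpers are assumptions made for illustration; the schema editor may represent schemas differently.

```python
import json

# Hypothetical schema in Avro's JSON notation: "price" is a nested record.
schema = {
    "type": "record",
    "name": "Product",
    "fields": [
        {"name": "title", "type": "string"},
        {
            "name": "price",
            "type": {
                "type": "record",
                "name": "Price",
                "fields": [
                    {"name": "amount", "type": "double"},
                    {"name": "currency", "type": "string"},
                ],
            },
        },
    ],
}


def contains_record(avro_type):
    """Return True if an Avro type is, or wraps, a record."""
    if isinstance(avro_type, list):  # union, e.g. ["null", {...}]
        return any(contains_record(t) for t in avro_type)
    if isinstance(avro_type, dict):
        kind = avro_type.get("type")
        if kind == "record":
            return True
        if kind == "array":
            return contains_record(avro_type.get("items"))
        if kind == "map":
            return contains_record(avro_type.get("values"))
    return False


def is_nested(record_schema):
    """A schema counts as nested here if any top-level field holds a record."""
    return any(contains_record(f["type"]) for f in record_schema["fields"])


# A row matching the schema, rendered as JSON much like a preview might show it.
row = {"title": "Kettle", "price": {"amount": 24.99, "currency": "GBP"}}
print(json.dumps(row, indent=2))
print(is_nested(schema))  # True
```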
Field settings
Setting | Description |
---|---|
Single Value | Whether or not the value is an array |
Filter | If the value is a falsy value (zero, false, blank) mark this row as filtered and do not include it in the data pushed to destinations |
Internal | If this is true, the data is excluded from the data pushed to destinations |
Primary key | The fields that are part of the composite primary key give the rows the |
Type | The type gives the avro/parquet data types, and also controls how the system turns extracted text values into typed values. The locale parameter on a source is used when doing this conversion. |
Default value | A textual default value for the column. Note that this should be in ISO format for date/time, and JSON format for numbers and booleans. |
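The Filter, Internal, and Default value settings can be read as the following Python sketch. It is a rough model under stated assumptions, not the system's implementation: `FieldSetting`, `rows_for_destination`, `parse_default`, and the type names (`date`, `datetime`, `int`, `double`, `boolean`) are all hypothetical, and "blank" is assumed to cover `None` and whitespace-only strings.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class FieldSetting:
    # Hypothetical mirror of two of the settings described above.
    name: str
    filter: bool = False
    internal: bool = False


def is_falsy(value):
    """Treat zero, false, None and blank strings as falsy (an assumption)."""
    if isinstance(value, str):
        return value.strip() == ""
    return not value


def rows_for_destination(rows, fields):
    """Drop filtered rows, then strip internal fields from the remaining rows."""
    filter_names = [f.name for f in fields if f.filter]
    public_names = [f.name for f in fields if not f.internal]
    published = []
    for row in rows:
        if any(is_falsy(row.get(name)) for name in filter_names):
            continue  # the row is marked as filtered and is not pushed
        published.append({name: row.get(name) for name in public_names})
    return published


def parse_default(text, field_type):
    """Turn a textual default into a typed value (type names are assumed)."""
    if field_type == "date":
        return date.fromisoformat(text)      # ISO format, e.g. "2024-01-31"
    if field_type == "datetime":
        return datetime.fromisoformat(text)  # e.g. "2024-01-31T08:30:00"
    if field_type in ("int", "double", "boolean"):
        return json.loads(text)              # JSON format, e.g. "42", "true"
    return text


fields = [
    FieldSetting("sku"),
    FieldSetting("price", filter=True),
    FieldSetting("scrape_note", internal=True),
]
rows = [
    {"sku": "A1", "price": 9.5, "scrape_note": "ok"},
    {"sku": "B2", "price": 0, "scrape_note": "price missing"},
]
print(rows_for_destination(rows, fields))  # [{'sku': 'A1', 'price': 9.5}]
print(parse_default("true", "boolean"))    # True
```

In this model the second row is dropped because its `price` is zero, and `scrape_note` never reaches a destination because it is internal.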
Creating a collection
A collection is a group of sources that share the same schema and collection window.
Create a new collection by clicking the + icon on your collections page within the project section.

The readme should provide instructions beyond the schema on how the sources for this collection are built.
The parameters are important for distinguishing sources. For example, a collection that covers a number of sites might have a domain parameter that identifies each site.
The parameter values, along with the readme and schema, give the people implementing the sources the information they need to build them.
There are two special parameters, locale and tz, which are also used in the data typing if set.
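Since parameters are what tell sources apart, a quick check like the one below can flag sources whose parameter values are identical. The dictionary layout, the source names, the `find_indistinguishable` helper, and the example formats for locale and tz values are assumptions for illustration only.

```python
from collections import defaultdict


def find_indistinguishable(sources):
    """Group source names that share exactly the same parameter values."""
    by_params = defaultdict(list)
    for source in sources:
        # Sort the items so that key order does not affect the comparison.
        key = tuple(sorted(source["parameters"].items()))
        by_params[key].append(source["name"])
    return [names for names in by_params.values() if len(names) > 1]


# Hypothetical sources; the locale and tz value formats are assumed.
sources = [
    {"name": "shop-uk", "parameters": {"domain": "example.co.uk", "locale": "en_GB", "tz": "Europe/London"}},
    {"name": "shop-fr", "parameters": {"domain": "example.fr", "locale": "fr_FR", "tz": "Europe/Paris"}},
    {"name": "shop-fr-copy", "parameters": {"domain": "example.fr", "locale": "fr_FR", "tz": "Europe/Paris"}},
]
print(find_indistinguishable(sources))  # [['shop-fr', 'shop-fr-copy']]
```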
Linking a destination to the collection
To add or remove the destinations that a collection pushes data to, visit the "Destinations" section from the collection homepage.
Setting up the data quality checks for the collection
To set up the data quality checks for a collection, visit the "Checks" section from the collection homepage.
You can also choose to escalate a failed snapshot to your L2 support automatically.

QA passes automatically, and the data is pushed to the configured destinations, when there are no data quality checks or when every data quality check passes without human intervention at import time. The exception is a source that is in development or maintenance, for which QA does not pass automatically.
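The automatic QA rule can be summarised as a small decision function. This is a sketch of the rule as described here, not the system's code; `qa_passes_automatically` and its arguments are hypothetical.

```python
def qa_passes_automatically(check_results, in_development=False, in_maintenance=False):
    """Decide whether QA passes without a human reviewing the snapshot.

    check_results holds one boolean per data quality check (True = passed).
    """
    # Sources in development or maintenance never pass automatically.
    if in_development or in_maintenance:
        return False
    # all([]) is True, so a collection with no checks passes automatically.
    return all(check_results)


print(qa_passes_automatically([]))                           # True
print(qa_passes_automatically([True, True]))                 # True
print(qa_passes_automatically([True, False]))                # False
print(qa_passes_automatically([True], in_maintenance=True))  # False
```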
Creating a source
To create a source in a collection, visit the "Sources" section from the collection homepage, and click the + icon.
When a source is created, it is in the QUEUED state.
If a user takes ownership of a QUEUED source, it moves to IN PROGRESS. Once they are ready to have the source checked, they can move it to QA. If QA passes, the source automatically moves to ACTIVE and the data is published.
If a source fails QA, it moves into the MAINTENANCE state.
You can change the state of the source manually by editing the source.
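The source lifecycle can be pictured as a small state machine. The sketch below is an interpretation rather than the system's implementation: the event names are hypothetical, and it assumes a QA failure can occur from either the QA or the ACTIVE state. Manual edits, which can set any state, are left out.

```python
from enum import Enum


class SourceState(Enum):
    QUEUED = "QUEUED"
    IN_PROGRESS = "IN PROGRESS"
    QA = "QA"
    ACTIVE = "ACTIVE"
    MAINTENANCE = "MAINTENANCE"


# (current state, event) -> next state. Event names are assumptions.
TRANSITIONS = {
    (SourceState.QUEUED, "take_ownership"): SourceState.IN_PROGRESS,
    (SourceState.IN_PROGRESS, "submit_for_qa"): SourceState.QA,
    (SourceState.QA, "qa_passed"): SourceState.ACTIVE,
    (SourceState.QA, "qa_failed"): SourceState.MAINTENANCE,
    (SourceState.ACTIVE, "qa_failed"): SourceState.MAINTENANCE,
}


def next_state(state, event):
    """Return the state after an event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = SourceState.QUEUED
for event in ["take_ownership", "submit_for_qa", "qa_passed"]:
    state = next_state(state, event)
print(state.value)  # ACTIVE
```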
Linking an extractor to a source
You can link an extractor by ID to the source in the edit source view.