Schemas

A Schema in import.io defines the output shape of the data, along with the validation rules for that data.

Listing Schemas

Schemas can be listed by clicking the "Schemas" menu item on the organization sidebar.

You can filter and sort your Schemas.

Creating a schema

To create a new schema, click the + icon on the Schemas page.

You can see a preview on the right of what the data would look like as JSON.

When a schema is first created, it is created as a "draft". This means that changes which are not backwards compatible can still be made to the schema, which is ideal while you are iterating on schema ideas. It also means that data imported against a draft schema cannot be pushed and is not saved to the import.io data lake, because we cannot be sure it will be compatible with data collected in future.

Currently, if you use a nested schema, you must push data in via the Avro import method.

Column settings

Data settings

Setting: Description

Single Value: Whether the column holds a single value or an array of values.

Filter: If the value is falsy (zero, false, or blank), the row is marked as filtered and is not included in the data pushed to destinations.

Internal: If this is true, this column's data is excluded from the data pushed to destinations.

Primary key: The fields that make up the composite primary key determine each row's _id metadata column, a UUID generated from the hash of those column values. The data pushed to destinations is deduplicated on this ID.

Type: The type determines the Avro/Parquet data type, and also controls how the system turns extracted text values into typed values. The locale parameter on a source is used for this conversion.

Default value: A textual default value for the column. Use ISO format for date/time values, and JSON format for numbers and booleans.
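
To make the interaction between the Filter, Internal, and Primary key settings concrete, here is a minimal Python sketch. It is not import.io's implementation: the `COLUMNS` layout, the function names, and the use of `uuid.uuid5` as the hash are all assumptions made for illustration, since the actual hash function and settings format are not documented here.

```python
import json
import uuid

# Hypothetical column settings, named after the settings described above.
COLUMNS = {
    "sku":   {"primary_key": True,  "filter": True,  "internal": False},
    "price": {"primary_key": False, "filter": False, "internal": False},
    "debug": {"primary_key": False, "filter": False, "internal": True},
}


def row_id(row, columns=COLUMNS):
    """Derive a deterministic UUID from the primary key column values."""
    key_values = [row.get(name) for name, c in columns.items() if c["primary_key"]]
    # Assumption: uuid5 (SHA-1 based) stands in for the undocumented hash.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, json.dumps(key_values)))


def prepare_for_destination(rows, columns=COLUMNS):
    """Drop filtered rows, strip internal columns, and dedupe on _id."""
    unique = {}
    for row in rows:
        # A falsy value in any Filter column marks the whole row as filtered.
        if any(not row.get(name) for name, c in columns.items() if c["filter"]):
            continue
        public = {
            name: value
            for name, value in row.items()
            if not columns.get(name, {}).get("internal", False)
        }
        public["_id"] = row_id(row, columns)
        # Keep the first row seen for each _id.
        unique.setdefault(public["_id"], public)
    return list(unique.values())
```

Given two rows that share the same `sku` and one row with a blank `sku`, this sketch returns a single row without the `debug` column: the blank row is filtered and the duplicate is dropped because it hashes to the same `_id`.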

Validation settings

These settings contribute towards the validation error statistics for snapshots of data.

These settings are soft indicators: they do NOT filter data. To filter data, use the Filter setting in the field settings instead.

Viewing a schema

You can view a schema by clicking its name in the list view. This shows an easy-to-understand representation of the schema:

[Image: schema view]

Icon: Description

🔑: Part of the primary key

[ ]: Array field

⭐️: Required field
Publishing a schema

To move a schema out of the "draft" state so that it can be used to publish data to customers, click the "Publish" button on the view schema page.

Note that once a schema is published, you can add new fields to the root object or to a nested object, but you cannot change the data definition of existing fields.

[Image: schema backwards compat]

If you want to stop using a field, you can mark it as deprecated. This will remove it from the schema view.
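
The compatibility rule for published schemas can be illustrated with a small Python sketch. It assumes a simplified, hypothetical schema representation (a dict mapping field names to a type string or to a nested dict), which is not the format import.io uses, and it also assumes that removing a field counts as an incompatible change.

```python
def incompatibilities(published, candidate, path=""):
    """List the reasons a candidate schema is not backwards compatible.

    Both schemas are plain dicts: field name -> type string, or a nested
    dict for a nested object (a hypothetical, simplified representation).
    """
    problems = []
    for name, old in published.items():
        where = f"{path}{name}"
        if name not in candidate:
            # Assumption: dropping a field outright is treated as a breaking change.
            problems.append(f"{where}: field removed")
            continue
        new = candidate[name]
        if isinstance(old, dict) and isinstance(new, dict):
            # Nested object: recurse, since new fields may be added at any level.
            problems.extend(incompatibilities(old, new, where + "."))
        elif old != new:
            problems.append(f"{where}: definition changed from {old!r} to {new!r}")
    # Fields that exist only in the candidate are additions and are allowed.
    return problems
```

An empty list means the candidate only adds fields. Any entry names an existing field whose definition changed or which disappeared.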

Iterating on a published schema

When you want to upgrade a schema, you can use an existing published schema as a starting point, and then merge the changes back into the published schema later.

Creating a new draft from a schema

Click the "Copy" button in the schema view:

[Image: published schema]

This will take you to the "edit" page for the copied schema:

[Image: copy schema]

Merging changes back into the published schema

This is currently only possible through the API:
http https://workbench.import.io/api/orgs/test-org/schemas/$INPUT_ID Authorization:\ Bearer\ $IO_API_KEY \
| jq '{extractor: .extractor}' \
| http PATCH https://workbench.import.io/api/orgs/test-org/schemas/$OUTPUT_ID Authorization:\ Bearer\ $IO_API_KEY

The API returns a 400 response with an error message if the merged schema is not backwards compatible.
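
If you script the merge rather than run it by hand, the 400 case needs explicit handling. The following Python sketch mirrors the command above using only the standard library. The function names are hypothetical, the `extractor` key is taken from the jq filter above, and it assumes the error message is returned in the response body, which is not confirmed here.

```python
import json
import urllib.error
import urllib.request

BASE = "https://workbench.import.io/api/orgs"


def build_merge_request(org, output_id, draft_schema, api_key):
    """Build the PATCH request that copies the draft's definition across."""
    # Same selection as the jq filter above: keep only the "extractor" key.
    body = json.dumps({"extractor": draft_schema.get("extractor")}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/{org}/schemas/{output_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def merge(request):
    """Send the request and return (ok, response_text)."""
    try:
        with urllib.request.urlopen(request) as response:
            return True, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # Assumption: the body carries the backwards-compatibility error message.
            return False, err.read().decode("utf-8")
        raise
```

Here `draft_schema` is the parsed JSON returned by a GET on the draft schema's URL. Any status other than 400 is re-raised so that authentication or network problems are not mistaken for a compatibility failure.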

Deleting Schemas

Schemas cannot currently be deleted.