Validation

How is data validated?

Overview

When collating data from multiple sources, the data type of a column may not match across sources. For example, a column may be defined as a string in one source and as an int in another. Preen attempts to coerce each column to its most common data type across all sources, using a majority voting algorithm. If Preen cannot determine the data type of a column, it raises an error and requires manual intervention.
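The voting idea can be illustrated with a minimal Python sketch. This is not Preen's actual implementation: the function names, the `AmbiguousTypeError` exception, the type names, and the rule that a tie counts as "unable to determine" are all assumptions made for illustration.

```python
from collections import Counter


class AmbiguousTypeError(Exception):
    """Hypothetical error: no single type wins the vote for a column."""


def vote_column_type(observed_types):
    """Return the type reported by the most sources; raise on a tie or no data."""
    if not observed_types:
        raise AmbiguousTypeError("no type observations for column")
    ranked = Counter(observed_types).most_common()
    top_type, top_votes = ranked[0]
    # Assumption: a tie for first place means the type cannot be determined.
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        raise AmbiguousTypeError(f"tie between types: {ranked}")
    return top_type


def resolve_schema(sources):
    """Collect one vote per source for each column, then pick a winner per column."""
    votes = {}
    for schema in sources.values():
        for column, type_name in schema.items():
            votes.setdefault(column, []).append(type_name)
    return {column: vote_column_type(types) for column, types in votes.items()}


# Hypothetical sources: each maps column name -> declared type.
sources = {
    "source_a": {"id": "BIGINT", "amount": "VARCHAR"},
    "source_b": {"id": "BIGINT", "amount": "DOUBLE"},
    "source_c": {"id": "VARCHAR", "amount": "DOUBLE"},
}

resolved = resolve_schema(sources)
print(resolved)  # {'id': 'BIGINT', 'amount': 'DOUBLE'}
```

In this sketch, two of the three sources agree on each column, so a winner exists for both. With only two sources that disagree, the vote ties and the sketch raises an error, which corresponds to the case that requires manual intervention.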

Note: There will be cases where you need to manually cast the data types of the columns in your model.

We store the results of the validation step in a DuckDB table called preen_information_schema. You can use this table to inspect the results of the validation step and to cast the data types of the columns in your model.

CLI Commands

preen source validate

Code References
