Incremental models
Transform only new or updated rows from upstream.
Overview
Created as tables in your data warehouse, incremental models transform all of your source data rows on the first run. On subsequent runs, they target specific rows based on a user-provided filter, typically one that selects newly added or updated rows.

This incremental loading mechanism significantly reduces transformation runtime and compute costs, improving overall warehouse performance.
Building incremental models
Building an incremental model follows the same initial process as building a model that is materialized as a table or view. However, you will also need to include both the incremental configuration and the filtering criteria.
Here's an example to illustrate the structure of an incremental model:
1. Configure the model as incremental. Define the materialization type as part of the config block.
2. Define the model's logic. Provide the SQL logic that will act as the base for your incremental model.
3. Use the is_incremental() macro to filter out rows. The macro filters for "new" rows, meaning those added since the last model run. Use the {{ this }} variable to easily query your target table for the latest timestamp.
4. Set an optional unique_key to identify rows that need to be updated. The unique_key parameter is optional. When set, Y42 merges the new data into the zero-copy clone of the existing table, allowing updates to existing rows. If unique_key is not specified, Y42 simply appends the data to the end of the zero-copy clone of the existing table, allowing only the addition of new rows.
Y42 vs. dbt incremental models execution
In Y42, the tables in your data warehouse (DWH) are synchronized with your code. This abstraction eliminates concerns about data loss due to errors overwriting a table in your DWH. Unlike dbt, expensive full-refreshes are not necessary. If a job runs with incorrect configurations, erroneous queries, or source issues, you can revert your changes and restore the model's data to its previous state.
In contrast, each dbt run overwrites your existing DWH table. If the table was incrementally built, overwriting it with incorrect data can be costly. You would have to perform a full-refresh, which undermines the efficiency of incremental models.
BigQuery limitations
In BigQuery, tables are not partitioned or clustered by default. This means that even if you use a WHERE clause to filter out data, the entire table is scanned. Consequently, incremental models like the following will read the whole source table (e.g., raw_app_data.events) with every run:
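As an illustration, a model along these lines (the event_timestamp column is an assumed example) still scans all of raw_app_data.events on every run when that table is unpartitioned:

```sql
{{
    config(
        materialized='incremental'
    )
}}

select *
from raw_app_data.events

-- Without partitioning or clustering on event_timestamp, this filter
-- reduces the rows passed downstream, but BigQuery still scans every
-- byte of the source table to evaluate it.
{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```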
Therefore, the incremental model does not save any money in this case, although it does save time, since any filter or limit reduces execution time in BigQuery.