Stateful Data Assets
Learn about stateful data assets, the building blocks of Y42 data pipelines.
Data assets are the fundamental building blocks of data pipelines. Each data asset is a set of persisted table objects that captures some understanding of the world over time, as specified by its definition written as code. YAML configuration binds tests, dependencies, and metadata to the asset.
Taken as a whole, the persisted table snapshots created over time, together with their methods and attributes, form a data asset.
In the realm of data, managing state is the primary means of understanding an ever-changing world. Every time you transform the data, apply masking policies, set permissions, or re-run a model, you are manipulating the state of the data asset.
You can think of a data asset as a river: it changes and evolves over time due to various factors, such as new data arriving, existing data being updated or deleted, or upstream transformations whose changes have to be propagated downstream. With each change, a new variation of the asset is created in the form of a table.
The table in your data warehouse is a snapshot of your data asset, representing its state at a specific point in time. While each snapshot remains static, the data asset itself, like the river, keeps flowing and evolving as new data comes in.
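Finding which snapshot represented an asset at a given moment amounts to a lookup over the asset's build history. The minimal sketch below assumes a hypothetical, time-sorted log of `(built_at, snapshot_id)` entries, one per valid build, and uses binary search to return the snapshot that was live at a timestamp. The log format and the snapshot IDs are illustrative assumptions, not how Y42 stores its history.

```python
import bisect
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical build history: (time a valid build went live, snapshot id), sorted by time.
BuildLog = List[Tuple[datetime, str]]


def snapshot_at(log: BuildLog, when: datetime) -> Optional[str]:
    """Return the snapshot that was live at `when`, or None if no build existed yet."""
    times = [built_at for built_at, _ in log]
    i = bisect.bisect_right(times, when)  # number of builds at or before `when`
    return log[i - 1][1] if i > 0 else None


if __name__ == "__main__":
    log: BuildLog = [
        (datetime(2024, 1, 1, 6), "snap_001"),
        (datetime(2024, 1, 2, 6), "snap_002"),
        (datetime(2024, 1, 3, 6), "snap_003"),
    ]
    print(snapshot_at(log, datetime(2024, 1, 2, 12)))  # snap_002
    print(snapshot_at(log, datetime(2023, 12, 31)))    # None
```

Each snapshot stays immutable; only the question "which one was current at time t" changes, which is why an append-only, sorted history is enough to answer it.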
If a build fails, either because the asset cannot be materialized or because one of its linked tests does not pass, Y42 automatically rolls back to the last valid build, so users always access correct data.
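The rollback behavior can be illustrated with a minimal Python sketch. It assumes a hypothetical `Asset` class that keeps a pointer to its last valid snapshot and advances that pointer only when a build both materializes and passes every linked test; a failed build is simply never promoted, so consumers keep reading the previous valid snapshot. The class, its methods, and the row-as-dict representation are illustrative assumptions, not Y42's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical simplification: a snapshot is a list of rows, each row a dict.
Rows = List[dict]


@dataclass
class Asset:
    """Illustrative stand-in for a data asset; not Y42's actual API."""
    name: str
    tests: List[Callable[[Rows], bool]] = field(default_factory=list)
    snapshots: List[Rows] = field(default_factory=list)  # every build that materialized
    valid_index: Optional[int] = None  # position of the last snapshot that passed all tests

    def build(self, materialize: Callable[[], Rows]) -> bool:
        """Attempt one build; return True only if the new snapshot was promoted."""
        try:
            rows = materialize()
        except Exception:
            return False  # materialization failed: the last valid snapshot stays live
        self.snapshots.append(rows)  # keep the attempt for auditing
        if all(test(rows) for test in self.tests):
            self.valid_index = len(self.snapshots) - 1  # promote the new snapshot
            return True
        return False  # a linked test failed: the last valid snapshot stays live

    def current(self) -> Optional[Rows]:
        """What downstream users read: the last valid snapshot, if any."""
        if self.valid_index is None:
            return None
        return self.snapshots[self.valid_index]


def no_null_ids(rows: Rows) -> bool:
    """Example test: every row must have a non-null id."""
    return all(row.get("id") is not None for row in rows)


if __name__ == "__main__":
    orders = Asset("orders", tests=[no_null_ids])
    orders.build(lambda: [{"id": 1}, {"id": 2}])  # passes the test, so it is promoted
    orders.build(lambda: [{"id": None}])          # fails the test, so it is not promoted
    print(orders.current())  # [{'id': 1}, {'id': 2}]
```

Keeping failed attempts in `snapshots` while leaving `valid_index` untouched separates "what was built" from "what users see", which is one simple way to make a rollback instantaneous: nothing has to be rebuilt, only a pointer is left in place.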
To encapsulate all of these components into a single logical unit, we introduce a new stateful and declarative approach to building and managing data assets, called Stateful Data Assets.
Merging statefulness and declarativeness leads to a world where you define the end state of your data assets (how they should look, which tests they must pass, and who should have access to them) without providing step-by-step instructions. This is coupled with a complete, auditable trail derived from the rich metadata collected at each stage.
This capability allows you to check the state of your asset in the data warehouse at any point in time, and to deploy or roll back instantly if necessary.
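To make the declarative idea concrete, here is a minimal Python sketch of reconciliation: you declare only the end state (here, which roles may read an asset), and a controller derives the steps needed to get there. The `reconcile_grants` function and the generated GRANT/REVOKE statements are hypothetical illustrations, not Y42 functionality.

```python
from typing import List, Set


def reconcile_grants(desired: Set[str], actual: Set[str], asset: str) -> List[str]:
    """Derive the imperative steps that move `actual` access to the `desired` end state.

    Hypothetical illustration: the user declares only `desired`; the steps are computed.
    """
    steps = [f"GRANT SELECT ON {asset} TO {role}" for role in sorted(desired - actual)]
    steps += [f"REVOKE SELECT ON {asset} FROM {role}" for role in sorted(actual - desired)]
    return steps


if __name__ == "__main__":
    for step in reconcile_grants({"analyst", "finance"}, {"analyst", "intern"}, "orders"):
        print(step)
    # GRANT SELECT ON orders TO finance
    # REVOKE SELECT ON orders FROM intern
```

Because the steps are derived from the difference between declared and actual state, re-applying an unchanged declaration produces no steps at all; that idempotence is what makes it safe to redeploy a declared state repeatedly.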
To better grasp the advantages, let's break them down by the stages of an asset's lifecycle, focusing on two planes: the Data Plane, which encapsulates the process of building the asset, and the Control Plane, which governs that process.
Now that we've looked at the practical improvements in the Data Plane, let's turn our attention to the Control Plane, which orchestrates the operations we observed in the Data Plane. We'll now explore how the Control Plane improves the management of data assets and provides a streamlined approach to data observability.