Building a complete analytics infrastructure platform for the MDS.
The modern data stack's (MDS) promise of a more agile and cost-effective method for building data stacks remains largely unfulfilled. The principle of modularity, which allows companies to mix and match tools to better meet their needs, sounds great in theory. However, it introduces significant challenges:
Selection stage: Choosing the right tool stack can be a daunting task, with an overwhelming number of options to consider and their interoperability to figure out. This stage becomes even more time-consuming when you add in:
- Procurement: The process of getting tools approved and rigorously testing a multitude of solutions until you land on your final solution.
- Billing Complexity: The task of juggling numerous different bills as you work with a variety of tools and technologies.
Integration efforts/lengthy setups: Setting up your stack is just the beginning. Connecting all the tools and maintaining this complex structure can demand significant time and resources. It's a continuous process with no end in sight.
Fragmented workflow: Dealing with multiple interfaces and a ton of different tools is a real headache. It turns what should be a simple task into a roundabout tour of a toolset maze.
Training and expertise: Many tools in the modern data stack require specialized skills. In a competitive job market, finding or training staff with these skills not only increases costs but also adds to the difficulties of maintaining an efficient data solution.
Access management: Managing authentication and access control across multiple services increases complexity and security risks.
Observability challenges: With a dispersed set of tools in the mix, tracing data lineage, managing metadata, and monitoring system health become more complex and time-consuming.
At Y42, our mission is to build a complete analytics infrastructure platform. It's meticulously engineered to address the challenges outlined above, placing emphasis on improving the data developer experience.
Connect your data warehouse and object storage service, and you'll immediately have a fully functional, integrated data platform at your fingertips.
Use Airbyte or Fivetran for data ingestion, dbt for data transformations, and orchestrate your pipelines with Y42's simple DAG selectors. No setup or maintenance is required.
Choose from several development modes to build and manage your data assets:
Code Editor: An in-browser VSCode editor supercharged by WebAssembly for your complex setups. Comes with built-in lineage preview, data preview, and handy features like code auto-suggestions and static code analysis.
Data Catalog: More of a visual person? With our Data Catalog, enjoy a UI-driven approach to build, test, modify, and schedule data assets. You can forget about writing YAML files.
Lineage view: Navigate data asset relationships with ease. Search, filter, or switch back to the previous modes as needed.
All modes are synchronized in real-time, allowing you to seamlessly transition between them as you build your assets.
Whether you're setting up your sources, manipulating data, or scheduling your builds, Y42 offers a consistent DX.
Launch the global command menu directly from the keyboard using CMD / CTRL + K to speed up your workflow.
The command menu provides quick access to the commonly used actions for building data pipelines and is accessible across all development modes - Data Catalog, Code Editor, and Lineage.
The Y42 Code Editor is more than just a development environment. Our enhanced Code OSS platform uses asset materialization data to provide superior auto-completion and auto-correction for SQL and YAML files.
Say goodbye to manual YAML files. With the Data Catalog mode, YAML files are generated directly from the asset definition, and additional details, such as tests or descriptions, can be adjusted via the UI or in code.
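To make the idea of generating YAML from an asset definition concrete, here is a minimal sketch. It is not Y42's implementation: the `AssetDefinition` and `Column` classes and the `render_model_yaml` function are hypothetical names invented for this example, and the output only follows the general shape of a dbt model properties file.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    description: str = ""
    tests: list[str] = field(default_factory=list)


@dataclass
class AssetDefinition:
    # Hypothetical stand-in for what a UI form might capture about a model.
    name: str
    description: str = ""
    columns: list[Column] = field(default_factory=list)


def render_model_yaml(asset: AssetDefinition) -> str:
    """Render a dbt-style properties file for a single model."""
    lines = ["version: 2", "", "models:", f"  - name: {asset.name}"]
    if asset.description:
        lines.append(f'    description: "{asset.description}"')
    if asset.columns:
        lines.append("    columns:")
    for col in asset.columns:
        lines.append(f"      - name: {col.name}")
        if col.description:
            lines.append(f'        description: "{col.description}"')
        if col.tests:
            lines.append("        tests:")
            lines.extend(f"          - {test}" for test in col.tests)
    return "\n".join(lines) + "\n"


# Example asset with one tested column (illustrative data only).
orders = AssetDefinition(
    name="orders",
    description="One row per order",
    columns=[Column("order_id", "Primary key", ["unique", "not_null"])],
)
print(render_model_yaml(orders))
```

Because the file is derived from the definition, editing a description or adding a test in either place can be kept in sync by re-rendering, which is the property the paragraph above relies on.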
When you create an account with Y42, a GitLab repository is automatically created and managed for you. Every operation you perform, whether it's defining a new source connection secret, modifying model definitions, adjusting orchestration groups, or managing alerts, gets automatically staged and is ready for commit.
Through our user interface, you can merge branches, stash changes, or revert modifications as needed. The power to manage your development process is at your fingertips.
Streamline your workflow with Virtual Data Builds, Y42's feature designed to handle both operational changes (like schema modifications and data updates) and logical changes (code alterations) via git. This method allows the creation of infinite virtual environments and instant deployments and rollbacks. All of this is possible due to the effective reuse of data assets and the elimination of data duplicates.
Whenever there's a need for a change, start by creating a branch. You can create it from any existing branch, even main. This allows you to work directly on production data without risking the operational integrity of the production environment.
Key advantages include:
- Less Mental Overhead: All you need to focus on is your code. Virtual Data Builds converts your code operations into assets, saving you the headache of managing environments. #EnvironmentallyFriendly
- Efficient Deployments and Rollbacks: When merging or reverting your code changes, previously materialized data assets are reused with no additional interaction required beyond the standard Git process. #NoMoreRebuilds
- Reduced Data Warehouse Costs: The ability to swap view pointers dramatically reduces storage and computation costs. Instead of rebuilding identical assets in all environments to accommodate multiple development streams, you operate with a single copy of your assets. #MinimalDWHCosts
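The mechanism behind these advantages can be illustrated with a toy model. This is a conceptual sketch only, not how Y42 implements Virtual Data Builds: the `VirtualWarehouse` class, its methods, and the hashing scheme are all assumptions made for illustration. Physical tables are keyed by a hash of the asset's code, and each branch holds only a pointer to a table.

```python
import hashlib


class VirtualWarehouse:
    """Toy model: physical tables keyed by code hash, views as pointers."""

    def __init__(self) -> None:
        self.tables: dict[str, str] = {}             # code hash -> physical table name
        self.views: dict[tuple[str, str], str] = {}  # (branch, asset) -> code hash
        self.builds = 0                              # number of real materializations

    def build(self, branch: str, asset: str, sql: str) -> str:
        key = hashlib.sha256(f"{asset}:{sql}".encode()).hexdigest()[:12]
        if key not in self.tables:
            # Only code that has never been built triggers a materialization.
            self.tables[key] = f"{asset}__{key}"
            self.builds += 1
        # Pointing the branch's view at the table is a cheap metadata change.
        self.views[(branch, asset)] = key
        return self.tables[key]

    def merge(self, source: str, target: str) -> None:
        # Deploying repoints the target's views; nothing is rebuilt.
        for (branch, asset), key in list(self.views.items()):
            if branch == source:
                self.views[(target, asset)] = key

    def resolve(self, branch: str, asset: str) -> str:
        return self.tables[self.views[(branch, asset)]]


wh = VirtualWarehouse()
wh.build("main", "orders", "select * from raw.orders")
wh.build("feature", "orders", "select * from raw.orders")           # reused, no new build
wh.build("feature", "orders", "select id, amount from raw.orders")  # new code, one build
wh.merge("feature", "main")
print(wh.builds, wh.resolve("main", "orders") == wh.resolve("feature", "orders"))
# prints: 2 True
```

In this model a rollback would be the same pointer move in reverse: the earlier table still exists, so the view is simply pointed back at it.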
Leverage Y42's stateful data assets to simplify the asset lifecycle from planning to activation. Standardized interfaces and centralized monitoring enhance asset oversight, facilitating straightforward integration and management of your data assets with real-time cost and health monitoring. Adapt and evolve with a data asset ecosystem designed for clarity and efficiency.
Manage your build tasks in Y42 with the y42 build command. By default, it runs all your committed assets; use --exclude flags for targeted execution. With graph (*) and set (comma) operators, you have granular control over the assets to include or exclude in your build DAG.
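The concepts behind graph and set operators can be sketched in a few lines. The example below deliberately does not reproduce the y42 CLI syntax or its operator semantics; the DAG, the asset names, and the `upstream` and `build_plan` helpers are hypothetical. It shows only the general idea: graph expansion turns one asset into a set of related assets, and set operations combine or subtract those sets before the build runs in dependency order.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each asset maps to the assets it depends on.
DAG = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "revenue_report": ["orders_enriched"],
}


def upstream(asset: str) -> set[str]:
    """Graph expansion: the asset plus everything it depends on."""
    seen = {asset}
    stack = [asset]
    while stack:
        for parent in DAG[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_plan(include: set[str], exclude: set[str]) -> list[str]:
    """Set difference, returned with dependencies before dependents."""
    selected = include - exclude
    return [a for a in TopologicalSorter(DAG).static_order() if a in selected]


# Everything orders_enriched needs, minus the customer branch.
plan = build_plan(upstream("orders_enriched"), exclude=upstream("stg_customers"))
print(plan)  # ['raw_orders', 'stg_orders', 'orders_enriched']

# Intersection: assets that both selections have in common.
shared = upstream("revenue_report") & upstream("stg_orders")
print(sorted(shared))  # ['raw_orders', 'stg_orders']
```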
With Y42, you retain full control over your data pipeline. The entire process is defined as-code in a Y42-managed GitLab repository, allowing you to clone it to your local system at any time.
The dbt part of a Y42 project can be directly utilized as a standalone dbt project, without requiring any modifications to the data pipeline code. Any Y42-specific elements, such as ingestion and orchestration code, are seamlessly ignored by the dbt compiler, granting you the freedom to use the dbt command-line interface as you see fit.
Not sure where to begin? No worries, we've got you covered. If you're new to Y42, visit our website to request early access. If you already have an account, you can jump right in by following our Getting Started guide.