Docs
3. Define a source asset

Add a source

Load data from Google Sheets into your data warehouse.

Overview

In this step, you will add an Airbyte Google Sheets source asset that will be used to ingest the Jaffle Shop sample data into the data warehouse. Assets have four distinct lifecycle stages:

  1. Definition: This is the initial phase where the asset is defined either directly as code or through the Y42 Data Catalog interface.

  2. Version Control and CI/CD: After defining the asset, it's essential to save and track changes by committing and pushing the local modifications to the remote repository and running pre-commit hooks. This ensures that the asset is properly stored and that any changes won't break the project.

  3. Building: Finally, once the changes have been successfully applied to the remote repository, the asset must be compiled. This can be done using the specific y42 build commands, which take the newly defined or modified asset and transform it into a functional asset.

  4. Publication: Once tested and verified, a specific version of an asset can be published on different branches, such as main or develop. Read, write or discover permissions can be set on the organization, space or asset-type level to restrict access to published assets as necessary.

Through these stages, the asset lifecycle enables a smooth process for creating, modifying, managing and publishing assets, providing a structured and efficient way to handle these important components of a system.

Instructions

Create dummy data to be ingested

To start, we'll need some data. Click on the link below to copy the Jaffle Shop Google Sheets. Make sure to take note of the account you are using to make the copy. This will be the source data we'll ingest.

https://docs.google.com/spreadsheets/d/1EcWR2omOWJwrCTWi_FKiaqlVrAyV6gZLmjMMBmUBz6k/copy (opens in a new tab)

Definition: Defining our first source asset

The first step is to create a source asset. A source asset allows us to either reference data from our data warehouse or load new data using Y42 managed Airbyte connectors or external Fivetran connectors.

For this tutorial, we will use the Y42 managed Airbyte Google Sheet Connector:

  1. Press on + in the left file navigator
  2. Select Source
  3. Select Google Sheets (Airbyte)
  4. Name the source: raw_jaffle_shop_data

Now, we'll configure the source asset to point to the right Google Sheet and pull in all of the Jaffle Shop dummy dataset.

Add a connection secret

  1. In the Authorization panel, click on the Search for a secret box and then click `Add new
  2. Name the secret google_sheets_jaffle_shop_data_{YOUR_NAME}, replacing YOUR_NAME with your name
  3. Insert spreadsheet link (URL of the spreadsheet, not Google Drive link)
  4. Leave Row batch size as is
  5. Leave Columns name conversion as is
  6. Select Authenticate via Google (OAuth)
  7. Authorize with the Google account that you used to copy the sheet

Once, authorized, you should be automatically redirected to Y42 and the secret should be automatically selected as a means of authentication.

Select the the tables and columns you would like to import

  1. Open the Schema panel
  2. Wait for the schema to load, this can take a couple of minutes
  3. Select all three tables
    1. raw_customers
    2. raw_orders
    3. raw_payments
  4. You can open the dropdown of a single table to see the individual columns and types

Version Control and CI/CD: Applying changes to the remote

Our first source is fully configured. Before we can build the tables and import data we need to apply our changes to the remote.

Commit your newly created asset

  1. Click on the Commit & Push button at the top
  2. Name your Commit: adding Jaffle Shop data
  3. Commit your changes
  4. Wait for Y42's pre-configured checks

Up next

With the new source asset defined, you're ready to materialize it in order to load the data into your data warehouse.