Add a source
Load data from Google Sheets into your data warehouse.
Overview
In this step, you will add an Airbyte Google Sheets source asset that will be used to ingest the Jaffle Shop sample data into the data warehouse. Assets have four distinct lifecycle stages:
-
Definition: This is the initial phase where the asset is defined either directly as code or through the Y42 Data Catalog interface.
-
Version Control and CI/CD: After defining the asset, it's essential to save and track changes by committing and pushing the local modifications to the remote repository and running pre-commit hooks. This ensures that the asset is properly stored and that any changes won't break the project.
-
Building: Finally, once the changes have been successfully applied to the remote repository, the asset must be compiled. This can be done using the specific
y42 build
commands, which take the newly defined or modified asset and transform it into a functional asset. -
Publication: Once tested and verified, a specific version of an asset can be published on different branches, such as main or develop. Read, write or discover permissions can be set on the organization, space or asset-type level to restrict access to published assets as necessary.
Through these stages, the asset lifecycle enables a smooth process for creating, modifying, managing and publishing assets, providing a structured and efficient way to handle these important components of a system.
Instructions
Create dummy data to be ingested
To start, we'll need some data. Click on the link below to copy the Jaffle Shop Google Sheets. Make sure to take note of the account you are using to make the copy. This will be the source data we'll ingest.
Definition: Defining our first source asset
The first step is to create a source asset. A source asset allows us to either reference data from our data warehouse or load new data using Y42 managed Airbyte connectors or external Fivetran connectors.
For this tutorial, we will use the Y42 managed Airbyte Google Sheet Connector:
- Press on + in the left file navigator
- Select
Source
- Select
Google Sheets (Airbyte)
- Name the source:
raw_jaffle_shop_data
Now, we'll configure the source asset to point to the right Google Sheet and pull in all of the Jaffle Shop dummy dataset.
Add a connection secret
- In the
Authorization
panel, click on theSearch for a secret
box and then click `Add new - Name the secret
google_sheets_jaffle_shop_data_{YOUR_NAME}
, replacing YOUR_NAME with your name - Insert spreadsheet link (URL of the spreadsheet, not Google Drive link)
- Leave
Row batch size
as is - Leave
Columns name conversion
as is - Select
Authenticate via Google (OAuth)
- Authorize with the Google account that you used to copy the sheet
Once, authorized, you should be automatically redirected to Y42 and the secret should be automatically selected as a means of authentication.
Select the the tables and columns you would like to import
- Open the
Schema
panel - Wait for the schema to load, this can take a couple of minutes
- Select all three tables
raw_customers
raw_orders
raw_payments
- You can open the dropdown of a single table to see the individual columns and types
Version Control and CI/CD: Applying changes to the remote
Our first source is fully configured. Before we can build the tables and import data we need to apply our changes to the remote.
Commit your newly created asset
- Click on the
Commit & Push
button at the top - Name your Commit:
adding Jaffle Shop data
- Commit your changes
- Wait for Y42's pre-configured checks
Up next
With the new source asset defined, you're ready to materialize it in order to load the data into your data warehouse.