DBT is not one of the first technical skills most Data Scientists or Analysts think to learn. It’s not as exciting as machine learning algorithms, and it’s not as easy to show off as a fancy data visualization. Beyond that, it’s not as widely known, so there is less awareness of what DBT can do. But from my experience, DBT is an absolutely fundamental skill for any Data Scientist or Analyst due to all of its capabilities.
Because DBT is so flexible, there are almost an endless amount of ways you can integrate DBT into your data architecture. Below, I’ve identified some of the most common ways it can be used to easily improve data structures.
Create SQL transformations that use consistent logic
DBT allows you to create SQL transformations on your data and include these transformations in your architecture. Oftentimes, you’ll only want to analyze your data after you’ve cleaned it up with some transformations. Instead of doing these transformations on a one-off basis every time you are starting a new analysis project, DBT allows you to write these transformations once and include them in your data architecture. This means that you’ll end up with a cleaned up table in your data warehouse with the transformations already applied, so you’ll be ready to start analyzing the data right away.
Test SQL transformations and underlying data
One of the most useful features that DBT provides is the ability to test your transformations. You can very easily set up basic tests, such as ensuring a column is not null, unique, or only contains specific values. Additionally, you can also create any arbitrary, custom test just with SQL. For example, if you have an aggregated table that calculates weekly sales, you can create a test that checks if the sales are greater than 0. If not, you can have DBT either fail or set a warning that alerts you that the sales are not greater than 0. These types of custom tests not only help make sure your aggregations are correct, but they also help check the underlying data doesn’t have any issues.
Run transformations on a schedule
Using DBT Cloud, you can set your SQL transformations to run on a schedule. They can run on any custom schedule you’d like – weekly, hourly, or at specific time intervals. As you can imagine, this is an incredibly important feature to integrate into your data architecture. By specifying the appropriate schedule, you can ensure that business intelligence tools that rely on the transformed data will be updated regularly.
Add snapshots for slowly changing dimensional tables
One of the most complicated data issues that many companies face is how to keep track of their slowly changing dimension tables. A perfect example of this is a customer table. When a customer’s address changes, the new value replaces the old one in the customer table. However, it’s often still valuable to store the old customer data somewhere. This is where DBT snapshots come in. DBT snapshots allow you to specify specific tables that you’d like to track how their data changes over time. It also gives you the ability to know when a specific value was valid. Using the customer address as an example, snapshots will allow you to know exactly when the address changed. Setting up DBT snapshots for slowly changing dimensions is an incredible way to enrich the quality of data in your data architecture.
Test your code in a dev environment
Finally, DBT allows you to run your code and your tests in a dev environment to ensure you don’t break anything in production. As I’m sure you know, this is a critical component to any data architecture. As an analyst, it’s often useful to run your code and evaluate the results to make sure everything is working as expected. DBT allows you to do this by customizing your environment so that your test runs only affect your dev environment. Then, after you’ve QA-ed your work and you’re confident all of your transformations are working as expected, you can bring your updated logic into the production environment. All of this can be done seamlessly using your local environment, Github, and DBT Cloud, and it’s an essential feature that every analyst using DBT should take advantage of.
There are plenty of other ways DBT can add value to your data architecture, and they are always improving. DBT even just raised almost $13M, and hundreds of companies like Hubspot and Gitlab take advantage of their amazing technology. Because the only real prerequisites are SQL and Github experience, it’s definitely a skill that all Data Scientists and Analysts should add to their toolbox.
If you’re ready to start your DBT journey and learn DBT from scratch all in under four hours, sign up for my course here.