SQL is one of the most important skills a Data Scientist or Analyst can have as it allows you to query data from a relational database. After working with graduates from Data Science bootcamps, I’ve noticed that many are lacking the necessary SQL skills to succeed on the job. It seems as if many bootcamps are quickly covering SQL, but not emphasizing it enough. However, in almost every interview, candidates will be asked to show off their SQL skills and many are not prepared. So we’re going to discuss the basic fundamentals candidates should have mastered in order to ace the technical interview.
One other thing to note is that there are multiple types of SQL, but in the examples here we’re going to be using PostgreSQL. Most of the versions of SQL are very similar though, so once you are comfortable with one, it’s very easy to switch to another if needed.
Getting Started
Before we dive into some of the more advanced topics, it’s important to cover the first set of SQL skills you should have mastered. These include creating a query with the SELECT, FROM, and WHERE keywords. Other topics include AND, OR, NOT, IN, IS NULL, LIKE, and comparison operators. If you are not comfortable with any of these parts of SQL, I would definitely review them before continuing. Below is an example of a query using just the basics.
SELECT *
FROM public.jobs
WHERE id > 100
AND price < 50
AND customer_id IS NOT NULL
AND paid_with_cc IS TRUE;
Aggregations
SQL allows you to aggregate data using aggregate functions. This means you can do calculations such as summing, counting, or finding the average of a given column. You can perform the aggregation over the whole table, or you can group it by a given field. For example, you could find the sum of the price column in a given table, or you could find the sum of the price column for each year in the table. The two queries would look as follows:
SELECT sum(price)
FROM public.jobs;
SELECT
year,
sum(price)
FROM public.jobs
GROUP BY year
HAVING SUM(price) > 100;
As you can see, in the second query we’ve used the GROUP BY keyword to sum up the prices just in a given year. We’ve also included a HAVING condition to only keep in years that have the sum of the price > 100. Some of the aggregation keywords that you can use in SQL include SUM, COUNT, MIN, MAX, AVG, and MEDIAN.
Joins
It’s important in SQL to be able to join two or more tables together. When joining tables, you specify which column(s) in one table should match the column(s) in another. You also need to know the type of join you’d like to perform. The most common joins include INNER and LEFT, though I would recommend learning FULL OUTER as well. You should be able to understand the difference in the following queries.
SELECT *
FROM public.jobs
JOIN public.customers
ON jobs.customer_id = customers.id;
SELECT *
FROM public.jobs
LEFT JOIN public.customers
ON jobs.customer_id = customers.id;
In the first example, because it is an inner join, the query will only return rows where the job had a customer id that matched the id in the customers table. In the second example, the query will return all jobs, regardless of if they had a customer id that matched the id in the customer table.
Case
Using the CASE statement in SQL allows you to write logic equivalent to the IF / THEN logic you’d find elsewhere. The CASE statement adds lots of flexibility to writing queries and is very common in SQL. Note when using the CASE statement, whichever CASE conditions is met first for a given row will be used. Below is an example of two different CASE statements in a query.
SELECT
CASE WHEN price > 100 then ‘expensive’ WHEN price > 50 then ‘medium’ ELSE ‘cheap’ end as example_1,
CASE WHEN price > 0 then ‘everyone’ WHEN price > 50 then ‘no one’ WHEN price > 100 then ‘no one’ end as example_2
FROM public.jobs
In the example_1 column, the CASE statement will work properly. Anytime a price is over $100, it will have the result ‘expensive’. When the price is between $50 and $100, it will return the text ‘medium’. In all other cases, even if the price is NULL, it will return the text ‘cheap’. In the example_2 column, every row with a price > 0 will return the text ‘everyone’. The only rows that will return the text ‘no one’ are those with a NULL price or a price of $0. That’s because anytime the price is greater than $0, the first condition in the CASE statement is true and it returns the result ‘everyone’ for that row.
Union
Unlike using a JOIN, the UNION allows you to stack one dataset on top of another. In order to use a UNION, each of the tables you use need to have the same column names and data types. UNIONs are useful when your data is broken up into multiple tables. Below is an example using a UNION.
SELECT id, price
FROM public.jobs
UNION
SELECT id, price
FROM public.old_jobs
Window Functions
Window functions are similar to aggregate functions, except they allow you perform a calculation across a set of rows that are related to the current row. Window functions are incredibly useful when creating a column that relies on cumulative metrics. They vary widely depending on the type of SQL you are using, but in the example below we’ll be using PostgreSQL to create a column for cumulative sales by year.
SELECT DISTINCT
year,
SUM(price) OVER (ORDER by year) as cumulative_sales
FROM public.jobs
Note, you’ll often be using DISTINCT with window functions in order to remove duplicate rows.
Subqueries
Subqueries allow you to write SQL queries that would require multiple steps. For example, if you need to do an aggregation of an aggregation, you could do that using subqueries. Subqueries are incredibly helpful because similar to CASE statements, they provide you with much needed flexibility. Below are two examples of queries that use a subquery.
SELECT avg(price)
FROM (
SELECT
year,
sum(price) as price
FROM public.jobs
GROUP by year
) year_sales;
with year_sales AS (
SELECT
year,
sum(price) as price
FROM public.jobs
GROUP by year
)
SELECT avg(price)
FROM year_sales;
While there are some other SQL topics that we haven’t covered here, these are some of the most important fundamentals to be comfortable with when preparing for interviews. If you’d like to learn more about the SQL portion of the interview, or practice some SQL questions that you’re likely to see in interviews, sign up for free resources here.
Recent Comments