;

SQL is one of the most important skills a Data Scientist or Analyst can have as it allows you to query data from a relational database. After working with graduates from Data Science bootcamps, I’ve noticed that many are lacking the necessary SQL skills to succeed on the job. It seems as if many bootcamps are quickly covering SQL, but not emphasizing it enough. However, in almost every interview, candidates will be asked to show off their SQL skills and many are not prepared. So we’re going to discuss the basic fundamentals candidates should have mastered in order to ace the technical interview.

One other thing to note is that there are multiple types of SQL, but in the examples here we’re going to be using PostgreSQL. Most of the versions of SQL are very similar though, so once you are comfortable with one, it’s very easy to switch to another if needed.

Getting Started

Before we dive into some of the more advanced topics, it’s important to cover the first set of SQL skills you should have mastered. These include creating a query with the SELECT, FROM, and WHERE keywords. Other topics include AND, OR, NOT, IN, IS NULL, LIKE, and comparison operators. If you are not comfortable with any of these parts of SQL, I would definitely review them before continuing. Below is an example of a query using just the basics.

SELECT *

FROM public.jobs

WHERE id > 100

AND price < 50

AND customer_id IS NOT NULL

AND paid_with_cc IS TRUE;

Aggregations

SQL allows you to aggregate data using aggregate functions. This means you can do calculations such as summing, counting, or finding the average of a given column. You can perform the aggregation over the whole table, or you can group it by a given field. For example, you could find the sum of the price column in a given table, or you could find the sum of the price column for each year in the table. The two queries would look as follows:

SELECT  sum(price)

FROM public.jobs; 

SELECT

year,

sum(price)

FROM public.jobs

GROUP BY year

HAVING SUM(price) > 100;

As you can see, in the second query we’ve used the GROUP BY keyword to sum up the prices just in a given year. We’ve also included a HAVING condition to only keep in years that have the sum of the price > 100. Some of the aggregation keywords that you can use in SQL include SUM, COUNT, MIN, MAX, AVG, and MEDIAN.

Joins

It’s important in SQL to be able to join two or more tables together. When joining tables, you specify which column(s) in one table should match the column(s) in another. You also need to know the type of join you’d like to perform. The most common joins include INNER and LEFT, though I would recommend learning FULL OUTER as well. You should be able to understand the difference in the following queries.

SELECT  *

FROM public.jobs

JOIN public.customers

ON jobs.customer_id = customers.id;

SELECT  *

FROM public.jobs

LEFT JOIN public.customers

ON jobs.customer_id = customers.id;

In the first example, because it is an inner join, the query will only return rows where the job had a customer id that matched the id in the customers table. In the second example, the query will return all jobs, regardless of if they had a customer id that matched the id in the customer table.

Case

Using the CASE statement in SQL allows you to write logic equivalent to the IF / THEN logic you’d find elsewhere. The CASE statement adds lots of flexibility to writing queries and is very common in SQL. Note when using the CASE statement, whichever CASE conditions is met first for a given row will be used. Below is an example of two different CASE statements in a query.

SELECT 

CASE WHEN price > 100 then ‘expensive’ WHEN price > 50 then ‘medium’ ELSE ‘cheap’ end as example_1,

CASE WHEN price > 0 then ‘everyone’ WHEN price > 50 then ‘no one’ WHEN price > 100 then ‘no one’ end as example_2

FROM public.jobs 

In the example_1 column, the CASE statement will work properly. Anytime a price is over $100, it will have the result ‘expensive’. When the price is between $50 and $100, it will return the text ‘medium’. In all other cases, even if the price is NULL, it will return the text ‘cheap’. In the example_2 column, every row with a price > 0 will return the text ‘everyone’. The only rows that will return the text ‘no one’ are those with a NULL price or a price of $0. That’s because anytime the price is greater than $0, the first condition in the CASE statement is true and it returns the result ‘everyone’ for that row.

Union

Unlike using a JOIN, the UNION allows you to stack one dataset on top of another. In order to use a UNION, each of the tables you use need to have the same column names and data types. UNIONs are useful when your data is broken up into multiple tables. Below is an example using a UNION.

SELECT id, price

FROM public.jobs  

UNION

SELECT id, price

FROM public.old_jobs

Window Functions

Window functions are similar to aggregate functions, except they allow you perform a calculation across a set of rows that are related to the current row. Window functions are incredibly useful when creating a column that relies on cumulative metrics. They vary widely depending on the type of SQL you are using, but in the example below we’ll be using PostgreSQL to create a column for cumulative sales by year.

SELECT  DISTINCT

year,

SUM(price) OVER (ORDER by year) as cumulative_sales

FROM public.jobs

 Note, you’ll often be using DISTINCT with window functions in order to remove duplicate rows. 

Subqueries

Subqueries allow you to write SQL queries that would require multiple steps. For example, if you need to do an aggregation of an aggregation, you could do that using subqueries. Subqueries are incredibly helpful because similar to CASE statements, they provide you with much needed flexibility. Below are two examples of queries that use a subquery.

SELECT avg(price)

FROM (

SELECT 

year, 

sum(price) as price

FROM public.jobs  

GROUP by year

) year_sales;

with year_sales AS (

SELECT 

year, 

sum(price) as price

FROM public.jobs 

GROUP by year

) 

SELECT avg(price)

FROM year_sales;

While there are some other SQL topics that we haven’t covered here, these are some of the most important fundamentals to be comfortable with when preparing for interviews. If you’d like to learn more about the SQL portion of the interview, or practice some SQL questions that you’re likely to see in interviews, sign up for free resources here.