Advanced SQL queries you must know

The Top 15 Advanced SQL commands for Data Analysis and Data Engineering.

SQL system query

Table of index

Window functions
Common Table Expressions (CTEs)
Recursive queries
Pivot tables
Analytic functions
Unpivot
Conditional aggregation
Date functions
Merge statements
Case statements
String functions
Grouping sets
Cross joins
Inline views
Set operators

If you are into Data Analysis for a while now, you must be already aware of the basic commands such as SELECT, INSERT, UPDATE and DELETE etc, though they may be the most used it is also good to know the below queries for some deep diving into data.

1. Window functions

Window functions are used for calculations across a set of rows related to the current row. Let’s consider an example where we calculate the running total of sales using the SUM( ) function with the OVER( ) clause. Let us imagine we have a sales data table ‘Sales_Data’ that records the sales amounts on various dates. We want to calculate the running total of sales for each date, which means the total sales up to and including each date.

SELECT

    date, sales, SUM(sales) OVER (ORDER BY date) AS running_total

FROM

    sales_data;

This query provides a running total of sales over time, making it easy to track cumulative sales growth.

Output:

date       |sales |running_total

2023–01–01 | 100  | 100

2023–01–02 | 150  | 250

2023–01–03 | 200  | 450

2023–01–04 | 250  | 700

Window functions can be used for various tasks such as calculating running totals, moving averages, ranks, and more, without collapsing the result set into a single row per group.

2. Common Table Expressions (CTEs)

CTEs give a way to create temporary result sets that can be referenced within a query. They improve readability and simplify complex queries. Here’s how we can use a CTE to calculate the total revenue for each product category.

WITH category_revenue AS

(

    SELECT category, SUM(revenue) AS total_revenue

    FROM sales

    GROUP BY category

)

SELECT * FROM category_revenue;

The query defines a CTE named ‘category_revenue’. It calculates the total revenue for each category by summing up the revenue from the sales table and grouping the results by the category column.The main query selects all columns from the ‘category_revenue’ CTE, effectively displaying the calculated total revenue for each category.

Output:

category | total_revenue

A        | 5000

B        | 7000

C        | 4500

3. Recursive queries

Recursive queries enable travel of hierarchical data structures like organisational charts or bill of materials. Suppose we have a table representing employee relationships, and we want to find all the subordinates of a given manager.

WITH RECURSIVE subordinates AS

(

    SELECT employee_id, name, manager_id

    FROM employees

    WHERE manager_id = ‘manager_id_of_interest’

    UNION ALL

    SELECT e.employee_id, e.name, e.manager_id

    FROM employees e

    JOIN subordinates s ON e.manager_id = s.employee_id

)

SELECT * FROM subordinates;

This recursive CTE finds all employees who report directly or indirectly to a specific manager ‘manager_id_of_interest’. It starts with employees directly reporting to the manager and then recursively finds their subordinates, building the hierarchy.

Output:

employee_id | name    | manager_id

2           | Alice   | manager_id_of_interest

3           | Bob     | 2

4           | Charlie | 3

4. Pivot tables

Pivot tables transform rows into columns, summarising data in a tabular format. Let’s say we have a table containing sales data, and we want to pivot the data to display total sales for each product in different months.

SELECT

    product,

    SUM(CASE WHEN month = ‘Jan’ THEN sales ELSE 0 END) AS Jan,

    SUM(CASE WHEN month = ‘Feb’ THEN sales ELSE 0 END) AS Feb,

    SUM(CASE WHEN month = ‘Mar’ THEN sales ELSE 0 END) AS Mar

FROM

    sales_data

GROUP BY

    product;

This query aggregates sales data for each product by month using conditional aggregation. It sums the sales for January, February, and March separately for each product, resulting in a table showing total sales per product for these months.

Output:

product   | Jan | Feb | Mar

Product A | 100 | 200 | 150

Product B | 80  | 190 | 220

Product C | 60  | 140 | 130

5. Analytic functions

Analytic functions compute aggregate values based on a group of rows. For example, we can use the ROW_NUMBER( ) function to assign a unique row number to each record in a dataset.

SELECT

    customer_id, order_id,

    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank

FROM

    orders;

This query assigns a unique rank to each order per customer based on the order date using the ROW_NUMBER( ) window function. The result shows the sequence of orders placed by each customer.

Output:

customer_id | order_id | order_rank

1           | 101      | 1

1           | 102      | 2

2           | 201      | 1

2           | 202      | 2

2           | 203      | 3

6. Unpivot

Unpivot is the opposite of pivoting, where columns are transformed into rows. Let’s say we have a table with sales data aggregated by month, and we want to unpivot it to analyse trends over time.

SELECT

    product, month, sales

FROM

    sales_data

UNPIVOT (sales FOR month IN (sales_jan AS ‘Jan’, sales_feb AS ‘Feb’, sales_mar AS ‘Mar’)) AS unpivoted_sales;

This query transforms the monthly sales columns into rows, making it easier to analyse trends over time by product. Each row represents a product’s sales for a specific month.

Output:

product  | month | sales

ProductA | Jan   | 100

ProductA | Feb   | 150

ProductA | Mar   | 200

ProductB | Jan   | 200

ProductB | Feb   | 250

ProductB | Mar   | 300

7. Conditional aggregation

Conditional aggregation involves applying aggregate functions conditionally based on specified criteria. For example, we might want to calculate the average sales amount only for orders placed by repeat customers.

SELECT

    customer_id,

    AVG(CASE WHEN order_count > 1 THEN order_total ELSE NULL END) AS avg_sales_repeat_customers

    FROM

    (

        SELECT customer_id, COUNT(*) AS order_count, SUM(order_total) AS order_total

        FROM orders

        GROUP BY customer_id

    ) AS customer_orders;

This query calculates the average order total for customers who have placed more than one order. It aggregates the order count and total order amount for each customer, then computes the average for repeat customers.

Output:

customer_id | avg_sales_repeat_customers

1           | 250

2           | 150

3           | 300

8. Date functions

Date functions in SQL allow manipulation and extraction of date-related information. For instance, we can use the DATE_TRUNC( ) function to group sales data by month.

SELECT

    DATE_TRUNC(‘month’, order_date) AS month, SUM(sales_amount) AS total_sales

FROM

    sales

GROUP BY

    DATE_TRUNC(‘month’, order_date);

This output shows the total sales amount total_sales aggregated for each month as month, where each month is represented by the first day of that month (e.g., 2023–01–01 for January). The total sales are summed up for each respective month.

Output:

month      | total_sales

2023–01–01 | 15000

2023–02–01 | 20000

2023–03–01 | 17500

2023–04–01 | 22000

9. Merge statements

Merge statements (also known as UPSERT or ON DUPLICATE KEY UPDATE) enable us to insert, update, or delete records in a target table based on the results of a join with a source table. Let’s say we want to synchronise two tables containing customer data.

MERGE INTO

    customers_target t

USING

    customers_source s

ON

    t.customer_id = s.customer_id

WHEN MATCHED THEN

    UPDATE SET t.name = s.name, t.email = s.email

WHEN NOT MATCHED THEN

    INSERT (customer_id, name, email) VALUES (s.customer_id, s.name, s.email);

Let’s consider an example of the data in the customers_target and customers_source tables.

customers_target (before merge):

customer_id | name       | email

1           | John Doe   | john@example.com

2           | Jane Smith | jane@example.com

customers_source

customer_id | name         | email

2           | Jane Johnson | jane.j@example.com

3           | Alice Brown  | alice@example.com

Output:

customers_target (after merge):

customer_id | name         | email

1           | John Doe     | john@example.com

2           | Jane Johnson | jane.j@example.com

3           | Alice Brown  | alice@example.com

The MERGE statement updates the customers_target table based on the customers_source table. If a customer_id in customers_source matches one in customers_target, the name and email are updated. If there is no match, a new row is inserted.

10. Case statements

Case statements allow for conditional logic within SQL queries. For example, we can use a case statement to categorize customers based on their total purchase amount.

SELECT

    customer_id,

CASE

    WHEN total_purchase_amount >= 1000 THEN ‘Platinum’

    WHEN total_purchase_amount >= 500 THEN ‘Gold’

ELSE

    ‘Silver’

END AS customer_category

FROM

(

    SELECT customer_id, SUM(order_total) AS total_purchase_amount

    FROM orders

    GROUP BY customer_id

) AS customer_purchases;

let’s consider an example data set and explain the output.

Example data from orders table:

customer_id | order_total

1           | 200

1           | 300

2           | 800

3           | 150

3           | 400

4           | 1200

Output:

customer_id | customer_category

1           | Gold

2           | Gold

3           | Silver

4           | Platinum

The query classifies customers into categories based on their total purchase amount. Customers with a total purchase amount of $1000 or more are labeled ‘Platinum’, those with $500 to $999 are labeled ‘Gold’, and those with less than $500 are labeled ‘Silver’.

11. String functions

String functions in SQL enable manipulation of text data. For instance, we can use the CONCAT( ) function to concatenate first and last names.

SELECT CONCAT(first_name, ‘ ‘, last_name) AS full_name

FROM employees;

Let’s consider an example data set and explain the output.

Example Data in the employees table:

first_name | last_name

John       | Doe

Jane       | Smith

Alice      | Johnson

Bob        | Brown

Output:

full_name

John Doe

Jane Smith

Alice Johnson

Bob Brown

The query concatenates the first_name and the last_name columns from the employees table with a space in between, creating a full_name for each employee.

12. Grouping sets

Grouping sets allow for the aggregation of data at multiple levels of granularity in a single query. Let’s say we want to calculate the total sales revenue by month and year.

SELECT

    YEAR(order_date) AS year, MONTH(order_date) AS month, SUM(sales_amount) AS total_revenue

FROM

    sales

GROUP BY

    GROUPING SETS ((YEAR(order_date), MONTH(order_date)), YEAR(order_date), MONTH(order_date));

Example Data in the sales table:

order_date | sales_amount

2023–01–15 | 1000

2023–01–20 | 1500

2023–02–10 | 2000

2023–03–05 | 2500

2024–01–10 | 3000

2024–01–20 | 3500

2024–02–25 | 4000

Output:

year | month | total_revenue

2023 | 1     | 2500

2023 | 2     | 2000

2023 | 3     | 2500

2024 | 1     | 6500

2024 | 2     | 4000

2023 | NULL  | 7000

2024 | NULL  | 10500

NULL | 1     | 9000

NULL | 2     | 6000

NULL | 3     | 2500

The query groups sales data by year and month, by year only, and by month only using GROUPING SETS. This results in subtotals for each month of each year, overall totals for each year, and overall totals for each month across all years.

13. Cross joins

Cross joins produce the Cartesian product of two tables, resulting in a combination of every row from each table. For example, we can use a cross join to generate all possible combinations of products and customers.

SELECT

    p.product_id, p.product_name, c.customer_id, c.customer_name

FROM

    products p

CROSS JOIN

    customers c;

Let’s consider an example data set for the products and customers tables.

products table:

product_id | product_name

1          | Product A

2          | Product B

customers table:

customer_id | customer_name

101         | Customer X

102         | Customer Y

Output:

product_id | product_name | customer_id | customer_name

1          | Product A    | 101         | Customer X

1          | Product A    | 102         | Customer Y

2          | Product B    | 101         | Customer X

2          | Product B    | 102         | Customer Y

The query performs a CROSS JOIN between the PRODUCTS and CUSTOMERS tables, resulting in a Cartesian product. This means every product is paired with every customer, generating all possible combinations of products and customers.

14. Inline views

Inline views (also known as derived tables) enable the creation of temporary result sets within a SQL query. Let’s say we want to find customers who made purchases exceeding the average order value.

SELECT

    customer_id, order_total

FROM

(

    SELECT customer_id, SUM(order_total) AS order_total

    FROM orders

    GROUP BY customer_id

) AS customer_orders

WHERE

    order_total >

    (

        SELECT AVG(order_total) FROM orders

    );

orders table:

customer_id | order_total

1           | 100

1           | 200

2           | 500

3           | 300

3           | 200

4           | 700

It calculates total order for each customer:

customer_id | order_total

1           | 300

2           | 500

3           | 500

4           | 700

then it calculates average order total across all orders.

Finally, filters customers with total orders above average:

Output:

customer_id | order_total

2           | 500

3           | 500

4           | 700

15. Set operators

Set operators like UNION, INTERSECT, and EXCEPT allow for combining the results of two or more queries. For example, we can use a UNION operator to merge the results of two queries into a single result set.

SELECT product_id, product_name FROM products

UNION

SELECT product_id, product_name FROM archived_products;

This query combines the results from the products and archived_products tables, eliminating any duplicate entries, to create a unified list of product IDs and names. The UNION operator ensures that each product appears only once in the final output.

Output:

product_id | product_name

1          | Chocolate Bar

2          | Dark Chocolate

3          | Milk Chocolate

4          | White Chocolate

5          | Almond Chocolate

Using these 15 advanced SQL techniques, you can solve complex data problems with ease and accuracy. Whether you’re a data analyst, engineer, or scientist, improving your SQL skills will make you much better at handling data.

If you found this post useful, make sure to clap, comment, subscribe and follow me for more such data related content on medium.com.

Happy Data Analysis!