Search  
Always will be ready notify the world about expectations as easy as possible: job change page
Articles
Jun 14

Advanced SQL queries you must know

Advanced SQL queries you must know
Source:
Views:
949

The Top 15 Advanced SQL commands for Data Analysis and Data Engineering.

SQL system query

Table of index

  1. Window functions
  2. Common Table Expressions (CTEs)
  3. Recursive queries
  4. Pivot tables
  5. Analytic functions
  6. Unpivot
  7. Conditional aggregation
  8. Date functions
  9. Merge statements
  10. Case statements
  11. String functions
  12. Grouping sets
  13. Cross joins
  14. Inline views
  15. Set operators

If you are into Data Analysis for a while now, you must be already aware of the basic commands such as SELECT, INSERT, UPDATE and DELETE etc, though they may be the most used it is also good to know the below queries for some deep diving into data.

1. Window functions

Window functions are used for calculations across a set of rows related to the current row. Let’s consider an example where we calculate the running total of sales using the SUM( ) function with the OVER( ) clause. Let us imagine we have a sales data table ‘Sales_Data’ that records the sales amounts on various dates. We want to calculate the running total of sales for each date, which means the total sales up to and including each date.

SELECT
    date, sales, SUM(sales) OVER (ORDER BY date) AS running_total
FROM
    sales_data;

This query provides a running total of sales over time, making it easy to track cumulative sales growth.

Output:

date       |sales |running_total
2023–01–01 | 100  | 100
2023–01–02 | 150  | 250
2023–01–03 | 200  | 450
2023–01–04 | 250  | 700

Window functions can be used for various tasks such as calculating running totals, moving averages, ranks, and more, without collapsing the result set into a single row per group.

2. Common Table Expressions (CTEs)

CTEs give a way to create temporary result sets that can be referenced within a query. They improve readability and simplify complex queries. Here’s how we can use a CTE to calculate the total revenue for each product category.

WITH category_revenue AS
(
    SELECT category, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY category
)
SELECT * FROM category_revenue;

The query defines a CTE named ‘category_revenue’. It calculates the total revenue for each category by summing up the revenue from the sales table and grouping the results by the category column.The main query selects all columns from the ‘category_revenue’ CTE, effectively displaying the calculated total revenue for each category.

Output:

category | total_revenue
A        | 5000
B        | 7000
C        | 4500

3. Recursive queries

Recursive queries enable travel of hierarchical data structures like organisational charts or bill of materials. Suppose we have a table representing employee relationships, and we want to find all the subordinates of a given manager.

WITH RECURSIVE subordinates AS
(
    SELECT employee_id, name, manager_id
    FROM employees
    WHERE manager_id = ‘manager_id_of_interest’
    UNION ALL
    SELECT e.employee_id, e.name, e.manager_id
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;

This recursive CTE finds all employees who report directly or indirectly to a specific manager ‘manager_id_of_interest’. It starts with employees directly reporting to the manager and then recursively finds their subordinates, building the hierarchy.

Output:

employee_id | name    | manager_id
2           | Alice   | manager_id_of_interest
3           | Bob     | 2
4           | Charlie | 3

4. Pivot tables

Pivot tables transform rows into columns, summarising data in a tabular format. Let’s say we have a table containing sales data, and we want to pivot the data to display total sales for each product in different months.

SELECT
    product,
    SUM(CASE WHEN month = ‘Jan’ THEN sales ELSE 0 END) AS Jan,
    SUM(CASE WHEN month = ‘Feb’ THEN sales ELSE 0 END) AS Feb,
    SUM(CASE WHEN month = ‘Mar’ THEN sales ELSE 0 END) AS Mar
FROM
    sales_data
GROUP BY
    product;

This query aggregates sales data for each product by month using conditional aggregation. It sums the sales for January, February, and March separately for each product, resulting in a table showing total sales per product for these months.

Output:

product   | Jan | Feb | Mar
Product A | 100 | 200 | 150
Product B | 80  | 190 | 220
Product C | 60  | 140 | 130

5. Analytic functions

Analytic functions compute aggregate values based on a group of rows. For example, we can use the ROW_NUMBER( ) function to assign a unique row number to each record in a dataset.

SELECT
    customer_id, order_id,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
FROM
    orders;

This query assigns a unique rank to each order per customer based on the order date using the ROW_NUMBER( ) window function. The result shows the sequence of orders placed by each customer.

Output:

customer_id | order_id | order_rank
1           | 101      | 1
1           | 102      | 2
2           | 201      | 1
2           | 202      | 2
2           | 203      | 3

6. Unpivot

Unpivot is the opposite of pivoting, where columns are transformed into rows. Let’s say we have a table with sales data aggregated by month, and we want to unpivot it to analyse trends over time.

SELECT
    product, month, sales
FROM
    sales_data
UNPIVOT (sales FOR month IN (sales_jan AS ‘Jan’, sales_feb AS ‘Feb’, sales_mar AS ‘Mar’)) AS unpivoted_sales;

This query transforms the monthly sales columns into rows, making it easier to analyse trends over time by product. Each row represents a product’s sales for a specific month.

Output:

product  | month | sales
ProductA | Jan   | 100
ProductA | Feb   | 150
ProductA | Mar   | 200
ProductB | Jan   | 200
ProductB | Feb   | 250
ProductB | Mar   | 300

7. Conditional aggregation

Conditional aggregation involves applying aggregate functions conditionally based on specified criteria. For example, we might want to calculate the average sales amount only for orders placed by repeat customers.

SELECT
    customer_id,
    AVG(CASE WHEN order_count > 1 THEN order_total ELSE NULL END) AS avg_sales_repeat_customers
    FROM
    (
        SELECT customer_id, COUNT(*) AS order_count, SUM(order_total) AS order_total
        FROM orders
        GROUP BY customer_id
    ) AS customer_orders;

This query calculates the average order total for customers who have placed more than one order. It aggregates the order count and total order amount for each customer, then computes the average for repeat customers.

Output:

customer_id | avg_sales_repeat_customers
1           | 250
2           | 150
3           | 300

8. Date functions

Date functions in SQL allow manipulation and extraction of date-related information. For instance, we can use the DATE_TRUNC( ) function to group sales data by month.

SELECT
    DATE_TRUNC(‘month’, order_date) AS month, SUM(sales_amount) AS total_sales
FROM
    sales
GROUP BY
    DATE_TRUNC(‘month’, order_date);

This output shows the total sales amount total_sales aggregated for each month as month, where each month is represented by the first day of that month (e.g., 2023–01–01 for January). The total sales are summed up for each respective month.

Output:

month      | total_sales
2023–01–01 | 15000
2023–02–01 | 20000
2023–03–01 | 17500
2023–04–01 | 22000

9. Merge statements

Merge statements (also known as UPSERT or ON DUPLICATE KEY UPDATE) enable us to insert, update, or delete records in a target table based on the results of a join with a source table. Let’s say we want to synchronise two tables containing customer data.

MERGE INTO
    customers_target t
USING
    customers_source s
ON
    t.customer_id = s.customer_id
WHEN MATCHED THEN
    UPDATE SET t.name = s.name, t.email = s.email
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, email) VALUES (s.customer_id, s.name, s.email);

Let’s consider an example of the data in the customers_target and customers_source tables.

customers_target (before merge):

customer_id | name       | email
1           | John Doe   | john@example.com
2           | Jane Smith | jane@example.com

customers_source

customer_id | name         | email
2           | Jane Johnson | jane.j@example.com
3           | Alice Brown  | alice@example.com

Output:

customers_target (after merge):

customer_id | name         | email
1           | John Doe     | john@example.com
2           | Jane Johnson | jane.j@example.com
3           | Alice Brown  | alice@example.com

The MERGE statement updates the customers_target table based on the customers_source table. If a customer_id in customers_source matches one in customers_target, the name and email are updated. If there is no match, a new row is inserted.

10. Case statements

Case statements allow for conditional logic within SQL queries. For example, we can use a case statement to categorize customers based on their total purchase amount.

SELECT
    customer_id,
CASE
    WHEN total_purchase_amount >= 1000 THEN ‘Platinum’
    WHEN total_purchase_amount >= 500 THEN ‘Gold’
ELSE
    ‘Silver’
END AS customer_category
FROM
(
    SELECT customer_id, SUM(order_total) AS total_purchase_amount
    FROM orders
    GROUP BY customer_id
) AS customer_purchases;

let’s consider an example data set and explain the output.

Example data from orders table:

customer_id | order_total
1           | 200
1           | 300
2           | 800
3           | 150
3           | 400
4           | 1200

Output:

customer_id | customer_category
1           | Gold
2           | Gold
3           | Silver
4           | Platinum

The query classifies customers into categories based on their total purchase amount. Customers with a total purchase amount of $1000 or more are labeled ‘Platinum’, those with $500 to $999 are labeled ‘Gold’, and those with less than $500 are labeled ‘Silver’.

11. String functions

String functions in SQL enable manipulation of text data. For instance, we can use the CONCAT( ) function to concatenate first and last names.

SELECT CONCAT(first_name, ‘ ‘, last_name) AS full_name
FROM employees;

Let’s consider an example data set and explain the output.

Example Data in the employees table:

first_name | last_name
John       | Doe
Jane       | Smith
Alice      | Johnson
Bob        | Brown

Output:

full_name
John Doe
Jane Smith
Alice Johnson
Bob Brown

The query concatenates the first_name and the last_name columns from the employees table with a space in between, creating a full_name for each employee.

12. Grouping sets

Grouping sets allow for the aggregation of data at multiple levels of granularity in a single query. Let’s say we want to calculate the total sales revenue by month and year.

SELECT
    YEAR(order_date) AS year, MONTH(order_date) AS month, SUM(sales_amount) AS total_revenue
FROM
    sales
GROUP BY
    GROUPING SETS ((YEAR(order_date), MONTH(order_date)), YEAR(order_date), MONTH(order_date));

Example Data in the sales table:

order_date | sales_amount
2023–01–15 | 1000
2023–01–20 | 1500
2023–02–10 | 2000
2023–03–05 | 2500
2024–01–10 | 3000
2024–01–20 | 3500
2024–02–25 | 4000

Output:

year | month | total_revenue
2023 | 1     | 2500
2023 | 2     | 2000
2023 | 3     | 2500
2024 | 1     | 6500
2024 | 2     | 4000
2023 | NULL  | 7000
2024 | NULL  | 10500
NULL | 1     | 9000
NULL | 2     | 6000
NULL | 3     | 2500

The query groups sales data by year and month, by year only, and by month only using GROUPING SETS. This results in subtotals for each month of each year, overall totals for each year, and overall totals for each month across all years.

13. Cross joins

Cross joins produce the Cartesian product of two tables, resulting in a combination of every row from each table. For example, we can use a cross join to generate all possible combinations of products and customers.

SELECT
    p.product_id, p.product_name, c.customer_id, c.customer_name
FROM
    products p
CROSS JOIN
    customers c;

Let’s consider an example data set for the products and customers tables.

products table:

product_id | product_name
1          | Product A
2          | Product B

customers table:

customer_id | customer_name
101         | Customer X
102         | Customer Y

Output:

product_id | product_name | customer_id | customer_name
1          | Product A    | 101         | Customer X
1          | Product A    | 102         | Customer Y
2          | Product B    | 101         | Customer X
2          | Product B    | 102         | Customer Y

The query performs a CROSS JOIN between the PRODUCTS and CUSTOMERS tables, resulting in a Cartesian product. This means every product is paired with every customer, generating all possible combinations of products and customers.

14. Inline views

Inline views (also known as derived tables) enable the creation of temporary result sets within a SQL query. Let’s say we want to find customers who made purchases exceeding the average order value.

SELECT
    customer_id, order_total
FROM
(
    SELECT customer_id, SUM(order_total) AS order_total
    FROM orders
    GROUP BY customer_id
) AS customer_orders
WHERE
    order_total >
    (
        SELECT AVG(order_total) FROM orders
    );

orders table:

customer_id | order_total
1           | 100
1           | 200
2           | 500
3           | 300
3           | 200
4           | 700

It calculates total order for each customer:

customer_id | order_total
1           | 300
2           | 500
3           | 500
4           | 700

then it calculates average order total across all orders.

Finally, filters customers with total orders above average:

Output:

customer_id | order_total
2           | 500
3           | 500
4           | 700

15. Set operators

Set operators like UNION, INTERSECT, and EXCEPT allow for combining the results of two or more queries. For example, we can use a UNION operator to merge the results of two queries into a single result set.

SELECT product_id, product_name FROM products
UNION
SELECT product_id, product_name FROM archived_products;

This query combines the results from the products and archived_products tables, eliminating any duplicate entries, to create a unified list of product IDs and names. The UNION operator ensures that each product appears only once in the final output.

Output:

product_id | product_name
1          | Chocolate Bar
2          | Dark Chocolate
3          | Milk Chocolate
4          | White Chocolate
5          | Almond Chocolate

Using these 15 advanced SQL techniques, you can solve complex data problems with ease and accuracy. Whether you’re a data analyst, engineer, or scientist, improving your SQL skills will make you much better at handling data.

If you found this post useful, make sure to clap, comment, subscribe and follow me for more such data related content on medium.com.

Happy Data Analysis!

Similar
Jun 7
Author: Dev Leader
In software engineering, composition plays an important role in object-oriented programming. If you’ve come across other articles or videos I’ve put out about object-oriented programming, you’ll know that I try to push composition over inheritance as much as I can....
Jun 13, 2023
One of the things that we may struggle with as developers when working on a green field project is our stack. Choosing the right tech to solve a problem can be a harrowing experience. Databases, in particular, can be a...
Nov 22
Author: Sylvain Tiset
Earlier I presented one useful design pattern to migrate to a monolithic application to microservices. This pattern is the Strangler Fig pattern and the article can be found here. Here some other specific microservices design patterns will be presented. What...
Aug 16
Author: Eli McGarvie
Everyone hates being ghosted, but in the tech world, when we're talking recruiters and candidates, some of whom maybe some of the best in the world at what they do, it's next-level frustrating. For them, they spend days crafting the...
Send message
Type
Email
Your name
*Message