Introduction
A common table expression (CTE) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement in SQL Server. CTEs were introduced in SQL Server 2005 and are similar to derived tables and views.
CTEs are defined by specifying a CTE name and query expression after a WITH clause. The CTE query executes first, populates the CTE with data, and then you can use the CTE name throughout the main query as if it were a table or view.
Some key advantages of CTEs
- Modularization and reuse: CTE queries can be modularized into separate CTE blocks that can be easily reused in different parts of the query. This improves readability and maintainability.
- Recursive queries: CTEs allow you to write recursive queries where a CTE references itself. This is useful for hierarchical or tree data.
- Simpler semantics: CTEs use less complex SQL syntax than derived tables and are easier to read and write.
- Better performance: CTEs handle optimizations better than nested views and subqueries. The optimizer can materialize CTE results in tempdb improving performance.
When to use CTEs?
Here are some common use cases where CTEs are helpful:
- Breaking down complex queries - as mentioned, CTEs allow you to break down complex logic into simpler modular blocks, improving readability.
- Reusable query logic - once you define a CTE, you can reference it multiple times in the query. This eliminates repetitive logic.
- Hierarchical queries - CTEs can recursively reference themselves to query hierarchical data like org charts, folders, etc.
- Replace views - in some cases, a CTE can achieve the same goal as a view but with better performance, as the optimizer can tailor the CTE query better.
- Replace derived tables - derived tables can be replaced by CTEs to simplify the query syntax.
- Data investigation/exploration - since CTE definitions are localized to one statement, they can be useful for ad hoc data investigation before permanent table storage.
Now let's look at some examples to demonstrate how to write and use CTEs.
Syntax
WITH CTE_Name (Column1, Column2)
AS
(
-- CTE Definition using SELECT
)
SELECT *
FROM CTE_Name
The CTE is defined using a WITH clause before the main SELECT statement. You specify the CTE name and optional column list after the WITH keyword. The AS keyword signifies the start of the CTE definition query.
After defining the CTE, you can reference it in the main SELECT statement as if it were a regular table or view.
Let's look at a simple example.
WITH Sales_CTE (SalesPerson, SalesAmount)
AS
(
SELECT SalesPerson, SUM(SalesAmount)
FROM Sales
GROUP BY SalesPerson
)
SELECT *
FROM Sales_CTE
This CTE summarizes sales per person into a temporary result set named Sales_CTE. The main query simply selects from the CTE to display the sales summary.
Multiple CTEs
You can define multiple CTEs in a single query by listing them sequentially after the WITH clause:
WITH Sales_CTE (SalesPerson, SalesAmount)
AS
(
SELECT SalesPerson, SUM(SalesAmount)
FROM Sales
GROUP BY SalesPerson
),
TopSales_CTE (TopSalesPerson, TopSalesAmount)
AS
(
SELECT TOP 1 SalesPerson, SalesAmount
FROM Sales_CTE
ORDER BY SalesAmount DESC
)
SELECT *
FROM TopSales_CTE
Here we define two CTEs - Sales_CTE and TopSales_CTE. The second CTE references the first CTE. The main query selects the top salesperson from the second CTE.
Recursive CTE example
One of the key benefits of CTEs is the ability to write recursive queries. Here is an example to find all managers and employees in a hierarchy.
WITH Managers_CTE (EmployeeID, ManagerID, EmployeeName, ManagerName, Level)
AS
(
-- Anchor member
SELECT e.EmployeeID, e.ManagerID, e.EmployeeName, m.EmployeeName,
0 AS Level
FROM Employees e
INNER JOIN Employees m
ON e.ManagerID = m.EmployeeID
UNION ALL
-- Recursive member that references CTE name
SELECT e.EmployeeID, e.ManagerID, e.EmployeeName, m.EmployeeName,
Level + 1
FROM Employees e
INNER JOIN Managers_CTE m
ON e.ManagerID = m.EmployeeID
)
-- Outer query
SELECT *
FROM Managers_CTE
- The anchor member defines the root level of the hierarchy.
- The recursive member joins back to the CTE name to get to the next level.
- UNION ALL combines each round of recursion.
- Outer query returns the final resultset.
This builds the org hierarchy iteratively until all levels are retrieved.
CTE with INSERT example
In addition to SELECT, CTEs can be used with data modification statements like INSERT.
WITH Sales_CTE (SalesID, SalesPersonID, SalesAmount)
AS
(
SELECT SalesID, SalesPersonID, SalesAmount
FROM Sales
WHERE SalesDate = '20180901'
)
INSERT INTO SalesByDay (SalesDate, SalesPersonID, SalesAmount)
SELECT '20180901', SalesPersonID, SalesAmount
FROM Sales_CTE
This inserts sales for a specific date into a separate SalesByDay table using a CTE as the data source.
CTE with UPDATE example
You can also leverage CTEs with updated statements.
WITH Sales_CTE (SalesID, SalesAmount)
AS
(
SELECT SalesID, SalesAmount
FROM Sales
WHERE SalesDate = '20180901'
)
UPDATE SalesByDay
SET SalesAmount = Sales_CTE.SalesAmount
FROM SalesByDay
INNER JOIN Sales_CTE
ON SalesByDay.SalesID = Sales_CTE.SalesID
WHERE SalesByDay.SalesDate = '20180901'
Here we populate matching rows in another table using values from the CTE.
CTE with DELETE example
Similarly, CTEs can be utilized with DELETE statements.
WITH InactiveSales_CTE (SalesID, SalesDate)
AS
(
SELECT SalesID, SalesDate
FROM Sales
WHERE SalesDate < '20180101'
)
DELETE SalesByDay
FROM SalesByDay
WHERE SalesID IN (SELECT SalesID
FROM InactiveSales_CTE)
This deletes related rows in another table based on inactive sales data from the CTE.
Temporary CTE benefits
A key benefit of CTEs is that they are temporary named result sets that only exist during query execution. This provides several advantages.
- No need to persist CTEs in the database, unlike views or permanent tables. This reduces storage overhead.
- Can reference CTEs multiple times in a statement without repetitive subqueries or joins. Improves maintainability.
- Optimizer can tailor a temporary CTE query plan, unlike a persisted view which has a fixed query plan.
- Can replace inline derived tables and views to simplify and improve query semantics.
- Great for ad hoc data investigation before determining permanent tables.
In summary, CTEs are very useful in SQL Server for simplifying complex logic, improving query readability, handling recursive queries, and temporarily staging data transformations for business reporting and analysis. As you gain more experience with SQL Server, be sure to add CTEs to your development toolbox.
Features of CTEs
- Improved readability and maintainability: CTEs enhance the readability of complex queries by breaking them into smaller logical sections. This is especially useful when dealing with queries involving multiple joins, subqueries, or complex calculations. The segmented structure makes it easier to understand and troubleshoot the query.
- Modularity and reusability: CTEs enable the creation of modular SQL code. You can define CTEs for specific tasks or calculations and then reuse them across different parts of the main query. This promotes code reusability, reduces redundancy, and simplifies the modification of specific parts of the query.
- Recursive queries: CTEs are ideal for building recursive queries, where a query references itself to traverse hierarchical or recursive data structures. This is commonly used for tasks like navigating organizational charts, product categories, or tree-like data.
- Self-Joins and Window Functions Simplification: When dealing with self-joins or complex calculations involving window functions, CTEs provide a clearer and more organized way to express the logic. They break down intricate operations into manageable steps, leading to more concise and readable code.
- Code organization and reusability: Complex subqueries can be defined within CTEs, allowing for cleaner code organization. This organization makes it easier to understand the purpose of each part of the query, leading to improved maintainability.
- Optimization opportunities: In some cases, SQL Server's query optimizer can optimize CTEs more effectively than equivalent subqueries. This optimization can lead to better execution plans and potentially improved performance.
Limitations of CTEs
- Single-Statement scope: CTEs are scoped to a single SQL statement. They cannot be referenced across different statements in the same batch. This limitation can restrict their use in complex scenarios that involve multiple related statements.
- Performance considerations: While CTEs enhance query organization, they may not always result in the most optimal execution plans. In certain cases, complex CTEs can lead to performance issues, especially when dealing with large datasets or intricate queries.
- Memory usage: Recursive CTEs, which are used for hierarchical or recursive queries, can consume significant memory, particularly when dealing with deep hierarchies. This can lead to performance degradation if memory usage is not managed effectively.
- Lack of indexing support: CTEs do not support indexing. Unlike temporary tables, CTEs do not allow you to create indexes to improve query performance. This can be a limitation when working with large datasets that require efficient access patterns.
- Nested CTEs and complexity: Nesting multiple CTEs within each other can lead to complex and challenging-to-maintain code. Overuse of nesting can make the query difficult to understand, debug, and optimize.
- Limited use in Stored Procedures: CTEs are more commonly used in ad-hoc queries. While they can be used within stored procedures, their single-statement scope can sometimes be restrictive when working with multiple statements in a procedure.
- Complexity management: While CTEs enhance query readability, they can also introduce complexity, especially when dealing with deeply nested or highly recursive queries. Overusing CTEs might lead to code that is harder to understand and maintain.
Conclusion
Common Table Expressions (CTEs) in SQL Server offer valuable features that enhance the readability, modularity, and organization of complex queries. Their ability to handle recursive operations and simplify self-joins and window functions makes them a versatile tool for developers. However, it's crucial to be aware of the limitations, such as single-statement scope, performance considerations, and lack of indexing support. By understanding both the features and limitations of CTEs, developers can leverage them effectively to create optimized and maintainable SQL code. Properly using CTEs requires a balance between leveraging their advantages and mitigating potential drawbacks.