Nobody likes dealing with a slow query. Besides being stressfull to wait for, it can be a huge impact in your database.
Some simpler approaches can help resolve this.
I want to start a series of three articles explaining about Tuning and architecture.
To begin, I want to start with the basics that work, some changes in the Query, sometimes, are already enough.
This is article 01 of 03:
- SQL Tuning — The Basic That Works
- SQL Tuning — SQL Server Architecture
- SQL Tuning — Analyzing and Improving Queries with Query Execution Plan
All examples were with the AdventureWorks2019 databases, and you can download it here.
Before any changes, I want to advance that the recommendation is to read the query Execution Plan and understand why the slowness occurs. Thus, we can take an initiative to solve the problem. In the third article in this series, I’ll take your hand, and we’ll understand, step by step, how to analyze a heavy query.
• • •
How to improve my query now?
Use TOP (SELECT TOP)
Helps limit the number of rows returned by the query, which can be useful to improve performance and reduce system load.
It can help when checking quick queries or only checking the latest records, without having to fetch the entire table.
It can avoid LOCKS and prevent the database from freezing, limiting the query won’t read the entire table.
Example of the last 100 records:
SELECT TOP 100
*
FROM [AdventureWorks2019].[Production].[TransactionHistory]
ORDER BY [TransactionDate] DESC
Avoid using DISTINCT
The use of DISTINCT can be costly in terms of performance. To avoid duplicate data, it is preferable to have a redesign in the table structure or indexes, adding columns that avoid duplicate fields.
When using DISTINCT, it is preferable that you understand what you are looking for and develop the query efficiently from that.
If DISTINCT is necessary, try to search for fields that are indexed.
Example of DISTINCT:
SELECT DISTINCT
[ProductID]
FROM [AdventureWorks2019].[Production].[TransactionHistory]
Avoid using OR in WHERE, use IN
The IN operator is more efficient than OR, especially when dealing with large data sets.
The OR operator is used to combine conditions in a WHERE clause. A query can have a condition like column = value1 OR column = value2, where records can meet any of the conditions.
The IN operator allows you to specify multiple values in a single condition. Instead of using multiple OR conditions, you can write something like column IN (value1, value2), which is a more efficient way.
Example of IN:
SELECT TOP 100
*
FROM [AdventureWorks2019].[Production].[TransactionHistory]
WHERE [ProductID] IN (358, 378)
-- WHERE [ProductID] = 358 or [ProductID] = 378
Avoid using *, use columns (SELECT [name], [size])
Specifying only the necessary columns is a good practice for performance issues. When you select all columns using SELECT *, you are retrieving all available information for each record, generating more traffic.
Also, by specifying the columns, you make the code clearer and less prone to errors when the table structure is modified, as the order or number of columns will not affect the query.
Example with selected columns:
SELECT TOP 1000
[ProductID],
[TransactionDate],
[ActualCost],
[Quantity]
FROM [AdventureWorks2019].[Production].[TransactionHistory]
WHERE [ProductID] IN (358, 378)
Use HAVING carefully
Filtering results in WHERE, whenever possible, is more efficient than using HAVING.
The WHERE clause is used to filter rows before they are grouped and before they are returned as the result of the query. With this, you reduce the amount of data that the database needs to process and group.
The HAVING clause is used to filter results after the grouping phase, which can cost much more. Now, if you need to filter aggregated values, HAVING is the correct option.
HAVING example:
SELECT TOP 1000
[ProductID],
[TransactionDate],
[ActualCost],
[Quantity],
SUM([ActualCost] * [Quantity]) AS TotalCost
FROM [AdventureWorks2019].[Production].[TransactionHistory]
WHERE [ProductID] IN (358, 378)
GROUP BY [ProductID], [TransactionDate], [ActualCost], [Quantity]
HAVING SUM([ActualCost] * [Quantity]) > 300
Check if the columns in your WHERE or JOIN have indexes
When you use a column in a WHERE clause to filter records, an index on that column can improve query performance. Indexes help the database access records more quickly, reducing the amount of data to be examined.
Columns used in join operations (JOIN) also benefit from indexes. If the columns used in joins are indexed, the database can perform the join more efficiently, avoiding the need to scan the entire table to identify values.
Example of how to check if the table has indexes:
EXEC sp_helpindex '[Production].[TransactionHistoryArchive]';
Use DESC in non-clustered indexes with date columns
Descending order in a non-clustered index on the date column can allow faster and more efficient access to the latest data, as the index is organized in that order.
Queries requesting the latest data can directly benefit from this organization, reducing the need for additional sorting during query execution.
Example of sorting:
CREATE NONCLUSTERED INDEX IX_TransactionHistory_TransactionDate_Desc
ON [Production].[TransactionHistory] ([TransactionDate] DESC);
Create non-clustered indexes on frequently accessed fields
This is particularly useful in queries that filter data based on specific values in these columns, as the non-clustered index points directly to the information.
I recommend monitoring the most executed queries in your database and identifying which columns are frequently accessed. It is exactly in these columns that you should create non-clustered indexes.
You can add multiple fields in the same index creation, and you can also add dependent fields in the INCLUDE.
Example of creating a non-clustered index with more than one field and INCLUDE:
CREATE NONCLUSTERED INDEX [IX_TransactionHistory_Fields_Include]
ON [Production].[TransactionHistory] ([TransactionDate] DESC, [ModifiedDate] DESC)
INCLUDE ([Quantity], [ActualCost])
This approach is better than creating an index for each field.
If you want to know more about index creation, click here!
Have a routine for maintaining indexes
It is not enough just to create indexes and abandon them; they also need attention.
When you use them a lot every day, they start to fragment, and statistics become outdated.
Therefore, having a routine that reorganizes indexes and updates statistics is essential to maintain query performance.
Example of REBUILD and REORGANIZE:
ALTER INDEX cix_your_table_id ON [your_table] REBUILD; -- CLUSTERED
ALTER INDEX ncix_your_table_name_date ON [your_table] REBUILD; -- NONCLUSTERED
ALTER INDEX cix_your_table_id ON [your_table] REORGANIZE; -- CLUSTERED
ALTER INDEX ncix_your_table_name_date ON [your_table] REORGANIZE; -- NONCLUSTERED
If you want to know more about fragmentation, click here!
To complement this topic, another tip is to run queries and maintenance routines outside peak hours.
Peak hours often mean higher competition for database resources, such as CPU, memory, and disk I/O. Executing intensive queries or maintenance routines during these periods can result in conflicts.
Partition your tables
Partitioning allows operations like insertions, updates, and deletions to affect only a part of the table, resulting in less impact and faster operations. Partitions can be distributed across different storage locations, disks, or even different servers.
Imagine a scenario where more recent and frequently accessed data only stays in partitions on SSD, and older data only stays in partitions on HDD. Brilliant, isn’t it?
It’s a more advanced technique, but well-used to handle large volumes of data.
I have an article explaining exactly how to do the solution I mentioned above, you can test and replicate it in your DB — click here!
• • •
And it doesn’t stop here
I gathered some learnings, tips from people, and other experiences while creating this article and decided to add the next tips as a bonus.
Create Stored Procedures for queries that feed reports or Dashboards
Standard queries are compiled and optimized every time they are executed. Stored Procedures have their execution plans optimized during creation, which can save processing time, and also give you time to write the query in the best way.
Not only performance, with Stored Procedures, we have benefits such as security, organization, and easier maintenance. Only those with access to the Stored Procedure can update it.
Example of Stored Procedure:
CREATE PROCEDURE GetTransactionByDate
@date DATETIME
AS
BEGIN
SELECT
[ProductID],
[TransactionDate],
[TransactionType],
[Quantity],
[ActualCost],
[ModifiedDate]
FROM [AdventureWorks2019].[Production].[TransactionHistoryArchive]
WHERE [TransactionDate] = @date;
END;
Remember that we created non-clustered indexes on these fields? This Procedure will run beautifully.
Create your Stagings using In-Memory OLTP
In-Memory OLTP allows storing tables in main memory instead of on disk.
By using In-Memory OLTP to store temporary data needed for reports, you can achieve significant performance gains, as reading and writing data directly in memory is faster than accessing data on disk.
This is particularly useful for reports and processes that use large volumes of data.
NOLOCK
It is widely used to specify that a SELECT statement should not lock resources while reading data from a table.
The main purpose of NOLOCK is to improve performance by allowing “non-blocking” reads, meaning the query can read data without waiting for exclusive locks to be released.
NOLOCK example:
SELECT [TransactionID]
,[ProductID]
,[ReferenceOrderID]
,[ReferenceOrderLineID]
,[TransactionDate]
,[TransactionType]
,[Quantity]
,[ActualCost]
,[ModifiedDate]
FROM [AdventureWorks2019].[Production].[TransactionHistoryArchive] WITH (NOLOCK)
Somewhat important, and we must take into consideration when using NOLOCK are dirty reads and phantom reads. This happens because when using NOLOCK, the query can read data that is in the process of being modified by other transactions.
There is an article that demonstrates the impact well — click here!
Check the use of MAXDOP (Degree of Parallelism)
Adjusting MAXDOP can optimize parallel queries, balancing the load on the server.
You can use it directly in the query or change the settings in the DBMS (Recommended)
MAXDOP example:
SELECT [TransactionID]
,[ProductID]
,[ReferenceOrderID]
,[ReferenceOrderLineID]
,[TransactionDate]
,[TransactionType]
,[Quantity]
,[ActualCost]
,[ModifiedDate]
FROM [AdventureWorks2019].[Production].[TransactionHistoryArchive]
OPTION (MAXDOP 8);
I recommend understanding your environment better, but this MAXDOP has already saved my weekend.
I hope this article can help you sleep and relieve the headache of query wait times, but remember: the best way to solve this problem is by understanding the Execution Plan and taking action on it — I am looking forward to the next articles, my friends!
Thanks and see ya!!! 💥