How do you improve SQL query performance? That is a big question, and one that we get asked all the time. There is no one answer, but there is a process that we apply to make a difference in query performance. In this post, I will discuss some of the questions we ask, some of the diagnostics we run, and some of the steps we take to reduce the amount of time a query takes.
The questions to ask are similar for any relational database software, so this discussion will apply to Oracle, SQL Server, MySQL, PostgreSQL, and others. I may mention tools or processes by a database-vendor specific name but, for the most part, each software vendor has something that is equivalent.
Query tuning is a complex and iterative process, so no blog post, including this one, would be comprehensive. The objective is to help you understand how to think about tuning from a broader perspective rather than looking only at the query in question, and is more about concepts than syntax.
Questions to Ask When Looking to Improve SQL Query Performance
To narrow down where the problems are with a SQL query, we start with some basic questions about the query and how it is being executed. I will discuss each question and talk about why we ask it, and what information the answer might give us. None of these questions will tell us definitively what the problem is, but they can point us quickly in the right direction and save precious time when a client is waiting for improved response time.
Timeframe
Is the query that we are interested in (hereafter referred to as “our query”) executed during a period when the system is heavily taxed by other processes?
- Why we ask: If our query is executed during a very busy time, then the problem may not be with our query at all. Reducing load on the system by examining other queries first (using this same strategy) may be more effective. So we would start by identifying and examining the most resource intensive queries first, to try to reduce overall system load.
Proximity and Size
Does our query take the same amount of time whether it is executed locally or remotely?
- Why we ask: If our query is executed remotely (executed in a browser or application on a server other than the database server) and if it returns a large number of rows, then it is possible that the data transfer is the bottleneck, rather than the retrieval of the data from the database. Asking this question may help us take the network out of the equation.
Result Set Characteristics
When our query completes, does it return a large number (millions?) of rows?
- Why we ask: When executing our query locally, if it takes a long time to complete, there are two possibilities. Either it takes a long time for the database software to find the data to return, or it takes a long time to return the data to the screen or the application. The former can be fixed by tuning the query; the latter may mean that our query is returning too many rows to be practical. In the latter case, we should revisit the intent of the query to see if an aggregated form of the data would be more usable, or if breaking the result set up into more manageable chunks makes sense. Also, a very large result set may be an indication of an error in the query itself, perhaps a missing join, or missing criteria resulting in a Cartesian product. In this case, we would look at the logic being expressed in the query and ensure that it matches the intent of the query.
Is the result set both large and aggregated or sorted?
- Why we ask: Aggregation and sorting on large result sets require significant temporary space. If this is a significant part of the query operations, we want to look at the management of memory buffers, and temp space (System Global Area (SGA), Program Global Area (PGA) and temporary segments or their equivalents). We want to make sure that enough memory is allocated so that we are not excessively writing out to temp space, and that temp space is optimally sized and located.
Is the result set a (relatively) small subset of a large amount of data?
- Why we ask: If the database is very large, and if our query returns a small subset of the data, there are two broad solutions that may be applicable: adding or optimizing indexes, and adding or optimizing partitioning. Up to a certain data size, proper indexing alone can provide adequate performance. When data gets very large, however, a combination of indexes and partitions will be necessary to provide adequate performance when querying a subset of the data.
Historical
Has the performance of the query degraded over time?
- Why we ask: If the query performed well in the past, but no longer does, look at the growth rates of data in the tables referenced by the query. If the amount of data has increased significantly, new indexes may be required that were not necessary when less data was referenced. Significant data growth may also result in optimizer statistics that no longer reflect the characteristics of the data, requiring a refresh of these statistics if they are not automatically refreshed.
Does the data being queried involve many updates or deletes (as opposed to mostly inserts)?
- Why we ask: Data that is frequently updated may result in index or tablespace fragmentation. This may also result in invalid statistics as in the case of significant data growth.
Conclusion
Query tuning is an iterative process and there are many other questions to ask as we get into the details. But the above questions help us see the big picture and can steer us in the right direction very quickly and help prevent us from going down the wrong path and wasting time.
If you have any other questions that you like to ask when tuning that you’d like to share, or if you have an interesting tuning story, please share in the comments.