7 Ways To Improve SQL Query Performance

7 Ways To Improve SQL Query Performance

How do you improve SQL query performance? That is a big question, and one that we get asked all the time. There is no one answer, but there is a process that we apply to make a difference in query performance. In this post, I will discuss some of the questions we ask, some of the diagnostics we run, and some of the steps we take to reduce the amount of time a query takes. 

The questions to ask are similar for any relational database software, so this discussion will apply to Oracle, SQL Server, MySQL, PostgreSQL, and others. I may mention tools or processes by a database-vendor specific name but, for the most part, each software vendor has something that is equivalent. 

Query tuning is a complex and iterative process, so no blog post, including this one, would be comprehensive. The objective is to help you understand how to think about tuning from a broader perspective rather than looking only at the query in question, and is more about concepts than syntax.

Questions to Ask When Looking to Improve SQL Query Performance

To narrow down where the problems are with a SQL query, we start with some basic questions about the query and how it is being executed. I will discuss each question and talk about why we ask it, and what information the answer might give us. None of these questions will tell us definitively what the problem is, but they can point us quickly in the right direction and save precious time when a client is waiting for improved response time.

Timeframe 

Is the query that we are interested in (hereafter referred to as “our query”) executed during a period when the system is heavily taxed by other processes?

  • Why we ask: If our query is executed during a very busy time, then the problem may not be with our query at all.  Reducing load on the system by examining other queries first (using this same strategy) may be more effective. So we would start by identifying and examining the most resource intensive queries first, to try to reduce overall system load. 

Proximity and Size

Does our query take the same amount of time whether it is executed locally or remotely?

  • Why we ask: If our query is executed remotely (executed in a browser or application on a server other than the database server) and if it returns a large number of rows, then it is possible that the data transfer is the bottleneck, rather than the retrieval of the data from the database. Asking this question may help us take the network out of the equation.

Result Set Characteristics 

When our query completes, does it return a large number (millions?) of rows?

  • Why we ask: When executing our query locally, if it takes a long time to complete, there are two possibilities. Either it takes a long time for the database software to find the data to return, or it takes a long time to return the data to the screen or the application. The former can be fixed by tuning the query; the latter may mean that our query is returning too many rows to be practical. In the latter case, we should revisit the intent of the query to see if an aggregated form of the data would be more usable, or if breaking the result set up into more manageable chunks makes sense. Also, a very large result set may be an indication of an error in the query itself, perhaps a missing join, or missing criteria resulting in a Cartesian product. In this case, we would look at the logic being expressed in the query and ensure that it matches the intent of the query. 

Is the result set both large and aggregated or sorted?

  • Why we ask:  Aggregation and sorting on large result sets require significant temporary space. If this is a significant part of the query operations, we want to look at the management of memory buffers, and temp space (System Global Area (SGA), Program Global Area (PGA) and temporary segments or their equivalents). We want to make sure that enough memory is allocated so that we are not excessively writing out to temp space, and that temp space is optimally sized and located.

Is the result set a (relatively) small subset of a large amount of data?

  • Why we ask:  If the database is very large, and if our query returns a small subset of the data, there are two broad solutions that may be applicable: adding or optimizing indexes, and adding or optimizing partitioning. Up to a certain data size, proper indexing alone can provide adequate performance. When data gets very large, however, a combination of indexes and partitions will be necessary to provide adequate performance when querying a subset of the data. 

Historical

Has the performance of the query degraded over time?

  • Why we ask:  If the query performed well in the past, but no longer does, look at the growth rates of data in the tables referenced by the query. If the amount of data has increased significantly, new indexes may be required that were not necessary when less data was referenced. Significant data growth may also result in optimizer statistics that no longer reflect the characteristics of the data, requiring a refresh of these statistics if they are not automatically refreshed.

Does the data being queried involve many updates or deletes (as opposed to mostly inserts)?

  • Why we ask: Data that is frequently updated may result in index or tablespace fragmentation. This may also result in invalid statistics as in the case of significant data growth. 

Conclusion

Query tuning is an iterative process and there are many other questions to ask as we get into the details. But the above questions help us see the big picture and can steer us in the right direction very quickly and help prevent us from going down the wrong path and wasting time.

If you have any other questions that you like to ask when tuning that you’d like to share, or if you have an interesting tuning story, please share in the comments. 

Database Patch News — March 2021 (Issue 7)

Database Patch News — March 2021 (Issue 7)

Welcome to Database Patch News, Buda Consulting’s newsletter of current patch information for Oracle and Microsoft SQL Server. Here you’ll find information recently made available on patches—including security patches—and desupported versions.

Why should you care about patching vulnerabilities and bugs? Two big reasons:

  1. Unpatched systems are a top cyber attack target. Patch releases literally advertise vulnerabilities to the hacker community. The longer you wait to patch, the greater your security risk. 
  2. Along with running a supported database version, applying the latest patches ensures that you can get support from the vendor in case of an issue. Patching also helps eliminate downtime and lost productivity associated with bugs. 

Here are the latest patch updates for Oracle and SQL Server:

Oracle Patches:

January 19, 2021 Quarterly Patch Updates:

21c – Released January 13, 2021, Version 21.1; no Quarterly patch yet

19c – Release Update 19.10 is available (32218494 and 321266828)

18c – Release Update 18.13 is available (32204699 and 32126855)

12cR2 – Release Update 210119 is available (32228578 and 32126871)

Regular support ends in Mar 2023 and extended support ends in Mar 2026.

12cR1 – Release Update 210119 is available (32132231 and 32126908)

Regular support ended in July 2019 and extended support ends in July 2021.

11gR4 – Patch Set Update 201020 is available (31720776)

Regular support ended in October 2018 and extended support ended December 31, 2020.

 

SQL Server Patches:

SQL Server 2019

Cumulative update 9 (Latest build) Released Feb 2, 2021
Mainstream support ends Jan 7, 2025
Extended support ends Jan 8, 2030


SQL Server 2017

Cumulative update 23 (Latest build) Released Feb 24, 2021
Mainstream support ends Oct 11, 2022|
Extended support ends Oct 12, 2027


SQL Server 2016 Service Pack 2

Cumulative update 16 Release date: Feb 11, 2021
Mainstream support ends Jul 13, 2021
Extended support ends Jul 14, 2026


SQL Server 2014 Service Pack 3

Cumulative update 4 Release date: Jan 12, 2021
Mainstream support ended Jul 9, 2019
Extended support ends Jul 9, 2024


SQL Server 2012 Service Pack 4

Release date: Oct 5, 2017
Mainstream support ended Jul 11, 2017
Extended support ends Jul 12, 2022

Note: All other SQL Server versions not mentioned are no longer supported.

 

Relational Database Design: It’s All About The Set

Relational Database Design: It’s All About The Set

The Lost Science Of Relational Algebra And Set Theory

I originally wrote this post in 2011. Much has changed in the database technology landscape since then. Big Data Technologies such as Hadoop have gone mainstream, cloud technology and is changing how and where we think about hosting our databases.

But relational databases are still relied upon as the best option for rich transactional data.

So, since this technology is still the foundation of our mission critical systems, we should understand how to take advantage of one of the foundational elements of relational technology: The Set.

The SQL language (Structured Query Language) was built upon relational algebra. This rigorous approach to query definition is largely about set theory. This post is not a detailed technical discussion of relational algebra or set theory, instead it is about the way that relational databases are often misused.

The purpose of this article is to discuss the central theme of relational database technology and one of its greatest strengths. One that is often overlooked by those practicing Oracle Database Design or SQL Server Database Design and Database Development. I am talking about Set Theory. Relational Databases like Oracle and SQL Server are built and optimized to process sets of rows, as opposed to individual rows. Many application developers, even those that use these relational tools, struggle to think beyond the individual row. That is why the major relational database vendors have created very powerful procedural languages such as PL/SQL and T/SQL.

In many cases, developers use these tools to step row by row through a dataset (by using cursors) because they may not understand how the set operators work. This approach leads to unnecessary development and higher maintenance costs, as well a poor performance.

There are definitely times when a procedural process is necessary. But often times there are set-based alternatives that would be more efficient and much easier to develop.

In this post, I will focus on three core set operators: Union, Intersect, and Difference.

First some definitions:

Given two sets, Set A and Set B

Union:  All records from set A and all records from Set B.  If a record exists in both sets, it will only appear once in the Union. (Areas A, B, and C in figure 1).

Intersection: The unique set of records that exist in both set A and set B (Area C in figure 1).

Difference: The difference between Set A and Set B are all the records in Set A except those that also exist in Set B. (Area A in figure 1).

Vendor Differences

Relational databases implement these operators in different ways, but they all provide a relatively simple way to combine and compare similar sets of data. Oracle has the Union, Intersect, and Minus operators. SQL Server has Union, Intersect, and Except operators.

MySql has the ability to perform these operations as well, but it is more complex. For example, in order to do a difference operation, you must use a not exists or not in operator, resulting in a more complex sql statement.

Example

Lets examine how Oracle implements each of these set operations with a simple example.

This post is intended to discuss the concepts so I did not include the data and the actual query results in the post. But you can download the script to create and populate the tables with test data and run the queries here: set_tables_sql

Suppose you collect bank account events (debits, credits) from multiple sources. You place them into one common table, but also maintain the original source records in separate tables for historical purposes.  The original source records never change, but the events in the common table can be modified as necessary by the users.

Now suppose that occasionally you need to compare the transactional data in the common table to the original source data to see which rows have been changed. This is very easy using set operators.

The tables that we will use for this example follow. I used different column names in each table to illustrate that the column names do not need to be the same in each set that you are comparing. However, the number of columns in each query and the data types in each query must be the same.

Table Definitions

CREATE TABLE Event
(
Event_Id NUMBER,
Event_Name VARCHAR2(30),
Event_Description VARCHAR2(255),
Data_Source_location VARCHAR2(30),
Event_Date DATE
);

CREATE TABLE Event_Source_1
(
Event_Id_Orig NUMBER,
Event_Name_Orig VARCHAR2(30),
Event_Description_Orig VARCHAR2(255),
Data_Source_location_Orig VARCHAR2(30),
Event_Date_Orig DATE
);

CREATE TABLE Event_Source_2
(
Event_Id_Orig NUMBER,
Event_Name_Orig VARCHAR2(30),
Event_Description_Orig VARCHAR2(255),
Data_Source_location_Orig VARCHAR2(30),
Event_Date_Orig DATE
);

Example 1 — Union: Now suppose you needed to display all event names that appear in Event Source 1 and Event Source 2. The Union operator will display records from both tables, but records appearing in both tables will only appear once (unless the union all operator is specified, in which case duplicates will be displayed).

SELECT Event_Name_Orig FROM Event_Source_1
UNION
SELECT Event_Name_Orig FROM Event_Source_2;

Example 2 — Intersection: Now suppose you needed to display only events from Source 1 that have remained unchanged in the Event table. This can be done with an intersection between Event and Event_Source_1.

SELECT Event_Name,Event_Description,Data_Source_Location FROM Event
INTERSECT
SELECT Event_Name_Orig,Event_Description_Orig,Data_Source_Location_Orig FROM Event_Source_1;

Example 3  —  Difference: Now suppose you want to know all Data Source Locations that appear in the original Data Source 2 data but not in the original Data Source 1 data. This can be done by using the difference operation, implemented with the Minus operator by Oracle. This will take all the records from one set and subtract those that also exist in another set.

SELECT Event_Name_Orig,Event_Description_Orig,Data_Source_Location_Orig FROM Event_Source_1
MINUS
SELECT Event_Name_Orig,Event_Description_Orig,Data_Source_Location_Orig FROM Event_Source_2

Database Design Considerations

These powerful operators can be used to reduce or eliminate the need for cursors in many cases. The usefulness of these operators is dependent on sound database design and a well-normalized table structure. For example, a table that has repeating columns designating the same data element (as opposed to using multiple rows) will render these operators much less useful.

Conclusion

With careful database design and a good understanding of the Set management tools provided by the relational vendors, we can simplify and speed development and reduce maintenance costs. Lets think in terms in sets and get the most out of our relational database investment!

If you would like to discuss set theory or relational database design, please give me a call at (888) 809-4803 x 700 and if you have further thoughts on the topic, please add comments!

If you enjoyed this article please like and share!

Database Patch News — March 2021 (Issue 7)

Database Patch News — February 2021 (Issue 6)

Welcome to Database Patch News, Buda Consulting’s newsletter of current patch information for Oracle and Microsoft SQL Server. Here you’ll find information recently made available on patches—including security patches—and desupported versions.

Why should you care about patching vulnerabilities and bugs? Two big reasons:

  • Unpatched systems are a top cyber attack target. Patch releases literally advertise vulnerabilities to the hacker community. The longer you wait to patch, the greater your security risk.
  • Along with running a supported database version, applying the latest patches ensures that you can get support from the vendor in case of an issue. Patching also helps eliminate downtime and lost productivity associated with bugs.

Here are the latest patch updates for Oracle and SQL Server:

Oracle Patches:

January 19, 2021 Quarterly Patch Updates:
21c – Released January 13, 2021, Version 21.1; no Quarterly patch yet

19c – Release Update 19.10 is available (32218494 and 321266828)

18c – Release Update 18.13 is available (32204699 and 32126855)

12cR2 – Release Update 210119 is available (32228578 and 32126871)
Regular support ends in Mar 2023 and extended support ends in Mar 2026.

12cR1 – Release Update 210119 is available (32132231 and 32126908)
Regular support ended in July 2019 and extended support ends in July 2021.

11gR4 – Patch Set Update 201020 is available (31720776)
Regular support ended in October 2018 and extended support ended December 31, 2020.

SQL Server Patches:

SQL Server 2019
Cumulative update 8 (Latest build) Released Oct 1, 2020
Mainstream support ends Jan 7, 2025
Extended support ends Jan 8, 2030

SQL Server 2017
Cumulative update 22 (Latest build) Released Sept 10, 2020
Mainstream support ends Oct 11, 2022
Extended support ends Oct 12, 2027

SQL Server 2016 Service Pack 2
Cumulative update 15 Release date: Sept 28, 2020
Mainstream support ends Jul 13, 2021
Extended support ends Jul 14, 2026

SQL Server 2014 Service Pack 3
Cumulative update 4 Release date: Feb 11, 2019
Mainstream support ended Jul 9, 2019
Extended support ends Jul 9, 2024

SQL Server 2012 Service Pack 4
Release date: Oct 5, 2017
Mainstream support ended Jul 11, 2017
Extended support ends Jul 12, 2022

Note: All other SQL Server versions not mentioned are no longer supported.

Database Patch News — March 2021 (Issue 7)

Database Patch News — December 2020 (Issue 5)

Welcome to Database Patch News, Buda Consulting’s newsletter of current patch information for Oracle and Microsoft SQL Server. Here you’ll find information recently made available on patches—including security patches—and desupported versions.

Why should you care about patching vulnerabilities and bugs? Two big reasons:

  1. Unpatched systems are a top cyber attack target. Patch releases literally advertise vulnerabilities to the hacker community. The longer you wait to patch, the greater your security risk. 
  2. Along with running a supported database version, applying the latest patches ensures that you can get support from the vendor in case of an issue. Patching also helps eliminate downtime and lost productivity associated with bugs. 

Here are the latest patch updates for Oracle and SQL Server:

Oracle Patches:

October 20, 2020 Quarterly Patch Updates:

19c – Release Update 19.9 is available (31771877 & 31668882)

18c – Release Update 18.12 is available (31730250 & 31668892)

12cR2 – Release Update 201020 is available (31741641 & 31668898)

Regular support ends in Mar 2023 and extended support ends in Mar 2026.

12cR1 – Release Update 201020 is available (31550110 & 31668915)

Regular support ended in July 2019 and extended support ends in July 2021.

11gR4 – Patch Set Update 201020 is available (31720776)

Regular support ended in October 2018 and extended support ends in December 2020.

 

SQL Server Patches:

SQL Server 2019

Cumulative update 8 (Latest build) Released Oct 1, 2020

Mainstream support ends Jan 7, 2025

Extended support ends Jan 8, 2030

 

SQL Server 2017

Cumulative update 22 (Latest build) Released Sept 10, 2020

Mainstream support ends Oct 11, 2022

Extended support ends Oct 12, 2027

 

SQL Server 2016 Service Pack 2

Cumulative update 15 Release date: Sept 28, 2020

Mainstream support ends Jul 13, 2021

Extended support ends Jul 14, 2026

 

SQL Server 2014 Service Pack 3

Cumulative update 4 Release date: Feb 11, 2019

Mainstream support ended Jul 9, 2019

Extended support ends Jul 9, 2024

 

SQL Server 2012 Service Pack 4

Release date: Oct 5, 2017

Mainstream support ended Jul 11, 2017

Extended support ends Jul 12, 2022


Note: All other SQL Server versions not mentioned are no longer supported.

 

SQL Server Always Encrypted—Is It Right for My Data?

lock icon, encryptionChances are your company holds sensitive transactional data like personal, financial or health records. This data is often subject to regulations and is also coveted by cybercriminals. Unfortunately, unless encrypted it is vulnerable to compromise both at rest on the server that stores it, and in transit to/from clients that request it.

For Microsoft SQL Server users, Transparent Data Encryption (TDE) has long been available to protect data at rest in the event that database files or backups are compromised. But TDE still leaves encrypted data vulnerable to a privileged user who can access it via the database.

Further, TDE offers nothing to secure SQL Server data in transit. HTTPS can help, but still leaves data exposed to man-in-the-middle (MITM) and other attacks.

To help mitigate these security weaknesses, SQL Server 2016 introduced the Always Encrypted feature to protect sensitive data at rest, in motion and even in memory. Outside an application client’s connection, the data remains encrypted—across the network, server, storage and database.

With Always Encrypted, only users and applications with access to valid keys can decrypt the data, so MTM attacks, insider threats, etc. are nullified. In addition, Always Encrypted lets you encrypt data at the column level, whereas TDE requires encrypting the entire database.

Who should use Always Encrypted?

New privacy regulations and other compliance and data governance mandates are driving more and more SQL Server users to leverage Always Encrypted. It can be a big help with separating those who own and can view the data from those who manage the data but should otherwise not access it.

For example, Always Encrypted lets you protect Personal Information (PI) that you store in the cloud. Even when the data is outside your direct control, the cloud service provider, third-party DBAs, your on-premises admins and other privileged users can’t access it.

Because it is available with all SQL Server 2016 and newer editions (as of SP1), you can try Always Encrypted without purchasing additional licenses. However, the real investment lies in modifying the client application, which will handle the encryption/decryption outside the SQL Server or Azure SQL Database environment.

Always Encrypted cryptography

Always Encrypted is aptly named: from the standpoint of the database engine the data is literally “always encrypted.” But some queries on encrypted data are still supported, depending on the column-level encryption setting.

You can choose either of two encryption types:

  1. Deterministic encryption, which always generates the same encrypted values. This enables the database engine to perform specific operations (point lookups, grouping, equality joins, and also supports indexing.
  2. Randomized encryption, which is even more secure but makes the data write/display-only. Thus, it can’t be evaluated or used in any database operations and cannot be indexed.

The downside of deterministic encryption is that unauthorized users could potentially extrapolate information about the data in encrypted columns by examining patterns within the encrypted values, especially if the set of possible values is small (e.g., true/false). Randomized encryption generates a less predictable result, but precludes indexing, searching, grouping or joining operations on encrypted columns.

For columns of sensitive data that will be used as search or grouping parameters (e.g., US Social Security Numbers), you’ll need to use deterministic encryption. Randomized encryption can be used for data that isn’t grouped with other records and won’t be used to join tables.

History and enhancements

As noted above, Always Encrypted has been available across all SQL Server and Azure SQL Database editions and SQL Database service tiers since SQL Server 2016 (13.x) SP1. Prior to that, it was only available in the Enterprise and Developer Editions.

Beginning with SQL Server 2019 (15.x), the new Always Encrypted with secure enclaves feature extends the database operations you can perform on encrypted data to include pattern matching, other comparison operators and in-place encryption.

A secure enclave is basically a protected memory area specifically for processing Always Encrypted data within SQL Server. To the rest of the SQL Server engine, and to other processes on the host machine, a secure enclave is a “black box” that shields all code and data inside from outside processes, including debuggers.

To validate a secure enclave before sending it encryption keys, etc., both the client-side driver and SQL Server must contact an external attestation service. The process also relies on “enclave-enabled” column master and encryption keys.

While not without complexities and performance overhead, secure enclaves significantly extend the kinds of operations you can perform on Always Encrypted data.

How Always Encrypted works

Before it can process Always Encrypted data, a client application must be configured to use an approved driver, which will automatically encrypt and decrypt columns of sensitive data. The driver encrypts the data before passing it to the database engine. It also needs to automatically rewrite queries to preserve their semantics. When the database returns query results, the driver transparently decrypts these before storing them.

This works because only the client-side application, not the SQL Server environment, has access to the encryption keys. Other applications can retrieve the encrypted values but can do nothing with them.

To encrypt a column of data with Always Encrypted, you must first generate a column encryption key and a column master key. The former encrypts the data and the latter encrypts the former.

The column encryption key resides on the SQL Server instance, while the database engine stores metadata pointing to the column master key’s location. The column master key itself resides in the Windows Certificate Store, Azure Key Vault, a hardware security module or some other trusted external key store. The database engine never sees or stores either key as plaintext.

Always Encrypted use cases

What are some good reasons to try Always Encrypted? Here a few of the top use cases:

  • To improve security of data in transit beyond what SSL can provide.
  • To meet the demands of regulated industries like financial services and telecommunications around protecting Personally Identifiable Information (PII) like credit card numbers, customer names/addresses, etc.
  • To improve security when outsourcing SQL Server DBA services, Always Encrypted ensures separation of duties between third-party DBAs and in-house application administrators.
  • To improve security in scenarios where on-premises client instances need to access sensitive data stored on Microsoft Azure. Because the column master key resides on-premises in a trusted key store, Microsoft admins cannot access the cloud-based data.
  • To improve security in situations where both the client application and database environment are hosted on Azure (as SQL Database or SQL Server instances running in a virtual machine). In this case, both the data and keys are potentially exposed to Microsoft admins on the cloud platform hosting the client tier. However, the data is still always encrypted in the cloud database.

In short, Always Encrypted makes the most sense for protecting PII and other types of data that you need to store, but don’t want to search on or display to application users.

Limitations with Always Encrypted

As noted above, Always Encrypted limits the kinds of operations you can perform on data; e.g., copying data between columns using UPDATE, SELECT INTO, etc. It also restricts the data types you can encrypt. Some of the excluded types include XML, IMAGE, TEXTNTEXT and GEOGRAPHY, as well as user-defined data types.

If you need to query Always Encrypted data from multiple applications, such as reporting tools and data warehouses, you will need to give them access to the appropriate database drivers and decryption certificates. This can be more challenging if you need to replicate encrypted data to other SQL Server instances.

Performance can also be a limiting factor with using Always Encrypted. Encryption/decryption obviously involve compute and storage overhead that will impact both the duration of queries and the volume of storage your application needs. You’ll probably want to do some testing with your specific CPU, RAM and disk setup to gauge the production impact.

Next steps

Want to talk with a database security expert before you dive into Always Encrypted? Contact Buda Consulting to schedule a free consultation.