Computed Column Performance in SQL Server

The translation of the article was prepared especially for students of the course "MS SQL Server Developer" .










Calculated columns can cause difficult-to-diagnose performance issues. This article discusses a number of problems and some ways to solve them.



Calculated columns are a convenient way to embed calculations in table definitions. But they can cause performance problems, especially as expressions get more complex, applications become more demanding, and data volumes continue to grow.



A calculated column is a virtual column whose value is calculated based on the values โ€‹โ€‹in other columns in the table. By default, the computed value is not physically stored, but instead SQL Server computes it on every column request. This increases the load on the processor, but reduces the amount of data that needs to be retained when the table changes.



Often non-persistent computed columns are CPU intensive, slowing queries and freezing applications. Fortunately, SQL Server provides several ways to improve the performance of calculated columns. You can create persisted computed columns, index them, or do both.



For the demonstration, I created four similar tables and populated them with identical data from the WideWorldImporters demo database. Each table has the same computed column, but two tables have it persist and two have an index. The result is the following options:



  • Table Orders1is an unsaved computed column.
  • Table Orders2is a persistent calculated column.
  • The table Orders3is a non-persistent computed column with an index.
  • Table Orders4is a persistent computed column with an index.


The calculated expression is quite simple and the dataset is very small. However, it should be sufficient to demonstrate the principles of persistent and indexed computed columns and how this helps in solving performance problems.



Unsaved computed column



Perhaps in your situation you might want non-persistent computed columns to avoid storing data, creating indexes, or for use with a non-deterministic column. For example, SQL Server will treat a scalar UDF as non-deterministic if WITH SCHEMABINDING is missing from the function definition. If you try to create a persistent calculated column using this function, you will get an error that the persisted column cannot be created.



However, it should be noted that custom functions can create their own performance problems. If the table contains a calculated column with a function, then the Query Engine will not use concurrency (unless you are using SQL Server 2019). Even in a situation where the calculated column is not specified in the query. For a large dataset, this can have a big performance impact. Functions can also slow down the execution of UPDATEs and affect how the optimizer calculates the cost of a query on a computed column. This does not mean that you should never use functions on a computed column, but it definitely should be treated with caution.



Whether you use functions or not, creating a non-persistent computed column is pretty straightforward. Next instructionCREATE TABLEdefines a table Orders1that includes a calculated column Cost.



USE WideWorldImporters;
GO
DROP TABLE IF EXISTS Orders1;
GO
CREATE TABLE Orders1(
  LineID int IDENTITY PRIMARY KEY,
  ItemID int NOT NULL,
  Quantity int NOT NULL,
  Price decimal(18, 2) NOT NULL,
  Profit decimal(18, 2) NOT NULL,
  Cost AS (Quantity * Price - Profit));

INSERT INTO Orders1 (ItemID, Quantity, Price, Profit)
SELECT StockItemID, Quantity, UnitPrice, LineProfit
FROM Sales.InvoiceLines
WHERE UnitPrice IS NOT NULL
ORDER BY InvoiceLineID;


To define a calculated column, specify its name followed by the AS keyword and expression. In our example, we are multiplying Quantityby Priceand subtracting Profit. After creating the table, we fill it with INSERT using data from Sales.InvoiceLinesthe WideWorldImporters database table. Next, we execute SELECT.



SELECT ItemID, Cost FROM Orders1 WHERE Cost >= 1000;


This query should return 22,973 rows, or all rows that you have in the WideWorldImporters database. The execution plan for this query is shown in Figure 1.





Figure 1. The execution plan for the query on the Orders1 table



The first thing to note is the Clustered Index Scan, which is not an efficient way to get the data. But this is not the only problem. Let's take a look at the number of logical reads (Actual Logical Reads) in the properties of the Clustered Index Scan (see figure 2).





Figure 2. Logical reads to query the Orders1 table



The number of logical reads (in this case 1108) is the number of pages that have been read from the data cache. The goal is to try to reduce this number as much as possible. Therefore, it is useful to remember and compare it with other options.



The number of logical reads can also be obtained by running the statement SET STATISTICS IO ONbefore executing the SELECT. To view the CPU and total time - SET STATISTICS TIME ONor view the properties of the SELECT statement in the query execution plan.



Another point worth noting is that there are two Compute Scalar statements in the execution plan. The first (the one on the right) is the computation of the computed column value for each returned row. Since column values โ€‹โ€‹are calculated on the fly, you cannot avoid this step with non-persistent calculated columns unless you create an index on that column.



In some cases, a non-persistent computed column provides the required performance without storing it or using an index. This not only saves storage space, but it also avoids the overhead of updating calculated values โ€‹โ€‹in a table or index. However, more often than not, a non-persistent computed column leads to performance problems, and then you should start looking for an alternative.



Persistent calculated column



One technique often used to solve performance problems is to define a computed column as persisted. With this approach, the expression is calculated in advance and the result is stored along with the rest of the table data.



For a column to be persistent, it must be deterministic, that is, the expression must always return the same result for the same input. For example, you cannot use the GETDATE function in a column expression, because the return value is always changing.



To create a persistent calculated column, you must add a keyword to the column definition PERSISTED, as shown in the following example.



DROP TABLE IF EXISTS Orders2;
GO
CREATE TABLE Orders2(
  LineID int IDENTITY PRIMARY KEY,
  ItemID int NOT NULL,
  Quantity int NOT NULL,
  Price decimal(18, 2) NOT NULL,
  Profit decimal(18, 2) NOT NULL,
  Cost AS (Quantity * Price - Profit) PERSISTED);

INSERT INTO Orders2 (ItemID, Quantity, Price, Profit)
SELECT StockItemID, Quantity, UnitPrice, LineProfit
FROM Sales.InvoiceLines
WHERE UnitPrice IS NOT NULL
ORDER BY InvoiceLineID;


The table is Orders2almost identical to the table Orders1, except that the column Costcontains the keyword PERSISTED. SQL Server automatically populates this column as rows are added or modified. Of course, this means that the table Orders2will take up more space than the table Orders1. This can be verified using a stored procedure sp_spaceused.



sp_spaceused 'Orders1';
GO
sp_spaceused 'Orders2';
GO


Figure 3 shows the output of this stored procedure. The data size in the table Orders1is 8,824 KB, and in the table Orders2- 12,936 KB. 4,112 KB more to store the calculated values.





Figure 3. Comparison of the size of the Orders1 and Orders2 tables



Although these examples are based on a fairly small dataset, you can see how the amount of stored data can grow rapidly. However, this can be a trade-off if performance improves.



To see the difference in performance, do the following SELECT.



SELECT ItemID, Cost FROM Orders2 WHERE Cost >= 1000;


This is the same SELECT I used for the Orders1 table (except for the name change). Figure 4 shows the execution plan.





Figure 4. Execution plan for a query to the Orders2 table.



This also starts with the Clustered Index Scan. But this time, there is only one Compute Scalar statement because the computed columns no longer need to be computed at runtime. In general, the fewer steps the better. Although this is not always the case.



The second query generated 1593 logical reads, which is 485 more than 1108 reads for the first table. Despite this, it runs faster than the first. Although only about 100ms, and sometimes much less. The processor time has also decreased, but also not by much. Most likely, the difference would be much larger on larger volumes and more complex calculations.



Index on non-persistent computed column



Another technique that is commonly used to improve the performance of a computed column is indexing. To be able to create an index, the column must be deterministic and precise, which means that the expression cannot use the float and real types (if the column is not persistent). There are also restrictions on other data types as well as on SET parameters. For a complete list of restrictions, see the SQL Server documentation, Indexes on Computed Columns .



You can check if a non-persistent computed column is suitable for indexing through its properties. Let's use the function to view the properties COLUMNPROPERTY. The properties IsDeterministic, IsIndexable and IsPrecise are important to us.



DECLARE @id int = OBJECT_ID('dbo.Orders1')
SELECT
  COLUMNPROPERTY(@id,'Cost','IsDeterministic') AS 'Deterministic',
  COLUMNPROPERTY(@id,'Cost','IsIndexable') AS 'Indexable',
  COLUMNPROPERTY(@id,'Cost','IsPrecise') AS 'Precise';


The SELECT statement must return 1 for each property so that the calculated column can be indexed (see Figure 5).





Figure 5. Verifying that the index



can be created After verification, you can create a nonclustered index. Instead of modifying the table, Orders1I created a third table ( Orders3) and included the index in the table definition.



DROP TABLE IF EXISTS Orders3;
GO
CREATE TABLE Orders3(
  LineID int IDENTITY PRIMARY KEY,
  ItemID int NOT NULL,
  Quantity int NOT NULL,
  Price decimal(18, 2) NOT NULL,
  Profit decimal(18, 2) NOT NULL,
  Cost AS (Quantity * Price - Profit),
  INDEX ix_cost3 NONCLUSTERED (Cost, ItemID));

INSERT INTO Orders3 (ItemID, Quantity, Price, Profit)
SELECT StockItemID, Quantity, UnitPrice, LineProfit
FROM Sales.InvoiceLines
WHERE UnitPrice IS NOT NULL
ORDER BY InvoiceLineID;


I created a non-clustered covering index that includes both columns from ItemIDand Costfrom a SELECT query. After creating and populating the table and index, you can execute the following SELECT statement similar to the previous examples.



SELECT ItemID, Cost FROM Orders3 WHERE Cost >= 1000;


Figure 6 shows the execution plan for this query, which now uses the ix_cost3 (Index Seek) nonclustered index instead of performing a clustered index scan.





Figure 6. Execution plan for a query on the Orders3 table



If you look at the properties of the Index Seek statement, you will find that the query now only performs 92 logical reads, and in the properties of the SELECT statement, you will see that the CPU and total time has decreased. The difference is not significant, but again, this is a small dataset.



It should also be noted that there is only one Compute Scalar statement in the execution plan, not two as in the first query. Since the calculated column is indexed, the values โ€‹โ€‹have already been calculated. This eliminates the need to compute values โ€‹โ€‹at runtime, even if the column has not been defined to be persistent.



Index on the stored column



You can also create an index on the computed column you are saving. While this will result in additional data and index data being stored, it can be useful in some cases. For example, you can create an index on a persistent computed column, even if it uses the float or real data types. This approach can also be useful when working with CLR functions, and when it is not possible to check if the functions are deterministic.



The following statement CREATE TABLEcreates a table Orders4. The table definition includes both a persistent column Costand a nonclustered covering index ix_cost4.



DROP TABLE IF EXISTS Orders4;
GO
CREATE TABLE Orders4(
  LineID int IDENTITY PRIMARY KEY,
  ItemID int NOT NULL,
  Quantity int NOT NULL,
  Price decimal(18, 2) NOT NULL,
  Profit decimal(18, 2) NOT NULL,
  Cost AS (Quantity * Price - Profit) PERSISTED,
  INDEX ix_cost4 NONCLUSTERED (Cost, ItemID));

INSERT INTO Orders4 (ItemID, Quantity, Price, Profit)
SELECT StockItemID, Quantity, UnitPrice, LineProfit
FROM Sales.InvoiceLines
WHERE UnitPrice IS NOT NULL
ORDER BY InvoiceLineID;


After the table and index are created and populated, execute SELECT.



SELECT ItemID, Cost FROM Orders4 WHERE Cost >= 1000;


Figure 7 shows the execution plan. As in the previous example, the query starts with a non-clustered index search (Index Seek).





Figure 7. Execution plan for a query on the Orders4 table



This query also performs only 92 logical reads as the previous one, which results in approximately the same performance. The main difference between the two computed columns, and between indexed and non-indexed columns, is the amount of space used. Let's check this by running the stored procedure sp_spaceused.



sp_spaceused 'Orders1';
GO
sp_spaceused 'Orders2';
GO
sp_spaceused 'Orders3';
GO
sp_spaceused 'Orders4';
GO


The results are shown in Figure 8. As expected, the computed columns stored have more data and the indexed columns have more indexes.





Figure 8. Comparison of space utilization for all four tables



Most likely, you will not need to index the stored computed columns for no good reason. As with other database-related questions, your choice should be based on your particular situation: your queries and the nature of your data.



Working with Calculated Columns in SQL Server



The calculated column is not a regular table column and should be handled with care to avoid degrading performance. Most performance issues can be resolved through storing or indexing the column, but both approaches need to consider the extra disk space and how the data changes. When the data changes, the calculated column values โ€‹โ€‹must be updated in the table or index, or both, if you indexed the persisted calculated column. You can only decide which of the options is best for your specific case. And, most likely, you will have to use all the options.





Read more






All Articles