Why does adding a NOT NULL column with ALTER TABLE take a long time?
Adding a NOT NULL column to an existing table can run for a long time on high-volume tables, because SQL Server has to scan all the pages of the table and allocate a slot for the new column in every row. If a data page is already full or has insufficient space, a page split occurs to accommodate the new column, and every index on the table has to be updated to reflect the new page numbers after a split. The entire process also generates a large number of log records in the transaction log. In short, SQL Server has a lot of work to do when adding a NOT NULL column to an existing table, and that is why it takes so long on large tables. Starting with SQL Server 2012, the operation is instantaneous because it only changes metadata.
This post covers the SQL Server 2012 improvement for adding a NOT NULL column.
The following example compares SQL Server 2008 and SQL Server 2016 when adding a NOT NULL column to an existing table.
1. Create a table and populate it with sample data
create table dbo.Product
(
    ProductId bigint,
    Price decimal(12,2)
)

insert into dbo.Product
select message_id, rand()*100*message_id
from sys.messages
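To get a sense of how many pages a pre-2012 ALTER statement would have to touch, the page and row counts of the table can be checked with the standard sys.dm_db_partition_stats DMV (the query below is just one way to read it):

SELECT OBJECT_NAME(object_id) AS TableName,
       in_row_data_page_count,
       row_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.Product')
  AND index_id IN (0, 1)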
2. Flush dirty pages to disk and clear the SQL Server buffer cache
CHECKPOINT
DBCC DROPCLEANBUFFERS
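To confirm the cache really is empty before running the ALTER, the number of cached pages for the current database can be checked with the standard sys.dm_os_buffer_descriptors DMV; immediately after the statements above, this count should be at or near zero:

SELECT COUNT(*) AS cached_pages
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()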
3. Enable STATISTICS IO and TIME
set statistics io, time on
4. Execute the ALTER statement in both SQL Server 2008 and SQL Server 2016
alter table dbo.Product
add manufacturedate date not null
constraint df_mfgdate default '1/1/2010'
The execution of the ALTER statement completed instantly in SQL Server 2016 whereas it took a while to complete the same statement in SQL Server 2008.
The following image compares the amount of work performed by the two versions. STATISTICS IO shows that SQL Server 2008 scanned all the pages of the Product table and finished in 7.6 seconds, whereas SQL Server 2016 performed no table scan and the change was almost instant, finishing in 0.02 seconds.
In versions prior to SQL Server 2012, the time required to add a NOT NULL column to an existing table is proportional to the table size, so the ALTER statement can take much longer (even hours) when the table is gigabytes in size.
The same can be confirmed from the SQL Server buffer cache. At the time the ALTER statements were executed, the cache was empty because CHECKPOINT followed by DBCC DROPCLEANBUFFERS had just been run. Executing the ALTER statement in SQL Server 2008 brought all of the table's pages into memory and modified them to add the new column, as shown by the is_modified flag in the following image, whereas SQL Server 2016 did not have to scan the table at all.
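A query along these lines can be used to inspect which of the table's pages sit in the buffer cache and whether they were modified (joining through allocation_unit_id and hobt_id is the usual pattern for in-row data; adjust as needed for your environment):

SELECT bd.page_id, bd.page_type, bd.is_modified
FROM sys.dm_os_buffer_descriptors bd
INNER JOIN sys.allocation_units au
    ON bd.allocation_unit_id = au.allocation_unit_id
INNER JOIN sys.partitions p
    ON au.container_id = p.hobt_id
WHERE bd.database_id = DB_ID()
  AND p.object_id = OBJECT_ID('dbo.Product')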
How it works in SQL Server 2012 and later
Starting with SQL Server 2012, adding a NOT NULL column with a constant default value is a metadata-only change. The default value of the column is stored in an internal table and is referenced whenever a query accesses the rows. The default value can be seen through the system view sys.system_internals_partition_columns:
SELECT Object_name(p.object_id) AS TableName,
       Type_name(pc.system_type_id) AS ColumnType,
       pc.is_nullable,
       pc.has_default,
       pc.default_value
FROM sys.system_internals_partitions p
INNER JOIN sys.system_internals_partition_columns pc
    ON p.partition_id = pc.partition_id
WHERE p.object_id = Object_id('dbo.Product');
If the default value provided for the column is not a runtime constant, the ALTER TABLE does not benefit from the metadata-only change. For example, adding a rowversion column to an existing table still requires scanning all the pages to store a rowversion value in each row. Using NEWID() or a user-defined function for the DEFAULT value is also not an instant operation. If the default value is a number, a string, or a date (including the GETDATE() function), it is stored in the internal table and the change is instant.
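As an illustration, an ALTER like the following (the column and constraint names here are made up for the example) would not be metadata-only, because NEWID() produces a different value for every row and each row must be physically updated:

alter table dbo.Product
add ProductGuid uniqueidentifier not null
constraint df_productguid default newid()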
Examining the table's pages provides further understanding of how this change is applied when the ALTER statement is executed.
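The pages can be dumped with DBCC IND and DBCC PAGE (both undocumented commands; the database name and page number below are placeholders, and trace flag 3604 redirects the output to the client):

DBCC TRACEON(3604)
DBCC IND('TestDB', 'dbo.Product', 1)  -- lists the pages belonging to the table
DBCC PAGE('TestDB', 1, 304, 3)        -- database, file id, page id, print option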
The following image shows the output of DBCC PAGE before altering the table in SQL Server 2016. ProductId 21 is taken as a sample to explore further; as shown, there are only two columns in the row: Slot 0 holds ProductId, Slot 1 holds the Price column, and the total record size is 24 bytes.
The next image was taken after adding the NOT NULL column in SQL Server 2016. The manufacturedate column appears in the DBCC PAGE output, but no space has been allocated for it in the page: its offset is 0x0 and its physical length is still 0. The record size in the page header is unchanged at 24 bytes.
Space is allocated for the new column only when the row is modified, and the modification can touch any column of the row. For example, when the Price column is updated, space is allocated for the newly added column.
UPDATE Product SET Price=424.42 WHERE ProductId=21
The following image shows the DBCC PAGE output after executing the update statement. Space for the new column has been allocated, which is confirmed by the record size changing from 24 to 27 bytes, the offset changing from 0x0 to 0x15, and the physical length changing to 3.
In SQL Server 2008, as shown in the following images, the ALTER statement changed the page records at the time of the schema change. The first image was taken before altering the table, the latter after the ALTER statement. The page was updated with the new column, and space was allocated as part of the ALTER statement itself.
It is clear that adding a NOT NULL column to an existing table is faster from SQL Server 2012 onwards. The important point to remember is that this does not apply to all types of DEFAULT values: as mentioned earlier, adding a rowversion column or using NEWID() as a DEFAULT value still requires scanning the entire table and updating every row.