Partition is a feature introduced in SQL Server 2005. Partition is a method to slice the data to many smaller chunks from larger table. There are many benefits by partitioning a large table. When a table is partitioned each partition is stored in separate filegroup or within the same filegroup. Maintenance can be performed on individual partition instead of entire table and it improves the data load and archive performance.
This post will cover the implementation of partition in SQL Server. Throughout this post a simple table design will be used for demonstration purpose.
A table will be divided into many partitions based on the selected partition column. So, it is important to identify the correct column to partition a table. If not selected correctly it will create serious problem to query performance. In our example, Order Date column selected as a partition column. The aim of this post is to cover basics of table partition and its implementation, so this post will not cover the column selection guide and other demerits of partition.
The partition function is a key element which instructs the storage engine to create “n” number of partitions based on the boundary values. The boundary values are from the partition column itself, based on the boundary value rows in the table will be placed in a specific partition. If I want to create a separate partition for each year then the boundary values will be 2005-12-31, 2006-12-31, 2007-12-31 and 2008-12-31. Then the order table will look similar to the below diagram. Here the first partition will contain all the records up to the date 2005-12-31, if there are records before 2005-01-01 all those records will also be stored in the first partition. The second, third and fourth partitions will have 2006, 2007 and 2008 year records respectively. SQL server by default create an extra partition as shown in the below image (pn+1) which will be used to store records for which order date is greater than 2008-12-31.
Partition function provides two flavors for the boundary value which are range left and range right. When the function is created with range left all records up to and including boundary value will be stored in the left partition, the example given here is based on range left hence all the records where order date is 2005-12-31 will be stored in first partition P1. If the function is created as range right then the records belong to the boundary value will be stored in next (right) partition, considering our example all the rows in 2005-12-31 will be stored in P2 partition.
Until now we looked at partition column and function which are logical part of table partition process, the partition scheme is used to associate logical partition to physical filegroups. Each partition can be placed in separate filegroup or all partitions can be placed in a single filegroup.
If I know the historical records up to 2006-12-31 will not be modified, then partition P1 and P2 can be placed in a read-only file group, this will help in backup process as read only files need not to be part of daily backup routine. Next the partitions P3 and P4 are mapped to slow performing SAN disk (a disk with less IOPS) and latest partition is mapped to high performance disk.
The partition function and scheme are not bound only to a specific table. If there are other tables where similar data exists all those tables can be partitioned using the same partition function and scheme. For example, if I need to partition OrderLines table which is the subset of order table, I probably use same function and scheme to partition the OrderLines table.
How to implement partition
Create partition function based on the Order date value.
CREATE PARTITION FUNCTION PF_ORDERDATE (DATE) AS RANGE LEFT FOR VALUES ( '2005-12-31','2006-12-31', '2007-12-31','2008-12-31' );
Create partition scheme for the function
CREATE PARTITION SCHEME PS_ORDERDATE AS PARTITION PF_ORDERDATE TO ( readonly_fg, readonly_fg, slowdisk_fg, slowdisk_fg, [primary] );
CREATE PARTITION SCHEME PS_ORDERDATE AS PARTITION PF_ORDERDATE ALL TO ([PRIMARY]);
In first option, filegroup is specified for each boundary values but if you don’t have different filegroups then second query is the perfect one to use as it places all the partitions in primary filegroup. But the recommendation is to consider multiple filegroups while partitioning.
Create table and load some data, for this demo I am using WideWorldImporters database to load the data, it is available in GitHub for download. While creating the table, the partition scheme PS_ORDERDATE should be specified in the ON clause.
CREATE TABLE [dbo].[Orders]( [OrderID] [int] NOT NULL, [CustomerID] [int] NOT NULL, [SalespersonPersonID] [int] NOT NULL, [PickedByPersonID] [int] NULL, [ContactPersonID] [int] NOT NULL, [BackorderOrderID] [int] NULL, [OrderDate] [date] NOT NULL, [ExpectedDeliveryDate] [date] NOT NULL, [CustomerPurchaseOrderNumber] [nvarchar](20) NULL, [IsUndersupplyBackordered] [bit] NOT NULL, [Comments] [nvarchar](max) NULL, [DeliveryInstructions] [nvarchar](max) NULL, [InternalComments] [nvarchar](max) NULL, [PickingCompletedWhen] [datetime2](7) NULL, [LastEditedBy] [int] NOT NULL, [LastEditedWhen] [datetime2](7) NOT NULL ) ON PS_ORDERDATE (OrderDate) GO INSERT INTO dbo.Orders SELECT * FROM WideWorldImporters.Sales.Orders
Let’s check how the table is partitioned. The sys.partitions view will show list of partitions for the table. I will cross check the result returned from the view against actual table.
SELECT * FROM SYS.PARTITIONS WHERE OBJECT_ID= OBJECT_ID('dbo.Orders') SELECT YEAR(orderdate) order_year, COUNT(OrderId) record_count FROM dbo.Orders GROUP BY YEAR(orderdate) ORDER BY order_year
The result shows that the table is partitioned and rows matching with the record count taken by grouping on year from the table.
SQL Server provides some dynamic management view which provides information about partitions
It returns all the partitions for tables and indexes
It returns function name, number of partitions created by the function, left or right partition, etc., about each partition function created within database
It returns the partition scheme and associated partition function.
It returns all the boundary values for each partition function.