Channel: SQL Server Database Engine forum

When a table with a clustered columnstore index is partitioned, performance degrades if the data is located in multiple partitions


Hello,

Below I provide the complete code to reproduce the behavior I am observing. You can run it in tempdb or any other database; which one does not matter. The test query at the top of the script is pretty silly, but I have observed the same performance degradation with about a dozen queries of varying complexity, so this is just the simplest one to use as an example. I also included approximate run times in the script comments (based on what I observed on my machine). Here are the steps; the # numbers refer to the numbered steps in the script:

1. Run script from #1 to #7. This will create the two test tables, populate them with records (40 million in Main and 10 million in Txns) and build regular clustered indexes.

2. Run test query (at the top of the script).  Here are the execution statistics:

Table 'Main'. Scan count 5, logical reads 151435, physical reads 0, read-ahead reads 4, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Txns'. Scan count 5, logical reads 74155, physical reads 0, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 5514 ms, elapsed time = 1389 ms.

3. Run script from #8 to #9. This will replace regular clustered indexes with columnstore clustered indexes.

4. Run test query (at the top of the script).  Here are the execution statistics:

Table 'Txns'. Scan count 4, logical reads 44563, physical reads 0, read-ahead reads 37186, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Main'. Scan count 4, logical reads 54850, physical reads 2, read-ahead reads 96862, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 828 ms, elapsed time = 392 ms.

As you can see, the query is clearly faster. Yay for columnstore indexes! But let's continue.

5. Run script from #10 to #12 (note that this might take some time to execute). This will move about 80% of the data in both tables to a different partition. The output of #11 should confirm that the data has been moved.

6. Run test query (at the top of the script).  Here are the execution statistics:

Table 'Txns'. Scan count 4, logical reads 44563, physical reads 0, read-ahead reads 37186, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Main'. Scan count 4, logical reads 54817, physical reads 2, read-ahead reads 96862, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 8172 ms, elapsed time = 3119 ms.

And now look: the I/O stats are virtually the same as before, but this is the slowest of all three runs!
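
To put the three runs side by side, here is a small sketch that only restates the CPU and elapsed times reported above; the run labels and column names are made up for illustration and are not part of the repro script.

```sql
-- Timings copied from the three SET STATISTICS TIME outputs above.
SELECT  t.RunLabel,
        t.CpuMs,
        t.ElapsedMs,
        -- CPU-to-elapsed ratio: a rough indicator of how much of the work ran in parallel
        CAST(t.CpuMs * 1.0 / t.ElapsedMs AS decimal(6, 2)) AS CpuToElapsed,
        -- elapsed time relative to the fastest of the three runs
        CAST(t.ElapsedMs * 1.0 / MIN(t.ElapsedMs) OVER () AS decimal(6, 2)) AS SlowdownVsFastest
FROM (VALUES
        ('Rowstore clustered indexes',    5514, 1389),
        ('Columnstore, one partition',     828,  392),
        ('Columnstore, two partitions',   8172, 3119)
     ) AS t (RunLabel, CpuMs, ElapsedMs)
ORDER BY t.ElapsedMs;
```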

I am not going to paste the execution plans or the detailed properties of each operator here. They look as expected: columnstore index scan, parallel/partitioned = true, and both the estimated and the actual row counts are lower than during the second run (when all of the data resided in a single partition).
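
One more thing that could be inspected alongside the plans is the state of the columnstore rowgroups in each partition. The query below is a minimal sketch, not part of the original script, and it assumes SQL Server 2014 or later, where the sys.column_store_row_groups catalog view is available. It shows, per table and partition, how many rowgroups are COMPRESSED versus OPEN or CLOSED (delta stores), and how many rows are flagged as deleted. Whether any of this explains the slowdown is not established here.

```sql
-- Summarize rowgroup state per table and partition after the UPDATEs in #10.
SELECT  OBJECT_NAME(rg.object_id)   AS TableName,
        rg.partition_number,
        rg.state_description,       -- e.g. OPEN / CLOSED (delta store) or COMPRESSED
        COUNT(*)                    AS RowGroups,
        SUM(rg.total_rows)          AS TotalRows,
        SUM(rg.deleted_rows)        AS DeletedRows   -- NULL for delta-store rowgroups
FROM    sys.column_store_row_groups AS rg
WHERE   rg.object_id IN (OBJECT_ID('dbo.Main'), OBJECT_ID('dbo.Txns'))
GROUP BY rg.object_id, rg.partition_number, rg.state_description
ORDER BY TableName, rg.partition_number, rg.state_description;
```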

So the question is: why is it slower?

Thank you for any help!

---------------------------------------

Here is the code to reproduce this:

--///////////////////////////////
--==> Test Query - begin --<=== 
--//////////////////////////////

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT  COUNT(1)
FROM Txns AS z WITH(NOLOCK) 
LEFT JOIN Main AS mmm WITH(NOLOCK) ON mmm.ColBatchID = 70 AND z.TxnID = mmm.TxnID AND mmm.RecordStatus = 1
WHERE z.RecordStatus = 1

--///////////////////////////////
--==>   Test Query - end  --<=== 
--//////////////////////////////


--===========================================================
--1. Clean-up
IF OBJECT_ID('Txns') IS NOT NULL DROP TABLE Txns
IF OBJECT_ID('Main') IS NOT NULL DROP TABLE Main
IF EXISTS (SELECT 1 FROM sys.partition_schemes WHERE name = 'PS_Scheme') DROP PARTITION SCHEME PS_Scheme
IF EXISTS (SELECT 1 FROM sys.partition_functions WHERE name = 'PF_Func') DROP PARTITION FUNCTION PF_Func
 
--2. Create partition function
CREATE PARTITION FUNCTION PF_Func(tinyint) AS RANGE LEFT FOR VALUES (1, 2, 3)

--3. Partition scheme
CREATE PARTITION SCHEME PS_Scheme AS PARTITION PF_Func ALL TO ([PRIMARY])

--4. Create Main table
CREATE TABLE dbo.Main(
	SetID int NOT NULL, 
	SubSetID int NOT NULL,
	TxnID int NOT NULL,
	ColBatchID int NOT NULL,
	ColMadeId int NOT NULL,
	RecordStatus tinyint NOT NULL DEFAULT ((1))
) ON PS_Scheme(RecordStatus)

--5. Create Txns table
CREATE TABLE dbo.Txns(
	TxnID int IDENTITY(1,1) NOT NULL,
	GroupID int NULL,
	SiteID int NULL,
	Period datetime NULL,
	Amount money NULL,
	CreateDate datetime NULL,
	Descr varchar(50) NULL,
	RecordStatus tinyint NOT NULL DEFAULT ((1))
) ON PS_Scheme(RecordStatus)


--6. Populate data (credit to Jeff Moden: http://www.sqlservercentral.com/articles/Data+Generation/87901/)
--	 40 mln. rows - approx. 4 min

--6.1 Populate Main table
DECLARE @NumberOfRows INT = 40000000

INSERT INTO Main (
		SetID,
		SubSetID,
		TxnID,
		ColBatchID,
		ColMadeId,
		RecordStatus)
 SELECT TOP (@NumberOfRows)
        SetID = ABS(CHECKSUM(NEWID())) % 500 + 1, -- ABS(CHECKSUM(NEWID())) % @Range + @StartValue,
		SubSetID = ABS(CHECKSUM(NEWID())) % 3 + 1,
		TxnID = ABS(CHECKSUM(NEWID())) % 1000000 + 1,
		ColBatchId = ABS(CHECKSUM(NEWID())) % 100 + 1,
		ColMadeID = ABS(CHECKSUM(NEWID())) % 500000 + 1,
		RecordStatus = 1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2

--6.2 Populate Txns table
--	 10 mln. rows - approx. 1 min

SET @NumberOfRows = 10000000

INSERT INTO Txns (
		GroupID,
		SiteID,
		Period,
		Amount,
		CreateDate,
		Descr,
		RecordStatus)
 SELECT TOP (@NumberOfRows)
        GroupID = ABS(CHECKSUM(NEWID())) % 5 + 1, -- ABS(CHECKSUM(NEWID())) % @Range + @StartValue,
		SiteID = ABS(CHECKSUM(NEWID())) % 56 + 1,
		Period = DATEADD(dd,ABS(CHECKSUM(NEWID())) % 365, '05-04-2012'),  -- DATEADD(dd,ABS(CHECKSUM(NEWID())) % @Days, @StartDate)
		Amount = CAST(RAND(CHECKSUM(NEWID())) * 250000 + 1 AS MONEY),
		CreateDate = DATEADD(dd,ABS(CHECKSUM(NEWID())) % 365, '05-04-2012'),
		Descr = REPLICATE(CHAR(65 + ABS(CHECKSUM(NEWID())) % 26), ABS(CHECKSUM(NEWID())) % 20),
		RecordStatus = 1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2


--7. Add clustered indexes (PK on Txns, plain clustered index on Main)
--   1 min
ALTER TABLE Txns ADD CONSTRAINT PK_Txns PRIMARY KEY CLUSTERED (RecordStatus ASC, TxnID ASC) ON PS_Scheme(RecordStatus)
CREATE CLUSTERED INDEX CDX_Main ON Main(RecordStatus ASC, SetID ASC, SubSetID ASC, TxnID ASC) ON PS_Scheme(RecordStatus)

--//////////////////////////
--==> Run test Query --<=== 
--//////////////////////////

--===========================================================
-- Replace regular indexes with clustered columnstore indexes
--===========================================================

--8. Drop existing indexes
ALTER TABLE Txns DROP CONSTRAINT PK_Txns
DROP INDEX Main.CDX_Main

--9. Create clustered columnstore indexes (on partition scheme!)
--	 1 min
CREATE CLUSTERED COLUMNSTORE INDEX  PK_Txns ON Txns ON PS_Scheme(RecordStatus)
CREATE CLUSTERED COLUMNSTORE INDEX  CDX_Main ON Main ON PS_Scheme(RecordStatus)

--//////////////////////////
--==> Run test Query --<=== 
--//////////////////////////

--===========================================================
-- Move about 80% of the data into a different partition
--===========================================================

--10. Update "RecordStatus", so that data is moved to a different partition
--    14 min (32002557 row(s) affected)
UPDATE  Main
SET		RecordStatus = 2
WHERE	TxnID < 800000 -- range of values is from 1 to 1 mln.

-- 4.5 min (7999999 row(s) affected)
UPDATE  Txns
SET		RecordStatus = 2
WHERE	TxnID < 8000000 -- range of values is from 1 to 10 mln.

--11. Check data distribution
SELECT 
	OBJECT_NAME(SI.object_id) AS PartitionedTable
	, DS.name AS PartitionScheme
	, SI.name AS IdxName
	, SI.index_id
	, SP.partition_number
	, SP.rows
FROM sys.indexes AS SI WITH (NOLOCK)
JOIN sys.data_spaces AS DS WITH (NOLOCK)
	ON DS.data_space_id = SI.data_space_id
JOIN sys.partitions AS SP WITH (NOLOCK)
	ON SP.object_id = SI.object_id 
AND SP.index_id = SI.index_id 
WHERE DS.type = 'PS'
AND OBJECT_NAME(SI.object_id) IN ('Main', 'Txns')
ORDER BY 1, 2, 3, 4, 5;

/*
PartitionedTable	PartitionScheme	IdxName		index_id	partition_number	rows
Main				PS_Scheme		CDX_Main	1			1					7997443
Main				PS_Scheme		CDX_Main	1			2					32002557
Main				PS_Scheme		CDX_Main	1			3					0
Main				PS_Scheme		CDX_Main	1			4					0
Txns				PS_Scheme		PK_Txns		1			1					2000001
Txns				PS_Scheme		PK_Txns		1			2					7999999
Txns				PS_Scheme		PK_Txns		1			3					0
Txns				PS_Scheme		PK_Txns		1			4					0
*/

--12. Update statistics
EXEC sys.sp_updatestats

--//////////////////////////
--==> Run test Query --<=== 
--//////////////////////////
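
An optional follow-up, not part of the original repro: rebuilding both columnstore indexes and re-running the test query would show whether the slowdown survives a rebuild. This is a sketch under the assumption (not confirmed by the measurements above) that the UPDATEs in #10 left rows in delta stores or marked as deleted; a rebuild recompresses all rows into columnstore segments. Expect it to take a while on tables of this size.

```sql
-- Optional extra step (assumption-driven, not in the original script):
-- rebuild every partition of both clustered columnstore indexes,
-- then run the test query once more and compare the timings.
ALTER INDEX PK_Txns  ON dbo.Txns REBUILD PARTITION = ALL;
ALTER INDEX CDX_Main ON dbo.Main REBUILD PARTITION = ALL;
```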




