
Use the Query Store In SQL Server 2016 To Improve HammerDB Performance (T-SQL Tuesday #79)



T-SQL Tuesday #79 is being hosted by Michael J. Swart this month. His prompt was to check out what’s new with SQL Server 2016. I figured I would play around with the Query Store and our old friend HammerDB.

This post is a follow-up of sorts to my post HAMMERDB AND THE NEW SQL SERVER 2014 CARDINALITY ESTIMATOR. Read that first to get a basic overview of the problem we are trying to solve. Come back once you are finished.

Basic synopsis of that article if you don’t want to read it for yourself:

  • With the new cardinality estimator introduced in SQL Server 2014, performance of the SLEV stored procedure used by HammerDB seriously regresses.
  • Adding Trace Flag 9481 to the statement that regressed in the stored procedure brings the performance back to the same levels as SQL Server 2012.

The Workload on SQL 2016

First, we need to figure out if the HammerDB workload performs similarly to when we first upgraded to SQL 2014. I’m going to follow the same steps that I took when upgrading the database to 2014, except I will be changing the compatibility level to 130 (the new 2016 level) and at the end I will be enabling the query store on the database. (For my specific HammerDB build instructions, check out that original 2014 post).

/*
Using this blog post from Thomas LaRock as a guideline for upgrading to 2014:
thomaslarock.com/2014/06/upgrading-to-sql-server-2014-a-dozen-things-to-check/
*/
USE [master]
GO

--Both data and log file on my C drive - not being fancy here
RESTORE DATABASE [tpcc]
FROM DISK = N'C:\2012INSTANCES\tpcc_blogtesting_10WH.bak'
WITH FILE = 1
,MOVE N'tpcc' TO N'C:\2016INSTANCES\DATA\tpcc.mdf'
,MOVE N'tpcc_log' TO N'C:\2016INSTANCES\LOGS\tpcc_log.ldf'
,NOUNLOAD
,STATS = 5
,REPLACE
GO

--Be sure to set the database compatibility_level up to the newest 2016 version!
ALTER DATABASE [tpcc] SET COMPATIBILITY_LEVEL = 130;

USE [tpcc]
GO

DBCC CHECKDB WITH DATA_PURITY;

DBCC UPDATEUSAGE(tpcc);

USE [tpcc]
GO
EXEC sp_MSforeachtable @command1='UPDATE STATISTICS ? WITH FULLSCAN';

USE [tpcc]
GO
--Enable Query Store on tpcc database
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
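The one-liner above enables the Query Store with its defaults, which are fine for this demo. If you want more control, the 2016 options can be set explicitly. The values below are purely illustrative, not recommendations:

```sql
--Optional: tune Query Store behavior (illustrative values only)
ALTER DATABASE [tpcc] SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,
    QUERY_CAPTURE_MODE = ALL,      --capture every query, not just 'relevant' ones
    MAX_STORAGE_SIZE_MB = 100,     --cap the on-disk footprint inside the user database
    INTERVAL_LENGTH_MINUTES = 15,  --runtime stats aggregation interval
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30)
);

--Verify the current settings
SELECT actual_state_desc, query_capture_mode_desc, max_storage_size_mb
FROM sys.database_query_store_options;
```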

After the database is restored and upgraded to 2016, we can run the TPC-C workload. Here are the results and the perfmon graph from the TPC-C workload running on my local 2016 instance:

HammerDB Results: 32,497 Transactions Per Minute;  7,171 New Orders Per Minute

CPU Average: 76%; Batch Requests / Sec average: ~ 527

2016_defaultPerfMon

Here are the results from the same workload on SQL 2012, 2014, and 2016 all in one table. You can see that we have the same issue on SQL 2016 that we had when first upgrading to SQL 2014.

Workload           TPM      NOPM    %CPU Avg   Batch Requests/Sec Avg
2012 (baseline)    114,901  25,035  35%        1,750
2014 (Pre-TF Fix)  31,542   6,894   74%        492
2016               32,497   7,171   76%        527

Sure enough, when we check the plan cache for the most expensive queries based on total_worker_time, we see the same query we would expect rise to the top. It is also using the same plan as it was on SQL 2014.
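The plan-cache query I used is essentially the same one from the original 2014 post (reproduced in full later in this article); a condensed sketch:

```sql
--Top plan cache consumers by total CPU time (total_worker_time)
SELECT TOP 15 qt.TEXT AS QueryText
    ,qs.execution_count
    ,qs.total_logical_reads / qs.execution_count AS AvgLogicalReads
    ,qs.total_worker_time AS 'total_worker_time(µs)'
    ,qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC;
```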

orig_plancache

orig_plan

Can we use the new Query Store to force the ‘good 2012 plan’ for the query without modifying the stored procedure?

Let’s say, for whatever reason, that we don’t want to modify the stored procedure and manually insert the trace flag option onto the query. Is there still a way to force the ‘good plan’? Well, let’s find out.

First, I’m going to pull up the Top Resource Consuming Queries window in the Query Store.

query_store_top_queries

You can see that our problem query is incredibly easy to find in the top left window based on total duration. Also notice that in the top right Plan summary window, there is currently only one available plan for the query (plan_id 49).

We need to figure out how we can get our ‘good plan’ using Trace Flag 9481 as an available plan that we can force using the Query Store.
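The GUI isn't the only way to see which plans the Query Store knows about. A sketch using the catalog views directly (the LIKE filter is just one convenient way to locate our statement; your query_id and plan_id values will differ):

```sql
--Find the SLEV statement and every plan the Query Store has recorded for it
SELECT q.query_id
    ,p.plan_id
    ,p.is_forced_plan
    ,SUM(rs.count_executions) AS executions
    ,AVG(rs.avg_duration) AS avg_duration_us --rough average across intervals
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE '%count_big(DISTINCT STOCK.S_I_ID)%'
GROUP BY q.query_id, p.plan_id, p.is_forced_plan;
```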

The Query Store Fix

Here is the code we can use to manually execute the SLEV stored procedure using TF9481 at the session level. This will get us back to the ‘good plan’ and hopefully the query store will recognize the ‘good plan’ as an available plan that we can force. (IMPORTANT: Set Options will need to be identical to the options the existing query in cache used. You can find these pretty easily by looking at the XML of the original plan in cache.)

 USE [tpcc]
GO

--SET OPTIONS MUST BE IDENTICAL TO THE EXISTING PLAN IN CACHE!!
    --(otherwise our 'good plan' won't show as available for the existing query)

SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET NUMERIC_ROUNDABORT OFF
SET QUOTED_IDENTIFIER ON


--clear the plan of the bad query using the plan_handle
DBCC FREEPROCCACHE(0x05000500CE07091B409F80E73401000001000000000000000000000000000000000000000000000000000000)


--enable TF 9481 at the session level to revert to the legacy 2012 Cardinality Estimator
DBCC TRACEON(9481)

BEGIN TRAN
EXEC [dbo].[SLEV]
  @st_w_id = 8,
  @st_d_id = 9,
  @threshold = 10
ROLLBACK TRAN --we can even rollback the proc to prevent any actual data modification

GO 10 --execute 10 times just for fun

--disable the TF
DBCC TRACEOFF(9481) 

Now let’s check the Query Store Top Resource Consumers window again:
query_store_top_queries_fixedPlan

Sure enough, the ‘good plan’ is listed as part of the Plan Summary for our problem query (plan_id 74). Now we can easily force this plan using the GUI or T-SQL:

--For query_id 49, force plan_id 74
EXEC sys.sp_query_store_force_plan @query_id = 49, @plan_id = 74;
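We can confirm the plan is marked as forced by checking sys.query_store_plan (again, the IDs here are the ones from this particular run):

```sql
--Confirm the plan is now marked as forced
SELECT plan_id, query_id, is_forced_plan
FROM sys.query_store_plan
WHERE query_id = 49;

--And if we ever change our minds, un-forcing is just as easy:
--EXEC sys.sp_query_store_unforce_plan @query_id = 49, @plan_id = 74;
```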

Results

Just so that we can get a cool perfmon graph, I’m going to start the HammerDB workload before forcing the ‘good plan’. After about 30 seconds or so, I’ll force the ‘good plan’ using the query store and we can see what impact it has on the workload performance.

2016_perfMon_PlanForce

You can see the plan was forced right around the 3:55:20 mark. Batch Requests/Sec quickly climbed to around 1,800, close to the SQL Server 2012 baseline from the original post.

The query is back down to around 700 reads per execution…

query_stats_fixed

…and here are the stats after running a full workload with the Query Store fix in place. We still aren’t all the way back to the 2012 baseline levels, but we are much closer.

Workload          TPM      NOPM    %CPU Avg   Batch Requests/Sec Avg
Query Store Fix   98,177   21,324  32%        1,633
2012 (baseline)   114,901  25,035  35%        1,750

Conclusion

With new versions of SQL Server come new tools, and the Query Store is a very powerful tool that can be useful when you need to force a plan based on a trace flag but don’t want to necessarily hard code the trace flag into the query or stored procedure. I can’t wait to start using the Query Store on production databases.

Once again, always be sure to test workloads and applications before upgrading a production database to a newer version of SQL Server – especially when going from 2012 or older to 2014 and newer.


HammerDB and The New SQL Server 2014 Cardinality Estimator

OR

(The Importance of Workload Testing When Upgrading to SQL Server 2014)


Improving HammerDB Performance with the New 2014 Cardinality Estimator


With SQL Server 2014 comes a brand spankin’ new cardinality estimator. Here is an excerpt from the summary of the whitepaper titled “Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator” by Joe Sack:

“SQL Server 2014 introduces the first major redesign of the SQL Server Query Optimizer cardinality estimation process since version 7.0.  The goal for the redesign was to improve accuracy, consistency and supportability of key areas within the cardinality estimation process, ultimately affecting average query execution plan quality and associated workload performance. “

Sounds fancy, huh? So I decided to play around and compare a HammerDB TPC-C workload played against a 2012 database versus that same workload against a 2014 version of that database. For those that do not know, HammerDB is a free open-source benchmarking tool that can be used on a variety of RDBMSs.

For this scenario, let’s pretend that our application (HammerDB) currently uses a SQL Server 2012 database on the backend. We have just been approved to upgrade to SQL Server 2014, and we want to make sure that after the upgrade our application will perform at least at the same level as it did while on the 2012 database. Generally speaking, users are not happy if an upgrade causes a performance degradation. =)

Surely HammerDB will perform better with the new and improved 2014 Cardinality Estimator (CE), right? Let’s see.

The Setup Process

1. I began by using HammerDB (v2.16) to create a new TPC-C database called ‘tpcc’ with 10 Virtual Warehouses on my local 2012 instance. The result is a small 1GB database with both the data and log files on my C: drive. (Note: I am using the Original schema to build the database. I have not tested with the Updated schema and results could very well be different.)

2. Next, I took a backup of the tpcc database. This allowed me to use the same starting database for each test, without having to use HammerDB to rebuild it.

BACKUP DATABASE [tpcc] TO DISK = N'C:\2012INSTANCES\tpcc_blogtesting_10WH.bak'; 

3. I restored the tpcc database to my new 2014 instance and ran some necessary upgrade commands:


/*
Using this blog post from Thomas LaRock as a guideline for upgrading to 2014:
Upgrading to SQL Server 2014: A Dozen Things to Check
*/
USE [master]
GO

--Both data and log file on my C drive - not being fancy here
RESTORE DATABASE [tpcc]
FROM DISK = N'C:\2012INSTANCES\tpcc_blogtesting_10WH.bak'
WITH FILE = 1
,MOVE N'tpcc' TO N'C:\2014INSTANCES\DATA\tpcc.mdf'
,MOVE N'tpcc_log' TO N'C:\2014INSTANCES\LOGS\tpcc_log.ldf'
,NOUNLOAD
,STATS = 5
GO

--Be sure to set the database compatibility_level up to the newest version!
ALTER DATABASE [tpcc] SET COMPATIBILITY_LEVEL = 120;

USE [tpcc]
GO
DBCC CHECKDB WITH DATA_PURITY;

DBCC UPDATEUSAGE(tpcc);

EXEC sp_MSforeachtable @command1='UPDATE STATISTICS ? WITH FULLSCAN';

4. Now that the database was ready to go on both my local 2012 instance and on my local 2014 instance, it was time to prepare the workload driver options on HammerDB. I used a timed test with 1 Minute of Rampup Time and 3 minutes for the Test Duration. The Virtual User Options can be seen below:
HammerDBSetup

5. Time to set up PerfMon. I used PerfMon to capture ‘Batch Requests/Sec’ (a separate counter for each instance) and CPU usage. These two metrics give a good enough idea of how the workload is performing. I set the scale for ‘% Processor Time’ to 10.0 to make it easier to view alongside Batch Requests/Sec.

perfmon_setup

The Workloads

Now that the setup is complete, it’s time to run the HammerDB workloads. Here are the results and PerfMon charts:

2012 (Performance Baseline):

In our scenario, this is how our application performs pre-upgrade. This is the performance baseline. After upgrading to 2014, these are the numbers we want to try to hit.

HammerDB Results: 114,901 Transactions Per Minute (TPM); 25,035 New Orders Per Minute (NOPM)

CPU Average: 35%; Batch Requests / Sec average: ~1,750

2012_initialPerfMon

2012 PerfMon Chart

 2014 (Workload after the upgrade):

Let’s point the application at the upgraded 2014 database and see what happens!

HammerDB Results: 31,542 Transactions Per Minute; 6,894 New Orders Per Minute

CPU Average: 74%; Batch Requests / Sec average: ~ 492

2014_initialPerfMon

2014 PerfMon Chart

Uh Oh! After the upgrade our CPU Utilization has more than doubled and our application throughput has decreased by over 70%! If this were in production, users would be looking for heads to roll.

Let’s take a look at the performance numbers side-by-side:

Workload   TPM      NOPM    %CPU Avg   Batch Requests/Sec Avg
2012       114,901  25,035  35%        1,750
2014       31,542   6,894   74%        492

What gives?! I thought 2014 was supposed to improve performance?! Well, let’s see if we can figure out what happened.

What Happened?

Because CPU usage was so different between the two workloads, the first thing I decided to do was to query sys.dm_exec_query_stats and compare query statistics based on which queries used the most CPU time (total_worker_time).


SELECT TOP 15
     SUBSTRING(qt.TEXT, (qs.statement_start_offset / 2) + 1,
         ((CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(qt.TEXT)
             ELSE qs.statement_end_offset
           END - qs.statement_start_offset) / 2) + 1) AS 'QueryText'
    ,qs.total_logical_reads / qs.execution_count AS AvgLogicalReads
    ,qs.total_worker_time / qs.execution_count AS 'AvgWorkerTime(µs)'
    ,qs.total_worker_time AS 'total_worker_time(µs)'
    ,qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE qt.TEXT NOT LIKE '%sys.dm%' --filter out DMV queries
ORDER BY qs.total_worker_time DESC --CPU time
OPTION (RECOMPILE)

Here are the results from the query on both instances (2012 on the left, 2014 on the right):

2012vs2014_fixed

2012 Results                                                            2014 Results

Right away, the query using the most total CPU on the 2014 instance jumps out at us. You can see the difference between the query on the 2012 instance and the 2014 instance. But why is the query behaving differently? Let’s check the execution plans.

The query (formatted for readability):

SELECT @stock_count = count_big(DISTINCT STOCK.S_I_ID)
FROM dbo.ORDER_LINE
 ,dbo.STOCK
WHERE ORDER_LINE.OL_W_ID = @st_w_id
 AND ORDER_LINE.OL_D_ID = @st_d_id
 AND (ORDER_LINE.OL_O_ID < @st_o_id)  AND ORDER_LINE.OL_O_ID >= (@st_o_id - 20)
 AND STOCK.S_W_ID = @st_w_id
 AND STOCK.S_I_ID = ORDER_LINE.OL_I_ID
 AND STOCK.S_QUANTITY < @threshold

2012vs2014_plans

So, the query went from an Index Seek plus a RID Lookup on the STOCK table in 2012 to a parallelized table scan (with a missing index hint) in 2014.

The Fix

Luckily for us, the problematic query belongs to a stored procedure created by HammerDB called SLEV. This means it will be pretty easy to modify the query since we can directly edit the stored procedure on the database.

SQL Server 2014 introduces a brand new trace flag to assist in reverting to the legacy Cardinality Estimator. Here is another quote from the whitepaper mentioned earlier by Joe Sack:

“Trace Flag 9481 reverts query compilation and execution to the pre-SQL Server 2014 legacy CE behavior for a specific statement.”

So, we can use Trace Flag 9481 to bring back the old CE. This trace flag can be enabled globally, or at the individual query level. This allows us to fine tune certain queries where performance may have degraded since the upgrade, without affecting queries that may have actually improved with the upgrade.
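To make the two scopes concrete, here is a sketch of each (the COUNT query is just an illustrative stand-in, and note that QUERYTRACEON requires sysadmin-level permissions):

```sql
--Scope 1: globally, for every query on the instance (a blunt instrument)
DBCC TRACEON (9481, -1);
DBCC TRACEOFF (9481, -1);

--Scope 2: per query, via a hint appended to the statement
SELECT COUNT(*)
FROM dbo.ORDER_LINE
OPTION (QUERYTRACEON 9481); --this statement alone compiles under the legacy CE
```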

We can place “OPTION (QUERYTRACEON 9481)” at the end of the problem query inside the stored procedure to revert just that query to the legacy CE. Hopefully this will get our HammerDB workload back to acceptable performance levels.

CREATE PROCEDURE [dbo].[SLEV] @st_w_id INT
	,@st_d_id INT
	,@threshold INT
AS
BEGIN
	DECLARE @st_o_id INT
		,@stock_count INT

	BEGIN TRANSACTION

	BEGIN TRY
		SELECT @st_o_id = DISTRICT.D_NEXT_O_ID
		FROM dbo.DISTRICT
		WHERE DISTRICT.D_W_ID = @st_w_id
			AND DISTRICT.D_ID = @st_d_id

		SELECT @stock_count = count_big(DISTINCT STOCK.S_I_ID)
		FROM dbo.ORDER_LINE
			,dbo.STOCK
		WHERE ORDER_LINE.OL_W_ID = @st_w_id
			AND ORDER_LINE.OL_D_ID = @st_d_id
			AND (ORDER_LINE.OL_O_ID < @st_o_id) AND ORDER_LINE.OL_O_ID >= (@st_o_id - 20)
			AND STOCK.S_W_ID = @st_w_id
			AND STOCK.S_I_ID = ORDER_LINE.OL_I_ID
			AND STOCK.S_QUANTITY < @threshold
		OPTION (QUERYTRACEON 9481) --*****ADD THIS OPTION TO REVERT QUERY TO LEGACY CE*****

		SELECT @st_o_id AS N'@st_o_id'
			,@stock_count AS N'@stock_count'
	END TRY

	BEGIN CATCH
		SELECT ERROR_NUMBER() AS ErrorNumber
			,ERROR_SEVERITY() AS ErrorSeverity
			,ERROR_STATE() AS ErrorState
			,ERROR_PROCEDURE() AS ErrorProcedure
			,ERROR_LINE() AS ErrorLine
			,ERROR_MESSAGE() AS ErrorMessage;

		IF @@TRANCOUNT > 0
			ROLLBACK TRANSACTION;
	END CATCH;

	IF @@TRANCOUNT > 0
		COMMIT TRANSACTION;
END
GO

Now that the trace flag is in the stored procedure, let’s fire HammerDB back up.

2014_traceflag_Perfmon

2014 Workload with TF 9481 on problem query

querystats_afterTF

BOOM! HammerDB throughput is back up to the same level it was at before the upgrade, CPU on my laptop is back in the 35% range, and the query’s statistics are back to the legacy CE levels. End users are rejoicing!

Conclusion

I’ve simply shown one way to fix the problem query in HammerDB. There are other ways this could have been handled, but the trace flag Microsoft introduced makes it very easy to quickly revert queries to the legacy CE to ‘fix’ performance.

It is very important to always test applications, workloads, or queries before fully upgrading to SQL Server 2014. To make the upgrade process easier on yourself, take some time to get to know your workloads (application throughput, query performance, etc.) before the upgrade ever takes place. With a good baseline for reference, it makes it much easier to compare performance post-upgrade. A single bad query can bring the entire application to its knees. The new Cardinality Estimator will most likely do much more good than bad for queries, but there will always be edge cases where performance declines.