Extended Events

SQL Server 2012 MEMORYCLERK_XE Memory Leak

Monday morning: a time to settle in, grab some coffee, and check on any alerts that might have sprung up over the weekend. On this particular Monday, we received an unusual alert from a Test server: “Job DBA_DatabaseIntegrityCheck has been executing for 53 hours.”

Well, that’s no good. I decided to investigate and quickly ran sp_WhoIsActive on the server to see what was going on.

resource_semaphore

This isn’t the original screenshot from when the job had been running for 53+ hours, but everything else was the same.

Right away I could see the DBCC CHECKDB statement running and waiting on RESOURCE_SEMAPHORE. So it appeared to be some sort of memory issue on the server. I decided to run the handy memory clerk usage query from Glenn Berry’s 2016 DMV Queries. Here is that query and the results that I received:

--From Glenn Berry's SQL Server Diagnostic Information Queries for May 2016
--http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-for-may-2016/
--* Copyright (C) 2016 Glenn Berry, SQLskills.com
--* All rights reserved.
-- Memory Clerk Usage for instance  (Query 41) (Memory Clerk Usage)
SELECT TOP(10) mc.[type] AS [Memory Clerk Type],
CAST((SUM(mc.pages_kb)/1024.0) AS DECIMAL (15,2)) AS [Memory Usage (MB)]
FROM sys.dm_os_memory_clerks AS mc WITH (NOLOCK)
GROUP BY mc.[type]
ORDER BY SUM(mc.pages_kb) DESC OPTION (RECOMPILE);

/*
**RESULTS:
Memory Clerk Type         Memory Usage (MB)
MEMORYCLERK_XE            1311.49
MEMORYCLERK_SQLBUFFERPOOL 238.69
CACHESTORE_SQLCP          102.48
MEMORYCLERK_SOSNODE       45.45
CACHESTORE_PHDR           37.34
CACHESTORE_OBJCP          22.79
USERSTORE_SCHEMAMGR       18.29
USERSTORE_DBMETADATA      14.88
MEMORYCLERK_SQLGENERAL    14.24
MEMORYCLERK_SQLSTORENG    9.45
*/

Why would MEMORYCLERK_XE be taking up so much of the memory on the box? I knew that the MEMORYCLERK_XE clerk was the total amount of memory that the extended events sessions on the server were using. So, once again, time for further analysis.

select name
 ,total_buffer_size / 1024 as total_buffer_size_kb
from sys.dm_xe_sessions
order by total_buffer_size_kb desc

/*
**RESULTS:
NAME                          total_buffer_size_kb
system_health                 4223
SpotlightResponseTimeWaitData 4223
SpotlightWorkloadAnalysis     4223
SpotlightDeadlockDetection    4031
sp_server_diagnostics session 383
*/

You can see that there were 5 different XEvent sessions on the server (we use Dell Spotlight which opens up three XEvent sessions on every server it monitors) – and all of them combined were using about 16.5mb of buffer space in memory. That was nowhere near the 1GB+ of memory that MEMORYCLERK_XE was using! So what gives?

My next course of action was to simply drop all XEvent session on the instance and hope that memory usage cleared up. I dropped all three Spotlight sessions, and disabled the system_health session just for fun. MEMORYCLERK_XE usage dropped from 1312 mb to 1299 mb. It barely moved at all! What was going on?

After posting on dba.stackexchange and doing some googling, I finally came across this KB article from Microsoft. Apparently there was a known memory leak with Extended Events in early versions of SQL Server 2012. So I checked on the current version of the test server I had been working on:

SELECT @@VERSION

Microsoft SQL Server 2012 (SP1) - 11.0.3437.0 (X64)
 May 29 2014 16:03:40
 Copyright (c) Microsoft Corporation
 Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

Sure enough, it was only SP1. The KB article states there were CU fixes for both SP1 and SP2, but we decided to go ahead and just upgrade the instance to SP3 + CU3 to get to the most recent version we could.

After patching the instance, I setup a simple job to query the amount of memory MEMORYCLERK_XE was using and insert that value into a table every 30 minutes – that way I could track the usage history. The basic setup script for that job can be found here.

 

memoryclerk_graph

I was a little nervous for the first couple of days, because it looked like the memory usage was slowly climbing on its own. However, it eventually dropped off and seems to be maintaining a steady usage of somewhere in the 30-50 mb range. More importantly, I can now drop the event sessions on the instance and the MEMORYCLERK_XE memory usage drops to 0!

memoryusage_0

Hopefully this might help someone else who may be having the same problem that we encountered. This is also just another example of why you should patch your SQL Servers regularly.

Resources: