HELP, My SCCM SQL DB Filled Up
Good morning friend, this is your wakeup call. The SCCM database is growing at 1 GB per minute and you’re about to be out of space. Yes, there are better ways to wake up, like hearing the waves crashing on the beach outside of your tropical hotel room.
Our DBA team let us know that the alerting we were getting was
spot on, and they had tracked down one transaction in SQL that was running for
a very long time. They showed us the details on that transaction which included
a client name. We went with our first reaction which was to completely destroy
that client install and hopefully stop the bleeding. While that worked, support
later told us to just DISABLE the service on the client to retain logging data
that would otherwise get deleted.
We looked at our SCCM database using the standard reports available in SSMS, specifically the Disk Usage by Top Tables report. The column for Data (KB) is what we care about the most and for dbo.Logs it showed a size just north of 700 GB. To put this into perspective, our normal for this table is around 100-150 MB per day retained.
The dbo.Logs table is maintained by the built-in maintenance
task Delete
Aged Log Data. According to Microsoft, you should use this task to
delete from the database aged log data used for troubleshooting. This data
isn't related to Configuration Manager component operations. This means
that we can safely purge this down to nothing and the only impact should be
that we delete data that could help us troubleshoot issues, like this issue.
So it’s 9:00 AM, our database has auto grew itself to within
an inch of the drive’s capacity, and nothing in SCCM works. Let’s just go to
that maintenance task and click “Run Now” to fix this. NOPE, there isn’t a run
now button. You can try changing the time the task runs to a time very shortly
in the future, but this doesn’t always work. Our other problem aside from the
lack of a “Run Now” button was that our DB exploded by 700 GB between 5 AM and
8 AM, and this maintenance task only accepts a “days to keep” parameter equal
to or higher than 1 day which would not have deleted our 700 GB of logs that
were less than a day old.
We asked Copilot and ChatGPT how to manually run that task with a specific parameter for number of days, and neither of them had any brilliant answers. While this isn’t easily discoverable in any documentation I could find, maintenance tasks are just SQL stored procedures. The query from the below screenshot shows you the thousands of SQL stored procedures for you to pick from, and we stumbled across this one called spDeleteAgedLogData. Spoiler alert, most others are named very similar to their plain text display name in the maintenance tasks GUI.
FULL DISCLOSURE: EXECUTING
THIS SP IS NOT A SUPPORTED / ADVERTISED METHOD FOR RUNNING SCCM MAINTENANCE
TASKS. DO THIS AT YOUR OWN RISK. TEST IN QA. MAKE SURE YOU HAVE GOOD BACKUPS. IF
YOU’RE NOT COMFORTABLE WITH THE POTENTIAL CONSEQUENCES OF THIS 100 PERCENT, CONSIDER
NOT TOUCHING THIS UNLESS YOU ARE TOLD TO BY MICROSOFT SUPPORT.
Our scheduled run of this Delete Aged Log Data maintenance task removes data over 7 days old, but for this issue we needed to get into the last 5 hours, which cannot be done with the 1 day minimum parameter accepted in the console GUI. Our solution was to create a new SQL query (EXEC spDeleteAgedLogData 0) which would run this SQL stored process manually and delete ALL aged log data. This was also our “Run Now” button.
Our average run time for this maintenance task was around 20-30 seconds. If you were to run this maintenance task with the dates shrunk down from 7 to 0, you would expect it to take about 3 minutes. In our case we executed it multiple times dropping the parameter by 1 from 7 to 0. When we ran it each time down to 1, we had our expected 30 second run times. When we ran it with 0 (remember we had to clear out the dbo.Logs table within the last few hours), it ran for a little over 7 hours and our dbo.Logs table shrunk from 700 GB down to about 70 MB. #WINNING
It is worth noting that if you just want to hit the “Run Now”
button with whatever parameters you already have set in the GUI, you can
manually do this from a quick query. The below query will set the RunNow value from
0 to 1, which trips the task to run. If you’re fast enough, you can watch the
task go from 0 to 1 and then back from 1 to 0, but the key if you miss that is
you’ll see the last start / completion times move from the last run to the
current time where you ran it.
See
It Run:
Again a full disclaimer, if you are not comfortable with any
of this, DON’T DO IT. Seek professional help from Microsoft. I’m not a DBA and
there are likely tons of people out there that are laughing at this post
because they know it better. On that note, if you have any tips for edits to
this, please message me and I’ll make it sound smarter with your advice + I’ll
give you some credit. As always, thanks for reading!
Did you shrink the SCCM DB afterwards
ReplyDelete