Slow performance and large MessageBox
September 30, 2004 Leave a comment
This was a tricky one. We’d seen performance dropping over a period of days. Symptoms were as follows:
- Constant SQL churning.
- Large MessageBox Database size.
- Large MessageBox backups for system volume.
- Horrendously slow response times.
I decided at first it was the lack of the “Back up BizTalk Databases” job (this job archives completed messages in the MB database as it runs). Once enabled, performance got worse as the large MessageBox created large backup files and lots of processing. Of course, running out of space on the backup drive didn’t help, but that’s another discussion.
In digging (using SQL Query Analyzer), I found thousands of messages (140,000 in our case), all dehydrated, in the following tables:
- SELECT Count(*) AS BizTalkServerIsolatedHostQ FROM BizTalkServerIsolatedHostQ
- SELECT Count(*) AS DynamicStateInfo_BizTalkServerIsolatedHost FROM DynamicStateInfo_BizTalkServerIsolatedHost
- SELECT Count(*) AS Instances FROM Instances
- SELECT Count(*) AS MessageParts FROM MessageParts
- SELECT Count(*) AS MessageRefCountLogTotals FROM MessageRefCountLogTotals
- SELECT Count(*) AS Spool FROM Spool
So, I identified two potential causes:
- Long-running scope of STP orchestrations (I don’t think this caused it, but it’s something we should try to resolve, time allowing).
- BizTalk defect as described in MSKB 867449. Hotfix released on 6/11/2004; but I don’t see it in any patches as of this writing. This is an actual defect, relating to a BizTalk server that loses connection spamming the message tables once connection is re-established. For the REALLY sleepy, read this article:
http://support.microsoft.com/default.aspx?scid=kb;en-us;867449&Product=bizt2002
As it turns out, In this environment, the BizTalk and SQL servers are on Virtual Server .VHD files and undergo a nightly backup. If the BizTalk server starts sooner than the SQL server, BizTalk frantically (and correctly) attempts to reconnect. However, as the defect article describes, it pushes large numbers of messages into the MB.
The manual fix:
- Use HAT (Operations, Messages with no filter) and delete dehydrated messages. The downside: HAT can only delete 2047 at a time .. and is S-L-O-W.
- Write something in WMI (I found some incomplete examples, but couldn’t use them).
I settled on backing up the MessageBox database and execute the following SQL statements in SQL Query Analyzer:
- DELETE FROM BizTalkServerIsolatedHostQ
- DELETE FROM DynamicStateInfo_BizTalkServerIsolatedHost
- DELETE FROM Instances
- DELETE FROM MessageRefCountLogTotals
- DELETE FROM MessageParts
- DELETE FROM Spool
This has the effect of orphaning all the dehydrated message parts. After this, run all the maintenance jobs in SQL Enterprise Manager (Management, SQL Server Agent, Jobs) to tidy up the orphan references. Restart both servers so they’ll recover, and watch performance on each as they get to know each other again.
I’d appreciate comments as to the veracity of this solution; did I get all the tables, am I causing other damage, etc.