I would like to understand what algorithm SQL Server 2008 uses to log the 'taking longer than 15 seconds' message, e.g.
Aug 8 16:53:05 daffy MSSQLSERVER: 833: SQL Server has encountered 1435 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [M:\SQL_Data_Dev\tempdev1.ndf] in database [tempdb] (2). The OS file handle is 0x0000000000002148. The offset of the latest long I/O is: 0x0000010fe20000
The relevant KB article is http://support.microsoft.com/kb/897284/en-us, which says:
Reporting occurs in intervals that are 5 minutes, or more, apart.
Reporting occurs when the next I/O request is made on the file. If a record action has occurred and 5 minutes or more have passed since the last report occurred, the informational message that is mentioned in the "Summary" section is written to the SQL Server error log.
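Read literally, the KB paragraph describes a simple rate limiter. The sketch below models that reading for a single file. It is a hypothetical illustration, not SQL Server's internal code: the class and method names are invented, and three details are assumptions the KB text does not state: that the very first report is written immediately, that the slow-I/O count resets after each report, and that the 5-minute interval is tracked per file rather than instance-wide.

```python
from dataclasses import dataclass
from typing import Optional

REPORT_INTERVAL_S = 5 * 60   # "5 minutes, or more, apart" per the KB
SLOW_IO_THRESHOLD_S = 15     # "taking longer than 15 seconds"


@dataclass
class FileStallTracker:
    """Hypothetical model of the KB 897284 reporting rule for ONE file.

    Assumptions (not stated in the KB): first report is immediate,
    the count resets after each report, tracking is per file.
    """
    last_report_s: Optional[float] = None  # time of the previous message, if any
    pending: int = 0                       # slow I/Os recorded since that message

    def on_io_complete(self, duration_s: float) -> None:
        # The KB's "record action": count an I/O that exceeded the threshold.
        if duration_s > SLOW_IO_THRESHOLD_S:
            self.pending += 1

    def on_io_request(self, now_s: float) -> Optional[int]:
        """Called when the next I/O request is made on the file.

        Returns the count written to the error log, or None if no message.
        """
        if self.pending == 0:
            return None  # nothing recorded, nothing to report
        if (self.last_report_s is not None
                and now_s - self.last_report_s < REPORT_INTERVAL_S):
            return None  # less than 5 minutes since the last report
        count, self.pending = self.pending, 0
        self.last_report_s = now_s
        return count


if __name__ == "__main__":
    t = FileStallTracker()
    t.on_io_complete(20.0)
    print(t.on_io_request(0.0))    # 1: first report
    t.on_io_complete(30.0)
    print(t.on_io_request(154.0))  # None: only 154 s since the last report
    print(t.on_io_request(300.0))  # 1: 5 minutes have passed
```

Under this reading, one tracker could never emit two messages less than 5 minutes apart, so the burst in the table further down would only be possible if the interval were tracked separately for different files (the assumption above), or if the real rule differs from the KB wording.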
I interpret this paragraph to mean that I will see, at most, one 'taking longer than 15 seconds' message every 5 minutes.
However, my logs show these messages arriving much more often than that: about 2.5 minutes separate the first and second messages below, and then a whole burst of messages arrives at the same instant, 16:55:39.
Time      # of slow I/Os
16:53:05  1435
16:55:39   456
16:55:39   458
16:55:39   428
16:55:39   451
16:55:39   452
16:55:39   458
16:55:39   427
16:55:39   482
16:55:39     3
16:55:39     2
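The gaps between consecutive messages can be checked against the 5-minute rule mechanically. This small sketch uses the timestamps from the table and assumes all of them fall on the same day; `gaps_seconds` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamps from the table: one message at 16:53:05, then ten at 16:55:39.
times = ["16:53:05"] + ["16:55:39"] * 10


def gaps_seconds(stamps):
    """Seconds between consecutive timestamps (assumes all are on one day)."""
    parsed = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


gaps = gaps_seconds(times)
print(gaps[0])                       # 154.0 (about 2.5 minutes)
print(sum(g == 0 for g in gaps))     # 9 gaps of zero seconds
print(all(g >= 300 for g in gaps))   # False: the 5-minute spacing is not met
```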
Does anyone know what the algorithm is for reporting the 'taking longer than 15 seconds' message?
Please note that I am not interested in solving the slow I/O problem, so please don't point me toward long lists of troubleshooting steps. Rather, I want to understand how tightly the arrival of the log message correlates with the slow I/O itself. According to the KB above, the message can arrive as much as 5 minutes after the actual event, but according to my logs, the delay may be considerably shorter. I want to resolve this contradiction between the behavior the KB article describes and the behavior I'm seeing on my gear.
--sk
Stuart Kendrick
FHCRC