Why you need to defrag your hard disk

The other day, I was working on an issue where a SQL database couldn't be expanded, and the error given by SQL Server was operating system error 665. After a quick lookup, I found a Microsoft article ("A heavily fragmented file in an NTFS file system volume may not grow beyond a certain size caused by an implementation limit in structures that are used to describe the allocations") which said that to resolve the error, one had to defrag the hard disk: with fewer fragments, the file's allocations need fewer tracking structures, leaving room for the file to be expanded again.

This error happens when the hard disk was formatted without the large file record segment option, /L (the hard disk had to be big enough for /L to be applied automatically; otherwise, you need to apply it manually).

I suggested that the customer format a new hard disk (as this system was hosted in a VM) with large file record segments, FORMAT NEWDRIVE: /FS:NTFS /L, so that more extents can be tracked, then copy the existing disk over to the newly formatted one and reassign the new disk the old drive letter. However, the customer refused, intending to see the defrag through to completion.

Earlier today, I found a blog entry on Microsoft covering possible solutions. For completeness' sake, here it is:

Operating system error 665, indicating that a file system limitation has been reached, continues to gain momentum beyond DBCC snapshot files. Over the last ~18 months I have reproduced the issue with standard database files, BCP output, backups and others.

We have posted previous blog entries discussing the NTFS attribute design and the associated 665 errors that occur as the storage space for the attribute structures becomes exhausted:

http://blogs.msdn.com/b/psssql/archive/2008/07/10/sql-server-reports-operating-system-error-1450-or-1452-or-665-retries.aspx

http://blogs.msdn.com/b/psssql/archive/2009/03/04/sparse-file-errors-1450-or-665-due-to-file-fragmentation-fixes-and-workarounds.aspx

Online DBCC has been susceptible to this limitation because it leverages a secondary stream for copy-on-write activities. The sparse nature of a DBCC snapshot or a snapshot database can drive attribute exhaustion.

As space is acquired, the disk storage location(s) and size(s) are stored in the attribute structures. If the space is adjacent to a cluster already tracked by the file, the attributes are compressed into a single entry spanning the entire size. However, if the space is fragmented, it has to be tracked with multiple attributes.
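This coalescing behavior can be sketched with a minimal toy model (a hypothetical illustration, not NTFS's actual on-disk format): each file tracks runs of clusters, and a newly acquired run that starts exactly where the previous run ends is merged into it rather than consuming a new attribute entry.

```python
# Hypothetical model of extent tracking: attribute entries coalesce only
# when newly acquired space is adjacent to space already tracked.

def track_allocation(runs, start, length):
    """Record a run of clusters; merge with the last run when adjacent."""
    if runs and runs[-1][0] + runs[-1][1] == start:
        runs[-1] = (runs[-1][0], runs[-1][1] + length)  # extend in place
    else:
        runs.append((start, length))  # fragmented: needs a new entry
    return runs

# Adjacent growth: one entry no matter how many grows occur.
adjacent = []
for i in range(5):
    track_allocation(adjacent, start=i * 16, length=16)
print(adjacent)         # [(0, 80)] - a single compressed entry

# Fragmented growth: every grow needs its own entry.
fragmented = []
for i in range(5):
    track_allocation(fragmented, start=i * 100, length=16)
print(len(fragmented))  # 5 entries
```

The same total space costs one tracking entry when contiguous and five when scattered, which is the mechanism behind the exhaustion described above.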


The 665 issue can pop up with larger file sizes. As the file grows it acquires more space. During the space acquisition the attributes are used to track this space.


Repro 1 – DBCC Snapshot or Snapshot Database


The copy-on-write logic acquires new space as pages are dirtied in the main database. As writes occur to the snapshot target, more attributes are used. The space allocations and clusters are, by definition and design, stored in sparse locations.


Repro 2 – Database files


I can insert data into a database with a small auto-grow size, each growth acquiring new disk space. If the disk space is not acquired in strict, cluster-adjacent order, the growth is tracked with separate attributes. After millions of grows from a large import or index rebuild (usually at around 300GB) I can exhaust the attribute list and trigger the 665 error.
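Some rough arithmetic makes the scale concrete (the grow sizes below are illustrative choices, not recommendations, and the worst case assumes every grow lands in a non-adjacent spot so each grow costs one tracking attribute):

```python
# Illustrative worst case: one tracking attribute per auto-grow when no
# grow is adjacent to the previous one. Grow sizes here are examples only.

GIB = 1024 ** 3

def attributes_needed(file_size_bytes, grow_size_bytes):
    """Worst case: every grow is fragmented and costs one attribute."""
    return -(-file_size_bytes // grow_size_bytes)  # ceiling division

small_grow = attributes_needed(300 * GIB, 1 * 1024 ** 2)   # 1 MB grows
large_grow = attributes_needed(300 * GIB, 64 * 1024 ** 2)  # 64 MB grows
print(small_grow, large_grow)  # 307200 4800
```

A 300 GB file built from 1 MB grows needs up to ~307,200 entries in this model, versus ~4,800 with 64 MB grows, which is why a larger, well-chosen auto-grow size delays exhaustion.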


Repro 3 – BCP


BCP extends the output file simply in the process of writing to it. Each block written has to acquire disk space, and accommodating a large file relies on adjacent disk space allocations.


One customer was trying to use a copy of BCP per CPU, together with the query option, to partition the output streams and improve export performance. This can work very well, but in this case it backfired. The output location was shared, and as each copy of BCP wrote data, they leapfrogged each other on the same physical media. Each of the BCP output streams quickly exhausted the attribute storage, as none of them were acquiring adjacent storage.
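The leapfrog effect can be sketched with a toy model (hypothetical, round numbers): a simple bump allocator hands out the next free cluster range, so when several writers append in turn, no writer ever receives a run adjacent to its own previous one.

```python
# Hypothetical sketch: N writers appending in turn to one shared volume.
# A bump allocator hands out the next free clusters, so each writer's
# runs are separated by the other writers' runs.

def interleaved_runs(num_writers, blocks_per_writer, blocks_each=1):
    next_free = 0
    runs = {w: [] for w in range(num_writers)}
    for _ in range(blocks_per_writer):
        for w in range(num_writers):
            runs[w].append((next_free, blocks_each))
            next_free += blocks_each
    return runs

def attribute_count(file_runs):
    """Count runs remaining after coalescing adjacent ones."""
    count = 0
    prev_end = None
    for start, length in file_runs:
        if start != prev_end:
            count += 1  # gap before this run: needs a fresh attribute
        prev_end = start + length
    return count

solo = interleaved_runs(1, 100)    # one writer: all runs adjacent
shared = interleaved_runs(4, 100)  # four writers leapfrogging

print(attribute_count(solo[0]))    # 1   - single coalesced run
print(attribute_count(shared[0]))  # 100 - every block fragmented
```

With a single writer all 100 blocks coalesce into one run; with four interleaved writers every block of every stream is its own fragment, matching the behavior the customer hit.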


What Should I Do?


Testing has shown that defragmentation of the physical media may help reduce the attribute usage. Just be sure your defragmentation utility is transactional. (Note: defragmentation is a different story on SSD media and typically does not address the problem. Copying the file and allowing the SSD firmware to repack the physical storage is often a better solution.)


Copy – Doing a file copy may allow better space acquisition.


Use SQL Server 2014 for DBCC online operations.  SQL Server 2014 no longer uses the secondary stream but a separate, sparse file marked for delete on close.  This may reduce the shared attribute storage required by a secondary stream.


Adjust the auto growth to acquire sizes conducive to production performance as well as to the packing of attributes.


Format the drive using the larger NTFS metadata (file tracking) structures, providing a larger set of attribute structures: https://support.microsoft.com/en-us/kb/967351


Use ReFS instead of NTFS.  ReFS does not contain the same design as NTFS.


Adjust utility write sizes; for example, the BCP buffer size change. I just discovered that the fix list for SQL Server 2012 SP2 does not appear to contain the fix information. The fix changes bcpWrite (the internal API used by the ODBC driver and BCP.exe) from a 4K write size to a 64K write size.
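As rough arithmetic on that change (illustrative worst-case figures, assuming each write lands non-adjacently and costs one space acquisition), moving from 4K to 64K writes cuts the acquisition count 16-fold for the same output size:

```python
# Illustrative worst case: one space acquisition per write. The output
# size is an arbitrary example; only the 16x ratio is the point.

GIB = 1024 ** 3

def writes_needed(output_bytes, write_size):
    return -(-output_bytes // write_size)  # ceiling division

old = writes_needed(100 * GIB, 4 * 1024)   # 4K bcpWrite
new = writes_needed(100 * GIB, 64 * 1024)  # 64K bcpWrite
print(old, new, old // new)  # 26214400 1638400 16
```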


Backup – Carefully plan the number of files (stripe set) as well as transfer and block sizes.

Published Mon, 16 Jan 2017 @ 11:12 PM by chuacw
