Why you need to defrag your hard disk
The other day I was working on an issue where a SQL Server database couldn't be expanded, and the error reported by SQL Server was operating system error 665. After a quick lookup, I found a Microsoft article ("A heavily fragmented file in an NTFS file system volume may not grow beyond a certain size caused by an implementation limit in structures that are used to describe the allocations") explaining that to resolve the error, one has to defragment the disk: with fewer fragments, fewer attribute entries are needed to describe the file's allocations, so the file can be expanded again.
This error happens when the volume was formatted without the large file record segment option, /L. (On a sufficiently large disk, /L is applied automatically at format time; otherwise you need to apply it manually.)
I suggested that the customer format a new disk (the system was hosted in a VM) with large file record segments, FORMAT NEWDRIVE: /FS:NTFS /L, so that more extents can be tracked per file. The contents of the existing disk could then be copied over, and the new disk assigned the old drive letter. However, the customer declined, intending to see the defragmentation through to completion.
Earlier today, I found a Microsoft blog entry describing possible solutions. For completeness' sake, here it is:
The operating system error 665, indicating a file system limitation has been reached, continues to gain momentum beyond DBCC snapshot files. Over the last ~18 months I have reproduced issues with standard database files, BCP output, backups and others.
We have posted previous blogs talking about the NTFS attribute design and the associated limitations (665) as the storage space for the attribute structures becomes exhausted.
Online DBCC has been susceptible to this limitation, leveraging a secondary stream for copy-on-write activities. The sparse nature of a DBCC snapshot or a snapshot database can drive attribute exhaustion. As space is acquired, the disk storage location(s) and size(s) are stored in the attribute structures. If the space is adjacent to a cluster already tracked by the file, the attributes are compressed into a single entry spanning the entire size. However, if the space is fragmented, it has to be tracked with multiple attributes.
The 665 issue can pop up with larger file sizes. As the file grows it acquires more space, and during the space acquisition the attributes are used to track this space.
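The adjacent-versus-fragmented tracking described above can be sketched in Python. This is an illustrative model only, not the real NTFS on-disk structures: each run of adjacent clusters stands in for one attribute entry.

```python
# Toy model of extent tracking: adjacent cluster runs coalesce into one
# entry; any gap costs an extra entry. Names are illustrative, not NTFS APIs.

def track_allocations(clusters):
    """Collapse allocated cluster numbers into (start, length) runs --
    one simulated 'attribute entry' per run."""
    runs = []
    for c in sorted(clusters):
        if runs and c == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # adjacent: extend the run
        else:
            runs.append((c, 1))                        # gap: a new entry is needed
    return runs

contiguous = track_allocations(range(0, 1000))     # 1,000 adjacent clusters
fragmented = track_allocations(range(0, 2000, 2))  # every other cluster

print(len(contiguous))  # 1 run    -> one attribute entry
print(len(fragmented))  # 1000 runs -> one entry per fragment
```

The same amount of data costs one entry when contiguous and a thousand when fragmented, which is exactly why fragmentation, not file size alone, exhausts the attribute storage.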
Repro 1 – DBCC Snapshot or Snapshot Database
The copy-on-write logic acquires new space as pages are dirtied in the main database. As writes occur to the snapshot target, more attributes are used. The space allocations and clusters are, by definition and design, stored in sparse locations.
Repro 2 – Database files
I can insert data into a database with a small auto-grow size, each growth acquiring new disk space. If the disk space is not acquired in strict, cluster-adjacent order, the growth is tracked with separate attributes. After millions of grows from a large import or index rebuild (usually around 300GB) I can exhaust the attribute list and trigger the 665 error.
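Some back-of-the-envelope arithmetic shows why the auto-grow increment matters so much here. The numbers below are assumptions for illustration: a file grown to roughly 300 GB, where (worst case) every grow lands in a non-adjacent location and so costs one attribute entry.

```python
# How many grow events it takes to reach ~300 GB at different auto-grow
# increments. In the worst case each grow is non-adjacent and costs one
# attribute entry. Sizes here are assumed, illustrative values.

GB = 1024 ** 3
target_size = 300 * GB

for auto_grow in (1 * 1024**2, 64 * 1024**2, 1 * GB):  # 1 MB, 64 MB, 1 GB
    grows = target_size // auto_grow
    print(f"auto-grow {auto_grow // 1024**2:>5} MB -> {grows:,} grow events")
```

A 1 MB increment needs 307,200 grows to reach 300 GB, while a 1 GB increment needs only 300, which is why larger growth sizes pack the attribute structures so much better.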
Repro 3 – BCP
BCP extends the output file in the process of simply writing to the file. Each block written has to acquire disk space, and it is reliant on adjacent disk space allocation to accommodate a large file.
One customer was trying to use a copy of BCP per CPU, with the query option to partition the output streams and improve export performance. This can work very well, but in this case it backfired. The output location was shared, and as each copy of BCP wrote data, they leapfrogged each other on the same physical media. Each of the BCP output streams quickly exhausted the attribute storage, as none of them was acquiring adjacent storage.
What Should I Do?
Defragment – Testing has shown that defragmentation of the physical media may help reduce attribute usage. Just be sure your defragmentation utility is transactional. (Note: defragmentation is a different story on SSD media and typically does not address the problem. Copying the file and allowing the SSD firmware to repack the physical storage is often a better solution.)
Copy – Doing a file copy may allow better space acquisition.
Use SQL Server 2014 for DBCC online operations. SQL Server 2014 no longer uses the secondary stream, but a separate, sparse file marked for delete on close. This may reduce the shared attribute storage required by a secondary stream.
Adjust the auto-growth to sizes conducive to production performance as well as to the packing of attributes.
Format the drive using the larger NTFS metadata (file record) structures, providing a larger set of attribute structures: https://support.microsoft.com/en-us/kb/967351
Use ReFS instead of NTFS. ReFS does not share the NTFS attribute design.
Adjust utility write sizes – for example, the BCP buffer size change: I just discovered that the fix list for SQL Server 2012 SP2 does not appear to contain the fix information. The fix changes bcpWrite (an internal API used by the ODBC driver and BCP.exe) from a 4K write size to a 64K write size.
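The practical effect of that buffer change is simple arithmetic. The 10 GB export size below is a hypothetical figure for illustration; only the 4K and 64K write sizes come from the text.

```python
# Moving from 4 KB to 64 KB writes cuts the number of file extensions --
# and thus allocation requests -- by a factor of 16.
# The 10 GB export size is an assumed, illustrative value.

file_size = 10 * 1024**3            # hypothetical 10 GB export
writes_4k = file_size // (4 * 1024)
writes_64k = file_size // (64 * 1024)
print(writes_4k, writes_64k, writes_4k // writes_64k)  # 2621440 163840 16
```

Sixteen times fewer extensions means far fewer chances for a non-adjacent allocation, and so far fewer attribute entries consumed.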
Backup – Carefully plan the number of files (stripe set) as well as transfer and block sizes.