Friday, November 26, 2010

Dangers of Disk Housekeeping

Many computer users like to keep their hard drives tidy by cleaning up redundant files. This is actually a dangerous exercise. Let me explain.

When Windows saves a file to disk, it tries to find a single block of free space large enough to hold the whole file, rather than fragmenting the data across blocks in different parts of the drive. Whether or not a file is fragmented makes no difference to whether it can be accessed. As long as the file appears in a folder listing, the file system holds the information needed to link the different blocks together in the right order. But it is faster and more efficient to read a file from one contiguous block - though to be truthful, with modern fast PCs and hard drives the difference is hardly noticeable.
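
To make that concrete, here is a minimal Python sketch of a toy file system. It is not how NTFS or FAT actually store their records; the 8-byte block size, the `store` and `read_file` helpers and the dictionary used as a "directory entry" are all assumptions invented for illustration. The point is only that a file reads back identically whether its blocks are adjacent or scattered, as long as the block list survives.

```python
BLOCK_SIZE = 8  # unrealistically small, chosen only to keep the demo readable


def store(disk: bytearray, data: bytes, block_list: list[int]) -> dict:
    """Write data into the given blocks, in order, and return a toy directory entry."""
    for i, block_no in enumerate(block_list):
        chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        start = block_no * BLOCK_SIZE
        disk[start:start + len(chunk)] = chunk
    return {"blocks": block_list, "size": len(data)}


def read_file(disk: bytearray, entry: dict) -> bytes:
    """Follow the block list in order, then trim to the recorded file size."""
    joined = b"".join(
        bytes(disk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]) for b in entry["blocks"]
    )
    return joined[:entry["size"]]


disk = bytearray(b"." * (BLOCK_SIZE * 12))
text = b"The quick brown fox."                       # 20 bytes, needs three blocks

contiguous_entry = store(disk, text, [2, 3, 4])      # one unbroken run
fragmented_entry = store(disk, text, [6, 9, 1])      # scattered and out of order

print(read_file(disk, contiguous_entry) == text)     # the contiguous copy reads back intact
print(read_file(disk, fragmented_entry) == text)     # so does the fragmented copy
```

Both reads succeed because the directory entry records which blocks belong to the file and in what order. Lose that entry and only the contiguous copy is still easy to piece together.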

Some types of file can't easily be stored in a contiguous block, either because they are very large and there isn't a single block of free space big enough anywhere on the hard drive, or because the file grows over time and other files have since been created that prevent its original space from being extended. Database files and Microsoft Office documents are good examples of files that grow over time.


Microsoft Windows has long included a Disk Defragmenter tool. Its purpose is to improve system performance by ensuring that files on the disk are not fragmented. Although running the defragmenter may not make much difference to the perceived performance of a modern PC, it does have an effect - both good and bad - on the chances of recovering deleted files, and that is what we should be concerned about.

When a file is first deleted, its directory entry and related file system information aren't immediately deleted or overwritten, so the file may be recovered intact even if it was fragmented. After these file system structures have been overwritten by new data, however, the only way a file may be recovered is forensically. Essentially, the recovery software scans the hard drive, examining it block by block. Most files start with a "header block" that identifies the type of file and contains other information such as the length of the file. When a header block is detected, the recovery software must assume - in the absence of other information from the file system - that the file was not fragmented, and that the file consists of that header block plus however many consecutive data blocks are needed to hold the file length stated in the header.
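
The header-scanning approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular recovery product. The toy header is loosely modelled on the BMP file header, which begins with the signature `BM` followed by the file size as a 4-byte little-endian number. The 16-byte block size and the `make_file`, `place` and `carve` helpers are hypothetical.

```python
import struct

BLOCK_SIZE = 16  # tiny block size, chosen only to keep the demo readable


def make_file(payload: bytes) -> bytes:
    """Build a toy file: 2-byte 'BM' signature, 4-byte little-endian total length, payload."""
    return b"BM" + struct.pack("<I", 6 + len(payload)) + payload


def place(disk: bytearray, data: bytes, blocks: list[int]) -> None:
    """Write data onto the disk across the given block numbers, in order."""
    for i, block_no in enumerate(blocks):
        chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        start = block_no * BLOCK_SIZE
        disk[start:start + len(chunk)] = chunk


def carve(disk: bytes) -> list[tuple[int, bytes]]:
    """Scan block by block; at each header, take the stated length as one consecutive run."""
    found = []
    for offset in range(0, len(disk), BLOCK_SIZE):
        if disk[offset:offset + 2] != b"BM":
            continue
        (length,) = struct.unpack_from("<I", disk, offset + 2)
        if length < 6 or offset + length > len(disk):
            continue  # implausible length, probably not a real header
        found.append((offset // BLOCK_SIZE, bytes(disk[offset:offset + length])))
    return found


disk = bytearray(b"." * (BLOCK_SIZE * 12))
whole = make_file(b"A" * 30)     # 36 bytes, needs three blocks
split = make_file(b"B" * 30)
place(disk, whole, [1, 2, 3])    # stored contiguously
place(disk, split, [5, 7, 9])    # fragmented, with unrelated data in between

results = dict(carve(bytes(disk)))
print(results[1] == whole)                                   # contiguous file is recovered intact
print(len(results[5]) == len(split), results[5] == split)    # right length, wrong contents
```

The fragmented file is "recovered" at the correct length, but the carver has swept up the unrelated blocks that sat between its fragments, so the result is corrupt.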

It follows from this that the chances of recovering a deleted file, even if it was deleted a long time ago, are much greater if the file was not fragmented at the time it was deleted. At the same time, running Disk Defragmenter severely harms the chances of recovering files that were deleted before running it, as the disk space those deleted files occupied is likely to be reused to hold relocated data. The common quiet-afternoon task of cleaning up the hard drive, followed by emptying the Recycle Bin and defragmenting, is actually one of the worst things you can do. Disk housekeeping is one of the most common causes of deleting wanted files, and by emptying the Recycle Bin and defragmenting you have just made the job of recovering those deleted files much more difficult.
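
Here is a toy Python model of why defragmenting after deleting does the damage. Real defragmenters are far more sophisticated than this; the block layout, the `free` set and the `defragment` function are all assumptions made up for the illustration. It shows a deleted file whose data is still sitting in blocks marked as free, and what happens to that data when a fragmented live file is moved into the gap.

```python
# Each list item is one disk block. The deleted file's data is still physically present.
disk = [b"LIVE", b"DEL1", b"DEL2", b"DEL3", b"frg1", b"keep", b"frg2", b"kept", b"frg3"]
deleted_blocks = [1, 2, 3]              # where the deleted file used to live
deleted_data = [b"DEL1", b"DEL2", b"DEL3"]
free = set(deleted_blocks)              # the file system now regards these blocks as unused
fragmented_file = [4, 6, 8]             # block list of a live but fragmented file


def still_on_disk(disk: list, blocks: list[int], original: list) -> bool:
    """True if the deleted file's bytes have not been overwritten yet."""
    return [disk[b] for b in blocks] == original


def defragment(disk: list, free: set, block_list: list[int]) -> list[int]:
    """Move a file into the first run of consecutive free blocks that is long enough."""
    need = len(block_list)
    for start in range(len(disk) - need + 1):
        target = list(range(start, start + need))
        if all(b in free for b in target):
            for src, dst in zip(block_list, target):
                disk[dst] = disk[src]          # relocated data overwrites the old contents
            free.difference_update(target)     # the target blocks are now in use
            free.update(block_list)            # the old locations become free space
            return target
    return block_list                          # no suitable gap, leave the file where it is


before = still_on_disk(disk, deleted_blocks, deleted_data)
new_blocks = defragment(disk, free, fragmented_file)
after = still_on_disk(disk, deleted_blocks, deleted_data)

print(before)       # deleted data is intact before defragmenting
print(new_blocks)   # the live file now occupies the deleted file's old blocks
print(after)        # deleted data has been overwritten
```

The live file ends up neatly contiguous, and the deleted file is gone for good.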