Why Disk Cleanup Matters, and How You Can Get It Done

Posted on January 4, 2013 · Posted in Individual Solutions

I suppose it’s because I’m a geek at heart, but I’m fascinated by the concept of disk cleanup…

Why Disk Cleanup capability is important for you

By Disk Cleanup I mean deleting unnecessary files from the hard disk on your own PC. Unnecessary can mean many things – obsolete system files, temporary files, corrupted files… but more importantly, user files: files you’ve saved to your disk in the past and no longer need. This may be because you have newer versions of them, or because you have duplicates of the same version in different folders, or because they’re safely stored elsewhere… or simply because you’ve run out of space. All of these are files you could do without – if you knew which ones they are and where to find them.

Why should you even care about having such files on your disk? You may rightly say that disk storage is so cheap that you can keep all the files you want without running out of space. But even with unlimited space, you wouldn’t want your files to run out of control: multiple copies or old versions of the same file can lead to confusion, bloat your disk backups, prolong your virus scans, and, yes, run you out of disk space. You may have lots of space, but usage grows to fill whatever is available, especially if you’re into large media files, and all the more so if you use a faster but smaller flash-based Solid State Drive (SSD), which I strongly recommend you do.

Things get worse if you work in an organization that tries to manage your disk by placing quotas on either what you store or what you’re allowed to back up (if you don’t back your data up, read this and do it!). Once you have a quota, you need to trim your stuff, and you need to do it intelligently, which is why you need disk cleanup tools like those I discuss below.

Basic disk cleanup you already have in Windows

Microsoft Windows contains a built-in tool called (somewhat unimaginatively) Disk Cleanup that most users probably haven’t even heard of. This tool scans your disk and shows you what you may delete, and how much space you will recover thereby. It can identify accumulated fluff like temporary files, old log files, old restore points and backup files, and many others. You can tell it to delete the files and you can compress old files to save space.

This can be quite useful: it told me that I have more than 400MB of unneeded “per user queued Windows error reports” – and I don’t even know what those are!…

The shortcoming of this tool is that it won’t address user data – your documents, photos and such. For those you need to look beyond Windows.

Duplicate file detection tools you can get today

Fortunately there are many third-party utilities you can get that do look at your own data. For example:

These tools usually have a free or trial version you can download, and some offer a reasonably priced “pro” version as well. I haven’t tested them all, but if you’re comfortable with shareware you’ll find your way around them easily. Just be sure to have a backup (I said that already, right?) before testing them!

These tools primarily identify duplicate files, and let you preview them and decide what to delete. They can compare file names and sizes, but they usually also detect files with the same content even if they’ve been renamed. And they find loads of stuff – Fast Duplicate File Finder scanned my “My Documents” folder and found some 1400 groups of duplicate files!
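For the curious, here is a minimal Python sketch of how content-based duplicate detection can work. It is not how any of the tools above are actually implemented (I don’t know their internals); the function names `file_digest` and `find_duplicates` are my own, and the approach assumed here is the common one of grouping files by size first and then hashing only the candidates with SHA-256.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root that have identical content.

    Returns a list of groups; each group is a sorted list of 2+ paths.
    File names are ignored, so renamed copies are still detected.
    """
    # Pass 1: bucket by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # skip symbolic links
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates` on a folder returns the groups for you to review; the sketch deliberately deletes nothing, because deciding which copy to keep is the part that needs a human (and a backup).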

Now it gets interesting

Deleting duplicates is easy… but what if you need even more space? Then you may need to make a crueler cut, and delete files you have only one copy of. How do you choose them?

The trick is to find those files that are (a) largest, and (b) least important. With tens of thousands of files in a thousand subfolders, how do you do it? This is where you need a tool like TreeSize Personal. This one also does duplicate finding, but its focus is on visualizing a disk so you can see at a glance which subfolders and files contain the most megabytes. Click “top 100 files” and you get a listing of the largest “offenders” across the entire disk, allowing you to figure out which ones have outlived their usefulness to you. (I found I have not one but two copies of a bloated installation file for the drivers that came with a camera I no longer even have!)
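If you’d like to see the idea behind a “largest files” listing without installing anything, a rough Python sketch follows. It is only an illustration of the technique, not a description of how TreeSize works; the names `scan`, `largest_files` and `folder_totals` are made up for this example, and sizes are the plain file sizes reported by the operating system.

```python
import heapq
import os
from collections import defaultdict


def scan(root):
    """Yield (size_in_bytes, path) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    yield os.path.getsize(path), path
            except OSError:
                continue  # unreadable or vanished file: skip it


def largest_files(root, n=100):
    """Return the n largest files as (size, path) pairs, largest first."""
    return heapq.nlargest(n, scan(root))


def folder_totals(root):
    """Total bytes per top-level subfolder of root, largest first.

    Files sitting directly in root are counted under ".".
    """
    root = os.path.abspath(root)
    totals = defaultdict(int)
    for size, path in scan(root):
        rel = os.path.relpath(path, root)
        top = rel.split(os.sep)[0] if os.sep in rel else "."
        totals[top] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

`largest_files(folder)` gives you the “top 100” list, and `folder_totals(folder)` tells you which subfolder to look into first. Judging which of those files are least important is still up to you.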

And you get an unplanned perk: once you start mucking in your old files, using a tool that makes the mucking easy and intuitive, you discover lots of interesting old stuff – family photos you haven’t seen for ages, drafts of creative work you’d never completed, eBooks you may want to revisit… an interesting mix of nostalgia, ideas for new projects and plain fun.

What IT ought to do about it

The tools described above, and their use, fall in the personal domain; you can get them and use them on your own machine. Ideally the IT group in an organization ought to provide this disk cleanup capability, making sure the tool used is compatible with the standard software build. And they may want to customize the cleanup tool so as to throw in some safeguards related to legal requirements in the discovery and retention space.

But then, once we’re talking corporate IT, there is even more to be done across users and groups – deduplication and version control are of huge value to an enterprise’s data assets, which may reside – and be duplicated – on individual users’ disks and in their emails as well as in enterprise intranet systems like SharePoint and others. There are solutions in that space too, but they tend to be applied to formal data collections and not to the files of the folks in the trenches…

What’s still missing

As computers get smarter, we may yet see them play a more active role in this process. I can envision a day when the cleanup program might not just display, but also suggest to you what you may want to delete, based perhaps on its learning your past archiving, deletion, and access behavior. Or, as I wrote before, it could allow us to say “Find all presentations related to X that are over 3 years old unless they have been forwarded by Y to Z in the last 6 months”. Or even offer to “Sort by usefulness”. That would make the computer a true replacement for the secretaries of old, those whose job it was to keep the boss together!
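The simplest slice of such a query can already be approximated today. The Python sketch below covers only the “presentations over 3 years old” part, using file extension and last-modified time; the “related to X” and “forwarded by Y to Z” conditions would need content and email analysis and are not attempted. The function name `old_presentations` and the extension list are assumptions made for this example.

```python
import os
import time

# Assumed set of presentation file types; adjust to taste.
PRESENTATION_EXTS = {".ppt", ".pptx", ".key", ".odp"}


def old_presentations(root, years=3, now=None):
    """Return presentation files under root last modified over `years` ago.

    Results are sorted oldest first. A year is approximated as 365 days.
    """
    now = time.time() if now is None else now
    cutoff = now - years * 365 * 24 * 3600
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in PRESENTATION_EXTS:
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            if mtime < cutoff:
                hits.append((mtime, path))
    return [path for _mtime, path in sorted(hits)]
```

Note that last-modified time is only a proxy for “old”: a deck you open and read every week without changing it would still show up in the list.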

Even sooner, a specific area where we could use some progress is email: the cleanup tools I’ve seen all look at my couple of gigabytes of email storage as two monolithic PST files. There’s a huge amount of duplication in there, both between folders and within email threads. (There was an experimental tool called Outlook Thread Compressor a while back, which could remove early messages that are quoted at the bottom of later replies in the thread. Outlook 2010 has some of this functionality as well. We need more of this sort of thing.) Hear that, software developers?…

 
