ADS

Alternate Data Streams (Loose File Organization?)

Imagine, if you will, that anyone can take any given file and attach another file of any type to it, and then take the original file and rename it and move it and copy it and throw it all around the room (or the house), and that the file that you attached to it won’t be adversely affected. Further imagine, if you can, that this “extra” file that’s attached to your file is completely hidden without tools written specifically to detect and expose it!?!?!

Imagine, if you will, how many thousands of files are on your PC, and how every single one of these, system files included, could have (and many do have) extra files attached to them in a way that you likely never had the slightest idea was possible. Read on…

Why?

In the design phase of a recent project, I had a requirement to track some files, and I needed a way to tag or otherwise keep track of files that have been processed by the application. This presented a little bit of a problem due to the nature of the files… they’re on what I’ll call an “open system”, where any user can move, copy, rename or even delete them. Having an application keep up with this proved to be an interesting challenge.

To put this in context a bit, let’s define the overall scenario: a desktop application to manage the common user folders of a Windows OS… Pictures, Videos, Music, Documents, so on and so forth. Now, there’s plenty that can be done via Windows natively to help organize these: standard metadata for certain file types such as ratings, tags, image sizes, music and video lengths, etc. However, some other types of files (take a .txt file in particular) offer no way through the standard Windows UI to add any such information. Thus, we’ll have an application that scans predetermined file directories and processes the files into a nice app built on your choice of development platform sitting on top of your choice of database (C# and an Access 2013 backend for this particular project, though both are irrelevant for the context of the article).

Because the files themselves must remain in place and open to standard user interaction, we’re somewhat restricted in viable options. Here are a few that were considered…

Alternative Considerations

Consideration: Scan the specified directories on a regular basis and check for existence in the db. If not, grab the file and process it. Store a checksum and the last modified date.
Conclusion: Ok, well, I didn’t really even consider this long enough to call it an actual consideration. This would be an obvious mess and hardly the most efficient means to approach it besides.
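For reference, the scan-and-checksum idea dismissed above might look roughly like the following. This is only a minimal sketch: the `known` dictionary is a hypothetical stand-in for the database table, SHA-256 is an arbitrary choice of checksum, and the function names are made up for illustration.

```python
import hashlib
import os


def fingerprint(path, chunk_size=65536):
    """Return (SHA-256 hex digest, last-modified timestamp) for one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large media files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getmtime(path)


def scan(root, known):
    """Return paths under root whose checksum is not yet a key in `known`.

    `known` maps checksum -> record ID and stands in for the database.
    """
    unprocessed = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            checksum, _mtime = fingerprint(path)
            if checksum not in known:
                unprocessed.append(path)
    return unprocessed
```

Every pass has to read every byte of every file, and an edited file gets a new checksum and looks like a brand new one, which is a fair illustration of why this approach was not pursued.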

Consideration: Create/Use a FileSystemWatcher service in .NET to watch the specified directories for changes.
Conclusion: Sounds good, except for a bit of a caveat with the FileSystemWatcher. The FSW will notify on file creations, deletions and renames, but won’t inherently notify when a file is moved. Hence, when a file is moved, you’ll have a notification that the file was deleted, followed by another notification that the file was created (or maybe those ought to be swapped in sequence). Either way, there’s no definitive link between the two, so determining that the new file is actually the old file is not the cleanest thing in the world (and never mind what would happen if a user copies more than a handful of files from one folder to another). All that aside, an FSW service is a bit of work to set up anyway… a bit more than I had in mind for this project. Besides which, requiring a Windows Service to support a toy app that makes your Windows media and documents a little more fun to play with? Naaa…

Consideration: Using the FileTable feature of SQL Server 2012 to share a SQL table as if it were a Windows directory, and attaching triggers to the FileData? (Thanks to BananaRepublic for that idea)
Conclusion: Although an excellent idea, and of all the options the easiest way to keep the files in sync, for this scenario it wouldn’t do, as the files must remain completely open for standard users to interact with as they would on any day-to-day basis.

Consideration: Accessing the System Properties of files and writing data to them, as devised and demonstrated by Wayne Phillips in this link (thanks to Crystal Long for the suggestion on that).
Conclusion: I have to admit that I didn’t give this a thorough testing, but it appears that these system properties are more or less predefined based on file type, and that writing a custom property (such as a database’s entry ID) may not be the most straightforward thing (or even possible). Putting the information into a property designated by the system for something else doesn’t give me the warm and fuzzies either, since the system might decide to overwrite that information for me.

Consideration: Search the web for a way to add metadata to a file.
Conclusion: Alternate Data Streams!

Imagine, if you will, that anyone can take any given file and attach another file of any type to it, and then take the original file and rename it and move it and copy it and throw it all around the room, and that the file that you attached to it won’t be adversely affected. Further imagine, if you can, that this “extra” file that’s attached to your file is completely hidden without tools written specifically to detect and expose it.

This certainly does sound like an ideal place to stuff a database record ID, doesn’t it? Maybe. We’ll get into that in a bit, but first I want to take the opportunity to point out the obvious security concerns of such a feature (simply because I wouldn’t want anyone to think I overlooked them). I’ll keep it short and simple: any hacker can attach malware as an executable to the most innocent of files. A 10-byte text file can have a 100 MB executable attached to it, and it’ll show up in Windows Explorer simply as a 10-byte text file. I won’t get into the details of security concerns here, because for one, they’re obvious, for two, once you know what to look for there are articles all over the place on it, and for three, there’s nothing I can do about it anyway (ok, use Mark Russinovich’s Streams program to remove them), but I can use what’s there to help me with this application, so that’s where I’ll keep going.
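The “10-byte file with a big attachment” point can be sketched in a few lines of Python. This assumes the file sits on an NTFS volume, where a stream is addressed as `<file>:<stream>`; the stream name `payload` and both function names are made up for illustration. On a file system without ADS the colon is either rejected or simply produces a separate, ordinary file, so the sketch only demonstrates real ADS behaviour on NTFS.

```python
import os
import tempfile


def attach_stream(path, stream, payload):
    """Write payload into the named alternate stream of an existing file."""
    # NTFS addresses a stream as "<file>:<stream>"; the main data is untouched.
    with open(f"{path}:{stream}", "wb") as handle:
        handle.write(payload)


def demo_hidden_size():
    """Return the file's reported size before and after attaching a stream."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "innocent.txt")
        with open(path, "wb") as handle:
            handle.write(b"0123456789")  # the 10-byte "innocent" file
        before = os.path.getsize(path)
        attach_stream(path, "payload", b"\x00" * 1_000_000)
        after = os.path.getsize(path)
        return before, after
```

Calling `demo_hidden_size()` should report the same 10 bytes both times, because the reported size covers only the main (unnamed) stream.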

Limitations?

A few, yes. ADS requires an NTFS filesystem… something which the great majority of semi-modern Windows systems use these days. NTFS was created by Microsoft and is generally closed, thus support on other OSs is limited without 3rd party tools: Mac uses HFS+ (APFS on newer versions), Linux uses a variety of others, but not generally NTFS, Android has used flash file systems such as YAFFS (ext4 on later versions), SD cards are variable, etc. The point being, we don’t want to rely on NTFS ADS information if the target files will be travelling outside the realm of your Windows laptop/PC or a Windows based tablet.
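Since everything hinges on the volume being NTFS, an application could check before relying on ADS. Below is a rough sketch using the Win32 `GetVolumeInformationW` call through `ctypes`; the function names `filesystem_name` and `supports_ads` are hypothetical, and comparing against the literal name "NTFS" is a simplifying assumption that matches this article’s scope.

```python
import ctypes
import os
import sys


def filesystem_name(path):
    """Return the file system name of the volume holding path, e.g. "NTFS".

    Returns None on non-Windows platforms or if the lookup fails.
    """
    if sys.platform != "win32":
        return None
    # GetVolumeInformationW wants a root path with a trailing backslash ("C:\").
    drive, _rest = os.path.splitdrive(os.path.abspath(path))
    root = drive + "\\"
    name_buffer = ctypes.create_unicode_buffer(261)
    ok = ctypes.windll.kernel32.GetVolumeInformationW(
        ctypes.c_wchar_p(root),
        None, 0,            # volume label: not needed
        None, None, None,   # serial number, max component length, flags
        name_buffer, len(name_buffer),
    )
    return name_buffer.value if ok else None


def supports_ads(path):
    """Assumption for this sketch: only volumes reporting "NTFS" are trusted."""
    return filesystem_name(path) == "NTFS"
```

An app could call `supports_ads(folder)` once per watched folder and fall back to some other tracking method when it returns False.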

For the sake of my application most of this doesn’t matter, but it’s good to know anyway, so here’s some questions that we can answer about how well the ADS holds up in different scenarios:

Q: What If: I zip the file?
A: You’ll lose the ADS info. Not taking into account moving to another system or emailing, if you zip a file, then unzip it, the ADS will be gone.

Q: What If: I email the file to myself or someone else who receives it on an NTFS system?
A: You’ll lose the ADS info. Emailed to myself on the same computer (different email account), taking the file from Outlook and attempting to read the ADS reveals that the ADS no longer exists.

Q: What If: The file crosses a LAN (assuming it’s stored and received on NTFS systems)?
A: The ADS information remains (tested on Win7 Pro with Windows Server 2003 R2 running in VirtualBox)

Q: What If: I upload via cloud service (SkyDrive, DropBox) or FTP, then download it later?
A: FTP to Linux based server: ADS is gone. SkyDrive or DropBox I haven’t tested but don’t hold much hope for.

Q: What If: The file sits on an NTFS network drive, and I access it with a non-NTFS based system such as a Tablet?
A: I’m not sure yet! Waiting for an opportunity to test this… I seem to think that it depends on whether you explicitly copy the file to your device, then copy back, or just read the file off the network.
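The zip case above is easy to reproduce in code. The sketch below uses Python’s `zipfile` module as a stand-in for “zipping” (an assumption: it is not the tool used in the tests above, but it likewise archives only the main data of a file). The stream name `recid` and the function names are hypothetical.

```python
import os
import tempfile
import zipfile


def stream_exists(path, stream):
    """Return True if the named stream of path can be opened for reading."""
    try:
        with open(f"{path}:{stream}", "rb"):
            return True
    except OSError:
        return False


def zip_round_trip_keeps_stream():
    """Zip and unzip a tagged file, then report whether the tag survived."""
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "note.txt")
        with open(source, "w") as handle:
            handle.write("main content")
        with open(f"{source}:recid", "w") as handle:
            handle.write("42")  # the attached "record ID"

        archive = os.path.join(folder, "note.zip")
        with zipfile.ZipFile(archive, "w") as bundle:
            bundle.write(source, arcname="note.txt")  # main data only

        out_dir = os.path.join(folder, "unzipped")
        with zipfile.ZipFile(archive) as bundle:
            bundle.extractall(out_dir)

        return stream_exists(os.path.join(out_dir, "note.txt"), "recid")
```

`zip_round_trip_keeps_stream()` should come back False: the extracted copy has its main content but no `recid` stream.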

Conclusion

I think the research goes to show that under a specific scenario, ADS can be an extremely powerful tool that’s rather easily at our disposal, though despite the research I’ve put into this today, I’m still willing to admit that such scenarios are likely to be rare. Given the filesystem limitations, if a better tool than ADS is available, use that instead.

But Wait! How?

Ok, if nothing else you just want to see it work. How to do it?

If you want to play on the command line, take a look at the Practical Guide to ADS in the Links section below: the entire thing can be demonstrated in 4 or 5 commands.

  • VB6 (and presumably VBA) developers can find some information here (thanks to Tony Toews for directing me to Karl’s entry): http://vb.mvps.org/samples/Streams/
  • For C#, there’s nothing managed available, so we need to use the WinAPI instead. A lightweight class for reading and writing can be found here: http://www.codeproject.com/Articles/9387/Manipulate-Alternate-Data-Streams.  A much more robust project that handles just about everything you could want can be found here: http://www.codeproject.com/Articles/2670/Accessing-alternative-data-streams-of-files-on-an
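To show the idea without pulling in either of those projects, here is a minimal Python sketch of the record-ID tagging this article has in mind. It is not the C# code from the project: it relies on Windows resolving the `<file>:<stream>` path syntax on NTFS, and the stream name `recid` and both function names are made up for illustration.

```python
import os

STREAM_NAME = "recid"  # hypothetical stream name holding the database record ID


def _stream_path(path):
    # Use an absolute path so a short relative name such as "a" cannot turn
    # "a:recid" into something that looks like a drive letter.
    return f"{os.path.abspath(path)}:{STREAM_NAME}"


def write_record_id(path, record_id):
    """Tag an existing file with a record ID stored in an alternate stream."""
    with open(_stream_path(path), "w", encoding="utf-8") as handle:
        handle.write(str(record_id))


def read_record_id(path):
    """Return the stored record ID, or None if the file has not been tagged."""
    try:
        with open(_stream_path(path), "r", encoding="utf-8") as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None  # no stream yet, or the content is not a number
```

The application would call `read_record_id` while scanning a folder: None means the file still needs processing, anything else is the key of its existing database row. The file’s own content is never touched.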

Links