De-duplication in BTRFS
Online de-dupe is fast becoming a requirement of large storage systems. There have been several discussions about the pros and cons of online vs offline vs why bother at all. I'm in the camp that says de-dupe is a *requirement* for some people, so please have it as an included option with the file system. It makes little sense to avoid doing it online with BTRFS, so stick it in at that point.
The arguments as to whether it suits my environment are best answered by me, so please let me make the decision as to whether I enable it on my disks.
Responses
Deduplication should be great for us. On our fileservers & mailhubs it would save a lot of space without modifying apps.
Depending on how well your mail stores are managed, you may achieve little in the way of meaningful space savings. Enterprise-grade mail systems tend to dedupe data internally, so what's written to disk may not contain much that can be deduplicated externally (i.e., at the filesystem, outside of the app's internal functionality). If you're running a mail system that functions like Exchange, your FS-based dedupe rates would be lower than those found on a classic sendmail/mbox-based system.
Of the two use cases listed above, you'll tend to get your best dedupe rates on fileservers - particularly home directory stores - where users are each squirreling away their own copies (or worse, multiple copies per user) of a given document.
Overall, before you can project space savings, you have to have a good idea of how inefficiently your current storage systems are being used.
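If you want a rough number before committing to anything, something like the sketch below can give a first estimate. It walks a directory tree, hashes every fixed-size block, and counts how many blocks are repeats. To be clear, this is only an illustration: the 4096-byte block size, the function names and the choice of SHA-256 are my own assumptions, and it is not how BTRFS or any particular dedupe engine actually decides what to share. Real results will differ with block alignment, compression and the dedupe granularity of the product you end up using.

```python
import hashlib
import os

# Assumed block size for the estimate; not taken from any specific filesystem.
BLOCK_SIZE = 4096


def estimate_dedupe(root, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) for regular files under root."""
    seen = set()   # digests of every distinct block encountered so far
    total = 0      # count of all blocks read
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    while True:
                        block = fh.read(block_size)
                        if not block:
                            break
                        total += 1
                        seen.add(hashlib.sha256(block).digest())
            except OSError:
                # Unreadable files are ignored rather than aborting the scan.
                continue
    return total, len(seen)


def savings_ratio(total, unique):
    """Fraction of blocks that duplicate an earlier block (0.0 if no data)."""
    return 0.0 if total == 0 else 1.0 - unique / total
```

Calling `estimate_dedupe("/home")` and passing the result to `savings_ratio` gives the fraction of blocks that duplicate another block somewhere in the tree. Note that the set of digests lives in memory, so on a very large store you would want to sample a subset of directories rather than scan everything.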
That is the way NetApp's PAM modules work. They are "dedup aware". The deduped blocks are cached in the PAM card and thus provide bootstorm protection in virtualized environments.
