Provide good hybrid storage management in BTRFS

Latest response

In ye good olde days, mainframe and mini storage systems had the ability to grade their various classes of storage with a cost.  This cost was then used to decide where disk blocks were allocated, according to how they were used.

 

i.e. blocks that are accessed the most would be migrated up to higher-cost (faster) storage, while blocks that are seldom used would be migrated down to lower-cost (slower) storage.

 

Bringing this up to ye moderne tymes, we could have storage from fast to slow as follows:

 

1 - RAMdisk

2 - SSD

3 - Fibre Attached Storage/SAS etc

4 - SATA arrays

5 - Slower/cheaper SATA disks

6 - Tape

 

The ranking is up to the user and should support many more levels than this.  Disk blocks then have a "heat" associated with them, which is used to allocate them to the appropriate level of storage.
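To make the idea concrete, here is a rough sketch of how heat-driven placement might work.  The tier names, costs, decay factor and thresholds below are all invented purely to illustrate the mechanism; they are not anything BTRFS actually implements or exposes:

# Hypothetical sketch of heat-driven tier selection; not a real BTRFS interface.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    cost: int         # user-assigned cost: higher = faster / more expensive
    min_heat: float   # blocks at or above this heat belong on this tier

# User-defined ranking, fastest/most expensive first (all values invented).
TIERS = [
    Tier("ramdisk",   cost=6, min_heat=500.0),
    Tier("ssd",       cost=5, min_heat=100.0),
    Tier("sas",       cost=4, min_heat=20.0),
    Tier("sata",      cost=3, min_heat=5.0),
    Tier("slow_sata", cost=2, min_heat=1.0),
    Tier("tape",      cost=1, min_heat=0.0),
]

def update_heat(heat: float, accesses_this_period: int, decay: float = 0.9) -> float:
    """Decay old heat and add recent accesses, so heat tracks recent popularity."""
    return heat * decay + accesses_this_period

def target_tier(heat: float) -> Tier:
    """Pick the fastest tier whose threshold the block's heat still meets."""
    for tier in TIERS:
        if heat >= tier.min_heat:
            return tier
    return TIERS[-1]

# A block read 40 times in the last period, starting cold, lands on the SAS tier.
print(target_tier(update_heat(0.0, 40)).name)   # -> "sas"

The nice property of a per-tier cost and threshold is that the heat calculation itself never changes: adding or removing a tier is just a change to the list.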

 

There already seems to be some work within BTRFS to provide this, but it appears to support only two levels of storage (SSD/mechanical disk?), so it is much less flexible than it could be.

 

This, along with de-dupe, would be a great building block for virtual storage pools.  It would give admins a huge amount of flexibility to build storage systems which automatically allocate blocks.  Hot-adding a new type of storage with a different access speed to the pool would kick off a bit of block re-allocation.  No messing around - all the work done by the filesystem.

 

Even for the home/desktop user, this would provide benefit.  A small SSD along with a larger SATA disk would provide fast access to the most frequently requested blocks, while my photos/music/video etc. would largely be stored on the SATA drive (especially the wife's music, which I *never* listen to).

Responses

HSM (hierarchical storage management) is a wonderful thing, but...

 

In ye good olde days, the mainframe had a whopping two levels of storage: disk and tape.  You are asking for 6 or 7 or more, possibly without considering the inevitable consequences: users will configure all 6 tiers, and then wonder why their fire-breathing 40-core server is slower than a Pentium II while it madly hammers away at its many disk(-like) systems, evaluating the "heat" of every block/extent and migrating it up or down one tier at a time, for a relatively subtle change in performance.

 

Really, I suspect that 2 tiers is enough and 3 is the most that should be considered, though what those tiers are should be entirely (and easily) user-configured: for some applications it could be RAMdisk vs SSD; for others, SSD vs 7.2k SATA; for others still, 10k SAS/RAID-10 vs 5.4k SATA/RAID-6.

 

Tape, sadly, is probably more than can be justified in a file system, given the layered approach to Linux storage; asking the filesystem developers to jam tape-library support into the file system to handle the last stage of HSM doesn't strike me as very likely to work.  Maybe a generic plug-in for external/3rd-party "near-line" storage system drivers?  Also, the performance implications of using devices with wait times measured in minutes at the file system layer could get a little scary unless handled very, very carefully.

Would there be any difference in calculating block "heat" for different numbers of storage layers?  Surely the heat is calculated in the same way regardless of what your defined layers are.

 

This isn't something for the faint of heart, however.  I'll definitely agree on that.  If it were possible to ship a standard config for users with a small SSD and a larger 7200rpm SATA disk (for example), it would be great.
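For instance (same caveat as the earlier sketch: the names and numbers here are purely hypothetical), a shipped default for that case could be as small as two entries:

# Hypothetical two-tier default: a small SSD plus a larger 7200rpm SATA disk.
HOME_TIERS = [
    {"name": "ssd",  "min_heat": 50.0},  # hot blocks: OS, applications, active working set
    {"name": "sata", "min_heat": 0.0},   # everything else: photos, music, video, cold data
]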

 

My viewpoint is more from the corporate storage market, however, where systems would be dedicated to file serving.  Players like NetApp, EMC, 3-Par etc. are all moving towards offering some kind of tiered storage with automatic block allocation.  They'll doubtless all call it something different and charge the earth for it too, but for a distro to provide the building blocks would be a killer feature.

 

Fair point about tape, but tape is still in use within organisations.  I've also worked with places where all backups go to disk libraries rather than tape.  Either way, being able to build your own complete storage system (from the hottest blocks right down to very cold backup blocks) would be incredibly useful.

Also - my request is to provide the tool.  I don't much care about an end user who configures an over-complex system.  There has to be some personal responsibility at the admin level.

 

A flexible form of HSM, however, would allow anything from simple systems (a single SSD plus a single SATA disk, for example, for home users) right up to complex storage solutions with many different levels of failover, speed, redundancy, RAID etc.  To do all of this with the same tool would be killer: a fully scalable filesystem that can be used in all areas, from tablets/laptops right through to network storage solutions.

 

It's just that the current HSM idea in BTRFS seems to be limited to 2 levels, which falls way short of what could be achieved and deployed in corporate land.

To illustrate my point a bit, I found the following consumer product over the festive season which provides SSD caching of hot data in desktop machines:

 

http://www.ocztechnology.com/ocz-synapse-cache-sata-iii-2-5-ssd.html

 

Now, I'm not endorsing this product, but is it not clear that hierarchical storage systems are the way to go?  Not simply a mechanical & SSD option with 2 levels.  Let me choose the number of levels I use and let me assign my own "costs" to each level of storage.

 

If I can do this in a laptop, should I not have the option of doing it in the datacentre?

 

Yes, I'm labouring the point.  But I don't want Red Hat to get left behind on this, as I'll be left having to use the over-expensive storage providers' solutions.  These won't do nearly what I want and will cost an increasing percentage of the IT budget.

 

Duncan