Over the next couple of months, I’d like to slowly sketch out some of the thoughts and impressions that I’ve been gathering about Exchange 2010 storage over the last year or so and combine them with the specific insights that I’m gaining at my new job. In this inaugural post, I want to tackle what I have come to view as the fundamental question that will drive the heart of your Exchange 2010 storage strategy: will you use a RAID configuration or will you use a JBOD configuration?

In the interests of full disclosure, the company I work for now is a strong NetApp reseller, so of course my work environment is conducive to designing Exchange in ways that make it easy to sell the strengths of NetApp kit. However, part of the reason I picked this job is precisely because I agree with how they address Exchange storage and how I think the Exchange storage paradigm is going to shake out in the next 3-5 years as more people start deploying Exchange 2010.

In Exchange 2010, Microsoft re-designed the Exchange storage system to target what we can now consider to be the lowest common denominator of server storage: a directly attached storage (DAS) array of 7200 RPM SATA disks in a Just a Bunch of Disks (JBOD) configuration. This DAS/JBOD/SATA configuration (which I will call DJS from here on) has been unworkable for Exchange for almost its entire lifetime:

  • The DAS piece certainly worked for the initial versions of Exchange; that’s what almost all storage was back then. Big centralized SANs weren’t part of the commodity IT server world, reserved instead for the mainframe world. Server administrators managed server storage. The question was what kind of bus you used to attach the array to the server. However, as Exchange moved to clustering, it required some sort of shared storage. While a shared SCSI bus was possible, it not only felt like a hack, but also didn’t scale well beyond two nodes.
  • SATA, of course, wasn’t around back in 1996; you had either IDE or SCSI. SCSI was the serious server administrator’s choice, providing better I/O performance for server applications, as well as faster bus speeds. SATA and its big brother SAS are both derived from the lessons that years of SCSI deployments have provided. Even for Exchange 2007, though, SATA’s poor random I/O performance made it unsuitable for Exchange storage. You had to use either SAS or FC drives.
  • RAID has historically been a requirement for Exchange deployments for two reasons: to combine enough drive spindles for acceptable I/O performance (back when disks were smaller than mailbox databases), and to ensure basic data redundancy. Redundancy became especially important once Exchange began supporting shared storage clustering, which required both aggregate I/O performance only achievable with expensive disks and interfaces and a reduced chance that the storage would become a single point of failure.

If you look at the marketing material for Exchange 2010, you would certainly be forgiven for thinking that DJS is the only smart way to deploy Exchange 2010, with SAN, RAID, and non-SATA systems supported only for those companies caught in the mire of legacy deployments. However, this isn’t at all true. There is a growing number of Exchange experts (and not just those of us who either work for storage vendors or resell their products) who think that while DJS is certainly an interesting option, it’s not a good match for every customer.

In order to understand why DJS is truly possible in Exchange 2010, and more importantly to begin to understand where DJS configurations are a good fit and what underlying conditions and assumptions you need to meet in order to get the most value from DJS, we need to pull these three dimensions apart and discuss each one on its own.

JBOD vs RAID

While I will go into more detail on all three dimensions at a later date, I want to focus on the JBOD vs. RAID question now. If you need some summaries, then check out fellow Exchange MVP (and NetApp consultant) John Fullbright’s post on the economics of DAS vs. SAN as well as Microsoft’s Matt Gossage and his TechEd 2009 session on Exchange 2010 storage. Although there are good arguments for diving into drive technology or storage connection debates, I’ve come to believe that the central philosophical question you must answer in your Exchange 2010 design is at what level you will keep your data redundant. Until Exchange 2007, you had only one option: keeping your data redundant at the disk controller level. Using RAID technologies, you had two copies of your data[1]. Because you had a second copy of the data, shared storage clustering solutions could be used to provide availability for the mailbox service.

With Exchange 2007’s continuous replication features, you could add in data redundancy at the application level and avoid the dependency of shared storage; CCR creates two copies, and SCR can be used to create one or more additional copies off-site. However, given the realities of Exchange storage, for all but the smallest deployments, you had to use RAID to provide the required number of disk spindles for performance. With CCR, this really meant you were creating four copies; with SCR, you were creating an additional two copies for each target replica you created.

This is where Exchange 2010 throws a wrench into the works. By virtue of a re-architected storage engine, it’s possible under specific circumstances to design a mailbox database that will fit on a single drive while still providing acceptable performance. The reworked continuous replication options, now simplified into the DAG functionality, create additional copies at the application level. If you hit that sweet spot of the 1:1 database to disk ratio, then you only have a single copy of the data per replica and can get an n-1 level of redundancy, where n is the number of replicas you have. This is clearly far more efficient for disk usage…or is it? The full answer is complex; the simple answer is, “In some cases.”

In order to get the 1:1 database to disk ratio, you have to follow several guidelines:

  1. Have at least three replicas of the database in the DAG, regardless of which sites they are in. Doing so allows you to place both the EDB and transaction log files on the same physical drive, rather than separating them as you did in previous versions of Exchange.
  2. Ensure that you have at least two replicas per site. The reason for this is that unlike Exchange 2007, you can reseed a failed replica from another passive copy. This allows you to avoid reseeding over your WAN, which is something you do not want to do.
  3. Size your mailbox databases to include no more users than will fit in the drive’s performance envelope. Although Exchange 2010 converts many of the random I/O patterns to sequential, giving better performance, not all has been converted, so you still have to plan against the random I/O specs.
  4. Ensure that write transactions can get written successfully to disk. Use a battery-backed caching controller for your storage array to ensure the best possible performance from the disks. Use write caching for the physical disks, which means ensuring each server hosting a replica has a UPS.
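To make guideline 3 a little more concrete, here is a rough Python sketch of the sizing arithmetic. Everything in it is an assumption on my part for illustration: the function name, the 75 random IOPS figure for a 7200 RPM SATA drive, the 0.1 IOPS per mailbox profile, and the 20% headroom. None of these are official sizing numbers, so substitute values you have measured or taken from your own planning data.

```python
from fractions import Fraction


def max_mailboxes_per_disk(disk_random_iops, iops_per_mailbox, headroom=0.2):
    """Estimate how many mailboxes fit inside one drive's random I/O envelope.

    All inputs are planning assumptions supplied by the caller.
    headroom is the fraction of the drive's IOPS held back for peaks and
    background work (0.2 means 20% is kept in reserve).
    """
    # Convert through str() so values like 0.1 are treated as exact decimals
    # rather than binary floating-point approximations.
    disk = Fraction(str(disk_random_iops))
    per_mailbox = Fraction(str(iops_per_mailbox))
    reserve = Fraction(str(headroom))

    if disk <= 0 or per_mailbox <= 0:
        raise ValueError("IOPS figures must be positive")
    if not 0 <= reserve < 1:
        raise ValueError("headroom must be in the range [0, 1)")

    usable_iops = disk * (1 - reserve)
    # Floor division: a partial mailbox does not count.
    return int(usable_iops // per_mailbox)


if __name__ == "__main__":
    # Hypothetical figures: a SATA drive rated at 75 random IOPS,
    # users generating 0.1 IOPS each, 20% of the drive held in reserve.
    print(max_mailboxes_per_disk(75, 0.1, headroom=0.2))  # prints 600
```

The point of the sketch is that the drive's random I/O budget, not its capacity, sets the ceiling on how many users you put in a 1:1 database.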

At this point, you probably have disk capacity to spare, which is why Exchange 2010 allows the creation of archive mailboxes in the same mailbox database. All of the user’s data is kept at the same level of redundancy, and the archived data – which is less frequently accessed than the mainline data – is stored without additional significant disk or I/O penalty. This all seems to indicate that JBOD is the way to go, yes? Two copies in the main site, two off-site DR copies, and I’m using cheaper storage with larger mailboxes and only four copies of my data instead of the minimum of six I’d have with CCR+SCR (or the equivalent DAG setup) on RAID configurations.
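The copy counting above is simple multiplication, and a short Python sketch may make the comparison easier to follow. The function and constant names are mine, and the layouts are the ones discussed in this post: mirrored RAID (RAID-1 or RAID-10) holds two physical copies per replica, while JBOD holds one.

```python
MIRRORED_RAID = 2  # RAID-1 or RAID-10: two physical copies per replica
JBOD = 1           # one database on one disk, no controller-level copy


def physical_copies(replicas, copies_per_replica):
    """Total physical copies of the data across all application-level replicas."""
    return replicas * copies_per_replica


def replica_losses_tolerated(replicas):
    """The n-1 rule: how many replicas can be lost while one still survives."""
    return replicas - 1


if __name__ == "__main__":
    layouts = {
        "CCR on mirrored RAID": (2, MIRRORED_RAID),
        "CCR + one SCR target on mirrored RAID": (3, MIRRORED_RAID),
        "Four-copy DAG on JBOD": (4, JBOD),
    }
    for name, (replicas, per_replica) in layouts.items():
        print(
            f"{name}: {physical_copies(replicas, per_replica)} physical copies, "
            f"tolerates losing {replica_losses_tolerated(replicas)} replicas"
        )
```

Note that the four-copy JBOD layout and plain CCR on mirrored RAID both come out at four physical copies; the difference is at which level those copies live.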

Not so fast. Microsoft’s claims around DJS configurations usually talk about the up-front capital expenditures. There’s more to a solid design than just the up-front storage price tag, and even if the DJS solution does provide savings in your situation, that is only the start. You also need to think about the lifetime of your storage and all the operational costs. For instance, what happens when one of those 1:1 drives fails?

Well, if you bought a really cheap DAS array, your first indication will be when Exchange starts throwing errors and the active copy moves to one of the other replicas. (You are monitoring your Exchange servers, right?) More expensive DAS arrays usually let you know directly that a disk failed. Either way, you have to replace the disk. Again, with a cheap white-box array, you’re on your own to buy replacement disks, while a good DAS vendor will provide replacements within the warranty/maintenance period. Once the disk is replaced, you have to re-establish the database replica. This brings us to the wonderful process known as database reseeding, which is not only a manual task but can also take a significant amount of time – especially if you made use of archival mailboxes and stuffed that DJS configuration full of data. If we can reseed 20GB of data per hour[2] (from a local passive copy to avoid the I/O hit to the active copy), that’s 10 hours for a 200GB database or 50 hours – over two days! – for a 1 TB database. During all that time, you have one less replica of that database to protect you. If your business processes and requirements don’t give you that amount of leeway, you either have to design smaller databases (and waste the disk capacity, which brings us right back to the good old bad days of Exchange 2000/2003 storage design) or use RAID.
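Here is the reseed arithmetic as a small Python sketch, in case you want to run it against your own numbers. The function names are mine, the 20 GB per hour default is the educated guess from footnote 2 rather than a measured rate, and 1 TB is treated as 1000 GB.

```python
def reseed_hours(database_gb, rate_gb_per_hour=20):
    """Hours needed to reseed a database of the given size at a steady rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("reseed rate must be positive")
    return database_gb / rate_gb_per_hour


def max_database_gb(window_hours, rate_gb_per_hour=20):
    """Largest database that can be reseeded inside an acceptable window."""
    if rate_gb_per_hour <= 0:
        raise ValueError("reseed rate must be positive")
    return window_hours * rate_gb_per_hour


if __name__ == "__main__":
    for size_gb in (200, 1000):
        print(f"{size_gb} GB database: {reseed_hours(size_gb):.0f} hours to reseed")
    # Working backwards: how big can a database be if the business will only
    # tolerate running one replica short for 8 hours (a hypothetical window)?
    print(f"8 hour window: at most {max_database_gb(8)} GB per database")
```

Running the calculation backwards like this is one way to see how quickly a tight recovery window forces you into smaller databases than the disk could otherwise hold.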

Now, with a RAID solution, we don’t have that same problem. We still have a RAID volume rebuild penalty, but that’s happening inside the disk shelf at the controller, not across our network between Exchange servers. And with a well-designed RAID solution such as generic RAID 10 (1+0) or NetApp’s RAID DP, you can actually survive the loss of more disks at the same time. Plus, a RAID solution gives me the flexibility to populate my databases with smaller or larger mailboxes as I need, and aggregate out the capacity and performance across my disks and databases. Sure, I don’t get that nice 1:1 disk to database ratio, but I have a lot more administrative flexibility and can survive disk loss without automatically having to begin the reseed dance.

Don’t get me wrong – I’m wildly enthusiastic that I as an Exchange architect have the option of designing to JBOD configurations. I like having choices, because that helps me make the right decisions to meet my customers’ needs. And that, in the end, is the point of a well-designed Exchange deployment – to meet your needs. Not the needs of Microsoft, and not the needs of your storage or server vendors. While I’m fairly confident that starting with a default NetApp storage solution is the right choice for many of the environments I’ll be facing, I also know how to ask the questions that lead me to consider DJS instead. There’s still a place for RAID at the Exchange storage table.

In further installments over the next few months, I’ll begin to address the SATA vs. SAS/FC and DAS vs. SAN arguments as well. I’ll then try to wrap it up with a practical and realistic set of design examples that pull all the pieces together.

[1] RAID-1 (mirroring) and RAID-10 (striping and mirroring) both create two physical copies of the data. RAID-5 does not, but its parity lets the array survive the failure of a single drive, effectively giving you a virtual second copy of the data.

[2] I don’t yet have solid data on how fast reseeds are in real-world conditions, so this number is an educated guess. I do believe, however, it’s a higher rate than what you’d see in most circumstances.

4 Responses to “From Whence Redundancy? Exchange 2010 Storage Essays, part 1”
  1. Xiotech Employee <– just want to be clear :)

    WOW – Great job on this blog post. In fact, i'll be forwarding it around my team.

    What’s really interesting is we are seeing more and more companies being pushed to look at DAS solutions for their application environments. Mostly I think it’s around reducing the overall cost of the solution, as well as certain predictability around performance and reliability. DAS solutions in the past, have signified low cost, low performance and low reliability as well as an inability to scale very far. Not to mention, it was missing key features that the applications vendors didn’t include like COW snapshots, replication, deduplication etc.

    Just think, 10 years ago we spent a lot of time educating people on SAN vs. DAS. We talked about how great it is to create a snapshot, or replicate your data to a DR site and you couldn’t do that unless it was in a Storage Array with those features.

    If we look at storage controllers today, we’ve packed a whole bunch of features into 2(ea) servers/controller heads. In today’s storage array, not only is it doing RAID and Cache protection (which is super important), but it’s also doing thin provisioning, replications, dedupe (in some cases), snapshots (full copy and COW), multi-tier migration, CIFS, NFS, FC, iSCSI, FCoE etc etc. It’s getting to the point that performance predictability is pretty much going away. Reliability of the code, and mixing of different technologies (1GB, 2GB, 4GB FC Drive bays, SAS connections) as well as all the various “plumbing” connectivity options most arrays offer today. Not to mention, the fundamental building block of a Storage array is data protection. 2TB drives, rebuild times etc all take a toll on controllers.

    Fast forward 10 years and I think we are seeing the application vendors have caught up. Exchange 2010 is a great example of this. VSphere is another one. Solutions like our Emprise 5000 (I did mention I was with Xiotech right ? :) – ) product line offer the ability to have DAS predictability. Both from a performance stand point, as well as a price standpoint. You need 10TB’s of Tier 1, Tier 2 or Tier 3 storage, it costs X amount. You need 10,000 IOPS it’s X amount etc.

    Don’t get me wrong, there is CLEARLY a need for Storage Controllers to do some of the things you pointed out in your blog post, I still think Intelligent DAS has a very large place in the overall Storage foundation of a datacenter.

    @StorageTexan

  2. [...] among other applications.  In fact, Devin Ganger did a great blog post around this very subject in regards to Exchange 2010.  It was a pretty cool read. I left a comment on his site, it pretty much matches (some may call [...]
