Avid Pro Audio Community

georgia · #1 11-01-2001, 09:35 AM

It appears there is some confusion and a few myths surrounding RAID technology in these forums, so I thought I’d toss together this overview of RAID technology. I guess I spent WAY too much time designing data centers...

I hope this helps a bit.
Anyway,

Overview

A drive array is a collection of hard disk drives that are grouped together. When you talk about RAID, there is often a distinction between physical drives and arrays and logical drives and arrays. Physical arrays can be divided or grouped together to form one or more logical arrays. These logical arrays can be divided into logical drives that the operating system sees. The logical drives are treated like single hard drives and can be partitioned and formatted accordingly.

The RAID controller is what manages how the data is stored and accessed across both the physical and logical arrays. It ensures that the operating system only sees the logical drives and does not need to worry about managing the underlying schema. As far as the system is concerned, it's dealing with regular hard drives. A RAID controller's functions can be implemented in hardware or software. Hardware implementations are better for RAID levels that require large amounts of calculation.

The single RAID levels don't address every application requirement that exists. So, to get more functionality, someone thought of the idea of combining RAID levels. The main benefit of using multiple RAID levels is increased performance. Combining RAID levels usually means using a hardware RAID controller, because the added complexity makes software solutions impractical. RAID 0 has the best performance of the single levels and is the one most commonly combined with others. Not all combinations of RAID levels exist. The most common combinations are RAID 0+1 and 1+0. The difference between 0+1 and 1+0 might seem subtle: it lies in the amount of fault tolerance. Both of these levels require at least 4 hard drives to implement, so this can get a bit expensive. OK, let's hit the details of the RAID levels…

RAID 0
This is the simplest level of RAID… it just involves striping. There is no data redundancy at all in this level, so it is not recommended for applications where data is critical. This level offers the highest performance of any single RAID level. At least 2 hard drives are required, preferably identical, and the maximum depends on the RAID controller. None of the space is wasted as long as the hard drives used are identical. It's relatively low cost with a high performance gain. This level is good for most people who don't need any data redundancy. It works with SCSI and IDE/ATA implementations. Finally, it's important to note that if any of the hard drives in the array fails, you lose everything.
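
To make the striping a bit more concrete, here is a minimal Python sketch of how a RAID 0 layout might map a logical block number to a drive and a position on that drive. The function name and the plain round-robin layout (one block per stripe unit) are assumptions for illustration only; real controllers use configurable stripe sizes.

```python
def raid0_locate(logical_block, num_drives):
    """Map a logical block to (drive index, block offset on that drive).

    Assumes a simple round-robin layout with one block per stripe unit.
    """
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least 2 drives")
    drive = logical_block % num_drives
    offset = logical_block // num_drives
    return drive, offset


# Consecutive logical blocks land on alternating drives, so neighbouring
# blocks can be read or written in parallel.
layout = [raid0_locate(block, num_drives=2) for block in range(6)]
```

Note that nothing in this mapping stores a second copy of any block, which is why a single dead drive takes the whole array with it.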

RAID 1
This level is usually implemented as mirroring. Two identical copies of data are stored on two drives. When one drive fails, the other drive still has the data to keep the system going. Rebuilding a lost drive is very simple since you still have the second copy. This adds data redundancy to the system and provides some safety from failures. Some implementations add an extra RAID controller to increase the fault tolerance even more. It’s ideal for applications that use critical data. Even though the performance benefits are not great, it really helps with preserving data. It is also relatively simple and has a low cost of implementation. Most RAID controllers nowadays implement some form of RAID 1.

RAID 2
This level uses bit level striping with Hamming code ECC. The technique used here is somewhat similar to striping with parity, but not quite the same. The data is split at the bit level and spread over a number of data and ECC disks. When data is written to the array, the Hamming codes are calculated and written to the ECC disks. When the data is read from the array, the Hamming codes are used to check whether errors have occurred since the data was written to the array. Single bit errors can be detected and corrected immediately. This is the only level that really deviates from traditional RAID ideas. Remember, this level is very complicated, and expensive RAID controller hardware is needed.
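
For anyone curious how the single bit correction works, here is a rough Python sketch using the textbook Hamming(7,4) code: 4 data bits plus 3 check bits. Picture each of the 7 bits living on its own disk. The function names are made up for this example, and the 4+3 width is an assumption for illustration, not a claim about any particular RAID 2 product.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions run 1..7; check bits sit at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based error position or 0 if clean)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome points at the bad bit
    if position:
        c[position - 1] ^= 1
    return c, position


def hamming74_data(codeword):
    """Pull the 4 data bits back out of a codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1  # flip the bit at position 5 to simulate a read error
repaired, bad_position = hamming74_correct(damaged)
```

The three check bits together spell out the position of the flipped bit, so the array can fix it on the fly without knowing in advance which disk misbehaved.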

RAID 3
RAID 3 uses byte level striping with dedicated parity. In other words, data is striped across the array at the byte level with one dedicated parity drive holding the redundancy information. The idea behind this level is that striping the data increases performance and using dedicated parity takes care of redundancy. At least 3 hard drives are required: 2 for striping and 1 as the dedicated parity drive. Although the performance is good, the added parity does slow down writes. The parity information has to be written to the parity drive whenever a write occurs. This increased computation calls for a hardware controller, so software implementations are not practical. RAID 3 is good for applications that deal with large files since the stripe size is small.
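
Parity in these levels is normally a plain XOR across the data drives. Here is a small Python sketch of computing the parity and rebuilding a dead drive from the survivors. The two-data-drive setup and the sample contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_drives = [b"AUDIO", b"TRACK"]  # two striped data drives
parity = xor_blocks(data_drives)    # what the dedicated parity drive holds

# Drive 0 dies: XOR the surviving data drive with the parity drive
# to get the lost contents back.
rebuilt = xor_blocks([data_drives[1], parity])
```

The same trick works no matter which single drive is lost, which is also why every write has to touch the parity drive.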

RAID 4
This level is very similar to RAID 3. The only difference is that it uses block level striping instead of byte level striping. The advantage is that you can change the stripe size to suit application needs. This level is often seen as a mix between RAID 3 and RAID 5, having the dedicated parity of RAID 3 and the block level striping of RAID 5. Again, you'll probably need a hardware RAID controller for this level. The dedicated parity drive also remains a write bottleneck in this level.

RAID 5
RAID 5 uses block level striping and distributed parity. This level tries to remove the bottleneck of the dedicated parity drive. With the use of a distributed parity algorithm, this level writes the data and parity data across all the drives. Basically, the blocks of data are used to create the parity blocks which are then stored across the array. This removes the bottleneck of writing to just one parity drive. However, the parity information still has to be calculated and written whenever a write occurs, so the slowdown involved with that still applies. The fault tolerance is maintained by separating the parity information for a block from the actual data block. This way when one drive goes, all the data on that drive can be rebuilt from the data on the other drives. Recovery is more complicated than usual because of the distributed nature of the parity. Just as in RAID 4, the stripe size can be changed to suit the needs of the application. Also, using a hardware controller is probably the more practical solution. RAID 5 is one of the most popular RAID levels being used today. It appears to be the best combination of performance, redundancy, and storage efficiency.
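
Here is a quick Python sketch of what "distributed parity" looks like on the drives. The rotation rule below (parity moves one drive to the left on each stripe) is just one possible scheme picked for illustration; actual controllers differ, and the function names are made up.

```python
def raid5_parity_drive(stripe, num_drives):
    """Drive index holding parity for a stripe (one possible rotation)."""
    return (num_drives - 1 - stripe) % num_drives


def raid5_layout(num_stripes, num_drives):
    """Return a table of stripes; each row lists what every drive holds."""
    rows = []
    for stripe in range(num_stripes):
        parity_drive = raid5_parity_drive(stripe, num_drives)
        next_data = stripe * (num_drives - 1)  # first data block of this stripe
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(f"D{next_data}")
                next_data += 1
        rows.append(row)
    return rows


# 4 drives, 4 stripes: the parity block "P" lands on a different drive
# in every stripe, so no single drive takes all the parity writes.
for row in raid5_layout(num_stripes=4, num_drives=4):
    print("  ".join(f"{cell:>3}" for cell in row))
```

Compare that with RAID 3 and 4, where the "P" column would sit on the same drive in every row.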

RAID 0+1
This combination uses RAID 0 for its high performance and RAID 1 for its high fault tolerance. Let's say you have 8 hard drives. You can split them into 2 arrays of 4 drives each, and apply RAID 0 to each array. Now you have 2 striped arrays. Then you would apply RAID 1 to the 2 striped arrays and have one array mirrored on the other. If a hard drive in one striped array fails, that entire striped array is lost. The other striped array is left, but it has no fault tolerance if any of the drives in it fail.

RAID 1+0
RAID 1+0 applies RAID 1 first, then RAID 0, to the drives. To apply RAID 1, you split the 8 drives into 4 sets of 2 drives each. Now each set is mirrored and has duplicate information. To apply RAID 0, you then stripe across the 4 sets. In essence, you have a striped array across a number of mirrored sets. This combination has better fault tolerance than RAID 0+1. As long as one drive in each mirrored set is active, the array can still function. So theoretically you can have up to half the drives fail (one per mirrored set) before you lose everything, whereas in RAID 0+1, once one drive has failed, losing any drive in the other striped array takes everything down.
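
One way to see the difference in fault tolerance is to count which combinations of failed drives each layout survives. The Python sketch below does that for the 8-drive example. The drive numbering and function names are assumptions for illustration, and it assumes a RAID 0+1 controller writes off a whole striped array as soon as one of its drives fails.

```python
from itertools import combinations

NUM_DRIVES = 8


def raid01_survives(failed):
    """RAID 0+1: striped arrays on drives 0-3 and 4-7, mirrored.

    Data survives while at least one striped array is fully intact.
    """
    stripe_a = set(range(0, 4))
    stripe_b = set(range(4, 8))
    return not (failed & stripe_a) or not (failed & stripe_b)


def raid10_survives(failed):
    """RAID 1+0: mirrored pairs (0,1), (2,3), (4,5), (6,7), striped.

    Data survives while every pair still has one working drive.
    """
    pairs = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
    return all(not pair <= failed for pair in pairs)


def survival_count(survives, num_failures):
    """Return (combinations survived, total combinations)."""
    combos = [set(c) for c in combinations(range(NUM_DRIVES), num_failures)]
    return sum(survives(c) for c in combos), len(combos)


for failures in (1, 2, 4):
    print(failures, "failed:",
          "0+1", survival_count(raid01_survives, failures),
          "1+0", survival_count(raid10_survives, failures))
```

With two failed drives, the 1+0 layout only dies when both happen to be in the same mirrored pair, while 0+1 dies whenever the two failures land in different striped arrays.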

In conclusion
OK, now that you know the different RAID levels and configurations, why would you even bother? Well, it really all depends on your application and the RAID level you use. However, in general, using RAID provides data redundancy, fault tolerance, increased capacity, and increased performance. Data redundancy protects the data from hard drive failures. This benefit is good for companies or individuals that have critical or important data to protect, or just anyone that's paranoid about losing their gigabytes of data. Fault tolerance goes hand in hand with redundancy in providing a better overall storage system. The only RAID level that does not have any form of redundancy or fault tolerance is RAID 0. RAID also provides increased capacity by combining multiple drives. How efficiently the total drive storage is used depends on the RAID level. Levels involving mirroring need twice as much raw storage as the data they hold. And lastly, the reason most people go to RAID is the increase in performance. How much performance you gain depends on the RAID level used. For applications that need raw speed, RAID is definitely the way to go.
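
To put rough numbers on the storage efficiency point, here is a small Python sketch of usable capacity per level. It assumes identical drives and two-way mirroring, and the function name is made up for this example. RAID 2 is left out since its overhead depends on the ECC layout.

```python
def usable_capacity_gb(level, num_drives, drive_gb):
    """Usable space for an array of identical drives (a simplified model)."""
    total = num_drives * drive_gb
    if level == "0":
        return total             # pure striping, nothing held back
    if level in ("1", "0+1", "1+0"):
        return total / 2         # every block is stored twice
    if level in ("3", "4", "5"):
        return total - drive_gb  # one drive's worth of space goes to parity
    raise ValueError(f"level {level!r} is not covered by this sketch")


for level, drives in [("0", 2), ("1", 2), ("5", 4), ("1+0", 8)]:
    print(f"RAID {level} with {drives} x 60 GB:",
          usable_capacity_gb(level, drives, 60), "GB usable")
```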

Here is a simple view of RAID:
Mirroring gives you redundancy… therefore data security goes up. Write performance goes down due to duplicated writes (the amount varies by implementation), and read performance goes up, since there are two spindles with duplicated data that can be accessed by the system. In fact, in some implementations, the copy of the data that is closest to the read head of a given spindle is chosen for the read, making the seek and latency time drop dramatically (note: again, this depends on how your system is implemented and how you configure caching algorithms). The main thing to remember here is that the RAID controller writes the same data blocks to each mirrored drive, so each drive or array has the same information on it. To set up mirroring, the number of drives has to be a multiple of 2, since every drive needs a partner. The drawback here is that both drives are tied up during the writing process, which limits parallelism and can hurt performance. A good RAID controller will only read from one of the drives, since the data on both is the same. While one drive is busy with a read, the free drive can be used for other requests. This increases parallelism, which is pretty much the concept behind the performance increase of RAID.

Striping
Spreading that single file across a bunch-o-drives. Security of data drops (more spindles & drive mechanics to break) but this gives you almost unlimited size of a “single” logical disk. Add two 60 gig disks, get one 120 gig disk. Striping improves the performance of the array by distributing the data across all the drives. The main principle behind striping is parallelism. Imagine you have a large file on a single hard drive. If you want to read the file, you have to wait for the hard drive to read the file from beginning to end. Now, if you break the file up into multiple pieces and distribute it across multiple hard drives, you have all these drives reading a part of the file at the same time. You only have to wait as long as it takes to read each piece since the drives are working in parallel. The same is true if you were writing a large file to a disk. Transfer performance is greatly increased. The more hard drives you have, the greater the increase in performance.

The stripe size is a largely debated topic. There is no ideal stripe size, but certain sizes work best with certain applications. Using a small stripe size will enable files to be broken up more and distributed across the drives. The transfer performance will increase due to the increased parallelism. However, this also increases the randomness of the position of each piece of the file. As you probably guessed already, using a large stripe size does the opposite: the data will be less distributed and transfer performance is decreased, but the randomness is decreased as well. The best way to find the right stripe size for your particular application is to experiment. Start out with a medium stripe size, then try decreasing or increasing the size and recording the difference in overall performance.

Remember, if you want to move or transfer a file somewhere, the controller accesses both drives simultaneously, which is where the performance gain kicks in. With two drives it only takes about half the time to transfer the file. If you increase the number of hard drives, the file will ideally be transferred in 1/Nth the time it takes to transfer from 1 hard drive.
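
If you want to play with the stripe size idea, here is a small Python sketch that deals a chunk of data out to the drives round-robin and works out the best-case transfer time. The function names and the example figures are made up for illustration, and the timing ignores seek time, controller overhead and bus limits, so treat it as an upper bound on the gain.

```python
def stripe_data(data, num_drives, stripe_size):
    """Deal fixed-size pieces of data out to the drives, round-robin."""
    drives = [[] for _ in range(num_drives)]
    for start in range(0, len(data), stripe_size):
        piece_index = start // stripe_size
        drives[piece_index % num_drives].append(data[start:start + stripe_size])
    return drives


def ideal_transfer_seconds(file_mb, drive_mb_per_s, num_drives):
    """Best-case transfer time: all drives stream their share in parallel."""
    return file_mb / (drive_mb_per_s * num_drives)


# A smaller stripe size breaks the same data into more, smaller pieces.
large_stripes = stripe_data(b"ABCDEFGH", num_drives=2, stripe_size=2)
small_stripes = stripe_data(b"ABCDEFGH", num_drives=2, stripe_size=1)

one_drive = ideal_transfer_seconds(600, 30, num_drives=1)
two_drives = ideal_transfer_seconds(600, 30, num_drives=2)
```

Doubling the drives halves the ideal time, which is the 1/N rule of thumb. Real-world numbers will come in below that.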

Mirroring and striping
Add them both together: data redundancy is up, security of data is better, read performance goes up and is much faster (depending on configuration again), and write performance suffers depending on implementation.

Sorry for being so long winded…. It just seemed that there is some confusion with regard to RAID capabilities and benefits.

Does RAID work with Protools? Theoretically... Yup. How much? It depends on how you configure your system and the storage subsystems associated with it. Will Protools SUPPORT RAID? Beats me, ask them...

... For what it's worth...

cheers

georgia

Java · #2 07-14-2002, 10:41 PM

georgia,

thanks for taking the time to put it to print.

so i suppose i can conclude that if i want read speed and write speed along with safety I should go with either 1+0 or 0+1.

do you think it would be safe to use RAID 0 and have a second drive array to backup to or would it be better performance wise to implement more drives?

Speed is the most important but i really don't ever want to lose a file - period!

Scsi is so nice but so unaffordable when you add up the cost of the controller card and then the higher prices of scsi drives.

I have seen enclosures that support RAID 0 for less than 500 bucks and figured I could give it a whirl and later move the raid to daily backups (later adding some type of tape drive for weekly backups too)

So should I really be thinking about some system that does 0+1 and/or 1+0 instead of a cheapie 500 dollar enclosure i can add my own drives to?

Thanks in advance.
J
