In our previous blog entries in this series, we have examined how networking and application stack considerations affect good cloud design. In this entry, we will take a look at how to design for storage in the cloud.
Cloud Storage Considerations
The results of your application research drive cloud storage design. Based on the data gathered about your applications, we can determine how much of the storage in use will be ephemeral and how much must be persistent. We then size the ephemeral storage on our Node Controllers (NCs) accordingly. Next, we determine which type of Storage Controller (SC) configuration to use (local disk vs. SAN adapter) and size the disk array to match. The cloud storage implementation also feeds back into the networking design, since a significant iSCSI load increases the likelihood of needing a private, storage-only network. There are two primary considerations:
- What are the disk requirements for the node controllers? (Based on application research, I/O, network, etc.)
- Ephemeral vs. persistent storage ratio (and raw numbers)
Take the following example for sizing NC storage:
* Applications: Majority of storage is ephemeral, small instances (5 GB size).
* NC Servers: Medium-size boxes, CPU and memory able to support up to 20 instances per box.
* Storage Design Recommendation: Need at least 200 GB of RAID 5 (or RAID 10 if the budget allows) on every NC (20 instances x 5 GB, doubled per the rule of thumb below).
- Design Note: As a rule of thumb, take the expected amount of storage used by applications (cache, OS, etc.) and double it. Doubling the space means adding more disks/spindles, which in theory also yields more IOPS.
- Note: Max out at 7 disks per RAID array, as we have observed "interesting" behavior at 8 or more. If you need more ephemeral storage, add NCs to the cluster.
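The sizing example above boils down to simple arithmetic. Here is a minimal Python sketch of it, assuming the 200 GB figure applies to each NC and that the "double it" rule of thumb is applied to the raw per-instance footprint. The function name `nc_ephemeral_gb` and its parameters are hypothetical, chosen only for illustration.

```python
def nc_ephemeral_gb(instances_per_nc, gb_per_instance, headroom_factor=2.0):
    """Estimate the ephemeral storage (in GB) to provision on one NC.

    headroom_factor=2.0 encodes the "double it" rule of thumb; treating the
    result as a per-NC figure is an assumption of this sketch.
    """
    if instances_per_nc < 0 or gb_per_instance < 0 or headroom_factor <= 0:
        raise ValueError("inputs must be non-negative, with a positive factor")
    return instances_per_nc * gb_per_instance * headroom_factor


# The example from this post: 20 small instances of 5 GB each.
print(nc_ephemeral_gb(20, 5))  # 200.0
```

Plug in your own instance counts and sizes from the application research; the factor is a starting point, not a guarantee.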
SC Storage
Take the following example for sizing SC storage:
* Need a lot of persistent storage.
* Need a SAN adapter and a SAN (iSCSI).
* What are the IOPS needs?
** If high, this automatically pushes you to 10 Gb Ethernet.
** Is 10 GbE enough on its own, or do we need to dedicate NIC(s) to storage?
- Design Note: You should consider a separate storage VLAN.
- Design Note: Using FC as SC storage means *NO HA IS POSSIBLE* for the Eucalyptus SC.
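One rough way to approach the IOPS question is to convert the expected IOPS into network throughput and compare it with the link speed. The Python sketch below does that under loud assumptions: the workload numbers (50,000 IOPS at 8 KiB per I/O) are made up, the 70% usable share of the link is an arbitrary safety margin, and iSCSI/TCP protocol overhead is ignored entirely. The helper names `iscsi_gbps` and `fits_link` are hypothetical.

```python
def iscsi_gbps(iops, block_bytes):
    """Raw payload throughput in Gbit/s for a given IOPS rate and I/O size."""
    return iops * block_bytes * 8 / 1e9


def fits_link(iops, block_bytes, link_gbps, usable_fraction=0.7):
    """True if the payload fits within the assumed usable share of the link.

    usable_fraction is an assumption of this sketch, not a measured value.
    """
    return iscsi_gbps(iops, block_bytes) <= link_gbps * usable_fraction


# Hypothetical workload: 50,000 IOPS at 8 KiB per I/O.
demand = iscsi_gbps(50_000, 8 * 1024)
print(round(demand, 2))                                    # 3.28
print(fits_link(50_000, 8 * 1024, link_gbps=1))            # False
print(fits_link(50_000, 8 * 1024, link_gbps=10))           # True
```

If the estimate lands close to the usable share of the link, that is a hint to look at dedicated storage NICs or a separate storage VLAN rather than sharing the link with instance traffic.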
Walrus Storage Considerations
Walrus storage planning can be challenging. Walrus is an S3-compatible storage repository used for Eucalyptus images and snapshots, as well as user files in a storage-as-a-service configuration. Walrus only uses “local” disk (although it is possible to use SAN on the back-end). In HA configurations, it uses DRBD for data replication.
It is not trivial to increase disk space available to the Walrus without downtime. Careful planning is a high priority. Some initial considerations:
* If you have the option, design it with so much space that you will never have to increase size later.
* If you have a SAN with thin provisioning, back the Walrus with thin-provisioned LUNs. Make them enormous. In an HA configuration, put each LUN on a different SAN if possible, to avoid a single point of failure (SPOF).
* In any case, size it massively from the get-go. Go for roughly double the storage controller space available, to leave room for snapshots, EMIs, STaaS, etc. This becomes problematic when you have to match SANs holding petabytes of storage.
Rule of thumb: Many people start with a 2 TB Walrus, but some go significantly bigger. Your mileage may vary.
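As a quick illustration of the "roughly double the SC space" guidance, here is a small Python sketch. Treating the common 2 TB starting point as a lower bound is an assumption of this sketch, not something the guidance above states, and `walrus_size_tb` is a hypothetical helper name.

```python
def walrus_size_tb(sc_capacity_tb, multiplier=2.0, minimum_tb=2.0):
    """Suggest a Walrus size in TB from the storage controller capacity.

    multiplier=2.0 reflects the "roughly double" guidance; minimum_tb=2.0
    uses the common 2 TB starting point as a floor (an assumption here).
    """
    if sc_capacity_tb < 0:
        raise ValueError("sc_capacity_tb must be non-negative")
    return max(sc_capacity_tb * multiplier, minimum_tb)


print(walrus_size_tb(0.5))  # 2.0  (the floor applies)
print(walrus_size_tb(10))   # 20.0
```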
In our next and final post in this series, we will take a look at compute infrastructure considerations as well as some additional pitfalls to watch out for when designing a Eucalyptus cloud.
Regarding NC storage for instance ephemeral disks:
This is a tradeoff and users should consider workload, usage patterns and their deployment as a whole before making decisions.
RAID5 or 10 with lots of SAS disks and a decent disk controller is all well and good, giving fault tolerance and speed (depending on level) but costs money and reduces ephemeral storage space. The end result is that instance sizes can be limited and this will push users to leverage EBS *more*.
Some users might want very big ephemeral storage for their instances to reduce the dependency (and other associated issues) of using EBS. Of course, workload is king and will dictate whether non-persistent storage is enough.
Instances themselves are ephemeral and should not be perceived as persistent (they will fail), so why not architect the service to play off this? Can this work with the SLA in place? If so, perhaps consider cheaper (SATA) and larger disks (in RAID0 for some extra speed).
This won't work for all deployments but dev/test in particular may benefit from this approach, where often the workload is transient.
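To put rough numbers on the capacity side of this tradeoff, here is a small Python sketch using the standard usable-capacity formulas for RAID 0, 5 and 10. The six 500 GB disks are a made-up example, and `usable_gb` is a hypothetical helper; it ignores filesystem overhead and hot spares.

```python
def usable_gb(level, disks, disk_gb):
    """Usable capacity in GB for a RAID 0, 5 or 10 array of equal disks."""
    if level == 0:
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_gb            # striping only, no redundancy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb      # one disk's worth of parity
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, 4 or more")
        return (disks // 2) * disk_gb     # every disk is mirrored
    raise ValueError("unsupported RAID level")


# Hypothetical NC with six 500 GB disks.
for level in (0, 5, 10):
    print(f"RAID{level}: {usable_gb(level, 6, 500)} GB")
# RAID0: 3000 GB
# RAID5: 2500 GB
# RAID10: 1500 GB
```

The same six disks give twice the ephemeral space in RAID 0 as in RAID 10, at the cost of losing the whole array when any one disk fails.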