Organizers: Johns Hopkins School of Public Health and BioTeam
Date: April 26-27, 2016
Venue: Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
Investigators in life-science research institutions routinely generate and process many tens, and often hundreds, of Terabytes of data from high-throughput instruments. For institutions to remain competitive, their IT organizations must now field multi-Petabyte scale-out storage systems. Most IT organizations turn to “Enterprise-grade” turnkey storage solutions. Unfortunately, this approach diverts an ever-greater proportion of grant funds and institutional capital away from hypothesis-driven research activities. Here we advocate an alternative strategy based on open-source file systems and white-box hardware.
In the late 1990s, Linux clusters emerged from the convergence of two trends: 1) the development of Linux-based open-source software stacks for networked distributed computing and 2) the availability of commodity computing hardware enabled by increasingly powerful low-end microprocessors.
HPC based on proprietary SMP systems was rapidly replaced by Linux clusters built on low-end microprocessors. The approach had many advantages, including: 1) avoidance of vendor lock-in, 2) reduced cost due to commodity hardware, and 3) enhanced innovation due to full access to an open-source stack.
History repeats. Today we are experiencing a convergence of two comfortably familiar trends: 1) the availability of Linux-based open-source scale-up/scale-out software stacks that enable the use of low-end commodity drives and 2) the availability of commodity storage hardware and low-end NAS drives.
At the Johns Hopkins Bloomberg School of Public Health we have exploited these familiar trends to build the main file system for a cluster running life-science workflows. Our Lustre-over-ZFS file system provides 1.2PB (usable) and was built for about $137/TB (usable). This corresponds to a cost of $0.0023/GB/month when amortized over a 5-year lifetime. The cost of this high-performance file system compares very favorably to Amazon’s archival storage tier (Glacier), currently $0.007/GB/month.
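The arithmetic behind the amortized figure is a quick back-of-the-envelope calculation (assuming decimal units, 1 TB = 1000 GB, and straight-line amortization):

```python
# Back-of-the-envelope check of the amortized storage cost quoted above.
# Assumes decimal units (1 TB = 1000 GB) and straight-line amortization.

cost_per_tb = 137.0          # $/TB (usable), build cost
lifetime_months = 5 * 12     # 5-year amortization window

cost_per_gb = cost_per_tb / 1000.0                 # $0.137/GB
cost_per_gb_month = cost_per_gb / lifetime_months  # ~$0.0023/GB/month

print(round(cost_per_gb_month, 4))  # → 0.0023
```

Note that the $137/TB figure covers hardware acquisition only; operational costs (power, cooling, administration) are additional on-premise expenses, whereas the Glacier price is fully managed.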
We and our corporate partners appreciate that few research-IT organizations currently possess the skills or expertise to exploit, let alone assess, an open-source/white-box scale-out storage strategy. Yet our experience is that storage technology is no more mysterious or difficult than Linux cluster technology. It is just different. Accordingly, the purpose of this workshop is to address this gap in knowledge by providing an intensive and cutting-edge primer on Petabyte-scale storage systems with a focus on ZFS-on-Linux and Lustre-over-ZFS. The workshop is targeted at technical experts who want to learn how to build and manage their own systems.
We will present reference architectures, best practices, and discussions specific to Petabyte-scale open-source/commodity storage systems with a focus on the life sciences. We will describe production systems (including our own) as well as solutions with varying price points and levels of availability. Together with experts from BioTeam, Silicon Mechanics, and Intel, we will take a deep dive into the devilish details.
In addition, we will describe business models and financing strategies including: 1) convincing PIs with NIH R01s to collectively invest in an open-source/white-box storage system and 2) applying for NIH S10 and NSF MRI grant funding. We expect that systems based on documented reference architectures with published performance and reliability data, together with best practices and administration tools, will help to establish credibility in the eyes of individual PIs and review panels from the NIH and NSF.
Workshop participants will take away the following:
- A broad perspective on the storage landscape in the life-sciences
- A perspective on open-source/white-box scale-out storage
- Hands-on experience with installation and basic administration of ZFS-on-Linux and Lustre-over-ZFS.
- An understanding of disk-drive failure and the interpretation of failure rate data
- An overview of reference designs (including parts lists and approximate costs) for two types of storage systems: 1) ZFS-on-Linux and 2) Lustre-over-ZFS.
- An understanding of “where the bodies are buried” and how to recover from failures
- Funding strategies for storage systems
- An introduction to potential corporate partners as well as experts in ZFS and Lustre.
- An introduction to a community of like-minded research-IT organizations.
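As a flavor of the failure-rate arithmetic the bootcamp covers, here is a minimal sketch. All numbers are hypothetical illustrations, not figures from the workshop systems:

```python
# Illustrative disk-failure arithmetic; drive count and AFR are hypothetical.
drives = 400   # hypothetical drive count for a multi-Petabyte system
afr = 0.02     # hypothetical 2% annualized failure rate (AFR)

# Expected drive failures per year across the whole fleet:
expected_per_year = drives * afr          # 8.0

# Probability of at least one failure in any given week, treating
# failures as independent and spreading the AFR evenly over 52 weeks:
weekly_rate = afr / 52
p_failure_this_week = 1 - (1 - weekly_rate) ** drives

print(expected_per_year)                  # → 8.0
print(round(p_failure_this_week, 2))      # → 0.14
```

The point of exercises like this is that at Petabyte scale, drive failure is routine rather than exceptional, which is why the reference designs and recovery procedures above matter.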
The 2-day workshop will have four sessions. Session I (morning of April 26) will provide a broad-brush overview of technologies and trends. It will be followed by an intensive three-session bootcamp suitable for technical experts interested in developing their own solutions. A detailed agenda and tutorial material can be found on the schedule page.
The workshop will take place in room W5030 at the Johns Hopkins Bloomberg School of Public Health (615 N. Wolfe St., Baltimore, MD 21205). For directions to the Wolfe St. building, please click on either of the following links.
If you are driving, we recommend that you park in the Washington St. Parking lot on the NE corner of the intersection of Wolfe and Washington streets. The school is on the SW corner of the same intersection.
We do not have a hotel with a block of reserved rooms; however, the links below should be useful. Airport hotels are inexpensive, but you would need to rent a car. Downtown or Fells Point hotels are a short taxi or Uber ride from the medical campus, and they are in “fun” neighborhoods.
- List of Hotels that provide discounts if you mention “Johns Hopkins.”
Please register here.