Storage I/O in Modern Servers and Data-centric Applications: Efficiency and Scalability Challenges¹

Tutorial at HiPEAC 2012, 24 January 2012, Paris, France
Angelos Bilas, FORTH, Greece
Toni Cortes, UPC, Spain
Ricardo Jimenez-Peris, UPM, Spain
Bilha Mendelson and Muli Ben-Yehuda, IBM, Israel

With the recent emergence of data-centric applications, there is a need to process data at higher rates to keep up with demand. This increases pressure on the I/O path of modern systems for accessing and processing data. On the other hand, multicore processors and system architectures allow raw resources to scale. Servers today can easily incorporate tens of cores, multiple storage controllers, and numerous storage devices. However, achieving high processing rates end-to-end in the I/O path is far from trivial. This problem is exacerbated by the trend towards virtualization in modern datacenters for isolation and portability purposes. Existing layers in the I/O path introduce significant overheads or incur limitations. In addition, most of these layers have not been designed with "fat" multicores in mind.

In this tutorial we will examine the I/O path from applications to devices. We will discuss overheads and the solutions currently being proposed, and we will provide projected target numbers for I/O subsystem performance over the coming years. First, we will present a set of typical data-centric applications, their requirements, and challenges. We will focus on transactional and streaming applications, since these have lately attracted significant attention. Then, we will discuss the different layers in the base I/O stack and the related overheads and scaling limitations. Next, we will present overheads associated with virtualized I/O, in particular VMM and guest crossings. Finally, we will discuss the potential of mixed I/O hierarchies, using both SSDs and disks, and the potential of new types of memories.

Tentative Program

10:00 - 10:15: Opening Remarks (Angelos Bilas)
10:15 - 10:45: Cloud and data-center applications scaling (Ricardo Jimenez-Peris)
10:45 - 11:15: Scaling the I/O stack on multicore processors (Angelos Bilas)
11:15 - 11:30: Break
11:30 - 12:00: System virtualization: approaches and overhead (Bilha Mendelson and Muli Ben-Yehuda)
12:00 - 12:30: The evolving storage hierarchy (Toni Cortes)
12:30 - 13:00: Closing remarks/Panel (all presenters)


¹ This tutorial is conducted in the context of the IOLanes project. IOLanes is an EU-funded research project targeted at understanding and improving I/O performance on modern hardware that employs multicore architectures, by adapting or redesigning the I/O stack as appropriate and by providing system-level support that will allow future storage systems to take advantage of multicore CPUs in new ways.


Cloud and data-center applications scaling (Ricardo Jimenez-Peris):

This presentation will discuss the architecture and requirements of data-centric applications for cloud infrastructures, including data streaming systems, transactional systems, and tuple-store type applications. First, we will provide a categorization of various applications and the rationale behind the different paradigms for data processing. Then, we will discuss scaling requirements in each paradigm and provide an understanding of the main issues currently considered to be the limiting factors for further application scaling. Finally, we will focus on current research trends and approaches that address some of these issues.

Scaling the I/O stack on multicore processors (Angelos Bilas):

This presentation will first cover the background required to understand what happens when an I/O request travels from the application all the way to the storage devices, and the purpose of the main functions and layers in the I/O stack. Next, the presentation will discuss projections on basic trends and how they will affect persistent I/O performance. Finally, the presentation will focus on scaling issues over multicore processors that were not taken into account when traditional I/O stacks were designed but are now emerging as major problems for future I/O architectures and systems.

Storage I/O parallelism and heterogeneity (Toni Cortes):

Initially, magnetic disks were directly connected to the processing board, forming a simple and well-known hierarchy (disk, buffer cache, main memory). Since then, progress in several storage-related technologies has significantly changed this hierarchy: i) network technology that has allowed disks to be placed farther from the CPU, ii) parallel storage systems built from several disk devices (such as RAID systems) that offer high bandwidth but reduce reliability, iii) new storage technologies such as SSDs or Storage Class Memories (SCM) that have much lower latencies and higher bandwidths but at the cost of capacity, and iv) new cloud storage techniques. In addition, the understanding that not all stored data has the same importance has also guided some changes in this hierarchy. This presentation will discuss how all these technological and conceptual changes have affected the storage hierarchy over time and what new changes we should expect in the near and mid-term future.

System virtualization: approaches and overhead (Bilha Mendelson and Muli Ben-Yehuda):

Server virtualization has been widely adopted by the market and the number of servers running virtual machines is increasing daily. As machine virtualization gains popularity, the hypervisor itself, along with its management stack, becomes a basic and required part of the system. The next natural evolution phase in the virtualization abstraction chain is to view the hypervisor as part of the user workload and to be able to run multiple hypervisors inside virtual machines, each with its own set of nested guest virtual machines. A key question with virtualization in general, and nested virtualization in particular, is how to run virtual machines efficiently, i.e., with minimal run-time overhead. The talk will begin with the basics of single-level x86 virtualization, covering approaches for virtualizing the CPU, MMU, and I/O devices. We will then present an analysis of nested virtualization on hardware platforms with only a single level of hardware virtualization support, such as the x86 platform, and the Turtles project, the first implementation of high-performance nested virtualization on Intel x86-based systems. We will cover Device Assignment and Exitless Interrupts, two approaches we have developed for reducing the overhead of I/O virtualization. With device assignment and exitless interrupts, virtual machines can achieve 97%-100% of bare-metal performance even for the most demanding I/O-intensive workloads.

Tutorial Presenters

Dr. Bilha Mendelson has been a member of the Haifa Research Laboratory in Haifa for several years. She worked on avionic real-time systems at Elbit Ltd. before attending graduate school. In 1990 she joined the Haifa Research Laboratory. She has been developing compiler optimizations for DSPs and for other IBM architectures. Currently she is a senior manager of the System Optimization and Quality Technologies department. She received a B.Sc. and an M.Sc. in computer science from the Technion - Israel Institute of Technology, Haifa, and a Ph.D. in computer engineering from the University of Massachusetts at Amherst. She holds several patents, primarily in the area of code optimization. Her areas of interest include code optimization algorithms, compiler technology, computer architecture, and performance improvement issues.

Muli Ben-Yehuda is a systems researcher at IBM Research -- Haifa. He is a recognized expert in the area of machine and I/O virtualization, an IBM Master Inventor, and the recipient of the OSDI 2010 Jay Lepreau Best Paper Award for the paper "The Turtles Project: Design and Implementation of Nested Virtualization". He has co-authored over twenty publications and holds several US patents. His research interests include machine virtualization, cloud computing, operating system and hypervisor design and implementation, and I/O virtualization. He has contributed code and ideas to many operating systems and hypervisors, including the Linux kernel, and the Xen and KVM hypervisors. He holds a B.A. in Computer Science (cum laude) from the Open University of Israel and is currently pursuing a Ph.D. in Computer Science at the Technion – Israel Institute of Technology.

Prof. Angelos Bilas received his diploma in Computer Engineering from the University of Patras in 1993, and his M.S. and Ph.D. degrees in Computer Science from Princeton University, NJ, in 1995 and 1998 respectively. Prof. Bilas is currently an Associate Professor at FORTH-ICS and the University of Crete, Greece. From 1998 to 2002 he held an Assistant Professor position with the ECE Department at the University of Toronto. His current work focuses on the architectural layer and all software layers (firmware, operating systems, user-level applications) of computer systems, as well as the interactions between these layers. During 2000-2001, he was part of the senior technical team at Emphora, Inc., a startup company located in Princeton, NJ, USA. Emphora built high-end storage subsystems for database and file servers, based on novel, scalable, high-performance interconnect technologies and I/O communication protocols. His current interests include architectures and runtime-system support for scalable storage systems, communication protocols, and miniaturisation of computer systems. His work has been published in prestigious conferences in computer architecture and systems (ISCA, HPCA, ASPLOS). Prof. Bilas is the recipient of a Marie Curie Excellent Teams Award (2005-2009). He regularly participates in program committees of top conferences in his area and is an Associate Editor for IEEE Computer Architecture Letters.

Prof. Toni Cortes is the manager of the storage-system group at the BSC (since 2006) and is also an associate professor at Universitat Politècnica de Catalunya (since 1998). He received his M.S. in computer science in 1992 and his Ph.D., also in computer science, in 1997 (both at Universitat Politècnica de Catalunya). Since 1992, Toni has been teaching operating system and computer architecture courses at the Barcelona school of informatics (UPC), and from 2000 to 2004 he also served as Vice Dean for international affairs at the same school. His research concentrates on storage systems, programming models for scalable distributed systems, and operating systems. He has published 10 journal papers (plus 7 Lecture Notes), 48 papers in international conferences and workshops, and 2 book chapters, and has co-edited one book on mass storage systems. In addition, he has advised 6 PhD theses since 1997. Dr. Cortes has been involved in several EU projects (Paros, Nanos, POP, and XtreemOS) and has also participated in cooperation with IBM (TJW research lab) on scalability issues both for MPI and UPC. He is also editor of the Cluster Computing Journal and the coordinator of the SSI task in the IEEE TCSS. He has served on many international conference program committees and/or organising committees and was general chair for the Cluster 2006 conference. His involvement in IEEE CS was recognized with a "Certificate of appreciation" in 2007.

Prof. Ricardo Jimenez-Peris, co-director of LSD, has over fifteen years of experience in distributed systems research and technology transfer activities. He is currently an Associate Professor of Computer Science at UPM. He obtained his PhD degree in Computer Science from UPM, where he was distinguished with one of the best PhD awards of the academic year. He was a postdoc researcher at ETH Zurich during late 1999 and 2000. He has filed several patents, published over 100 research papers, and is co-author of the book “Database Replication” published by Morgan & Claypool. His papers have been published in prestigious distributed systems and database journals such as ACM Trans. on Computer Systems, ACM Trans. on Database Systems, VLDB Journal, etc. and conferences such as SIGMOD, ICDCS, WWW, Middleware, SRDS, etc. He has served in several roles for international conferences, such as General Chair for SRDS 2011, PC chair for EDCC 2009, Tutorial Chair for LADC 2008, Workshop Chair for ICDCS 2006, and as PC member for VLDB, ICDCS, ICDE, DSN, SRDS, etc. He has also been invited as keynote speaker at the Chilean Computer Science conference 2011 and VECPAR 2006, and as invited speaker at the Microsoft Research DB seminar in 2011 and the Oracle Research and Technical Seminar in 2005. He is very active in European research projects, serving or having served as technical coordinator of the FP7 CumuloNimbo project, coordinator of the FP7 Stream project, research director of the FP7 NEXOF-RA project, editor-in-chief of the strategic research agenda of NESSI-Grid, and coordinator of the Adapt FP5 project, and being involved in the IOLanes and 4Caast projects.

Contract Info

- Contract Number: 248615
- Starting date: 2010/01/01
- Duration: 36 months
- Total cost: €4.260.426
- Funding: €2.718.000

Seventh Framework Programme