High Performance Deferred Update Replication

Staff - Faculty of Informatics

Start date: 17 March 2015

End date: 18 March 2015

You are cordially invited to attend the PhD Dissertation Defense of Daniele SCIASCIA on Tuesday, March 17th 2015 at 16h30 in room A33 (Red building)

Abstract:
Replication is a well-known approach to implementing storage systems that can tolerate failures. Replicated storage systems are designed such that the state of the system is kept at several replicas. A replication protocol ensures that the failure of a replica is masked by the rest of the system, in a way that is transparent to its users. Replicated storage systems are among the most important building blocks in the design of large scale applications. Applications at scale are often deployed on top of commodity hardware, store a vast amount of data, and serve a large number of users. The larger the system, the higher its vulnerability to failures. The ability to tolerate failures is not the only desirable feature in a replicated system. Storage systems need to be efficient in order to accommodate requests from a large user base while keeping low response times. In that respect, replication can leverage multiple replicas to parallelize the execution of user requests.

This thesis focuses on Deferred Update Replication (DUR), a well-established database replication approach. It provides high availability in that every replica can execute client transactions. In terms of performance, it is better than other replication techniques in that only one replica executes a given transaction while the other replicas only apply state changes.
However, DUR suffers from the following drawback: each replica stores a full copy of the database, which has consequences in terms of performance.

The first consequence is that DUR cannot take advantage of the aggregated memory available to the replicas. Our first contribution is a distributed caching mechanism that addresses the problem. It makes efficient use of the main memory of an entire cluster of machines, while guaranteeing strong consistency.

The second consequence is that DUR cannot scale with the number of replicas. The throughput of a fully replicated system is inherently limited by the number of transactions that a single replica can apply to its local storage. We propose a scalable version of the DUR approach where the system state is partitioned in smaller replica sets. Transactions that access disjoint partitions are parallelized.

The last part of the thesis focuses on latency. We show that the scalable DUR-based approach may have detrimental effects on response time, especially when replicas are geographically distributed. The thesis considers different deployments and their implications on latency. We propose optimizations that provide substantial gains in geographically distributed environments.

 

Dissertation Committee:

  • Prof. Fernando Pedone, Università della Svizzera italiana, Switzerland (Research Advisor)
  • Prof. Mehdi Jazayeri, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Olaf Schenk, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Jose Enrique Armendariz-Inigo, UPNa, Spain (External Member)
  • Prof. Luis Rodrigues, INESC-ID, Portugal (External Member)
  • Prof. Willy Zwaenepoel, EPFL, Switzerland (External Member)