Scalable State Machine Replication

Staff - Faculty of Informatics

Start date: 16 February 2016

End date: 17 February 2016

You are cordially invited to attend the PhD Dissertation Defense of Eduardo BEZERRA on Tuesday, February 16th 2016 at 14h30 in room SI-008 (Informatics building)

Abstract:
Redundancy provides fault-tolerance. A service can run on multiple servers that replicate each other, in order to provide service availability even in the case of crashes. A way to implement such a replicated service is by using techniques like state machine replication (SMR). SMR provides fault tolerance, while being linearizable, that is, clients cannot distinguish the behaviour of the replicated system to that of a single-site, unreplicated one. However, having a fully replicated, linearizable system comes at a cost, namely, scalability - by scalability we mean that adding servers will always increase the maximum system throughput, at least for some workloads. Even with a careful setup and using optimizations that avoid unnecessary redundant actions to be taken by servers, at some point the throughput of a system replicated with SMR cannot be increased by additional servers; in fact, adding replicas may even degrade performance. A way to achieve scalability is by partitioning the service state and then allowing partitions to work independently. Having a partitioned, yet linearizable and reasonably performant service is not trivial, and this is the topic of research addressed here.

To allow systems to scale, while at the same time ensuring linearizability, we propose and implement the following ideas: (i) Scalable State Machine Replication (S-SMR), (ii) Optimistic Atomic Multicast (Opt-amcast), and (iii) Fast S-SMR (Fast-SSMR). S-SMR is an execution model that allows the throughput of the system to scale linearly with the number of servers without sacrificing consistency. To provide faster responses for commands, we developed Opt-amcast, which allows messages to be delivered twice: one delivery guarantees atomic order (conservative delivery), while the other is fast, but not always guarantees atomic order (optimistic delivery). The implementation of Opt-amcast that we propose is called Ridge, a protocol that combines low latency with high throughput. Fast-SSMR is an extension of S-SMR that uses the optimistic delivery of Opt-amcast: while a command is atomically ordered, some precomputation can be done based on its fast, optimistically ordered delivery, improving response time.

Dissertation Committee:

  • Prof. Fernando Pedone, Università della Svizzera italiana, Switzerland (Research Advisor)
  • Prof. Mehdi Jazayeri, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Nate Nystrom, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Benoît Garbinato, Université de Lausanne, Switzerland (External Member)
  • Prof. Rui Oliveira, Universidade do Minho, Portugal (External Member)