Building global and scalable systems with Atomic Multicast

Staff - Faculty of Informatics

Date: 29 January 2018 / 14:30 - 16:00

USI Lugano Campus, room SI-003, Informatics building (Via G. Buffi 13)


You are cordially invited to attend the PhD Dissertation Defense of Samuel BENZ on Monday, January 29th 2018 at 14h30 in room SI-003 (Informatics building) 


Abstract:

The rise of worldwide Internet-scale services demands large distributed systems. Indeed, when handling several millions of users, it is common to operate thousands of servers spread across the globe. Here, replication plays a central role, as it contributes to improve the user experience by hiding failures and by providing acceptable latency. In this thesis, we claim that atomic multicast, with strong and well-defined properties, is the appropriate abstraction to efficiently design and implement globally scalable distributed systems.

Internet-scale services rely on data partitioning and replication to provide scalable performance and high availability. Moreover, to reduce user-perceived response times and tolerate disasters (i.e., the failure of a whole datacenter), services are increasingly becoming geographically distributed. Data partitioning and replication, combined with local and geographical distribution, introduce daunting challenges, including the need to carefully order requests among replicas and partitions. One way to tackle this problem is to use group communication primitives that encapsulate order requirements.

While replication is a common technique used to design such reliable distributed systems, to cope with the requirements of modern cloud based “always-on'' applications, replication protocols must additionally allow for throughput scalability and dynamic reconfiguration, that is, on-demand replacement or provisioning of system resources. We propose a dynamic atomic multicast protocol which fulfills these requirements. It allows to dynamically add and remove resources to an online replicated state machine and to recover crashed processes.

Major efforts have been spent in recent years to improve the performance, scalability and reliability of distributed systems. In order to hide the complexity of designing distributed applications, many proposals provide efficient high-level communication abstractions. Since the implementation of a production-ready system based on this abstraction is still a major task, we further propose to expose our protocol to developers in the form of distributed data structures. B-trees for example, are commonly used in different kinds of applications, including database indexes or file systems. Providing a distributed, fault-tolerant and scalable data structure would help developers to integrate their applications in a distribution transparent manner.

This work describes how to build reliable and scalable distributed systems based on atomic multicast and demonstrates their capabilities by an implementation of a distributed ordered map that supports dynamic re-partitioning and fast recovery. To substantiate our claim, we ported an existing SQL database atop of our distributed lock-free data structure.

Dissertation Committee:

  • Prof. Fernando Pedone, Università della Svizzera italiana, Switzerland (Research Advisor)
  • Prof. Antonio Carzaniga, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Robert Soulé, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Alysson Bessani, University of Lisbon, Portugal (External Member)
  • Prof. Benoît Garbinato, University of Lausanne, Switzerland (External Member)