Scaling Strongly Consistent Replicated Systems

Staff - Faculty of Informatics

Date: 8 February 2023 / 10:30 - 12:00

Online

You are cordially invited to attend the PhD Dissertation Defence of Leandro Pacheco de Sousa on Wednesday 8 February 2023 at 10:30 on Teams.

Abstract:
With the recent rise of cloud infrastructure, geographically distributed applications have become a reality. Application developers can easily place servers close to end users, even if those users are spread around the globe. Still, designing applications that work correctly and perform well at such a large scale is a difficult endeavor, involving many tradeoffs. One such tradeoff is that between consistency and availability. While relaxing application guarantees might allow for better performance and availability, deciding which and how guarantees should be relaxed is a complex task, subject to application semantics and user expectations. On the other hand, strong consistency allows for developers to focus on core business logic and end users to interact with a more transparent system. Reliability in a distributed application is achieved through replication. Full replication, where each server stores the whole application state, does not scale, as every server needs to apply every update. By partitioning the application state and letting only a subset of servers store it, performance can scale if the workload allows. Atomic multicast is a communication abstraction that can serve as a fundamental building block for partitioned applications. It allows for requests to be reliably sent to one or more groups of destinations, and ensures a partial ordering of deliveries, a property fundamental to consistent and scalable systems. In this thesis, we focus on exploiting the atomic multicast abstraction to build strongly consistent geographically distributed applications. To that end, we first explore the design of a geographically scalable file system, GlobalFS. We then focus on the design of a latency efficient atomic multicast protocol, PrimCast. Finally, we propose linearizable atomic multicast, a stronger version of atomic multicast. GlobalFS is a POSIX-like distributed file system that provides strong consistency guarantees and scales geographically by allowing for fast local operations while still providing consistent operations over the whole system. GlobalFS builds upon atomic multicast, providing four execution modes which are then used to execute each file system operation. We describe and implement a prototype of GlobalFS which we then use to validate the approach in a global deployment on Amazon's EC2. PrimCast is an atomic multicast protocol that allows for message delivery in three communication steps at any destination. PrimCast is genuine, that is, only sender and destinations take steps to deliver a message, a critical property in large scale deployments. We present the complete algorithm and proof of correctness for PrimCast. We also show how loosely synchronized clocks can be used to reduce the convoy effect that further delays messages under high system load. We implement a prototype of PrimCast and evaluate its performance under various scenarios. Linearizable atomic multicast is an atomic multicast that provides linearizability for applications without the need for extra coordination among replicas. We show why classic atomic multicast by itself is not enough to ensure linearizability, describe a stronger ordering property that fixes the issue and show how a classic atomic multicast protocol can be modified to provide the stronger property.

Dissertation Committee:

  • Prof. Fernando Pedone, Università della Svizzera italiana, Switzerland (Research Advisor)
  • Prof. Patrick Thomas Eugster, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Cesare Pautasso, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Toni Cortes, Universitat Politècnica de Catalunya, Spain (External Member)
  • Prof. Pascal Felber, Université de Neuchâtel, Switzerland (External Member)
  • Prof. Rüdiger Kapitza, TU Braunschweig, Germany (External Member)