Antonio Carzaniga

Publications - Abstracts

Forwarding and Routing with Packet Subscriptions

T. Jepsen, A. Fattaholmanan, M. Moshref, N. Foster, A. Carzaniga, and R. Soulé

In this paper, we explore how programmable data planes can naturally provide a higher level of service to user applications via a new abstraction called packet subscriptions. Packet subscriptions generalize forwarding rules, and can be used to express both traditional routing and more esoteric, content-based approaches. We present strategies for routing with packet subscriptions in which a centralized controller has a global view of the network, and the network topology has a hierarchical or general structure. We also describe a compiler for packet subscriptions that uses a novel BDD-based algorithm to efficiently translate predicates into P4 tables that can support O(100K) expressions. Using our system, we have built eight diverse applications. We show that these applications can be deployed in brownfield networks while performing line-rate message processing, using the full switch bandwidth of 6.5Tbps.
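To make the abstraction concrete, here is a minimal Python sketch (illustrative only, not the paper's P4 pipeline; all field names and values are hypothetical) showing how a packet subscription, a predicate over packet fields, generalizes an exact-match forwarding rule:

    # Illustrative sketch: packet subscriptions as conjunctions of
    # constraints over named packet fields. A classic exact-match
    # forwarding rule is the special case of one equality constraint.
    import operator

    OPS = {"==": operator.eq, "!=": operator.ne,
           "<": operator.lt, "<=": operator.le,
           ">": operator.gt, ">=": operator.ge}

    class Subscription:
        def __init__(self, port, constraints):
            self.port = port               # output port for matching packets
            self.constraints = constraints # list of (field, op, value)

        def matches(self, packet):
            for field, op, value in self.constraints:
                v = packet.get(field)
                if v is None or not OPS[op](v, value):
                    return False
            return True

    def forward(packet, subscriptions):
        """Return the output ports whose subscriptions match the packet."""
        return {s.port for s in subscriptions if s.matches(packet)}

    subs = [
        Subscription(1, [("ip.dst", "==", "10.0.0.2")]),    # classic routing
        Subscription(2, [("itch.msg_type", "==", "A"),      # content-based
                         ("itch.price", "<", 150000)]),
    ]
    print(forward({"ip.dst": "10.0.0.2", "itch.msg_type": "A",
                   "itch.price": 99000}, subs))             # -> {1, 2}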


MeMo: Automatically identifying metamorphic relations in Javadoc comments for test automation

A. Blasi, A. Gorla, M.D. Ernst, M. Pezzè, and A. Carzaniga

Software testing depends on effective oracles. Implicit oracles, such as checks for program crashes, are widely applicable but narrow in scope. Oracles based on formal specifications can reveal application-specific failures, but specifications are expensive to obtain and maintain. Metamorphic oracles are somewhere in-between. They test equivalence among different procedures to detect semantic failures. Until now, the identification of metamorphic relations has been a manual and expensive process, except for a few specific domains where automation is possible. We present MeMo, a technique and a tool to automatically derive metamorphic equivalence relations from natural language documentation, and we use such metamorphic relations as oracles in automatically generated test cases. Our experimental evaluation demonstrates that 1) MeMo can effectively and precisely infer equivalence metamorphic relations, 2) MeMo complements existing state-of-the-art techniques that are based on dynamic program analysis, and 3) metamorphic relations discovered with MeMo effectively detect defects when used as test oracles in automatically generated or manually written test cases.
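As a toy illustration of the core idea (not MeMo's actual pipeline; the class and the documented equivalence below are invented), consider extracting an equivalence stated in a doc comment and using it to cross-check two executions:

    # 1) Spot an equivalence stated in documentation,
    # 2) use it as a test oracle by cross-checking the two calls.
    import re

    def extract_equivalence(doc_comment):
        """Return (lhs, rhs) if the comment states 'X is equivalent to Y'."""
        m = re.search(r"(\w+\(.*?\))\s+is equivalent to\s+(\w+\(.*?\))",
                      doc_comment)
        return m.groups() if m else None

    # Hypothetical documentation for a method add(x):
    doc = "add(x) is equivalent to sub(-x)."
    print(extract_equivalence(doc))   # -> ('add(x)', 'sub(-x)')

    class Calculator:
        def __init__(self): self.total = 0
        def add(self, x): self.total += x; return self.total
        def sub(self, x): self.total -= x; return self.total

    def metamorphic_oracle(x):
        """Cross-check the documented equivalence on a generated input."""
        a, b = Calculator(), Calculator()
        assert a.add(x) == b.sub(-x), f"equivalence violated for x={x}"

    for x in (0, 1, -7, 42):
        metamorphic_oracle(x)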


P4 Weaver: Supporting Modular and Incremental Programming in P4

A. Fattaholmanan, A. Carzaniga, M. Baldi, and R. Soulé

In this paper, we introduce P4 Weaver, an approach to bringing modularity to the P4 language. P4 Weaver is designed to merge new data plane features into a base program in a principled and controlled way, so as to preserve the reliability of the switch. We also present an architecture for an integrated development environment that supports modular P4 programming while also safeguarding the intellectual property of the vendor code. We demonstrate the utility of P4 Weaver by adding three popular but non-trivial protocols to a P4 switch. We show that modularity is indeed beneficial and that P4 Weaver supports modularity efficiently and reliably.


Forwarding and Routing with Packet Subscriptions

T. Jepsen, A. Fattaholmanan, M. Moshref, N. Foster, A. Carzaniga, and R. Soulé

In this paper, we explore how programmable data planes can naturally provide a higher level of service to user applications via a new abstraction called packet subscriptions. Packet subscriptions generalize forwarding rules, and can be used to express both traditional routing and more esoteric, content-based approaches. We present strategies for routing with packet subscriptions in which a centralized controller has a global view of the network, and the network topology is organized as a hierarchical structure. We also describe a compiler for packet subscriptions that uses a novel BDD-based algorithm to efficiently translate predicates into P4 tables that can support O(100K) expressions. Using our system, we have built three diverse applications. We show that these applications can be deployed in brownfield networks while performing line-rate message processing, using the full switch bandwidth of 6.5Tbps.


Analyzing System Performance with Probabilistic Performance Annotations

D. Rogora, A. Carzaniga, A. Diwan, M. Hauswirth, and R. Soulé

To understand, debug, and predict the performance of complex software systems, we develop the concept of probabilistic performance annotations. In essence, we annotate components (e.g., methods) with a relation between a measurable performance metric, such as running time, and one or more features of the input or the state of that component. We use two forms of regression analysis: regression trees and mixture models. Such relations can capture non-trivial behaviors beyond the more classic algorithmic complexity of a component. We present a method to derive such annotations automatically by generalizing observed measurements. We illustrate the use of our approach on three complex systems—the ownCloud distributed storage service, the MySQL database system, and the x264 video encoder library and application—producing non-trivial characterizations of their performance. Notably, we isolate a performance regression and identify the root cause of a second performance bug in MySQL.
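As a minimal illustration of the first form of regression (synthetic data and thresholds, not measurements from the paper), one can fit a regression tree that relates an input-size feature to running time:

    # Deriving a performance annotation with a regression tree (sketch).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    n = rng.integers(1, 10_000, size=2_000)   # feature: input size
    # Hypothetical component: fast path below a threshold, O(n log n) above.
    t = np.where(n < 4_000, 5.0, 0.002 * n * np.log2(n)) \
        + rng.normal(0, 1, n.size)

    tree = DecisionTreeRegressor(max_leaf_nodes=4).fit(n.reshape(-1, 1), t)
    # The fitted tree is the annotation: a piecewise relation between the
    # input-size feature and the expected running time (e.g., in ms).
    for size in (100, 3_999, 4_001, 9_000):
        print(size, tree.predict([[size]])[0])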


Packet Subscriptions for Programmable ASICs

T. Jepsen, M. Moshref, A. Carzaniga, N. Foster, and R. Soulé

In this paper, we explore how programmable data planes can provide a higher level of service to user applications via a new abstraction called packet subscriptions. Packet subscriptions generalize forwarding rules, and can be used to express both traditional routing and more esoteric, content-based approaches. We describe a compiler for packet subscriptions that uses a novel BDD-based algorithm to efficiently translate predicates into P4 tables that can support O(100K) expressions. Using our compiler, we have built a proof-of-concept pub/sub financial application for splitting market feeds (e.g., Nasdaq's ITCH protocol) with line-rate message processing, using the full switch bandwidth of 6.5Tbps.


Life in the Fast Lane: A Line-Rate Linear Road

T. Jepsen, M. Moshref, A. Carzaniga, N. Foster, and R. Soulé

This paper explores the question: what abstractions are needed to support a more general form of stateful processing in programmable forwarding planes? It argues that we should look for clues from the domain of stream processing. As a case study, it describes an implementation of the Linear Road benchmark for stream processing systems written in P4. The artifact of our implementation, which runs on a programmable ASIC, provides a version of the benchmark that far exceeds the throughput of any prior work. More importantly, the experience provides perspective on the challenges for implementing stateful abstractions in P4.


Performance Annotations for Cloud Computing

D. Rogora, S. Smolka, A. Carzaniga, A. Diwan, and R. Soulé

Web services and applications are complex systems. Layers of abstraction and virtualization allow flexible and scalable deployment. But they also introduce complications if one wants predictable performance and easy troubleshooting. We propose to support the designers, testers, and maintainers of such systems by annotating system components with performance models. Our goal is to formulate annotations that can be used as oracles in performance testing, that can provide valuable guidance for debugging, and that can also inform designers by predicting the performance profile of an assembly of annotated components. We present an initial formulation of such annotations together with their concrete derivation from the execution of a complex web service.


High-Throughput Subset Matching on Commodity GPU-Based Systems

D. Rogora, M. Papalini, K. Khazaei, A. Margara, A. Carzaniga, and G. Cugola

Large-scale information processing often relies on subset matching for data classification and routing. Examples are publish/subscribe and stream processing systems, database systems, social media, and information-centric networking. For instance, an advanced Twitter-like messaging service where users might follow specific publishers as well as specific topics encoded as tag sets must join a stream of published messages with the users and their preferred tag sets so that the user tag set is a subset of the message tags.

Subset matching is an old but also notoriously difficult problem. We present TagMatch, a system that solves this problem by taking advantage of a hybrid CPU/GPU stream processing architecture. TagMatch targets large-scale applications with thousands of matching operations per second against hundreds of millions of tag sets. We evaluate TagMatch on an advanced message streaming application, with very positive results both in absolute terms and in comparison with existing systems. As a notable example, our experiments demonstrate that TagMatch running on a single, commodity machine with two GPUs can easily sustain the traffic throughput of Twitter even augmented with expressive tag-based selection.
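For reference, the subset-matching semantics itself is simple to state; the naive sketch below (illustrative only) spells it out, while TagMatch's contribution is performing it at scale on a hybrid CPU/GPU pipeline:

    # Reference semantics of subset matching: deliver a message to each
    # user who has at least one tag set contained in the message's tags.
    subscriptions = {
        "alice": [frozenset({"music"}), frozenset({"rock", "live"})],
        "bob":   [frozenset({"rock", "zurich"})],
    }

    def match(message_tags):
        msg = frozenset(message_tags)
        return {user for user, tagsets in subscriptions.items()
                if any(ts <= msg for ts in tagsets)}

    print(match({"rock", "live", "zurich"}))  # -> {'alice', 'bob'}
    print(match({"jazz", "zurich"}))          # -> set()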


High Throughput Forwarding for ICN with Descriptors and Locators

M. Papalini, K. Khazaei, A. Carzaniga, and D. Rogora

Application-defined and location-independent addressing is a founding principle of information-centric networking (ICN) that is inherently difficult to realize if one also wants scalable routing and forwarding. We propose an ICN architecture, called TagNet, intended to combine expressive application-defined addressing with scalable routing and forwarding. TagNet features two independent delivery services: one with application-defined and possibly location-independent content descriptors, and one with network-defined host locators. In this paper we develop and evaluate specialized forwarding algorithms for TagNet. We then implement and combine these algorithms in a forwarding engine built on a general-purpose commodity CPU, and show experimentally that, thanks to the dual addressing, by descriptor or by locator, this engine can achieve a throughput of over 20Gbps with large forwarding tables corresponding to hundreds of millions of users.


Automatic Workarounds: Exploiting the Intrinsic Redundancy of Web Applications

A. Carzaniga, A. Gorla, N. Perino, and M. Pezzè

Despite the best intentions, the competence, and the rigorous methods of designers and developers, software is often delivered and deployed with faults. To cope with imperfect software, researchers have proposed the concept of self-healing for software systems. The ambitious goal is to create software systems capable of detecting and responding “autonomically” to functional failures, or perhaps even preempting such failures, to maintain a correct functionality, possibly with acceptable degradation. We believe that self-healing can only be an expression of some form of redundancy, meaning that, to automatically fix a faulty behavior, the correct behavior must be already present somewhere, in some form, within the software system either explicitly or implicitly. One approach is to deliberately design and develop redundant systems, and in fact this kind of deliberate redundancy is the essential ingredient of many fault tolerance techniques. However, this type of redundancy is also generally expensive and does not always satisfy the time and cost constraints of many software projects.

With this article we take a different approach. We observe that modern software systems naturally acquire another type of redundancy that is not introduced deliberately but rather arises intrinsically as a by-product of modern modular software design. We formulate this notion of intrinsic redundancy and we propose a technique to exploit it to achieve some level of self-healing. We first demonstrate that software systems are indeed intrinsically redundant. Then we develop a way to express and exploit this redundancy to tolerate faults with automatic workarounds. In essence, a workaround amounts to replacing some failing operations with alternative operations that are semantically equivalent in their intended effect, but that execute different code and ultimately avoid the failure. The technique we propose finds such workarounds automatically. We develop this technique in the context of Web applications. In particular, we implement this technique within a browser extension, which we then use in an evaluation with several known faults and failures of three popular Web libraries. The evaluation demonstrates that automatic workarounds are effective: out of the nearly 150 real faults we analyzed, 100 could be overcome with automatic workarounds, and half of those workarounds were not publicly known before.


Measuring Software Redundancy

A. Carzaniga, A. Mattavelli, and M. Pezzè

Redundancy is the presence of different elements with the same functionality. In software, redundancy is useful (and used) in many ways, for example for fault tolerance and reliability engineering, and in self-adaptive and self-checking programs. However, despite the many uses, we still do not know how to measure software redundancy to support a proper and effective design. If, for instance, the goal is to improve reliability, one might want to measure the redundancy of a solution to then estimate the reliability gained with that solution. Or one might compare alternative solutions to choose the one that expresses more redundancy and therefore, presumably, more reliability.

We first formalize a notion of redundancy whereby two code fragments are considered redundant when they achieve the same functionality with different executions. On the basis of this abstract and general notion, we then develop a concrete method to obtain a meaningful quantitative measure of software redundancy. The results we obtain are very positive: we show, through an extensive experimental analysis, that it is possible to distinguish code that is only minimally different, from truly redundant code, and that it is even possible to distinguish low-level code redundancy from high-level algorithmic redundancy. We also show that the measurement is significant and useful for the designer, as it can help predict the effectiveness of techniques that exploit redundancy.


Measuring the Mixing Time of a Network

X. Foukas, A. Carzaniga, and A.L. Wolf

Mixing time is a global property of a network that indicates how fast a random walk gains independence from its starting point. Mixing time is an essential parameter for many distributed algorithms, but especially those based on gossip. We design, implement, and evaluate a distributed protocol to measure mixing time. The protocol extends an existing algorithm that models the diffusion of information seen from each node in the network as the impulse response of a particular dynamic system. In its original formulation, the algorithm was susceptible to topology changes (or "churn") and was evaluated only in simulation. Here we present a concrete implementation of an enhanced version of the algorithm that exploits multiple parallel runs to obtain a robust measurement, and evaluate it using a network testbed (Emulab) in combination with a peer-to-peer system (FreePastry) to assess both its performance and its ability to deal with network churn.


End-to-End Congestion Control for Content-Based Networks

A. Malekpour, A. Carzaniga, and F. Pedone

Publish/subscribe or "push" communication has been proposed as a new network service. In particular, in a content-based network, messages sent by publishers are delivered to subscribers based on the message content and on subscribers’ long-term interests (subscriptions). In most systems that implement this form of communication, messages are treated as datagrams transmitted without end-to-end or in-network acknowledgments and without any form of flow control. In such systems, publishers do not avoid or even detect congestion, and brokers/routers respond to congestion by simply dropping overflowing messages. These systems are therefore unable to provide fair resource allocation and to properly handle traffic anomalies, and are thus not suitable for large-scale deployments. With this motivation, we propose an end-to-end congestion-control scheme for content-based networks. In particular, we propose a practical and effective congestion-control protocol that is also content-aware, meaning that it modulates specific content-based traffic flows along a congested path. Inspired by an existing rate-control scheme for IP multicast, this protocol uses an equation-based flow-control algorithm that reacts to congestion in a manner similar to and compatible with TCP. We demonstrate experimentally that the protocol improves fairness among concurrent data flows and also reduces message loss significantly.
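The standard instance of equation-based rate control is the TCP throughput equation used by TFRC (RFC 5348); the sketch below shows that equation for illustration, though the paper's protocol may instantiate the approach differently:

    # The classic TCP throughput equation behind equation-based rate
    # control such as TFRC (RFC 5348), shown for illustration only.
    from math import sqrt

    def tcp_friendly_rate(s, R, p, t_RTO=None, b=1):
        """Sending rate (bytes/s) for segment size s (bytes), round-trip
        time R (s), loss-event rate p, retransmission timeout t_RTO (s)."""
        if p <= 0:
            return float("inf")   # no observed loss: uncapped by the model
        t_RTO = t_RTO if t_RTO is not None else 4 * R
        denom = R * sqrt(2 * b * p / 3) + \
                t_RTO * min(1.0, 3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
        return s / denom

    print(tcp_friendly_rate(s=1460, R=0.1, p=0.01))  # a TCP-fair rate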


Scalable Routing for Tag-Based Information-Centric Networking

M. Papalini, A. Carzaniga, K. Khazaei, and A.L. Wolf

Routing in information-centric networking remains an open problem. The main issue is scalability. Traditional IP routing can be used with name prefixes, but it is believed that the number of prefixes will grow too large. A related problem is the use of per-packet in-network state (to cut loops and return data to consumers). We develop a routing scheme that solves these problems. The service model of our information-centric network supports information pull and push using tag sets as information descriptors. Within this service model, we propose a routing scheme that supports forwarding along multiple loop-free paths, aggregates addresses for scalability, does not require per-packet network state, and leads to near-optimal paths on average. We evaluate the scalability of our routing scheme, both in terms of memory and computational complexity, on the full Internet AS-level topology and on the internal networks of representative ASes using realistic distributions of content and users extrapolated from traces of popular applications. For example, a population of 500 million users requires a routing information base of 3.8GB with an almost flat growth and, in this case, a routing update (one content descriptor) can be processed in 2ms on commodity hardware. We conclude that information-centric networking is feasible, even with (or perhaps thanks to) addresses consisting of expressive content descriptors.
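One ingredient of the scalability, address aggregation, can be illustrated in isolation (a sketch of the idea, not TagNet's actual routing information base): under subset semantics a tag set matches every message carrying a superset of its tags, so only the minimal tag sets per route need to be stored:

    # Aggregation under subset semantics: any stored superset of a tag
    # set on the same route is redundant and can be dropped.
    def aggregate(tagsets):
        """Keep only the minimal tag sets (none contains another)."""
        minimal = []
        for s in sorted(map(frozenset, tagsets), key=len):
            if not any(m <= s for m in minimal):
                minimal.append(s)
        return minimal

    routes = [{"news"}, {"news", "sport"},
              {"news", "sport", "tennis"}, {"music"}]
    print(aggregate(routes))  # -> [frozenset({'news'}), frozenset({'music'})]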


Scalable Routing for Tag-Based Information-Centric Networking

M. Papalini, K. Khazaei, A. Carzaniga, and A.L. Wolf

Routing in information-centric networking remains an open problem. The main issue is scalability. Traditional IP routing can be used with name prefixes, but it is believed that the number of prefixes will grow too large. A related problem is the use of per-packet in-network state (to cut loops and return data to consumers). We develop a routing scheme that solves these problems. The service model of our information-centric network supports information pull and push using tag sets as information descriptors. Within this service model, we propose a routing scheme that supports forwarding along multiple loop-free paths, aggregates addresses for scalability, does not require per-packet network state, and leads to near-optimal paths on average. We evaluate the scalability of our routing scheme, both in terms of memory and computational complexity, on the full Internet AS-level topology and on the internal networks of representative ASes using realistic distributions of content and users extrapolated from traces of popular applications. For example, a population of 500 million users requires a routing information base of 3.8GB with an almost flat growth and, in this case, a routing update (one content descriptor) can be processed in less than 5ms on commodity hardware. We conclude that information-centric networking is feasible, even with (or perhaps thanks to) addresses consisting of expressive content descriptors.


Cross-Checking Oracles from Intrinsic Software Redundancy

A. Carzaniga, A. Goffi, A. Gorla, A. Mattavelli, and M. Pezzè

Despite the recent advances in automatic test generation, testers must still write test oracles manually. If formal specifications are available, it might be possible to use decision procedures derived from those specifications. We present a technique that is based on a form of specification but also leverages more information from the system under test. We assume that the system under test is somewhat redundant, in the sense that some operations are designed to behave like others but their executions are different. Our experience in this and previous work indicates that this redundancy exists and is easily documented. We then generate oracles by cross-checking the execution of a test with the same test in which we replace some operations with redundant ones. We develop this notion of cross-checking oracles into a generic technique to automatically insert oracles into unit tests. An experimental evaluation shows that cross-checking oracles, used in combination with automatic test generation techniques, can be very effective in revealing faults, and that they can even improve good hand-written test suites.
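As a toy illustration of the mechanism (an invented Python class; the paper targets Java units and real redundant APIs), a cross-checking oracle runs a test twice, once with an operation replaced by a redundant counterpart, and compares the observable outcomes:

    # Sketch of a cross-checking oracle over a redundant operation pair.
    class Stack:
        def __init__(self): self.items = []
        def push(self, x): self.items.append(x)
        def push_all(self, xs):          # redundant with repeated push()
            self.items.extend(xs)

    def run(variant):
        s = Stack()
        s.push(1)
        if variant == "original":
            s.push_all([2, 3])
        else:                             # redundant replacement
            for x in [2, 3]:
                s.push(x)
        return s.items                    # observable outcome of the test

    # The oracle: both executions must produce the same observable state.
    assert run("original") == run("replacement")
    print("cross-check passed")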


Is Information-Centric Multi-Tree Routing Feasible?

A. Carzaniga, K. Khazaei, M. Papalini, and A.L. Wolf

We have argued that an information-centric network should natively support publish/subscribe event notification in addition to on-demand content delivery. We have also argued that both primitives could use the same forwarding information base and, furthermore, that both primitives can easily support addresses that are more expressive than simple hierarchical names. In this paper we present a concrete routing scheme that realizes this: "push" as well as "pull" communication; anycast as well as multicast; and descriptor-based (as opposed to name-based) addressing. The scheme is founded on multiple tree covers that can be arranged and composed hierarchically following the structure of network domains. On each tree, the scheme combines addresses so as to reduce forwarding state. We demonstrate the feasibility and scalability of the scheme through simulations on Internet-scale workloads in realistic network settings.


Automatic Recovery from Runtime Failures

A. Carzaniga, A. Gorla, A. Mattavelli, N. Perino, and M. Pezzè

We present a technique to make applications resilient to failures. This technique is intended to maintain a faulty application functional in the field while the developers work on permanent and radical fixes. We target field failures in applications built on reusable components. In particular, the technique exploits the intrinsic redundancy of those components by identifying workarounds consisting of alternative uses of the faulty components that avoid the failure. The technique is currently implemented for Java applications but makes little or no assumptions about the nature of the application, and works without interrupting the execution flow of the application and without restarting its components. We demonstrate and evaluate this technique on four mid-size applications and two popular libraries of reusable components affected by real and seeded faults. In these cases the technique is effective, maintaining the application fully functional with between 19% and 48% of the failure-causing faults, depending on the application. The experiments also show that the technique incurs an acceptable runtime overhead in all cases.


Oblivious Low-Congestion Multicast Routing in Wireless Networks

A. Carzaniga, K. Khazaei, and F. Kuhn

We propose a routing scheme to implement multicast communication in wireless networks. The scheme is oblivious, compact, and completely decentralized. It is intended to support dynamic and diverse multicast requests typical of, for example, publish/subscribe and content-based communication. The scheme is built on top of a geographical routing layer. Each message is transmitted along the geometric minimum spanning tree that connects the source and all the destinations. Then, for each edge in this tree, the scheme routes a message through a random intermediate node, chosen independently of the set of multicast requests. The intermediate node is chosen in the vicinity of the corresponding edge such that congestion is reduced without stretching the routes by more than a constant factor. We first evaluate the scheme analytically, showing that it achieves a theoretically optimal level of congestion. We then evaluate the scheme in simulation, showing that its performance is also good in practice.
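The per-edge step can be sketched geometrically; the code below uses one illustrative choice of "vicinity" (a disk around the edge midpoint, an assumption for this sketch rather than the paper's exact construction), which spreads load while stretching each route by at most a constant factor:

    # Route each tree edge (u, v) through a random intermediate node
    # chosen near the edge, independently of the multicast requests.
    import math, random

    def random_intermediate(u, v, rng=random):
        """Pick a point uniformly in the disk centered at the midpoint of
        (u, v) with radius |uv|/2; routing then goes u -> w -> v, so the
        detour is at most a constant factor longer than |uv|."""
        mx, my = (u[0] + v[0]) / 2, (u[1] + v[1]) / 2
        radius = math.dist(u, v) / 2
        r = radius * math.sqrt(rng.random())   # uniform over the disk area
        theta = rng.uniform(0, 2 * math.pi)
        return (mx + r * math.cos(theta), my + r * math.sin(theta))

    print(random_intermediate((0.0, 0.0), (1.0, 0.0)))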


Fully Decentralized Estimation of Some Global Properties of a Network

A. Carzaniga, C. Hall, and M. Papalini

It is often beneficial to architect networks and overlays as fully decentralized systems, in the sense that any computation (e.g., routing or search) would only use local information, and no single node would have a complete view or control over the whole network. Yet sometimes it is also important to compute global properties of the network. In this paper we propose a fully decentralized algorithm to compute some global properties that can be derived from the spectrum of the network. More specifically, we compute the most significant eigenvalues of a descriptive matrix closely related to the adjacency matrix of the network graph. Such spectral properties can then lead to, for example, the "mixing time" of a network, which can be used to parametrize random walks and related search algorithms typical of peer-to-peer networks. Our key insight is to view the network as a linear dynamic system whose impulse response can be computed efficiently and locally by each node. We then use this impulse response to identify the spectral properties of the network. This algorithm is completely decentralized and requires only minimal local state and local communication. We show experimentally that the algorithm works well on different kinds of networks and in the presence of network instability.
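The key insight can be shown in miniature (centralized NumPy for brevity; the paper's algorithm computes the same impulse response locally at each node): treat the network as a linear system x(t+1) = Ax(t), observe the impulse response at one node, and read the dominant eigenvalue off its decay:

    import numpy as np

    # Small example graph (adjacency) and its random-walk matrix A.
    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 1],
                    [1, 1, 0, 1],
                    [0, 1, 1, 0]], dtype=float)
    A = adj / adj.sum(axis=1, keepdims=True)

    x = np.zeros(4); x[0] = 1.0          # impulse at node 0
    h = [x[0]]                           # impulse response seen at node 0
    for _ in range(60):
        x = A.T @ x                      # one diffusion step
        h.append(x[0])

    # The deviation from the stationary value decays as |lambda_2|^t,
    # and |lambda_2| determines the mixing time of a random walk.
    pi0 = adj[0].sum() / adj.sum()       # stationary probability at node 0
    d = np.abs(np.array(h) - pi0)
    t = np.arange(5, 40)
    lam2_est = np.exp(np.polyfit(t, np.log(d[t]), 1)[0])
    print(lam2_est, sorted(np.abs(np.linalg.eigvals(A)))[-2])  # both ~2/3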


A Content-Based Publish/Subscribe Matching Algorithm for 2D Spatial Objects

T. Konstantinidis, A. Carzaniga, and A.L. Wolf

An important concern in the design of a publish/subscribe system is its expressiveness, which is the ability to represent various types of information in publications and to precisely select information of interest through subscriptions. We present an enhancement to existing content-based publish/subscribe systems with support for a 2D spatial data type and eight associated relational operators, including those to reveal overlap, containment, touching, and disjointedness between regions of irregular shape. We describe an algorithm for evaluating spatial relations that is founded on a new dynamic discretization method and region-intersection model. In order to make the data type practical for large-scale applications, we provide an indexing structure for accessing spatial constraints and develop a simplification method for eliminating redundant constraints. Finally, we present the results of experiments evaluating the effectiveness and scalability of our approach.


Probabilistic FIFO Ordering in Publish/Subscribe Networks

A. Malekpour, A. Carzaniga, F. Pedone, and G. Toffetti Carughi

In a best-effort publish/subscribe network, publications may be delivered out of order (e.g., violating FIFO order). We contend that the primary cause of such ordering violations is the parallel matching and forwarding process employed by brokers to achieve high throughput. In this paper, we present an end-to-end method to improve event ordering. The method involves the receiver and minimally the sender, but otherwise uses the broker network as a black box. The idea is to analyze the dynamics of the network, and in particular to measure the delivery delay and its variation, which is directly related to out-of-order delivery. With these measures, receivers can determine a near-optimal latch time to defer message delivery upon the detection of a hole in the message sequence number. We evaluate the performance of this ordering scheme empirically in terms of the reduction in out-of-order deliveries, the delay imposed by the latch time, and its automatic adaptability to variable network conditions and input loads.
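The essence of the latch-time computation can be sketched as follows (a simplified stand-in for the paper's near-optimal derivation): measure the variation in delivery delay and, upon detecting a sequence-number hole, hold back delivery just long enough to cover most of that variation:

    # Choosing a latch (hold-back) time from measured delay variation.
    def latch_time(delays, coverage=0.99):
        """Given recent one-way delivery delays (seconds), return a
        hold-back time covering `coverage` of the observed delay spread."""
        d = sorted(delays)
        spread = [x - d[0] for x in d]   # variation over the minimum delay
        return spread[min(len(spread) - 1, int(coverage * len(spread)))]

    # On detecting a hole in the sequence numbers, a receiver defers
    # delivery for latch_time(...) seconds before giving up on the gap.
    delays = [0.050, 0.052, 0.049, 0.055, 0.120, 0.051, 0.053]
    print(latch_time(delays))  # -> 0.071, covering even the slowest message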


Content-Based Publish/Subscribe Networking and Information-Centric Networking

A. Carzaniga, M. Papalini, and A.L. Wolf

On-line information comes in different forms and is accessed in different ways and for different purposes. For example, a recording of Beethoven’s Ninth Symphony differs from a storm warning from the local weather service. Beethoven’s Ninth is a large media file with perpetual validity that is typically accessed on demand by users. By contrast, a storm warning is a small ephemeral message typically pushed by the weather service to all users in a specific geographic area. We argue that both should and would be well supported by an information-centric network. More specifically we argue three points. First, modern applications, reflecting the nature of human communications, use and transmit large and long-lived files as well as small ephemeral messages. Second, accessing those two types of information involves significantly different operations within the network. Third, despite their differences, both types of information would benefit from an addressing scheme based on content rather than on more or less flat identifiers, which means that both should be integrated to some extent within a unified content-based routing infrastructure.


End-to-End Reliability for Best-Effort Content-Based Publish/Subscribe Networks

A. Malekpour, A. Carzaniga, F. Pedone, and G. Toffetti Carughi

When it comes to reliability, there are two main categories of distributed publish/subscribe systems: reliable systems and best-effort systems. The former gives the highest priority to guaranteed and ordered delivery while the latter aims for high throughput and low delays. We propose a method to improve the delivery guarantees of the basic unreliable service offered by a best-effort publish/subscribe system. This method does not require any modification to the system's protocols or broker software, and instead simply uses the system's publish/subscribe API. The method is based on a technique, similar to reliable multicast, that enables subscribers to cooperatively recover lost messages. We experimentally demonstrate the effectiveness and performance of our recovery scheme in the presence of frequent message losses, and show that it enables subscribers to recover more than 70% of lost messages with minimal negative effects on the overall network performance.


Automatic Workarounds for Web Applications

A. Carzaniga, A. Gorla, N. Perino, and M. Pezzè

We present a technique that finds and executes workarounds for faulty Web applications automatically and at runtime. Automatic workarounds exploit the inherent redundancy of Web applications, whereby a functionality of the application can be obtained through different sequences of invocations of Web APIs. In general, runtime workarounds are applied in response to a failure, and require that the application remain in a consistent state before and after the execution of a workaround. Therefore, they are ideally suited for interactive Web applications, since those allow the user to act as a failure detector with minimal effort, and also either use read-only state or manage their state through a transactional data store. In this paper we focus on faults found in the access libraries of widely used Web applications such as Google Maps. We start by classifying a number of reported faults of the Google Maps and YouTube APIs that have known workarounds. From those we derive a number of general and API-specific program-rewriting rules, which we then apply to other faults for which no workaround is known. Our experiments show that workarounds can be readily deployed within Web applications, through a simple client-side plug-in, and that program-rewriting rules derived from elementary properties of a common library can be effective in finding valid and previously unknown workarounds.


Handling Software Faults with Redundancy

A. Carzaniga, A. Gorla, and M. Pezzè

Software engineering methods can increase the dependability of software systems, and yet some faults escape even the most rigorous and methodical development process. Therefore, to guarantee high levels of reliability in the presence of faults, software systems must be designed to reduce the impact of the failures caused by such faults, for example by deploying techniques to detect and compensate for erroneous runtime conditions. In this chapter, we focus on software techniques to handle software faults, and we survey several such techniques developed in the area of fault tolerance and more recently in the area of autonomic computing. Since practically all techniques exploit some form of redundancy, we consider the impact of redundancy on the software architecture, and we propose a taxonomy centered on the nature and use of redundancy in software systems. The primary utility of this taxonomy is to classify and compare techniques to handle software faults.


Practical High-Throughput Content-Based Routing Using Unicast State and Probabilistic Encodings

A. Carzaniga, C.P. Hall, G. Toffetti Carughi, and A.L. Wolf

We address the problem that existing publish/subscribe messaging systems, including such commonly used ones as Apache’s ActiveMQ and IBM’s WebSphere MQ, exhibit degraded end-to-end throughput performance in a wide-area network setting. We contend that the cause of this problem is the lack of an appropriate routing protocol. Building on the idea of a content-based network, we introduce a protocol called B-DRP that can demonstrably improve the situation. A content-based network is a content-based publish/subscribe system architected as a datagram network: a message is forwarded hop-by-hop and delivered to any and all hosts that have expressed interest in the message content. This fits well with the character of a wide-area messaging system. B-DRP is based on two main techniques: a message delivery mechanism that utilizes and exploits unicast forwarding state, which can be easily maintained using standard protocols, and a probabilistic data structure to efficiently represent and evaluate receiver interests. We present the design of B-DRP and the results of an experimental evaluation that demonstrates its support for improved throughput in a wide-area setting.
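The classic example of such a probabilistic encoding is a Bloom filter, sketched below; whether B-DRP uses exactly this structure is a detail of the paper, so this is only to illustrate the space/accuracy trade-off of representing receiver interests probabilistically:

    # A minimal Bloom filter: compact set membership with false positives.
    import hashlib

    class BloomFilter:
        def __init__(self, m=1024, k=4):
            self.m, self.k, self.bits = m, k, 0

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for p in self._positions(item):
                self.bits |= 1 << p

        def __contains__(self, item):    # may report false positives
            return all(self.bits >> p & 1 for p in self._positions(item))

    interests = BloomFilter()
    interests.add("topic=sports")
    # True, and almost surely False (false positives are possible):
    print("topic=sports" in interests, "topic=news" in interests)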


Uniform Sampling for Directed P2P Networks

C. Hall and A. Carzaniga

Selecting a random peer with uniform probability across a peer-to-peer (P2P) network is a fundamental function for unstructured search, data replication, and monitoring algorithms. Such uniform sampling is supported by several techniques. However, current techniques suffer from sample bias and limited applicability. In this paper, we present a sampling algorithm that achieves a desired uniformity while making essentially no assumptions about the underlying P2P network. This algorithm, called doubly stochastic converge (DSC), iteratively adjusts the probabilities of crossing each link in the network during a random walk, such that the resulting transition matrix is doubly stochastic. DSC is fully decentralized and is designed to work on both directed and undirected topologies, making it suitable for virtually any P2P network. Our simulations show that DSC converges quickly on a wide variety of topologies, and that the random walks needed for sampling are short for most topologies. In simulation studies with FreePastry, we show that DSC is resilient to high levels of churn, while incurring a minimal sample bias.
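A centralized analogue of what DSC achieves in a decentralized manner is the classic Sinkhorn iteration, sketched below (DSC itself adjusts the probabilities locally, during the walks): rescale the link-crossing probabilities on a fixed topology until the transition matrix is doubly stochastic, so that the walk's stationary distribution, and hence the sample, is uniform:

    import numpy as np

    adj = np.array([[1, 1, 1, 0],   # walk support: links plus self-loops
                    [1, 1, 1, 1],
                    [1, 1, 1, 1],
                    [0, 1, 1, 1]], dtype=float)

    P = adj.copy()
    for _ in range(500):                      # Sinkhorn-style iteration
        P /= P.sum(axis=1, keepdims=True)     # make rows stochastic
        P /= P.sum(axis=0, keepdims=True)     # make columns stochastic

    print(P.sum(axis=1), P.sum(axis=0))       # all ~1: doubly stochastic
    # A long random walk with these probabilities samples nodes uniformly.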


Toward Deeply Adaptive Societies of Digital Systems

A. Carzaniga, G. Denaro, M. Pezzè, J. Estublier, and A.L. Wolf

Modern societies are pervaded by computerized, heterogeneous devices designed for specific purposes, but also more and more often capable of interacting with other devices for entirely different purposes. For example, a cell phone could be used to purchase a train ticket on-line that could later be printed by a vending machine at the train station. This type of open environment is what we call a society of digital systems. In this paper, we outline the characteristics of societies of digital systems, and argue that they call for a new approach to cope with unforeseen interactions, possible incompatibilities, failures, and emergent behaviors. We argue that designers cannot assume a closed or homogeneous world, and must instead naturally accommodate dynamic adaptations. Furthermore, self-adaptability, that is, the ability to adapt autonomically to a changing environment, also poses problems, as different adaptation strategies may interfere negatively, leading to unstable behaviors. As an initial concrete contribution to solve this problem, we propose a method to support the graceful integration of devices and software systems in an open environment. The method uses management information, and is specifically centered on the idea of expressing self-adaptation operations as change sets over the management information base.


Doubly Stochastic Converge: Uniform Sampling for Directed P2P Networks

C. Hall and A. Carzaniga

Uniformly sampling nodes from deployed peer-to-peer (P2P) networks has proven to be a difficult problem, as current techniques suffer from sample bias and limited applicability. A sampling service which randomly samples nodes from a uniform distribution across all members of a network offers a platform on which it is easy to construct unstructured search, data replication, and monitoring algorithms. We present an algorithm which allows for uniform random sampling, by the use of biased random walks, over the peers of any P2P network. Our algorithm, called doubly stochastic converge (DSC), iteratively adjusts the probabilities of crossing each link in the network during a random walk, such that the resulting transition matrix is doubly stochastic. DSC is fully decentralized and is designed to work on physically directed networks, allowing it to work on any P2P topology. Our simulations show that DSC converges quickly on a wide variety of topologies, and that the random walks needed for sampling are short for most topologies. In simulation studies with FreePastry, we show that DSC is resilient to extremely high levels of churn, while incurring a minimal sample bias.


Healing Web applications through automatic workarounds

A. Carzaniga, A. Gorla, and M. Pezzè

We develop the notion of automatic workaround in the context of Web applications. A workaround is a sequence of operations, applied to a failing component, that is equivalent to the failing sequence in terms of its intended effect, but that does not result in a failure. We argue that workarounds exist in modular systems because components often offer redundant interfaces and implementations, which in turn admit several equivalent sequences of operations. In this paper, we focus on Web applications because these are good and relevant examples of component-based (or service-oriented) applications. Web applications also have attractive technical properties that make them particularly amenable to the deployment of automatic workarounds. We propose an architecture where a self-healing proxy applies automatic workarounds to a Web application server. We also propose a method to generate equivalent sequences and to represent and select them at run-time as automatic workarounds. We validate the proposed architecture in four case studies in which we deploy automatic workarounds to handle four known failures in the popular Flickr and Google Maps Web applications.


Evaluating Test Suites and Adequacy Criteria Using Simulation-Based Models of Distributed Systems

M.J. Rutherford, A. Carzaniga, and A.L. Wolf

Test adequacy criteria provide the engineer with guidance on how to populate test suites. While adequacy criteria have long been a focus of research, existing testing methods do not address many of the fundamental characteristics of distributed systems, such as distribution topology, communication failure, and timing. Furthermore, they do not provide the engineer with a means to evaluate the relative effectiveness of different criteria, nor the relative effectiveness of adequate test suites satisfying a given criterion. This paper makes three contributions to the development and use of test adequacy criteria for distributed systems: (1) a testing method based on discrete-event simulations; (2) a fault-based analysis technique for evaluating test suites and adequacy criteria; and (3) a series of case studies that validate the method and technique. The testing method uses a discrete-event simulation as an operational specification of a system, in which the behavioral effects of distribution are explicitly represented. Adequacy criteria and test cases are then defined in terms of this simulation-based specification. The fault-based analysis involves mutation of the simulation-based specification to provide a foil against which test suites and the criteria that formed them can be evaluated. Three distributed systems were used to validate the method and technique, including DNS, the Domain Name System.
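A minimal example of the mutation step (using Python's ast module as a stand-in for the study's mutation tooling, which targets discrete-event simulation code) shows how one standard operator produces a mutant against which test suites are then rated:

    # Generate a mutant by applying a standard mutation operator:
    # here, '<' becomes '<='.
    import ast

    source = """
    def should_retransmit(elapsed, timeout):
        return elapsed < timeout
    """

    class FlipLt(ast.NodeTransformer):
        def visit_Compare(self, node):
            self.generic_visit(node)
            node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                        for op in node.ops]
            return node

    mutant = FlipLt().visit(ast.parse(source.strip()))
    ast.fix_missing_locations(mutant)
    print(ast.unparse(mutant))   # the mutant: 'elapsed <= timeout'

    # A test suite that distinguishes the original from this mutant
    # exercises the timing boundary; suites and adequacy criteria are
    # rated by how many such mutants they kill.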


Four Enhancements to Automated Distributed System Experimentation Methods

Y. Wang, A. Carzaniga, and A.L. Wolf

Experimentation is an essential tool employed by the developers of software systems, especially distributed systems. In prior work we developed a model-driven framework for automating various experimentation tasks, such as workload generation, and demonstrated that it gives the engineer a cost-effective means to conduct large-scale experiments on distributed testbeds. We have enhanced the methods underlying the framework in four significant ways: (1) increasing the expressiveness of workloads by allowing for conditional and reactive behaviors; (2) supporting the repeatability of experiments through the creation of environment workloads that can control the operational context; (3) enabling the composability of application and environment workloads to obtain a broader class of experiments; and (4) extending the scope of experiment management to include control over multiple runs. We use the enhancements to conduct a series of interesting new experiments. Specifically, the enhancements allow us to manipulate a fixed-wired testbed so that it simulates a mobile-wireless environment, and to selectively and maliciously inject faults into a system.


Self-Healing by Means of Automatic Workarounds

A. Carzaniga, A. Gorla, and M. Pezzè

We propose to use automatic workarounds to achieve self-healing in software systems. We observe that software systems of significant complexity, especially those made of components, are often redundant, in the sense that the same functionality and the same state-transition can be obtained through multiple sequences of operations. This redundancy is the basis to construct effective workarounds for component failures. In particular, we assume that failures can be detected and intercepted together with a trace of the operations that lead to the failure. Given the failing sequence, the system autonomically executes one or more alternative sequences that are known to have an equivalent behavior. We argue that such workarounds can be derived with reasonable effort from many forms of specifications, that they can be effectively prioritized either statically or dynamically, and that they can be deployed at run time in a completely automated way, and therefore that they amount to a valid self-healing mechanism. We develop this notion of self-healing by detailing a method to represent, derive, and deploy workarounds. We validate our method in two case studies.


Frame Shared Memory: Line-Rate Networking on Commodity Hardware

J. Giacomoni, J.K. Bennett, A. Carzaniga, D.C. Sicker, M. Vachharajani and A.L. Wolf

Network processors provide an economical programmable platform to handle the high throughput and frame rates of modern and next-generation communication systems. However, these platforms have exchanged general-purpose capabilities for performance.

This paper presents an alternative: a software network processor (Soft-NP) framework using commodity general-purpose platforms capable of high-rate, high-throughput sequential frame processing, compatible with high-level languages and general-purpose operating systems. A cache-optimized concurrent lock-free queue provides the necessary low-overhead core-to-core communication for sustained sequential frame processing beyond the realized 1.41 million frames per second (Gigabit Ethernet) while permitting per-frame processing-time expansion with pipeline parallelism.
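The textbook form of such a queue is a Lamport-style single-producer/single-consumer ring buffer, sketched here in Python; the sketch shows only the head/tail protocol, not the cache-layout and memory-ordering optimizations that make the real queue fast:

    # Lamport-style SPSC ring buffer: producer and consumer each write
    # only their own index, so no locks are needed between two cores.
    class SPSCQueue:
        def __init__(self, capacity=1024):
            self.buf = [None] * capacity
            self.cap = capacity
            self.head = 0        # advanced only by the consumer
            self.tail = 0        # advanced only by the producer

        def enqueue(self, frame):            # producer core
            nxt = (self.tail + 1) % self.cap
            if nxt == self.head:
                return False                 # full: drop or back off
            self.buf[self.tail] = frame
            self.tail = nxt                  # publish after the write
            return True

        def dequeue(self):                   # consumer core
            if self.head == self.tail:
                return None                  # empty
            frame = self.buf[self.head]
            self.head = (self.head + 1) % self.cap
            return frame

    q = SPSCQueue(4)
    for f in ("frame-0", "frame-1"):
        q.enqueue(f)
    print(q.dequeue(), q.dequeue(), q.dequeue())  # frame-0 frame-1 None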


Is Code Still Moving Around? Looking Back at a Decade of Code Mobility

A. Carzaniga, G.P. Picco, and G. Vigna

In the mid-nineties, mobile code was on the rise and, in particular, there was a growing interest in autonomously moving code components, called mobile agents. In 1997, we published a paper that introduced the concept of mobile code paradigms, which are design patterns that involve code mobility. The paradigms highlighted the locations of code, resources, and execution as first-class abstractions. This characterization proved useful to frame mobile code designs and technologies, and also as a basis for a quantitative analysis of applications built with them. Ten years later, things have changed considerably. In this paper we present our view of how mobile code evolved and discuss which paradigms succeeded or failed in supporting effectively distributed applications.


Spinneret: A log random substrate for P2P networks

J. Rose, C. Hall, and A. Carzaniga

Until now, structured and unstructured networks have been considered in absentia of each other. We believe that next-generation P2P services will require both structured and unstructured algorithms, and that it therefore makes sense to consider a unified substrate that provides good service for both. In this paper we argue for the creation of a semi-structured overlay substrate, called Spinneret, which can serve as the base layer for a variety of structured and unstructured search algorithms. In order to validate that this structure forms a good foundation for various services, we present two algorithms simulated on top of the Spinneret substrate: an unstructured k-walker random walk search as well as a logarithmic DHT search. Further, we argue that such a substrate strikes a balance between the resilience and reliability of unstructured networks and the efficiency of structured networks.


FShm: High-Rate Frame Manipulation in Kernel and User Space

J. Giacomoni, J. Bennett, A. Carzaniga, M. Vachharajani, and A.L. Wolf

The high performance, low cost, and flexibility of commodity hardware systems make them appealing for network processing applications. However, the standard software architecture of such systems imposes significant limitations. At high rates (e.g., gigabit Ethernet) and small frame sizes (64 bytes), each frame must be processed in less than 672 ns. System calls, synchronization, and memory latencies can dominate this processing time. Despite significant effort to remove this overhead, we are aware of no general-purpose mechanism that can handle this load on commodity hardware.

This paper describes the frame-shared-memory architecture (FShm), a general-purpose software architecture for processing network frames on commodity multiprocessor hardware. FShm supports kernel- and user-space processing at gigabit Ethernet rates by increasing throughput without reducing the per-frame processing time, pipelining work across multiple processors. FShm can generate, capture, and forward frames at the theoretical maximum rate on a gigabit Ethernet network for all frame sizes greater than 96 bytes, and at 95% of maximum for the 64 byte minimum frame size (the limit of the tested hardware).


Content-Based Communication: a Research Agenda

A. Carzaniga and C.P. Hall

A content-based publish/subscribe system is a message-oriented communication facility based on the idea of interest-driven routing. A message, published by the sender without a set destination, is delivered to any and all the receivers that expressed an interest in its content. We refer to this communication style and to the distributed infrastructure that realizes it as content-based communication and content-based networking, respectively. In this paper we review what we consider the foundations of content-based networking, including some of the major advances of the past years. We then present a vision for further research in this area as well as for the practical realization of a content-based network. In particular, we discuss the implications of content-based communication for the network, the middleware, and applications.


DV/DRP: A Content-Based Networking Protocol For Sensor Networks

C.P. Hall, A. Carzaniga, and A.L. Wolf

An ideal sensor network would minimize communication by routing information only to those nodes requiring the information. We are exploring the use of a content-based network for this purpose, where messages containing sensor readings and associated metadata are relayed from source nodes to destination nodes based solely on the fact that the destination nodes have expressed interest in specific message content. This paper contributes a concrete protocol, called DV/DRP, that implements content-based networking for wireless sensor networks or other similarly constrained network configurations. DV/DRP augments a basic distance vector protocol to construct both primary and alternate routes. DV/DRP also features a new content-based forwarding technique called dynamic receiver partitioning. DV/DRP makes minimal assumptions on the underlying MAC layer. Notably, it uses only a primitive local broadcast, requires neither link-layer reliability nor acknowledgments, and explicitly handles asymmetric links. We present simulations showing that the protocol scales to large networks while minimizing the resource consumption of individual nodes. We also show that the protocol is robust with respect to transient and permanent node failures, as well as asymmetries in wireless links. Finally, to demonstrate that DV/DRP is suitable for memory-constrained sensor networks, we discuss a preliminary implementation.


Understanding Content-Based Routing Schemes

A. Carzaniga, A.J. Rembert, and A.L. Wolf

Content-based networking is a message-oriented communication service in which a message is delivered to all destinations that have declared a selection predicate matching the content of that message. Analogous to that of a traditional address-based network, a routing scheme in a content-based network defines the router-local matching, forwarding, and header functions that collectively realize the delivery function. Several such routing schemes have been proposed in the literature, but they have been evaluated only qualitatively or in simulation. In this paper we abstract from those previous results in an effort to place them in a general theoretical framework. This framework allows us to rigorously define notions of correctness, minimality, and complexity. In particular, we prove the correctness and characterize the complexity of two existing content-based routing schemes, propose a new latency-minimal scheme, and provide the results of a Monte Carlo simulation that serves as the basis for estimating the space requirements of any given scheme.


Simulation-Based Testing of Distributed Systems

M.J. Rutherford, A. Carzaniga, and A.L. Wolf

Developers of distributed systems routinely construct discrete-event simulations to help them understand and evaluate the behavior of inter-component protocols. Typically written using an imperative programming language, these simulations capture basic algorithmic functionality at the same time as they focus attention on properties critical to distribution, including topology, timing, bandwidth, and overall scalability. We ask the following question: Can simulations also be used to help in the testing of distributed-system implementations? Because simulations amount to specifications of intended behavior, the code of a simulation can be viewed as an operational, albeit non-traditional, formal model. We claim that this kind of model, when used within a specification-based testing regime, provides developers with the foundations of a powerful new method for selecting effective test suites. The primary tool used in our method is a fault-based analysis of the simulation code in which a set of mutants are generated using standard code-mutation techniques. The analysis can be used to rate the effectiveness of a test suite, as well as the criterion used to form it. We substantiate our claim through experiments performed on the simulations and implementations of two different distributed systems.


Automating Experimentation on Distributed Testbeds

Y. Wang, M.J. Rutherford, A. Carzaniga, and A.L. Wolf

Engineering distributed systems is a challenging activity. This is partly due to the intrinsic complexity of distributed systems, and partly due to the practical obstacles that developers face when evaluating and tuning their design and implementation decisions. This paper addresses the latter aspect, providing techniques for software engineers to automate the experimentation activity. Our approach is founded on a suite of models that characterize the distributed system under experimentation, the testbeds upon which the experiments are to be carried out, and the client behaviors that drive the experiments. The models are used by generative techniques to automate construction of the workloads, as well as construction of the scripts for deploying and executing the experiments on distributed testbeds. The framework is not targeted at a specific system or application model, but rather is a generic, programmable tool. We have validated our approach by performing experiments on a variety of distributed systems. For two of these systems, the experiments were deployed and executed on the PlanetLab wide-area testbed. Our experience shows that this framework can be readily applied to different kinds of distributed system architectures, and that using it for meaningful experimentation, especially in large-scale network environments, is advantageous.


Distributed-System Failures: Observations and Implications for Testing

M.J. Rutherford, A. Carzaniga, and A.L. Wolf

Distributed systems represent an important class of software applications whose testing has received relatively little attention from the research community. As a result, there does not currently exist a general-purpose, disciplined, and effective testing method for distributed systems. While a number of research efforts target this area, there are no broad empirical studies that can be used to ground existing techniques or inform new ones. In this paper we present the results of an empirical study of failures experienced by the users of seven open-source distributed systems. The goal of our study is to understand what patterns exist in failure scenarios to guide the definition of an improved testing method for distributed systems. At a high level, our results indicate that: there is empirical evidence to support the consideration of a new generation of testing methods that address the failures due to distribution; the configurations that cause user-reported failures are reasonably straightforward to construct; and generic failure observations (i.e., those for which reusable techniques can be developed) are strongly correlated to the distributed nature of system failures. The latter two results in particular imply that there is a reasonable bound on the effort required to organize the testing activity. Overall, the study gives us some early confidence that it is both necessary and feasible to consider a testing method that targets distributed systems. The results of this study are offered as a basis for future work on the definition of such a method.


Weevil: a Tool to Automate Experimentation With Distributed Systems

Y. Wang, M.J. Rutherford, A. Carzaniga, and A.L. Wolf

Engineering distributed systems is a challenging activity. This is partly due to the intrinsic complexity of distributed systems, and partly due to the practical obstacles that developers face when evaluating and tuning their design and implementation decisions. This paper addresses the latter aspect, providing techniques for software engineers to automate two key elements of the experimentation activity: (1) workload generation and (2) experiment deployment and execution. Our approach is founded on a suite of models that characterize the client behaviors that drive the experiments, the distributed system under experimentation, and the testbeds upon which the experiments are to be carried out. The models are used by simulation-based and generative techniques to automate the construction of the workloads, as well as construction of the scripts for deploying and executing the experiments on distributed testbeds. The framework is not targeted at a specific system or application model, but rather is a generic, programmable tool. We have validated our approach on a variety of distributed systems. Our experience shows that this framework can be readily applied to different kinds of distributed system architectures, and that using it for meaningful experimentation is advantageous.


A Content-Based Networking Protocol For Sensor Networks

C.P. Hall, A. Carzaniga, J. Rose, and A.L. Wolf

An ideal sensor network would minimize communication by routing information only to those nodes requiring the information. We are exploring the use of a content-based network for this purpose, where messages containing sensor readings and associated metadata are relayed from source nodes to destination nodes based solely on the fact that the destination nodes have expressed interest in specific message content. Routing uses a distance vector protocol augmented to construct both primary and alternate routes. Forwarding uses content-based matching together with a special process called dynamic receiver partitioning. The protocol, called DV/DRP, is specifically designed for wireless sensor networks or other similarly constrained network configurations. We present simulations showing that the protocol scales to large networks while minimizing the resource consumption of individual nodes. We also show that the protocol is robust with respect to both transient and permanent node and communication failures.
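
As a purely illustrative aid (hypothetical names and values, not DV/DRP's actual packet processing), the following Python fragment conveys the flavor of one half of the protocol described above: a distance-vector update that maintains both a primary and an alternate next hop per destination, which is what gives a node a fallback route when the primary fails.

    # Hypothetical sketch: a distance-vector update that keeps, for each
    # destination, a primary and one alternate next hop.
    def update_route(routes, dest, via, cost):
        # Drop any stale entry learned via this neighbor, add the new one,
        # and keep the two cheapest entries (primary + alternate).
        entries = [e for e in routes.get(dest, []) if e[1] != via]
        routes[dest] = sorted(entries + [(cost, via)])[:2]

    routes = {}
    update_route(routes, "sink", "n1", 3)
    update_route(routes, "sink", "n2", 2)
    update_route(routes, "sink", "n1", 4)   # n1's advertised cost worsens
    print(routes["sink"])  # [(2, 'n2'), (4, 'n1')]: primary n2, alternate n1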


A Routing Scheme for Content-Based Networking

A. Carzaniga, M.J. Rutherford, and A.L. Wolf

This paper proposes a routing scheme for content-based networking. A content-based network is a communication network that features a new advanced communication model where messages are not given explicit destination addresses, and where the destinations of a message are determined by matching the content of the message against selection predicates declared by nodes. Routing in a content-based network amounts to propagating predicates and the necessary topological information in order to maintain loop-free and possibly minimal forwarding paths for messages. The routing scheme we propose uses a combination of a traditional broadcast protocol and a content-based routing protocol. We present the combined scheme and its requirements over the broadcast protocol. We then detail the content-based routing protocol, highlighting a set of optimization heuristics. We also present the results of our evaluation, showing that this routing scheme is effective and scalable.
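
To make the combination of a broadcast protocol with content-based routing concrete, here is a toy sketch (my simplification, with an invented four-node topology, not the paper's actual protocol) of how subscriptions can be propagated hop by hop over a spanning tree so that each router learns, per link, which predicates are reachable through it:

    from collections import defaultdict

    # Invented spanning tree and subscriptions (illustrative only).
    tree = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
    local_subs = {"C": ["stock=GOOG"], "D": ["stock=MSFT"]}

    # table[router][neighbor] = predicates reachable through that neighbor
    table = defaultdict(lambda: defaultdict(set))

    def propagate(origin, pred):
        # Flood the predicate from its origin over the tree; each router
        # records the link on which the announcement arrived.
        stack, seen = [(origin, None)], {origin}
        while stack:
            node, came_from = stack.pop()
            if came_from is not None:
                table[node][came_from].add(pred)
            for nxt in tree[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, node))

    for router, preds in local_subs.items():
        for p in preds:
            propagate(router, p)

    # At A, both predicates are reachable via B, so A forwards matching
    # messages on that single link.
    print({n: sorted(ps) for n, ps in table["A"].items()})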


Design and Evaluation of a Support Service for Mobile, Wireless Publish/Subscribe Applications

M. Caporuscio, A. Carzaniga, and A.L. Wolf

This paper presents the design and evaluation of a support service for mobile, wireless clients of a distributed publish/subscribe system. A distributed publish/subscribe system is a networked communication infrastructure where messages are published by senders and then delivered to the receivers whose subscriptions match the messages. Communication therefore does not involve the use of explicit addresses, but rather emerges from the dynamic arrangement of publishers and subscribers. Such a communication mechanism is an ideal platform for a variety of Internet applications, including multi-party messaging, personal information management, information sharing, on-line news distribution, service discovery, and electronic auctions. Our goal is to support such applications on mobile, wireless host devices in such a way that the applications can, if they choose, be oblivious to the mobility and intermittent connectivity of their hosts as they move from one publish/subscribe access point to another. In this paper we describe a generic, value-added service that can be used in conjunction with publish/subscribe systems to achieve these goals. We detail the implementation of the service and present the results of our evaluation of the service in both wireline and emulated wireless environments.


Forwarding in a Content-Based Network

A. Carzaniga and A.L. Wolf

This paper presents an algorithm for content-based forwarding, an essential function in content-based networking. Unlike in traditional address-based unicast or multicast networks, where messages are given explicit destination addresses, the movement of messages through a content-based network is driven by predicates applied to the content of the messages. Forwarding in such a network amounts to evaluating the predicates stored in a router's forwarding table in order to decide to which neighbor routers the message should be sent. We are interested in finding a forwarding algorithm that can make this decision as quickly as possible in situations where there are numerous, complex predicates and high volumes of messages. We present such an algorithm and give the results of studies evaluating its performance.
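
To illustrate the kind of decision a content-based forwarding table supports, the following deliberately naive Python sketch (with invented filters; the paper's algorithm uses an indexed, counting-based approach and is far more efficient) treats each table entry as a filter, i.e., a conjunction of elementary constraints, and forwards a message on every interface whose filter is fully satisfied:

    from collections import defaultdict

    # Comparison operators for elementary constraints.
    OPS = {"=": lambda a, b: a == b, "<": lambda a, b: a < b, ">": lambda a, b: a > b}

    # Forwarding table: filter id -> (neighbor interface, list of constraints).
    # A filter matches a message once ALL of its constraints are satisfied.
    filters = {
        0: ("if1", [("type", "=", "alarm")]),
        1: ("if2", [("type", "=", "quote"), ("price", ">", 100)]),
    }

    def forward(message):
        counts = defaultdict(int)          # satisfied constraints per filter
        interfaces = set()
        for fid, (iface, constraints) in filters.items():
            for attr, op, val in constraints:
                if attr in message and OPS[op](message[attr], val):
                    counts[fid] += 1
            if counts[fid] == len(constraints):
                interfaces.add(iface)      # filter fully satisfied
        return interfaces

    print(forward({"type": "quote", "price": 120}))  # {'if2'}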


A Routing Scheme for Content-Based Networking

A. Carzaniga, M.J. Rutherford, and A.L. Wolf

This paper proposes a routing scheme for content-based networking. A content-based network is a communication network that features a new advanced communication model where messages are not given explicit destination addresses, and where the destinations of a message are determined by matching the content of the message against selection predicates declared by nodes. Routing in a content-based network amounts to propagating predicates and the necessary topological information in order to maintain loop-free and possibly minimal forwarding paths for messages. The routing scheme we propose uses a combination of a traditional broadcast protocol and a content-based routing protocol. We present the combined scheme and its requirements over the broadcast protocol. We then detail the content-based routing protocol, highlighting a set of optimization heuristics. We also present the results of our evaluation, showing that this routing scheme is effective and scalable.


A Lightweight Infrastructure for Reconfiguring Applications

M. Castaldi, A. Carzaniga, P. Inverardi, and A.L. Wolf

We describe Lira, a lightweight infrastructure for managing dynamic reconfiguration that applies and extends the concepts of network management to component-based, distributed software systems. Lira is designed to perform both component-level reconfigurations and scalable application-level reconfigurations, the former through agents associated with individual components and the latter through a hierarchy of managers. Agents are programmed on a component-by-component basis to respond to reconfiguration requests appropriate for that component. Managers embody the logic for monitoring the state of one or more components, and for determining when and how to execute reconfiguration activities. A simple protocol based on SNMP is used for communication among managers and agents.
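
As a minimal illustration of the agent concept (a hypothetical API, not Lira's actual SNMP-based protocol), an agent might expose a component's state variables and reconfiguration actions through a narrow get/invoke interface that a manager drives:

    class DummyComponent:
        # Stand-in for a managed software component.
        def restart(self):
            return "restarted"
        def set_port(self, port):
            return f"port set to {port}"

    class ComponentAgent:
        # Exposes one component's state and reconfiguration actions
        # through a narrow interface, loosely in the style of SNMP.
        def __init__(self, component):
            self.vars = {"status": "running", "port": 8080}
            self.actions = {"restart": component.restart,
                            "set-port": component.set_port}

        def get(self, name):                 # manager reads component state
            return self.vars[name]

        def invoke(self, action, *args):     # manager requests a reconfiguration
            return self.actions[action](*args)

    agent = ComponentAgent(DummyComponent())
    print(agent.get("status"))               # -> running
    print(agent.invoke("set-port", 9090))    # -> port set to 9090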


Continuous Remote Analysis for Improving Distributed Systems Performance

A. Carzaniga and A. Orso

Engineering a highly distributed system requires the ability to evaluate and optimize the protocols that control the movement and processing of information throughout the system. Because the design of such protocols is often characterized by conflicting goals and trade-offs, the designer must calibrate the parameters of the protocols, seeking the best balance of performance in the most common usage scenarios. Unfortunately, fully testing these calibrations requires experiments conducted on large, expensive testbeds that are very difficult to deploy and maintain.

In this paper, we propose a new approach for the optimization of a highly distributed system's performance. The approach is based on leveraging data collected from fielded components to fine-tune the behavior of the system and its protocols. Captured data is "replayed" in simulations performed directly in the field during off-peak hours. The results of these simulations are then used to control the system directly in the field, and/or to report aggregate performance and behavior information to the system designer.


Design and Evaluation of a Support Service for Mobile, Wireless Publish/Subscribe Applications

M. Caporuscio, A. Carzaniga, and A.L. Wolf

This paper presents the design and evaluation of a support service for mobile, wireless clients of a distributed publish/subscribe system. A distributed publish/subscribe system is a networked communication infrastructure where messages are published by senders and then delivered to the receivers whose subscriptions match the messages. Communication therefore does not involve the use of explicit addresses, but rather emerges from the dynamic arrangement of publishers and subscribers. Such a communication mechanism is an ideal platform for a variety of Internet applications, including multi-party messaging, personal information management, information sharing, on-line news distribution, service discovery, and electronic auctions. Our goal is to support such applications on mobile, wireless host devices in such a way that the applications can, if they choose, be oblivious to the mobility and intermittent connectivity of their hosts as they move from one publish/subscribe access point to another. In this paper we describe a generic, value-added service that can be used in conjunction with publish/subscribe systems to achieve these goals. We detail the implementation of the service and present the results of our evaluation of the service in both wireline and emulated wireless environments.


A Benchmark Suite for Distributed Publish/Subscribe Systems

A. Carzaniga and A.L. Wolf

Building a distributed publish/subscribe infrastructure amounts to defining a service model (or interface) and providing an implementation for it. A typical distributed implementation is architected as a network of dispatcher components, each one implementing appropriate protocols and algorithms, that collectively realize the chosen service model. The service model should provide a value-added service for a wide variety of applications, while the implementation should gracefully scale up to handle an intense traffic of publications and subscriptions. We believe that the design of such service models and implementations must be guided by a systematic evaluation method, which in turn must be based on a carefully chosen benchmark suite. In this paper, we lay out a set of requirements for a benchmark suite for distributed publish/subscribe services, and we outline its primary components. The ideas proposed in this paper are based on our own experience in building and studying publish/subscribe infrastructures, and on existing evaluation frameworks.


An Experience in Evaluating Publish/Subscribe Services in a Wireless Network

M. Caporuscio, A. Carzaniga, and A.L. Wolf

As wireless technology becomes more available, developers of distributed applications are becoming more interested in how that technology affects the performance of their systems. We have developed a distributed publish/subscribe communication service initially hosted on the standard IP-wired network infrastructure, but would now like to rehost that service onto a GPRS wireless network. This paper reports on our experience in attempting to evaluate the performance of the service using an available emulation environment. Our conclusion from our experience to date is that current tools do not model the wireless network at an appropriate level of abstraction. In particular, they do not allow us to study the integration of individual publish/subscribe service-layer elements with GPRS network-layer elements, nor do they allow us to study multiple GPRS clients interacting over the network. Instead we were limited to results related to the interaction between an individual GPRS client and the GPRS network modeled as a monolith.


Reconfiguration in the Enterprise JavaBean Component Model

M.J. Rutherford, K. Anderson, A. Carzaniga, D. Heimbigner, and A.L. Wolf

Reconfiguration is the process of applying planned changes to the communication, interconnection, componentization, or functionality of a deployed system. It is a powerful tool for achieving a variety of desirable properties of large-scale, distributed systems, including evolvability, adaptability, survivability, and continuous availability. Current approaches to reconfiguration are inadequate: some allow one to describe a system's range of configurations for a relatively broad class of system architectures, but do not provide a mechanism for actually carrying out a reconfiguration; others provide a mechanism for carrying out certain kinds of limited reconfigurations, but assume a specialized system architecture in order to do so. This paper describes our attempt at devising a reconfiguration mechanism for use with the popular and widely available Enterprise JavaBean (EJB) component container model. We describe extensions to the basic services provided by EJB to support the mechanism, a prototype implementation, and a case study of its application to a representative component-based distributed system.


The Willow Survivability Architecture

J.C. Knight, D. Heimbigner, A.L. Wolf, A. Carzaniga, and J.C. Hill

The Willow architecture provides a comprehensive architectural approach to the provision of survivability in critical information networks. It is based on the notion that survivability of a network requires reconfiguration at both the system and the application levels. The Willow notion of reconfiguration is very general, and the architecture provides reconfiguration mechanisms for both automatic and manual network control. In this paper we summarize the Willow concepts and provide an overview of the Willow architecture. Finally we describe a demonstration application system that has been built on top of a prototype Willow implementation.


The Willow Architecture: Comprehensive Survivability for Large-Scale Distributed Applications

J.C. Knight, D. Heimbigner, A.L. Wolf, A. Carzaniga, J.C. Hill, P. Devanbu, and M. Gertz

The Willow architecture is a comprehensive approach to survivability in critical distributed applications. Survivability is achieved in a deployed system using a unique combination of (a) fault avoidance by disabling vulnerable network elements intentionally when a threat is detected or predicted, (b) fault elimination by replacing system software elements when faults are discovered, and (c) fault tolerance by reconfiguring the system if non-maskable damage occurs. The key to the architecture is a powerful reconfiguration mechanism that is combined with a general control structure in which network state is sensed and analyzed, and required changes are effected. The architecture can be used to deploy software functionality enhancements as well as survivability. Novel aspects include: node configuration control mechanisms; a workflow system for resolving conflicting configurations; communications based on wide-area event notification; tolerance for wide-area, hierarchical, and sequential faults; and secure, scalable, and delegatable trust models.


Reconfiguration in the Enterprise JavaBean Component Model

M.J. Rutherford, K. Anderson, A. Carzaniga, D. Heimbigner, and A.L. Wolf

Reconfiguration is the process of applying planned changes to the communication, interconnection, componentization, or functionality of a deployed system. It is a powerful tool for achieving a variety of desirable properties of large-scale, distributed systems, including evolvability, adaptability, survivability, and continuous availability. Current approaches to reconfiguration are inadequate: some allow one to describe a system's range of configurations for a relatively broad class of system architectures, but do not provide a mechanism for actually carrying out a reconfiguration; others provide a mechanism for carrying out certain kinds of limited reconfigurations, but assume a specialized system architecture in order to do so. This paper describes our attempt at devising a reconfiguration mechanism for use with the popular and widely available Enterprise JavaBean (EJB) component container model. We describe extensions to the basic services provided by EJB to support the mechanism, a prototype implementation, and a case study of its application to a representative component-based distributed system.


Fast Forwarding for Content-Based Networking

A. Carzaniga and A.L. Wolf

This paper presents a new algorithm for content-based forwarding, an essential function in content-based networking. Unlike in traditional address-based unicast or multicast networks, where messages are given explicit destination addresses, the movement of messages through a content-based network is driven by predicates applied to the content of the messages. Forwarding in such a network amounts to evaluating the predicates stored in a router's forwarding table in order to decide to which neighbor router the message should be sent. We are interested in finding a forwarding algorithm that can make this decision as quickly as possible in situations where there are large numbers of predicates and high volumes of messages. We present such an algorithm and give the results of studies evaluating its performance.


A Testbed for Configuration Management Policy Programming

A. van der Hoek, A. Carzaniga, D. Heimbigner, and A.L. Wolf

Even though the number and variety of available configuration management systems has grown rapidly in the past few years, the need for new configuration management systems still remains. Driving this need are the emergence of situations requiring highly specialized solutions, the demand for management of artifacts other than traditional source code, and the exploration of wholly new research questions in configuration management. Complicating the picture is the trend toward organizational structures that involve personnel working at physically separate sites. We have developed a testbed to support the rapid development of configuration management systems. The testbed separates configuration management repositories (i.e., the stores for versions of artifacts) from configuration management policies (i.e., the procedures according to which the versions are manipulated) by providing a generic model of a distributed repository and an associated programmatic interface. Specific configuration management policies are programmed as unique extensions to the generic interface, while the underlying distributed repository is reused across different policies. In this paper, we describe the repository model and its interface, and present our experience in using a prototype of the testbed, called NUCM, to implement a variety of configuration management systems.


Security Issues and Requirements for Internet-scale Publish-Subscribe Systems

C. Wang, A. Carzaniga, D. Evans, and A.L. Wolf

Publish-subscribe is a communication paradigm that supports dynamic, many-to-many communications in a distributed environment. Content-based pub-sub systems are often implemented on a peer-to-peer infrastructure that enables information dissemination from information producers (publishers) to consumers (subscribers) through a subscription mechanism. In a wide-area pub-sub network, the pub-sub service must handle information dissemination across distinct authoritative domains, heterogeneous platforms and a large, dynamic population of publishers and subscribers. Such an environment raises serious security concerns. In this paper, we investigate the security issues and requirements that arise in an internet-scale content-based pub-sub system. We distinguish among those requirements that can be achieved with current technology and those that require innovative solutions.


Content-based Networking: A New Communication Infrastructure

A. Carzaniga and A. L. Wolf

We argue that the needs of many classes of modern applications, especially those targeted at mobile or wireless computing, demand the services of content-based publish/subscribe middleware, and that this middleware in turn demands a new kind of communication infrastructure for its proper implementation. We refer to this new communication infrastructure as content-based networking. The service model of this network must directly support the interface of an advanced content-based publish/subscribe middleware service. At the same time, the implementation must be architected as a true distributed network, providing appropriate guarantees of reliability, security, and performance. We do not propose content-based networking as a replacement for IP, nor do we advocate an implementation of a publish/subscribe middleware at the network level (i.e., within routers). Instead, we argue that content-based networking must be designed according to established networking principles and techniques. To this end, in this paper, we formulate the foundational concepts of content-based networking, and relate them to the corresponding concepts in traditional networking. We also briefly review our experience with content-based publish/subscribe middleware and suggest some open research problems in the area of content-based networking.


Design and Evaluation of a Wide-Area Event Notification Service

A. Carzaniga, D. S. Rosenblum, and A. L. Wolf

The components of a loosely-coupled system are typically designed to operate by generating and responding to asynchronous events. An event notification service is an application-independent infrastructure that supports the construction of event-based systems, whereby generators of events publish event notifications to the infrastructure and consumers of events subscribe with the infrastructure to receive relevant notifications. The two primary services that should be provided to components by the infrastructure are notification selection (i.e., determining which notifications match which subscriptions) and notification delivery (i.e., routing matching notifications from publishers to subscribers). Numerous event notification services have been developed for local-area networks, generally based on a centralized server to select and deliver event notifications. Therefore, they suffer from an inherent inability to scale to wide-area networks, such as the Internet, where the number and physical distribution of the service's clients can quickly overwhelm a centralized solution. The critical challenge in the setting of a wide-area network is to maximize the expressiveness in the selection mechanism without sacrificing scalability in the delivery mechanism.

This paper presents Siena, an event notification service that we have designed and implemented to exhibit both expressiveness and scalability. We describe the service's interface to applications, the algorithms used by networks of servers to select and deliver event notifications, and the strategies used to optimize performance. We also present results of simulation studies that examine the scalability and performance of the service.


Achieving Scalability and Expressiveness in an Internet-Scale Event Notification Service

A. Carzaniga, D. S. Rosenblum, and A. L. Wolf

This paper describes the design of Siena, an Internet-scale event notification middleware service for distributed event-based applications deployed over wide-area networks. Siena is responsible for selecting the notifications that are of interest to clients (as expressed in client subscriptions) and then delivering those notifications to the clients via access points. The key design challenge for Siena is maximizing expressiveness in the selection mechanism without sacrificing scalability of the delivery mechanism. This paper focuses on those aspects of the design of Siena that fundamentally impact scalability and expressiveness. In particular, we describe Siena's data model for notifications, the covering relations that formally define the semantics of the data model, the distributed architectures we have studied for Siena's implementation, and the processing strategies we developed to exploit the covering relations for optimizing the routing of notifications.
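
The covering relations mentioned above are what make subscription routing economical: if a filter already forwarded upstream covers a new one, the new one need not be propagated. A sketch of the covering test for a simplified data model, in which filters constrain attributes to closed numeric intervals (Siena's data model is richer), could look like this:

    def covers(f1, f2):
        # f1, f2: attribute -> (low, high) interval constraints.
        # f1 covers f2 iff every message matching f2 also matches f1,
        # i.e., f2 constrains every attribute f1 does, at least as tightly.
        for attr, (lo1, hi1) in f1.items():
            if attr not in f2:
                return False                 # f2 is broader on this attribute
            lo2, hi2 = f2[attr]
            if lo2 < lo1 or hi2 > hi1:
                return False
        return True

    broad = {"price": (0, 1000)}
    narrow = {"price": (100, 200), "volume": (0, 50)}
    print(covers(broad, narrow))   # True: no need to propagate `narrow` upstream
    print(covers(narrow, broad))   # False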


Content-Based Addressing and Routing: A General Model and its Application

A. Carzaniga, D. S. Rosenblum, and A. L. Wolf

The designers of communication networks are being challenged by the emergence of a new class of addressing and routing schemes referred to as content-based addressing and routing. This new approach differs from traditional unicast and multicast schemes in that it performs routing based on the data being transported in a message rather than on any specialized addressing and routing information attached to, or otherwise associated with, the message. An example of an application for content-based addressing and routing is an event notification service, which is a general-purpose facility for asynchronously and implicitly conveying information from generators of events to any and all parties expressing interest in those events. In order to implement content-based addressing and routing, we can adopt well-known and successful network architectures and protocols, provided that we understand how to map the core concepts and functionalities of content-based addressing and routing onto this established infrastructure. Toward that end, we have formulated a general, yet powerful model of addressing and routing that allows us to formalize the crucial aspects of content-based addressing and routing in a surprisingly simple manner. Furthermore, it allows us to treat traditional unicast and multicast addressing and routing uniformly as instances of this more general model. This paper presents our model and demonstrates its utility by showing its application to the design of an existing event notification service.
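
One informal way to render the unifying idea (this is my own simplification, not the paper's formalism) is to treat every "address" as a predicate over message content, so that unicast and multicast fall out as special cases of content-based selection:

    # Each "address" is a predicate over message content; traditional
    # schemes become special cases of content-based selection.
    def unicast(addr):
        return lambda msg: msg.get("dest") == addr

    def multicast(group):
        return lambda msg: msg.get("group") == group

    def content_based(attr, value):
        return lambda msg: msg.get(attr) == value

    receivers = {
        "r1": unicast("10.0.0.1"),
        "r2": multicast("news"),
        "r3": content_based("topic", "storage"),
    }

    msg = {"group": "news", "topic": "storage"}
    print([r for r, pred in receivers.items() if pred(msg)])  # ['r2', 'r3']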


Interfaces and Algorithms for a Wide-Area Event Notification Service

A. Carzaniga, D. S. Rosenblum, and A. L. Wolf

The components of a loosely-coupled system are typically designed to operate by generating and responding to asynchronous events. An event notification service is an application-independent infrastructure that supports the construction of event-based systems. The two primary services that should be provided to components by the infrastructure are notification selection and notification delivery. Numerous event notification services have been developed for local-area networks, generally based on a centralized server to select and deliver event notifications. Therefore, they suffer from an inherent inability to scale to wide-area networks, such as the Internet, where the number and physical distribution of the service's clients can quickly overwhelm a centralized solution. The critical challenge in the setting of a wide-area network is to maximize the expressiveness in the selection mechanism without sacrificing scalability in the delivery mechanism.

This paper presents SIENA, an event notification service that we have designed to exhibit both expressiveness and scalability. We describe the service's interface to applications, the algorithms used by networks of servers to select and deliver event notifications, and the strategies used to optimize performance. We present results of simulation experiments that examine the efficiency of the service. Finally, we describe a prototype implementation of SIENA.


A Characterization of the Software Deployment Process and a Survey of Related Technologies

A. Carzaniga

Software applications are no longer stand-alone systems. They are increasingly the result of the integration of heterogeneous collections of components, possibly distributed over a computer network. Different components can be provided by different producers and they can be part of different systems at the same time. Moreover, components change and evolve very rapidly, making it difficult to manage the whole system in a consistent way.
In this scenario, a crucial step of the software life cycle is deployment---that is, the activities related to the release, installation, activation, deactivation, update, and removal of components, as well as whole systems.
This paper presents a characterization of the deployment process together with a framework for evaluating technologies that are intended to address the software deployment problem. The framework highlights four primary factors that characterize the maturity of the technologies: process coverage; process changeability; interprocess coordination; and site, product, and deployment policy abstraction. A variety of existing technologies are surveyed and assessed against the proposed framework. Finally, we discuss promising research directions in software deployment.


Designing and Implementing a Distributed Versioning System

A. Carzaniga

DVS is a simple versioning system that adopts a check-in/check-out policy with exclusive locks much like the one implemented by RCS. In addition to the basic functionalities of RCS, DVS provides extensions in two main directions: distribution of artifacts and versioning of collections of artifacts.
DVS has been implemented on top of NUCM 2, a generic distributed platform aimed at providing a policy-neutral programmable interface for realizing configuration management systems. NUCM provides support for storing artifacts and collections of artifacts and their attributes in a set of distributed servers. DVS implements the locking policy and all the related higher level services including check-in and check-out of single artifacts as well as collections, managing locks, change logs, and recursive check-in and check-out.
This paper describes the main design principles underlying DVS and NUCM together with the basic issues regarding their implementation. DVS has been, and is currently being, used for collaborative authoring involving several authors distributed over five different sites on the Internet. We also discuss this first experience and the feedback and validation it provided for both DVS and NUCM.
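
A bare-bones rendering of the exclusive-lock policy described above (illustrative only, with hypothetical names; DVS layers this policy over NUCM's distributed repository) might look as follows:

    class VersionedArtifact:
        def __init__(self, content=""):
            self.versions = [content]        # complete version history
            self.lock_holder = None          # exclusive lock, if any

        def checkout(self, user):
            if self.lock_holder is not None:
                raise RuntimeError(f"locked by {self.lock_holder}")
            self.lock_holder = user          # acquire the exclusive lock
            return self.versions[-1]

        def checkin(self, user, content):
            if self.lock_holder != user:
                raise RuntimeError("check-in without holding the lock")
            self.versions.append(content)    # record a new revision
            self.lock_holder = None          # release the lock

    doc = VersionedArtifact("v0")
    draft = doc.checkout("alice")            # other authors are now locked out
    doc.checkin("alice", draft + " + edits")
    print(doc.versions)                      # ['v0', 'v0 + edits']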


Architectures for an Event Notification Service Scalable to Wide-area Networks

A. Carzaniga

This work is about an infrastructure for supporting event-based applications on a wide-area network.

A wide range of software systems are designed to operate in a reactive manner. In such systems, the high-level control flow is not explicitly programmed; instead, it is driven by the occurrence of events. These systems realize their functionality by performing some actions in response to events, possibly using the information associated with the stimulating events. Examples of reactive systems are integrated development environments, work-flow and process analysis systems, graphical user interfaces, network management systems, software deployment systems and security monitors.

There are two major motivations for designing applications as reactive systems. First, some applications are reactive by their nature. Typically, the ones that involve the direct interaction of human agents are characterized by an asynchronous input of relatively small pieces of data. In these cases, and in the general case of "on-line" input, the concept of event is a good modeling and design abstraction. Similarly, the same abstraction is useful for those components that, although not necessarily functioning in an asynchronous way, are integrated by means of some communication mechanisms that introduce asynchronicity in their interactions. The other benefit of adopting an event-based style is that it requires only a loose coupling for the integration of heterogeneous components. Components do not need to export interfaces to be accessed by other components. Components can request some services without addressing a specific server component and, to a certain extent, components can interoperate even if they have been designed and developed separately without any mutual knowledge.

A common logical component of every event-based application is what we call an event observation and notification service, or more concisely an event service. The event service observes the occurrence of events or the occurrence of combinations of events and consequently notifies all the applications or components that have declared their interests in reacting to such events. Because the semantics of event-based applications is substantially independent of the mechanisms that are used to capture and dispatch events of interest, it is convenient to separate the event service from all the applications that make use of it. In accordance with many standardization efforts that have been proposed recently (e.g., the CORBA Event Service), and according to strategic plans and research in network technology, we envision a unified event service implemented as a common "middleware" to support the event-based interaction among software components.
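
In code, the essence of such an event service is a small subscribe/publish interface. The following fragment is only a single-process illustration of the model, with invented names (Siena's actual service is distributed and far richer):

    class EventService:
        def __init__(self):
            self.subscriptions = []          # (predicate, callback) pairs

        def subscribe(self, predicate, callback):
            self.subscriptions.append((predicate, callback))

        def publish(self, event):
            # Notify every subscriber whose declared interest matches.
            for predicate, callback in self.subscriptions:
                if predicate(event):
                    callback(event)

    svc = EventService()
    svc.subscribe(lambda e: e.get("kind") == "release",
                  lambda e: print("deployment tool notified:", e))
    svc.publish({"kind": "release", "package": "example-1.0"})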

The idea of integrating software components by means of a common event service seems to be very promising, especially for distributed applications that are deployed on a wide-area network such as the Internet. For one thing, the vast number of available information sources offers many opportunities for the development of new applications. New classes of wide-scale event-driven applications can be devised, including stock market analysis tools, efficient news and mailing systems, data mining tools, and indexing tools. Also, many existing applications that are already designed to exploit event-based infrastructures can be profitably integrated at a much larger scale thanks to the "global" connectivity provided by the network. For example, work-flow systems can be federated for companies that have multiple distributed branches or even across corporate boundaries, and software deployment systems can connect software producers and consumers through the Internet. In general, the asynchronicity, the heterogeneity, and the high degree of loose coupling that characterize wide-area networks suggest that a wide-scale event service would be a good integration infrastructure for existing systems and for new applications.

Focus and contribution of the thesis

This work presents Siena, a project directed towards the design and implementation of a scalable general-purpose event service.

Numerous technologies that realize an event service have been developed and effectively used for quite a long time; examples are Field, SUN ToolTalk, and Yeast. However, most of them are targeted towards single computers or at most local-area networks, and it is very clear that they cannot simply be "adapted" to scale to the Internet. In fact, extending the support of an event service to a wide-area network creates new challenges and trade-offs. Not only does the number of objects and events grow tremendously, but also many of the assumptions made for local-area networks, such as low latency, abundant bandwidth, homogeneous platforms, continuous reliable connectivity, and centralized control, are no longer valid.

Some technologies address issues related to wide-area services. Among them are new technologies, such as TIB/Rendezvous, that specifically provide an event service, but also more mature technologies, such as the USENET news infrastructure, IP multicasting, and the Domain Name System (DNS), that, although not explicitly targeted at this problem domain, represent potential or partial solutions to the problem of scalability. The main shortcoming of all of these technologies is that they are specific to some application domain and not flexible enough to be usable as a generic and open infrastructure for integrating event-based applications.

In summary, we see two main challenges in the area of event-based infrastructures that Siena proposes to address: expressiveness and scalability.

Intuitively, a simplistic service can be implemented in a very scalable way, whereas an event service with rich semantics faces serious scalability limitations. Thus, this thesis focuses on the trade-offs that exist between scalability and expressiveness in a distributed event service.

The contributions of this work are a formal definition of an event service that combines expressiveness with scalability together with the design and implementation of the architectures and algorithms that realize this event service as a distributed infrastructure. One obvious issue that we must face in this research is the validation and verification of the solutions that we propose. To this end, we used a simulation environment by which we performed systematic simulations of our architectures and algorithms in several network scenarios. Here we present the framework that we built and the modeling abstraction that we adopted to perform this analysis. We also discuss some initial results that clearly differentiate the Siena event service from traditional ones. Further simulations will help clarify the trade-offs and differentiators between the alternative solutions that we propose. In addition to the simulation environment, we implemented a real prototype of Siena, consisting of a server that realizes one of our distributed architectures plus a client-side interface with a mapping to the Java language. This prototype has been used in distributed settings of limited scale to support an event-based software deployment system.


A Generic, Reusable Repository for Configuration Management Policy Programming

A. van der Hoek, A. Carzaniga, D. Heimbigner, and A. L. Wolf

Distributed configuration management is intended to support the activities of projects that span multiple sites. NUCM is a testbed that we are developing to help explore the issues of distributed configuration management. NUCM separates configuration management repositories (i.e., the stores for versions of artifacts) from configuration management policies (i.e., the procedures according to which the versions are manipulated) by providing a generic model of a distributed repository and an associated programmatic interface. Specific configuration management policies are programmed as unique extensions to the generic interface, but the underlying distributed repository is reused across different policies. In this paper, we describe the repository model and its interface, discuss their implementation in NUCM, and present how NUCM has been used to implement several, rather different, configuration management policies.


Design of a Scalable Event Notification Service: Interface and Architecture

A. Carzaniga, D. S. Rosenblum, and A. L. Wolf

Event-based distributed systems are programmed to operate in response to events. An event notification service is an application-independent infrastructure that supports the construction of event-based systems. While numerous technologies have been developed for supporting event-based interactions over local-area networks, these technologies do not scale well to wide-area networks such as the Internet. Wide-area networks pose new challenges that have to be attacked with solutions that specifically address issues of scalability. This paper presents Siena, a scalable event notification service that is based on a distributed architecture of event servers. We first present a formally defined interface that is based on an extension to the publish/subscribe protocol. We then describe and compare several different server topologies and routing algorithms. We conclude by briefly discussing related work, our experience with an initial implementation of Siena, and a framework for evaluating the scalability of event notification services such as Siena.


A Characterization Framework for Software Deployment Technologies

A. Carzaniga, A. Fuggetta, R. S. Hall, A. van der Hoek, D. Heimbigner, and A. L. Wolf

Software applications are no longer stand-alone systems. They are increasingly the result of integrating heterogeneous collections of components, both executable and data, possibly dispersed over a computer network. Different components can be provided by different producers and they can be part of different systems at the same time. Moreover, components can change rapidly and independently, making it difficult to manage the whole system in a consistent way. Under these circumstances, a crucial step of the software life cycle is deployment---that is, the activities related to the release, installation, activation, deactivation, update, and removal of components, as well as whole systems.
This paper presents a framework for characterizing technologies that are intended to support software deployment. The framework highlights four primary factors concerning the technologies: process coverage; process changeability; interprocess coordination; and site, product, and deployment policy abstraction. A variety of existing technologies are surveyed and assessed against the framework. Finally, we discuss promising research directions in software deployment.


Software Deployment: Extending Configuration Management Support into the Field

A. van der Hoek, R.S. Hall, A. Carzaniga, D. Heimbigner, and A.L. Wolf

Traditionally, configuration management has only addressed the needs of the software development process. Once a software system leaves the development environment and enters the field, however, there still is a significant role for configuration management. Activities such as release, installation, activation, update, adaptation, deactivation, and de-release constitute the deployment lifecycle; these activities need careful coordination and planning in their own right. This article discusses the dimensions of software deployment, argues why current solutions are not sufficient, and presents two research systems that specifically address software deployment.


Critical Considerations and Designs for Internet-Scale, Event-Based Compositional Architectures

D. S. Rosenblum, A. L. Wolf, and A. Carzaniga

A common architectural style for distributed, loosely-coupled, heterogeneous software systems is a structure based on event generation, observation and notification. A notable characteristic of an architecture based on events is that interaction among architectural components occurs asynchronously, thereby simplifying the composition of autonomous, independently-executing components that may be written in different programming languages and executing on varied hardware platforms.

There is increasing interest in deploying these kinds of distributed systems across wide-area networks such as the Internet. For instance, workflow systems for multi-national corporations, multi-site/multi-organization software development, and real-time investment analysis across world financial markets are just a few of the many applications that lend themselves to deployment on an Internet scale. However, deployment of such systems at the scale of the Internet imposes new challenges that are not met by existing technology.

We have been studying the problem of designing an Internet-scale event observation and notification facility that can serve as a platform for building wide-area distributed systems according to an event-based architectural style. In this paper we briefly outline our achievements to date.


Designing Distributed Applications with Mobile Code Paradigms

A. Carzaniga, G. P. Picco, and G. Vigna

Large-scale distributed systems are becoming of paramount importance, due to the evolution of technology and to market interest. Their development, however, is not yet supported by a sound technological and methodological background, as the results developed for small-scale distributed systems often do not scale up. Recently, mobile code languages (MCLs) have been proposed as a technological answer to the problem. In this work, we abstract away from the details of these languages by deriving design paradigms exploiting code mobility that are independent of any particular technology. We present such design paradigms, together with a discussion of their features, their application domain, and some hints about the selection of the correct paradigm for a given distributed application.


Designing and Implementing Inter-Client Communication in the O2 Object-Oriented Database Management System

A. Carzaniga, G. P. Picco, and G. Vigna

One of the requirements for an object-oriented database to support advanced applications is a communication mechanism. The Inter-Client Communication Mechanism (ICCM) is a set of data structures and functions developed for the O2 database that provides this kind of service. Communication is achieved through shared persistent objects, implementing the basic idea of a mailbox. One-to-one connections are established between different processes accessing the database. Methods and data structures defined in the ICCM support connection set-up, disconnection, and all the basic data transfer facilities. In this paper, we describe the concepts of the ICCM and give an overview of its implementation.
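
The mailbox idea can be illustrated in a few lines of Python (in the ICCM the mailbox is realized as a shared persistent O2 object; this sketch mirrors only the model, with hypothetical names):

    from collections import deque

    class Mailbox:
        # A shared object through which two clients exchange messages.
        def __init__(self):
            self.queue = deque()
            self.connected = False

        def connect(self):                   # connection set-up
            self.connected = True

        def send(self, msg):                 # basic data transfer
            if not self.connected:
                raise RuntimeError("no connection established")
            self.queue.append(msg)

        def receive(self):
            return self.queue.popleft() if self.queue else None

        def disconnect(self):
            self.connected = False

    box = Mailbox()                          # in O2, a persistent shared object
    box.connect()
    box.send("hello from client A")
    print(box.receive())                     # client B reads from the mailbox
    box.disconnect()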

this page is maintained by Antonio Carzaniga and was updated on July 07, 2023