Language-Agnostic Integrated Queries in a Polyglot Language Runtime System

Dean's Office - Faculty of Informatics

Date: 14 September 2022 / 13:30 - 16:00

USI East Campus, Room C1.03 & MS Teams

You are cordially invited to attend the PhD dissertation defence of Filippo Schiavio on Wednesday, 14 September 2022, at 13:30 in room C1.03. You can also join online via MS Teams.

Abstract:
Language-integrated query (LINQ) frameworks offer a convenient programming abstraction for processing in-memory collections of data, allowing developers to concisely express declarative queries in general-purpose programming languages. Existing LINQ frameworks rely on the type system of statically typed languages such as C# or Java to perform query compilation and execution. As a consequence of this design, they do not support dynamically typed languages such as Python, R, or JavaScript. Such languages are, however, very popular among data scientists, who would benefit from LINQ frameworks in data-analytics applications. In this dissertation, we propose a new approach to query execution based on query interpretation and just-in-time compilation. We introduce DynQ, a novel query engine that bridges the gap between dynamically typed languages and LINQ frameworks by leveraging just-in-time compilation. From the user perspective, DynQ is a data-processing library that offers SQL and a fluent API as query languages. Internally, DynQ is language-agnostic: by leveraging a polyglot language runtime, it brings LINQ features to multiple languages without requiring query operators to be implemented separately for each language. Moreover, DynQ can execute queries that combine data from multiple sources, namely in-memory object collections, on-file data, and external database systems. DynQ offers efficient query execution for different kinds of workloads by implementing a hybrid interpreted-compiled execution model. Our approach executes queries on small datasets through interpretation, without incurring the overhead of query compilation, while leveraging just-in-time compilation to speed up the execution of long-running queries. In addition, DynQ implements reusable compiled queries, an efficient code cache that allows the same dynamically compiled code to be reused for multiple related queries.
In this way, DynQ can optimize high-throughput workloads based on a fluent API, i.e., applications that use data-processing libraries mostly to execute many queries on small datasets, such as micro-services, as well as applications that use data-processing libraries to perform repetitive queries. Our evaluation of DynQ shows performance comparable to equivalent hand-optimized code and in line with common data-processing libraries and embedded databases, making DynQ an appealing query engine for standalone analytics applications and for data-intensive server-side workloads. Moreover, thanks to reusable compiled queries, DynQ can also speed up applications that make heavy use of data-processing libraries on small datasets via a fluent API.

Dissertation Committee:
- Prof. Walter Binder, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Matthias Hauswirth, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Cesare Pautasso, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Shigeru Chiba, The University of Tokyo, Japan (External Member)
- Prof. Hidehiko Masuhara, Tokyo Institute of Technology, Japan (External Member)
- Prof. Heiko Schuldt, University of Basel, Switzerland (External Member)