
From May until October 2013, I'm a Research Assistant at Oracle Labs. I'm also a visiting researcher at the Computer Laboratory at the University of Cambridge, although my next in-person visit probably won't be until October. Until recently I was a postdoctoral researcher at USI's Faculty of Informatics in Lugano, Switzerland. Contact information is at the bottom of this page.
I am a practical computer scientist with wide interests. My goal is to make it easier and cheaper to develop useful, high-quality software systems. So far, my work has mostly focused on programming languages, other programming tools, and the systems that support them—including language runtimes and operating systems.
I am also something of a “systems” researcher. It's been a while since I did any work in the areas usually labelled that way, but I see “systems” as a mindset rather than a research topic. It means I am primarily interested in the practical consequences of an idea; its abstract properties are of interest only subserviently. (If you can come up with a better definition of the “systems mindset”, let me know.)
Currently my work is on the Alphabet Soup project, with a focus on infrastructure for debuggers and other observation-based tools. My work in Lugano was based within the FAN project, which has yielded a few publications recently. Don't hesitate to contact me to find out more about either of these. You can also see a periodic snapshot of what I'm working on.
I have some background projects concerning composition of mismatched software, dynamic language implementation, debuggers, runtimes and the like. You can read more about those below. A recurring theme is that I prefer integrating programming infrastructure at relatively low levels in the software stack, to the extent that this is sensible (which I believe can be surprising).
Previously I was a PhD student at the University of Cambridge's Computer Laboratory, based in the Networks and Operating Systems group under the supervision of Dr David Greaves. My PhD work centred on the problem of building software by composition of ill-matched components, for which I developed the Cake language. I remain an occasional visitor to the Computer Laboratory, and am also a Fellow of the Cambridge Philosophical Society.
From January 2011 until March 2012 I was a research assistant in the Department of Computer Science at the University of Oxford. There I worked as a James Martin Fellow, within the research programme of the Oxford Martin School's Institute for the Future of Computing. My work mostly focused on constructing a continuum between state-space methods of program analysis (notably symbolic execution) and syntactic methods (such as type checking). There is more work to do, so keep a look out for papers.
I also did my Bachelor's degree in computer science in Cambridge, graduating in 2005. I then stayed on for a year as a Research Assistant, before starting my PhD in 2006. See my history section for more information.
My professional development experience includes some spells working for Opal Telecom and ARM around my Bachelor's studies. More recently, I have been consulting for Ellexus.
During summer 2007 I took an internship at Fraser Research, doing networking research.
You can find a version of my CV here, though be aware that it may be a little rough or incomplete at times. If you're recruiting for jobs in the financial sector, even technical ones, please don't bother reading my CV or contacting me, as I promise I am not interested.
Here's a snapshot of what I was working on as of February 2013, with links to any publications in existence. There will be papers about all these things at some point. (As of May 2013, my non-Oracle work is dormant until October.)
Abstract: The Java Virtual Machine (JVM) today hosts implementations of numerous languages. To achieve high performance, JVM implementations rely on heuristics in choosing compiler optimizations and adapting garbage collection behavior. Historically, these heuristics have been tuned to suit the dynamics of Java programs only. This leads to unnecessarily poor performance in case of non-Java languages, which often exhibit systematic difference in workload behavior. Dynamic metrics characterizing the workload help identify and quantify useful optimizations, but so far, no cohesive suite of metrics has adequately covered properties that vary systematically between Java and non-Java workloads. We present a suite of such metrics, justifying our choice with reference to a range of guest languages. These metrics are implemented on a common portable infrastructure which ensures ease of deployment and customization.
This paper describes an effort to extend the space of dynamic metrics defined at the JVM level so that they cover dimensions which exhibit variation between Java and non-Java languages running on the JVM. It describes both a new selection of metrics, and an infrastructure on which they are implemented. One of the parts I find neat is the query-based definition of a large subset of the metrics. Currently this is mostly a convenience, but I hope that in future work, this will prove useful for identifying new kinds of profiling information which is cheap enough to collect and use during dynamic compilation. (In other words, we want cheap observations which do a reasonable job of predicting the workload properties characterised by more expensive metrics, like the ones in this paper.)
Abstract: Dynamic program analysis tools based on code instrumentation serve many important software engineering tasks such as profiling, debugging, testing, program comprehension, and reverse engineering. Unfortunately, constructing new analysis tools is unduly difficult, because existing frameworks offer little or no support to the programmer beyond the incidental task of instrumentation. We observe that existing dynamic analysis tools re-address recurring requirements in their essential task: of maintaining state which captures some property of the analysed program. This paper presents a general architecture for dynamic program analysis tools which treats the maintenance of analysis state in a modular fashion, consisting of mappers decomposing input events spatially, and updaters aggregating them over time. We show that this architecture captures the requirements of a wide variety of existing analysis tools.
This paper is a move towards enabling a more event-driven, reactive style of programming in the creation of dynamic analysis tools. We built an analysis framework, FRANC, based around a library of instrumentation primitives (event sources) together with higher-level “state-oriented” components for decomposing and aggregating events. The neatest idea in the paper is to separate the spatial decomposition of incoming events from their temporal aggregation. This allows a “thin waist” hourglass design which opens up greater opportunity for composition and recombination of independent components.
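To make the mapper/updater split concrete, here is a minimal sketch in Python. It is not FRANC's API: the `Event` fields, the `Analysis` class and the helper functions are all hypothetical names invented for illustration. The point is only that a mapper (which picks the piece of state an event belongs to) and an updater (which folds the event into that piece over time) can be chosen independently and recombined.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """A hypothetical instrumentation event; the fields are illustrative only."""
    kind: str      # e.g. "alloc"
    method: str    # method in which the event was raised
    thread: int    # id of the thread that raised it
    size: int = 0  # bytes allocated, for "alloc" events


class Analysis:
    """Analysis state = mapper (spatial decomposition) + updater (temporal aggregation)."""

    def __init__(self, mapper, updater, initial):
        self.mapper = mapper      # event -> key identifying one piece of state
        self.updater = updater    # (old value, event) -> new value
        self.state = defaultdict(lambda: initial)

    def observe(self, event):
        key = self.mapper(event)
        self.state[key] = self.updater(self.state[key], event)

    def run(self, events):
        for event in events:
            self.observe(event)
        return dict(self.state)


# Mappers: decide *where* an event's contribution is stored.
def by_method(event):
    return event.method


def by_thread(event):
    return event.thread


# Updaters: decide *how* contributions accumulate over time.
def count(acc, event):
    return acc + 1


def total_size(acc, event):
    return acc + event.size


EVENTS = [
    Event("alloc", "parse", 1, 16),
    Event("alloc", "parse", 2, 32),
    Event("alloc", "emit", 1, 8),
]

# Any mapper can be paired with any updater.
allocs_per_method = Analysis(by_method, count, 0).run(EVENTS)
bytes_per_thread = Analysis(by_thread, total_size, 0).run(EVENTS)
```

Swapping `by_method` for `by_thread`, or `count` for `total_size`, yields a different analysis without touching the event source, which is roughly the kind of recombination the “thin waist” is meant to allow.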
Abstract: Bytecode instrumentation is a preferred technique for building profiling, debugging and monitoring tools targeting the Java Virtual Machine (JVM), yet is fundamentally dangerous. We illustrate its dangers with several examples gathered while building the DiSL instrumentation framework. We argue that no Java platform mechanism provides simultaneously adequate performance, reliability and expressiveness, but that this weakness is fixable. To elaborate, we contrast internal with external observation, and sketch some approaches and requirements for a hybrid mechanism.
Shortly after joining USI, it became clear that a lot of our group's implementation work was concerned with working around shortcomings of the JVM. Moreover, we had seen some worrying problems with no apparent solution. The reason is that the JVM platform actively encourages tool construction by bytecode instrumentation. But writing tools which instrument bytecode safely is invariably hard and often (depending on the instrumentation) impossible. I don't just mean impossible for non-specialist programmers to get right; I mean outright impossible. Specifically, there is no way to avoid the risk of introducing real and disastrous bugs to the instrumented program. The reason is that instrumentation code is not protected from the base program, and runs at near-arbitrary points, so easily creates deadlock and reentrancy problems, which there is no mechanism to guard against. (Unlike in process-level approaches, “avoid shared data” is not an option. Even dropping to native code doesn't guarantee anything.) There are also several more modest difficulties. This paper is my attempt at turning my colleagues' war stories into a manifesto for VM improvements. It argues that instrumentation is a bad mechanism on which to base VM observation interfaces, and sketches out some alternatives. I take the opportunity to grind one or two personal axes, including the one about how in-VM debugger servers are a step backwards from compiler-generated debugging information.
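The reentrancy hazard can be illustrated with a small analogy. The sketch below is plain Python, not JVM bytecode, and every name in it (`Registry`, `analysis_hook`, and so on) is hypothetical. It assumes a piece of library code that is used by the base program and is also reachable from the tool's inserted code. Because the hook runs at a point inside the library's update, it observes state whose invariant is temporarily broken.

```python
class Registry:
    """Stand-in for library code shared by the base program and the tool."""

    def __init__(self):
        self.items = []
        self.count = 0      # invariant: count == len(items)
        self.hook = None    # models code inserted by instrumentation

    def add(self, item):
        self.items.append(item)
        if self.hook is not None:
            # Instrumentation point in the middle of the update:
            # the invariant does not hold here.
            self.hook(self)
        self.count += 1     # invariant is restored only at this point

    def consistent(self):
        return self.count == len(self.items)


observations = []


def analysis_hook(registry):
    # The tool re-enters the very library the base program is still updating.
    observations.append(registry.consistent())


registry = Registry()
registry.hook = analysis_hook
registry.add("x")

# The hook saw a broken invariant, although the state is fine afterwards.
saw_inconsistent_state = observations == [False]
consistent_afterwards = registry.consistent()
```

In this toy the tool only reads the inconsistent state. If it also modified the registry, or if the update were protected by a lock that the hook then needed, the outcome would be corruption or deadlock in the base program rather than a wrong observation.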
Abstract: Current VM designs prioritise implementor freedom and performance, at the expense of other concerns of the end programmer. We motivate an alternative approach to VM design aiming to be unobtrusive in general, and prioritising two key concerns specifically: foreign function interfacing and support for runtime analysis tools (such as debuggers, profilers etc.). We describe our experiences building a Python VM in this manner, and identify some simple constraints that help enable low-overhead foreign function interfacing and direct use of native tools. We then discuss how to extend this towards a higher-performance VM suitable for Java or similar languages.
This is a short paper about some in-progress work that was begun with Conrad Irwin when he was a final-year undergraduate in Cambridge and I was his project supervisor. Conrad implemented a usable subset of Python which uses DWARF debugging information to interface with native libraries, rather than the usual wrapper generation step. This paper extends that idea in a few ways, culminating in a design which can (we hope!) allow native tools (gdb, Valgrind, etc.) to operate seamlessly across programs written in a diverse mixture of languages, while avoiding conventional foreign function interfacing overheads. I am continuing the implementation effort.
Abstract: Tools for composing software impose homogeneity requirements on what is composed—that modules must share a language, target the same libraries, or share other conventions. This inhibits cross-language and cross-infrastructure composition. We observe that a unifying representation of software turns heterogeneity of components into a matter of styles: recurring interface patterns that cross-cut large numbers of codebases. We sketch a rule-based language for capturing styles independently of composition context, and describe how it applies in two example scenarios.
This is a short paper describing an idea which I developed in the last technical chapter of my PhD thesis. The idea is that interfaces differ stylistically—in things like how they pass arguments, how they report errors, and in some less routine ways—and that by capturing these styles independently of composition context, we could achieve a lot more compositions with a lot less programmer effort. It's pitched as an extension to Cake; so far it's unimplemented, with the exception of a few foundations in the compiler; I intend to implement it one day, but have no idea when I'll get time. The most compelling content of the paper is therefore the preliminary survey (read: brainstorm) of styles in well-known interfaces most readers will recognise.
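As a rough analogy for what capturing a style once might look like, here is a sketch in Python rather than in Cake's rule language. All the names are hypothetical. The assumed style is “a negative integer return value means failure”; the adapter is written once, independently of any particular function, and then applied to every interface known to follow that style.

```python
import functools


class AdaptedError(Exception):
    """Raised when an adapted call reports failure in the legacy style."""


def negative_means_failure(fn):
    """Style adapter: turn 'negative int means error' into an exception."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        if isinstance(result, int) and result < 0:
            raise AdaptedError(f"{fn.__name__} failed with code {result}")
        return result

    return wrapper


# Two unrelated legacy functions that happen to share the same style.
def legacy_find(text, sub):
    return text.find(sub)  # str.find returns -1 when sub is absent


def legacy_index_of(items, item):
    return items.index(item) if item in items else -1


# The style is stated once and applied to every member interface.
STYLE_MEMBERS = [legacy_find, legacy_index_of]
adapted = {fn.__name__: negative_means_failure(fn) for fn in STYLE_MEMBERS}
```

A caller that expects exceptions can then use `adapted["legacy_find"]` without a hand-written wrapper per function. The sketch covers only the error-reporting example and says nothing about argument passing or the less routine styles mentioned above.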
Abstract: Software's expense owes partly to frequent reimplementation of similar functionality and partly to maintenance of patches, ports or components targeting evolving interfaces. More modular non-invasive approaches are unpopular because they entail laborious wrapper code. We propose Cake, a rule-based language describing compositions using interface relations. To evaluate it, we compare several existing wrappers with reimplemented Cake versions, finding the latter to be simpler and better modularised.
This is a long research paper describing the design, implementation and evaluation of the basic Cake language and runtime.
Abstract: Conventional tools yield expensive and inflexible software. By requiring that software be structured as plug-compatible modules, tools preclude out-of-order development; by treating interoperation of languages as rare, adoption of innovations is inhibited. I propose that a solution must radically separate the concern of integration in software: firstly by using novel tools specialised towards integration (the “integration domain”), and secondly by prohibiting use of pre-existing interfaces (“interface hiding”) outside that domain.
This is a relatively “long view” on the position underlying my PhD work, expounding my gut feeling that the way we build software is bizarrely fragile and unrealistic, owing to the expectation that big pieces of software should be made from perfectly-fitting smaller pieces without any sort of “glue” or integration material. This is what makes software maintenance expensive, software evolution difficult and software functionality inflexible. I'm always especially glad to receive comments on the argument and general story presented by this paper.
Abstract: Existing black-box adaptation techniques are insufficiently powerful for a large class of real-world tasks. Meanwhile, white-box techniques are language-specific and overly invasive. We argue for the inclusion of special-purpose adaptation features in a configuration language, and outline the benefits of targetting binary representations of software. We introduce Cake, a configuration language with adaptation features, and show how its design is being shaped by two case studies.
This is a short work-in-progress paper describing one part of my PhD work. The idea is that composition of software in binary form is desirable, both for convenience and because binaries offer a somewhat language-neutral model of software. The problem of linking together mismatched components in binary form, specifically at the object code level (meaning these components could be written in any number of languages including C and C++), has had little attention so far. However, a fairly descriptive model of binaries is already provided by debugging info standards like DWARF. The thesis of this work is that a domain-specific configuration language, with its baseline at the DWARF level of abstraction, can be a practical tool for describing compositions of mismatched binaries, where the language explicitly provides for adaptation to address mismatches in a black-box fashion. The main contribution is describing the proposed features of our language, which is called Cake, and the case-study experiences which are shaping it.
Writing a large survey is a task only a fool would undertake... so I had a go. “Practical software adaptation techniques” means techniques for bridging or modifying the interfaces (mostly...) of software components. It doesn't include adaptive techniques, as found in reflective and/or mobile middleware. The paper is unfortunately a bit disjointed, coming in two halves. First, it covers twentyish approaches found in the literature, and classifies them on what they do and how. Second, it discusses (borrowing some bits from my earlier workshop paper) various other aspects of such techniques: adapting component instances versus implementations, static versus dynamic components, elegance of model versus supporting heterogeneity, the utility of various distinctions between computational and communicational code, and what “loosely coupled” really means. In hindsight, a better survey would take all these latter issues and fold them into the first half's taxonomy, visiting the work in some order which allows elaboration on the more discursive matters as they are raised. It also needs a neater way, most likely a better graphical language, to describe the taxonomised nature of each technique, rather than relying on somewhat impenetrable verbal descriptions.
Abstract: Existing work on software connectors shows significant disagreement on both their definition and their relationships with components, coordinators and adaptors. We propose a precise characterisation of connectors, discuss how they relate to the other three classes, and contradict the suggestion that connectors and components are disjoint. We discuss the relationship between connectors and coupling, and argue the inseparability of connection models from component programming models. Finally we identify the class of configuration languages, show how it relates to primitive connectors and outline relevant areas for future work.
ACM copyright notice: © ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SYANCO ’07. http://doi.acm.org/10.1145/1294917.1294918
This is a somewhat strident and sweeping workshop paper where I try to decode the meaning and intentions of the many and various references to “software connectors” found in the literature. I then talk a bit about how this relates to coordination, adaptation and the concept of coupling, and give a sketchy outline of how I think compositional and communication-oriented concerns could or should be abstracted and supported by tools. Apart from the latter, there's not really any research contribution, except an attempt to hold to account the vague and flimsy motivational arguments that appear in the introductions of lots of (often otherwise good) papers.
See also my author page on DBLP (with much credit to Michael Ley for this superb service).
This is quite a nice introduction, albeit brief and rather dense, to my PhD work. Or at least the problem, approach, ideas etc., since not much actual work is mentioned....
This is a thesis proposal I submitted after about 12 months of PhD work. It reads nicely enough but is far too ambitious!
My research interests are outlined at the top of this page. In summary: although broad, they fall mostly within the intersections of systems, programming languages and software engineering.
I keep a calendar of approximate submission deadlines and event dates for most of the conferences and workshops that I might conceivably attend, together with a few for which it's verging on inconceivable. In case it's useful, here it is.
For my PhD I worked on supporting software adaptation at the level of the operating system's linking and loading mechanisms. Here “adaptation” means techniques for connecting and combining software components which were developed independently, and therefore do not have matched interfaces. My emphasis was on techniques which are practical, adoptable and work with a wide variety of target code, at the (hopefully temporary) expense of safety and automation.
The main focus of my work was Cake, a special-purpose language for describing relations between the interfaces of binary components (specifically, relocatable object code). Cake makes heavy use of DWARF debugging information, and can be considered interesting in several ways: as a domain-specific rule-based programming language; as a “composition”, “configuration” or “linking” language; as a dynamic language; as a runtime system sharing commonalities with garbage collectors and debuggers. It does not really make contributions in the domains of module systems or linking models.
To find out more, please browse my publications, and do contact me for more information. There will hopefully be one or two more papers appearing on additional work that I did during my PhD years. My dissertation will be available too, once I have the final version ready.
I'm very grateful to EPSRC and Cambridge Philosophical Society for the funds which supported my PhD research work and some related travel, and to the Graduate Research Fund and Emily & Gordon Bottomley Fund of Christ's College, EuroSys, The Royal Academy of Engineering, ACM SIGSOFT and ACM SIGPLAN for additional support of my research travel and conference attendance.
In Cambridge, I coordinated the NetOS group talklets from January 2009 until January 2010. I also had a librarian-like role of curating a small group library and keeping a very vague track of the various books we had lying around (local users: see /usr/groups/netos/library). Finally, I looked after (in a rather neglectful fashion) the Atlas Room BBC Micro (about which I should write more some time).
I'm a member of the ACM, ACM SIGSOFT, SIGPLAN and SIGOPS.
Currently I am on the programme committee for PPPJ 2013. I am also publicity chair for SC 2013.
In the recent past I have been on the programme committee for RESoLVE 2012, at ASPLOS, and the shadow programme committee for EuroSys 2012.
Previously, I was privileged to contribute external reviews for a few submissions to EuroSys 2009, one for ESOP 2010 and one for SVT at SAC 2012.
For now, my research “leadership” is confined to the student projects I've been known to supervise, which are in the Teaching section. I am always interested in working with enthusiastic Bachelor's, Master's and doctoral students. I maintain a list of project ideas, and am always happy to talk about other ideas.
During 2005–06 I was a Research Assistant in Cambridge on the XenSE and Open Trusted Computing projects, under Steven Hand. Both projects seek to implement a practical secure computing platform, using virtualisation (and similar technologies) as the isolation mechanism.
XenSE never had a web page of its own, but you might want to look at the abstract on the project's EPSRC Grant Portfolio page, or check out the mailing list.
OpenTC is a large EU-funded project involving many major industrial and academic partners, focused on the use of Trusted Computing Group technology to realise many common secure computing use cases.
As part of my work as an RA, I became interested in secure graphical user interfaces including L4's Nitpicker, a minimal secure windowing system. I began work on ports of this system to Linux, XenoLinux and the Xen mini-OS: the Linux version became mostly functional (but not yet secure!) while the others were stymied by various limitations with shared memory on Xen. These limitations are mostly fixed now, but I haven't had time to revisit the project since. Feel free to contact me for more information. If you wanted to take these up, I'd be glad to hear from you.
Right now, in Lugano, I am not doing any teaching work.
Previously, in Oxford I did a little, and in Cambridge I did quite a lot. This section is therefore mostly historical, but might be useful nonetheless, and will no doubt be revived at some point when my teaching load increases.
During spring 2011 I was a tutor for the Digital Systems course in Oxford.
In Cambridge I “supervised” (tutored) many systems and programming courses from the Computer Science Tripos. The list below includes both current and past courses I supervised, together with any additional materials I prepared.
In April 2010 I gave a lecture to the MPhil in Advanced Computer Science class in Cambridge, as part of the Cambridge Programming Research Group mini-series within the Research Students' Lecture series. My lecture was entitled “Modularity – what, why and how”. Contact me for slides. Other lectures in the CPRG mini-series were given by Dominic Orchard, Max Bolingbroke and Robin Message.
During Michaelmas 2009, in Cambridge, I demonstrated the MPhil course Building an Internet Router, run by Andrew Moore.
I'm interested in supervising bright and enthusiastic Bachelor's and Master's students for their individual projects. For ideas and to find out what I'm interested in, see my list of suggested projects. I'm also always extremely happy to talk to students who have their own ideas.
I've supervised several projects in the past, in Cambridge; Bachelor's students at Cambridge can read my thoughts about Part II projects, see the project suggestions for 2010–11 from me and others in the NetOS group, or from the entire Lab and beyond. Previously I've been fortunate enough to work with the following final-year students:
Note: as of late 2011, I have started using GitHub. Over time, code for all my larger projects will start to appear on my GitHub page. This page is likely to remain the more complete list, for the time being.
My research work involves building software. Inevitably, this software is never “finished”, rarely reaches release-worthy state, and usually rots quickly even then. So not much here is downloadable yet; this will improve over time, but in the meantime I list everything and invite you to get in touch if anything sounds interesting. My main projects, past and present, are:
I've also submitted small patches to various open-source projects including LLVM (bugfix to bitcode linker), binutils (objcopy extension), gcc (documentation fix for gcj), Knit (compile fixes), Datascript (bug fixes), DICE (bug fixes), pdfjam (support “page template” option) and Claws Mail (support cookies file in RSS plugin). Some of them have even been applied. :-)
Apart from my main development projects, I sometimes produce scripts and other odds and ends which might be useful to other people. Where time permits, I try to package them half-decently and make them available here.
I have written some scripts which attempt to retrieve decent BibTeX for a given paper (as a PDF or PostScript file). details
I've written a nifty script for printing papers, which helps people save paper, share printed-out papers and discover perhaps unexpected collaborators within their institution. details
I have a Makefile which downloads and compiles Tripos past-paper questions. It's pretty much self-documenting. Here it is.
I have built a sizable collection of vaguely useful shell scripts and other small fragments. One day “soon” I will get round to publishing them. The biggest chunks are my login scripts, which use Subversion to share a versioned repository of config files across all the Unix boxes that I have an account on, and the makefile and m4 templates that build this web page. I need to clean these up a bit. In the meantime, if you're interested in getting hold of them, let me know.
Occasionally I write down some thoughts which somehow relate to my work. They now appear in the form of a blog. Posts are categorised into research, teaching, development, publishing and meta strands.
I have the beginnings of a personal web page. It's very sparse right now. Have a look, if you like.
| Office | FN07 |
| E-mail | deduce from firstname.lastname@cl.cam.ac.uk |
| Post | Dr Stephen Kell, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, United Kingdom |
Content updated at Fri May 31 14:50:00 CEST 2013.