Studying strengths and weaknesses of code recommenders

Dean's Office - Faculty of Informatics

Date: 19 December 2023 / 09:00 - 12:00

USI East Campus, Room D0.03

You are cordially invited to attend the PhD Dissertation Defence of Matteo Ciniselli on Tuesday 19 December 2023 at 09:00 in room D0.03 (USI East Campus).

Abstract:
Given the high costs of software development and maintenance, tools and techniques have been proposed both in industry and academia to speed up programming activities. Some of these techniques focus on automating the generation of source code, by predicting the code tokens the developer would write starting from the ones already typed. We use the term code recommender to refer to these approaches. In this thesis, we study the strengths and the weaknesses of code recommenders from several perspectives. We started by investigating the accuracy of recently proposed Deep Learning (DL) models in generating non-trivial and possibly multiple code statements. Indeed, previous work usually tested these techniques in the relatively easy task of predicting the single next token the developer is likely to write. We pushed the boundaries of the predictions to entire blocks of code, showing that DL models can achieve excellent results when predicting a few code tokens but their accuracy steadily decreases when asked to predict entire code statements. Still, even in such a challenging scenario, DL models are able to produce correct suggestions in ~29% of cases. Given such a finding and the stunning capabilities of recently released products such as GitHub Copilot, we decided to investigate the extent to which DL models tend to copy code from the training set. Surprisingly, we found that DL models rarely copy verbatim from the training data, mostly generating original code. Finally, we studied the extent to which DL models generalize across different versions of the same programming language, showing that a model trained on a language version v_i exhibits a strong drop in performance when asked to predict completions on a version v_j ≠ v_i.
Then, since most of the tools proposed in the literature are based on researchers’ intuitions, we ran a survey with 80 practitioners to create a taxonomy of characteristics of code recommenders they consider important (e.g., adapting the recommendations to the developer’s programming style). Such a taxonomy can be used to inform future research in the field aimed at targeting the weaknesses of code recommenders as perceived by practitioners. To show that, we tackle one of the identified weaknesses (i.e., limited knowledge of the coding context) and show how the performance of code recommenders can be boosted by enriching the coding context they are aware of before triggering a recommendation.

Dissertation Committee:
- Prof. Gabriele Bavota, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Michele Lanza, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Laura Pozzi, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Martin Pinzger, Universität Klagenfurt, Austria (External Member)
- Prof. Michele Tufano, Microsoft, USA (External Member)