Software Documentation: Automation and Challenges

Dean's Office - Faculty of Informatics

Date: 18 June 2020 / 15:30 - 17:00

You are cordially invited to attend the PhD Dissertation Defense of Emad Aghajani on Thursday, June 18th, 2020 at 15:30.
Please note that given the updated Covid-19 restrictions, the dissertation defense will be held online.
You can join here (Password: 436014)

Abstract:
Despite the undeniable practical benefits of documentation during software development and evolution activities, its creation and maintenance are often neglected, leading to inadequate and even nonexistent documentation. Thus, it is not unusual for developers to deal with unfamiliar code that they have difficulty comprehending. Browsing the official documentation, or accessing online resources such as Stack Overflow, can help in this "code comprehension" activity, which nevertheless remains highly time-consuming. Enhancing the code comprehension process has been the goal of several works aimed at automatically documenting software artifacts. Although these techniques address the issue, they exhibit major limitations, such as working at a coarse-grained level and not allowing developers to document a single line of code of interest. While creating novel systems that overcome these limitations entails conceptual and technical challenges related to the collection, inference, interpretation, selection, and presentation of useful information, it also requires solid empirical foundations on software developers' needs, namely "what" information is (or is not) useful to developers, and "when". Our thesis is that empirical knowledge about software documentation issues experienced and considered relevant by practitioners is instrumental in laying the foundations for the next-generation tools and techniques for automated software documentation. To this end, in this dissertation we present our research accomplishments towards automating developer documentation on two fronts: (1) empirical studies on the nature of software documentation, with a specific focus on documentation issues experienced by software developers, and (2) development of tools supporting the code comprehension process.
In the former direction, we conducted a large-scale empirical study in which we mined, analyzed, and categorized a large number of documentation-related artifacts and developed a detailed taxonomy of documentation issues, from which we infer a series of actionable proposals for both researchers and practitioners. We validated our findings by surveying professional software practitioners. In the latter direction, we developed ADANA, a framework that generates fine-grained code comments for a given piece of code at the granularity level intended by the developer. Our contributions to the body of software documentation knowledge shed light on previously unseen facts about overlooked software documentation matters and lay the foundations for the next-generation tools and techniques for automated software documentation.

Dissertation Committee:
- Prof. Michele Lanza, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Gabriele Bavota, Università della Svizzera italiana, Switzerland (Research co-Advisor)
- Prof. Cesare Pautasso, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Paolo Tonella, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Anthony Cleve, University of Namur, Belgium (External Member)
- Prof. Nicole Novielli, University of Bari, Italy (External Member)