Automatic Generation of Test Oracles from Natural Language Specifications

Dean's Office - Faculty of Informatics

Date: 7 April 2022 / 09:30 - 11:00

USI Campus EST, room D1.15, Sector D // Online on MS Teams

You are cordially invited to attend the PhD Dissertation Defence of Arianna Blasi on Thursday 7 April 2022 at 09:30 in room D1.15 or online on MS Teams.

Abstract:
This thesis proposes a framework to automatically derive test oracles from natural language specifications. We studied and developed cost-effective techniques to derive oracles from the natural language information commonly available about the code. Despite previous research in software testing, the oracle problem, that is, the challenge of distinguishing correct from incorrect behavior, is still largely open. Contemporary test case generators rely either on simple and incomplete implicit oracles or on regression oracles that refer to the results of running previous versions of the program. Implicit oracles can reveal exceptions and program crashes, but miss semantically relevant issues. Regression oracles can detect deviations from the behavior observed in former versions of the program, but cannot check new functionality. Many approaches generate powerful and complete test oracles from formal specifications, which are still not common practice in software development. The main goal of this thesis is to define techniques that use natural language annotations, which current approaches largely ignore, to generate effective test oracles without additional human effort. Most software systems are supported by textual information, such as annotations, comments, and wikis. This information is typically informal and unstructured, and often combines natural language expressions with developers' jargon. We also observe that informal artifacts are prone to human mistakes. In a nutshell, informal artifacts are not always reliable and are hard to exploit automatically. This thesis defines approaches to automatically interpret and translate informal and unstructured information that combines natural language expressions with developers' jargon into actionable test oracles, while overcoming human errors that affect their quality.
We first automatically verify the consistency between the code and its documentation, and report to developers the inconsistencies that we detect at a fine-grained level. We then process the pruned specifications to automatically generate test oracles in the form of executable assertions. We designed and developed approaches that process both structured and unstructured Javadoc specifications, to derive both descriptive and prescriptive assertions for the methods of a Java class. We successfully experimented with the approaches on popular, widely used, open-source Java systems. This thesis opens new research directions towards the automated exploitation of natural language artifacts beyond Javadoc specifications, to derive powerful test oracles. The intuitions and observations our study proposes can be generalized and applied to many other software-related artifacts.
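
To make the idea of an oracle expressed as an executable assertion more concrete, the following is a minimal hand-written sketch in Java. It is not output of the tools described in the thesis. The class DocumentedStack, its Javadoc comment, and the method peekOracle are hypothetical names introduced only for illustration. The sketch shows how a "@throws ... if ..." clause and a "@return ... never null" clause could each correspond to a check on the observed behavior.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OracleSketch {

    /** Hypothetical class under test, documented with Javadoc (illustration only). */
    public static class DocumentedStack {
        private final Deque<String> items = new ArrayDeque<>();

        /**
         * Returns the element at the top of the stack without removing it.
         *
         * @return the top element, never null
         * @throws IllegalStateException if the stack is empty
         */
        public String peek() {
            if (items.isEmpty()) {
                throw new IllegalStateException("stack is empty");
            }
            return items.peek();
        }

        public void push(String s) {
            items.push(s);
        }

        public boolean isEmpty() {
            return items.isEmpty();
        }
    }

    /**
     * Oracle written by hand from the Javadoc of peek() (an assumption about
     * what a derived assertion might look like, not the thesis's format):
     *   "@throws IllegalStateException if the stack is empty" -> exceptional behavior
     *   "@return the top element, never null"                 -> normal behavior
     * Returns true when the observed behavior matches the documentation.
     */
    public static boolean peekOracle(DocumentedStack stack) {
        // Guard translated from the natural language condition "if the stack is empty".
        boolean emptyBefore = stack.isEmpty();
        try {
            String result = stack.peek();
            // A normal return is documented only for a non-empty stack,
            // and the returned value must not be null.
            return !emptyBefore && result != null;
        } catch (IllegalStateException e) {
            // The exception is documented only for an empty stack.
            return emptyBefore;
        }
    }

    public static void main(String[] args) {
        DocumentedStack stack = new DocumentedStack();
        System.out.println("empty stack conforms: " + peekOracle(stack));
        stack.push("a");
        System.out.println("non-empty stack conforms: " + peekOracle(stack));
    }
}
```

Unlike an implicit oracle, which would only notice a crash, a check of this kind also flags a peek() that silently returns null on an empty stack, because that behavior contradicts the documented exception.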

Dissertation Committee:
- Prof. Mauro Pezzè, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Alessandra Gorla, IMDEA Software Institute, Spain (Research co-Advisor)
- Prof. Carlo Alberto Furia, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Paolo Tonella, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Earl Barr, University College London, UK (External Member)
- Prof. Michael Pradel, University of Stuttgart, Germany (External Member)