Mutation Testing of Deep Learning Systems

Staff - Faculty of Informatics

Date: 20 April 2023 / 15:30 - 16:30

USI Campus Est, room D1.14, Sector D

You are cordially invited to attend the PhD Dissertation Defence of Nargiz Humbatova on Thursday, 20 April 2023 at 15:30 in room D1.14.

Abstract:
Deep Learning (DL) solutions are increasingly adopted, but how to test them remains a major open research problem. Existing testing techniques are being adapted to DL systems and new ones are being proposed, including mutation testing. However, no approach has investigated the possibility of simulating the effects of real DL faults by means of mutation operators. We present a comprehensive taxonomy of faults in deep learning systems, built by manually analysing artefacts collected from GitHub commits and Stack Overflow posts to study the variety of DL-specific faults. Interviews with DL practitioners describing the problems they have encountered enriched the taxonomy with a number of additional faults that did not emerge from the other two sources.

On the basis of the constructed taxonomy and two other empirical studies, we defined 35 DL mutation operators. We implemented 24 of these operators in DEEPCRIME, the first source-level, pre-training mutation tool based on real DL faults. We assessed our mutation operators to understand their characteristics: their sensitivity to changes in the quality of the test data and whether they produce interesting, i.e., killable but not trivial, mutations.

To help developers use mutation testing to improve their test sets, we propose DEEPMETIS, a search-based tool that automatically generates new test inputs that increase a test set's ability to detect artificially injected defects. Experimental results show that DEEPMETIS is effective in augmenting a given test set.

Finally, we evaluate state-of-the-art techniques that can be used to repair faulty DNN models or to locate the root cause of a fault, using a benchmark of real-world mistakes made by developers while designing DNN models and of artificial faulty models generated by mutating the model code. The empirical evaluation shows that a random baseline is comparable to, and sometimes better than, existing model repair approaches, while for more complex models all repair techniques fail to find fixes. Fault localisation techniques also fail to detect any fault in a number of cases and produce suggestions far from the available ground truth. Our findings call for further research to develop more sophisticated techniques for Deep Learning repair and fault localisation.
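To give a flavour of what a source-level, pre-training mutation operator might look like, the minimal sketch below (Python/Keras) rebuilds a small, hypothetical classifier with a different loss function, emulating a common real fault in which the developer picks an unsuitable loss. The model and the operator are illustrative assumptions for this announcement, not code taken from DEEPCRIME.

```python
# Illustrative sketch of a pre-training, source-level mutation operator for a
# Keras training program. Hypothetical example, not DEEPCRIME's implementation.
import tensorflow as tf


def build_model(loss="categorical_crossentropy"):
    """Original training program: a small classifier for 28x28 inputs."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model


def mutate_change_loss(build_fn, mutated_loss="mean_squared_error"):
    """Mutation operator: rebuild the model with a different loss function,
    simulating a real fault (wrong loss chosen by the developer)."""
    return build_fn(loss=mutated_loss)


original = build_model()
mutant = mutate_change_loss(build_model)
```

In an actual mutation-testing pipeline, both the original and the mutated training program would be trained several times, and a statistical comparison of their test-set performance would decide whether the test set kills the mutant, since DL training is stochastic.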

Dissertation Committee:
- Prof. Paolo Tonella, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Laura Pozzi, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Carlo Alberto Furia, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Shin Yoo, Korea Advanced Institute of Science and Technology, Republic of Korea (External Member)
- Prof. Lionel Briand, University of Ottawa, Canada (External Member)
- Dr. Gunel Jahangirova, King’s College London, UK (External Member)