
Why do we encourage even more missingness when having missing data?

Staff - Faculty of Informatics

Date: 9 May 2019 / 16:30 - 17:30

USI Lugano Campus, room SI-006, Informatics building (Via G. Buffi 13)

Speaker:
Richard Torkar, Chalmers University of Technology and University of Gothenburg, Sweden

Abstract:
Most would argue that in order to conduct estimations one should not rely exclusively on expert opinion, but also on data of a more quantitative nature, gathered using unbiased data collection approaches. To this end, researchers have published studies making use of, among others, the International Software Benchmarking Standards Group's (ISBSG) data repository. One could argue that this data set, and similar ones, have several things in common with data collected in industry: missing data, disparate quality in data collection procedures, and a variety of data types collected are issues we also see in empirical software engineering research in general.

The prevailing strategy to handle missing data in empirical software engineering research is to merely remove cases of missing data (listwise deletion). We believe that this strategy is suboptimal and, generally speaking, not good for our research discipline. Even in cases when data can be classified according to the quality of the data collection procedure, as is the case with the ISBSG data sets, one sees that our community often chooses only to use a subset of data, classified to be of the highest quality. In short, we believe that data of low quality should be seen as better than no data at all, and the general rule of thumb should be never to throw away data.

We will present a case where we apply techniques for data imputation in addition to conducting Bayesian data analysis on effort estimation data using the ISBSG data set.

Biography:
Richard Torkar is a professor at Chalmers and the University of Gothenburg, Sweden. He mainly conducts research with industry on a wide variety of topics, most recently behavioral software engineering, applications of meta-heuristic algorithms, Markov chain Monte Carlo diagnostics, and the use of Bayesian statistics as a foundation for machine learning applications.

Host: Prof. Carlo A. Furia