Getting everything wrong without doing anything right! On the perils of large-scale analysis of GitHub data

Jan Vitek, Northeastern University Boston, USA

GitHub holds a wealth of data, and trying to mine those data for insights about the software development process is irresistible. This talk is a cautionary tale of what can go wrong if care and healthy skepticism are not applied to the results obtained from data torture. Jan Vitek will tell us about a study that aimed to link the choice of programming language to software defects and how that study failed at more or less every juncture. The talk will touch on how reproduction studies can help us regain trust in the results we cite and on how to make our own results reproducible.

Jan Vitek is a Professor of Computer Science at Northeastern University in Boston and holds an ERC grant at Czech Technical University in Prague. He is cited for his work on coverage detection in sensor networks, on protein backbone NMR assignment, on crack detection for wind turbines, on calculi for mobility, and on some assorted programming language and software engineering research. He still occasionally writes software. He chaired SIGPLAN, founded Fiji Systems, and was part of the founding team of He was program chair of ESOP, ECOOP, VEE, Coordination, and TOOLS. He was on the steering committees of ECOOP, JTRES, TRANSACT, ICFP, OOPSLA, POPL, PLDI, and LCTES.

