Dataset Not Anonymized, Reject. A Personal Account of Ethics, Privacy, and Anonymization Issues in SE Research
Software Institute
Date: 25 September 2025 / 16:30 - 17:30
USI Campus EST, Room D0.03
Speaker: Marco Raglianti, Università della Svizzera italiana
Abstract: Research in software engineering places many ethical roadblocks (or at least checkpoints) on our path when it comes to performing experiments, conducting surveys, or even analyzing public developer conversations. And I am not talking about forging the answers to that last survey whose replies are two weeks late, despite tomorrow’s deadline for your core PhD paper. For example, when analyzing GitHub repositories, we are (or rather should be) thinking about the potential consequences of our findings about developers' productivity and the quality of their code. Anonymizing the presented results is not always as straightforward as it seems. And here we start to see the shortcomings of our (at least my) education about ethics.
Starting from the paper by Gold and Krinke, 2020 (Ethical Mining – A Case Study on MSR Mining Challenges), I will go through some episodes from my academic experience that shaped my approach to the problem but also raised important questions about how far our due diligence should go. Some institutions (e.g., Carleton University) require preemptive authorization from an Ethics Review Board before a study can be conducted, even for a Master's thesis, while others, like USI, only recently (December 2023) approved an Ethics Code of Conduct, whose chapter 3 includes three (3!) full pages on the integrity and ethics of scientific research. The referenced documents show just how inadequate our preparation is for handling the nuances and implications of ethical considerations in our research work. Do we really need to care? Can we improve our current practices on a personal or even institutional level? Are there some inevitable shortcomings tied to the current peer review model?
Biography: Dr. Marco Raglianti is a postdoctoral research fellow in the REVEAL (Reverse Engineering, Visualization, Evolution Analysis Lab) research group at the Software Institute, USI. He obtained his PhD in Software Engineering just a few months ago in the same lab, under the supervision of Prof. Dr. Michele Lanza, while his Bachelor's and Master's degrees in Computer Science from the University of Pisa date back to 2006 and 2012, respectively. His current research involves visualizing the documentation landscape of software systems, a natural continuation of his PhD work. He is also enjoying the new perspective that comes from co-supervising and contributing to other research lines in the lab, such as eXtended Reality (XR) software visualization and interaction (Mattia), sonification of software and its evolution (Carmen), and a yet-to-be-announced software visualization topic (Ian). He is also interested in forms of communication that become software documentation, and has investigated Discord conversations and their summarization (both human- and LLM-generated) as a potential aid to program comprehension, maintenance, and evolution.
Chair: Marco Paganoni
*************************
In February 2019, the Software Institute started its SI Seminar Series. Every Thursday afternoon, a researcher from the Institute gives a short public talk on a software engineering topic of their choice. Examples include, but are not limited to, interesting new papers, seminal papers, personal research overviews, discussions of preliminary research ideas, tutorials, and small experiments.
On our YouTube playlist you can watch some of the past seminars. On the SI website you can find more details on the next seminar, the upcoming seminars, and an archive of the past speakers.