Technical report detail

Understanding and Classifying the Quality of Technical Forum Questions

by Luca Ponzanelli, Andrea Mocci, Alberto Bacchelli, Michele Lanza

Technical question and answer (Q&A) services have become a valuable resource for developers. A prominent example of a technical Q&A website is Stack Overflow (SO), which relies on a growing community of more than two million users who actively contribute by asking questions and providing answers. To maintain the value of this resource, poor-quality questions among the more than 6,000 asked daily have to be filtered out. Currently, poor-quality questions are manually identified and reviewed by selected SO users, which costs considerable time and effort. Automating the process would save time and reduce the load on the review queue, improving the efficiency of SO as a resource for developers. We present an approach to automate the classification of questions according to their quality, together with an empirical study that investigates how to model and predict the quality of a question by considering as features both the contents of a post (e.g., from simple textual features to more complex readability metrics) and community-related aspects (e.g., the popularity of a user in the community). Our findings show that at least partial automation of the costly SO review process is indeed possible.

Technical report 2014/02, June 2014

BibTeX entry

@techreport{14understanding,
  author      = {Luca Ponzanelli and Andrea Mocci and Alberto Bacchelli and Michele Lanza},
  title       = {Understanding and Classifying the Quality of Technical Forum Questions},
  institution = {University of Lugano},
  number      = {2014/02},
  year        = 2014,
  month       = jun
}