Statistical Inference for Quality Control in Crowd-sourcing

Faculty of Informatics - Academic Studies Administration

Date: 2 October 2018 / 15:30 - 16:30

USI Lugano Campus, room SI-006, Informatics building (Via G. Buffi 13)

Speaker:	Mark Carman
	Monash University, Australia
Date:	Tuesday, October 2, 2018

Abstract:

Crowd-sourcing platforms such as CrowdFlower and Amazon's Mechanical Turk allow tasks requiring human-level intelligence (such as image annotation) to be easily and cheaply outsourced to online workers. These platforms have proven extremely useful for collecting the large quantities of labeled data required for training supervised machine learning algorithms. Since crowd-workers vary in their abilities and motivation, the quality of responses from individual workers cannot be guaranteed. Thus, various quality control mechanisms have been developed to improve label quality. These mechanisms work either by vetting workers before responses are solicited or by using statistical inference procedures to infer the correct answer to each question after conflicting responses are received. In this talk I will discuss our recent work on improving these statistical inference procedures in a number of directions, namely by (i) taking worker motivations into account, (ii) leveraging contextual information such as how long it takes each worker to respond, (iii) automatically detecting subjective questions and distinguishing subjectivity from difficulty, and (iv) leveraging correlations between response classes.
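
For readers unfamiliar with this kind of inference, the following is a minimal illustrative sketch of a generic baseline, not the speaker's method. It assumes each worker has a single unknown accuracy (a simplified "one-coin" model in the spirit of Dawid and Skene), assumes wrong answers are spread uniformly over the other classes, and alternates between estimating answer posteriors and worker accuracies. The function name aggregate_labels, the initial accuracy of 0.7 and the smoothing constants are assumptions made for illustration.

```python
import math
from collections import defaultdict


def aggregate_labels(responses, n_iter=20):
    """Infer one label per question from (worker, question, label) tuples.

    Returns (labels, accuracy): the most probable label for each question
    and the estimated accuracy of each worker.
    """
    classes = sorted({label for _, _, label in responses})
    k = max(len(classes), 2)  # guard against division by zero below
    workers = {worker for worker, _, _ in responses}

    votes = defaultdict(list)
    for worker, question, label in responses:
        votes[question].append((worker, label))

    accuracy = {w: 0.7 for w in workers}  # assumed starting point
    posterior = {}

    for _ in range(n_iter):
        # E-step: posterior over the true class of each question,
        # given the current worker accuracies (uniform class prior).
        for question, question_votes in votes.items():
            log_score = {c: 0.0 for c in classes}
            for worker, label in question_votes:
                a = accuracy[worker]
                for c in classes:
                    p = a if label == c else (1.0 - a) / (k - 1)
                    log_score[c] += math.log(p)
            top = max(log_score.values())
            unnorm = {c: math.exp(s - top) for c, s in log_score.items()}
            total = sum(unnorm.values())
            posterior[question] = {c: v / total for c, v in unnorm.items()}

        # M-step: a worker's accuracy is the expected fraction of their
        # answers that match the inferred true class, lightly smoothed
        # so it stays strictly between 0 and 1.
        agree = defaultdict(float)
        count = defaultdict(int)
        for worker, question, label in responses:
            agree[worker] += posterior[question][label]
            count[worker] += 1
        for w in workers:
            accuracy[w] = (agree[w] + 1.0) / (count[w] + 2.0)

    labels = {q: max(p, key=p.get) for q, p in posterior.items()}
    return labels, accuracy
```

Unlike a plain majority vote, this kind of procedure down-weights workers who frequently disagree with the inferred consensus. The extensions listed in the abstract (worker motivation, response time, subjectivity, class correlations) go beyond what this sketch models.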

Biography:

Mark Carman is a Senior Lecturer at Monash University, a top-100 rated university in Melbourne, Australia. He joined Monash in 2010 after a postdoc at the University of Lugano. He received his PhD from the University of Trento in 2006, having spent his PhD tenure at both the Fondazione Bruno Kessler (FBK) and the Information Sciences Institute (ISI) of USC. Mark's research lies in Data Science, with a particular focus on problems in Information Retrieval. He has worked on techniques for learning web search rankings, scaling machine learning algorithms to large data quantities, robust clustering in high dimensions, improving quality control in crowd-sourcing, and personalising search results and recommended content. Other applications of his work include speeding up digital forensic investigations, detecting sentiment and sarcasm in text, correcting errors in OCR output, and estimating user expertise in social media. Mark has authored a large number of publications in prestigious venues, including full papers at SIGIR, KDD, IJCAI, CIKM, ECIR, WSDM, HT, CoNLL, EACL, HCOMP and ICDAR, and articles in TOIS, IR, JMLR, ML, PR, JAIR, CS&L, JASIST, DI and CSUR. Moreover, he has served on the program committees of many IR/DM/AI conferences, including SIGIR, WSDM, CIKM, ECIR, KDD, WWW, EMNLP, ACML, IJCAI and AAAI, and is currently an Associate Editor for the journal TOIS.

Webpage:

http://users.monash.edu.au/~mcarman/

Host:

Prof. Fabio Crestani