Past seminars at the Faculty of Informatics

The Faculty of Informatics is pleased to announce a seminar given by Filippo Menczer
TITLE: Modeling Statistical Properties of Written Text
SPEAKER: Filippo Menczer, Indiana University, Bloomington
DATE: Friday, June 26th, 2009
PLACE: USI Università della Svizzera italiana, room SI-006, Informatics building (Via G. Buffi 13)
TIME: 15.30
Written text is one of the fundamental manifestations of human
language, and the study of its universal regularities can give clues
about how our brains process information and how we, as a society,
organize and share it. Among these regularities, only Zipf's law has
been explored in depth. Other basic properties, such as the existence
of bursts of rare words in specific documents, have only been studied
independently of each other and mainly by descriptive models. As a
consequence, there is a lack of understanding of linguistic processes
as complex emergent phenomena. Beyond Zipf's law for word frequencies,
here we focus on burstiness, Heaps' law describing the sublinear
growth of vocabulary size with the length of a document, and the
topicality of document collections, which encode correlations within
and across documents absent in random null models. We introduce and
validate a generative model that explains the simultaneous emergence
of all these patterns from simple rules. As a result, we find a
connection between the bursty nature of rare words and the topical
organization of texts and identify dynamic word ranking and memory
across documents as key mechanisms explaining the nontrivial
organization of written text. Our research can have broad implications
and practical applications in computer science, cognitive science and
linguistics. Joint work with Mariangeles Serrano and Alessandro
Filippo Menczer is an associate professor of informatics and computer
science, adjunct associate professor of physics, and a member of the
cognitive science program at Indiana University, Bloomington. He holds a
Laurea in Physics from the University of Rome and a Ph.D. in Computer
Science and Cognitive Science from the University of California, San Diego.
Dr. Menczer has been the recipient of Fulbright, Rotary Foundation, and
NATO fellowships, and a CAREER Award from the National Science Foundation.
He is the Associate Director of the Center for Complex Networks and Systems
Research in the IU School of Informatics, a Fellow-at-large of the Santa Fe
Institute, and a Lagrange Senior Fellow at the Institute for Scientific
Interchange Foundation in Torino, Italy. His research is supported by
the NSF and focuses on Web, text, and data mining, social Web applications,
distributed and intelligent Web information systems, and modeling of
complex information networks.
HOST: Prof. Fabio Crestani
