The Retrieval of Legal Information
Staff - Faculty of Informatics
Start date: 28 September 2010
End date: 29 September 2010
DATE: Tuesday, September 28th 2010
PLACE: USI Università della Svizzera italiana, room A24, Red building (Via G. Buffi 13)
Legal information is ubiquitous in today's digital repositories. Two major aspects of legal information are difficult to handle with general algorithmic solutions to the problem of retrieval in repositories: (1) the pragmatic nature of legal information, and (2) the use of technical terms whose meaning is strongly context-dependent.
PRAGMATICS OF LEGAL TEXTS
Traditional techniques for text processing include text classification, text clustering, and information extraction. They rest on the idea that a text is about things that the text itself, directly or indirectly, describes. For instance, a curriculum vitae describes, among other things, the job experience of an individual. Legal texts, however, do not describe but direct: they contain not descriptions but instructions, norms, and prescriptions. When we treat a repository as a source of information, it makes sense not only to look for information within the text but also to infer information from it. In other words, we need a technology that retrieves information and generates the missing parts by inference. The inferential structure of systems that provide norms, however, is not the same as that of systems holding descriptive information. We therefore need a solution that encompasses inference in normative systems.
LEGAL TERMINOLOGY AND TEXT RETRIEVAL
The meaning of legal terms depends strongly on the purpose of the document. For instance, the term duty is interpreted differently in contracts and in bylaws. Therefore, classifying texts by the presence of similar terms requires a direct analysis of the context in which the similarity holds, if retrieval is to provide a full answer to a query. In the seminar I will outline the architecture of a system for legal text retrieval by defining inferential mechanisms, giving contextual definitions for legal terms, and sketching a general method for establishing the similarity of legal texts, based on an extension of a classical text mining method known as bag of words. The extended bag of words model, the new inferential mechanisms, and the context-based legal definitions are then used to illustrate how a classical bag-of-words retrieval method, in particular the k-nearest neighbor classification algorithm, can be modified for legal texts.
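As a rough illustration only, and not the method presented in the seminar, the idea of a context-aware bag of words combined with k-nearest neighbor classification could be sketched in Python as follows. The set CONTEXT_SENSITIVE, the "term@context" tagging scheme, the sample sentences, and the labels are all hypothetical assumptions made for this sketch. The document type (contract, bylaw) is assumed to be known in advance as metadata.

```python
import math
from collections import Counter

# Hypothetical list of terms whose meaning depends on the document type.
CONTEXT_SENSITIVE = {"duty"}


def bag_of_words(text, context):
    """Count tokens; context-sensitive terms are tagged with the document context."""
    bag = Counter()
    for token in text.lower().split():
        token = token.strip(".,;:()\"'")
        if not token:
            continue
        if token in CONTEXT_SENSITIVE:
            # "duty" in a contract and "duty" in a bylaw become distinct features.
            token = f"{token}@{context}"
        bag[token] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_bag, labelled_bags, k=3):
    """Return the majority label among the k bags most similar to the query."""
    ranked = sorted(
        labelled_bags,
        key=lambda item: cosine(query_bag, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy training set: (bag, label) pairs.
training = [
    (bag_of_words("the supplier has a duty to deliver the goods", "contract"), "commercial"),
    (bag_of_words("the buyer has a duty to pay the price", "contract"), "commercial"),
    (bag_of_words("each member has a duty to attend the assembly", "bylaw"), "governance"),
    (bag_of_words("the board has a duty to convene the members", "bylaw"), "governance"),
]

query = bag_of_words("the seller has a duty to deliver", "contract")
print(knn_classify(query, training, k=3))
```

Because the term duty is tagged with its context, an occurrence in a contract no longer counts as a match for an occurrence in a bylaw, so the similarity between documents of different types drops even when they share the same surface vocabulary. A real system would need a principled way to decide which terms are context-sensitive and how contexts are identified; this sketch simply hard-codes both.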
Matteo Cristani has been Assistant Professor of Computer Science at the University of Verona since April 1997. He received a PhD in Electronics and Computer Science for Industrial Applications from the University of Padova (Italy) in 1995, with the thesis "Ragionamento temporale metrico con reti di vincoli: algoritmi per compiti di ragionamento esatto ed approssimato" (Metric Temporal Reasoning with Constraint Networks: Algorithms for Exact and Approximate Reasoning Tasks), under the supervision of Prof. E. Pagello.