Query Refinement for Patent Prior Art Search

Dean's Office - Faculty of Informatics

Start date: 23 June 2014

End date: 24 June 2014

You are cordially invited to attend the PhD Dissertation Defense of Parvaz MAHDABI on Monday, June 23 at 14h30 in room SI-006 (Informatics building).
 
Abstract:
A patent is a contract between the inventor and the state that grants the inventor a limited period of time to exploit the invention. In exchange, the inventor must place a detailed description of the invention in the public domain.
Patents can encourage innovation and economic growth, but in times of economic crisis they can hamper such growth. The long duration of the application process is a major obstacle that must be addressed to maximize the benefit of patents for innovation and the economy. This duration can be significantly reduced by changing the way we search the patent and non-patent literature.

Despite recent advances in general information retrieval and the revolution in Web search engines, there is still a huge gap between the technologies that emerge from research labs and are adopted by major internet search engines, and the systems in use by the patent search communities.

In this thesis we investigate the problem of prior art search in patent retrieval, with the goal of finding documents that describe the idea of a query patent. A query patent is a full patent application composed of hundreds of terms, and it does not represent a single focused information need. Other relevant evidence (e.g. classification tags and bibliographical data) provides additional details about the underlying information need of the query patent.
 
The first goal of this thesis is to estimate a unigram query model from the textual fields of a query patent. We then improve the initial query representation using noun phrases extracted from the query patent. We show that performing expansion in a query-dependent manner is useful. The second contribution of this thesis is to address the term mismatch problem from a query formulation point of view by integrating multiple sources of relevance evidence associated with the query patent. To do this, we enhance the initial representation of the query with the term distribution of the community of inventors related to the topic of the query patent. We then build a lexicon using classification tags and show that query expansion using this lexicon, taking into account proximity information (between query and expansion terms), can improve retrieval performance. We performed an empirical evaluation of our proposed models on two patent datasets. The experimental results show that our proposed models achieve significantly better results than the baseline and other enhanced models.
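
As a rough illustration of what estimating a unigram query model from the textual fields of a query patent can look like, the sketch below mixes per-field maximum-likelihood term distributions using field weights. It is not the estimation method of the thesis: the function name, the field names, the weights, and the simple tokenizer are all assumptions made for this example, and smoothing and stop-word removal are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def unigram_query_model(fields, field_weights):
    """Estimate P(term | query patent) as a weighted mixture of per-field
    maximum-likelihood unigram distributions.

    fields:        dict mapping a field name to its text (hypothetical names)
    field_weights: dict mapping a field name to a non-negative weight
    """
    # Tokenize once and keep only fields that have both tokens and a positive weight.
    tokenized = {}
    for name, text in fields.items():
        tokens = tokenize(text)
        if tokens and field_weights.get(name, 0.0) > 0.0:
            tokenized[name] = tokens

    total_weight = sum(field_weights[name] for name in tokenized)
    if total_weight == 0.0:
        return {}

    model = Counter()
    for name, tokens in tokenized.items():
        weight = field_weights[name] / total_weight  # normalized mixture weight
        counts = Counter(tokens)
        for term, count in counts.items():
            # Maximum-likelihood estimate within the field, scaled by the field weight.
            model[term] += weight * count / len(tokens)
    return dict(model)


if __name__ == "__main__":
    query_patent = {
        "title": "query refinement",
        "abstract": "patent query patent search",
    }
    weights = {"title": 0.5, "abstract": 0.5}
    model = unigram_query_model(query_patent, weights)
    # Print terms from most to least probable.
    for term, prob in sorted(model.items(), key=lambda item: -item[1]):
        print(f"{term}\t{prob:.3f}")
```

The highest-probability terms of such a model could then serve as a short weighted query in place of the full application text; how many terms to keep is a tuning choice that this sketch does not address.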

Dissertation Committee:

  • Prof. Fabio Crestani, Università della Svizzera italiana, Switzerland (Research Advisor)
  • Dr. Monica Landoni, Università della Svizzera italiana, Switzerland (Research co-Advisor)
  • Prof. Kai Hormann, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Evanthia Papadopoulou, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Gareth J. E. Jones, Dublin City University, Ireland (External Member)
  • Prof. Andreas Rauber, Vienna University of Technology, Austria (External Member)