Treffer: Regularized query classification using search click information

Title:
Regularized query classification using search click information
Source:
Pattern recognition. 41(7):2283-2288
Publisher Information:
Oxford: Elsevier Science, 2008.
Publication Year:
2008
Physical Description:
print, 14 ref
Original Material:
INIST-CNRS
Document Type:
Fachzeitschrift Article
File Description:
text
Language:
English
Author Affiliations:
Yahoo! Research Labs, 3333 Empire Avenue, Burbank, CA 91504, United States
ISSN:
0031-3203
Rights:
Copyright 2008 INIST-CNRS
CC BY 4.0
Sauf mention contraire ci-dessus, le contenu de cette notice bibliographique peut être utilisé dans le cadre d’une licence CC BY 4.0 Inist-CNRS / Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS / A menos que se haya señalado antes, el contenido de este registro bibliográfico puede ser utilizado al amparo de una licencia CC BY 4.0 Inist-CNRS
Notes:
Computer science; theoretical automation; systems

Telecommunications and information theory
Accession Number:
edscal.20268001
Database:
PASCAL Archive

Weitere Informationen

Hundreds of millions of users each day submit queries to the Web search engine. The user queries are typically very short which makes query understanding a challenging problem. In this paper, we propose a novel approach for query representation and classification. By submitting the query to a web search engine, the query can be represented as a set of terms found on the web pages returned by search engine. In this way, each query can be considered as a point in high-dimensional space and standard classification algorithms such as regression can be applied. However, traditional regression is too flexible in situations with large numbers of highly correlated predictor variables. It may suffer from the overfitting problem. By using search click information, the semantic relationship between queries can be incorporated into the learning system as a regularizer. Specifically, from all the functions which minimize the empirical loss on the labeled queries, we select the one which best preserves the semantic relationship between queries. We present experimental evidence suggesting that the regularized regression algorithm is able to use search click information effectively for query classification.