Project Title: | Improved Web search methods |
Participants: | Allan Borodin, Hyun Chul Lee |
Project Description: | Current search engines make heavy
use of graph-theoretic methods to rank the results retrieved
after a Web search. PageRank, the ranking method used by Google,
and the Hub-Authority method used by Teoma, derive their ranking
from an iterative process where the rank a web page is updated
in each step according to the rank of the pages it links to, or
that link to it. The general idea behind these methods is that a
hyperlink to a page is like a vote of confidence in that page.
Hence, a page with many in-links will be more important, and if
the pages pointing to it have a high ranking themselves, then
the page will be even more important. The effectiveness of such
methods has been persuasively demonstrated by the success of
Google. However, many questions regarding the limits of their
performance remain open The first phase of our project focused notions of stability, convergence, and similarity between methods. Mathematically precise definitions of these concepts were formulated, and an analysis of the most common ranking methods based on these concepts has been performed. In the future, we will extend this work to a probabilistic analysis utilizing some of the probabilistic graph models being proposed in the literature, and also being studied in the context of this project. More generally, we are considering the question: what makes a search query hard for current search engines? In particular, we will attempt to define classes of information needs that are hard to resolve using current search engines. Defining such classes may then lead to new methods that make such information needs easier to resolve. |