Short Course: Ranking and labeling graphs: Analysis of links and node attributes

Instructor: Soumen Chakrabarti, IIT Bombay

We will study techniques for ranking and labeling nodes in a graph, based on the link structure of the graph as well as attributes of the nodes. Ranking and labeling have obvious applications in Web search and page classification, but the range of applications is widening to finer-grained entity-relationship graphs where nodes represent entities like people, emails, papers, organizations and locations and edges represent relations like works-for, wrote, cited, is-located-in and so on. Applications also include annotating unstructured and semistructured sources with type tags which can then be indexed for search.

On the subject of ranking, we will cover the following topics (two hours):

Hyperlink induced topic search (HITS); identifying the relevant subgraph, connections with SVD/PCA and term-document random walks and translation models
Pagerank; dead ends and teleport, choice of decay profiles, large-scale computational issues
Comparison between HITS and Pagerank; scoring and systems issues, content focus and topic drift, topological sensitivity, score and rank stability
Probabilistic variants of HITS: SALSA, PHITS
Pagerank variants; personalized Pagerank, absorbing random walks and applications to page staleness detection, link spam detection via trust and mistrust propagation, viral marketing, graph-based similarity and graph fingerprints
Adding content to HITS: vector-space model, anchor text, HTML tag-tree
Adding content to Pagerank: word proximity search, the "intelligent surfer", topic-sensitive Pagerank, ObjectRank
Learning to rank: learning a weight vector for dot-product scoring and ranking
Extending to Markov walks and the conductance matrix
Maximum entropy network flows: spotting a hidden favored community
Edge typed and weights in the conductance matrix

On the subject of labeling, we will cover the following topics (1.5 hours):

Examples of network influence: Web page topics, social networks and purchase habits
Applications: page classification, text tagging, entity resolution, viral marketing
Markov fields, cliques, potential functions, weights
Inferencing and model estimation
Easy cases: chains and trees
General graphs: relaxation labeling
Associative networks: linear and quadratic programming relaxations
Max-margin transudctive labeling

Only basic calculus and matrix algebra will be assumed. There might be some small experiments using Scilab and Java and small-scale data sets.