Short Course: Ranking and labeling graphs:
Analysis of links and node attributes
Slides
We will study techniques for ranking and labeling nodes in a graph,
based on the link structure of the graph as well as attributes of the
nodes. Ranking and labeling have obvious applications in Web search
and page classification, but the range of applications is widening to
finer-grained entity-relationship graphs where nodes represent
entities like people, emails, papers, organizations and locations and
edges represent relations like works-for, wrote, cited, is-located-in
and so on. Applications also include annotating unstructured and
semistructured sources with type tags which can then be indexed for
search.
On the subject of ranking, we will cover the following topics
(two hours):
- Hyperlink induced topic search (HITS); identifying the relevant
subgraph, connections with SVD/PCA and term-document random walks
and translation models
- Pagerank; dead ends and teleport, choice of decay profiles,
large-scale computational issues
- Comparison between HITS and Pagerank; scoring and systems issues,
content focus and topic drift, topological sensitivity, score and
rank stability
- Probabilistic variants of HITS: SALSA, PHITS
- Pagerank variants; personalized Pagerank, absorbing random walks
and applications to page staleness detection, link spam detection via
trust and mistrust propagation, viral marketing, graph-based
similarity and graph fingerprints
- Adding content to HITS: vector-space model, anchor text,
HTML tag-tree
- Adding content to Pagerank: word proximity search, the
"intelligent surfer", topic-sensitive Pagerank,
ObjectRank
- Learning to rank: learning a weight vector for dot-product
scoring and ranking
- Extending to Markov walks and the conductance matrix
- Maximum entropy network flows: spotting a hidden favored
community
- Edge typed and weights in the conductance matrix
On the subject of labeling, we will cover the following topics
(1.5 hours):
- Examples of network influence: Web page topics,
social networks and purchase habits
- Applications: page classification, text tagging, entity
resolution, viral marketing
- Markov fields, cliques, potential functions, weights
- Inferencing and model estimation
- Easy cases: chains and trees
- General graphs: relaxation labeling
- Associative networks: linear and quadratic programming
relaxations
- Max-margin transudctive labeling
Only basic calculus and matrix algebra will be assumed. There
might be some small experiments using
Scilab and Java and small-scale
data sets.