Location: Room 430, Goldberg Computer Science Building, 6050 University Avenue (Building E600 on Studley Campus Map), Halifax, Nova Scotia, Canada
Time: Tuesday 11:30-13:00 (Thursday 11:30-13:00 or other times are also possible, if necessary).
The MoMiNIS Seminar Series provides a research-oriented forum for prominent researchers to present their current research on the Modelling and Mining of Networked Information Spaces. The seminars are meant to appeal to a broad audience, and present both theoretical results in graph theory, machine learning and text mining, and their practical ramifications in areas such as Web mining, social network analysis, network management and security, and digital libraries.
Seminar Co-ordinators: Jeannette Janssen, Evangelos Milios, Nur Zincir-Heywood
Date & Time | Speaker | Topic | Slides | Host
June 21, 2012 | Ricardo Baeza-Yates, Yahoo! Research, Barcelona | Search Engines and Social Media | | jj, eem
June 23 or 24, 2012 | Litvak, University of Twente | Extremal properties of Web graphs | | jj
June 23 or 24, 2012 | Chayes, Microsoft Research, Cambridge, Mass | A weak distributional limit for preferential attachment graphs (tentative) | | jj
TBA | Mourad Debbabi, Concordia University | Network security protocols | | nzh
Nov. 21, 2008 | Allan Borodin, Univ. of Toronto | Personalized Search, Community Extraction in Blog sites | | jj, eem
Mar. 3, 2009 | George Forman, HP Labs | What Are You Talking About? Topic Recognition via Machine Learning Text Classification and Quantification | | nzh, eem
Mar. 11, 2009 | Ellen Zegura, Georgia Tech | NetMark: Selecting a Benchmark of Network Topologies | | jj
Aug. 12, 2009, 2:30pm | Natasa Przulj, Dept. of Computer Science, UC Irvine | From Network Topology to Biological Function and Disease | | jj
Aug. 13, 2009 | Russell Greiner, Dept. of Computer Science, U. of Alberta | Budgeted Learning of Probabilistic Classifiers | | eem
Dec. 10, 2009, 10:30am | Stan Matwin | Privacy and Data Mining: New Developments and Challenges | | eem
Feb. 18, 2010, 2:30pm | Hugh Chipman, Acadia Univ. | Mixed Membership Stochastic Blockmodels for multi-recipient transactions on a network (joint work with M. Mahdi Shafiei) | | jj, eem
Mar. 30, 2010, 2:30pm | Aaron Clauset, Santa Fe Institute | The trouble with community detection in complex networks | | jj
Sep. 17, 2010 | Edo Airoldi, Harvard University | Modeling approaches for analyzing complex networks | | jj
March 24, 2011 | Bernie Hogan, Oxford Internet Institute, University of Oxford | Capture of online networks | | ag, jj
May 3, 2011 | Frank Tompa, University of Waterloo | Finding implicit lists and tables in web pages | | eem, nzh
May 10, 2011 | Laks V.S. Lakshmanan, University of British Columbia | Musings on Next Generation Recommender Systems | | eem
May 19, 2011 | Julita Vassileva, University of Saskatchewan | Sharing experience in Social Computing, Persuasion and Science Outreach | | nzh, eem
May 19, 2011 | Jian Pei, Simon Fraser University | Query Friendly Compression and Analysis of Social Networks Using Multi-Position Linearization | | jj
March 15, 2012 | Denilson Barbosa, University of Alberta | Towards Summarizing and Making Sense of the Blogosphere | | eem
Text classification and quantification via machine learning
Speaker: Dr. George Forman
Hewlett Packard Labs, Port Orchard, WA
Date: Tuesday, March 3, 2009
Time: 11:30 a.m.
Location: Jacob Slonim Conference Room (430), 6050 University Ave., Halifax
In theory, practice is the same as theory, but in practice it is not. In the
process of applying proven text classification methods from the
research literature to business-driven problems at Hewlett-Packard, I have encountered
substantial failures and gaps. Investigating the
failures in detail has repeatedly led to new discoveries and perspectives for
research that are simply not afforded by the academic
benchmark datasets and problem formulations. In this talk, I will describe two
interesting applications of supervised machine learning
that we have deployed inside Hewlett-Packard, as well as the challenging, fundamental
research opportunities they have led to.
Short Biography:
George Forman is a senior research scientist at Hewlett-Packard Labs. His research
interests stem from practical issues that arise in the
application of machine learning to industrial problems, e.g. feature selection,
robustness, small training sets, and novel problem
formulations, such as quantification. His Ph.D. in Computer Science & Engineering
(1996) is from the University of Washington, Seattle.
Speaker URL:
Title: NetMark: Selecting a Benchmark of Network Topologies
Speaker : Dr. Ellen Zegura, Georgia Tech
Wednesday, March 11, 9.30am, Colloquium room, Chase building
Prof. Zegura's research work concerns the development of wide-area (Internet) networking services and, more recently, mobile wireless networking. Wide-area services are utilized by applications that are distributed across multiple administrative domains (e.g., web, file sharing, multi-media distribution). Her focus is on services implemented both at the network layer, as part of network infrastructure, and at the application layer. In the context of mobile wireless networking, she is interested in challenged environments where traditional ad-hoc and infrastructure-based networking approaches fail. These environments have been termed Disruption Tolerant Networks.
Ellen W. Zegura received the B.S. degree in Computer Science (1987), the B.S.
degree in Electrical Engineering (1987), the M.S. degree in
Computer Science (1990) and the D.Sc. in Computer Science (1993) all from Washington
University, St. Louis, Missouri. Since 1993, she has been on the faculty in
the College of Computing at Georgia Tech. She was an Assistant Dean in charge
of Space and Facilities Planning from Fall 2000 to January 2003. She served
as Interim Dean of the College for six months in 2002. Since February 2003,
she has been an Associate Dean, with responsibilities ranging from Research
and Graduate Programs to Space and Facilities Planning. She has spent five years
as the user representative in the planning of the Klaus Advanced Computing Technologies
Building, scheduled to open in Fall 2006. Starting in August 2005, she has chaired
the Computing Science and Systems Division of the College of Computing. She
is the proud mom of two girls, Carmen (born in August 1998) and Bethany (born
in May 2001).
Title: From Network Topology to Biological Function and Disease
Speaker: Natasa Przulj, Dept. of Computer Science, UC Irvine
Date: August 12, 2.30pm
We discuss our new tools that are advancing network analysis towards a theoretical
understanding of the structure of biological networks. Analogous to tools for
analyzing and comparing genetic sequences, we are developing new tools that
decipher large network data sets, with the goal of improving biological understanding
and contributing to development of new therapeutics. We demonstrate that local
node similarity corresponds to similarity in biological function and involvement
in disease. We introduce a systematic, highly constraining measure of a network's
local structure and demonstrate that protein-protein interaction (PPI) networks
are better modeled by geometric graphs than by any previous model. The geometric
model is further corroborated by demonstrating that PPI networks can explicitly
be embedded into a low-dimensional geometric space. We also present a new network
alignment algorithm.
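As a rough illustration of the geometric graph model mentioned in the abstract, the sketch below connects any two points that lie within a fixed distance of each other. The function names, the Euclidean metric, the uniform sampling, and the radius are assumptions made for illustration only; the abstract does not specify the construction or the embedding method used in the talk.

```python
import math
import random
from itertools import combinations


def geometric_graph(points, radius):
    """Return the edges (i, j), i < j, between points closer than `radius`."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) < radius
    ]


def random_points(n, dim, seed=0):
    """Sample n points uniformly from the unit cube of dimension `dim` (hypothetical helper)."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    # Nodes are points in a low-dimensional space; edges join nearby points.
    pts = random_points(50, 3)
    edges = geometric_graph(pts, radius=0.3)
    print(len(edges), "edges among", len(pts), "nodes")
```

In such a model, a node's neighbourhood is determined entirely by its position, which is one informal way to read the claim that local network structure reflects an underlying geometric space.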
Dr. Przulj is an Assistant Professor in the Department of Computer Science, UC
Irvine. She is also a member of the UCI Cancer Center, the UCI Center for Complex
Biological Systems (CCBS), the UCI's program in Mathematical, Computational
and Systems Biology (MCSB), and the UCI’s Institute for Genomics and Bioinformatics
(IGB). She received an NSF CAREER award for 2007-2011. She is on the Editorial
Review Board of the International Journal of Knowledge Discovery in Bioinformatics
(IJKDB). Dr. Przulj's research involves applications of graph theory, mathematical
modeling, and computational techniques to solving large-scale problems in computational
and systems biology. She is interested in computational and theoretical solutions
to practical problems in many areas of systems biology, planar cell polarity,
proteomics, cancer informatics, and chemo-informatics.
Title: Budgeted Learning of Probabilistic Classifiers
Speaker: Russell Greiner, Dept. of Computer Science, U. of Alberta
Researchers often use clinical trials to collect the data needed to evaluate
some hypothesis, or produce a classifier. During this process, they have to
pay the cost of performing each test. Many studies will run a comprehensive
battery of tests on each subject, for as many subjects as their budget will
allow, i.e., "round robin" (RR). We consider a more general model,
where the researcher can sequentially decide which single test to perform on
which specific individual; again subject to spending only the available funds.
Our goal here is to use these funds most effectively, to collect the data that
allows us to learn the most accurate classifier.
We first explore the simplified "coins version" of this task. After
observing that this is NP-hard, we consider a range of heuristic algorithms,
both standard and novel, and observe that our "biased robin" approach
is both efficient and much more effective than most other approaches, including
the standard RR approach. We then apply these ideas to learning a naive Bayes
classifier, and see similar behavior. Finally, we consider the most realistic
model, where both the researcher gathering data to build the classifier, and
the user (e.g., physician) applying this classifier to an instance (patient) must
pay for the features used; for example, the researcher has $10,000 to acquire the
feature values needed to produce an optimal $30/patient classifier. Again, we
see that our novel approaches are almost always much more effective than the
standard RR model.
This is joint work with Aloak Kapoor, Dan Lizotte and Omid Madani.
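The "coins version" described in the abstract can be sketched as follows: each coin has an unknown probability of heads, every flip costs one unit of budget, and after the budget is spent we report the coin we believe is best. The abstract does not define "biased robin", so the code assumes the rule "keep flipping the same coin while it comes up heads, and move to the next coin after a tail". The function names and the final selection rule (posterior mean under a uniform prior) are likewise assumptions for illustration, not the authors' implementation.

```python
import random


def round_robin(probs, budget, rng):
    """Spend the budget by flipping coins 0, 1, ..., n-1, 0, 1, ... in turn."""
    n = len(probs)
    heads, flips = [0] * n, [0] * n
    for t in range(budget):
        i = t % n
        flips[i] += 1
        heads[i] += int(rng.random() < probs[i])
    return heads, flips


def biased_robin(probs, budget, rng):
    """Assumed rule: stay on a coin after heads, advance to the next after tails."""
    n = len(probs)
    heads, flips = [0] * n, [0] * n
    i = 0
    for _ in range(budget):
        flips[i] += 1
        if rng.random() < probs[i]:
            heads[i] += 1          # success: flip the same coin again
        else:
            i = (i + 1) % n        # failure: move on to the next coin
    return heads, flips


def pick_best(heads, flips):
    """Index of the coin with the highest posterior mean under a uniform prior."""
    return max(range(len(heads)), key=lambda i: (heads[i] + 1) / (flips[i] + 2))


if __name__ == "__main__":
    rng = random.Random(0)
    true_probs = [0.2, 0.5, 0.8]   # hidden from the learner
    for policy in (round_robin, biased_robin):
        heads, flips = policy(true_probs, budget=30, rng=rng)
        print(policy.__name__, "flips per coin:", flips, "chosen:", pick_best(heads, flips))
```

Under the assumed rule, the sequential policy spends more of the budget on coins that keep succeeding, whereas round robin spreads flips evenly regardless of outcomes.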
After earning a PhD from Stanford, Russ Greiner worked in both academic and
industrial research before settling at the University of Alberta, where he is
now a Professor in Computing Science and the founding Scientific Director of
the Alberta Ingenuity Centre for Machine Learning, which won the ASTech Award
for "Outstanding Leadership in Technology" in 2006. He has been Program
Chair for the 2004 "Int'l Conf. on Machine Learning", Conference Chair
for 2006 "Int'l Conf. on Machine Learning", Editor-in-Chief for "Computational
Intelligence", and is serving on the editorial boards of a number of other
journals. He was elected a Fellow of the AAAI (Association for the Advancement
of Artificial Intelligence) in 2007, and was awarded a McCalla Professorship
in 2005-06 and a Killam Professorship in 2007. He has published over 100 refereed
papers and patents, most in the areas of machine learning and knowledge representation.
The main foci of his current work are (1) bioinformatics and medical informatics;
(2) learning effective probabilistic models and (3) formal foundations of learnability.
Title: Privacy and Data Mining: New Developments and Challenges
Speaker: Stan Matwin, University of Ottawa
There is little doubt that data mining technologies create new challenges in
the area of data privacy. In this talk, we will review some of the new developments
in Privacy-preserving Data Mining. In particular, we will discuss ways
in which data mining results can reveal personal data, and how this can be prevented.
We will look at the practically interesting situations where data to be mined
is distributed among several parties. We will mention new applications in which
spatio-temporal data can lead to identification of personal information. We
will argue that methods that effectively protect personal data, while at the
same time preserve the quality of the data from the data analysis perspective,
are some of the principal new challenges before the field.
Title: Mixed-Membership Stochastic Block-Models for Transactional Data
Speaker: Hugh Chipman, Acadia University
Time: Thursday, February 18, 2.30pm
Transactional network data arise in many fields. Although social network models
have been applied to transactional data, these models typically assume binary
relations between pairs of nodes. We develop a latent mixed membership model
capable of modelling richer forms of transactional data. Estimation and inference
are accomplished via a variational EM algorithm. Simulations indicate that the
learning algorithm can recover the correct generative model. We further present
results on a subset of the Enron email dataset. This is joint work with M. Mahdi Shafiei.
About the speaker: Dr. Hugh Chipman is a Canada Research Chair in Mathematical
Modelling at Acadia University, and the director of the Acadia Centre for Mathematical
Modelling and Computations. His research focuses on statistical models for extracting
information from large and complex datasets. He completed his PhD studies
at the University of Waterloo in 1994, and held a faculty position at the University
of Chicago before moving to Acadia. In 2009, he was awarded the CRM-SSC Prize
for his outstanding contributions to the application of Bayesian statistical
inference for data analysis.
Title: The trouble with community detection in complex networks
Speaker: Aaron Clauset, Santa Fe Institute
Date: Tuesday, March 30, 2.30pm
Abstract: Although widely used in practice, the performance of the popular network
clustering technique called "modularity
maximization" is not well understood when applied to networks with unknown
modular structure. In this talk, I'll show that precisely in
the case we want it to perform the best---that is, on modular networks---the
modularity function Q exhibits extreme degeneracies,
in which the global maximum is hidden among an exponential number of high-modularity
solutions. Further, these degenerate solutions can
be structurally very dissimilar, suggesting that any particular high-modularity
partition, or statistical summary of its structure,
should not be taken as representative of the other degenerate solutions. These
results partly explain why so many heuristics do
well at finding high-modularity partitions and why different heuristics can
disagree on the modular composition of the same network.
I'll conclude with some forward-looking thoughts about the general problem of
identifying network modules from connectivity data alone,
and the likelihood of circumventing this degeneracy problem.
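For readers unfamiliar with the modularity function Q referred to in the abstract, the sketch below computes it for a given partition of a small undirected, unweighted graph, using the standard Newman-Girvan form: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The function name and the toy graph are assumptions for illustration; the code evaluates Q for a partition and does not perform the maximization or reproduce the degeneracy analysis discussed in the talk.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected graph, no self-loops or duplicates
    community: dict mapping each node to its community label
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    all_in_one = {node: "a" for node in range(6)}
    print(modularity(edges, by_triangle))  # 5/14
    print(modularity(edges, all_in_one))   # 0.0
```

On this toy graph the split into the two triangles scores Q = 5/14, while placing every node in a single community scores 0.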
Title: Modeling approaches for analyzing complex networks
Speaker: Edo Airoldi, Department of Statistics, FAS Center for Systems Biology,
Faculty of Arts & Sciences, Harvard University
Date: Friday Sept 17, 2010, 9:30 a.m.
Abstract: Networks are ubiquitous in science and have become a focal point
for discussion in everyday life. Formal statistical models for the
analysis of network data have emerged as a major topic of interest in diverse
areas of study, and most of these involve a collection of
measurements on pairs of objects. Probability models on graphs date back to
1959. Along with empirical studies in social psychology and sociology
from the 1960s, these early works generated an active “network community”
and a substantial literature in the 1970s. This effort moved
into the statistical literature in the late 1970s and 1980s, and the past decade
has seen a burgeoning network literature in statistical
physics and computer science. The growth of the World Wide Web and the emergence
of online “networking communities” such as Facebook and
LinkedIn, and a host of more specialized professional network communities has
intensified interest in the study of networks and
network data. In this talk, I will review a few ideas that are central to this
burgeoning literature, placing emphasis on modeling approaches
available for data analysis, and review some of the recent work that is going
on in my group.
Speaker Bio: In December 2006, Dr. Airoldi received a Ph.D. from Carnegie Mellon,
working on statistical machine learning and the
analysis of complex systems with Stephen Fienberg and Kathleen Carley. His dissertation
introduced statistical and computational elements of graph theory that support
data analysis of complex systems and their evolution. Until December 2008, he
was a postdoctoral fellow in the Lewis-Sigler Institute for Integrative Genomics
of Princeton University working with Olga Troyanskaya, David Botstein, and James
Broach. He developed mechanistic models to gain computational insights into
aspects of the molecular and cellular biology that are not directly observable
with experimental probes. Since then, he has been working closely with biologists in
the areas of cellular differentiation, cellular development and cancer.
Speaker URL:
Title: Facebook as a data capture site: Techniques, Traps, Terms and Conditions
This talk will give an overview of the sorts of social network data that are
accessible through the Facebook API and some of the issues that come with downloading
and processing this data. In the first part of the talk, I review several pieces
of software that allow for the download and capture of social networks, including
NodeXL, NetVizz, NameGenWeb, iGraph and Pajek. I walk through different routines
and cover efficiency through FQL queries. The talk will also walk through three
recent examples of privacy leaks with the Facebook data (The "Taste, Ties
and Time" data set, Pete Warden's open profiles data and the Oxford 100
schools data set) and how privacy issues inhibited their full use. I tie this
to the evolving developer terms of use on Facebook, as well as some of the other
emergent API issues (such as Twitter's recent decision to no longer whitelist
accounts). My intention is to end the talk by reinforcing the importance of
careful and minimal data collection efforts rather than a cavalier approach
indifferent to the risks of real world data. I also wish to make an appeal to
technical fields whose ethics procedures tend to be inadequate for this sort
of semi-private and sensitive data.
Slides -- Slides from March 25 seminar in the Social Media Lab
Bernie Hogan is a Research Fellow at the Oxford Internet Institute. He specializes
in novel methods for online data capture and analysis,
especially via social media. Recent work has focused on the capture and analysis
of Facebook networks, particularly through his application
namegenweb, which downloads a social network for visualization in network programs
such as NodeXL. Past work included an online audit
study of racism on Craigslist, pen and paper methods for visualizing social
networks, the analysis of profile photos and techniques for
online surveys of spouses and partners. Bernie received his Ph.D. from
the University of Toronto in 2009 under Barry
Wellman. His dissertation won the Dordick award for Best Dissertation from the Communication
and Technology section of the International
Communication Association.
Speaker's contact info:
Dr Bernie Hogan
Research Fellow, Oxford Internet Institute
University of Oxford
Title: Towards Summarizing and Making Sense of the Blogosphere
Speaker: Prof. Denilson Barbosa
Department of Computing Science, Univ. of Alberta
Date: Thursday March 15, 2012
Time: 2:40 p.m.
Location: Jacob Slonim Conference Room (430), Computer Science
Dalhousie University
6050 University Avenue, Halifax
The extraction of structured information from text is a fast improving subfield
of Natural Language Processing which has been re-invigorated with the ever-increasing
availability of user-generated textual content online. One environment which
stands out as a source of invaluable information is the blogosphere, the
network of social media sites, in which individuals express and discuss opinions,
facts, events, and ideas pertaining to their own lives, their community, profession,
or society at large. Indeed, the automatic extraction of reliable information
from the blogosphere promises a viable approach for discovering very rich social
data: the issues that engage society in thousands of collective and parallel
conversations online. Considerable attention has been given to the problem of
automatically extracting and studying the social dynamics among the participants
(i.e., authors) in shared environments like the blogosphere. In that line of
work, the goal is to understand how the network of humans conversing in the
blogosphere is formed, evolves over time, and influences others in their own
opinions. Our goal, on the other hand, is to extract the network of entities,
facts, ideas and opinions expressed in social media sites, as well as the relationships
among them. Such structured data can be organized as one or more information
networks, which in turn are powerful metaphors for the study and visualization
of various kinds of complex systems. In this talk, I will cover the basic NLP
tools that are necessary for automatically extracting information networks from
social media text, relying to a large extent on the experiences gathered on
our ongoing SONEX project.
Speaker Bio:
I am an Associate Professor at the University of Alberta, which I joined in
2008. I completed my Ph.D. at the University of Toronto in 2005, working on
XML data management, and held an academic position at the University of Calgary from
2005 to 2008. I am interested in databases on their own merit, and also in
the application of database and information retrieval principles to the management
of linked data. I am a member of the NSERC Strategic Network on Business Intelligence,
where I work on information extraction, and the Canadian Writing Research Collaboratory,
where I work on text mining, data management for prosopography, and document engineering.
Speaker url:
Host: Evangelos Milios