Collaborative Hierarchical Clustering in the Browser for Scatter/Gather on the Web

Topics: Cluster analysis, Hierarchy, Web page Pages: 26 (4833 words) Published: January 13, 2013
Weimao Ke and Xuemei Gong
Laboratory for Information, Network & Computing Studies
College of Information Science and Technology
Drexel University, 3141 Chestnut St, Philadelphia, PA 19104,
Scatter/Gather is a powerful browsing model for exploratory information seeking. However, its potential at web scale has not been demonstrated because of the scalability challenges of interactive clustering. In previous research, we developed a two-stage method to support on-the-fly Scatter/Gather, in which an offline module pre-computes a hierarchical structure to support constant-time online interaction. In this work, we focus on the offline hierarchy construction and develop a novel distributed approach to hierarchical agglomerative clustering (HAC). Relying on JavaScript, which is commonly supported by browsers, the distributed clustering method has the potential to scale with the growing traffic of a site. We show in experiments that a moderate increase in the number of parallel processes (in visitors’ browsers) leads to a dramatic decrease in clustering time. This demonstrates great potential for supporting large-scale Scatter/Gather interactions on the web. We present a preliminary analysis of clustering effectiveness and a related Scatter/Gather prototype for web search.
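To make the two stages concrete, the following is a minimal single-process sketch of HAC, not the distributed method developed in this work. It assumes average linkage over cosine similarity, which the text above does not specify, and the toy vectors and the names `hac` and `build_children` are hypothetical. The final index only hints at why a pre-computed hierarchy allows cheap online interaction: once the merges are stored, finding a node's children is a single dictionary lookup.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hac(vectors):
    """Naive average-linkage HAC (assumed linkage, for illustration only).

    Returns the merge history as (left_id, right_id, new_id) tuples.
    Leaves are 0..n-1; each merge creates the next unused id.
    """
    clusters = {i: [i] for i in range(len(vectors))}  # cluster id -> member docs
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Average pairwise similarity between members of the two clusters.
            sims = [cosine(vectors[i], vectors[j])
                    for i in clusters[a] for j in clusters[b]]
            score = sum(sims) / len(sims)
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id))
        next_id += 1
    return merges


def build_children(merges):
    """Index the hierarchy so a node's children are one dictionary lookup."""
    return {new: (left, right) for left, right, new in merges}


# Hypothetical toy collection: four documents over three terms.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.2],
]
merges = hac(docs)
children = build_children(merges)
```

On this toy input the two most similar pairs are merged first and then joined at the root, so `children[6]` returns the two top-level clusters. The nested loops recompute similarities on every pass, which is the kind of cost that motivates moving hierarchy construction offline.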

text clustering, Scatter/Gather, distributed computing, parallel clustering, browser server, JavaScript, interactive information retrieval, exploratory search

Information retrieval (IR) systems such as web search engines play important roles in connecting people with information. While searching is a widely accepted approach to finding information, browsing represents another basic IR paradigm. Among classic browsing models, Scatter/Gather is a unique approach in which searches can be conducted without explicit query specification (Cutting et al., 1992). Based on iterative user selection and interactive text clustering, Scatter/Gather offers a powerful tool for navigating a large, complex information space. It enables the user to explore inherent associations among documents and topics in the data, supporting learning and investigation (Hearst and Pedersen, 1996).

However, major challenges associated with clustering efficiency and scalability have hindered the adoption of Scatter/Gather in IR practice. In particular, many clustering algorithms are computationally complex. Even efficient classic methods such as k-means are of linear time complexity, far from efficient enough to support on-the-fly clustering of a large number of documents. The use of Scatter/Gather for web browsing is desirable but practically challenging because of the web’s scale and dynamics. Until we can properly address these challenges, real-world applications of Scatter/Gather are unlikely to emerge.

Notwithstanding its great potential in interactive IR, Scatter/Gather research has so far focused on rather small data collections. Its efficiency and effectiveness at web scale remain unaddressed. This research aims to study scalable approaches to interactive clustering. A major objective is to identify a scalable clustering architecture that can support Scatter/Gather interactions on the evolving web. Ultimately, this will lead to the development of new web browsing techniques.

ASIST 2012, October 26–30, 2012, Baltimore, MD, USA.

Scatter/Gather is a highly interactive model for collection browsing and information retrieval based on text clustering (Cutting et al., 1992). It supports progressive
query specification through user-system interaction and
clustering. In each Scatter/Gather iteration, the system
presents to the user a set of clusters (topical groups
of documents) in the information collection. The user
then picks one or...
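
The iteration described above can be sketched as a small loop. This is an illustration of the general Scatter/Gather model (Cutting et al., 1992), in which the selected clusters are gathered into a smaller sub-collection that is then clustered again. The clusterer is a deliberately crude stand-in, assumed here for brevity: the first k documents act as seeds and every document joins its most similar seed. The names `scatter` and `gather` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scatter(doc_ids, vectors, k=2):
    """Split doc_ids into at most k groups (stand-in clusterer).

    Assumption: the first k documents act as seeds and every document
    joins its most similar seed. A real system would use a proper
    clustering algorithm here.
    """
    seeds = doc_ids[:k]
    groups = [[] for _ in seeds]
    for doc_id in doc_ids:
        sims = [cosine(vectors[doc_id], vectors[s]) for s in seeds]
        groups[sims.index(max(sims))].append(doc_id)
    return groups


def gather(groups, chosen):
    """Union the user-selected groups into a smaller sub-collection."""
    return sorted(d for index in chosen for d in groups[index])


# Hypothetical toy collection: five documents over three terms.
vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.0, 0.2],
    [0.1, 0.9, 0.0],
    [0.6, 0.0, 0.4],
]

collection = list(range(len(vectors)))
first_view = scatter(collection, vectors)     # system presents clusters
collection = gather(first_view, chosen=[0])   # user picks the first cluster
second_view = scatter(collection, vectors)    # system re-clusters the selection
```

With these toy vectors the first view separates documents 0, 2, and 4 from documents 1 and 3, and re-scattering the chosen group splits it further into document 0 versus documents 2 and 4. Each round narrows the collection without the user ever typing a query.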

References:
Arthur, H. (2012).
(1998). Learning to extract symbolic knowledge from the world wide web
Gong, X., Khare, R., and Ke, W. (2012).
(1995). Scatter/Gather as a tool for the navigation of retrieval results
Hearst, M. A. and Pedersen, J. O. (1996). Reexamining the cluster hypothesis: Scatter/Gather on retrieval
Jain, A. K., Murty, M. N., and Flynn, P. J. (1999). Data clustering: a review
Ke, W., Mostafa, J., and Liu, Y. (2008). Toward responsive visualization services for scatter/gather browsing. Proceedings of the American Society for Information Science and Technology, 45(1):1–10.
Ke, W., Sugimoto, C. R., and Mostafa, J. (2009). Dynamicity vs. effectiveness: Studying online clustering for scatter/gather
Lovins, J. B. (1968). Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22–31.
Manning, C. D., Raghavan, P., and Schütze, H. (2008).
Witten, I. H. and Frank, E. (2005). Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, San Francisco, 2nd edition.