Collaborative Hierarchical Clustering in the Browser for Scatter/Gather on the Web

Weimao Ke and Xuemei Gong
Laboratory for Information, Network & Computing Studies
College of Information Science and Technology
Drexel University, 3141 Chestnut St, Philadelphia, PA 19104

wk@drexel.edu, xg45@drexel.edu
ABSTRACT
Scatter/Gather is a powerful browsing model for exploratory information seeking. However, its potential at web scale has not been demonstrated because of the scalability challenges of interactive clustering. In previous research, we developed a two-stage method to support on-the-fly Scatter/Gather, in which an offline module pre-computes a hierarchical structure to support constant-time online interaction. In this work, we focus on the offline hierarchy construction and develop a novel distributed approach to hierarchical agglomerative clustering (HAC). Relying on JavaScript, which is commonly supported by browsers, the distributed clustering method has the potential to scale with a site's growing traffic. We show in experiments that a moderate increase in the number of parallel processes (in visitors’ browsers) leads to a dramatic decrease in clustering time. This demonstrates great potential for supporting large-scale Scatter/Gather interactions on the web. We present a preliminary analysis of clustering effectiveness and a related Scatter/Gather prototype for web search.

Keywords: text clustering, Scatter/Gather, distributed computing, parallel clustering, browser server, JavaScript, interactive information retrieval, exploratory search
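
As a point of reference for the offline stage, the following is a minimal single-process sketch of hierarchical agglomerative clustering. It is not the distributed browser-based method summarized in the abstract: the average-link criterion, the cosine similarity measure, the function names (`cosine`, `hac`), and the toy vectors are all assumptions chosen for illustration, and Python is used here only for brevity.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hac(vectors):
    """Naive average-link HAC (illustrative assumption, not the paper's method).

    Returns the merge history as tuples (left_id, right_id, new_id, similarity).
    Leaves are numbered 0..n-1; each merge creates the next unused id.
    """
    clusters = {i: [i] for i in range(len(vectors))}  # cluster id -> member doc ids
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        best = None
        # Scan every pair of current clusters and keep the most similar pair.
        for a, b in combinations(sorted(clusters), 2):
            sims = [cosine(vectors[i], vectors[j])
                    for i in clusters[a] for j in clusters[b]]
            score = sum(sims) / len(sims)  # average-link similarity
            if best is None or score > best[0]:
                best = (score, a, b)
        score, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id, score))
        next_id += 1
    return merges


# Hypothetical toy "documents" as term-weight vectors.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.2],
]
merges = hac(docs)
for left, right, new, sim in merges:
    print(f"merge {left} + {right} -> {new} (similarity {sim:.3f})")
```

The merge history forms a binary hierarchy that could be stored once and traversed later. The repeated pairwise-similarity scan inside the loop is what makes this naive version expensive on large collections, which is the kind of cost that motivates spreading the work across parallel processes.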

can be conducted without explicit query specification
(Cutting et al., 1992). Based on iterative user selection and interactive text clustering, Scatter/Gather offers a powerful tool for navigating a large, complex information space. It enables the user to explore inherent associations among documents and topics in the data, supporting learning and investigation (Hearst and Pedersen, 1996).
However,



References:
Arthur, H. (2012).
(1998). Learning to extract symbolic knowledge from the world wide web
Gong, X., Khare, R., and Ke, W. (2012).
(1995). Scatter/Gather as a tool for the navigation of retrieval results
Hearst, M. A. and Pedersen, J. O. (1996). Reexamining the cluster hypothesis: Scatter/Gather on retrieval
Jain, A. K., Murty, M. N., and Flynn, P. J. (1999). Data clustering: a review
Ke, W., Mostafa, J., and Liu, Y. (2008). Toward responsive visualization services for scatter/gather browsing. Proceedings of the American Society for Information Science and Technology, 45(1):1–10.
Ke, W., Sugimoto, C. R., and Mostafa, J. (2009). Dynamicity vs. effectiveness: Studying online clustering for scatter/gather
Lovins, J. B. (1968). Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22–31.
Manning, C. D., Raghavan, P., and Schütze, H. (2008).
Witten, I. H. and Frank, E. (2005). Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, San Francisco, 2nd edition.
