A Parameterized Approach to Spam-Resilient Link Analysis of the Web

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. 10, OCTOBER 2009

A Parameterized Approach to Spam-Resilient
Link Analysis of the Web
James Caverlee, Member, IEEE, Steve Webb, Member, IEEE,
Ling Liu, Senior Member, IEEE, and William B. Rouse, Fellow, IEEE
Abstract: Link-based analysis of the Web provides the basis for many important applications (like Web search, Web-based data mining, and Web page categorization) that bring order to the massive amount of distributed Web content. Due to the overwhelming reliance on these important applications, there is a rise in efforts to manipulate (or spam) the link structure of the Web. In this manuscript, we present a parameterized framework for link analysis of the Web that promotes spam resilience through a source-centric view of the Web. We provide a rigorous study of the set of critical parameters that can impact source-centric link analysis and propose the novel notion of influence throttling for countering the influence of link-based manipulation. Through formal analysis and a large-scale experimental study, we show how different parameter settings may impact the time complexity, stability, and spam resilience of Web link analysis. Concretely, we find that the source-centric model supports more effective and robust rankings in comparison with existing Web algorithms such as PageRank.

Index Terms: Internet search, information search and retrieval, information storage and retrieval, information technology and systems, distributed systems, systems and software, Web search, general, Web-based services, online information services.

1 INTRODUCTION

The Web is arguably the most massive and successful distributed computing application today. Millions of Web servers support the autonomous sharing of billions of Web pages. From its earliest days, the Web has been the subject of intense focus for organizing, sorting, and understanding its massive amount of data.



