Text Mining

  • Topic: Information retrieval, Value added, Latent semantic analysis
  • Published : September 4, 2008
Vaim FAQs
Steve Kimbrough
May 10, 2006
1 What does the word vaim mean?
The word vaim has two meanings:
1. In Estonian, the word vaim means ghost (http://et.wikipedia.org/wiki/Vaim). See http://www.logosdictionary.com/pls/dictionary/new_dictionary.gdic.sl?phrase_code=5573701 for the pronunciation.

2. The word vaim is an acronym, abbreviating Value-Added Information Mash.

The two meanings are unrelated. It is permitted to pronounce both words in the Estonian fashion.
2 What is a mash?
From http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29:

A mashup is a website or web application that seamlessly combines content from more than one source into an integrated experience.

And
The etymology of this term almost certainly derives from its similar use in pop music where DJ’s take the vocal track from one song and combine it with the instrumental track of another song resulting in an entirely new composition. In the lingo, Web mashing results in a Web mash or mashup. See the Wikipedia article for examples and further information. Programmable Web (http://www.programmableweb.com/) is a Web site devoted to Web mashing.

3 What is an information mash?
The original (or at least an early) reference appeared in a blog by Ellen Miller of the Sunlight Foundation (www.sunlightfoundation.com) on April 28, 2006. In her blog (http://www.sunlightfoundation.com/node/465) she writes:

Information Mashing. Don’t you just love that term? It’s one of the major goals of Sunlight and while we’ve been working on it for the past couple of months we have a ways to go before it happens in any substantial way. Our goal is simple: integrate in a user-friendly way individual data sets (like campaign contributions, lobbyists and government contracts) that makes the whole larger than the sum of its parts.

We’d like to create something we’ve dubbed an “Accountability Matrix.” A website where, with one click you can look up a major donor and see not just their campaign contributions, but also their lobbying expenditures, the names of members who’ve flown on their private jet, the names of former congressional staffers they’ve hired, and so on.

In a nutshell, we want to make information more liquid and more accessible to the public.
Although the information mashing she writes about is broadly on the subject of politics and current events, the concept of information mashing is not so restricted. An information mash is any subject-focused aggregation of information from multiple sources that achieves the-whole-is-larger-than-the-sum-of-its-parts status. In other words, meaningful, useful, non-trivial integration of information from several sources.
4 What is a value-added information mash?

Value-added implies the presence of a significant additional element of information processing, indexing, categorization, and so on. Information is not only collected and aggregated, but new information is added, typically through indexing, association of items in the different aggregates, and other processing. The information masher may also add original information, not available from other sources.
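
The distinction between collecting and adding value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three data sets, their field names, and the helpers `build_mash` and `cross_source_subjects` are all hypothetical and invented here, not taken from any real source. The first helper aggregates records under a shared subject; the second adds new information by reporting which subjects are associated across several aggregates.

```python
from collections import defaultdict

# Hypothetical sample data; names and amounts are invented for illustration.
contributions = [
    {"donor": "Acme Corp", "recipient": "Rep. Smith", "amount": 5000},
    {"donor": "Beta LLC", "recipient": "Rep. Jones", "amount": 1200},
]
lobbying = [
    {"client": "Acme Corp", "issue": "energy", "spend": 250000},
]
contracts = [
    {"vendor": "Acme Corp", "agency": "DOE", "value": 1000000},
    {"vendor": "Gamma Inc", "agency": "DOT", "value": 40000},
]


def build_mash(sources):
    """Aggregate records from several sources under a shared subject key.

    `sources` maps a source name to a (records, key_field) pair.
    Returns {subject: {source_name: [records]}}.
    """
    mash = defaultdict(lambda: defaultdict(list))
    for source_name, (records, key_field) in sources.items():
        for record in records:
            mash[record[key_field]][source_name].append(record)
    return mash


def cross_source_subjects(mash, minimum=2):
    """Added value: subjects that appear in at least `minimum` sources."""
    return sorted(s for s, by_source in mash.items() if len(by_source) >= minimum)


mash = build_mash({
    "contributions": (contributions, "donor"),
    "lobbying": (lobbying, "client"),
    "contracts": (contracts, "vendor"),
})
linked = cross_source_subjects(mash)
print(linked)
```

Matching subjects by exact name is the weakest part of this sketch; real sources rarely agree on spelling, so a practical mash would likely need some form of name normalization before the association step.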

Value added will often come from employment of advanced software technologies. Examples include language translation, information extraction [JM02], associative indexing and retrieval [LD97], word pattern visualization [DKP00], data mining techniques, text mining techniques [WIZD05], literature-based discovery (aka: knowledge discovery) techniques [GLF02], faceted classification (http://www.kmconnection.com/DOC100100.htm), concordances, and others, such as [BK02].
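
Of the techniques listed, a concordance is perhaps the simplest to illustrate. The following is a minimal keyword-in-context sketch, not a description of how any particular tool works; the function name `concordance`, its `width` parameter, and the simple word pattern are assumptions made for this example.

```python
import re


def concordance(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words on each side of every match."""
    words = re.findall(r"[A-Za-z'-]+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


sample = (
    "An information mash is any subject-focused aggregation of information "
    "from multiple sources."
)
for line in concordance(sample, "information"):
    print(line)
```

Each output line shows one occurrence of the keyword with its neighboring words, which is the added index a reader would not get from the raw text alone.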

5 Could you be just a little more specific about uses and applications?
Yes, just a little. Detailed discussion is apt in other venues. Perhaps the key idea is association. An information mash facilitates finding significant associations (or significant lack of association) among information items (data, documents,...