What is big data?
“Big data is not a precise term; rather it's a characterization of the never ending accumulation of all kinds of data, most of it unstructured. It describes data sets that are growing exponentially and that are too large, too raw or too unstructured for analysis using relational database techniques. Whether terabytes or petabytes, the precise amount is less the issue than where the data ends up and how it is used.” (Quoted from EMC’s report “Big data: Big opportunity to create business value”.)
As mobile networks, cloud computing, and internet technology expanded rapidly, more and more kinds of information appeared. In the past, terabytes of data could be a disaster for a company, because they meant high storage costs and a need for high-performance CPUs. Today, however, companies are discovering facts in this data that they had not considered before. They have started to use data analytics technology to find business value in terabytes or petabytes of data, so what once looked like a disaster now looks like a big opportunity.
Data is no longer defined only as structured data. Big data can be categorized into three types: structured data, unstructured data, and semistructured data (see Chart I). As the internet and the mobile internet developed rapidly, the volume of unstructured and semistructured data exploded. For example, a bank could analyze unstructured data to find out why its customer churn increased.
Most definitions of big data focus on the size of the data. However, size, or volume, is not the only characteristic of big data. There are two other characteristics: variety and velocity. Variety means that big data is generated from many kinds of sources, and the data is no longer limited to structured data. According to the EMC report, most big data is unstructured. Velocity means the speed at which data is produced. Data is no longer just structured data stored in a structured database. It can come from anywhere at any time: mobile phones, sensors, devices, manufacturing machines, and so on. The stream of data is generated in real time, which means a company has to act at the same speed.
| Type | Description |
|---|---|
| Structured data | Structured data is organized according to a defined structure. It can be read and stored by a computer. Its usual form is a structured database that stores specific data in columns and rows. |
| Unstructured data | Unstructured data refers to data without an identified structure, for example video, audio, pictures, and text. This data is also called loosely structured data. |
| Semistructured data | Semistructured data is organized into semantic entities. The size and type of the data within one group can differ. Examples are XML and RSS feeds. This kind of data tries to reconcile the real world with computer-based databases. |
Chart I. Three types of data.
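As a rough illustration of the three types in Chart I, the following Python sketch loads one small sample of each. All sample values (the customer table, the RSS items, and the email text) are hypothetical and chosen only to show the difference in shape; they do not come from the EMC report.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns and rows, like a relational table (hypothetical sample).
structured = io.StringIO("customer_id,balance\n1001,250.00\n1002,75.50\n")
rows = list(csv.DictReader(structured))

# Semistructured: tagged entities whose fields may differ from item to item.
feed = """<rss><channel>
  <item><title>Rates update</title><category>news</category></item>
  <item><title>New branch opens</title></item>
</channel></rss>"""
items = [
    {child.tag: child.text for child in item}
    for item in ET.fromstring(feed).iter("item")
]

# Unstructured: free text with no declared fields; any structure must be inferred.
email = "I have waited two weeks for a reply and I am closing my account."
words = email.lower().rstrip(".").split()

print(rows[0]["balance"])    # every row has the same columns
print(items)                 # the second item has no <category> field
print("closing" in words)    # a crude keyword check on free text
```

The structured rows can be queried by column name directly, the semistructured items have to be checked field by field because not every item carries every tag, and the unstructured email only yields information after some form of text processing.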
Big data analytics
Big data analytics is not a single technique. It is a term that covers many technologies (see Exhibition I). Depending on an enterprise’s requirements, each program will use different technologies to analyze data. However, as big data has developed, some of these techniques have become popular and useful. According to Exhibition II, advanced analytics, visualization, real time, in-memory databases, and unstructured data show strong-to-moderate commitment and strong potential growth. Traditional techniques, such as OLAP tools and hand-coded SQL, have gradually lost their place. When a bank wants to find out why customer churn has increased, or a marketing department decides to target advertisements precisely at its customers, they need to analyze customer behavior. This data comes from customer service emails, phone call records, sales interview reports, login data from mobile devices, and so on. Almost none of this data can be analyzed with traditional data analytics techniques....