Datacube Computation


Overview
• Efficient Computation of Data Cubes
  – General Strategies for Cube Computation
  – Multiway Array Aggregation for Full Cube Computation
  – Computing Iceberg Cubes
    • BUC
  – High-dimensional OLAP: A Minimal Cubing Approach
  – Computing Cubes with Complex Conditions

• Exploration and Discovery in Multidimensional Databases
  – Discovery-Driven

• Summary

General Strategies for Cube Computation
• Sorting, hashing, and grouping operations are applied to the dimension attributes to reorder and cluster related tuples
• Aggregates may be computed from previously computed aggregates rather than from the base fact table
  – Smallest-child: compute a cuboid from the smallest previously computed cuboid
  – Cache-results: cache the results of a cuboid from which other cuboids are computed, to reduce disk I/O
  – Amortize-scans: compute as many cuboids as possible at the same time to amortize disk reads
  – Share-sorts: share sorting costs across multiple cuboids when a sort-based method is used
  – Share-partitions: share the partitioning cost across multiple cuboids when hash-based algorithms are used
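The smallest-child strategy can be illustrated with a minimal Python sketch. The fact table, the `aggregate` helper, and the choice of `sum` as the measure are all assumptions made for illustration; the approach relies on the measure being distributive, so that a roll-up of a roll-up gives the same result as aggregating the base data directly.

```python
from collections import defaultdict

def aggregate(cells, keep):
    """Roll a cuboid (dict: key tuple -> sum) up onto the key positions in `keep`."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

# Hypothetical base fact table: (A, B, C, measure).
facts = [
    ("a1", "b1", "c1", 5),
    ("a1", "b2", "c1", 3),
    ("a1", "b3", "c1", 2),
    ("a2", "b1", "c1", 7),
    ("a2", "b1", "c2", 1),
]

# Base cuboid ABC, grouped by all three dimensions.
base = defaultdict(int)
for a, b, c, m in facts:
    base[(a, b, c)] += m
base = dict(base)

ab = aggregate(base, (0, 1))  # keyed by (A, B): 4 cells for this data
ac = aggregate(base, (0, 2))  # keyed by (A, C): 3 cells for this data

# Smallest-child: derive cuboid A from whichever computed parent has fewer cells.
# In both AB and AC, dimension A sits at key position 0.
parent = min((ab, ac), key=len)
a_cuboid = aggregate(parent, (0,))

# For comparison only: the same cuboid computed from the base cuboid.
a_direct = aggregate(base, (0,))
```

Here `a_cuboid` is built from the three cells of AC rather than the five cells of the base cuboid, and it matches `a_direct`.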

Multiway Array Aggregation for Full Cube Computation
• Computes a full data cube using a multidimensional array as the basic structure
• Typical MOLAP approach
  – Partition the array into chunks
    • Chunk: a subcube small enough to fit into memory
  – Compute aggregates by visiting cube cells
    • The order in which the cube cells are visited can be optimized to reduce memory accesses and storage cost
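The chunk-by-chunk scan can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `multiway_aggregate`, the tiny 4×4×4 array, and the 2×2×2 chunk size are hypothetical, and the sketch keeps the full 2-D result planes in memory, so it shows the simultaneous aggregation but not the memory savings of a tuned visiting order.

```python
from itertools import product

def multiway_aggregate(array, shape, chunk):
    """Scan a dense 3-D array chunk by chunk, accumulating AB, AC, and BC sums in one pass."""
    na, nb, nc = shape
    ca, cb, cc = chunk
    ab = [[0] * nb for _ in range(na)]
    ac = [[0] * nc for _ in range(na)]
    bc = [[0] * nc for _ in range(nb)]
    # Outer loops pick one chunk at a time; A varies fastest, then B, then C.
    for c0 in range(0, nc, cc):
        for b0 in range(0, nb, cb):
            for a0 in range(0, na, ca):
                # Inner loop visits the cells of the chunk currently "in memory".
                for a, b, c in product(range(a0, min(a0 + ca, na)),
                                       range(b0, min(b0 + cb, nb)),
                                       range(c0, min(c0 + cc, nc))):
                    v = array[a][b][c]
                    # Each cell is read once and feeds all three 2-D cuboids.
                    ab[a][b] += v
                    ac[a][c] += v
                    bc[b][c] += v
    return ab, ac, bc

# Hypothetical dense array: the cell value encodes its coordinates.
shape = (4, 4, 4)
array = [[[a + 10 * b + 100 * c for c in range(4)]
          for b in range(4)]
         for a in range(4)]

ab, ac, bc = multiway_aggregate(array, shape, chunk=(2, 2, 2))
```

Changing the nesting of the three outer loops changes the chunk visiting order without changing the results, which is the degree of freedom the optimization exploits.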

Consider a 3-D data array with three dimensions: A, B, and C
– Each dimension is divided into 4 chunks: A (a0, a1, a2, a3), B (b0, b1, b2, b3), C (c0, c1, c2, c3)
– A has 40 distinct values, B has 400, and C has 4000
– So each chunk spans 10 values of A, 100 of B, and 1000 of C
– Full cube computation requires computing:
  • The base cuboid ABC, which is already computed (the 3-D array)
  • The 2-D cuboids AB, AC, BC
  • The 1-D…
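The sizes in the example above can be checked with a few lines of arithmetic. The sketch below assumes a dense array in which every combination of values is a cell; the variable names are illustrative only.

```python
from math import prod

# Cardinalities and chunking taken from the example.
card = {"A": 40, "B": 400, "C": 4000}
chunks_per_dim = 4

# Values covered by one chunk along each dimension.
chunk_len = {d: n // chunks_per_dim for d, n in card.items()}

# Cells in one 3-D chunk, and the number of chunks in the whole array.
cells_per_chunk = prod(chunk_len.values())
num_chunks = chunks_per_dim ** len(card)

# Cells in each full 2-D cuboid (plane), assuming a dense array.
plane_cells = {
    "AB": card["A"] * card["B"],
    "AC": card["A"] * card["C"],
    "BC": card["B"] * card["C"],
}

print(chunk_len)        # {'A': 10, 'B': 100, 'C': 1000}
print(cells_per_chunk)  # 1000000
print(num_chunks)       # 64
print(plane_cells)      # {'AB': 16000, 'AC': 160000, 'BC': 1600000}
```

The three planes differ in size by two orders of magnitude, which is why the order in which chunks are visited affects how much intermediate result must be held in memory at once.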
