Overview

• Efficient Computation of Data Cubes

  – General Strategies for Cube Computation
  – Multiway Array Aggregation for Full Cube Computation
  – Computing Iceberg Cubes
    • BUC
  – High-dimensional OLAP: A Minimal Cubing Approach
  – Computing Cubes with Complex Conditions

• Exploration and Discovery in Multidimensional Databases
  – Discovery-Driven

• Summary

General Strategies for Cube Computation

• Sorting, hashing, and grouping operations are applied to the dimension attributes to reorder and cluster related tuples
• Aggregates may be computed from previously computed aggregates rather than from the base fact table
  – Smallest-child: compute a cuboid from the smallest previously computed cuboid
  – Cache-results: cache the results of a cuboid from which other cuboids are computed, to reduce disk I/O
  – Amortize-scans: compute as many cuboids as possible at the same time, to amortize disk reads
  – Share-sorts: share sorting costs across multiple cuboids when a sort-based method is used
  – Share-partitions: share the partitioning cost across multiple cuboids when hash-based algorithms are used
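The smallest-child idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the toy fact table, the helper `aggregate`, and the choice of SUM as the measure are hypothetical and not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy fact table: (A, B, C, measure)
facts = [
    ("a0", "b0", "c0", 5),
    ("a0", "b1", "c0", 3),
    ("a0", "b2", "c0", 2),
    ("a1", "b0", "c1", 7),
    ("a1", "b1", "c1", 1),
]

def aggregate(rows, keep):
    """Sum the measure (last field) grouped by the key positions in `keep`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in keep)
        totals[key] += row[-1]
    return [key + (total,) for key, total in sorted(totals.items())]

# Two previously computed 2-D cuboids, both derived from the base table.
ab = aggregate(facts, keep=(0, 1))   # 5 rows
ac = aggregate(facts, keep=(0, 2))   # 2 rows

# Smallest-child: cuboid A can be derived from AB or AC; pick the smaller one.
candidates = {"AB": ab, "AC": ac}
smallest_name = min(candidates, key=lambda name: len(candidates[name]))
a_from_child = aggregate(candidates[smallest_name], keep=(0,))

# Same answer as aggregating the base table, but fewer rows were read.
a_from_base = aggregate(facts, keep=(0,))
print(smallest_name, a_from_child)   # AC [('a0', 10), ('a1', 8)]
```

Deriving A from AC reads 2 rows instead of the 5 rows of AB or of the base table; on realistic data the gap between candidate cuboids can be much larger.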

Multiway Array Aggregation for Full Cube Computation

• Computes a full data cube using a multidimensional array as the basic data structure
• Typical MOLAP approach
  – Partition the array into chunks
    • Chunk: a subcube small enough to fit into memory
  – Compute aggregates by visiting cube cells
    • The order in which the cells are visited can be optimized to minimize the number of times each cell is revisited, reducing memory access and storage cost

Consider a 3-D data array containing three dimensions: A, B, C
- Each dimension is divided into 4 chunks
- A (a0, a1, a2, a3), B (b0, b1, b2, b3), C (c0, c1, c2, c3)
- A has 40 distinct values, B has 400, and C has 4000
- So each partition of A has size 10, each partition of B has size 100, and each partition of C has size 1000
- Full cube computation requires computing
  - The base cuboid ABC, which is already computed (the 3-D array)
  - The 2-D cuboids AB, AC, BC
  - The 1-D cuboids A, B, C
  - The 0-D (apex) cuboid, denoted by ALL

The Problem Statement: Devise an intelligent technique that minimizes the memory required to compute the full cube
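To make the setup concrete, the following sketch enumerates the 2^3 = 8 cuboids and their sizes from the cardinalities given above. The names `cardinality`, `chunk_len`, and `cuboids` are illustrative, and the cell counts assume a dense array.

```python
from itertools import combinations
from math import prod

cardinality = {"A": 40, "B": 400, "C": 4000}   # distinct values per dimension
chunks_per_dim = 4

# Partition size of each dimension: 10, 100, 1000
chunk_len = {d: n // chunks_per_dim for d, n in cardinality.items()}

# Enumerate every cuboid from the base (3-D) down to the apex (0-D).
cuboids = []
for k in range(len(cardinality), -1, -1):
    for dims in combinations(cardinality, k):
        name = "".join(dims) or "ALL"
        cells = prod(cardinality[d] for d in dims)   # prod of nothing is 1
        cuboids.append((name, cells))

for name, cells in cuboids:
    print(f"{name:>3}: {cells:>10,} cells")
```

The three 2-D cuboids differ in size by two orders of magnitude (AB has 16,000 cells, BC has 1,600,000), which is why the order in which they are aggregated matters for memory.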


Suppose you want to compute the BC cuboid
- Without any intelligent method, one needs memory for the whole BC cuboid, which is 400*4000 = 1,600,000 memory units

But a better method exists …
- BC can be computed by first computing b0c0, then b1c0, and so on
- b0c0 can be computed by aggregating chunks 1 to 4
- b1c0 can be computed by aggregating chunks 5 to 8
- b2c0 can be computed by aggregating chunks 9 to 12
- and so on
- So, by computing b0c0, then b1c0, and so on, we need memory for only one chunk of BC at a time to compute the entire BC cuboid
- Note: Each chunk of BC has 100*1000 = 100,000 memory units
- Compare this with the 400*4000 = 1,600,000 units of the brute-force method!
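The chunk groups listed above follow from the chunk numbering. The sketch below assumes chunks are numbered 1 to 64 with A varying fastest, then B, then C; this ordering is inferred from the chunk lists in these notes, and the helper names `chunk_id` and `chunks_for_bc` are hypothetical.

```python
CHUNKS_PER_DIM = 4

def chunk_id(a, b, c):
    """1-based chunk number for chunk indices (a, b, c), with A varying fastest."""
    return 1 + a + CHUNKS_PER_DIM * b + CHUNKS_PER_DIM**2 * c

def chunks_for_bc(b, c):
    """Base chunks that are aggregated over A to produce the BC chunk b<b>c<c>."""
    return [chunk_id(a, b, c) for a in range(CHUNKS_PER_DIM)]

b0c0 = chunks_for_bc(0, 0)   # [1, 2, 3, 4]
b1c0 = chunks_for_bc(1, 0)   # [5, 6, 7, 8]
b2c0 = chunks_for_bc(2, 0)   # [9, 10, 11, 12]

whole_bc = 400 * 4000        # memory units for the entire BC cuboid
one_bc_chunk = 100 * 1000    # memory units for a single BC chunk
ratio = whole_bc // one_bc_chunk
print(b0c0, b1c0, b2c0, ratio)   # [1, 2, 3, 4] [5, 6, 7, 8] [9, 10, 11, 12] 16
```

Because the four chunks contributing to each BC chunk are consecutive in the scan, a BC buffer is complete before the next one is started, so one buffer (1/16 of the full cuboid) is enough.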

How many chunks of memory do we need to compute the cuboid AC?
- To compute a0c0: aggregate chunks 1, 5, 9, and 13
- Similarly, to compute a1c0: aggregate chunks 2, 6, 10, and 14
- Similarly, to compute a2c0: aggregate chunks 3, 7, 11, and 15
- Similarly, to compute a3c0: aggregate chunks 4, 8, 12, and 16
- If we want to compute the cuboid AC in one scan of the array, we need 4 chunks of memory: while scanning chunks 1 to 16, memory for a0c0, a1c0, a2c0, and a3c0 is required at the same time
- Each chunk of AC has 10*1000 = 10,000 memory units
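Why four buffers are needed can be seen by tracking when each AC chunk is first and last touched during the scan. This is a rough sketch assuming the same numbering as in the chunk lists above (A fastest, then B, then C); `decode` and `lifetime` are illustrative names.

```python
CHUNKS_PER_DIM = 4

def decode(chunk):
    """Invert the 1-based chunk number into chunk indices (a, b, c)."""
    n = chunk - 1
    a = n % CHUNKS_PER_DIM
    b = (n // CHUNKS_PER_DIM) % CHUNKS_PER_DIM
    c = n // CHUNKS_PER_DIM**2
    return a, b, c

# First and last base chunk touching each AC chunk while scanning 1..16 (the c0 slab).
lifetime = {}
for chunk in range(1, 17):
    a, b, c = decode(chunk)
    name = f"a{a}c{c}"
    first, _ = lifetime.get(name, (chunk, chunk))
    lifetime[name] = (first, chunk)

# Number of AC buffers that are open (started but not finished) at the busiest step.
peak_open = max(
    sum(first <= chunk <= last for first, last in lifetime.values())
    for chunk in range(1, 17)
)
ac_memory = peak_open * 10 * 1000   # each AC chunk holds 10*1000 memory units

print(lifetime)              # a0c0 spans chunks 1..13, a3c0 spans chunks 4..16
print(peak_open, ac_memory)  # 4 40000
```

The buffer for a0c0 stays open from chunk 1 until chunk 13, by which time a1c0, a2c0, and a3c0 have all been started, so the four lifetimes overlap.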

How many chunks of memory do we need to compute the cuboid AB?
- To compute a0b0: aggregate chunks 1, 17, 33, and 49
- Similarly, to compute a1b0: aggregate chunks 2, 18, 34, and 50
- Similarly, to compute a2b0: aggregate chunks 3, 19, 35, and 51
- Similarly, to compute a3b0: aggregate chunks 4, 20, 36, and 52
- …
- Similarly, to compute a3b3: aggregate chunks 16, 32, 48, and 64
- If we want to compute the cuboid AB in one scan of the array, we need 4*4 = 16 chunks of memory: while scanning chunks 1 to 64, memory for a0b0, …, a3b3 is required at the same time
- Note: Each chunk of AB has 10*100 = 1,000 memory units...
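The buffer counts for BC, AC, and AB can be checked together by simulating one scan of chunks 1 to 64. The sketch below is a simulation under assumptions, not the algorithm's actual implementation: it assumes the numbering used in the chunk lists above (A fastest, then B, then C) and that a buffer is released as soon as its four contributing chunks have been folded in. The function `peak_buffers` is a hypothetical helper.

```python
CHUNKS_PER_DIM = 4
CHUNK_LEN = {"A": 10, "B": 100, "C": 1000}   # partition size per dimension
DIMS = ("A", "B", "C")

def decode(chunk):
    """Invert the 1-based chunk number into chunk indices (a, b, c)."""
    n = chunk - 1
    return (n % 4, (n // 4) % 4, n // 16)

def peak_buffers(kept):
    """Peak number of partially aggregated chunks of the 2-D cuboid `kept`
    that are held at once during one scan of chunks 1..64."""
    seen = {}    # target chunk -> number of base chunks folded in so far
    peak = 0
    for chunk in range(1, CHUNKS_PER_DIM**3 + 1):
        idx = decode(chunk)
        target = tuple(idx[DIMS.index(d)] for d in kept)
        seen[target] = seen.get(target, 0) + 1
        peak = max(peak, len(seen))
        if seen[target] == CHUNKS_PER_DIM:   # all contributions in: release buffer
            del seen[target]
    return peak

report = {}
for kept in ("BC", "AC", "AB"):
    buffers = peak_buffers(kept)
    units = buffers * CHUNK_LEN[kept[0]] * CHUNK_LEN[kept[1]]
    report[kept] = (buffers, units)

total_units = sum(units for _, units in report.values())
print(report)        # {'BC': (1, 100000), 'AC': (4, 40000), 'AB': (16, 16000)}
print(total_units)   # 156000
```

With this scan order the largest plane (BC) needs only one buffer and the smallest plane (AB) is the one kept entirely in memory, for a total of 156,000 memory units across the three 2-D cuboids.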