# Data Warehouse

Only available on StudyMode
• Published : March 4, 2013

Text Preview
Data warehousing logical design
Mirjana Mazuran mazuran@elet.polimi.it

December 15, 2009

1/18

Outline

Data Warehouse logical design
ROLAP model
star schema snowﬂake schema

Exercise 1: wine company Exercise 2: real estate agency

2/18

Introduction
Logical design

Starting from the conceptual design it is necessary to determin the logical schema of data We use ROLAP (Relational On-Line Analytical Processing) model to represent multidimensional data ROLAP uses the relational data model, which means that data is stored in relations Given the DFM representation of multidimensional data, two schemas are used: star schema snowﬂake schema

3/18

ROLAP
Star schema

Each dimension is represented by a relation such that:
the primary key of the relation is the primary key of the dimension the attributes of the relation describe all aggregation levels of the dimension

A fact is represented by a relation such that:
the primary key of the relation is the set of primary keys imported from all the dimension tables the attributes of the relation are the measures of the fact

Pros and Cons
few joins are needed during query execution dimension tables are denormalized denormalization introduces redundancy 4/18

ROLAP
Star schema: example

MaxAmount Amount Year Month Accident Day NrOfAccidents Cost IDCustomer IdPolicy Motivation Class

Time IdTime Day Month Year Accident IdTime IdPolicy IdCustomer IdMotivation NrOfAccidents Cost Customer IdCustomer Birthday Gender City Region

Policy IdPolicy Class Amount MaxAmount

BYear Sex

City Region

Motivation IdMotivation Motivation

5/18

ROLAP
Snowﬂake schema

Each (primary) dimension is represented by a relation:
the primary key of the relation is the primary key of the dimension the attributes of the relation directly depend by the primary key a set of foreign keys is used to access information at diﬀerent levels of aggregation. Such information is part of the secondary dimensions and is stored in dedicated relations

A fact is represented by a relation such that:
the primary key of the relation is the set of primary keys imported from all and only the primary dimension tables the attributes of the relation are the measures of the fact

Pros and Cons
denormalization is reduced less memory space is required a lot of joins can be required if they involve attributes in secondary dimension tables 6/18

ROLAP
Snowﬂake schema: example

Category Type Year Month Sale Quantity Income Discount IDCustomer Region BYear Sex City State IdFurniture Material

Time IdTime Day Month Year Sale IdTime IdFurniture IdCustomer Quantity Income Discount Customer IdCustomer IdPlace BirthYear Sex

Furniture IdFurniture Material Type Category

Day

Place IdPlace City Region State

7/18

Exercise 1
Wine company

An online order wine company requires the designing of a datawarehouse to record the quantity and sales of its wines to its customers. Part of the original database is composed by the following tables: CUSTOMER (Code, Name, Address, Phone, BDay, Gender) WINE (Code, Name, Type, Vintage, BottlePrice, CasePrice, Class) CLASS (Code, Name, Region) TIME (TimeStamp, Date, Year) ORDER (Customer, Wine, Time, nrBottles, nrCases) Note that the tables represent the main entities of the ER schema, thus it is necessary to derive the signiﬁcant relashionships among them in order to correctly design the data warehouse. 8/18

Exercise 1: A possible solution
Snowﬂake schema

FACT Sales MEASURES Quantity, Cost DIMENSIONS Customer, Area, Time, Wine → Class Customer CustomerCode Birthday Gender SalesTable WineCode CustomerCode OrderTimeCode Quantity Cost Wine WineCode ClassCode Type Vintage

Time TimeCode Date Year

Class ClassCode Region

9/18

Exercise 2
Real estate agency

Let us consider the case of a real estate agency whose database is composed by the following tables: OWNER (IDOwner, Name, Surname, Address, City, Phone)...