-
Online analitical processing allow users to interactively navigate on the DWH
-
navigation is done trough chains of operators
OLAP OPERATORS
roll-up
-
adds aggregation to the data collection
drill-down
- removes aggregation from the data collection
slice-and-dice
- set a dimension to a specific value reducing the data collection dimensions
pivot
- change in layouts of the collection of the data
drill-across
- create a liink between data co compare them
drill-through
- switches from the multidimensional data model into a operational data
EXTRACTION TRANSFORMATION AND LOADING (ETL)
The ETL process aims to get data from sources, improve general data quality, transform data according to the schema and loads it in the DWH
---
title: ETL
---
flowchart TD
A[EXTRACTION\nextract data from sources]
B[CLEANSING\nimprovements to the quality\nremoving duplicates]
C[TRASFORMATION\ndata processing according to the schema]
D[LOADING\nload data in the DWH]
A --> B
B --> C
C --> D
EXTRACTION
The extraction phase aims to get data from the datasources, there are 2 possible approaches: STATIC or INCREMENTAL
---
title: EXTRACTION
---
flowchart TD
A[APPROACHES]
B[STATIC\nDWH is populated for the first time]
C[INCREMENTAL\nthe DWH is updated with new data regularly]
A --> B & C
Each approach is more suitable for certain types of data:
types of data | types of extraction |
---|---|
structured data (from databases or formatted files ) | static (for the first DWH population operation) |
unstructured data (from social media) | incremental (for the update operations on the DWH) |
CLEANSING
- data are processed to improve the quality, data are standardized and mistakes are corrected
SOLUTION FOR DATA INCONSISTENCIES
Dictionary based techniques
- they make use of dictionaries and lookup tables to fix typing errors
Aproximate merging
- needed when merging data from different sources and there is no common key
TRANSFORMATION
- data are altered to match the information schema on the DWH
DENORMALIZATION
- for relational database data are rearranged to reduce the number of queries to do on manipulation fase
LOADING
REFRESH
- the DWH is completely rewritten with new data
UPDATE
- only changes on source are applied to the DWH existent data are not canceled or modified