Data Modeling


RDBMS and Data Modeling

RDBMS Concepts - Database - Rows - columns - Types of Keys - ER Model - Relationships between tables - Normalisation - Types - Data integrity - Entity Integrity - Referential Integrity Primary key - Foreign Key relationship - Join types - Relationship (1:1 ,1:M and M:M) - Locks Normalised Schema - Dimensional Modeling - Basic Design Principles - OLTP Schema - Siebel Data Model - Base Table - Intersection Table - Audit Trail table - Transaction Tables - Non Transaction tables

Design Lifecycle

Identify Business Process Requirements - Requirements Gathering - Analysis - Identify the Grain - Identify the Dimensions - Identify the Facts - Physical design considerations - Logical Data modeling - Physical Data modeling - ER modeling - Dimensional modeling - Types of Schemas - Star - Snow Flake - Multi Star Schema - Data ware House Design Approaches - Top Down approach - Bottom-Up Approach - Hybrid Approach

Dimensions

Dimension Definition - Types - Conformed Dimensions - Dimensional Hierarchies - Slowly changing dimensions - Types - Type 1 - Type 2 - Type 3 - Hybrid Types - Fast changing dimensions - Time Dimension - Garbage Dimensions - Role-Playing Dimensions - Multi-Valued Dimensions - Bridge Tables - Heterogeneous Dimensions - Hot Swappable Dimensions - Mini Dimension – Casual dimension – Dirty Dimension - Extension Tables

Facts

Facts - Definition - Fact Table Design - Fact Table Types - Transaction Fact table - Periodic Fact Table - Event Fact tables - Fact less Fact Tables - Degenerate Dimension - Fact Table Granularity - Conformed Facts - Year-to-Date Facts - Non-Additive Facts - Semi-Additive Facts Textual Fact - Aggregate Tables - Multiple Granular Fact Tables - Fact Table Model for Slowly Changing Dimensions
Standard

What are the different types of Schema in Dimensional Modeling?

1) Star Schema
2) Snow Schema
3) Galaxy Schema


What is the difference between a Dimension Table and Fact Table?

Dimension Tables are tables which stores the description of the business entities.

Fact tables are tables which stores the relationship between the dimensions and its associated measures. Fact Table records an event or transaction or it can be a periodic snapshot of data.

What is a Role Playing Dimension?

Role Playing Dimensions are those dimensions which plays different roles in a schema, based on its join condition with the Fact Table.

Eg The table Team_lead(W_POSITION_D) Joins with the fact table and brings Team Lead related data, the Table Manager(W_POSITION_D) Joins with the fact table and brings manager data and Table Senior Manager(W_POSITION_D) Joins with the fact table and brings Senior manager related data

The joins are as follows

Team_lead(W_POSITION_D.ROW_WID=Fact(W_ABC_F).TM_LEAD_WID
Manager(W_POSITION_D).ROW_WID=Fact(W_ABC_F).MGR_WID
Senior Manager(W_POSITION_D).ROW_WID=Fact(W_ABC_F).SEN_MGR_WID

The Dimension W_POSITION_D here is a Role Playing dimension, it plays different roles based on its join with the fact table.

What is a Time Dimension?

Time Dimension are those dimension which stores all time related data, these dimensions are joined with the fact to analyse the behaviour of the metrics with respect to various time frames

eg A record in a fact table is populated whenever a customer buys a product. The Fact table stores the customer wid,employee_wid, product_wid,purchase date_wid, cost. Purchase date(W_DAY_D) contains month,year,week,quarter etc for all dates from the year 2000 to 2020

The Dimension table Purchase date(W_DAY_D) can be joined with the Fact(W_ABC_F) to analyse the behaviour of the metrics like number of products sold year wise, month wise, Quarter wise etc

Purchase date(W_DAY_D) here acts as a time dimension which is used to analyse the behaviour of the metrics with respect to time.

What is a Junk Dimension or Garbage Dimension?

Junk Dimension or Garbage Dimensions contains Random values populated through all possible combinations. These Dimension should be queried with the fact table to obtain desired results. Attributes of these tables contains Flags, Codes etc.

Eg A record in a Fact is Populated whenever a customer buys a product from an Employee. The Fact table stores the Cust_wid, Emp_wid, Prod_wid, Trans_id, Purchase_dt_wid, Cost. Every Customer can give feedback about the Product and Employee for every transaction and also The employee too can give Feedback about the Customer for every Transaction. These Feedbacks can take values (Excellent,Satisfactory,Poor,No Feedback)

I will introduce a Junk dimension with columns row_wid,Cust_emp,emp_prod,emp_cust. This table will store 64 Records for all possible Combinations of feedbacks.

Junk Dimension's wid will be introduced in the Fact Table. This dimension can be joined with the fact table to obtain the results.

What is a Multi Valued Dimension?

Multi Valued Dimensions are those dimensions that has multiple values for a given Fact measurement.

eg A Fact Record is created when a Sales Rep Team creates a project with a customer. Revenue is computed at Project level. Hence The fact table will contain Sales rep team's primary employee wid, customer wid, project wid, revenue etc.

Joins
W_EMPLOYEE_D.ROW_WID=W_FACT_F.PR_EMP_WID AND
W_EMPLOYEE_D.INTEGRATION_ID=W_SALES_TEAM_D.PR_EMP_ID

W_SALES_TEAM_D contains ROW_WID,INTEGRATION_ID,PR_EMP_ID,EMP_ID,EMP_NAME,TEAM_ID. This table stores the team id , EMP_ID stores all its team members

Sales Rep team's Primary employee wid will be joined with the Employee dimension which in turn will be joined with the Sales team employee dimension. Revenue is computed at Project level and every project can be associated with multiple team members. Multi Valued dimensions are introduced seperately to maintain the granularity of the Fact Table. These dimensions are usually associated with a weighted measure.

Here W_SALES_TEAM_D acts as a Multi valued dimension which takes multiple values for a given fact record

What are the different Types of Fact Tables?

Transaction Fact table
Periodic Fact Table
Event Fact Table
Factless Fact Table

What is a Factless Fact Table?

Factless Fact Tables are those tables which stores only the relationship between the dimensions. The main facts which can be churned out of these tables are count related measures.

eg This Factless Fact Table stores the students, projects, courses, professor, year related data of a college. It should be noted that many students can register for a given course and similarly a course can also be registered by many students. This Table can give the details about the student, professor, project details of a given course. Facts which can be churned out of these factless fact tables are like number of Students attending a given course, which professor has taught highest number of courses.

What is a Transaction Fact table?

A Transaction fact table records only one row per transaction. This type of Fact tables are implemented when the business is interested in the reporting at a lowest Granularitity.

Eg A Transaction may be defined as a customer purchasing a product at a given location. This Fact table will record the time of purchase eg 23-06-2010 17:09:34, customer wid, employee_wid,location_wid,product_wid ,revenue etc.

Generally this type of fact tables grows very fast as it inserts one row per every transaction.

What is a Periodic Fact Table?

A Periodic Fact table records one row for every period. This Period can be in year, Quarter of an year, month etc. This Type of Fact Table is implemented when the business is interested in reporting at a period level.

Eg This Fact Table as per the above example will record the month_wid eg 199801, customer_wid,employee_wid,location_wid , revenue etc

This type of fact tables can be formed by aggregating the transaction based fact table at a period level like year or month of a year etc.

What is Event Fact Table or Accumlating Snap shot Fact table?

Event Fact Tables records one row for entire life time of an event. This type of Fact table is implemented when the business is interested in reporting and analysing the various dates associated to an event.

Eg An Insurance company may wish to track many dates related to Customers policy. Here a Customer taking a policy is an event. There can be various dates associated to it like when the sales rep approached the customer, when the policy was created, when the first payment was done etc. This fact table stores one record and updates the various dates as when they approach for an Event.

This type of Fact tables are generally associated with various time dimensions to analyse the behaviour of the events with respect to various time frames

1 comment:

  1. Excellent work I have never seen a site like this before

    ReplyDelete