Sunday, July 1, 2018

Is SCD2 really needed?

SCD2 is commonly applied in data warehouse design, and a bitemporal approach is sometimes applied as well. At a high level, all of these concepts are intended to resolve the relationship between dimensions and facts. Based on my experience from many practical data warehouse projects, I keep asking myself whether SCD2, or even surrogate keys, are really necessary. This document argues that surrogate keys, SCD2 and bitemporal design are not needed.

Recall the two design goals of a dimensional model per Kimball: create an easy-to-understand data structure, and provide fast query response times.

First of all, let us address the performance issue. Fast response times often mean efficient joins between facts and dimensions. By using surrogate keys we can choose a data type, typically an integer, that joins efficiently. However, the performance gain is insignificant given that many serious data warehouses run on platforms such as Teradata or Netezza. At the same time, the development and maintenance cost of surrogate keys and SCD2 is much higher than that of a simple relationship: we need to write complicated SQL, or use an ETL tool, to populate both dimensions and facts during development. It is also harder to query the fact data, as we need to join the dimension tables just to make the data understandable. Given the incremental IT personnel costs and the additional soft costs, there is no performance justification for using surrogate keys and SCD2.

Secondly, with respect to creating an easy-to-understand data structure, the actual data structure without surrogate keys and SCD2 can still be presented as a star schema, which can still be easily modeled with different BI tools.

Below are a few methods to model data structures:

1. Simple attribute(s) (or SCD0):
We can simply treat these attributes as a fact dimension. Normally the fact table has a primary key, so we can either leave these attributes in the fact table, or separate them out as a fact dimension without any changes. When modeling in Cognos, we can either model each individual attribute as its own dimension, or group all the attributes into one fact dimension. The latter is the better approach.

2. Dimension (SCD1):
If we don't use surrogate keys, then we don't need to process fact and dimension together, i.e. put the surrogate key into the fact and insert it into the dimension. We can simply keep the dimension-related column as it is in the fact. With respect to the dimension itself, there are two different cases:

2.1 Create a static reference table as the dimension, with or without different hierarchies.

2.2 If the dimension is dynamic, treat the distinct values of this column as an independent dimension; the dimension is then automatically extracted from the fact.
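As a sketch of case 2.2, the dynamic dimension can be derived directly from the fact table itself; the table and column names below (SALES_FACT, REGION) are purely illustrative:

```sql
-- Derive the dimension members from the fact itself:
-- every distinct value becomes one dimension row.
-- SALES_FACT and REGION are hypothetical names.
SELECT DISTINCT REGION
FROM SALES_FACT
WHERE REGION IS NOT NULL
ORDER BY REGION;
```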

3. Dimension (SCD2):
For the sake of discussion, we use a customer dimension as a sample. This table has a snapshot month and a customer ID, along with many dimension attributes. The full change history of the customer attributes is automatically preserved. There are two ways to proceed:

3.1 Keep the customer dimension as is, then create a relationship between customer and fact joined on two columns: customer ID and snapshot month.
  • Advantage: once modeled, all customer columns are available for reports, and it is fully flexible.
  • Disadvantage: the customer dimension can be huge while only a small percentage of the customer data is used, which results in poor performance.

3.2 Bring only the customer attributes that are actually used into the fact table, for example marital status. We add this attribute to the fact table and then apply the SCD0 and SCD1 concepts above.
  • Advantage: very good performance and easy to understand.
  • Disadvantage: it is not flexible; we need to bring new columns into the fact table whenever a new customer attribute is required.
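To make approach 3.1 concrete, a minimal sketch of the two-column join is below; the table and column names (FACT_ACCOUNT, DIM_CUSTOMER, BALANCE, MARRIAGE_STATUS) are assumptions for illustration only:

```sql
-- Approach 3.1: join the monthly customer snapshot to the fact on both
-- the business key and the snapshot month, instead of a surrogate key.
-- The snapshot month selects exactly the attribute version valid for
-- that month, so SCD2-style history comes for free.
SELECT f.SNAPSHOT_MONTH,
       c.MARRIAGE_STATUS,
       SUM(f.BALANCE) AS TOTAL_BALANCE
FROM FACT_ACCOUNT f
JOIN DIM_CUSTOMER c
  ON  c.CUSTOMER_ID    = f.CUSTOMER_ID
  AND c.SNAPSHOT_MONTH = f.SNAPSHOT_MONTH
GROUP BY f.SNAPSHOT_MONTH, c.MARRIAGE_STATUS;
```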

Friday, June 1, 2018

How to debug a complicated DMR based report

This document walks through how to debug a complicated DMR-based report.
1. Problem
Below is a discontinuous crosstab report with a problem in the marked area. However, when checking the data with Cognos Workspace Advanced we get correct data, and therefore we consider it a report problem.

The report above is a discontinuous crosstab with many calculated measures on the X axis and many different dimensions on the Y axis.

2. Analysis
It is very difficult to pin down the problem: we get the correct result when removing most of the measures and all of the other dimensions.

Since the report has so many measures and dimensions, the only practical approach is to remove the measures and the dimensions one by one, respectively.

3. Debug
Step 1. Remove measure 1, 2, 3, … and save the report as X1, X2, X3, ….

Step 2. Define a new job, run all these reports, and identify at which measure the problem appears.

Step 3. Remove the identified measure(s); the result is still NOT correct. In this sample it is almost impossible to pin down the issue this way, as there are too many combinations.

Step 4. However, when comparing the generated SQL of the report with the problem against a report without it, we find the generated SQL is very different, from JOIN to UNION, but it is still very difficult to see the real problem.

Step 5. Remove dimension 1, 2, 3, … and save the report as Y1, Y2, Y3, ….

Step 6. Define a new job, run all these reports, and identify at which dimension the problem appears.

Step 7. Remove the identified dimension, check the report, and confirm that the generated report is correct. Then we can be sure that this dimension has the problem.

Step 8. Generate the two SQL statements below and compare them:
- SQL generated with correct result, aka GOOD SQL
- SQL generated with wrong result, aka  BAD SQL
For this case, the cause is an INNER JOIN with the identified dimension. Some dimension items exist in the FACT table but do NOT exist in this dimension, so rows are silently removed from the FACT table whenever this dimension is used.
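A simplified sketch of the defect is below; FACT_TABLE, DIM_TABLE and their columns are hypothetical names, not the actual report objects:

```sql
-- BAD SQL (simplified): the INNER JOIN silently drops every fact row
-- whose ITEM_KEY has no matching row in the dimension.
SELECT d.ITEM_NAME, SUM(f.AMOUNT) AS TOTAL_AMOUNT
FROM FACT_TABLE f
JOIN DIM_TABLE d ON d.ITEM_KEY = f.ITEM_KEY
GROUP BY d.ITEM_NAME;

-- GOOD SQL (simplified): a LEFT JOIN keeps the unmatched fact rows,
-- so the totals still reconcile with the fact table.
SELECT NVL(d.ITEM_NAME, 'UNKNOWN') AS ITEM_NAME, SUM(f.AMOUNT) AS TOTAL_AMOUNT
FROM FACT_TABLE f
LEFT JOIN DIM_TABLE d ON d.ITEM_KEY = f.ITEM_KEY
GROUP BY NVL(d.ITEM_NAME, 'UNKNOWN');
```

Comparing the totals of the two queries against SELECT SUM(AMOUNT) FROM FACT_TABLE shows which join loses data.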

Tuesday, May 1, 2018

A pragmatic approach for reference data management: Cognos BI Modelling and implementation issues

 
Context
- Netezza as databases
- Cognos as report tool
 
Goal
- Make development easier – all ETL jobs use the centralized mapping table to obtain dimension business keys, and the mappings are also used to validate the dimension tables as needed
- Make maintenance easier – when business logic changes, we do not need to change the ETL or the Cognos model
 
Assumptions
- Focused area is from staging data to data mart data
- SCD concept with surrogate key is NOT applied

Detailed assumptions for the staging table
- All dimension-related columns are NOT NULL.
- If a source value is NULL, it is set to '-99' for strings, or -99 for numbers.
- If a value cannot be found in the reference table, it is set to '-1' for strings, or -1 for numbers.
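A minimal sketch of applying these conventions while loading staging is below; SRC_TABLE is a hypothetical source table, the other names follow the STG_TABLE example later in this post:

```sql
-- Apply the NULL convention while loading staging:
-- '-99' / -99 marks a value that is missing at the source.
-- (The -1 marker for "not found in the reference table" is assigned
-- later, when the mapping lookup fails.)
INSERT INTO MPL_BI.STG_TABLE (ACCOUNT, DELQ, ORIGINAL_BUREAU_SCORE)
SELECT ACCOUNT,
       NVL(DELQ, '-99'),
       NVL(ORIGINAL_BUREAU_SCORE, -99)
FROM MPL_BI.SRC_TABLE;
```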
 
Capabilities
- Handle single-value mapping
- Handle value-range mapping
- Handle defaults
- Generate exceptions between the map and the dimensions
 
Mapping table
-- DDL
DROP TABLE MPL_BI.REFERENCE_DATA_MAP IF EXISTS;
CREATE TABLE MPL_BI.REFERENCE_DATA_MAP 
(
 SOURCE CHARACTER VARYING(100) NOT NULL,
 PARAMETER_NAME CHARACTER VARYING(100) NOT NULL,
 SINGLE_VALUE CHARACTER VARYING(100),
 LOW_BOUND_VALUE DECIMAL(20,2), 
 LOW_BOUND_INCL_VALUE DECIMAL(20,2),
 HIGH_BOUND_VALUE DECIMAL(20,2),
 HIGH_BOUND_INCL_VALUE DECIMAL(20,2),
 DIMENSION_NAME CHARACTER VARYING(100) NOT NULL,
 DIMENSION_ITEM_NAME CHARACTER VARYING(100) NOT NULL,
 NOTE CHARACTER VARYING(1000),
 LAST_UPDATE_USER CHARACTER VARYING(100) ,
 LAST_MODIFIED_DATE DATE,
 START_DATE DATE NOT NULL,
 END_DATE DATE
);
Sample Data
INSERT INTO MPL_BI.REFERENCE_DATA_MAP values('ALL','MAP_SINGLE_VALUE','Current (0)',NULL,NULL,NULL,NULL,'DIM_SINGLE_VALUE','NOT DELINQUENT',NULL,NULL,NULL,'2018-03-01',NULL);
.............
select * from MPL_BI.REFERENCE_DATA_MAP;
 
(Result, reformatted. Columns that are NULL in every row – LOW_BOUND_VALUE, HIGH_BOUND_VALUE, NOTE, LAST_UPDATE_USER, LAST_MODIFIED_DATE, END_DATE – are omitted; START_DATE is 2018-03-01 for all rows.)

SOURCE | PARAMETER_NAME   | SINGLE_VALUE | LOW_BOUND_INCL | HIGH_BOUND_INCL | DIMENSION_NAME   | DIMENSION_ITEM_NAME
ALL    | MAP_SINGLE_VALUE | Current (0)  |                |                 | DIM_SINGLE_VALUE | NOT DELINQUENT
ALL    | MAP_SINGLE_VALUE | CYC1. 1-30   |                |                 | DIM_SINGLE_VALUE | CLASS 0
ALL    | MAP_SINGLE_VALUE | CYC2. 31-60  |                |                 | DIM_SINGLE_VALUE | CLASS 1
ALL    | MAP_SINGLE_VALUE | CYC3. 61-90  |                |                 | DIM_SINGLE_VALUE | CLASS 2
ALL    | MAP_SINGLE_VALUE | CYC4. 90+    |                |                 | DIM_SINGLE_VALUE | CLASS 3
ALL    | MAP_SINGLE_VALUE | NPNA         |                |                 | DIM_SINGLE_VALUE | NPNA
ALL    | MAP_SINGLE_VALUE | DWO          |                |                 | DIM_SINGLE_VALUE | DWO
ALL    | MAP_SINGLE_VALUE | DEFAULT      |                |                 | DIM_SINGLE_VALUE | NOT IDENTIFIED
ALL    | MAP_BAND         |              | -1.00          | -999.00         | DIM_BAND         | MISSING
ALL    | MAP_BAND         |              | 0.00           | 0.00            | DIM_BAND         | ZERO
ALL    | MAP_BAND         |              | 1.00           | 50.00           | DIM_BAND         | FORCED
ALL    | MAP_BAND         |              | 51.00          | 599.00          | DIM_BAND         | <600
ALL    | MAP_BAND         |              | 600.00         | 619.00          | DIM_BAND         | 600-619
ALL    | MAP_BAND         |              | 620.00         | 639.00          | DIM_BAND         | 620-639
ALL    | MAP_BAND         |              | 640.00         | 659.00          | DIM_BAND         | 640-659
ALL    | MAP_BAND         |              | 660.00         | 679.00          | DIM_BAND         | 660-679
ALL    | MAP_BAND         |              | 680.00         | 699.00          | DIM_BAND         | 680-699
ALL    | MAP_BAND         |              | 700.00         | 719.00          | DIM_BAND         | 700-719
ALL    | MAP_BAND         |              | 720.00         | 739.00          | DIM_BAND         | 720-739
ALL    | MAP_BAND         |              | 740.00         | 759.00          | DIM_BAND         | 740-759
ALL    | MAP_BAND         |              | 760.00         | 779.00          | DIM_BAND         | 760-779
ALL    | MAP_BAND         |              | 780.00         | 799.00          | DIM_BAND         | 780-799
ALL    | MAP_BAND         |              | 800.00         | 819.00          | DIM_BAND         | 800-819
ALL    | MAP_BAND         |              | 820.00         | 839.00          | DIM_BAND         | 820-839
ALL    | MAP_BAND         |              | 840.00         | 859.00          | DIM_BAND         | 840-859
ALL    | MAP_BAND         |              | 860.00         | 1000.00         | DIM_BAND         | 860+
ALL    | MAP_BAND         | DEFAULT      |                |                 | DIM_BAND         | MISSING

Constraints – the two queries below detect duplicate active mappings (any row returned is a violation):
SELECT PARAMETER_NAME,SINGLE_VALUE, count(*) FROM MPL_BI.REFERENCE_DATA_MAP
WHERE END_DATE IS NULL AND SINGLE_VALUE IS NOT NULL
GROUP BY PARAMETER_NAME,SINGLE_VALUE HAVING COUNT(*) > 1;

 
SELECT PARAMETER_NAME,LOW_BOUND_VALUE,LOW_BOUND_INCL_VALUE,HIGH_BOUND_VALUE,HIGH_BOUND_INCL_VALUE, count(*) FROM MPL_BI.REFERENCE_DATA_MAP
WHERE END_DATE IS NULL  AND (LOW_BOUND_VALUE IS NOT NULL OR LOW_BOUND_INCL_VALUE IS NOT NULL OR HIGH_BOUND_VALUE IS NOT NULL OR HIGH_BOUND_INCL_VALUE IS NOT NULL)
GROUP BY PARAMETER_NAME,LOW_BOUND_VALUE,LOW_BOUND_INCL_VALUE,HIGH_BOUND_VALUE,HIGH_BOUND_INCL_VALUE HAVING COUNT(*) > 1;

Applying the map in ETL
DROP TABLE MPL_BI.STG_TABLE IF EXISTS;
CREATE TABLE MPL_BI.STG_TABLE
(
 ACCOUNT int,
 DELQ CHARACTER VARYING(100),
 ORIGINAL_BUREAU_SCORE int
);
INSERT INTO MPL_BI.STG_TABLE VALUES (1,'Current (0)',-1);
INSERT INTO MPL_BI.STG_TABLE VALUES (2,'CYC1. 1-30',610);
INSERT INTO MPL_BI.STG_TABLE VALUES (3,'Other Value',NULL);
SELECT
 ACCOUNT,
 NVL(MAP_SINGLE_VALUE.DIMENSION_ITEM_NAME,MAP_DELQ_STATUS_COPS_0.DIMENSION_ITEM_NAME) AS MEMBER_DELQ,
 NVL(MAP_BAND.DIMENSION_ITEM_NAME,MAP_BUREAU_SCORE_20PT_BAND_INCLUDING_FORCED_0.DIMENSION_ITEM_NAME) AS MEMBER_ORIGINAL_BUREAU_SCORE
FROM MPL_BI.STG_TABLE S
LEFT JOIN MPL_BI.REFERENCE_DATA_MAP MAP_SINGLE_VALUE ON (MAP_SINGLE_VALUE.PARAMETER_NAME ='MAP_SINGLE_VALUE' AND S.DELQ=MAP_SINGLE_VALUE.SINGLE_VALUE)
LEFT JOIN MPL_BI.REFERENCE_DATA_MAP MAP_DELQ_STATUS_COPS_0 ON (MAP_DELQ_STATUS_COPS_0.PARAMETER_NAME ='MAP_SINGLE_VALUE' AND MAP_DELQ_STATUS_COPS_0.SINGLE_VALUE = 'DEFAULT')
LEFT JOIN MPL_BI.REFERENCE_DATA_MAP MAP_BAND ON (MAP_BAND.PARAMETER_NAME ='MAP_BAND' AND S.ORIGINAL_BUREAU_SCORE BETWEEN MAP_BAND.LOW_BOUND_INCL_VALUE AND MAP_BAND.HIGH_BOUND_INCL_VALUE)
LEFT JOIN MPL_BI.REFERENCE_DATA_MAP MAP_BUREAU_SCORE_20PT_BAND_INCLUDING_FORCED_0 ON (MAP_BUREAU_SCORE_20PT_BAND_INCLUDING_FORCED_0.PARAMETER_NAME ='MAP_BAND' AND MAP_BUREAU_SCORE_20PT_BAND_INCLUDING_FORCED_0.SINGLE_VALUE = 'DEFAULT');
 
Generating exceptions between the map and the Cognos dimensions
select * FROM MPL_BI.REFERENCE_DATA_MAP
WHERE PARAMETER_NAME IN ('MAP_SINGLE_VALUE')
AND DIMENSION_ITEM_NAME NOT IN (SELECT DELQ_STATUS from MPL_BI.DIM_CONFORMED_DELQ_STATUS);
select * FROM MPL_BI.REFERENCE_DATA_MAP
WHERE PARAMETER_NAME IN ('MAP_BAND')
AND DIMENSION_ITEM_NAME NOT IN (SELECT CURR_BUREAU_SCORE___20PT_BAND_INCLUDING_FORCED from MPL_BI.DIM_CONFORMED_BUREAU_SCORE_20PT_BAND_INCLUDING_FORCED);
 


Sunday, April 1, 2018

How to change member caption without changing ETL and reports in DMR model

1 Context
This document is intended to demonstrate an idea for changing member captions without changing ETL or reports. Sometimes there is a requirement to change a name, such as a brand or product name, on all reports. There are several ways to proceed, such as:
1) Change the ETL to change the name across the board. This method requires a review of the Framework Manager model and all reports, especially where an individual member is directly used in a report.
2) Change every report across the board. This method takes a huge effort.

2 Solution & implementation
The idea is to take advantage of the Framework Manager model and not make any changes to the ETL or the reports. As is well known, each member has the default roles _businessKey, _memberCaption and _memberDescription.

Also note that the caption is used in all reports by default. We can assign the original column to _businessKey, and create a new column for the caption. This way all reports still work, because the report specification references _businessKey; the caption change therefore has no impact on the report specification, but it does change the report result at execution time.

Two samples are listed below
1)
Before

 After


2)
Before






After

Thursday, February 1, 2018

How to create a report running current month by default or date range if specified

This document is intended to share two solution tips, which can be applied to many different cases.
Suppose we need to create a report that satisfies these requirements:
1) The report can be scheduled without specifying any parameter; by default, it executes for the current month.
2) The report can take a date range if specified.
Based on these requirements, we need to create a prompt page as below.









If the user selects "specify date range", then start date and end date prompts appear for the user to specify.

Create a fake value prompt to avoid refreshing report
When a user changes the option from current month to specify date range, the start date and end date prompts should appear. Obviously, you need to set the prompt up as an auto-submit prompt.

 

Doing so makes the report submit right away and move to the real report page. To avoid this, we need a fake value prompt that takes its value from the selection. We also need to make this fake prompt NOT visible.




Certainly, we need to make the date range prompts conditional with a render variable.

 

Make filter optional to allow user and Cognos schedule to run report against current month

The default month is used when the date range is not specified. Cognos supports only case 1 (IF THEN) and case 2 (CASE WHEN) below, not the third case. Make sure this filter is OPTIONAL.

 
Case 1: the filter works, because Cognos generates the SQL after applying the filter:

if ( ?p_selected? = 'CURRENT'  ) THEN     
 ( [Sales (query)].[Time].[Date] BETWEEN _first_of_month(_add_months (current_date, -1 )) AND _last_of_month (_add_months (current_date, -1 )))
ELSE ( [Sales (query)].[Time].[Date]   BETWEEN ?Report Start Date? AND ?Report End Date? )


Case 2: the filter works, for the same reason as above:

CASE WHEN ( ?p_selected? = 'CURRENT'  ) THEN     
 ( [Sales (query)].[Time].[Date] BETWEEN _first_of_month(_add_months (current_date, -1 )) AND _last_of_month (_add_months (current_date, -1 )))
ELSE ( [Sales (query)].[Time].[Date]   BETWEEN ?Report Start Date? AND ?Report End Date? )
END


Case 3: the filter does not work, because Cognos does not know how to handle the two combined predicates. In this case the filter is not applied at all, and all data is retrieved:

(
 ( ?p_selected? = 'CURRENT'  ) 
 AND
  ( [Sales (query)].[Time].[Date] BETWEEN _first_of_month(_add_months (current_date, -1 )) AND _last_of_month (_add_months (current_date, -1 )))
)
OR
(
 ( ?p_selected? <> 'CURRENT'  ) 
 AND
 ( [Sales (query)].[Time].[Date]   BETWEEN ?Report Start Date? AND ?Report End Date? )
)

Tuesday, January 2, 2018

How to Design Effective Cognos Report Studio Templates

1 Context
There are many articles addressing this topic. Instead of showing a generic solution, this document provides a few tips to resolve some practical problems, listed below:
1) Separate the report template from the layout components
2) Define all styles in the report template
3) Make the title/header/footer dynamic based on the report description
4) Share the same report template for both Excel and PDF
5) Provide detailed report pages in the report template with accurate sizes
 

2 Separate report template from layout components
The layout components mainly refer to the page header, page footer and page title. We want these to be changed only once when a change is needed. Therefore, we should define as many layout components as possible, and have the report template use them as references. Report developers then no longer need to point report headers and footers at the components themselves.
 

3 Define all styles in report template
If we want to style lists and crosstabs, we can change the Global Class Extensions as well as define Local Classes. If we change the Global Class Extensions as the proposed standard, then any report that applies this template uses the defined style. The problem is that we cannot change the template once and have the change reflected in all reports, as this does not work the same way as a layout component reference. Local Classes are also very useful for standardizing reports; in addition, this concept can be applied in active report development to reduce the MHT file size.



4 Make title/header/footer dynamic based on the report description
The report name appears on every page of a report, including the report title. However, sometimes you cannot change the report name once the report is developed, because the name may be used in a report schedule. The solution is to leverage the report description: if a report description is present, it overrides the report name.

Please note that the report description is not saved in the report specification; therefore, the description is not available when running the report from Report Studio, and the report name is displayed instead. However, the report description is displayed for end users and in scheduled output.
The layout calculation for the report title is below:
if (character_length(ReportDescription ()) = 0 ) then (ReportName ()) else (ReportDescription())



5 Share the same report template for both excel and PDF
Because PDF and Excel output differ, using the same report footer and header for both is impossible. One of the main problems is that the column widths are not based on the report contents, such as lists and crosstabs, due to the impact of the report footer. The idea is to select the report header and footer based on the report output format. For example, we can make the footer a block used as a layout component reference; two blocks within this block are defined as the Excel block and the PDF block, and conditional rendering selects between them.
 

6 Provide detailed report pages in the report template with accurate sizes
As is well known, size is very important to control, especially when charts and tables are combined on the same page: we don't want the chart size to affect the table. The solution is to specify exact sizes for both charts and tables, so that the report is generated exactly as expected when the predefined template is applied.

Friday, December 1, 2017

How to validate data integrity using Cognos report

Context

This document is intended to demonstrate a new method of validating data integrity in a data warehouse. There are occasions during data warehouse development where data integrity cannot be enforced:
1) As is well known, the Netezza database cannot enforce foreign keys.
2) The data warehouse is not built as a star schema with foreign key enforcement, for various reasons.
3) There is a lot of logic on the Framework Manager side, such as combining one or more columns with logic into an adapter column that connects to a conformed dimension.
It is very difficult to write SQL to validate the database, because we would need to mimic all the mapping logic in Framework Manager. In addition, whenever anything changes in Framework Manager, we would need to change the SQL accordingly. In other words, we would have to maintain two sets of code, which is very difficult to keep in sync.

Solution & implementation

The idea is to take advantage of the Framework Manager model, since all the detailed logic is already built there: we don't need to write the SQL script, we let Framework Manager write it for us behind the scenes. Whether we have a DMR model or a relational model, we create the validation report at the business layer. Theoretically, we need to check every link to the fact table. In the sample below, we need to make sure that all data in PCF Performance Cube is contained in Conformed Delq Status Dim.



The column on the fact side, Conformed Delq Status Adaptor, is defined with the logic below:
case [Business Layer].[PCF Performance Cube].[Realigned Delq Status] 
when '0 - Current' then 'NOT DELINQUENT'
when '1 - Class 1' then 'CLASS 0'
when '2 - Class 2' then 'CLASS 1'
when '3 - Class 3' then 'CLASS 2'
when '4 - Class 4' then 'CLASS 3'
when '5 - NPNA' then 'NPNA'
end

The exception report is built from this relationship:



 








Fact: 'PCF Performance Cube'
Dim: 'Conformed Delq Status Adaptor'
Exception: [Business Layer].[PCF Performance Cube].[Conformed Delq Status Adaptor]
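For comparison, the same exception check can be sketched directly in SQL, mimicking the adaptor logic; PCF_PERFORMANCE_CUBE and DIM_CONFORMED_DELQ_STATUS are illustrative physical names for the objects shown above, and in practice Cognos generates the equivalent query for us:

```sql
-- Exception report (sketch): adaptor values derived from the fact
-- that have no matching row in the conformed dimension.
SELECT x.CONFORMED_DELQ_STATUS_ADAPTOR, COUNT(*) AS ROW_COUNT
FROM (
  SELECT CASE f.REALIGNED_DELQ_STATUS
           WHEN '0 - Current' THEN 'NOT DELINQUENT'
           WHEN '1 - Class 1' THEN 'CLASS 0'
           WHEN '2 - Class 2' THEN 'CLASS 1'
           WHEN '3 - Class 3' THEN 'CLASS 2'
           WHEN '4 - Class 4' THEN 'CLASS 3'
           WHEN '5 - NPNA'    THEN 'NPNA'
         END AS CONFORMED_DELQ_STATUS_ADAPTOR
  FROM PCF_PERFORMANCE_CUBE f
) x
WHERE x.CONFORMED_DELQ_STATUS_ADAPTOR IS NULL
   OR x.CONFORMED_DELQ_STATUS_ADAPTOR NOT IN
      (SELECT DELQ_STATUS FROM DIM_CONFORMED_DELQ_STATUS)
GROUP BY x.CONFORMED_DELQ_STATUS_ADAPTOR;
```

Any row returned is an integrity violation; the point of this method is that the Cognos report produces the same result without this SQL having to be hand-maintained alongside the model.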