Friday, February 1, 2019

How to create band report based on Cognos DMR

Context
There are 4 different ways to handle bands:
1. Create band in database
2. Create band in Cognos FM model, or tableau model
3. Create band in relational based model
4. Create band in DMR based.

While it is relative simple to create band in relational based model (s. sample below), the DMR based band is difficult to build and has poor performance.

BAND
CASE
WHEN [Sales (query)].[Sales].[Revenue]  < 1000  THEN '<1000'
WHEN [Sales (query)].[Sales].[Revenue]  BETWEEN 1000 AND 2000 THEN '1000 - 2000'
ELSE 'OTHER'
END

Sample to create band report based on Cognos DMR

BAND
set([BAND1],[BAND2],[BAND3])

BAND1
member (
total ( currentMeasure within set filter ( [Sales (analysis)].[Order].[Order].[Order detail code], tuple(currentMember([Sales (analysis)].[Order].[Order]),[Revenue]) < 50000) )
, '< 50000'
, '< 50000'
, [Sales (analysis)].[Order].[Order])

BAND2
member (
total ( currentMeasure within set filter ( [Sales (analysis)].[Order].[Order].[Order detail code], tuple(currentMember([Sales (analysis)].[Order].[Order]),[Revenue]) between  50000 and 100000) )
, '50000 -100000'
, '50000 -100000'
 , [Sales (analysis)].[Order].[Order])

BAND3 
 member (
total ( currentMeasure within set filter ( [Sales (analysis)].[Order].[Order].[Order detail code], tuple(currentMember([Sales (analysis)].[Order].[Order]),[Revenue]) > 100000) )
, '> 100000'
, '> 100000'
, [Sales (analysis)].[Order].[Order])


Performance problem with  to band report based on Cognos DMR
Relational based: All process is done on database side
with Sales as (
select ORDER_DETAILS.QUANTITY*ORDER_DETAILS.UNIT_SALE_PRICE Revenue
 from GOSALES.ORDER_DETAILS ORDER_DETAILS, GOSALES.ORDER_HEADER ORDER_HEADER
 where ORDER_HEADER.ORDER_NUMBER=ORDER_DETAILS.ORDER_NUMBER) 
select 
case  when Sales.Revenue<1000 then '<1000' when Sales.Revenue between 1000 and 2000 then '1000 - 2000' else 'OTHER' end  Band
, sum(Sales.Revenue) Revenue
 from Sales
 group by case  when Sales.Revenue<1000 then '<1000' when Sales.Revenue between 1000 and 2000 then '1000 - 2000' else 'OTHER' end

DMR based: All process is done on report server. Therefore, we need to be very cautious about the data amount to be handled.
with TIME_DIMENSION3 as (
select TIME_DIMENSION.CURRENT_YEAR CURRENT_YEAR, TIME_DIMENSION.MONTH_KEY MONTH_KEY
 from GOSALES.TIME_DIMENSION TIME_DIMENSION
 group by TIME_DIMENSION.CURRENT_YEAR, TIME_DIMENSION.MONTH_KEY) 

select distinct CAST( TIME_DIMENSION3.CURRENT_YEAR AS CHAR( 4 ) ) Yearkey
 from TIME_DIMENSION3
 where CAST( TIME_DIMENSION3.CURRENT_YEAR AS CHAR( 4 ) )='2006' and mod(TIME_DIMENSION3.MONTH_KEY, 100)<>0
 order by Yearkey asc nulls last
with Sales as (
select (EXTRACT( YEAR FROM (ORDER_HEADER.ORDER_DATE) )*10000+EXTRACT( MONTH FROM (ORDER_HEADER.ORDER_DATE) )*100)+EXTRACT( DAY FROM (ORDER_HEADER.ORDER_DATE) ) Day_key__order_date_, ORDER_HEADER.ORDER_NUMBER Order_number, ORDER_DETAILS.ORDER_DETAIL_CODE Order_detail_code, ORDER_DETAILS.QUANTITY*ORDER_DETAILS.UNIT_SALE_PRICE Revenue
 from GOSALES.ORDER_DETAILS ORDER_DETAILS, GOSALES.ORDER_HEADER ORDER_HEADER
 where ORDER_HEADER.ORDER_NUMBER=ORDER_DETAILS.ORDER_NUMBER)
 , TIME_DIMENSION8 as (
select TIME_DIMENSION.DAY_KEY DAY_KEY, TIME_DIMENSION.MONTH_KEY MONTH_KEY, TIME_DIMENSION.CURRENT_YEAR CURRENT_YEAR
 from GOSALES.TIME_DIMENSION TIME_DIMENSION
 where CAST( TIME_DIMENSION.CURRENT_YEAR AS CHAR( 4 ) )='2006')
 , Sales9 as (
select (EXTRACT( YEAR FROM (ORDER_HEADER.ORDER_DATE) )*10000+EXTRACT( MONTH FROM (ORDER_HEADER.ORDER_DATE) )*100)+EXTRACT( DAY FROM (ORDER_HEADER.ORDER_DATE) ) Day_key__order_date_, ORDER_HEADER.ORDER_NUMBER Order_number, ORDER_DETAILS.ORDER_DETAIL_CODE Order_detail_code, ORDER_DETAILS.QUANTITY*ORDER_DETAILS.UNIT_SALE_PRICE Revenue
 from GOSALES.ORDER_DETAILS ORDER_DETAILS, GOSALES.ORDER_HEADER ORDER_HEADER
 where ORDER_HEADER.ORDER_NUMBER=ORDER_DETAILS.ORDER_NUMBER) 

select
CAST( Order_.Order_number AS CHAR( 6 ) ) Order_numberkey
, CAST( Order_.Order_detail_code AS VARCHAR( 8 ) ) Order_detail_codekey
, CAST( TIME_DIMENSION8.CURRENT_YEAR AS CHAR( 4 ) ) Yearkey
, sum(Sales9.Revenue) Revenue
from Sales Order_, TIME_DIMENSION8, Sales9
where mod(TIME_DIMENSION8.MONTH_KEY, 100)<>0
and Sales9.Order_detail_code=Order_.Order_detail_code
and Sales9.Day_key__order_date_=TIME_DIMENSION8.DAY_KEY
group by CAST( Order_.Order_number AS CHAR( 6 ) )
, CAST( Order_.Order_detail_code AS VARCHAR( 8 ) )
, CAST( TIME_DIMENSION8.CURRENT_YEAR AS CHAR( 4 ) )
order by Yearkey asc nulls last

Tuesday, January 1, 2019

How to model Cognos FM to join two fact tables at different granularity with not shared dimension(s)

First of all, joining two fact tables at different granularity is a big no-no. In addition, we can't use a determinant on a fact table because it will aggregate your measure. However, this is not a valid case as data with different granularities should be defined as different tables from        data warehousing design point of view. This document is intended to share a solution to provide an effective Cognos FM model.

Problem
In order to explain this issue in detail, we create a sample below.
C3 is only used for FACT1. The M2 in FACT2, or denominator  is based on C1 and C2 only. The challenge is how to make Cognos FM model be flexible enough to allow report authors to generate report by simply drag and drop, using relational model or DMR model.
 
Analysis
It looks like there are a few options to try
Option1: Using UNION to bring both datasets together
 

Data looks like below

This is a problem, as data cannot be aligned in report.
 
 Option2: Using JOIN to bring both dataset together

Report

Even we apply semi aggregate function, we still cannot generate the current report.
 
Option3: Using STICTCH QUARY to bring both dataset together
Cognos has two core concepts, stitched query and determinant. However, this sample is not a typical sample, as C3 is not shared dimension. To resolve this problem, we can “FAKE” an extra column for FACT2 with value “total”. We can also “FAKE” an extra column for C3 with value “total”. Doing this way will bring all three dimensions as confirmed and shared dimensions, where the relationship between C3 and FACT2 is connect at “Total” level. This relationship can be further adjusted by determinant to avoid the double count.

Solution
Below is the FM design screenshot, where we get 3 dimensions are designed and turn the design as multiple fact tables with different granularities.



Create a faked column as C3_Total   in FACT2
[Data Source Layer].[ FACT2]: select C1, C2, M2, 'Total' as C3_Total  from [DATASOURCE1].FACT2

Create a view as V_C3  with a fake column Total, the reason we create this view is because only table or view can be supported for Cognos determinant.  
[Data Source Layer].[ C3]: select *  from [DATASOURCE1].V_C3
CREATE OR REPLACE VIEW SCHEMEX.V_C3 AS SELECT 'Total' AS TOTAL, SCHEMAX.C3.ID FROM C3



Result analysis
Case 1:
Report:

 SQL:
 select
 D2.ID ID ,
 D2.M1 M1 ,
 D3.M2 M2
from
(
 select
  C3.ID ID ,
  sum(FACT1.M1) M1
 from SCHEMAX.C3 C3, SCHEMAX.FACT1 FACT1
 where FACT1.C3 = C3.ID group by C3.ID
) D2,
(
 select
  sum(M2) M2
 from SCHEMAX.FACT2
  having count(*) > 0
) D3


 Data:


 Case 2:Report:

 SQL:
select
 D2.ID ID ,
 D2.M1 M1 ,
 D3.M2 M2 ,
 (D2.rc * 1.0e0) / nullif(D3.rc, 0) C____M1__M2_ ,
 min(D2.Summary_M1_) over () Summary_M1_ ,
 min(D3.Summary_M2_) over () Summary_M2_ ,
 (min(D2.rc5) over () * 1.0e0) / nullif(min(D3.rc) over (), 0) Summary____M1__M2__
from (
  select
   T0.C0 ID ,
   T0.C1 M1 ,
   T0.C1 rc
   , sum(T0.C1) over () Summary_M1_ ,
   sum(T0.C1) over () rc5
   from (
    select C3.ID C0 , sum(FACT1.M1) C1
    from SCHEMAX.C3 C3, SCHEMAX.FACT1 FACT1
    where FACT1.C3 = C3.ID group by C3.ID
   ) T0
 ) D2,
 (
  select
   distinct T0.C0 M2 ,
   T0.C0 rc ,
   T0.C0 Summary_M2_
   from
  (
   select sum(M2) C0 from SCHEMAX.FACT2
  ) T0
 ) D3


Data:

 Case 3:Report:
 SQL:
select
 (coalesce(D2.ID1, D3.ID1)) ID1 ,
 (coalesce(D2.ID2, D3.ID2)) ID2 ,
 D2.ID ID ,
 D2.M1 M1 ,
 D3.M2 M2
from
 (
  select
   C1.ID ID1 ,
   C2.ID ID2 ,
   C3.ID ID ,
   sum(FACT1.M1) M1
  from SCHEMAX
.C1 C1,
   SCHEMAX.C2 C2,
   SCHEMAX.V_C3 C3,
   SCHEMAX.FACT1 FACT1
  where FACT1.C1 = C1.ID
   and FACT1.C2 = C2.ID
   and FACT1.C3 = C3.ID
  group by C3.ID, C1.ID, C2.ID
 ) D2
 FULL OUTER JOIN
 (
  select C1.ID ID1 ,
   C2.ID ID2 ,
   sum(FACT2.M2) M2
  from SCHEMAX.C1 C1,
   SCHEMAX.C2 C2,
   SCHEMAX.FACT2 FACT2,
   (
    select C3.TOTAL TOTAL
    from SCHEMAX.V_C3 C3
    group by C3.TOTAL
   ) C3
  where C1.ID = FACT2.C1
   and C2.ID = FACT2.C2
   and C3.TOTAL = 'Total'
  group by C1.ID, C2.ID
 ) D3
 on D2.ID1 = D3.ID1 and D2.ID2 = D3.ID2;


Data:

 
  Case 4:Report:












SQL:

select
  (coalesce(D2.ID1, D3.ID1)) ID1 ,
  (coalesce(D2.ID2, D3.ID2)) ID2 ,
  D2.ID ID ,
  D2.M1 M1 ,
  D3.M2 M2
from
 (
   select
    C1.ID ID1 ,
    C2.ID ID2 ,
    C3.ID ID ,
    sum(FACT1.M1) M1
   from SCHEMAX.C1 C1,
    SCHEMAX.C2 C2,
    SCHEMAX.V_C3 C3,
    SCHEMAX.FACT1 FACT1
   where FACT1.C1 = C1.ID
    and FACT1.C2 = C2.ID
    and FACT1.C3 = C3.ID
   group by C3.ID, C1.ID, C2.ID
 ) D2
 FULL OUTER JOIN
 (
   select C1.ID ID1 ,
    C2.ID ID2 ,
    sum(FACT2.M2) M2
   from SCHEMAX.C1 C1,
    SCHEMAX.C2 C2,
    SCHEMAX.FACT2 FACT2,
    (
     select C3.TOTAL TOTAL
     from SCHEMAX.V_C3 C3
     group by C3.TOTAL
    ) C3
   where C1.ID = FACT2.C1
    and C2.ID = FACT2.C2
    and C3.TOTAL = 'Total'
   group by C1.ID, C2.ID
 ) D3
 on D2.ID1 = D3.ID1 and D2.ID2 = D3.ID2;

Data:

Saturday, December 1, 2018

How to build a practical planning and forecasting data model for Cognos reports

Requirement
Given that Actual data is provided monthly,  forecast data is provided quarterly for months up to next October, Plan data is provided annually for current October to next October. Sample below

 It is requested to report for current quarter, current fiscal year and current fiscal year to date




Approach       


Assume that Historical reports is not needed to reproduce.  
Report authors don't need to create measure on the fly, instead, the logic will be embedded in data mart.       
Report data should be designed as same granularity and addable.       

Solution
The idea is to wrap the logic into ETL as below
Combined actual: move forecast data from future periods into actual to report actual data for aggregated periods such as quarter, FY.    
Combine forecast: move forecast data from previous quarters to align to current quarter to report forecast data for aggregated periods such as quarter, FY and FYTD
Combined plan: move plan data from previous year to align to current year to report plan data for aggregated periods such as quarter, FY and FYTD.






Dataflow       
Step 1: Load combined plan data
Step 2: Normalize forecast data
Step 3: Load actual data
Step 4: load combined forecast data
Step 6: load combined actual data

Thursday, November 1, 2018

How to Transpose / Pivot millions of records efficiently in SQL/Netezza

SAS has the transpose process, SQL server and Oracle has pivot function, while both  transpose and pivot functions are not supported in Netezza version. This short document is intended to introduce a very efficient way to query data to retrieve the pivot result. Please see sample below

CREATE TABLE STUDENTS (SID VARCHAR(11)) ;
INSERT INTO STUDENTS VALUES ('ADAM') ;
INSERT INTO STUDENTS VALUES ('BRIAN') ;
INSERT INTO STUDENTS VALUES ('CHARLES') ;
INSERT INTO STUDENTS VALUES ('DAVE') ;
SELECT * FROM STUDENTS;
 

SID
DAVE
CHARLES
BRIAN
ADAM

 

CREATE TABLE MARKS (SID VARCHAR(11), SUBJECT VARCHAR(11), MARK INT2) ;
INSERT INTO MARKS VALUES ('ADAM', 'ENGLISH', 20) ;
INSERT INTO MARKS VALUES ('ADAM', 'HISTORY', 40) ;
INSERT INTO MARKS VALUES ('ADAM', 'SCIENCE', 60) ;
INSERT INTO MARKS VALUES ('BRIAN', 'HISTORY', 60) ;
INSERT INTO MARKS VALUES ('BRIAN', 'SCIENCE', 80) ;
INSERT INTO MARKS VALUES ('CHARLES', 'ENGLISH', 60) ;
INSERT INTO MARKS VALUES ('CHARLES', 'SCIENCE', 20) ;
INSERT INTO MARKS VALUES ('DAVE', 'ENGLISH', 80) ;
INSERT INTO MARKS VALUES ('DAVE', 'HISTORY', 20) ;
INSERT INTO MARKS VALUES ('DAVE', 'SCIENCE', 40) ;
SELECT * FROM MARKS;

 
SID
SUBJECT
MARK
ADAM
ENGLISH
20
ADAM
HISTORY
40
ADAM
SCIENCE
60
BRIAN
HISTORY
60
BRIAN
SCIENCE
80
CHARLES
ENGLISH
60
CHARLES
SCIENCE
20
DAVE
ENGLISH
80
DAVE
HISTORY
20
DAVE
SCIENCE
40

 

SELECT S.SID, E.MARK AS ENGLISH, H.MARK AS HISTORY, C.MARK AS SCIENCE
FROM STUDENTS AS S
LEFT OUTER JOIN MARKS AS E ON (S.SID = E.SID AND E.SUBJECT = 'ENGLISH')
LEFT OUTER JOIN MARKS AS H ON (S.SID = H.SID AND H.SUBJECT = 'HISTORY')
LEFT OUTER JOIN MARKS AS C ON (S.SID = C.SID AND C.SUBJECT = 'SCIENCE')
ORDER BY 1 ;

 
SID
ENGLISH
HISTORY
SCIENCE
ADAM
20
40
60
BRIAN
NULL
60
80
CHARLES
60
NULL
20
DAVE
80
20
40

  

Actually, this sample above is the excerpt from Netezza support. However,the most efficient way is use the query below with two advantages:
1.    Only one table is used
2.    Not any joins are needed
3.    The performance is much better than approach above

 
SELECT
SID
,MAX(CASE WHEN SUBJECT = 'ENGLISH' THEN MARK ELSE NULL END) AS ENGLISH
,MAX(CASE WHEN SUBJECT = 'HISTORY' THEN MARK ELSE NULL END) AS HISTORY
,MAX(CASE WHEN SUBJECT = 'SCIENCE' THEN MARK ELSE NULL END) AS SCIENCE
FROM MARKS
GROUP BY SID
ORDER BY 1 ;

SID
ENGLISH
HISTORY
SCIENCE
ADAM
20
40
60
BRIAN
NULL
60
80
CHARLES
60
NULL
20
DAVE
80
20
40