- Hold onto the fact until all dimensions arrive
- Create a dimension called ‘unknown’ or ‘not available’ with a primary key of -1
- Insert a dummy row in the dimension and populate it with whatever you can
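The third option above can be sketched as follows. This is a minimal in-memory illustration, not the project's actual code: the dict-based dimension, the column names, and the `is_dummy` flag are all assumptions made for the example.

```python
# Sketch of approach 3: when a fact references a location that is not yet in
# the location dimension, insert a dummy row so the fact can load now and be
# corrected later. All names here are illustrative.

def get_or_create_location_key(location_dim, natural_key, next_key):
    """Return the surrogate key for natural_key, creating a dummy row if absent.

    location_dim: dict mapping natural key -> dimension row
    next_key:     next unused surrogate key
    Returns (surrogate_key, next_unused_key).
    """
    if natural_key in location_dim:
        return location_dim[natural_key]["surrogate_key"], next_key
    # Dummy row: populate with whatever is known, flag it for later resolution.
    location_dim[natural_key] = {
        "surrogate_key": next_key,
        "description": "DUMMY - awaiting banner file",
        "is_dummy": True,
    }
    return next_key, next_key + 1
```

Calling the function twice with the same natural key returns the same surrogate key, so facts loaded before and after the banner file arrives line up on one dimension row.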
For ease of explanation, the Location dimension in this project is used to demonstrate the idea. In this project, approaches 1) and 3) are applied. The bottom line is that all Locations must be mapped before the sales fact is loaded into the production database. When a map for a new Location cannot be found, a dummy location map is generated. This case is quite common, because sales arrive earlier than the outlet file (called the banner file); sales data can be loaded up to 4 weeks before the banner file arrives. Therefore, when the banner file arrives, we need a process that allows the user to resolve
1) dummy outlets, and
2) existing maps, which must be remapped once they are confirmed wrong.
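A sketch of case 1), resolving a dummy outlet when the banner file arrives. The key design point (an assumption consistent with the text, not stated explicitly in it) is that a dummy row is fixed in place, keeping its surrogate key, so facts already loaded stay valid; only case 2), where the natural-to-surrogate mapping itself changes, forces the reprocessing discussed below. Field names are illustrative.

```python
def resolve_dummy_outlet(location_dim, natural_key, banner_attrs):
    """Overwrite a dummy row's attributes with real values from the banner file.

    The surrogate key is preserved, so facts already loaded against the dummy
    row do not need to be reprocessed.
    """
    row = location_dim[natural_key]
    row.update(banner_attrs)          # fill in real attributes from the banner file
    row["is_dummy"] = False           # the row is no longer a placeholder
    return row["surrogate_key"]       # unchanged, still referenced by loaded facts
```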
The issue here is how to reprocess data. The current ETL framework (dataflow) is:
INBOUND table ==> HIST table ==> BASE table ==> AGGREGATE table
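The dataflow above can be sketched as one function per stage. This is a toy pass-through model under assumed column names (`location_nk` for the natural key, `location_sk` for the surrogate key, `sales` for the measure); the real stages would of course do far more.

```python
def load_hist(inbound_rows):
    # HIST keeps the wholesaler's original (natural) location identifier.
    return [dict(r) for r in inbound_rows]

def load_base(hist_rows, location_map):
    # BASE swaps the natural key for the surrogate key via the location map.
    return [{**r, "location_sk": location_map[r["location_nk"]]}
            for r in hist_rows]

def load_aggregate(base_rows):
    # AGGREGATE sums sales per surrogate key.
    totals = {}
    for r in base_rows:
        totals[r["location_sk"]] = totals.get(r["location_sk"], 0) + r["sales"]
    return totals
```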
There are two approaches: reload the files, or reprocess only the impacted records.
- Reload files
This approach has several disadvantages:
- It is difficult to identify all impacted files; currently this relies on business users' expertise
- There are too many files: each week brings files from 11 wholesalers
- The process is very heavy
- Reprocess only the impacted records, without reloading files
This is clearly the better approach; however, it can only work when the HIST table contains the original location identifier, i.e. the natural key, from each wholesaler.
Details for reprocessing outlets:
- Based on the new maps and the HIST table (whose original location identifier column uses a normalized column name, i.e. the natural key), the incremental dataset is identified.
- Rerun the incremental load, starting from
==> BASE table (with surrogate key)
==> AGGREGATE table (with surrogate key)
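The two steps above can be sketched together. Assuming the same illustrative column names as before (`location_nk`, `location_sk`, `sales`), the HIST rows whose natural key was remapped are selected, and only that incremental slice is pushed through the BASE and AGGREGATE transformations:

```python
def reprocess(hist_rows, changed_nks, location_map):
    """Rerun the incremental load for remapped locations only.

    hist_rows:    full HIST table, each row carrying the natural key
    changed_nks:  set of natural keys whose map was corrected
    location_map: corrected natural key -> surrogate key mapping
    Returns (base_delta, aggregate_delta) for the impacted slice.
    """
    # Identify the incremental dataset: HIST rows touched by the remap.
    impacted = [r for r in hist_rows if r["location_nk"] in changed_nks]
    # Rerun the BASE load: swap natural key for the (new) surrogate key.
    base_delta = [{**r, "location_sk": location_map[r["location_nk"]]}
                  for r in impacted]
    # Rerun the AGGREGATE load over the delta only.
    agg_delta = {}
    for r in base_delta:
        agg_delta[r["location_sk"]] = agg_delta.get(r["location_sk"], 0) + r["sales"]
    return base_delta, agg_delta
```

Note the whole approach hinges on `location_nk` being present in HIST: without the natural key there is no way to find the impacted rows.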
The bottom line is that the natural key (the original identifier) must be preserved for transaction reprocessing; it must therefore be saved into the HIST table.