We initially used Kettle and later replaced it with Spring Batch and Spring Integration.
The Data Integration Process sends data streams to the Data Ingestion Process, which in turn kicks off the ETL process.
Every tenant should have its own ETL flow.
All records from a table can first be serialized into a binary flat file; then, during the ETL transformation, the binary file is deserialized back into record streams.
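A minimal sketch of that round trip, assuming a simple record layout of id plus payload (the `Record`-as-entry shape and field order are illustrative, not the real schema):

```java
import java.io.*;
import java.util.*;

// Sketch: serialize table records into a binary flat file, then
// deserialize them back into a record stream during the ETL step.
// Uses a byte array in place of an actual file for illustration.
public class BinaryFlatFile {

    public static byte[] serialize(List<Map.Entry<Long, String>> records) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(records.size());          // record-count header
            for (Map.Entry<Long, String> r : records) {
                out.writeLong(r.getKey());         // record id
                out.writeUTF(r.getValue());        // record payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static List<Map.Entry<Long, String>> deserialize(byte[] data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                records.add(new AbstractMap.SimpleEntry<>(in.readLong(), in.readUTF()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

A length-prefixed binary layout like this lets the transformation read records sequentially as a stream without loading the whole file.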
All transformation steps should share a common DB connection pool.
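The shared-pool pattern can be sketched as below. In a real deployment the steps would borrow JDBC connections from a pooling `DataSource` (e.g. HikariCP); here a generic fixed-size pool with stand-in resources illustrates the borrow/release contract:

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Sketch: one fixed-size pool shared by every transformation step,
// instead of each step opening its own connections. The pooled type
// is generic; tests can use plain Strings as stand-in connections.
public class SharedPool<C> {
    private final BlockingQueue<C> idle;

    public SharedPool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }

    // Blocks until a connection is free, bounding total concurrency.
    public C borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while borrowing", e);
        }
    }

    // Steps must return the connection when their unit of work is done.
    public void release(C conn) {
        idle.offer(conn);
    }
}
```

Capping the pool size keeps parallel steps from exhausting the database's connection limit.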
Always ensure that changes to regular dimensions are cascaded to the fact table. The cascade occurs when a regular dimension whose primary key is referenced by the fact table changes while the fact itself does not: the fact table's foreign key must be updated to the dimension's most current primary key so that it points to the latest version.
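A sketch of that cascade rule, with the dimension and fact tables modeled as in-memory maps (the surrogate-key/natural-key naming is an assumption about the schema, not taken from the source):

```java
import java.util.*;

// Sketch: when a regular dimension row is re-versioned (new surrogate
// primary key) but the fact is unchanged, rewrite the fact's foreign
// key to the dimension's newest key.
public class DimensionCascade {

    /**
     * @param currentDimKey      natural key -> current surrogate key of the dimension
     * @param factToDimKey       fact id -> surrogate dimension key it references (mutated in place)
     * @param dimKeyToNaturalKey surrogate key -> natural key, for all dimension versions
     * @return number of fact rows whose foreign key was cascaded
     */
    public static int cascade(Map<String, Long> currentDimKey,
                              Map<Long, Long> factToDimKey,
                              Map<Long, String> dimKeyToNaturalKey) {
        int updated = 0;
        for (Map.Entry<Long, Long> fact : factToDimKey.entrySet()) {
            String natural = dimKeyToNaturalKey.get(fact.getValue());
            Long latest = currentDimKey.get(natural);
            if (latest != null && !latest.equals(fact.getValue())) {
                fact.setValue(latest); // point the fact at the newest dimension version
                updated++;
            }
        }
        return updated;
    }
}
```

In SQL this would be an `UPDATE` on the fact table joining old to new surrogate keys through the dimension's natural key.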
The job flow for processing transformations goes through New / Update / Delete operation stages.
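The routing by operation type can be sketched as a simple dispatcher; the handler results here are placeholder strings standing in for the real dimension and fact logic:

```java
// Sketch: each incoming change record carries an operation flag and is
// routed to the matching stage of the transformation job flow.
public class ChangeRouter {
    public enum Op { NEW, UPDATE, DELETE }

    public static String route(Op op, String recordId) {
        switch (op) {
            case NEW:    return "insert:" + recordId; // stage 1: new records
            case UPDATE: return "update:" + recordId; // stage 2: updated records
            case DELETE: return "delete:" + recordId; // stage 3: deleted records
            default:     throw new IllegalArgumentException("unknown op: " + op);
        }
    }
}
```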
Stored procedures in BI are used for pre-populating date dimensions per tenant, calculating the local time zone, injecting future date dimensions, and building the hierarchy of security labels.
For example, populating the date dimensions should be done only once per tenant, so these one-time startup actions are tracked in a metadata table. A new run should proceed only if the previous job completed successfully. Upon startup the job should check whether a job is already running, perform a rollback if one is needed, and on completion mark the job successful and set the last batch ID.
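A sketch of those startup checks; `JobMeta` is an in-memory stand-in for the metadata-table row, and the rollback is reduced to a flag (the real one would undo the partial load):

```java
// Sketch: startup guard driven by the job metadata table. Refuse to
// start when a run is in flight, roll back a previously failed run,
// and record success plus the last batch id on completion.
public class JobGuard {
    public enum Status { RUNNING, FAILED, SUCCESS }

    public static class JobMeta {
        public Status status = Status.SUCCESS; // state of the previous run
        public long lastBatchId = 0;
        public boolean rolledBack = false;
    }

    /** Returns the batch id assigned to the new run, or -1 if a run is already in flight. */
    public static long start(JobMeta meta) {
        if (meta.status == Status.RUNNING) return -1;       // job already running
        if (meta.status == Status.FAILED) meta.rolledBack = true; // undo partial load first
        meta.status = Status.RUNNING;
        return meta.lastBatchId + 1;
    }

    public static void finish(JobMeta meta, long batchId) {
        meta.status = Status.SUCCESS; // mark job successful
        meta.lastBatchId = batchId;   // set the last batch id
    }
}
```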
All transformation steps should run in parallel on the same data streams: for example, while one step populates a regular dimension, another populates a junk dimension, and a third populates a fact table. Each step should satisfy a specific business requirement.
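The fan-out above can be sketched with a plain thread pool; in the actual pipeline Spring Batch split flows would serve this role, and the step names plus the record-counting "work" here are placeholders:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: run the transformation steps in parallel over the same shared
// batch of records. Each step gets its own pass over the immutable list.
public class ParallelSteps {

    public static Map<String, Integer> run(List<String> records) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        Map<String, Integer> processed = new ConcurrentHashMap<>();
        List<String> steps = List.of("regularDimension", "junkDimension", "factTable");
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String step : steps) {
                // placeholder "work": each step records how many records it saw
                futures.add(pool.submit(() -> processed.put(step, records.size())));
            }
            for (Future<?> f : futures) f.get(); // wait for every step to finish
            return processed;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the steps only read the shared stream and write to disjoint targets, they need no locking between them.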