Data Integration is a mission-critical Business Application problem.
Connecting to a Data source is pretty straightforward: Zapier, Boomi, and Connection-Cloud all offer out-of-the-box connectors.
While it is true that there are many cloud-based Data Integration Tools, I picked one: Boomi.
We will discuss the Best Practices and omit the internal details for brevity.
Welcome to the world of the Visual Programming paradigm!
(1) Create your own widget/portlet/webpage that can be embedded in an iframe.
– The widget should be able to capture preferences, data source connection info, and custom properties.
– Drive the Integration through this configurable widget.
(2) Design your main process to coordinate all the child processes. The Integration should provide mechanisms to add logic, create rules, query data, mash up services, etc.
(3) First, initiate a batch by recording the batch start time and other contextual data (tenant, application, etc.).
>> In a SaaS model it is always very important to initialize the Business Context.
(4) Make the first call to the external Data Ingestion Service, providing some hints in the payload structure (app_code, app_date_format, app_timezone); in return, the Data Ingestion Service sends back a response (app_qry_from_time, app_qry_to_time, batch_id, app_id, tenant_id).
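A minimal sketch of this handshake in Python. The function name, endpoint behavior, and sample values are illustrative assumptions; only the field names come from the payload described above:

```python
from datetime import datetime, timezone

def build_batch_request(app_code, app_date_format, app_timezone):
    """Build the hint payload sent to the Data Ingestion Service.

    The field names follow the payload described above; the function name
    and the batch_start_time field are assumptions for this sketch.
    """
    return {
        "app_code": app_code,
        "app_date_format": app_date_format,
        "app_timezone": app_timezone,
        "batch_start_time": datetime.now(timezone.utc).isoformat(),
    }

# The service is expected to answer with the query window and identifiers,
# e.g. (illustrative values only):
sample_response = {
    "app_qry_from_time": "2016-01-01T00:00:00Z",
    "app_qry_to_time": "2016-01-01T01:00:00Z",
    "batch_id": "b-001",
    "app_id": "crm",
    "tenant_id": "t-42",
}
```

The returned query window is what drives the time-bounded fetch in the next step.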
(5) Within the child process, map the source documents to the target schema with the help of a Business Analyst.
So now we know the correct time range within which the Integration Tool needs to fetch data from the Biz App.
>> CONNECT (Biz App Source) -> FLOW RULE (Record New/Updated/Deleted) -> MAP (Record to target XML/JSON model) -> HTTP POST (mapped records to Data Ingestion Service).
Well, you can definitely add Try/Catch, retry count, and Error Reporting steps.
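Outside the visual canvas, the same Try/Catch-plus-retry-count pattern can be sketched in Python (the helper name and the linear backoff policy are assumptions, not anything Boomi prescribes):

```python
import time

def with_retries(step, max_retries=3, backoff_seconds=1.0):
    """Run an integration step inside a Try/Catch with a retry count.

    `step` is any zero-argument callable; the error is re-raised once the
    retry budget is exhausted so an error-reporting step can take over.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return step()
        except Exception:
            if attempt == max_retries:
                raise  # hand off to the error-reporting step
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
```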
>> You can make the target Model a bit more intelligent by embedding the contextual info directly inside the data during the mapping step – i.e. dynamically add batch_id, app_id, tenant_id, parent/child info, implicit table names (same as the parent node name), and specific flags.
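The enrichment above amounts to stamping each mapped record with the batch context. A sketch in Python (the function name and the `_table` key are illustrative; the other keys mirror the fields listed above):

```python
def enrich_record(record, batch_id, app_id, tenant_id, table_name):
    """Embed contextual info directly inside a mapped record.

    `table_name` carries the implicit table name (same as the
    parent node name in the target model).
    """
    enriched = dict(record)  # copy so the source record stays untouched
    enriched.update({
        "batch_id": batch_id,
        "app_id": app_id,
        "tenant_id": tenant_id,
        "_table": table_name,
    })
    return enriched
```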
(6) Stream the data to a target data warehouse.
>> Finally, terminate the batch and send the batch finalization data (end time, application name, tenant name, etc.) to the Batch Synchronizer.
Advanced Use Cases:
>> It is very important to convert business data timestamps to the correct timezone.
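In Python, for instance, the conversion can be done with the standard `zoneinfo` module (the function name is illustrative):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

def to_app_timezone(ts_utc_iso, app_timezone):
    """Convert a UTC ISO-8601 business timestamp to the app's timezone."""
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it
    ts = datetime.fromisoformat(ts_utc_iso.replace("Z", "+00:00"))
    return ts.astimezone(ZoneInfo(app_timezone))
```

For example, `to_app_timezone("2020-06-01T12:00:00Z", "Asia/Kolkata")` yields a 17:30 local time.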
>> Combine the documents received from the Data source, then split them into a smaller set of mega documents.
For example, if we receive 10000 records from the source, the Integration step should be able to combine them into one mega doc of a single record,
>> and then split it into, say, 10 docs of 1000 records each,
>> so that we send only 10 documents over the wire to the target Data Ingestion Service (FLOW Control step).
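The combine-then-split step above can be sketched as plain chunking in Python (the function name is an assumption; in Boomi this is what the Combine and FLOW Control shapes do visually):

```python
def combine_and_split(records, chunk_size):
    """Combine source records and re-split them into documents of
    `chunk_size` records each, so only a handful of documents go
    over the wire.
    """
    combined = list(records)  # the "one mega doc" holding all records
    return [combined[i:i + chunk_size]
            for i in range(0, len(combined), chunk_size)]
```

With 10000 source records and `chunk_size=1000`, this yields exactly 10 documents.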
>> Many Business App APIs have rate limiting.
So inside your Integration Process you should be able to check the response code, or some other API attribute (e.g. fetch count), to figure out whether there are more records to loop through; accordingly, reset the previous query_start_time to the new query_start_time and fetch the next set of records. [Pagination]
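The pagination loop above can be sketched as follows. Here `fetch_page` is a hypothetical stand-in for the rate-limited source API call; the fetch-count check and the sliding query_start_time are the parts taken from the text:

```python
def fetch_all(fetch_page, query_start, query_end, page_limit):
    """Paginate through a rate-limited source API.

    `fetch_page(start, end)` is a hypothetical source call returning
    (records, last_record_time); we keep sliding the query start time
    forward until a page comes back below the fetch-count limit.
    """
    all_records = []
    start = query_start
    while True:
        records, last_time = fetch_page(start, query_end)
        all_records.extend(records)
        if len(records) < page_limit:
            break  # fetch count below the limit: no more pages
        start = last_time  # reset query_start_time to the new value
    return all_records
```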