Salient Features:

Application data needs to be fetched periodically and consistently in order to preserve history.

Batches can be replayed to restore data from staging.

Batch = { time-series data for a specific start and end time for a set of objects of a specific app and specific tenant }.

Batch_Log acts like a time machine.

Batch(current)_qry_start_time = Batch(last_successful)_qry_end_time
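A minimal sketch of this continuity rule is shown below; the batch_log column names and the 'DONE' status value are assumptions for illustration, not the actual schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

public class BatchWindowResolver {

    /**
     * Returns the query start time for the current batch: the qry_end_time of the
     * last successful batch for this app/tenant, or a caller-supplied default
     * when no prior batch exists (first-ever run / backfill).
     */
    public Instant resolveQueryStart(Connection conn, String appId, String tenantId,
                                     Instant defaultStart) throws SQLException {
        String sql = "SELECT qry_end_time FROM batch_log "
                   + "WHERE app_id = ? AND tenant_id = ? AND status = 'DONE' "
                   + "ORDER BY qry_end_time DESC LIMIT 1";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, appId);
            ps.setString(2, tenantId);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // Batch(current).qry_start = Batch(last_successful).qry_end
                    return rs.getTimestamp(1).toInstant();
                }
                return defaultStart; // no successful batch yet
            }
        }
    }
}
```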

DIP (Data Integration Platform) should conform to the CDTP (Contextual Data Transfer Protocol) by injecting context (app-id, batch-id, time, schema info, etc.) into raw data, so that DCP receives context-aware data and does not need to process the data to figure out which tenant, batch, or table it belongs to!
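One possible shape of that context injection is sketched below; the field names are illustrative and not the actual CDTP field set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ContextInjector {

    /** Wraps a raw record with CDTP-style context so DCP never has to infer it. */
    public static Map<String, Object> inject(Map<String, Object> rawRecord,
                                             String appId, String tenantId,
                                             String batchId, String objectType) {
        Map<String, Object> contextual = new LinkedHashMap<>();
        contextual.put("app_id", appId);           // which application the data belongs to
        contextual.put("tenant_id", tenantId);     // which tenant owns it
        contextual.put("batch_id", batchId);       // which batch produced it
        contextual.put("object_type", objectType); // which table/entity it maps to
        contextual.put("payload", rawRecord);      // the untouched raw record
        return contextual;
    }
}
```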

DCP (Data Collection Platform) should be completely stateless and ready to serve multiple requests.

Self-defined, independent, immutable contextual records enable data isolation, data sharding, data scalability, rollback, contextual backup, ordering, auto-restartability, error recovery, contextual views (app-specific / tenant-specific / batch-specific / time-range-specific), contextual analysis (data consumption patterns for tenants / apps / objects), and automatic schema generation.
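For illustration, such a record could be modelled as an immutable Java value type whose (app, tenant, batch) context doubles as a natural shard and partition key; the exact fields here are assumptions.

```java
import java.time.Instant;
import java.util.Map;

/** An immutable, self-describing record: every instance carries its own context. */
public record ContextualRecord(
        String appId,
        String tenantId,
        String batchId,
        Instant queryStart,
        Instant queryEnd,
        Map<String, Object> payload) {

    public ContextualRecord {
        payload = Map.copyOf(payload); // defensive copy keeps the payload immutable too
    }
}
```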

Consistent historical time-series data allows many ad-hoc and pre-computed analyses (data consumption patterns, hidden structures and relations).

 

  1. DIP sends & DCP receives: relevant Batch initialization info ('User_Id', 'Pwd', 'App_Id', 'Timezone')

     /dcp/api/entities/batch/initialize

  2. DCP creates: Batch Info in the batch_log table
  3. DCP sends: CONTEXTUAL INFO

('App_Id', 'Batch_Id', 'Tenant_Id') and a SLIDING WINDOW of 'Time_Range' to fetch data.
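A sketch of this exchange from the DIP side, using Java's built-in HTTP client; the JSON field names simply mirror the tuples above and are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BatchInitClient {

    /** Sends the batch initialization info and returns DCP's raw response body. */
    public static String initializeBatch(String baseUrl, String userId, String pwd,
                                         String appId, String timezone) throws Exception {
        String body = String.format(
            "{\"User_Id\":\"%s\",\"Pwd\":\"%s\",\"App_Id\":\"%s\",\"Timezone\":\"%s\"}",
            userId, pwd, appId, timezone);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/dcp/api/entities/batch/initialize"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The response is expected to carry App_Id, Batch_Id, Tenant_Id and the
        // Time_Range sliding window that DIP will use to query the source system.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```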

  4. DIP injects: the CONTEXTUAL INFO into raw records and queries DSP using the SLIDING WINDOW of TIME RANGE.

>> Raw Data becomes App-aware, Tenant-aware, Batch-aware
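One way DIP might walk the source system with that sliding window is sketched below; the fetch callback stands in for the real DSP query and is purely hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class SlidingWindowFetcher {

    /**
     * Walks the [batchStart, batchEnd) range in fixed-size windows and hands each
     * window's raw records to a consumer (e.g. the context injector above).
     */
    public static void fetch(Instant batchStart, Instant batchEnd, Duration window,
                             BiFunction<Instant, Instant, List<Map<String, Object>>> fetchSlice,
                             Consumer<List<Map<String, Object>>> consumer) {
        Instant from = batchStart;
        while (from.isBefore(batchEnd)) {
            Instant to = from.plus(window).isBefore(batchEnd) ? from.plus(window) : batchEnd;
            consumer.accept(fetchSlice.apply(from, to)); // one sliding-window slice of time-series data
            from = to;                                   // windows are contiguous, so nothing is skipped
        }
    }
}
```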

  5. DIP injects: SCHEMA HINTS (e.g. whether a record is a Parent or a Child) into the raw records.

   >> Raw Data becomes persistence-ready, Schema-aware
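A minimal illustration of such schema hints, assuming hypothetical is_parent / parent_key fields on the contextual record.

```java
import java.util.Map;

public class SchemaHints {

    /** Marks a contextual record as a parent or child row so DCP can persist it directly. */
    public static void addHints(Map<String, Object> contextualRecord,
                                boolean isParent, String parentKeyField) {
        contextualRecord.put("is_parent", isParent);            // drives table resolution on the DCP side
        if (!isParent) {
            contextualRecord.put("parent_key", parentKeyField); // foreign-key hint for child rows
        }
    }
}
```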

  6. DIP sends Create/Update request: for CONTEXT-aware Intelligent DATA

Documents are split into subsets, and each subset of records is sent in parallel (using multiple threads) to make full use of the HTTP payload size and reduce the number of HTTP calls.

Since the data is already contextual, we do not need to worry about the order of records, so we send micro-batches.

/dcp/api/entities/add, /dcp/api/entities/update, /dcp/api/entities/delete
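A sketch of the micro-batching step, assuming the contextual records have already been serialized to JSON; the chunk size and the use of the /add endpoint are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class MicroBatchSender {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Splits the records into chunks and sends each chunk as an independent HTTP call. */
    public static void sendAll(String baseUrl, List<String> jsonRecords, int chunkSize) {
        List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
        for (int i = 0; i < jsonRecords.size(); i += chunkSize) {
            List<String> chunk = jsonRecords.subList(i, Math.min(i + chunkSize, jsonRecords.size()));
            String body = "[" + String.join(",", chunk) + "]";   // one micro-batch per request
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/dcp/api/entities/add"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            // Records are already contextual, so chunks may complete in any order.
            inFlight.add(CLIENT.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
        }
        inFlight.forEach(CompletableFuture::join);               // wait for all micro-batches
    }
}
```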

  7. DCP persists: each individual record automatically in the correct schema, through automatic schema resolution and dynamic data source discovery.

– create a request entry in stgrequestlog (with status INIT)

– add all the records using a MySQL bulk write

– configure Tomcat to use a thread pool and the NIO connector

– configure MySQL to increase cache size, buffer size, and I/O capacity

– configure Nginx and tune data-delivery parameters for higher throughput (see http://www.cyberciti.biz/tips/linux-unix-bsd-nginx-webserver-security.html)

– leverage Callable / DeferredResult / async servlets

– update the request entry with the persisted record keys and set the status (ERR / DONE)
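Putting the request-log and bulk-write steps together, a simplified JDBC sketch might look like the following; the stgrequestlog columns and the staging_records table are assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;

public class StagingWriter {

    /** Logs the request, bulk-inserts the micro-batch, then flips the request status. */
    public static void persist(Connection conn, String batchId, List<String> jsonRecords) throws Exception {
        conn.setAutoCommit(false);

        long requestId;
        try (PreparedStatement log = conn.prepareStatement(
                "INSERT INTO stgrequestlog (batch_id, status) VALUES (?, 'INIT')",
                Statement.RETURN_GENERATED_KEYS)) {
            log.setString(1, batchId);
            log.executeUpdate();
            try (ResultSet keys = log.getGeneratedKeys()) {
                keys.next();
                requestId = keys.getLong(1);
            }
        }

        String status = "DONE";
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO staging_records (request_id, record_json) VALUES (?, ?)")) {
            for (String record : jsonRecords) {
                insert.setLong(1, requestId);
                insert.setString(2, record);
                insert.addBatch();          // MySQL bulk write via JDBC batching
            }
            insert.executeBatch();
        } catch (Exception e) {
            status = "ERR";                 // keep the request row, record the failure
        }

        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE stgrequestlog SET status = ? WHERE id = ?")) {
            update.setString(1, status);
            update.setLong(2, requestId);
            update.executeUpdate();
        }
        conn.commit();
    }
}
```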

  8. DCP sends: response message (errorCode, errorMessage) to Boomi.
  9. DIP sends: 'Batch termination info'

('App_Id', 'Batch_Id', 'Tenant_Id', 'Start_Time', 'End_Time', etc.)

/dcp/api/entities/batch/finalize

  10. DCP updates: the Batch Info
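Finally, a sketch of how DCP might close out the batch in batch_log so that the next batch's query window starts where this one ended; the column names are assumptions, consistent with the earlier sketch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

public class BatchFinalizer {

    /** Marks the batch as finished in batch_log. */
    public static void finalizeBatch(Connection conn, String appId, String batchId,
                                     String tenantId, Instant queryEnd) throws Exception {
        String sql = "UPDATE batch_log SET status = 'DONE', qry_end_time = ? "
                   + "WHERE app_id = ? AND batch_id = ? AND tenant_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, Timestamp.from(queryEnd)); // the next batch starts where this one ends
            ps.setString(2, appId);
            ps.setString(3, batchId);
            ps.setString(4, tenantId);
            ps.executeUpdate();
        }
    }
}
```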