Saturday, April 8, 2017

Load data to Coherence in bulk mode

A typical initial load of data into a Coherence cluster is based on a series of individual put operations. This technique requires a massive number of network trips. This note shows how the load can be performed faster by enabling batching. 50 words, 2 diagrams.

A typical data load looks like this:





The system is busy, but a lot of network noise is generated. This model may be optimised by using multiple threads, but the same network noise will be generated anyway.
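To make the network cost concrete, here is a minimal sketch that counts round trips for a per-entry load and, for comparison, for the same data sent in fixed-size batches. `CountingCache` is a hypothetical in-memory stand-in for a remote cache, not a Coherence class; it only assumes that every call to the cache costs one network trip. The names `loadOneByOne`, `loadInBatches` and `sampleData` are likewise made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class LoadComparison {

    /** Hypothetical stand-in for a remote cache: every call counts as one network round trip. */
    static class CountingCache {
        private final Map<String, String> store = new HashMap<>();
        private int roundTrips = 0;

        void put(String key, String value) {
            roundTrips++;
            store.put(key, value);
        }

        void putAll(Map<String, String> batch) {
            roundTrips++;
            store.putAll(batch);
        }

        int roundTrips() { return roundTrips; }
        int size() { return store.size(); }
    }

    /** One put per entry: as many round trips as there are entries. */
    static int loadOneByOne(CountingCache cache, Map<String, String> data) {
        for (Map.Entry<String, String> e : data.entrySet()) {
            cache.put(e.getKey(), e.getValue());
        }
        return cache.roundTrips();
    }

    /** Entries are collected into batches of batchSize; each batch costs one round trip. */
    static int loadInBatches(CountingCache cache, Map<String, String> data, int batchSize) {
        Map<String, String> batch = new HashMap<>();
        for (Map.Entry<String, String> e : data.entrySet()) {
            batch.put(e.getKey(), e.getValue());
            if (batch.size() == batchSize) {
                cache.putAll(batch);
                batch = new HashMap<>();
            }
        }
        if (!batch.isEmpty()) {
            cache.putAll(batch); // send the last, partially filled batch
        }
        return cache.roundTrips();
    }

    static Map<String, String> sampleData(int n) {
        Map<String, String> data = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            data.put("key-" + i, "value-" + i);
        }
        return data;
    }

    public static void main(String[] args) {
        Map<String, String> data = sampleData(1000);
        System.out.println("one by one: " + loadOneByOne(new CountingCache(), data) + " round trips");
        System.out.println("batched:    " + loadInBatches(new CountingCache(), data, 100) + " round trips");
    }
}
```

Adding threads to the per-entry variant would shorten the wall-clock time, but the round-trip count would stay the same.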


Optimised with batches, the data load looks like this:




The entry processor that inserts the data has to handle a potential cluster rebalancing that takes place while it is being processed. In that situation the entry processor will be executed on a new member, but some of the objects it carries will no longer belong to the same partition. To handle this problem, the entry processor has to check whether each key belongs to the local backing map (how?) and return to the client a list of results as key:result[true|false] pairs. The client will then be able to resend the failed inserts.
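The check-and-resend protocol can be sketched without Coherence at all. The code below is a simulation under stated assumptions: `Member`, `partitionOf`, `processBatch` and `send` are hypothetical names, keys are plain integers mapped to 8 partitions by modulo, and rebalancing is simulated by moving one partition between two members. It does not answer the "how?" for a real cluster; Coherence's `BackingMapManagerContext.isKeyOwned(Object)` looks like a candidate for the real ownership test, but that is an assumption not verified here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class OwnershipCheckedLoad {
    static final int PARTITIONS = 8; // hypothetical partition count

    // Stand-in for the cluster's key-to-partition mapping.
    static int partitionOf(int key) {
        return Math.floorMod(key, PARTITIONS);
    }

    /** Hypothetical storage member: owns some partitions and holds a local backing map. */
    static class Member {
        final Set<Integer> ownedPartitions = new HashSet<>();
        final Map<Integer, String> backingMap = new HashMap<>();

        Member(int... partitions) {
            for (int p : partitions) {
                ownedPartitions.add(p);
            }
        }

        boolean owns(int key) {
            return ownedPartitions.contains(partitionOf(key));
        }

        /** Plays the role of the batch entry processor: stores owned keys, reports key -> true|false. */
        Map<Integer, Boolean> processBatch(Map<Integer, String> batch) {
            Map<Integer, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<Integer, String> e : batch.entrySet()) {
                boolean owned = owns(e.getKey());
                if (owned) {
                    backingMap.put(e.getKey(), e.getValue());
                }
                results.put(e.getKey(), owned);
            }
            return results;
        }
    }

    /** Client side: sends a batch to one member and returns the entries reported as failed. */
    static Map<Integer, String> send(Member target, Map<Integer, String> batch) {
        Map<Integer, String> failed = new LinkedHashMap<>();
        for (Map.Entry<Integer, Boolean> r : target.processBatch(batch).entrySet()) {
            if (!r.getValue()) {
                failed.put(r.getKey(), batch.get(r.getKey()));
            }
        }
        return failed;
    }

    /** Returns {entries stored on A, entries stored on B, entries that failed on the first attempt}. */
    static int[] runDemo() {
        Member a = new Member(0, 1, 2, 3);
        Member b = new Member(4, 5, 6, 7);

        // The client builds a batch for A from its current view of partition ownership.
        Map<Integer, String> batchForA = new LinkedHashMap<>();
        for (int key = 0; key < 40; key++) {
            if (a.owns(key)) {
                batchForA.put(key, "value-" + key);
            }
        }

        // Simulated rebalancing: partition 3 moves from A to B before the batch is processed.
        a.ownedPartitions.remove(Integer.valueOf(3));
        b.ownedPartitions.add(3);

        Map<Integer, String> failed = send(a, batchForA);

        // The client resends the failed inserts to the new owner, which is B in this simulation.
        Map<Integer, String> stillFailed = send(b, failed);
        if (!stillFailed.isEmpty()) {
            throw new IllegalStateException("unexpected failures after resend: " + stillFailed.keySet());
        }
        return new int[] {a.backingMap.size(), b.backingMap.size(), failed.size()};
    }

    public static void main(String[] args) {
        int[] r = runDemo();
        System.out.println("stored on A=" + r[0] + ", stored on B=" + r[1] + ", resent=" + r[2]);
    }
}
```

Returning a result for every key, rather than only the failures, keeps the client logic simple: anything not marked true gets resent.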

The batched data load may be further optimised by sending each batch (the communication with each member) from a separate thread.
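A rough sketch of the one-thread-per-batch idea, using only `java.util.concurrent`. Everything cluster-related is assumed for illustration: `MEMBERS`, `memberFor`, `groupByMember` and `loadInParallel` are hypothetical names, the key-to-member mapping is a simple modulo, and each "member" is just a concurrent map, so the `putAll` call stands in for the network call that would carry the batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelBatchLoad {
    static final int MEMBERS = 4; // hypothetical number of storage members

    // Stand-in for looking up the member that owns a key.
    static int memberFor(int key) {
        return Math.floorMod(key, MEMBERS);
    }

    /** Splits the data into one batch per owning member. */
    static Map<Integer, Map<Integer, String>> groupByMember(Map<Integer, String> data) {
        Map<Integer, Map<Integer, String>> batches = new HashMap<>();
        for (Map.Entry<Integer, String> e : data.entrySet()) {
            batches.computeIfAbsent(memberFor(e.getKey()), m -> new HashMap<>())
                   .put(e.getKey(), e.getValue());
        }
        return batches;
    }

    /** Sends every batch from its own thread and returns the number of entries stored. */
    static int loadInParallel(Map<Integer, String> data,
                              ConcurrentHashMap<Integer, Map<Integer, String>> memberStores) {
        Map<Integer, Map<Integer, String>> batches = groupByMember(data);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (Map.Entry<Integer, Map<Integer, String>> batch : batches.entrySet()) {
                pending.add(pool.submit(() -> {
                    // In a real load this would be the network call carrying the batch.
                    memberStores.computeIfAbsent(batch.getKey(), m -> new ConcurrentHashMap<>())
                                .putAll(batch.getValue());
                    return batch.getValue().size();
                }));
            }
            int stored = 0;
            for (Future<Integer> f : pending) {
                stored += f.get(); // wait for each batch to finish
            }
            return stored;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            data.put(i, "value-" + i);
        }
        ConcurrentHashMap<Integer, Map<Integer, String>> stores = new ConcurrentHashMap<>();
        int stored = loadInParallel(data, stores);
        System.out.println("stored " + stored + " entries on " + stores.size() + " members");
    }
}
```

Because batches for different members touch disjoint keys, the threads in this sketch do not contend with each other; the only shared step is waiting for all futures at the end.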





