MongoDB works as a Basically Available, Strongly Consistent, Partition-Tolerant DB !

It is very important to understand that MongoDB will not be 100% available when a DC is down and quorum is not satisfied; this is the price of gracefully managing network partitions and guaranteeing strong consistency of data.

Fault Tolerance Table :

Number of Members | Majority Required to Elect a New Primary | Fault Tolerance
----------------- | ---------------------------------------- | ---------------
        3         |                    2                     |        1
        5         |                    3                     |        2
        6         |                    4                     |        2
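The table rows follow a simple rule: the majority is a strict majority of voting members, and fault tolerance is whatever is left over. A minimal Python sketch (the function names are ours, not a MongoDB API):

```python
# Reproduce the fault-tolerance table: a new primary needs a strict
# majority of voting members; fault tolerance is the remainder.

def majority(members: int) -> int:
    # Minimum number of votes required to elect a new primary.
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    # How many members can fail while the set can still elect a primary.
    return members - majority(members)

for n in (3, 5, 6):
    print(n, majority(n), fault_tolerance(n))
```

Note that 5 and 6 members have the same fault tolerance (2), which is why even-sized replica sets are usually not worth the extra node.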

StackOverflow Discussion :  http://stackoverflow.com/questions/28389365/what-is-recommended-configurations-of-mongodb-replicaset-2-dcs-for-automatic-p

Let's assume DC1 has 2 nodes and DC2 has 3 nodes :

If DC1 is down, DC2 will work fine (3 out of 5 members are still pingable),

but if DC2 is down (only 2 out of 5 members are pingable), DC1 will not elect a primary automatically because the quorum rule would break, and we need to manually force a primary and activate an arbiter.
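The two-DC scenario above can be checked against the same quorum rule; a small sketch (the helper name is illustrative, not a MongoDB API):

```python
# 5 voting members split across two DCs: DC1 has 2 nodes, DC2 has 3 nodes.

def can_elect_primary(pingable: int, total: int) -> bool:
    # A new primary can be elected only if a strict majority is reachable.
    return pingable >= total // 2 + 1

TOTAL = 5
print(can_elect_primary(3, TOTAL))  # DC1 down: DC2's 3 nodes remain -> True
print(can_elect_primary(2, TOTAL))  # DC2 down: only DC1's 2 nodes remain -> False
```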

Let's dive deeper into the CAP theorem to analyze the above use case !

1. Consistency : Each client always has the same view of the data. You can read from or write to any node and get the same data across the cluster.

How does MongoDB ensure Consistency ?

(i) MongoDB always writes to the primary first.

(ii) MongoDB enables journaling to survive power failures.

(iii) We can configure the mongo client so that a write operation completes successfully only after it has succeeded on multiple members (set the write concern on the replica set, e.g. w=2).

Ref:  http://docs.mongodb.org/manual/reference/connection-string/
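For example, the write concern can go directly into the connection string; a sketch assuming a hypothetical three-member replica set named rs0 on hosts host1..host3 :

```
mongodb://host1:27017,host2:27017,host3:27017/test?replicaSet=rs0&w=2&journal=true
```

With w=2, a write returns success only after it has been applied on two members.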

2. Availability : Ability to access the cluster even if a node in the cluster goes down. All mongo clients can always read and write.

How does MongoDB ensure Availability ?

Replica sets provide high availability. If the unavailable mongod is the primary, the replica set will elect a new primary according to the fault-tolerance table (mentioned earlier).

Set read_preference to primaryPreferred (try reading from the primary; if it is not available, read from a secondary).
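The read preference can likewise be set in the connection string; a sketch assuming a hypothetical replica set named rs0 on hosts host1..host3 :

```
mongodb://host1:27017,host2:27017,host3:27017/test?replicaSet=rs0&readPreference=primaryPreferred
```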

3. Partition Tolerance : the cluster continues to function even if there is a “partition” (communications break) between two nodes (both nodes are up, but can’t communicate).

How does MongoDB survive a Network Partition ?

(i) It automatically elects a new primary when the primary is down and quorum is satisfied.

(ii) It automatically resyncs a node when it rejoins the replica set, i.e. becomes pingable again !

(iii) Shards automatically sync up.

(iv) If the primary is down, or a majority of members are down and quorum is not satisfied, then no new primary is elected and the current primary (if any) steps down to secondary.
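Rule (iv) can be sketched as a single heartbeat check: a primary that cannot reach a majority of voting members demotes itself. This is illustrative logic only, not MongoDB's actual election code:

```python
# Decide a node's role after a round of heartbeats.
# 'reachable' counts the node itself plus the members it can ping.

def role_after_heartbeat(is_primary: bool, reachable: int, total: int) -> str:
    quorum = total // 2 + 1
    if reachable >= quorum:
        return "PRIMARY" if is_primary else "SECONDARY"
    # Cut off from the majority: step down to protect consistency.
    return "SECONDARY"

# A primary that can reach only itself out of 3 members steps down:
print(role_after_heartbeat(is_primary=True, reachable=1, total=3))  # SECONDARY
# A primary that can still reach a majority stays primary:
print(role_after_heartbeat(is_primary=True, reachable=2, total=3))  # PRIMARY
```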

Now let's consider this scenario : suddenly network communication breaks between Node X and Node Y, so they can’t sync updates.

At this point a DB can either :

A) allow the nodes to get out of sync (giving up consistency : the nodes no longer keep close affinity with their peers), or

B) consider the cluster to be “down” (giving up availability : nodes that cannot ping their peers stop accepting writes)

MongoDB prefers option B) and makes the primary step down to secondary (still allowing clients to query data).

Why ?   It's the age-old question : “if something is not pingable, is it down?”

Ref : http://en.wikipedia.org/wiki/Quorum_%28distributed_computing%29 >> most CP systems implement a Quorum as the basis of their ‘Leader Election Algorithm’

>> A quorum is the minimum number of votes that a distributed transaction has to obtain to enforce a consistent operation in a distributed system.

Out of 3 nodes, only 1 is pingable => so the primary can’t ping a majority (i.e. at least 2 nodes).

> So the PRIMARY doesn’t know :

    — whether the other nodes are all still up

    — whether it is the only primary or just a cut-off node

    — whether the other nodes have elected a new PRIMARY

    — whether there is a Network Partition

    — how to fulfill the promise of Strong Consistency

> So the PRIMARY eventually steps down, precisely to preserve the integrity of the replica set as a whole.

Deployment Architecture Reference :

http://docs.mongodb.org/manual/core/replica-set-architectures/ ,

http://docs.mongodb.org/manual/tutorial/deploy-geographically-distributed-replica-set/ ,

http://docs.mongodb.org/manual/tutorial/add-replica-set-arbiter/

What should we do to force the secondary to become PRIMARY again ?

http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/#replica-set-force-reconfiguration
