DocumentDB revisited Part 3 – Concurrency in DocumentDB

In this series on DocumentDB I discuss more advanced features of DocumentDB. This post covers the concurrency options in DocumentDB, starting with an overview of the problems with concurrency in general.

Concurrency options in DocumentDB

What is concurrency?

Consider a hotel room booking application: a web site that lets people book rooms. The key scenario is that several users may be looking at the same room, available at the time the page loaded, for the same night. You want to prevent more than one person from booking that room. So when two users press the “Book room” button, only one of them may succeed (typically the first one wins and gets the reservation confirmed). Any system that allows more than one source/actor/system/user to edit the same data needs a concurrency strategy.

General Concurrency Methods

Three of the most common strategies for handling this are:

  • Optimistic Concurrency: This strategy acknowledges that other actors may be updating the same data you are working with, but assumes this is not the normal case and optimistically expects the update to succeed. You update the data in your process, and when you try to commit your changes (for example, a save to the database) you either get a green light or you don't. To decide this you use an (implicit or explicit) version number, timestamp or similar construct, and the update is only allowed if the version/timestamp is the same as when you read the data. If not, you get a concurrency exception and your save fails. It is then up to you to load the latest data and merge your changes into it, or to cancel your changes and start over. This approach suits read-intensive applications best, where concurrency exceptions are rare and it is acceptable to occasionally tell the actor that the save failed.
  • Pessimistic Concurrency: As the name suggests, this is the opposite of optimistic concurrency. It aims to minimize the number of times you press the save button only to find that someone else has updated your data. This is done by placing some kind of lock on the data so that only you can update it. Once the lock is in place, you can be sure that nobody else “steals” your record/data/document. The more likely conflicts are, the more this strategy pays off. It can be combined with a lock timeout so that nothing stays locked forever. Consider a cinema website where you first reserve seats and then pay. The booking is not complete until you pay, but you don't want people to go through a long checkout process only to discover that the seats they selected were booked by someone else.
  • Last-Write Wins: Use this strategy only in scenarios where it is acceptable that the last write simply wins. Maybe you receive fresh readings from a sensor and are only interested in the latest one. In that case you can just overwrite the old data no matter what. It is not suitable for the hotel room or cinema seat reservation scenarios described above. Its benefits are higher throughput, no concurrency errors and lower resource usage than the other strategies. This is the default update behavior in DocumentDB. That may sound strange, but it is the default in SQL databases as well (though at a more granular level: of two consecutive update statements, the second overwrites the first).
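To make the difference between the first and last strategies concrete, here is a minimal in-memory sketch in C#. It is not DocumentDB code: VersionedRecord, WriteLastWins and TryWriteOptimistic are hypothetical names, and an integer version number stands in for whatever version/timestamp a real store would use.

```csharp
using System;

// Hypothetical in-memory record, used only to contrast the strategies.
public sealed class VersionedRecord
{
    private readonly object _gate = new object();

    public string Value { get; private set; }
    public int Version { get; private set; }

    public VersionedRecord(string initialValue)
    {
        Value = initialValue;
        Version = 1;
    }

    // Last-write-wins: overwrite unconditionally; earlier writes are silently lost.
    public void WriteLastWins(string newValue)
    {
        lock (_gate)
        {
            Value = newValue;
            Version++;
        }
    }

    // Optimistic: write only if nobody has written since versionReadEarlier was read.
    // Returns false on a conflict; the caller then reloads and merges, or gives up.
    public bool TryWriteOptimistic(string newValue, int versionReadEarlier)
    {
        lock (_gate)
        {
            if (Version != versionReadEarlier)
            {
                return false;
            }
            Value = newValue;
            Version++;
            return true;
        }
    }
}
```

If two users both read version 1 of the hotel room, the first TryWriteOptimistic call succeeds and the second returns false, so the second user has to reload. The lock only makes the compare-and-write atomic within one process; it says nothing about how a distributed store does it.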

DocumentDB Optimistic Concurrency

The document update examples in the previous part of this series used a last-write-wins philosophy, which means they cannot be used for the two more complex scenarios above.

DocumentDB is designed to read and write with extremely high throughput, so be aware that adding optimistic or pessimistic concurrency will affect throughput. DocumentDB has built-in support for optimistic concurrency, but we have to adjust the DocumentDB repository we created last time.

Understand the ETag

Every DocumentDB document has an ETag property that is updated each time the document is updated, and when you replace your document you can require that its ETag is still identical to the one you read.
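The ETag check itself can be sketched without DocumentDB. The following in-memory store is a hypothetical stand-in (EtagStore and PreconditionFailedException are my names, not SDK types): every write stamps a new opaque ETag, and a replace that carries an If-Match ETag is rejected with status 412 (PreconditionFailed) when the stored ETag has moved on.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical exception playing the role of a 412 (PreconditionFailed) response.
public sealed class PreconditionFailedException : Exception
{
    public PreconditionFailedException(string message) : base(message) { }
    public HttpStatusCode StatusCode => HttpStatusCode.PreconditionFailed;
}

// Hypothetical in-memory store: every write stamps a new, opaque ETag.
public sealed class EtagStore
{
    private readonly Dictionary<string, (string Json, string ETag)> _docs =
        new Dictionary<string, (string Json, string ETag)>();
    private readonly object _gate = new object();

    private static string NewEtag() => Guid.NewGuid().ToString("N");

    public string Create(string id, string json)
    {
        lock (_gate)
        {
            var etag = NewEtag();
            _docs.Add(id, (json, etag));
            return etag;
        }
    }

    public (string Json, string ETag) Read(string id)
    {
        lock (_gate) { return _docs[id]; }
    }

    // ifMatch == null: unconditional replace (last write wins).
    // ifMatch != null: replace only if the stored ETag still equals ifMatch.
    public string Replace(string id, string json, string? ifMatch = null)
    {
        lock (_gate)
        {
            var current = _docs[id]; // throws KeyNotFoundException for unknown ids
            if (ifMatch != null && ifMatch != current.ETag)
            {
                throw new PreconditionFailedException(
                    $"ETag {ifMatch} is stale; current ETag is {current.ETag}");
            }
            var etag = NewEtag();
            _docs[id] = (json, etag);
            return etag;
        }
    }
}
```

Calling Replace without an ifMatch value gives you last-write-wins, which mirrors the default behavior described above.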

The problem with the ETag is that it belongs to the DocumentDB Document (the system metadata) rather than to the JSON object you stored. So the methods we have for fetching a typed Order and writing typed LINQ statements against it cannot be used here. In my opinion you also shouldn't manually add the ETag to the Order object as a JsonProperty. In theory you could add a property like this to your objects:

[JsonProperty(PropertyName = "_etag", NullValueHandling = NullValueHandling.Ignore)]
public string ETag { get; set; } // Don't add this…

But that would serialize the old (pre-update) ETag into your document when you replace it, so I recommend the approach explained further down instead.
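A quick way to see the problem is to serialize such an object. This sketch uses System.Text.Json instead of Json.NET so that it has no external dependencies (JsonPropertyName plays the role of JsonProperty here), and OrderWithEtag is a hypothetical, cut-down Order. Whatever ETag value was loaded ends up in the body you send with the replace.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical, cut-down Order that (wrongly) carries the system ETag itself.
public sealed class OrderWithEtag
{
    public string Id { get; set; } = "";
    public string FirstName { get; set; } = "";

    // The anti-pattern: the ETag read earlier is mapped onto the domain object.
    [JsonPropertyName("_etag")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public string? ETag { get; set; }
}

public static class StaleEtagDemo
{
    // Serializes the object the way it would be sent as the new document body.
    public static string BodyForReplace(OrderWithEtag order) => JsonSerializer.Serialize(order);
}
```

Leaving ETag null keeps it out of the body (the equivalent of NullValueHandling.Ignore), but as soon as a loaded value is present it is sent along with the replace.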

This means that we need other methods if we want to update documents with optimistic concurrency. Note that if you only want to read data, GetItem with LINQ/lambda syntax is still the preferred method. For this scenario, however, we cannot work directly with the Order object; instead we need to fetch the Document instance or work with dynamic (untyped) objects. The following two methods should do the trick.

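public static IEnumerable<T> GetItems(string qry)
{
    return Client.CreateDocumentQuery<T>(Collection.DocumentsLink, qry).AsEnumerable();
}
// Second method (signature assumed from its use below): the return type is fixed to dynamic here
public static IEnumerable<dynamic> GetDocuments(string qry)
{
    return Client.CreateDocumentQuery<dynamic>(Collection.DocumentsLink, qry).AsEnumerable();
}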

Note that the second method differs only in that it defines the return type in the method itself, so you don't need to create a new DocumentDBRepository if you have already declared it as DocumentDBRepository<Order>.

Now that we can return the data, we can work with the object in a couple of different ways.

The first alternative:

  1. Fetches a DocumentDB Document (not an Order) through the SQL-like syntax. Note the <Document> declaration, which forces the result to be a DocumentDB object.
  2. To simplify working with the Order inside the Document, casts it through dynamic to our Order object with var theOrder = (Order)(dynamic)orderDoc;.
  3. Changes the FirstName.
  4. Replaces the document (no optimistic concurrency yet). Notice that the ETag changes between before and after the save.

var orderDoc = DocumentDBRepository<Document>.GetItems("SELECT * FROM ORDERS WHERE ORDERS.id = 'webShop635874323254478467'").FirstOrDefault();
var theOrder = (Order)(dynamic)orderDoc; // convert the Document into a typed Order
System.Diagnostics.Debug.WriteLine(theOrder.Customer.FirstName + ". Etag " + orderDoc.ETag);
theOrder.Customer.FirstName = "Svenneman";
var answer3 = await DocumentDBRepository<Document>.UpdateDocumentAsync<Order>(orderDoc.SelfLink, theOrder);
var theNewOrder = (Order)(dynamic)answer3;
System.Diagnostics.Debug.WriteLine(theNewOrder.Customer.FirstName + ". Etag " + answer3.ETag); // new ETag after the save

The second alternative is very similar to the code above, but we get a dynamic return value instead and cast it both into an Order object (Order order = (Order)orderquery5;) and into a DocumentDB document (Document doc = (Document)orderquery5;).

var orderquery5 = DocumentDBRepository<dynamic>.GetDocuments("SELECT * FROM ORDERS WHERE ORDERS.id = 'webShop635874323254478467'").FirstOrDefault();
Order order = (Order)orderquery5;     // the dynamic result converts to a typed Order...
Document doc = (Document)orderquery5; // ...and to the underlying DocumentDB Document
System.Diagnostics.Debug.WriteLine(order.Customer.FirstName + ". Etag " + doc.ETag);
order.Customer.FirstName = "Nisse Dyn";
var answer = await DocumentDBRepository<Document>.UpdateDocumentAsync<Order>(doc.SelfLink, order);
Order orderUpd = (Order)(dynamic)answer;
System.Diagnostics.Debug.WriteLine(orderUpd.Customer.FirstName + ". Etag " + answer.ETag);

In both methods above we get access to the document's ETag (watch the debug output to track it), so we can start implementing optimistic concurrency.

Configure DocumentDB for optimistic concurrency using the ETag

Now we have everything we need for optimistic concurrency; all that remains is to use it. DocumentDB implements optimistic concurrency through AccessConditions, which are passed in the RequestOptions parameter of the ReplaceDocumentAsync call.

I added the following two overloads to the DocumentDBRepository: one accepts dynamic and one accepts a Document (I recommend using the Document overload).

// Accepts a dynamic item (for example a GetDocuments result) and treats it as a Document
public static async Task<Document> UpdateDocumentOptimisticAsync(dynamic item)
{
    var doc = (Document)item;
    // IfMatch: the replace only succeeds if the stored ETag still equals the one we loaded
    var ac = new AccessCondition { Condition = doc.ETag, Type = AccessConditionType.IfMatch };
    return await Client.ReplaceDocumentAsync(doc, new RequestOptions { AccessCondition = ac });
}

// Recommended overload: takes the Document directly
public static async Task<Document> UpdateDocumentOptimisticAsync(Document item)
{
    var ac = new AccessCondition { Condition = item.ETag, Type = AccessConditionType.IfMatch };
    return await Client.ReplaceDocumentAsync(item, new RequestOptions { AccessCondition = ac });
}

The important part is the AccessCondition, where I add the loaded ETag as the condition and set its type to IfMatch (enforcing that the stored ETag must match the one supplied).

The call to ReplaceDocumentAsync is the same as before, except that it adds a RequestOptions parameter with the AccessCondition just declared.
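Under the hood, an IfMatch access condition corresponds to the standard HTTP If-Match header, which the DocumentDB REST API accepts on replace (PUT) requests. The sketch below only builds such a request with System.Net.Http and does not send it; ConditionalReplaceRequest is a hypothetical helper, the URI is a placeholder, and the authorization, x-ms-date and x-ms-version headers the real service requires are deliberately left out.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class ConditionalReplaceRequest
{
    // Builds, but does not send, a conditional replace (PUT) request.
    // Authorization, x-ms-date and x-ms-version headers are intentionally omitted.
    public static HttpRequestMessage Build(Uri documentUri, string json, string etag)
    {
        var request = new HttpRequestMessage(HttpMethod.Put, documentUri)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        // etag must be a quoted entity tag such as "\"00000a00-0000-...\"";
        // EntityTagHeaderValue.Parse throws FormatException otherwise.
        request.Headers.IfMatch.Add(EntityTagHeaderValue.Parse(etag));
        return request;
    }
}
```

If the stored ETag differs from the one in the header, the server answers with 412 Precondition Failed, which is what surfaces as the PreconditionFailed status code in the SDK.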

If the ETag no longer matches, ReplaceDocumentAsync throws an exception. To know that it is a concurrency exception, catch a DocumentClientException and check its StatusCode, which should be PreconditionFailed (HTTP 412; a rather cryptic name, but this is what you need to check for). Here is the whole test scenario:

  1. Load two instances of the same Document.
  2. Update the FirstName on both (it does not matter whether they are set to the same value or not).
  3. Replace the Document with the first instance.
  4. Try to update the Document with the second instance (which will fail).
  5. The error handler checks the StatusCode to determine whether it is a concurrency exception.
  6. Note that how you resolve the conflict is up to you. In some cases you may tell the caller that the update failed; in others you may want to merge the data.

// Open instance 1
var orderDoc1 = DocumentDBRepository<Document>.GetItems("SELECT * FROM ORDERS WHERE ORDERS.id = 'webShop635874323254478467'").FirstOrDefault();
var theOrder1 = (Order)(dynamic)orderDoc1;
System.Diagnostics.Debug.WriteLine(theOrder1.Customer.FirstName + ". Etag " + orderDoc1.ETag);

// Open instance 2
var orderDoc2 = DocumentDBRepository<Document>.GetItems("SELECT * FROM ORDERS WHERE ORDERS.id = 'webShop635874323254478467'").FirstOrDefault();
var theOrder2 = (Order)(dynamic)orderDoc2;
System.Diagnostics.Debug.WriteLine(theOrder2.Customer.FirstName + ". Etag " + orderDoc2.ETag); // Same ETag as instance 1

// Note: theOrder1/theOrder2 are converted copies, so these assignments do not change orderDoc1/orderDoc2;
// copy the values back into the Document (e.g. with SetPropertyValue) if they must be persisted.
theOrder1.Customer.FirstName = "This is OK";
theOrder2.Customer.FirstName = "This should fail";

var answer1 = await DocumentDBRepository<Document>.UpdateDocumentOptimisticAsync(orderDoc1); // Update 1st instance
var theNewOrder = (Order)(dynamic)answer1; // Read the result Document
System.Diagnostics.Debug.WriteLine(theNewOrder.Customer.FirstName + ". Etag " + answer1.ETag); // ETag changed after 1st update

// Try to update the second instance; its ETag is now stale
try
{
    var answer2 = await DocumentDBRepository<Document>.UpdateDocumentOptimisticAsync(orderDoc2);
}
catch (DocumentClientException docEx)
{
    if (docEx.StatusCode == System.Net.HttpStatusCode.PreconditionFailed)
    {
        // Concurrency exception (HTTP 412): take action
        System.Diagnostics.Debug.WriteLine("Optimistic Concurrency error.");
    }
    else
    {
        // Some other DocumentDB error: handle or rethrow
        System.Diagnostics.Debug.Write(docEx.Message);
    }
}
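One common way to “take action” is to reload the latest version, re-apply your change to it and try again a limited number of times. The sketch below shows that loop against a hypothetical in-memory TinyStore, where an integer plays the role of the ETag and ConcurrencyConflictException plays the role of the PreconditionFailed DocumentClientException; with DocumentDB you would re-query the Document and call UpdateDocumentOptimisticAsync instead. It only works when the change can be expressed as a function of the freshly loaded data; otherwise telling the user that the save failed is the safer choice.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the PreconditionFailed DocumentClientException.
public sealed class ConcurrencyConflictException : Exception { }

// Hypothetical in-memory store; an increasing integer plays the role of the ETag.
public sealed class TinyStore<T>
{
    private readonly Dictionary<string, (T Body, int ETag)> _docs =
        new Dictionary<string, (T Body, int ETag)>();
    private readonly object _gate = new object();
    private int _lastEtag;

    public void Create(string id, T body)
    {
        lock (_gate) { _docs.Add(id, (body, ++_lastEtag)); }
    }

    public (T Body, int ETag) Read(string id)
    {
        lock (_gate) { return _docs[id]; }
    }

    // Optimistic replace: fails if the document changed since ifMatch was read.
    public void Replace(string id, T body, int ifMatch)
    {
        lock (_gate)
        {
            if (_docs[id].ETag != ifMatch) throw new ConcurrencyConflictException();
            _docs[id] = (body, ++_lastEtag);
        }
    }
}

public static class OptimisticRetry
{
    // Reload, re-apply the change to the fresh data and try again; after maxAttempts
    // the last ConcurrencyConflictException propagates to the caller.
    public static T Update<T>(TinyStore<T> store, string id, Func<T, T> change, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            var (current, etag) = store.Read(id);
            T updated = change(current);
            try
            {
                store.Replace(id, updated, etag);
                return updated;
            }
            catch (ConcurrencyConflictException) when (attempt < maxAttempts)
            {
                // Someone else saved first: loop to reload and re-apply.
            }
        }
    }
}
```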

 

DocumentDB Pessimistic concurrency

There is no built-in support for pessimistic concurrency in DocumentDB, so you would have to implement it yourself. I would not recommend this, because you would have to handle deadlocks, lock expiry and other advanced database topics manually. I haven't read up on document permissions yet; maybe they could be a starting point for temporarily limiting access to a document (I will have to return to that). Implementing your own pessimistic locking mechanism the hard way is a series of posts in itself, and frankly it would most likely be a bad idea: one of DocumentDB's major benefits is fast throughput and access times, which you could very well kill by adding pessimistic concurrency to this component.
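For illustration only (this does not change the recommendation above), here is roughly what the core of a lease-style lock with expiry could look like, similar to the cinema seat hold described earlier. LeaseLock is hypothetical and in-memory, and the clock is injected so that expiry can be tested. A real implementation would have to store the leases somewhere shared, and writing a lease would itself need optimistic concurrency, which hints at why this gets complicated quickly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory lease table; not a DocumentDB feature.
public sealed class LeaseLock
{
    private readonly Dictionary<string, (string Owner, DateTimeOffset Expires)> _leases =
        new Dictionary<string, (string Owner, DateTimeOffset Expires)>();
    private readonly object _gate = new object();
    private readonly Func<DateTimeOffset> _clock;

    // The clock is injected so that expiry can be tested without waiting.
    public LeaseLock(Func<DateTimeOffset> clock) => _clock = clock;

    // Grants the lease if the resource is free, already held by this owner,
    // or the previous holder's lease has expired (the "lock expiry" case).
    public bool TryAcquire(string resourceId, string owner, TimeSpan duration)
    {
        lock (_gate)
        {
            DateTimeOffset now = _clock();
            if (_leases.TryGetValue(resourceId, out var lease)
                && lease.Owner != owner
                && lease.Expires > now)
            {
                return false; // someone else holds a live lease
            }
            _leases[resourceId] = (owner, now + duration);
            return true;
        }
    }

    // Only the current holder may release; after expiry the lease may belong to someone else.
    public bool Release(string resourceId, string owner)
    {
        lock (_gate)
        {
            if (_leases.TryGetValue(resourceId, out var lease) && lease.Owner == owner)
            {
                return _leases.Remove(resourceId);
            }
            return false;
        }
    }
}
```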

Partial Updates

Partial updates are currently not supported in DocumentDB. This means that you always save the entire document when you write. So two workers cannot independently modify separate order lines of the same order; instead, use optimistic concurrency and reload the object if it has been updated elsewhere.

 


3 thoughts on “DocumentDB revisited Part 3 – Concurrency in DocumentDB

    1. I just re-ran my sample and still get only the DocumentClientException. My sample used “Microsoft.Azure.DocumentDB” version=”1.5.2″ with targetFramework=”net452″. Which version do you run so I can try to emulate your conditions? And also – which InnerExceptions do you get in the aggregate one?
