Cosmos DB: Lessons from the field

Deepu Bhatia
3 min read · Jun 8, 2021


Part I: The case of the missing change event

Over the last twenty-five-plus years of my career as an OLTP database solutions architect, I've had the great fortune of working on some pretty phenomenal transactional use cases. Most of these were customer- or site-facing transactional systems running at scale that required data distribution across multiple regions. Every once in a while, you come across a technology that piques your interest and ignites that inner spark. Cosmos DB has been one such spark for me.

Over this series, I'll share some Cosmos nuggets, aka best practices and patterns, with you. I came across these nuggets as we went through the journey of migrating some of our most critical use cases from a traditional RDBMS over to Cosmos.

Azure Cosmos DB provides a change feed mechanism to stream changes to downstream systems. This is an extremely nifty feature that opens up a lot of potential capabilities; in fact, Cosmos was one of the very first NoSQL databases to provide it. It is similar in spirit to Oracle GoldenGate and appealed to most application teams migrating from Oracle to Cosmos.
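To make this concrete, here is a minimal sketch of consuming the change feed with the Change Feed Processor from the Java SDK v4. The endpoint, database, and container names are placeholders for this example; the lease container is a separate container the processor uses for checkpointing.

```java
import com.azure.cosmos.ChangeFeedProcessor;
import com.azure.cosmos.ChangeFeedProcessorBuilder;
import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosAsyncContainer;
import com.azure.cosmos.CosmosClientBuilder;

public class ChangeFeedDemo {
    public static void main(String[] args) {
        CosmosAsyncClient client = new CosmosClientBuilder()
                .endpoint("<your-account-endpoint>")   // placeholder: your Cosmos account
                .key("<your-account-key>")
                .buildAsyncClient();

        CosmosAsyncContainer feedContainer =
                client.getDatabase("shop").getContainer("carts");   // hypothetical names
        CosmosAsyncContainer leaseContainer =
                client.getDatabase("shop").getContainer("leases");  // checkpoint bookkeeping

        ChangeFeedProcessor processor = new ChangeFeedProcessorBuilder()
                .hostName("cart-feed-host-1")          // unique per consumer instance
                .feedContainer(feedContainer)
                .leaseContainer(leaseContainer)
                .handleChanges(docs ->
                        // The processor delivers the latest version of each changed
                        // document; intermediate updates may be coalesced away.
                        docs.forEach(doc -> System.out.println("Change: " + doc)))
                .buildChangeFeedProcessor();

        processor.start().block();                     // begin pumping changes
    }
}
```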

Cosmos change feed in action

However, as we dug deeper into the Cosmos change feed for the SQL API, we came to the realization that it does not guarantee that all changes will be captured and published in the feed. The current implementation does not capture hard deletes, and intermediate updates may be lost. As demonstrated in the example above, the intermediate update of K1,V3 was lost in the change feed. Inserts are the only operations guaranteed to be available in the change feed pipeline. Let's try to understand this better with an example. Consider the following data model snippet in Cosmos.

Cart data model example

The above model consists of the following documents:

  1. A CARTHEADER document
  2. Two LINEITEM documents

Cosmos allows us to take the above data model, comprising multiple documents, and create it in a single transaction. This can be achieved using either a stored procedure or a transactional batch to persist multiple documents, as long as all of the documents share the same partition key. In effect, this ensures that all documents with the same partition key value are persisted into a single logical partition that resides on a single physical server, which is what makes the transactional guarantee possible.
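As a sketch of what that looks like with a transactional batch in the Java SDK v4 (the container handle, ids, and field names here are hypothetical, based on the model above):

```java
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosBatch;
import com.azure.cosmos.models.CosmosBatchResponse;
import com.azure.cosmos.models.PartitionKey;

import java.util.Map;

public final class CartWriter {
    // All three documents share the same partition key (the cart id),
    // so they can be committed atomically in one batch.
    static void createCart(CosmosContainer container) {
        String cartId = "C100";                        // hypothetical cart id
        CosmosBatch batch = CosmosBatch.createCosmosBatch(new PartitionKey(cartId));

        batch.createItemOperation(Map.of(
                "id", "C100", "cartId", cartId, "docType", "CARTHEADER"));
        batch.createItemOperation(Map.of(
                "id", "C100-I54", "cartId", cartId, "docType", "LINEITEM",
                "itemId", "I54", "quantity", 1));
        batch.createItemOperation(Map.of(
                "id", "C100-I55", "cartId", cartId, "docType", "LINEITEM",
                "itemId", "I55", "quantity", 5));

        CosmosBatchResponse response = container.executeCosmosBatch(batch);
        if (!response.isSuccessStatusCode()) {
            // All operations succeed or fail together.
            throw new IllegalStateException("Batch failed: " + response.getStatusCode());
        }
    }
}
```

Because every operation in the batch carries the same partition key, Cosmos commits them atomically: either all three documents are persisted or none are.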

Now let's assume that we have set up a change feed stream to feed a downstream system. Two updates come in and change the quantity of the LINEITEM document for itemId I55:

a) from 5 to 6 and subsequently

b) from 6 to 10

It’s possible that the intermediate quantity update (a) isn’t picked up by the change feed. If the downstream system is attempting to audit these changes, it would have just incurred data loss!

One way to work around this issue is to emit a business event document. This document would always be an insert and hence always captured by the change feed. The document could simply capture the old and new values and be included as part of the transaction above.

Cart model with a change feed business event document

Using the above business event document, we can now ensure that the change feed stream captures each and every mutation. Similarly, we can emit another event document when the quantity is updated from 6 to 10. A side effect of emitting these business events is that overall storage requirements will start creeping up. This can be controlled by setting a TTL (time to live) on the change feed business events: once the TTL duration is reached, the events automatically expire and Cosmos clears them out.
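Putting the pattern together, here is a sketch (again with hypothetical ids and field names) that commits the quantity update and its business event in one batch, with a per-item ttl so the event expires on its own. Note that per-item ttl only takes effect if a default TTL has been enabled on the container.

```java
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosBatch;
import com.azure.cosmos.models.CosmosBatchResponse;
import com.azure.cosmos.models.PartitionKey;

import java.util.Map;

public final class QuantityUpdater {
    // Update the line item and insert a business event atomically, so the
    // change feed is guaranteed to see the insert even if it coalesces updates.
    static void updateQuantity(CosmosContainer container) {
        String cartId = "C100";                        // hypothetical cart id
        CosmosBatch batch = CosmosBatch.createCosmosBatch(new PartitionKey(cartId));

        batch.replaceItemOperation("C100-I55", Map.of(
                "id", "C100-I55", "cartId", cartId, "docType", "LINEITEM",
                "itemId", "I55", "quantity", 6));

        batch.createItemOperation(Map.of(
                "id", "C100-I55-evt-001", "cartId", cartId, "docType", "BUSINESSEVENT",
                "itemId", "I55", "oldQuantity", 5, "newQuantity", 6,
                "ttl", 604800));                       // expire after 7 days (in seconds)

        CosmosBatchResponse response = container.executeCosmosBatch(batch);
        if (!response.isSuccessStatusCode()) {
            throw new IllegalStateException("Batch failed: " + response.getStatusCode());
        }
    }
}
```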
