CQRS and CockroachDB
Today, I would like to talk about Command Query Responsibility Segregation (CQRS), how CockroachDB (CRDB) fits into it, and the situations where the two don't make sense together. A number of customers have approached me about using CQRS in their applications, and once we looked at the drivers for it, the value often wasn't there. There are a number of very good reasons to adopt a CQRS pattern for your application, but several of those reasons no longer apply when you are using CRDB as your backend datastore… and you still carry the additional complexities and downsides of a CQRS model. I will not be covering all of the reasons to adopt the CQRS pattern, only a subset.
So what is CQRS and what does it do?
CQRS stands for Command Query Responsibility Segregation. The idea is that between your presentation layer and your data layer, all the microservices that do reads of information (queries) from the data layer are on one side of things and do not write to the data store. Similarly, all of the microservices that perform writes on the data layer (commands) are on the other side and do not perform any reads on the data layer. If a microservice in the command path needs information, it gathers that data from the APIs of the microservices in the query path.
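To make the split concrete, here is a minimal sketch in Go of what the two sides might look like. The order-service names (OrderCommands, OrderQueries, CancelHandler) are hypothetical and purely for illustration; the point is that the command side exposes only writes, the query side exposes only reads, and a command handler that needs current state asks the query side's API rather than reading the data layer directly.

```go
// Minimal sketch of the command/query split, using hypothetical
// order-service names. Neither side crosses into the other's path
// to the data layer.
package cqrs

import "context"

// Order is an illustrative read model returned by the query side.
type Order struct {
	ID     string
	Status string
	Total  int64
}

// OrderCommands is the write-only (command) surface. Implementations
// write to the data store but never read application state from it.
type OrderCommands interface {
	CreateOrder(ctx context.Context, o Order) error
	CancelOrder(ctx context.Context, id string) error
}

// OrderQueries is the read-only (query) surface. Implementations read
// from the data store but never modify it.
type OrderQueries interface {
	GetOrder(ctx context.Context, id string) (Order, error)
	ListOrdersByStatus(ctx context.Context, status string) ([]Order, error)
}

// CancelHandler is a command-path component that needs current state.
// It gets that state from the query side's API, not from the data store.
type CancelHandler struct {
	Commands OrderCommands
	Queries  OrderQueries
}

func (h CancelHandler) Cancel(ctx context.Context, id string) error {
	o, err := h.Queries.GetOrder(ctx, id) // read via the query path
	if err != nil {
		return err
	}
	if o.Status == "cancelled" {
		return nil // already cancelled; nothing to do
	}
	return h.Commands.CancelOrder(ctx, id) // write via the command path
}
```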
Way back when, as applications began to use microservices, there were three layers. The UI layer communicated with the business logic (BL) layer, which was made up of microservices, and the BL would communicate with the data layer, which housed the database (DB).
This allowed the UI and BL layers to scale: additional nodes could be added to either layer as demand grew. The problem was that, eventually, the demands put upon the DB could not be met by a single server. You reach a point where you cannot put any more CPU, memory, or storage into that one server; it just won't go any faster no matter how hard you try. The desire was to be able to scale the DB the same way as the BL or UI. This is where CQRS came about.
It was recognized that many workloads do far more reads than writes. So an early option was to stand up additional nodes running read-only copies of the DB. This way, the read workload could scale out and that burden was removed from the primary DB, where the writes occur. At the time, there were no readily available solutions that allowed writes on multiple nodes at once.
In order to have this replication from the primary DB to one or more read-only copies, the read-only copies were eventually consistent. Updates were streamed from the primary to each of the read-only copies, so there was no assurance that the R/O copies were up to date. But if the application could tolerate stale data, the read side of the DB cluster could be scaled at will. To make this work, microservices cannot write to the read-only copies of the data, as those updates would never be synchronized back to the primary DB. So all of the microservices need to be split into ones that read (query) and ones that write (command). All of the command-path microservices talk to the primary DB, and the DB updates get synchronized out to the R/O copies. All of the query-path microservices connect to the R/O copies of the DB.
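As a rough illustration of how that split shows up in application code, here is a hedged sketch in Go: the command path opens a connection to the primary, the query path opens a connection to a read-only replica, and a read issued right after a write may not yet see it. The DSNs, hostnames, and orders table are placeholders, and it assumes a Postgres-compatible driver such as github.com/lib/pq.

```go
// Sketch of the replicated (pre-CRDB) setup: writes go to the primary,
// reads go to a read-only replica, and the replica may lag behind.
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // hypothetical choice of Postgres-compatible driver
)

func main() {
	ctx := context.Background()

	// Command-path services connect only to the primary, where writes land.
	primary, err := sql.Open("postgres", "postgres://app@db-primary:5432/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Query-path services connect only to a read-only replica, which trails
	// the primary by whatever the replication delay happens to be.
	replica, err := sql.Open("postgres", "postgres://app@db-replica-1:5432/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Command: the write goes to the primary.
	if _, err := primary.ExecContext(ctx,
		`UPDATE orders SET status = 'shipped' WHERE id = $1`, "o-123"); err != nil {
		log.Fatal(err)
	}

	// Query: the read goes to the replica. It may not yet reflect the update
	// above -- this is the eventual consistency the pattern accepts.
	var status string
	if err := replica.QueryRowContext(ctx,
		`SELECT status FROM orders WHERE id = $1`, "o-123").Scan(&status); err != nil {
		log.Fatal(err)
	}
	log.Println("replica sees status:", status)
}
```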
This obviously adds a good bit of complexity to the overall system. Additionally, if part of your presentation layer needs to switch from a write context to a read context for a user, there is even more complexity that needs to be addressed in your UI. So now our overall system, which will be issuing both commands and queries, is eventually consistent. Even if the data layer matures and it can be assured that every R/O copy of the DB is instantly up to date as the primary DB changes, the overall system is still eventually consistent.
What happens, though, if we can scale the database such that any node can perform reads or writes and we can ensure consistency of the data across all the nodes? Simply put, this is what CockroachDB does, along with a number of other capabilities.
And if our command microservices and query microservices can use the same data structures, why do we need to divide them up? We now have a scalable database that is entirely consistent from any entry point, our microservices are broken apart by data domain or function (as opposed to command or query), and the overall system becomes consistent again.
In this last scenario, we don't need CQRS because the DB can scale as needed. We don't need the ability to process reads and writes separately, because every CRDB node can do either; the whole cluster can handle either workload while we maintain data consistency throughout. We can also make the application design simpler than it would be in a CQRS pattern, and we no longer need to manage complex context switches from reads to writes in the UI or BL. *IF* none of the additional capabilities of CQRS are needed… and the read and write workloads can use the same data structures in one large data store… and you need your data to be consistent… then this is a much easier system to implement.
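For contrast with the earlier replica sketch, here is the same interaction without the CQRS split, again as a hedged example: one connection pool pointed at the CRDB cluster (typically through a load balancer), with the write and the read going through the same handle. The DSN and table are placeholders, and the driver choice is the same assumption as before.

```go
// Sketch of the simplified, non-CQRS shape against CockroachDB: one
// service, one pool, and any node can serve both the read and the write.
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // hypothetical choice of Postgres-compatible driver
)

func main() {
	ctx := context.Background()

	// One pool, usually pointed at a load balancer in front of the CRDB
	// cluster; reads and writes can land on any node.
	db, err := sql.Open("postgres", "postgres://app@crdb-lb:26257/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Write...
	if _, err := db.ExecContext(ctx,
		`UPDATE orders SET status = 'shipped' WHERE id = $1`, "o-123"); err != nil {
		log.Fatal(err)
	}

	// ...and immediately read back a consistent result, with no
	// command/query split and no replication lag to reason about.
	var status string
	if err := db.QueryRowContext(ctx,
		`SELECT status FROM orders WHERE id = $1`, "o-123").Scan(&status); err != nil {
		log.Fatal(err)
	}
	log.Println("status:", status) // the write above is already visible
}
```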
I was recently working with several customers in a highly regulated industry who were building new applications and wanted to use the CQRS pattern. They were excited about it until I pointed out that they would lose consistency in the overall system and that parts of the logic would become much more complicated. We continued to go through the requirements and found that their read and write workloads could use the same data structures, and they did need consistent data. We followed the line of discussion above and found that implementing a CQRS pattern in combination with CRDB only made the development effort more complicated and the overall system inconsistent, without providing any of the CQRS benefits they were hoping for. One of them did have a need for the OLTP data to be streamed into an OLAP data store for additional analysis, but that was treated as a side branch: they set up an OLAP data store where inconsistent data was acceptable, and all of the analytical queries were executed there.
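For that side branch, one way (not necessarily how this customer did it) to feed the OLAP store is CockroachDB's changefeeds, which stream row changes from the OLTP tables to an external sink. The sketch below starts a changefeed on a hypothetical orders table into a placeholder Kafka address; changefeeds to external sinks require the appropriate CockroachDB license and cluster configuration.

```go
// Hedged sketch: start a CockroachDB changefeed so the OLTP orders table
// streams its changes to a Kafka sink, from which the OLAP pipeline consumes.
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // hypothetical choice of Postgres-compatible driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://app@crdb-lb:26257/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// CREATE CHANGEFEED returns the ID of the long-running changefeed job.
	var jobID int64
	err = db.QueryRowContext(context.Background(),
		`CREATE CHANGEFEED FOR TABLE orders INTO 'kafka://kafka:9092' WITH updated, resolved`,
	).Scan(&jobID)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("changefeed job:", jobID)
}
```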