Posts

CockroachDB and precision clocks

I was recently working with a situation where CockroachDB nodes were running as VMs on VMware hosts.  The difficulty was that when the VMs went through a vMotion, the nodes would end up flapping once the vMotion completed, sometimes for up to 20 minutes.  Obviously, having nodes bouncing up and down is not desirable: it could lead to unavailability of data if other maintenance activities, such as a repave or upgrade, are happening concurrently, or simply leave the cluster with diminished computational resources.  If a node could not successfully rejoin the cluster within five minutes, the remainder of the cluster would start to up-replicate any data that existed on the down node, putting yet more load on the remaining nodes as the cluster tries to self-heal. Historically, the VMs running CockroachDB were using NTPD on the guest OS, synchronizing every 11 minutes, to keep the clocks reasonably well align...
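As a minimal, hedged sketch of the kind of clock-related tuning involved (the settings below exist in CockroachDB, but enabling them and the specific interval are illustrative choices, not a recommendation for every environment):

    -- Illustrative only: have nodes check for unexpected forward clock jumps.
    SET CLUSTER SETTING server.clock.forward_jump_check_enabled = true;
    -- Illustrative only: periodically persist a wall-clock upper bound so the
    -- clock stays monotonic across node restarts.
    SET CLUSTER SETTING server.clock.persist_upper_bound_interval = '10s';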

Bulk Update of data in CockroachDB

  Working with customers always provides a large number of topics to discuss; however, in recent weeks the same type of scenario has surfaced a number of times: "I have a very large table and I want to update a large portion of these records.  I wrote an UPDATE SQL statement, but it never finishes." So how does this situation come about?  Three main scenarios have repeatedly led to the need to bulk update data within CockroachDB: (1) data from a legacy datastore has been imported into CockroachDB without any sanitization (ideally, some form of ETL workflow would have been used so that only clean data is stored in CockroachDB); (2) an application has inserted data over a period of time that also hasn't been properly sanitized; (3) a business use case arises that entails updating data in place.  Once the malformed data is in CockroachDB, something needs to be done to fix it. Let's take an example table like the one below.   table_name ...
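To make the batching idea concrete before digging into the full post, here is a minimal sketch; the table, column, and batch size (accounts, email, 1000) are hypothetical, and it assumes a CockroachDB version that supports LIMIT on UPDATE:

    -- Hypothetical table and column; rerun until 0 rows are affected.
    UPDATE accounts
       SET email = lower(email)
     WHERE email != lower(email)
     LIMIT 1000;

Each run fixes at most 1000 rows, and the predicate excludes rows that have already been fixed, so repeated runs make steady progress without one enormous transaction.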

High Availability Software Based Load Balancer for CockroachDB

Recently, I was building an environment for a series of tests that used CockroachDB as the underlying data store.  This happens all the time, but in this instance I also needed to build a high availability (HA) load balancer (LB) configuration, so that the environment would continue to function even if one of the software-based load balancers failed.  I will explain how to build redundant load balancers using keepalived and haproxy in this post. To start out, I built the DB cluster on a number of nodes, spread across multiple availability zones and multiple geographic regions, following the basic instructions from the CockroachDB documentation ( https://www.cockroachlabs.com/docs/v21.1/install-cockroachdb-linux ).  One of the huge benefits of CRDB is that it provides redundancy and distributed execution of queries across the DB cluster.  What it does not include is any kind of load balancing, as that is beyond its purpose.  If ...
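As a small, hedged aside (this is not the load balancer configuration itself): a query like the one below is one way to enumerate the node addresses the load balancers will front; it relies on crdb_internal, whose column names can vary between CockroachDB versions.

    -- Column names may differ between CockroachDB versions.
    SELECT node_id, address, locality
      FROM crdb_internal.gossip_nodes;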

CQRS and CockroachDB

  Today, I would like to talk about Command Query Responsibility Segregation (CQRS), how CockroachDB (CRDB) fits into it, and the situations where the two don't make sense together.  A number of customers have approached me about using CQRS in their applications, and once we looked at the drivers for it, the value wasn't there.  There are a number of very good reasons to adopt a CQRS pattern for your application, but several of those reasons no longer apply when you are using CRDB as your backend datastore… and you still carry the additional complexities and downsides of a CQRS model.  I will not be covering all of the reasons to adopt the CQRS pattern, only a subset. So what is CQRS, and what does it do? CQRS stands for Command Query Responsibility Segregation.  The idea is that between your presentation layer and your data layer, all the microservices that read information (queries) from the data layer sit on one side and do not ...
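As one hedged illustration of why a separate read store is often unnecessary with CRDB (my example, using a hypothetical accounts table): slightly historical reads can be served from nearby replicas with follower reads, which covers one classic motivation for splitting the read side out. Availability of follower reads depends on version and licensing.

    -- Hypothetical table; read at a slightly historical timestamp so a
    -- nearby replica can serve the query.
    SELECT id, balance
      FROM accounts
        AS OF SYSTEM TIME follower_read_timestamp();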

CockroachDB Backups, Exports, and Archives

Today I would like to talk about the differences between backups, exports, and archives as they relate to CockroachDB.  Let's start by deciding, in general terms, what these three ideas are supposed to accomplish. Backing up data is a concept that nearly everyone in the IT field has encountered at some point, and many of us have had to work with a variety of backup systems, capabilities, and requirements.  Backups exist to satisfy the use case where we need to recover from some form of disaster or loss and get a system back to an operational state.  There can be various requirements in the form of encryption, retention, frequency, and so forth.  CockroachDB is highly performant in this situation: both backups and restores are executed in parallel, with the work spread across the cluster, and there is also the ability to back up the MVCC history.  The syntax for executing backups and restores can be found in the excellent CRDB documentation....
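For a concrete flavor of that syntax, a hedged sketch follows; the storage URI and database name are placeholders, and exact syntax varies somewhat across CockroachDB versions:

    -- Full-cluster backup into a collection, retaining MVCC revision history.
    BACKUP INTO 's3://my-backups/crdb?AUTH=implicit'
        AS OF SYSTEM TIME '-10s'
        WITH revision_history;

    -- Restore a single database from the most recent backup in the collection
    -- (FROM LATEST IN requires a newer CockroachDB version).
    RESTORE DATABASE bank FROM LATEST IN 's3://my-backups/crdb?AUTH=implicit';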