Impact of LeaseHolder Placement on Joins

In this post on CockroachDB, I will delve into the impacts of co-locating range leaseholders together when a multi-table join is executed via SQL. We'll see that when all the leaseholders are geographically close to each other in a distributed system, the performance is much more predictable and latencies are lower. The worst case scenario is when leaseholders for the various elements used in the join are scattered from a geographic perspective.

To start from the beginning, within CockroachDB, all of the data and indexes in a cluster are part of a global KV store. This KV store is sliced up into ranges that usually span 128MB to 512MB in size. Each of these ranges will have a number of replicas (3 by default, but configurable) and one of those replicas of a given range will be elected the LeaseHolder for the range. The leaseholder is used to coordinate all reads and writes for that range of data. For writes, the leaseholder coordinates the quorum based write activities.

Let's start a simulation that includes three different tables. Table A has 2M records, table B has 4M records, and table C has 8M records and is a cross reference table that links another set of external identifiers to a single id in A and a single id in B.

CREATE TABLE A (

account_id UUID PRIMARY KEY,

account_ref_number INT

);

CREATE TABLE B (

card_id UUID PRIMARY KEY,

b_number INT

);

CREATE TABLE C (

reference_id UUID PRIMARY KEY,

account_id UUID REFERENCES A(account_id),

card_id UUID REFERENCES B(card_id),

xref_value uuid,

INDEX xref_value_idx (xref_value)

);

Think of table A as containing account information and table B contains payment information related to those accounts. There may be multiple entries in table B for each entry in table A. That 3rd table, table C, contains an identifier used by a 3rd party to link to a combination of table A and B. The CRDB cluster has six nodes, two each in three regions… us-west, us-east, and europe-west and a replication factor of five. The various UUID fields that interrelate these three tables have the appropriate indexes and foreign key constraints.

For our first query, let's look at "SELECT * from C where C.xref_value = '<my xref UUID>';". This definitely doesn't involve a join, but it is a place to start. At a high level, when I issue this query, the node I am connected to sends the request off to whichever node in the cluster is the leaseholder for the range of data that contains my UUID. This can be done as a point lookup, as we have the needed indexes in place to support this.

The sequence of events looks like this:

I connect to any one of the nodes in the cluster
I issue my query
The node I am connected to knows the leaseholder for the UUID I'm looking up in table C, and forwards the query to that node.
The leaseholder does the point lookup within table C, returning the result to the node I am connected to
The node I'm connected to replies back to me with the result of the query.

As you can see here, this is all well and good if the node I'm connected to and the leaseholder are either the same node, or very close to each other geographically. If they are not near each other, the significant latency will be the network connection between me and the node I'm connected to as well as the node I'm connected to and the leaseholder for the range of table A that contains the UUID I'm looking up.

The question that comes up next, what happens to the latency when we are doing a join across multiple tables? Is there anything to worry about when doing "SELECT * from C, A, B where C.account_id = A.account_id and C.card_id = B.card_id and C.xref_id = <my xref UUID>;"?

The sequence of events looks like this:

I connect to any node in the cluster
I issue my query
The node I am connected to contacts the leaseholder for the range of table C that contains my UUID
The leaseholder for the range of data continuing my UUID in table C does a point lookup
The leaseholders for tables A and B are contacted for the point lookups from those tables
All three elements are now easily available from the three tables and are combined together.
The result is passed back to the node I am connected to.
The node I am connected to presents me with the result.

Since the point lookups are very quick, once they are being executed and using the indexes, the big factors in the latency will be the network connection from the node I am connected with to the leaseholder in table C, and from there to the leaseholders for the ranges in tables A and B.

As you can imagine, if all the leaseholders we need to work with are close to each other, getting the three elements and putting them together will be pretty quick. If I am close to those leaseholders, my response will be fast as well. The opposite of this would be to have the leaseholders scattered far apart and distant from me when I make this query. Having the leaseholders spread out will incur the most latency.

Below is a grid of various combinations of where the leaseholders have been placed and from where I am issuing the query.

Test	Leaseholder A	Leaseholder B	Leaseholder C	Query From	Latency (ms)
1	us-east4-a	us-east4-a	us-east4-a	us-east4-a	2
2	us-east4-a	us-east4-a	us-east4-a	us-east1-d	14
3	us-east4-a	us-east4-a	us-east4-a	us-west1-c	63
4	us-east1-d	us-east1-d	us-east4-a	us-east4-a	26
5	us-east1-d	us-east1-d	us-east4-a	us-east1-d	37
6	us-east1-d	us-east1-d	us-east4-a	us-west1-c	87
7	us-east4-a	us-east4-a	us-east1-d	us-east4-a	37
8	us-east4-a	us-east4-a	us-east1-d	us-east1-d	24
9	us-east4-a	us-east4-a	us-east1-d	us-west1-c	90
10	us-east4-a	us-west1-c	us-east1-d	us-east4-a	90
11	us-east4-a	us-west1-c	us-east1-d	us-east1-d	78
12	us-east4-a	us-west1-c	us-east1-d	us-west1-c	144
13	us-west1-c	us-west1-c	us-east1-d	us-east4-a	144
14	us-west1-c	us-west1-c	us-east1-d	us-east1-d	132
15	us-west1-c	us-west1-c	us-east1-d	us-west1-c	198
16	us-west1-c	us-west1-c	us-west1-c	us-east4-a	63
17	us-west1-c	us-west1-c	us-west1-c	us-east1-d	67
18	us-west1-c	us-west1-c	us-west1-c	us-west1-c	1

As you can see from this sample, the more geographically distributed the leaseholders are, the more latency we encounter with a three table join. The closer the leaseholders for the various components are to each other, the lower the latency [tests 1-3 and 16-18]… and the closer I am to the leaseholder for table C when I issue the query, the latency is reduced even further [tests 1 and 18]. The second best option is to have the leaseholders close to each other, even if they aren't in the same region or AZ [tests 4-6].

Test #15 is the worst scenario here, because we are starting in us-west1-c, then have to connect to the leaseholder for C in us-east1-d, which then has to get pieces from tables A and B from us-west1-c, then compile the result, return the result to us-west1-c, which then presents it to me. This represents a number of expensive and time consuming hops across North America.

In summary, you will get the best performance out of multi-table joins in CockroachDB if you can have the leaseholders for the data in question located very close to each other and if the source of the query is also close to those leaseholders. Second best is locating all of the leaseholders close to each other, even if the source of the application is more distant. The worst scenario is when the leaseholders are geographically diverse and the source of the query is distant from table C (our cross reference table).

Search This Blog

InsanityUnlimited.com

Impact of LeaseHolder Placement on Joins

Comments

Post a Comment

Popular posts from this blog

CockroachDB Backups, Exports, and Archives

Data Center Failures and CRDB Replication

Bulk Update of data in CockroachDB