Commit 8b9fcd9

fix(cluster) use 'listen_address' for contact point in refresh()
Previously, using `coordinator.host` to add the contact point to the LB policy meant that if the user specified a hostname, it would be used to index this node instead of the IP address. Nothing harmful in that except some inconsistent log messages (sometimes an IP address shows up, other times a hostname).

Problem
-------

An issue arises, however, when:

1. Several Cluster instances call `:refresh()` on the same C* cluster
2. DNS round-robin is in effect for the contact point hostnames

Let's consider clusterA and clusterB, both instances of the Cluster module. Let's also consider the following C* cluster:

    10.16.0.1 node1
    10.16.0.2 node2

And the following DNS record:

    cassandra.default.svc.cluster.local. 30 IN A 10.16.0.1
    cassandra.default.svc.cluster.local. 30 IN A 10.16.0.2

First, clusterA calls `refresh()` with `contact_points = { "cassandra" }`, and as a result inserts the following topology in the cluster's shm:

    cassandra:[peer info]
    10.16.0.2:[peer info]

Its LB policy now has 2 entries: `cassandra` and `10.16.0.2`.

Then, clusterB calls `refresh()` as well, with the same `contact_points` option, and as a result first purges the cluster's shm content before inserting the following:

    10.16.0.1:[peer info]
    cassandra:[peer info]

Note that because of the round-robin DNS resolution, `cassandra` pointed to `10.16.0.2` this time.

Now, when clusterA invokes its LB policy to elect a peer for a given query, it will eventually look for `10.16.0.2`. However, that entry no longer exists in the cluster's shm. Therefore, the following error is returned:

    no host details for 10.16.0.2

Proposed solution
-----------------

By changing the cache key of the peer's info in the shm from the specified `contact_point` value (which is the user's input) to the `listen_address` column of the `system.local` table, we no longer store host details by hostname.

This has the added benefit of ensuring that all logs and other operations done by the Cluster module always use the IP address of the node.

From #118
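
The scenario described in this message can be reproduced in isolation. The following is a minimal simulation, not code from the Cluster module: plain Lua tables stand in for the shm and the LB policy, and `simulate`, `resolve`, `refresh` and the `key_by_ip` flag are hypothetical names introduced only for illustration.

```lua
-- Simulation only: plain tables replace the shared shm and the LB policy.
-- Every function name here is hypothetical, not part of lua-cassandra.
local function simulate(key_by_ip)
  local shm = {}                                -- shared by all Cluster instances
  local records = { "10.16.0.1", "10.16.0.2" }  -- A records for "cassandra"
  local lookups = 0

  -- Round-robin DNS stub: each lookup returns the next A record.
  local function resolve(_)
    lookups = lookups + 1
    return records[(lookups - 1) % #records + 1]
  end

  -- Purge the shm, then store one entry per node and return the keys
  -- that the calling instance's LB policy would remember.
  local function refresh(contact_point)
    local coordinator_ip = resolve(contact_point)
    for k in pairs(shm) do shm[k] = nil end

    local policy = {}
    for _, ip in ipairs(records) do
      local key = ip
      if ip == coordinator_ip and not key_by_ip then
        key = contact_point  -- old behaviour: index the coordinator by user input
      end
      shm[key] = { ip = ip }
      policy[#policy + 1] = key
    end
    return policy
  end

  local policy_a = refresh("cassandra")  -- clusterA
  refresh("cassandra")                   -- clusterB purges and rewrites the shm

  -- clusterA now looks up each peer its LB policy knows about.
  local errors = {}
  for _, key in ipairs(policy_a) do
    if not shm[key] then
      errors[#errors + 1] = "no host details for " .. key
    end
  end
  return errors
end

local before = simulate(false)  -- coordinator keyed by contact point
local after  = simulate(true)   -- coordinator keyed by IP address

print(before[1])  -- no host details for 10.16.0.2
print(#after)     -- 0
```

With hostname keys, clusterA's policy entry `10.16.0.2` disappears once clusterB refreshes. With IP keys, both instances write the same set of keys, so the purge does not invalidate either policy.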
1 parent f43c638 commit 8b9fcd9

1 file changed

Lines changed: 2 additions & 2 deletions

lib/resty/cassandra/cluster.lua

@@ -479,7 +479,7 @@ function _Cluster:refresh()
   if not coordinator then return nil, err end
 
   local local_rows, err = coordinator:execute [[
-    SELECT data_center,rpc_address,release_version FROM system.local
+    SELECT data_center,listen_address,release_version FROM system.local
   ]]
   if not local_rows then return nil, err end
 
@@ -493,7 +493,7 @@ function _Cluster:refresh()
   coordinator:setkeepalive()
 
   rows[#rows+1] = { -- local host
-    rpc_address = coordinator.host,
+    rpc_address = local_rows[1].listen_address,
     data_center = local_rows[1].data_center,
     release_version = local_rows[1].release_version
   }
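
To see the effect of the changed lines without a running cluster, the row construction can be exercised against a stub. This is a sketch under assumptions: `new_stub_coordinator` and `local_host_row` are hypothetical helpers, and the `data_center` and `release_version` values are made up. Only the query text and the field assignments mirror the diff.

```lua
-- Hypothetical stub: execute() returns a canned `system.local` row instead
-- of talking to Cassandra. The "dc1" and "3.11.0" values are invented.
local function new_stub_coordinator(host, listen_address)
  return {
    host = host,  -- what the user passed as a contact point
    execute = function(self, query)
      return {
        {
          data_center = "dc1",
          listen_address = listen_address,
          release_version = "3.11.0",
        },
      }
    end,
  }
end

-- Builds the local-host row the way the patched refresh() does: the
-- address comes from the query result, not from coordinator.host.
local function local_host_row(coordinator)
  local local_rows, err = coordinator:execute [[
    SELECT data_center,listen_address,release_version FROM system.local
  ]]
  if not local_rows then return nil, err end

  return {
    rpc_address = local_rows[1].listen_address,
    data_center = local_rows[1].data_center,
    release_version = local_rows[1].release_version,
  }
end

local row = local_host_row(new_stub_coordinator("cassandra", "10.16.0.1"))
print(row.rpc_address)  -- 10.16.0.1
```

Whatever hostname the coordinator was reached by, the row is indexed by the address the node itself reports.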
