fix(cluster) use 'listen_address' for contact point in refresh()

thibaultcha · web-flow · commit 8b9fcd967bb2 · 2018-08-08T18:20:56.000-07:00
Previously, `coordinator.host` was used to add the contact point to the LB policy, which meant that if the user specified a hostname, that hostname would be used to index this node instead of its IP address. Nothing harmful in that by itself, except for some inconsistent log messages (sometimes an IP address shows up, other times a hostname).

Problem
-------

An issue arises, however, when:

1. Several Cluster instances call `:refresh()` on the same C* cluster
2. DNS round-robin is in effect for the contact point hostnames

Let's consider clusterA and clusterB, both instances of the Cluster module. Let's also consider the following C* cluster:

    10.16.0.1 node1
    10.16.0.2 node2

And the following DNS records:

    cassandra.default.svc.cluster.local. 30 IN A 10.16.0.1
    cassandra.default.svc.cluster.local. 30 IN A 10.16.0.2

First, clusterA calls `refresh()` with `contact_points = { "cassandra" }`, and as a result inserts the following topology in the cluster's shm:

    cassandra:[peer info]
    10.16.0.2:[peer info]

Its LB policy now has 2 entries: `cassandra` and `10.16.0.2`.

Then, clusterB calls `refresh()` as well, with the same `contact_points` option, and as a result first purges the cluster's shm content, before inserting the following:

    10.16.0.1:[peer info]
    cassandra:[peer info]

Note that because of the round-robin DNS resolution, `cassandra` pointed to `10.16.0.2` this time.

Now, when clusterA invokes its LB policy to elect a peer for a given query, it will eventually look for `10.16.0.2`. However, such an entry does not exist in the cluster's shm anymore. Therefore, the following error is returned:

    no host details for 10.16.0.2

Proposed solution
-----------------

By changing the cache key of the peer's info in the shm from the specified `contact_point` value (which is the user's input) to the `listen_address` column of the `system.local` table, we no longer store host details by hostname.
This has the added benefit of ensuring all logs and other operations done by the Cluster module are always using the IP address of the node.

From #118
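The scenario described above can be reproduced with a small standalone simulation. This is only a sketch: the shm is modelled as a plain Lua table, and `nodes`, `refresh`, and `missing_hosts` are hypothetical helpers written for illustration, not functions of the Cluster module.

```lua
-- Hypothetical simulation of the race; none of these helpers exist in the
-- Cluster module. The shm is modelled as a plain Lua table.
local nodes = { "10.16.0.1", "10.16.0.2" }

-- Mimics refresh(): purge the shared shm, then store one entry per node.
-- `resolved_ip` is the node that DNS round-robin picked for the contact
-- point; `local_key` is the key under which that node's info is stored.
-- Returns the list of keys the calling instance's LB policy will use.
local function refresh(shm, resolved_ip, local_key)
  for k in pairs(shm) do shm[k] = nil end

  local lb_policy = {}
  for _, ip in ipairs(nodes) do
    local key = (ip == resolved_ip) and local_key or ip
    shm[key] = { ip = ip }
    lb_policy[#lb_policy + 1] = key
  end
  return lb_policy
end

-- Returns the LB policy keys that no longer have an entry in the shm.
local function missing_hosts(shm, lb_policy)
  local missing = {}
  for _, key in ipairs(lb_policy) do
    if not shm[key] then missing[#missing + 1] = key end
  end
  return missing
end

-- Before the fix: the local node is keyed by the contact point hostname.
local shm = {}
local policy_a = refresh(shm, "10.16.0.1", "cassandra") -- clusterA
refresh(shm, "10.16.0.2", "cassandra")                  -- clusterB
local before = missing_hosts(shm, policy_a)

-- After the fix: the local node is keyed by its listen_address.
shm = {}
policy_a = refresh(shm, "10.16.0.1", "10.16.0.1")       -- clusterA
refresh(shm, "10.16.0.2", "10.16.0.2")                  -- clusterB
local after = missing_hosts(shm, policy_a)

print(#before, before[1], #after)
```

With hostname keys, clusterA's policy still references `10.16.0.2` after clusterB's refresh has replaced that entry with `cassandra`. With IP keys, both instances write the same set of keys, so a purge by one cannot orphan an entry in the other's policy.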
diff --git a/lib/resty/cassandra/cluster.lua b/lib/resty/cassandra/cluster.lua
@@ -479,7 +479,7 @@ function _Cluster:refresh()
     if not coordinator then return nil, err end
 
     local local_rows, err = coordinator:execute [[
-      SELECT data_center,rpc_address,release_version FROM system.local
+      SELECT data_center,listen_address,release_version FROM system.local
     ]]
     if not local_rows then return nil, err end
 
@@ -493,7 +493,7 @@ function _Cluster:refresh()
     coordinator:setkeepalive()
 
     rows[#rows+1] = { -- local host
-      rpc_address = coordinator.host,
+      rpc_address = local_rows[1].listen_address,
       data_center = local_rows[1].data_center,
       release_version = local_rows[1].release_version
     }