
Commit 9aea5a9

Support deploying a cluster using stateful IP addresses
Stateful IPs are required to support load balancing using DNS round robin, because clients connect directly to a specific proxy instance via the instance's internal IP address. If a proxy instance fails and is replaced, the new instance needs to keep the same IP address as the original so that clients can reconnect.

Change-Id: I3646cdee6ab534e00d919abc87e044c544f6c103
1 parent bd8176c commit 9aea5a9

6 files changed

Lines changed: 68 additions & 10 deletions


deployment/README.md

Lines changed: 14 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -83,13 +83,14 @@ terraform apply
8383

8484
### Network Configuration
8585

86-
| Variable | Description | Required | Default |
87-
| ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | --------- |
88-
| NETWORK | The network name (VPC) to use for the deployment of the Knfsd Compute Engine Instances. | False | `default` |
89-
| SUBNETWORK | The subnetwork name (subnet) to use for the deployment of the Knfsd Compute Engine Instances. | False | `default` |
90-
| SUBNETWORK_PROJECT | The project that the subnetwork exists in. This only needs to be set if using a Shared VPC, where the subnetwork exists in a different project. Otherwise it defaults to the provider project. | False | null |
91-
| AUTO_CREATE_FIREWALL_RULES | Should firewall rules automatically be created to allow [healthcheck connectivity](https://cloud.google.com/load-balancing/docs/health-check-concepts#ip-ranges)? | False | `true` |
92-
| LOADBALANCER_IP | The IP address to use for the Internal Load Balancer. If not specified, a random IP address will be assigned within the subnet. | False | null |
86+
| Variable | Description | Required | Default |
87+
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | --------- |
88+
| NETWORK | The network name (VPC) to use for the deployment of the Knfsd Compute Engine Instances. | False | `default` |
89+
| SUBNETWORK | The subnetwork name (subnet) to use for the deployment of the Knfsd Compute Engine Instances. | False | `default` |
90+
| SUBNETWORK_PROJECT | The project that the subnetwork exists in. This only needs to be set if using a Shared VPC, where the subnetwork exists in a different project. Otherwise it defaults to the provider project. | False | null |
91+
| AUTO_CREATE_FIREWALL_RULES | Should firewall rules automatically be created to allow [healthcheck connectivity](https://cloud.google.com/load-balancing/docs/health-check-concepts#ip-ranges)? | False | `true` |
92+
| LOADBALANCER_IP | The IP address to use for the Internal Load Balancer. If not specified, a random IP address will be assigned within the subnet. | False | null |
93+
| ASSIGN_STATIC_IPS | If set to `true`, configures the MIG to use [stateful IP addresses](https://cloud.google.com/compute/docs/instance-groups/configuring-stateful-ip-addresses-in-migs). If an instance is replaced due to an update or a failing health check, the new instance keeps the same IP address as the original instance. | False | `false` |
9394
| ENABLE_HIGH_BANDWIDTH_CONFIGURATION | If set to `true` enables [gVNIC](https://cloud.google.com/compute/docs/networking/using-gvnic) and [Tier 1 Bandwidth](https://cloud.google.com/compute/docs/networking/configure-vm-with-high-bandwidth-configuration) for higher egress. When enabled, only N2, N2D, C2 or C2D VM's are supported. You should also make sure you [assign enough vCPU's](https://cloud.google.com/compute/docs/networking/configure-vm-with-high-bandwidth-configuration#bandwidth-tiers) to take advantage of this configuration. | False | null |
9495

9596
### Health Check Configuration
@@ -153,11 +154,16 @@ If using the NetApp Exports Auto-Discovery feature, please also read the [NetApp
153154
| CUSTOM_POST_STARTUP_SCRIPT | The path to a bash script to run after the [proxy-startup.sh](proxy-startup.sh) script. For example `file("/home/ben/myscript.sh")`. | False | empty script |
154155
| MACHINE_TYPE | The GCP Machine type to use for the Knfsd cache. Currently only N1 instances can be used. | False | `n1-highmem-16` |
155156
| MIG_MAX_UNAVAILABLE_PERCENT | The maximum number of instances that can be unavailable during automated MIG updates ([see docs](https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups#max_unavailable)). Defaults to 100% to ensure consistent cache instances within the MIG. | False | `100` |
156-
| MIG_REPLACEMENT_METHOD | The instance replacement method for managed instance groups. Valid values are: `RECREATE`, `SUBSTITUTE`.<br><br>If `SUBSTITUTE` (default), the group replaces VM instances with new instances that have randomly generated names. If `RECREATE`, instance names are preserved. You must also set `MIG_MAX_UNAVAILABLE_PERCENT` to be greater than 0 (default is already `100` so this only applies if you have modified this variable). | False | `SUBSTITUTE` |
157+
| MIG_REPLACEMENT_METHOD | The instance replacement method for managed instance groups. Valid values are: `RECREATE`, `SUBSTITUTE`.<br><br>If `SUBSTITUTE`, the group replaces VM instances with new instances that have randomly generated names. If `RECREATE`, instance names are preserved; you must also set `MIG_MAX_UNAVAILABLE_PERCENT` to be greater than 0 (the default is already `100`, so this only applies if you have modified this variable). | False | `SUBSTITUTE`, or `RECREATE` when `ASSIGN_STATIC_IPS` is `true` |
157158
| MIG_MINIMAL_ACTION | Minimal action to be taken on an instance. You can specify either RESTART to restart existing instances or REPLACE to delete and create new instances from the target template. If you specify a RESTART, the Updater will attempt to perform that action only. However, if the Updater determines that the minimal action you specify is not enough to perform the update, it might perform a more disruptive action. | False | `RESTART` |
158159
| ENABLE_KNFSD_AGENT | Should the [Knfsd Agent](../../image/knfsd-agent/README.md) be started at Proxy Startup? | False | `true` |
159160
| SERVICE_ACCOUNT | Service account the NFS proxy compute instances will run with. | False | See service account notes below |
160161

162+
The default `MIG_REPLACEMENT_METHOD` depends on `ASSIGN_STATIC_IPS`:
163+
164+
* When `ASSIGN_STATIC_IPS = false`, the default `MIG_REPLACEMENT_METHOD` is `SUBSTITUTE`.
165+
* When `ASSIGN_STATIC_IPS = true`, the default `MIG_REPLACEMENT_METHOD` is `RECREATE`.
166+
161167
#### Service Account Notes
162168

163169
The default `SERVICE_ACCOUNT` depends on `ENABLE_STACKDRIVER_METRICS`.
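
To illustrate the `ASSIGN_STATIC_IPS` option documented in the README changes above, a caller's module block might look roughly like the following. This is a sketch under assumptions: the module label `nfs_proxy`, the `source` path, and every value are placeholders, and the module's other required variables are omitted. Only the variable names are taken from this change.

```hcl
module "nfs_proxy" {
  # Hypothetical path; adjust to wherever terraform-module-knfsd lives.
  source = "./terraform-module-knfsd"

  PROJECT        = "my-project"    # placeholder
  ZONE           = "us-central1-a" # placeholder
  NETWORK        = "default"
  SUBNETWORK     = "default"
  PROXY_BASENAME = "nfsproxy"
  KNFSD_NODES    = 3 # placeholder

  # Keep each proxy's internal IP address when the MIG replaces the instance.
  ASSIGN_STATIC_IPS = true

  # MIG_REPLACEMENT_METHOD is deliberately left unset, so the module's
  # default for static IPs (RECREATE) applies.

  # ... other required module variables omitted ...
}
```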

deployment/terraform-module-knfsd/compute.tf

Lines changed: 23 additions & 1 deletion
@@ -159,22 +159,37 @@ resource "google_compute_health_check" "autohealing" {
159159
port = "2049"
160160
}
161161

162+
depends_on = [
163+
# Ensure that the firewall rules are not deleted while the health check
164+
# still exists. Otherwise when removing clusters, Terraform may delete the
165+
# firewall rule causing the proxy group to start replacing instances.
166+
# Terraform will then get stuck waiting for the instance group to complete
167+
# the changes before removing the instance group.
168+
google_compute_firewall.allow-tcp-healthcheck
169+
]
162170
}
163171

164172
# Instance Group Manager for the Knfsd Nodes
165173
resource "google_compute_instance_group_manager" "proxy-group" {
174+
provider = google-beta # required to support stateful_internal_ip
175+
166176
project = var.PROJECT
167177
name = "${var.PROXY_BASENAME}-group"
168178
base_instance_name = var.PROXY_BASENAME
169179
zone = var.ZONE
170180
// Set the Target Size to null if autoscaling is enabled
171181
target_size = (var.ENABLE_KNFSD_AUTOSCALING == true ? null : var.KNFSD_NODES)
172182

183+
# when using static IPs, wait for all the instances to be updated so that the
184+
# IPs of the Compute Instances can be fetched using the instance_ips module.
185+
wait_for_instances = var.ASSIGN_STATIC_IPS
186+
wait_for_instances_status = "UPDATED"
187+
173188
update_policy {
174189
type = "PROACTIVE"
175190
minimal_action = var.MIG_MINIMAL_ACTION
176191
max_unavailable_percent = var.MIG_MAX_UNAVAILABLE_PERCENT
177-
replacement_method = var.MIG_REPLACEMENT_METHOD
192+
replacement_method = coalesce(var.MIG_REPLACEMENT_METHOD, local.MIG_REPLACEMENT_METHOD_DEFAULT)
178193
}
179194

180195
version {
@@ -191,6 +206,13 @@ resource "google_compute_instance_group_manager" "proxy-group" {
191206
}
192207
}
193208

209+
dynamic "stateful_internal_ip" {
210+
for_each = toset(var.ASSIGN_STATIC_IPS ? ["nic0"] : [])
211+
content {
212+
interface_name = stateful_internal_ip.value
213+
delete_rule = "ON_PERMANENT_INSTANCE_DELETION"
214+
}
215+
}
194216
}
195217

196218
# Firewall rule to allow healthchecks from the GCP Healthcheck ranges
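
Because the instance group manager above now sets `provider = google-beta`, the calling configuration may need that provider declared and configured. A minimal sketch, assuming the standard HashiCorp registry sources; the project and region values are placeholders, and no version constraint is shown because the diff does not state one.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    google-beta = {
      source = "hashicorp/google-beta"
    }
  }
}

# Placeholder settings; reuse whatever the existing "google" provider uses.
provider "google-beta" {
  project = "my-project"
  region  = "us-central1"
}
```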

deployment/terraform-module-knfsd/main.tf

Lines changed: 1 addition & 0 deletions
@@ -25,4 +25,5 @@ locals {
2525
var.ENABLE_STACKDRIVER_METRICS ? ["logging-write", "monitoring-write"] :
2626
[]
2727
)
28+
MIG_REPLACEMENT_METHOD_DEFAULT = var.ASSIGN_STATIC_IPS ? "RECREATE" : "SUBSTITUTE"
2829
}
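
The way this local combines with `coalesce` in compute.tf can be tried in isolation. The following standalone configuration is a sketch, not part of the commit: the local `effective_replacement_method` and the output are names invented for the demonstration. It relies on Terraform's `coalesce` skipping both `null` and empty strings, which is why the variable's new default of `""` falls through to the computed default.

```hcl
variable "ASSIGN_STATIC_IPS" {
  type    = bool
  default = false
}

variable "MIG_REPLACEMENT_METHOD" {
  type    = string
  default = "" # empty means "use the computed default"
}

locals {
  MIG_REPLACEMENT_METHOD_DEFAULT = var.ASSIGN_STATIC_IPS ? "RECREATE" : "SUBSTITUTE"

  # Hypothetical helper local: an explicit value wins, otherwise the default.
  effective_replacement_method = coalesce(
    var.MIG_REPLACEMENT_METHOD,
    local.MIG_REPLACEMENT_METHOD_DEFAULT,
  )
}

output "effective_replacement_method" {
  value = local.effective_replacement_method
}
```

With the defaults shown, the output resolves to `SUBSTITUTE`; applying with `-var 'ASSIGN_STATIC_IPS=true'` gives `RECREATE`, and any non-empty `MIG_REPLACEMENT_METHOD` overrides both.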

deployment/terraform-module-knfsd/outputs.tf

Lines changed: 15 additions & 0 deletions
@@ -25,3 +25,18 @@ output "nfsproxy_loadbalancer_dnsaddress" {
2525
description = "The internal dns entry address for the nfsProxy load balancer:"
2626
value = one(module.loadbalancer.*.dns_name)
2727
}
28+
29+
output "instance_group" {
30+
description = "Full URL of the KNFSD proxy instance group."
31+
value = google_compute_instance_group_manager.proxy-group.instance_group
32+
}
33+
34+
output "instance_group_manager" {
35+
description = "Full URL of the KNFSD proxy instance group manager."
36+
value = google_compute_instance_group_manager.proxy-group.self_link
37+
}
38+
39+
output "instance_group_name" {
40+
description = "Name of the KNFSD proxy instance group."
41+
value = google_compute_instance_group_manager.proxy-group.name
42+
}
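
A root module that calls this module could re-export the new outputs, for example to feed them into tooling that looks up the instances in the group. This fragment is a sketch: `nfs_proxy` is a hypothetical module label and the output names on the left are invented; only `instance_group` and `instance_group_name` come from the diff above.

```hcl
# Fragment for a root module that already declares: module "nfs_proxy" { ... }
output "proxy_instance_group" {
  description = "Full URL of the KNFSD proxy instance group."
  value       = module.nfs_proxy.instance_group
}

output "proxy_instance_group_name" {
  description = "Name of the KNFSD proxy instance group."
  value       = module.nfs_proxy.instance_group_name
}
```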

deployment/terraform-module-knfsd/variables.tf

Lines changed: 6 additions & 1 deletion
@@ -69,6 +69,11 @@ variable "SUBNETWORK" {
6969
type = string
7070
}
7171

72+
variable "ASSIGN_STATIC_IPS" {
73+
default = false
74+
type = bool
75+
}
76+
7277
variable "PROXY_BASENAME" {
7378
default = "nfsproxy"
7479
type = string
@@ -272,7 +277,7 @@ variable "MIG_MAX_UNAVAILABLE_PERCENT" {
272277
}
273278

274279
variable "MIG_REPLACEMENT_METHOD" {
275-
default = "SUBSTITUTE"
280+
default = ""
276281
type = string
277282
}
278283
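
Since the empty string now means "use the computed default", a typo in `MIG_REPLACEMENT_METHOD` would only surface when the API rejects it. One possible hardening, which is not part of this commit, is a `validation` block on the variable. The sketch below assumes a Terraform version that supports variable validation.

```hcl
variable "MIG_REPLACEMENT_METHOD" {
  default = ""
  type    = string

  # Hypothetical addition: accept only the documented values or the empty default.
  validation {
    condition     = contains(["", "RECREATE", "SUBSTITUTE"], var.MIG_REPLACEMENT_METHOD)
    error_message = "MIG_REPLACEMENT_METHOD must be RECREATE, SUBSTITUTE, or empty to use the default."
  }
}
```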

docs/changes/changelog.md

Lines changed: 9 additions & 0 deletions
@@ -2,6 +2,7 @@
22

33
* Change the default build machine type to c2-standard-16
44
* Update to the latest FS-Cache performance patches (v11)
5+
* Assign static IPs to proxy instances
56

67
## Change the default build machine type to c2-standard-16
78

@@ -22,6 +23,14 @@ This includes the following patch sets:
2223
* vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing (v5)
2324
<https://lore.kernel.org/linux-kernel/217595.1662033775@warthog.procyon.org.uk/>
2425

26+
## Assign static IPs to proxy instances
27+
28+
Added a new configuration option, `ASSIGN_STATIC_IPS`. This configures the MIG to use [stateful IP addresses](https://cloud.google.com/compute/docs/instance-groups/configuring-stateful-ip-addresses-in-migs).
29+
30+
When using stateful IP addresses, if an instance needs to be replaced due to an update or auto-healing, the new instance will have the same IP address as the original instance.
31+
32+
This allows using the cluster without a load balancer, where clients connect directly to a specific proxy instance via the instance's internal IP address.
33+
2534
# v1.0.0-beta3
2635

2736
* Temporary fix for cachefilesd intermittently terminating
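
The "Assign static IPs to proxy instances" entry above describes running the cluster without a load balancer, and the commit message names DNS round robin as the motivating case. As a rough sketch of what that could look like, a single Cloud DNS `A` record can list every proxy address. This is not part of the commit: the zone name, record name, TTL, and IP addresses are all placeholders, and in practice the addresses would be the stateful internal IPs of the proxy instances.

```hcl
resource "google_dns_record_set" "nfs_proxy" {
  managed_zone = "example-private-zone"       # placeholder: an existing Cloud DNS managed zone
  name         = "nfsproxy.example.internal." # placeholder: must end with a dot
  type         = "A"
  ttl          = 60

  # Placeholder addresses; one entry per proxy instance.
  rrdatas = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
}
```

Clients resolving the name receive all of the addresses and connect to one of them directly, which is why each proxy has to keep its address when it is replaced.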
