Describe the bug
The MaxRetryTimeout setting is only honored in a very limited set of code paths in the SDK.
As a result, the SDK does not retry transient errors such as 429 rate throttling and instead returns the error to the caller.
The upstream provider https://github.com/vmware/terraform-provider-vcd/ also does not address transient errors, such as 429.
Reproduction steps
- Create multiple https://registry.terraform.io/providers/vmware/vcd/latest/docs/resources/nsxt_distributed_firewall_rule instances or other resources, e.g. 40 rules
- Run a Terraform plan. The refresh stage will eventually hit 429 errors: the default rate limit is 100 requests per second, and each read of this resource makes at least 3 API calls, so 40 rules produce at least 120 API calls that Terraform will try to parallelise.
- The refresh will fail and the plan will fail with an error like this:
Error: [nsxt dynamic security group read] error getting NSX-T dynamic security group: error in HTTP GET request: INTERNAL_SERVER_ERROR - [ 248-2024-10-16-10-28-45-133--92724528-ae2c-4e41-94c8-81cc14e8b9cc ] Forbidden: Error occurred in the backing network provider: Client 'cloudadmin' exceeded request rate of 100 per second, error code 102
Expected behavior
When a 429 is encountered and MaxRetryTimeout is set, the SDK should retry in a loop until it receives a good status code or times out.
Additional context
Limiting the parallelism in the Terraform plan resolves the issue, but the consumer of the provider should not have to deal with that, nor with the very slow plan times that result as the number of managed resources grows. Unlike a properly implemented retry, it is also not a guarantee.