RFC 886: Making Hotswap Faster #887
tmokmss left a comment:
Thanks for putting this together! I use hotswap/watch regularly and I'm excited to see the coverage expanding. The asset bundling and synthesis improvements are especially great. A few comments on specific sections below.
> #### Broader Resource Coverage with AWS Cloud Control API
> The primary driver behind this improvement is the introduction of a new hotswap engine built on the AWS Cloud Control API (CCAPI).

CCAPI is known to be slow compared to direct service APIs (some benchmarks). For hotswap, where speed is the whole point, I think SDK calls are still a viable option for new resource types too. With coding agents these days, the maintenance cost of SDK implementations is not as high as it used to be. Have you run any benchmarks comparing the two for the target resource types?
@ShadowCat567 if this is true, the proposal should be clearer about it: we will maintain and add bespoke SDK implementations in specific use cases where quantitative benchmarks against CCAPI justify them, but we feel the CCAPI implementation will be adequate for most use cases.
Yes, that is the idea. I'll update this section.
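For context on what a CCAPI-based engine has to produce: Cloud Control API's `UpdateResource` operation takes the requested changes as a JSON Patch (RFC 6902) document rather than a full property set. The sketch below is a minimal illustration, assuming the engine diffs only the top-level properties of one resource between the old and new templates; `buildPatch` and `pointerSegment` are hypothetical helpers, not part of the CDK CLI. A real engine would also have to respect properties that CCAPI cannot update in place (create-only properties in the resource schema).

```typescript
// A single RFC 6902 JSON Patch operation (only the ops this sketch emits).
type PatchOp =
  | { op: "add"; path: string; value: unknown }
  | { op: "replace"; path: string; value: unknown }
  | { op: "remove"; path: string };

type Props = Record<string, unknown>;

// Escape a property name as a JSON Pointer (RFC 6901) segment.
// "~" must be escaped before "/" so the "~" introduced by "~1" is not re-escaped.
function pointerSegment(name: string): string {
  return name.replace(/~/g, "~0").replace(/\//g, "~1");
}

// Hypothetical helper: diff top-level properties and emit a JSON Patch.
// Nested values are replaced wholesale instead of being diffed recursively,
// and the JSON.stringify comparison is sensitive to object key order.
function buildPatch(oldProps: Props, newProps: Props): PatchOp[] {
  const ops: PatchOp[] = [];
  for (const [name, value] of Object.entries(newProps)) {
    const path = `/${pointerSegment(name)}`;
    if (!(name in oldProps)) {
      ops.push({ op: "add", path, value });
    } else if (JSON.stringify(oldProps[name]) !== JSON.stringify(value)) {
      ops.push({ op: "replace", path, value });
    }
  }
  for (const name of Object.keys(oldProps)) {
    if (!(name in newProps)) {
      ops.push({ op: "remove", path: `/${pointerSegment(name)}` });
    }
  }
  return ops;
}

// Example: an AWS::SQS::Queue whose properties changed between two synths.
// JSON.stringify(patch) is what would be sent as UpdateResource's PatchDocument.
const patch = buildPatch(
  { VisibilityTimeout: 30, DelaySeconds: 0 },
  { VisibilityTimeout: 60, MessageRetentionPeriod: 86400 },
);
console.log(JSON.stringify(patch));
```

Emitting only the changed paths keeps the request small and leaves untouched properties alone on the live resource.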
```typescript
for (const propName of classifiedChanges.namesOfHotswappableProps) {
  const newValue = await evaluateCfnTemplate.evaluateCfnExpression(
    change.propertyUpdates[propName].newValue,
  );
  // ...
}
```
A major pain point of the current hotswap implementation is that intrinsic function resolution is very limited. For example, this makes ECS Task Definitions non-hotswappable in many real-world cases (see aws/aws-cdk#26061 and aws/aws-cdk#25563).
Is there any fundamental approach to this problem in this RFC? For example, fetching resolved values from already-deployed resources instead of trying to evaluate them on the CLI side.
I was not previously aware of this. Now that you bring it up, I think it is a good idea to include it in the set of improvements. Thank you for highlighting this issue; I will be looking into it!
This is required work to get the CCAPI engine to work (and work well), will be calling this out in the RFC.
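The reviewer's suggestion of fetching resolved values from already-deployed resources, instead of evaluating them on the CLI side, can be sketched as a recursive resolver over template values. This is a minimal sketch under stated assumptions: `DeployedLookup` is a hypothetical structure already populated from the deployed stack (physical IDs and live attribute values), only `Ref`, `Fn::GetAtt`, and `Fn::Join` are handled, and none of this is the CLI's actual `evaluateCfnExpression`.

```typescript
// Hypothetical lookup populated from the already-deployed stack rather than
// evaluated from the template on the CLI side.
interface DeployedLookup {
  physicalIds: Record<string, string>; // logical ID -> physical ID (what Ref returns for most resource types)
  attributes: Record<string, Record<string, string>>; // logical ID -> attribute name -> value
}

class UnresolvableError extends Error {}

function resolve(value: unknown, deployed: DeployedLookup): unknown {
  if (Array.isArray(value)) {
    return value.map((v) => resolve(v, deployed));
  }
  if (value === null || typeof value !== "object") {
    return value; // strings, numbers and booleans pass through unchanged
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj);
  const fn = keys.length === 1 ? keys[0] : undefined;

  if (fn === "Ref") {
    const id = deployed.physicalIds[String(obj.Ref)];
    if (id === undefined) throw new UnresolvableError(`Ref ${String(obj.Ref)}`);
    return id;
  }
  if (fn === "Fn::GetAtt") {
    // Assumes the array form [logicalId, attributeName] used in synthesized templates.
    const [logicalId, attr] = obj["Fn::GetAtt"] as [string, string];
    const attrValue = deployed.attributes[logicalId]?.[attr];
    if (attrValue === undefined) throw new UnresolvableError(`GetAtt ${logicalId}.${attr}`);
    return attrValue;
  }
  if (fn === "Fn::Join") {
    const [separator, parts] = obj["Fn::Join"] as [string, unknown[]];
    return parts.map((p) => String(resolve(p, deployed))).join(separator);
  }
  if (fn !== undefined && fn.startsWith("Fn::")) {
    // Other intrinsics (Fn::Sub, Fn::If, and so on) are out of scope for this sketch.
    throw new UnresolvableError(`unsupported intrinsic ${fn}`);
  }
  // Plain object (a nested property bag): resolve each member.
  const out: Record<string, unknown> = {};
  for (const k of keys) out[k] = resolve(obj[k], deployed);
  return out;
}

// Example: container environment variables that reference other resources.
const deployed: DeployedLookup = {
  physicalIds: { MyBucket: "my-bucket-abc123" },
  attributes: { MyRole: { Arn: "arn:aws:iam::123456789012:role/my-role" } },
};
const env = resolve(
  {
    BUCKET: { Ref: "MyBucket" },
    ROLE_ARN: { "Fn::GetAtt": ["MyRole", "Arn"] },
    URL: { "Fn::Join": ["", ["s3://", { Ref: "MyBucket" }, "/assets"]] },
  },
  deployed,
);
console.log(env);
```

Throwing on anything it cannot resolve lets the caller classify the change as non-hotswappable instead of pushing a half-resolved value.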
> #### Plugin system with customer-created plugins
> This was the proposal in the initial GitHub issue about "hotswap anything" (https://github.com/aws/aws-cdk-cli/issues/882).

One of the appealing aspects of the plugin architecture was that custom resource authors could provide their own hotswap logic. This matters for resources like BucketDeployment where the update behavior is defined by the construct library. I hope this option is still on the table.
At least I don't see an argument right now for why we can't expose a plugin architecture as a use-at-your-own-risk feature, in addition to what else we are supporting.
Is there a technical limitation that stops us from allowing plugins?
Implementing hotswap plugins and implementing additional resource coverage via CCAPI are not mutually exclusive. This RFC is more focused on how we can improve hotswap performance for the most users. I'm not sure how many users would be willing to write their own plugins, but if there is interest from people like @tmokmss, then we can look into adding a plugin system as well.
I would like to be very clear on the use case and perceived benefits of a plugin architecture.
For non-custom resources, we are saying that we will have a CCAPI handler if that gets us at least a 2x speed improvement over CloudFormation. It will be hard to do better than CCAPI using raw SDK calls, which means that you should not have much reason to have a custom handler for any regular resource type: you will be within 2x of the theoretical max performance regardless, which should be good enough for most development flows.
After that, there are custom resources. But the "BucketDeployment" custom resource, for example, doesn't seem like a strong example for a plug-in architecture yet: that's a CDK-provided custom resource, and I think a good argument could be made that a good local implementation of it should just be provided with the CDK CLI out of the box. It will be so commonly used in local development that it makes perfect sense for us to take on this lift, and punting the responsibility to users is the wrong decision there.
After that one common custom resource, what remains? A custom resource that gets triggered so often that it is worth specifically optimizing for, because the ~10 additional seconds it takes to wait for CloudFormation are too much to bear? I would like to see some good examples before we take that on.
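The "at least 2x faster than CloudFormation" threshold above amounts to a comparison over benchmark measurements per resource type. A minimal sketch, assuming wall-clock samples are collected for both a CloudFormation deployment and a CCAPI hotswap of the same change; `BenchmarkResult`, `median`, and `qualifiesForCcapi` are hypothetical names, and the sample numbers are made up for illustration, not measured.

```typescript
interface BenchmarkResult {
  resourceType: string;
  cfnMs: number[]; // wall-clock durations of CloudFormation deployments
  ccapiMs: number[]; // wall-clock durations of CCAPI-based hotswaps
}

function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("no samples");
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// A resource type qualifies when CloudFormation takes at least `minSpeedup`
// times as long as CCAPI (medians are used to damp outliers).
function qualifiesForCcapi(b: BenchmarkResult, minSpeedup = 2): boolean {
  return median(b.cfnMs) / median(b.ccapiMs) >= minSpeedup;
}

// Illustrative numbers only, not real benchmark data.
const queue: BenchmarkResult = {
  resourceType: "AWS::SQS::Queue",
  cfnMs: [31000, 29000, 35000],
  ccapiMs: [9000, 11000, 10000],
};
console.log(queue.resourceType, qualifiesForCcapi(queue));
```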
There are more custom resources, and then there will be userland custom resources. I also see a lot of value in integrating bundling with hotswap. This will be hard for us to optimize generally without knowing a specific app. Users do know their apps and can optimize for them.
@tmokmss There currently is a hotswap implementation for BucketDeployments (https://github.com/aws/aws-cdk-cli/blob/main/packages/%40aws-cdk/toolkit-lib/lib/api/hotswap/s3-bucket-deployments.ts), what kind of improvements would you like to see from it that we could do without introducing a plugin interface?
Also with a plugin interface in general would your main vision for its use be adding support for hotswapping custom resources? Or do you have other ideas for what you would want out of it?
> would your main vision for its use be adding support for hotswapping custom resources?

Yes. Although I know this is a small use case, I maintain a construct library named deploy-time-build (recently migrated under cdklabs). It contains a custom resource to build certain assets like Node.js apps or container images at deploy time. With a plugin architecture, I could provide a hotswap handler that runs the build locally instead of going through CloudFormation, which would be significantly faster when making frequent changes to the source code being built.
That said, I understand we need to prioritize. If I had to pick one thing, I'd rather see the intrinsic function resolution issue (#887 (comment)) addressed first, as that's a bigger blocker in my day-to-day workflow.
Improvements to intrinsic function resolution are actually required work to get the CCAPI hotswap implementation to work, so they will be part of this RFC. The plugin system will probably be implemented at some point in the future, but it will not be part of this RFC. This comment may be interesting to you: #887 (comment)
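For illustration only, a plugin contract along the lines discussed in this thread might look like the sketch below. Every name here (`HotswapHandler`, `HotswapRegistry`, `Custom::MyBuild`) is hypothetical; nothing like this exists in the CDK CLI today, and the plugin system is explicitly out of scope for this RFC. The registry returns a handler only when it covers every changed property, so a partially covered change is treated as non-hotswappable rather than split.

```typescript
// Hypothetical shape of a resource change handed to a handler.
interface ResourceChange {
  logicalId: string;
  resourceType: string;
  changedProps: Record<string, unknown>;
}

// Hypothetical plugin contract: a handler declares which resource type it owns,
// which properties it can update in place, and how to apply the update.
interface HotswapHandler {
  resourceType: string;
  hotswappableProps: ReadonlySet<string>;
  apply(change: ResourceChange): Promise<void>;
}

class HotswapRegistry {
  private readonly handlers = new Map<string, HotswapHandler>();

  // One handler per resource type, so ownership is never ambiguous.
  register(handler: HotswapHandler): void {
    if (this.handlers.has(handler.resourceType)) {
      throw new Error(`handler already registered for ${handler.resourceType}`);
    }
    this.handlers.set(handler.resourceType, handler);
  }

  // Returns the handler only if it covers every changed property; undefined
  // means the change is non-hotswappable and goes through CloudFormation.
  find(change: ResourceChange): HotswapHandler | undefined {
    const handler = this.handlers.get(change.resourceType);
    if (!handler) return undefined;
    const covered = Object.keys(change.changedProps).every((p) =>
      handler.hotswappableProps.has(p),
    );
    return covered ? handler : undefined;
  }
}

// Example: a construct library registering a local-build handler for its custom resource.
const registry = new HotswapRegistry();
registry.register({
  resourceType: "Custom::MyBuild",
  hotswappableProps: new Set(["SourceHash"]),
  async apply(change) {
    console.log(`rebuilding ${change.logicalId} locally`);
  },
});
```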
> Instead we will add hotswap support for a minimal set of resource types that are most commonly used in practical iterative deployment activities.
> Resources that will be supported by hotswap must meet the following criteria:
> 1. Hotswappable resources are in the top 80% of resources that are included in hotswap deployments but are not currently hotswappable.

Just out of curiosity, could you include the list of resources that would be covered under this criteria?
Some examples that I have been looking at are Bedrock Agents, DynamoDB Tables, SQS Queues, CloudWatch Alarms and Dashboards, SNS Topics and Subscriptions, and REST APIs from API Gateway. Let me know if there are any resource types you would particularly like to see; I will be adding a list to the RFC.
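The "top 80%" criterion can be read as a cumulative-share cutoff: sort resource types by how often they show up as non-hotswappable changes in hotswap deployments, then take types until they account for 80% of those changes. A minimal sketch, assuming per-type counts are available from some usage data source; `topShare` is a hypothetical name and the counts below are made up for illustration.

```typescript
// Returns the smallest prefix of resource types (most frequent first) whose
// combined count reaches `share` of all non-hotswappable changes.
function topShare(counts: Record<string, number>, share = 0.8): string[] {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const sorted = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  const selected: string[] = [];
  let running = 0;
  for (const [type, n] of sorted) {
    if (running >= share * total) break; // cutoff already reached
    selected.push(type);
    running += n;
  }
  return selected;
}

// Illustrative numbers only, not real usage data.
const counts = {
  "AWS::DynamoDB::Table": 40,
  "AWS::SQS::Queue": 25,
  "AWS::SNS::Topic": 20,
  "AWS::CloudWatch::Alarm": 10,
  "AWS::Bedrock::Agent": 5,
};
console.log(topShare(counts));
```

With these made-up counts, the first three types cover 85 of 100 changes, so the last two fall outside the cutoff.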
> does not know about the last hotswap deployment that happened.
> This leads to diffs that include changes that have already been hotswapped.
> This incurs a performance penalty over time, since the time it takes for a `--hotswap` deployment to complete is proportional to the number of changed resources.
> To address this problem we are saving the CloudFormation template synthesized from the most recent successful hotswap deployment

Is this saved locally or globally? What if multiple people are hotswapping at the same time? Can this lead to drift between local versions of the same code?
This state is saved locally.
This proposal does not solve the problem of multiple people attempting to run hotswap in the same environment at the same time. Multiple people making changes to the same stack at the same time would be complicated to resolve even if you were making changes through full CloudFormation deployments.
I do not think hotswap is the right place to resolve issues stemming from multiple people using the same environment to make changes to the same resources. The solution to this would be to use isolated environments or coordinate to work on different stacks if you are developing in the same environment.
> To address this problem we are saving the CloudFormation template synthesized from the most recent successful hotswap deployment
> so we can refer back to it when new changes are made, instead of referring back to the CloudFormation template from the last full deployment.
> These hotswap templates are wiped when a CloudFormation deployment happens, and they do not attempt to alter
> or replace the CloudFormation template from the last successful deployment.

Same question here. How do we deal with multiple people using hotswap on the same environment?
See the answer to the previous question.
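The baseline rule in the quoted RFC text (diff against the template from the most recent successful hotswap deployment, and discard that template whenever a full CloudFormation deployment happens) can be sketched as a small piece of per-stack state. This is a minimal sketch: `HotswapBaselineStore` and its methods are hypothetical names, and the state is kept in memory here, whereas the thread states it is saved locally, so a real implementation would persist it.

```typescript
type Template = Record<string, unknown>;

class HotswapBaselineStore {
  // stack name -> template synthesized for the most recent successful hotswap deployment
  private readonly hotswapTemplates = new Map<string, Template>();

  // Template to diff the newly synthesized template against: the last hotswap
  // template if there is one, otherwise the template from the last full deployment.
  baseline(stackName: string, deployedTemplate: Template): Template {
    return this.hotswapTemplates.get(stackName) ?? deployedTemplate;
  }

  // Record the synthesized template after a hotswap deployment succeeds.
  recordHotswap(stackName: string, synthesized: Template): void {
    this.hotswapTemplates.set(stackName, synthesized);
  }

  // A full CloudFormation deployment makes the saved hotswap template obsolete.
  // The deployed CloudFormation template itself is never touched here.
  recordCloudFormationDeploy(stackName: string): void {
    this.hotswapTemplates.delete(stackName);
  }
}

// Example flow: full deploy (v1) -> hotswap (v2) -> full deploy.
const store = new HotswapBaselineStore();
const deployedTemplate: Template = { Resources: { Fn: { Properties: { Code: "v1" } } } };
const afterHotswap: Template = { Resources: { Fn: { Properties: { Code: "v2" } } } };
store.recordHotswap("MyStack", afterHotswap);
const baselineAfterHotswap = store.baseline("MyStack", deployedTemplate); // afterHotswap
store.recordCloudFormationDeploy("MyStack");
const baselineAfterFullDeploy = store.baseline("MyStack", deployedTemplate); // deployedTemplate
```

Because the next diff starts from `afterHotswap`, the already-hotswapped `Code` change does not reappear, which is the performance problem the quoted text describes.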
Any plans for rollback? I'm curious how we deal with a broken state.
No. Rollback would slow down hotswap (it is part of the reason CloudFormation deployments are currently so slow). If a resource ends up in a broken state, the expectation is that you perform subsequent deployments to fix it.
Additionally, the current implementation of hotswap disables rollback by default, and we intend to keep that behavior: https://docs.aws.amazon.com/cdk/v2/guide/cli.html#cli-deploy
I see! Do we have a current workaround for broken state? How does a customer get out of it?
> * **Optimized asset handling**: Assets are now rebuilt only when necessary, and the `cdk synth` step is skipped when only asset files have changed.
>   This reduces pre-deployment overhead.

Question: am I correct in thinking that this improvement is not specific to hotswap and will improve `cdk synth` times for regular deployments as well? If so, it should be celebrated as such and not as an afterthought in the hotswap improvements.
It should. Additionally, any changes made to improve `cdk synth` would also improve the performance of hotswap.
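Rebuilding assets "only when necessary" and skipping synth when only asset files changed both come down to fingerprinting inputs and comparing against the last known fingerprints. A minimal sketch, assuming the CLI can already tell whether CDK app source changed and which files belong to which asset; the `ChangeSet` shape and `plan` function are hypothetical, and FNV-1a is used only as a dependency-free stand-in for a cryptographic hash such as SHA-256. This is not the CLI's actual mechanism.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units; a stand-in for a cryptographic hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Fingerprint a set of files (path -> contents) independently of key order.
function fingerprint(files: Record<string, string>): string {
  const parts = Object.keys(files)
    .sort()
    .map((path) => `${path}\u0000${files[path]}`);
  return fnv1a(parts.join("\u0000"));
}

interface ChangeSet {
  appSourceChanged: boolean; // did the CDK app code (stack definitions) change?
  assetFiles: Record<string, Record<string, string>>; // asset id -> current files
}

// Synth only runs when app source changed; an asset is rebuilt only when its
// fingerprint differs from the last recorded one (or none was recorded).
function plan(changes: ChangeSet, lastFingerprints: Record<string, string>) {
  const rebuild = Object.entries(changes.assetFiles)
    .filter(([id, files]) => lastFingerprints[id] !== fingerprint(files))
    .map(([id]) => id);
  return { runSynth: changes.appSourceChanged, rebuild };
}

// Example: only the Lambda handler source changed since the last deployment.
const last = {
  fnAsset: fingerprint({ "index.js": "v1" }),
  webAsset: fingerprint({ "index.html": "<h1>hi</h1>" }),
};
const p = plan(
  {
    appSourceChanged: false,
    assetFiles: { fnAsset: { "index.js": "v2" }, webAsset: { "index.html": "<h1>hi</h1>" } },
  },
  last,
);
console.log(p);
```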
> Current behavior will remain the same:
```
$ cdk deploy --hotswap --hotswap-fallback
```
This is not the current behavior. You cannot set `--hotswap` and `--hotswap-fallback` at the same time.
We currently have two fallback options:

* `--hotswap`: ignore non-hotswappable changes and report success
* `--hotswap-fallback`: fall back to a full deployment of all changes when any non-hotswappable change exists

If in the future we wanted to support additional fallback modes, we'd likely add something like `--hotswap-another-fallback` as a separate `cdk deploy` option. It sounds like you are proposing a different API here, but you also seem a bit confused about how it works today.
This is where I was getting confused. What I am suggesting is a different API where all of the fallback options are grouped together and you specify that flag alongside cdk deploy --hotswap to configure the fallback mode.
Ended up removing this section after further research, see: #887 (comment)
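For reference, the two current modes described above (`--hotswap` and `--hotswap-fallback`) can be modeled as a small decision function. This is a sketch of the described behavior, not the CLI's code; `HotswapMode`, `Action`, and `decide` are hypothetical names.

```typescript
// Hypothetical names modeling the two mutually exclusive CLI flags.
type HotswapMode = "hotswap-only" | "fall-back"; // --hotswap | --hotswap-fallback

type Action =
  | { kind: "hotswap"; skipped: string[] } // hotswap what we can, skip the rest
  | { kind: "full-deploy" }; // run a full CloudFormation deployment instead

function decide(mode: HotswapMode, nonHotswappable: string[]): Action {
  if (nonHotswappable.length === 0) {
    return { kind: "hotswap", skipped: [] };
  }
  // --hotswap: ignore non-hotswappable changes and report success.
  if (mode === "hotswap-only") {
    return { kind: "hotswap", skipped: nonHotswappable };
  }
  // --hotswap-fallback: any non-hotswappable change triggers a full deployment.
  return { kind: "full-deploy" };
}

console.log(decide("hotswap-only", ["MyTable"]));
console.log(decide("fall-back", ["MyTable"]));
```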
> Changes to resource types not on this allow list will be classified as non-hotswappable changes.
> [See the appendix for a prototype of the CCAPI hotswap engine](#ai-generated-implementation-of-ccapi-hotswap-engine)
> #### Hotswapping Assets
This section doesn't really describe a technical solution.
This has been updated.
> * feat(hotswap): hotswap now covers significantly more resource types due to using a CCAPI-based deployment engine
> * feat(hotswap): asset hotswapping has been improved
> * feat(cli): by default hotswap deployments fallback to an AWS CloudFormation deployment

This seems to be a breaking change. What is the rollout plan here?
I went back to research the history of hotswap fallback and decided that it would be better for us to leave it alone unless we get a faster deployment mode through CloudFormation. In that case, the `--hotswap-fallback` flag would be changed to use that deployment mode.
The current `cdk deploy --hotswap` fallback behavior is fine; this suggestion was an attempt to fix a problem that was based on a misunderstanding of how it worked.
> ### CHANGELOG
> * feat(hotswap): hotswap now covers significantly more resource types due to using a CCAPI-based deployment engine
> * feat(hotswap): asset hotswapping has been improved

Updated this by splitting it into two changelog annotations: one for asset bundling and one for skipping synth.
> This would let customers add hotswap support for their own resources without waiting for the CDK team. However, it shifts that responsibility to the community.
> While some community members would write and share hotswap implementations for new resource types, only those who write their own or
> find publicly shared ones would benefit.
> Which makes this an incomplete solution to the speed limitations that come from hotswapping non-hotswappable resources.

While there are good reasons against a plugin API, I don't think this is one of them. Elsewhere we are stating that not all resources are supported by CCAPI and that we will support resources that "are in the top 80% of resources that are included in hotswap deployments but are not currently hotswappable". Both explicitly exclude resources and thus make this RFC an incomplete solution as well.
This is true. The goal is not "100% coverage". The goal is: fast iteration times for local development.
Summary/Conclusions from an offline discussion with @rix0rrr and @mrgrain about whether plugins should be included in this RFC:
Even if this RFC does not include plugins for hotswap, we should design it in a way that makes adding hotswap plugins straightforward in the future. The main concerns with adding a plugin system to this project are that (1) it adds a lot of work without much progress toward the main success metric (make hotswap faster for most people), and (2) the packaging and representation of common types that will cross the plugin border, such as types from the SDK, could be more complicated and offer more opportunity for breakage than we expect, especially since we have not yet gotten into the weeds of how exactly this will work.
We should consider making a pseudo plugin system, so we know what a full plugin system would need, and build the hotswap architecture improvements this RFC suggests around it, so that if/when we implement a plugin interface for hotswap it can slot in cleanly.
This RFC should also make clearer that the goal of hotswap is speed rather than coverage.
RFC is now in Final Comments period for 1 week.
> The new CCAPI-based engine takes a different approach:
> for any change that is not already handled by an existing implementation, the engine attempts to perform an in-place update using Cloud Control APIs.

> not already handled by an existing implementation

Is it at the resource type level or the property level? If at the resource type level, resource types with existing SDK implementations would not benefit from CCAPI's broader property coverage, which seems worth noting in the RFC.
If at the property level, suppose a resource type has an SDK-based hotswap implementation that only covers certain properties, and a change includes both covered and uncovered properties. Is the entire change delegated to CCAPI, split between SDK and CCAPI, or classified as non-hotswappable?
If split, both SDK and CCAPI calls are needed, making it slower than either approach alone. If the entire change is delegated to CCAPI, SDK implementations would only be used when all changed properties happen to be SDK-covered. Are SDK implementations maintained for that case?
This section should have been updated as the RFC evolved. We are planning to have a clear split between resources that are hotswapped with SDKs and resources that are hotswapped with CCAPI. This means that if a resource has an SDK-based hotswap implementation, it will not fall back to being hotswapped by CCAPI when a property that the SDK implementation cannot hotswap is specified. Additionally, the resources that use the CCAPI implementation will be managed by an allow list, which means adding hotswap support for a new resource type becomes a one-line change.
Essentially, there are boundaries between SDK-hotswappable resources, CCAPI-hotswappable resources, and non-hotswappable resources, and a resource cannot cross those boundaries to be hotswappable by both SDKs and CCAPI. I'll clarify that in this section of the RFC. Thank you for asking about this!
This is a request for comments about improvements to hotswap that will make it faster. See #886 for
additional details.
APIs are signed off by @rix0rrr.
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license