Commit 58a9fb8

HDDS-13857. [STS] Design doc for STS (#9223)
1 parent a2e865a commit 58a9fb8

1 file changed

Lines changed: 215 additions & 0 deletions

@@ -0,0 +1,215 @@
---
title: AWS STS Design for Ozone S3
summary: STS Support in Ozone
date: 2025-10-30
jira: HDDS-13323
status: implementing
author: Madhan Neethiraj, Ren Koike, Fabian Morgan, Stephen O'Donnell, Istvan Fajth, Uma Maheswara Rao Gangumalla
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# AWS STS Design for Ozone S3

# 1. Introduction

S3 credentials used to communicate with Ozone S3 APIs are based on a Kerberos identity.

Historically, the Ozone community has been interested in a REST API capable of programmatically generating
temporary S3 credentials.

AWS offers the [Security Token Service (STS)](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html), which
issues short-lived credentials for accessing resources.

The primary scope of this document is to detail the initial implementation of STS within the Ozone ecosystem.

# 2. Why Use STS Tokens?

Providing short-lived access to various resources in Ozone is useful in scenarios such as Data Lake
solutions that want to aggregate data across multiple cloud providers.

# 3. How Ozone STS Works

The initial implementation of Ozone STS supports only the [AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
API from the AWS specification. A new STS endpoint `/sts` on port `9880` (port `9881` for https) will be created to service STS requests in the S3 Gateway.
We use a separate port for STS to align with AWS and to avoid conflicts later. This means we have:
- Admin port for Ozone-specific S3 admin operations
- STS port for STS APIs, analogous to AWS' separate STS endpoint
- Existing dedicated port/endpoint for S3 object APIs

Furthermore, the first phase of Ozone STS supports only Apache Ranger for authorization,
as it aligns more closely with IAM policies. Support for the Ozone Native Authorizer may be provided in a future phase.

## 3.1 Capabilities

The Ozone STS implementation has the following capabilities:

- Create temporary credentials that last from a minimum of 15 minutes to a maximum of 12 hours. The
  return value of the AssumeRole call will be temporary credentials consisting of three components:
  - accessKeyId - a generated String identifier (cryptographically strong using SecureRandom) beginning with the sequence "ASIA"
  - secretAccessKey - a generated String password (cryptographically strong using SecureRandom)
  - sessionToken - a Base64-encoded opaque String identifier
- The temporary credentials will have the permissions associated with a role. Furthermore, an
  [AWS IAM Session Policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#policies_session) can
  **optionally** be sent in the AssumeRole API call to limit the scope of the permissions further. If
  an IAM policy is specified, the temporary credentials will have the intersection of the role permissions
  and the IAM policy permissions. **Note:** If the IAM policy is specified and does not grant any permissions, then
  the generated temporary credentials won't have any permissions and will be effectively unusable.
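
The credential components described above can be sketched as follows. This is a minimal illustration, not Ozone's actual code: the class name `TempCredentialSketch`, the key lengths, and the character alphabet are all assumptions. Only the "ASIA" prefix, the use of `SecureRandom`, and the 15 minute to 12 hour bounds come from this design.

```java
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical helper; names and lengths are illustrative assumptions. */
final class TempCredentialSketch {
  static final int MIN_DURATION_SECONDS = 15 * 60;       // 15 minutes
  static final int MAX_DURATION_SECONDS = 12 * 60 * 60;  // 12 hours

  private static final SecureRandom RANDOM = new SecureRandom();
  // 32-character alphabet chosen for illustration only.
  private static final char[] ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

  /** Returns "ASIA" followed by 16 random characters (the length is an assumption). */
  static String newAccessKeyId() {
    StringBuilder sb = new StringBuilder("ASIA");
    for (int i = 0; i < 16; i++) {
      sb.append(ALPHABET[RANDOM.nextInt(ALPHABET.length)]);
    }
    return sb.toString();
  }

  /** Returns 30 random bytes, Base64-encoded into a 40-character secret. */
  static String newSecretAccessKey() {
    byte[] bytes = new byte[30];
    RANDOM.nextBytes(bytes);
    return Base64.getEncoder().withoutPadding().encodeToString(bytes);
  }

  /** Rejects durations outside the supported range and returns the value otherwise. */
  static int validateDuration(int durationSeconds) {
    if (durationSeconds < MIN_DURATION_SECONDS || durationSeconds > MAX_DURATION_SECONDS) {
      throw new IllegalArgumentException("DurationSeconds must be between "
          + MIN_DURATION_SECONDS + " and " + MAX_DURATION_SECONDS);
    }
    return durationSeconds;
  }
}
```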

## 3.2 Limitations in AssumeRole API Support

The AWS AssumeRole API has various required and optional fields. We will support the two required fields, `RoleArn`
and `RoleSessionName`. Additionally, we will support the following optional fields (all others will be rejected):
- `DurationSeconds`
- `Policy`
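
The field rules above could be enforced along the following lines. This is a sketch under stated assumptions: `AssumeRoleParamCheckSketch` is a hypothetical name, and the map is assumed to hold only the AssumeRole-specific fields (generic query parameters such as `Action` and `Version` are assumed to have been handled elsewhere).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator; not the actual S3 Gateway code. */
final class AssumeRoleParamCheckSketch {
  private static final Set<String> REQUIRED =
      new HashSet<>(Arrays.asList("RoleArn", "RoleSessionName"));
  private static final Set<String> OPTIONAL =
      new HashSet<>(Arrays.asList("DurationSeconds", "Policy"));

  /** Throws IllegalArgumentException if a required field is missing or an unsupported one is present. */
  static void validate(Map<String, String> params) {
    for (String name : REQUIRED) {
      String value = params.get(name);
      if (value == null || value.isEmpty()) {
        throw new IllegalArgumentException("Missing required field: " + name);
      }
    }
    for (String name : params.keySet()) {
      if (!REQUIRED.contains(name) && !OPTIONAL.contains(name)) {
        throw new IllegalArgumentException("Unsupported field: " + name);
      }
    }
  }
}
```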

## 3.3 Limitations in IAM Session Policy Support

The AWS IAM policy specification is vast and wide-ranging. The initial Ozone STS implementation supports a limited
subset of its capabilities. The restrictions are outlined below:

- The only supported prefix in ResourceArn is `arn:aws:s3:::` - all others will be rejected. **Note**: a ResourceArn
  of `*` is supported as well.
- The only supported Condition operator is `StringEquals` - all others will be rejected.
- The only supported Condition key is `s3:prefix` - all others will be rejected.
- Only one Condition operator per Statement is supported - a Statement with more than one Condition will be rejected.
- The only supported Effect is `Allow` - all others will be rejected.
- If a (currently) unsupported S3 action is requested, such as `s3:GetAccelerateConfiguration`, it will be silently ignored.
  Similarly, an invalid S3 action will be silently ignored.
- Supported wildcard expansions in Actions are: `s3:*`, `s3:Get*`, `s3:Put*`, `s3:List*`,
  `s3:Create*`, and `s3:Delete*`.

A sample IAM policy that allows read access to all objects in the `example-bucket` bucket is shown below:
```JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
```
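
The action-handling rules in the list above (wildcard expansion plus silently ignoring unsupported or invalid actions) can be sketched as follows. The class name `ActionResolverSketch` and the short `SUPPORTED` list are illustrative assumptions; the real set of supported actions is defined by the implementation. Treating any unlisted wildcard as ignored is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical resolver; the supported-action list is an illustrative subset. */
final class ActionResolverSketch {
  private static final List<String> SUPPORTED = Arrays.asList(
      "s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:CreateBucket", "s3:DeleteObject");
  private static final List<String> WILDCARDS = Arrays.asList(
      "s3:*", "s3:Get*", "s3:Put*", "s3:List*", "s3:Create*", "s3:Delete*");

  /** Expands supported wildcards and drops anything unsupported without raising an error. */
  static Set<String> resolve(List<String> requested) {
    Set<String> resolved = new LinkedHashSet<>();
    for (String action : requested) {
      if (WILDCARDS.contains(action)) {
        // Strip the trailing '*' and match on the remaining prefix.
        String prefix = action.substring(0, action.length() - 1);
        for (String candidate : SUPPORTED) {
          if (candidate.startsWith(prefix)) {
            resolved.add(candidate);
          }
        }
      } else if (SUPPORTED.contains(action)) {
        resolved.add(action);
      }
      // Any other value is silently ignored, per the design above.
    }
    return resolved;
  }
}
```

With this sketch, a request for `s3:Get*` together with `s3:GetAccelerateConfiguration` resolves to `s3:GetObject` only, and no error is raised for the unsupported action.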

### 3.3.1 Additional Context on Design Behavior

As mentioned above, some limitations in IAM Session Policy support result in API calls being rejected, while others are
silently ignored. This behavior was settled on after extensive discussion with external teams. One external team sends
S3 actions that Ozone doesn't support and has no flexibility to change what it sends, which is one
reason for the silent ignore. This team also mentioned that, according to its research, AWS does not fail
the AssumeRole request just because the inline session policy references unknown or unsupported actions; instead,
the failure occurs when the temporary credentials are used, so this design is in accordance with that finding. Another external
team agreed that this behavior is fine for actions but not for Conditions: a client can add a Condition to
restrict calls by source IP, and if we silently ignore it, the client may incorrectly believe the temporary credentials
are restricted to that IP address. The consensus was therefore to reject the request in that scenario.

## 3.4 SessionToken Format

As mentioned above, one of the return values from the AssumeRole call will be the sessionToken. So that temporary
credentials need not be stored server-side in Ozone, the sessionToken will carry the components needed to validate
subsequent S3 calls that use the token. The sessionToken will have the following information encoded:

- originalAccessKeyId - this is the Kerberos identity of the user that created the sessionToken via the AssumeRole call.
  When the temporary credentials are used to make S3 API calls, this Kerberos identity (in conjunction with the role permissions and
  optional session policy) will be used to authorize the call. This identity is included in the sessionToken because
  S3 API calls (such as PutObject) require a Kerberos identity, but the temporary credentials don't have a
  Kerberos identity associated with them; therefore the Kerberos identity of the user that created the token will be used in
  these cases.
- roleArn - the role used in the original AssumeRole call
- encrypted secretAccessKey - this will be used to validate the AWS signature when the temporary credentials are used
  to make S3 API calls
- sessionPolicy - when using the RangerOzoneAuthorizer, if Ranger successfully authorizes the AssumeRole call,
  it will return a String representing the role the token was authorized for. Furthermore, if an AWS IAM Session Policy
  was included with the AssumeRole request, the String return value will also include resources (e.g. buckets, keys)
  and permissions (i.e. ACLType) corresponding to the AWS IAM Session Policy. These resources and permissions, if present,
  further limit the scope of the permissions and resources granted by the role in Ranger, such that the temporary
  credentials will have the intersection of the role permissions and the sessionPolicy permissions.
- HMAC-SHA256 signature - used to ensure the sessionToken was created by Ozone and has not been altered since it was created.
- expiration time of the token (via `ShortLivedTokenIdentifier#getExpiry()`)
- UUID of the OzoneManager secret key used to sign the sessionToken and encrypt the secretAccessKey (via `ShortLivedTokenIdentifier#getSecretKeyId()`)
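
To illustrate how an HMAC-SHA256 signature lets the server detect a forged or altered token without storing it, here is a minimal sketch. The token layout (`base64url(payload).base64url(hmac)`), the class name `SessionTokenSignatureSketch`, and the plain-string payload are assumptions made for illustration; the actual sessionToken encoding is not specified here, and this sketch omits encryption of the secretAccessKey.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical token signer; the token layout is an assumption, not Ozone's format. */
final class SessionTokenSignatureSketch {
  private static byte[] hmac(byte[] key, byte[] payload) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      mac.init(new SecretKeySpec(key, "HmacSHA256"));
      return mac.doFinal(payload);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HmacSHA256 unavailable", e);
    }
  }

  /** Produces base64url(payload) + "." + base64url(HMAC-SHA256(key, payload)). */
  static String sign(byte[] key, String payload) {
    byte[] body = payload.getBytes(StandardCharsets.UTF_8);
    Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
    return enc.encodeToString(body) + "." + enc.encodeToString(hmac(key, body));
  }

  /** Returns the payload if the signature matches; throws SecurityException otherwise. */
  static String verify(byte[] key, String token) {
    int dot = token.indexOf('.');
    if (dot < 0) {
      throw new SecurityException("Malformed token");
    }
    byte[] body;
    byte[] signature;
    try {
      Base64.Decoder dec = Base64.getUrlDecoder();
      body = dec.decode(token.substring(0, dot));
      signature = dec.decode(token.substring(dot + 1));
    } catch (IllegalArgumentException e) {
      throw new SecurityException("Malformed token", e);
    }
    // MessageDigest.isEqual compares in constant time.
    if (!MessageDigest.isEqual(signature, hmac(key, body))) {
      throw new SecurityException("Signature mismatch");
    }
    return new String(body, StandardCharsets.UTF_8);
  }
}
```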

## 3.5 STS Token Revocation

In the rare event that temporary credentials need to be revoked (e.g. for security reasons), a table in the OzoneManager RocksDB will be created
to store revoked tokens, and a command-line utility will be created to add tokens to the table. A background cleaner service
will run every 3 hours to delete revoked tokens that have been in the table for more than 12 hours. The
input parameter for the command-line utility will be the sessionToken - this value is returned in plain text as a result
of the AssumeRole call (mentioned above). In this way, specific STS tokens can be revoked as opposed to all tokens. Furthermore,
AWS doesn't have a standard API to revoke tokens, so we are creating our own mechanism.

**Note:** STS token revocation checks are strictly enforced and will fail closed if there are internal errors, such as not
being able to communicate with the revocation database table.

**Note:** Only the creator of the STS token or an S3/tenant admin is allowed to revoke a token.
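
The revocation table, the 12-hour cleanup rule, and the fail-closed behavior can be sketched with an in-memory stand-in. Everything here is hypothetical (`RevocationSketch`, the map-backed store, the method names); the real design uses an OzoneManager RocksDB table and a background service. The 12-hour retention is safe because a token lives at most 12 hours, so any entry older than that refers to a token that has already expired.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the revoked-token table. */
final class RevocationSketch {
  static final long RETENTION_MILLIS = 12L * 60 * 60 * 1000; // 12 hours

  // sessionToken -> time (epoch millis) at which it was revoked
  private final Map<String, Long> revokedAt = new ConcurrentHashMap<>();

  void revoke(String sessionToken, long nowMillis) {
    revokedAt.put(sessionToken, nowMillis);
  }

  /** Returns false if the token is revoked, or if the lookup itself fails (fail closed). */
  boolean isUsable(String sessionToken) {
    try {
      return !revokedAt.containsKey(sessionToken);
    } catch (RuntimeException e) {
      return false; // deny on any internal error
    }
  }

  /** Deletes entries revoked more than 12 hours ago; returns how many were removed. */
  int cleanUp(long nowMillis) {
    int removed = 0;
    Iterator<Map.Entry<String, Long>> it = revokedAt.entrySet().iterator();
    while (it.hasNext()) {
      if (nowMillis - it.next().getValue() > RETENTION_MILLIS) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }
}
```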

## 3.6 Prerequisites

A user must be configured with a Kerberos identity in Ozone, and the S3 `getSecret` command
must be called to issue permanent S3 credentials. With these credentials, the AssumeRole API call can be made, but the additional
steps below are needed for the call to be successfully authorized.
When using RangerOzoneAuthorizer, a role must be configured in the Ranger UI for each role the AssumeRole API
can be used with. Further, Apache Ranger policies should be in place to grant the user permission to assume the role.

### 3.6.1 Additions to RangerOzoneAuthorizer

The `IAccessAuthorizer` interface, which both the RangerOzoneAuthorizer and OzoneNativeAuthorizer implement, will have a
new method:

```java
// Default implementation: authorizers that do not override this method reject the call.
default String generateAssumeRoleSessionPolicy(AssumeRoleRequest assumeRoleRequest) throws OMException {
  throw new OMException("The generateAssumeRoleSessionPolicy call is not supported", NOT_SUPPORTED_OPERATION);
}
```

When using RangerOzoneAuthorizer, the AssumeRole API call must invoke this method to ensure the caller is authorized to create
temporary credentials, given the criteria in the AssumeRoleRequest. The AssumeRoleRequest input parameter will have the
following components:
- `String` host - hostname of caller
- `InetAddress` ip - IP address of caller
- `UserGroupInformation` ugi - the user making the call
- `String` targetRoleName - the role being assumed
- `Set<AssumeRoleRequest.OzoneGrant>` grants - further limits the scope of the role according to the grants

The grants parameter is optional, and is only present if the AssumeRole API call had an IAM session policy JSON
parameter supplied. A conversion utility, `IamSessionPolicyResolver`, will process the IAM policy and convert it to a
`Set<AssumeRoleRequest.OzoneGrant>`, in effect translating from S3 nomenclature for resources and actions to the Ozone nomenclature of
`IOzoneObj` and `ACLType`. Ranger uses all of this information to determine whether the AssumeRole call should be
authorized, and if so, it returns a String representation of the granted permissions and paths.

The format of this String is entirely up to the Ranger team. What is required from the Ozone side is to supply this String to Ranger when any
subsequent S3 API calls are made that use STS tokens. To achieve this, the sessionPolicy String from Ranger will
be included in the sessionToken response to the AssumeRole API call (as mentioned above), and Ozone will supply this String
to Ranger whenever STS tokens are used on S3 API calls via a new `RequestContext.sessionPolicy` field in the
`IAccessAuthorizer#checkAccess(IOzoneObj, RequestContext)` call.
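
As a small illustration of the S3-to-Ozone translation step, the sketch below splits a ResourceArn into a bucket and a key pattern, applying the prefix rule from section 3.3. It is not the actual `IamSessionPolicyResolver`: the class name `ResourceArnSketch`, the `String[]` return shape, and the handling of bucket-only ARNs are assumptions.

```java
/** Hypothetical ARN parser; not the actual IamSessionPolicyResolver. */
final class ResourceArnSketch {
  static final String S3_ARN_PREFIX = "arn:aws:s3:::";

  /**
   * Returns {bucket, keyPattern}. The key pattern is "" when the ARN names only a bucket,
   * and a bare "*" resource maps to {"*", "*"}.
   */
  static String[] parse(String resourceArn) {
    if ("*".equals(resourceArn)) {
      return new String[] {"*", "*"};
    }
    if (resourceArn == null || !resourceArn.startsWith(S3_ARN_PREFIX)) {
      throw new IllegalArgumentException("Unsupported ResourceArn: " + resourceArn);
    }
    String path = resourceArn.substring(S3_ARN_PREFIX.length());
    int slash = path.indexOf('/');
    if (slash < 0) {
      return new String[] {path, ""};
    }
    return new String[] {path.substring(0, slash), path.substring(slash + 1)};
  }
}
```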

## 3.7 Overall Flow

The following section outlines the overall flow when using STS in Ozone:

- An authorized user for AssumeRole API calls must be configured in Ozone, and if using RangerOzoneAuthorizer, the role
  must be created in Ranger as per the Prerequisites above.
- This authorized user (having permanent S3 credentials) makes the AssumeRole STS call to Ozone.
- If successful, Ozone responds with the temporary credentials.
- A client makes S3 API calls with the temporary credentials until they expire.
- When Ozone receives an S3 API call using temporary credentials, it will use the Kerberos identity associated with the
  originalAccessKeyId in the session token and perform the following checks:
  - If the accessKeyId starts with "ASIA", ensure a sessionToken was included in the `x-amz-security-token` header
  - Ensure the sessionToken is not expired
  - Ensure the sessionToken is not revoked via a `keyMayExist` check in OzoneManager RocksDB
  - Validate the HMAC-SHA256 signature in the sessionToken
  - Decrypt the secretAccessKey from the sessionToken and validate the AWS signature
  - Authorize the call with either RangerOzoneAuthorizer or OzoneNativeAuthorizer

Assuming all these checks pass, the S3 API call will be invoked.
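
The sequence of checks above can be summarized in a short sketch that returns the first failing check. It is purely illustrative: `StsRequestCheckSketch`, the `Facts` holder, and the `Outcome` names are hypothetical, and the individual results (revocation lookup, signature validation, authorization) are assumed to have been computed elsewhere. The order simply mirrors the list in this section.

```java
/** Hypothetical summary of the per-request checks; inputs are assumed to be pre-computed. */
final class StsRequestCheckSketch {
  enum Outcome {
    OK, MISSING_SESSION_TOKEN, EXPIRED, REVOKED, BAD_TOKEN_SIGNATURE, BAD_AWS_SIGNATURE, ACCESS_DENIED
  }

  /** Facts about one incoming S3 request (illustrative field names). */
  static final class Facts {
    String accessKeyId;
    String sessionToken;       // value of the x-amz-security-token header, or null
    long expiryMillis;         // expiration time carried in the sessionToken
    boolean revoked;           // result of the revocation table lookup
    boolean tokenHmacValid;    // result of the HMAC-SHA256 validation
    boolean awsSignatureValid; // result of the AWS signature validation
    boolean authorized;        // result of the authorizer call
  }

  /** Evaluates the checks in the order listed in the design and stops at the first failure. */
  static Outcome check(Facts f, long nowMillis) {
    if (f.accessKeyId.startsWith("ASIA") && (f.sessionToken == null || f.sessionToken.isEmpty())) {
      return Outcome.MISSING_SESSION_TOKEN;
    }
    if (nowMillis >= f.expiryMillis) {
      return Outcome.EXPIRED;
    }
    if (f.revoked) {
      return Outcome.REVOKED;
    }
    if (!f.tokenHmacValid) {
      return Outcome.BAD_TOKEN_SIGNATURE;
    }
    if (!f.awsSignatureValid) {
      return Outcome.BAD_AWS_SIGNATURE;
    }
    return f.authorized ? Outcome.OK : Outcome.ACCESS_DENIED;
  }
}
```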
