Commit ed954a3

updating feature layout section.
1 parent ab07c7e commit ed954a3

1 file changed

Lines changed: 17 additions & 13 deletions

hadoop-hdds/docs/content/design/storage-distribution.md

@@ -270,26 +270,30 @@ The pending deletion bytes are calculated and updated within the BlockDeletionSe
 
 ### New HDDS Layout and Upgrade
 
-A new HDDS layout version, DATA_DISTRIBUTION, has been introduced to handle the potential issues during upgrading.
+A new HDDS layout feature, DATA_DISTRIBUTION, has been introduced to handle upgrade and downgrade scenarios correctly and ensure accurate persistence of pending deletion metrics.
 
-- In the OM to SCM block deletion request, a new field is added to protobuf to represent the block with its size. If a new OM connects to an old SCM, the request will be accepted by SCM, but the new field will be ignored by the old SCM. So OM will think these blocks are deleted by actually not, leading to orphan blocks residual in containers.
-- In SCM, for an existing Ozone cluster, SCM may already have many DeletedBlocksTransactions in DB without block size information. As SCM maintains a DeletedBlocksTransactionSummary which requires update on new transactions creation, and finished transactions creation. For those old existing transactions, DeletedBlocksTransactionSummary doesn't need to be updated when they are finished and deleted. The DATA_DISTRIBUTION feature finalization action is a good timing to distinguish which are new transactions, which are old transactions.
+#### Why a New Layout Feature Is Needed
 
-Before DATA_DISTRIBUTION is not finalized,
+Although rolling upgrades are not supported in Ozone (meaning OM and SCM should never run different versions simultaneously), SCM and Datanodes may persist aggregated size metrics in their databases during the upgrade process.
+If these metrics are written before the cluster is finalized, and later the cluster is downgraded and re-upgraded, the old values can become stale or inaccurate since old code won’t update them correctly.
 
-- The OM to SCM block deletion request will use the current existing field for block ID info
-- SCM will not collect DeletedBlocksTransactionSummary, nor it will expose the information
-- Datanode will not expose the TotalPendingBytes
-- Since Recon doesn't be covered in current Ozone upgrading framework, Recon should be prepared for the case that both SCM and Datanode doesn't have the info required for /storagedistribution
+To prevent such inaccuracies, SCM and DNs must only start persisting the aggregated pending deletion data once the layout feature is finalized.
 
-After the upgrade:
+#### Behavior Before DATA_DISTRIBUTION Finalization
 
-- The OM to SCM block deletion request will use the new field for block ID info and size
-- SCM will know the start ID of new transactions, aggregate block size in OM request into DeletedBlocksTransaction, update and expose DeletedBlocksTransactionSummary. Old requests from old OM are still supported.
-- Datanode will receive DeletedBlocksTransactions with or without block size included from SCM, for new transactions or old existing transactions. Datanode should handle them properly, and begin to publish metrics, including pending deletion bytes for new transactions.
+- SCM does not collect or expose DeletedBlocksTransactionSummary.
+- DNs do not report TotalPendingBytes.
+- Recon must handle the case where SCM and DN do not expose storage distribution data.
+
+#### Behavior After DATA_DISTRIBUTION Finalization
+
+- SCM aggregates block sizes for new transactions, updates DeletedBlocksTransactionSummary, and exposes these metrics.
+- DNs handle both legacy and new transactions, calculate pending deletion bytes for new ones, and report them.
+- The persisted counters in SCM and DN reflect only data deleted after finalization, ensuring accuracy even across upgrade/downgrade cycles.
 
 #### Known Limitations
-For an existing Ozone cluster updating, if there are existing pending deletion transactions in SCM and DN, these transactions will not be covered in the new data/metrics exposed by SCM and DN. So the data shown by Recon UI can vary a lot from the real total pending deletion size. The gap will reduce gradually after existing old transactions are executed and finished.
+For an existing Ozone cluster updating, if there are existing pending deletion transactions in SCM and DN, these transactions will not be covered in the new data/metrics exposed by SCM and DN.
+So the data shown by Recon UI can vary a lot from the real total pending deletion size. The gap will reduce gradually after existing old transactions are executed and finished.
 
 ## Approach 2: CLI-based (Not Proceeding)
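Reading aid for the hunk above: the before/after finalization behavior it describes amounts to gating a counter on whether DATA_DISTRIBUTION has been finalized, and counting only transactions created after that point. The following is a minimal Java sketch under stated assumptions. The class `PendingDeletionSummarySketch`, its methods, and the idea of recording a first tracked transaction ID are all hypothetical names chosen for illustration; they are not taken from the Ozone code base and do not show how SCM or Datanodes actually implement this.

```java
import java.util.OptionalLong;

/**
 * Hypothetical sketch: gate a pending-deletion byte counter on layout-feature
 * finalization. Names are illustrative only and are not Ozone APIs.
 */
final class PendingDeletionSummarySketch {
  /** -1 until the (assumed) DATA_DISTRIBUTION finalization action has run. */
  private long firstTrackedTxId = -1;
  private long totalPendingBytes = 0;

  /** Assumed finalization action: remember where "new" transactions start. */
  synchronized void finalizeFeature(long nextTxId) {
    if (nextTxId < 0) {
      throw new IllegalArgumentException("nextTxId must be >= 0");
    }
    if (firstTrackedTxId < 0) {
      firstTrackedTxId = nextTxId;
    }
  }

  /** A transaction counts only if it was created at or after finalization. */
  private boolean isTracked(long txId) {
    return firstTrackedTxId >= 0 && txId >= firstTrackedTxId;
  }

  synchronized void onTransactionCreated(long txId, long blockBytes) {
    if (isTracked(txId)) {
      totalPendingBytes += blockBytes;
    }
  }

  synchronized void onTransactionFinished(long txId, long blockBytes) {
    // Legacy transactions were never added, so they must not be subtracted.
    if (isTracked(txId)) {
      totalPendingBytes -= blockBytes;
    }
  }

  /** Empty before finalization, so a consumer can tell "no data" from zero. */
  synchronized OptionalLong totalPendingBytes() {
    return firstTrackedTxId >= 0
        ? OptionalLong.of(totalPendingBytes)
        : OptionalLong.empty();
  }

  public static void main(String[] args) {
    PendingDeletionSummarySketch summary = new PendingDeletionSummarySketch();
    summary.onTransactionCreated(7, 4096);   // before finalization: ignored
    System.out.println(summary.totalPendingBytes());
    summary.finalizeFeature(8);
    summary.onTransactionCreated(8, 1024);   // new transaction: counted
    summary.onTransactionFinished(7, 4096);  // legacy transaction: ignored
    System.out.println(summary.totalPendingBytes());
  }
}
```

Returning an empty value before finalization mirrors the bullet that Recon must handle SCM and DNs not exposing storage distribution data. It also illustrates the Known Limitations paragraph: the bytes of transaction 7 never appear in the total, so the reported figure stays below the real pending size until such legacy transactions have finished.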
