
HDDS-13197. Design doc for storage capacity distribution. #8907

Merged
ChenSammi merged 25 commits into apache:master from priyeshkaratha:HDDS-13177-design
Feb 24, 2026


@priyeshkaratha
Contributor

@priyeshkaratha priyeshkaratha commented Aug 6, 2025

What changes were proposed in this pull request?

This PR adds a detailed design proposal for Storage Capacity Distribution Dashboard in Apache Ozone.
It includes the problem statement, goals, proposed CLI and Recon-based approaches, technical challenges, and output format for better observability into Ozone-used storage and deletion diagnostics.

The primary focus is on the Recon-based approach, and the CLI approach is included for reference but not being pursued due to complexity in large-scale clusters.

What is the link to the Apache JIRA

HDDS-13197

How was this patch tested?

NA

Contributor

@errose28 errose28 left a comment


Please add a Prometheus + Grafana based approach summary to this document, since it seems to meet all the use cases yet requires much less code change. Bear in mind that Recon also publishes metrics which can be used in Grafana dashboards. For example, if we want to track the number of pending delete keys/blocks/bytes from the OM DB, Recon can still do that calculation by walking the deleted table, but publish the number as a metric which can be consumed by a Grafana dashboard that is also pulling metrics from other components.
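As a rough illustration of this suggestion, the Python sketch below walks an in-memory stand-in for the OM DB deleted table, totals pending delete keys, blocks and bytes in a single pass, and renders the totals as gauges in the Prometheus text exposition format that a Grafana dashboard can consume. The `DeletedKeyEntry` type, the function names, and the `recon_om_pending_delete_*` metric names are hypothetical; Recon's actual metrics plumbing is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DeletedKeyEntry:
    """Hypothetical stand-in for one entry in the OM DB deleted table."""
    key_name: str
    block_count: int
    size_bytes: int


def summarize_pending_deletes(entries: Iterable[DeletedKeyEntry]) -> Dict[str, int]:
    """Walk the deleted table once and total pending delete keys, blocks and bytes."""
    totals = {"keys": 0, "blocks": 0, "bytes": 0}
    for entry in entries:
        totals["keys"] += 1
        totals["blocks"] += entry.block_count
        totals["bytes"] += entry.size_bytes
    return totals


def to_prometheus_text(totals: Dict[str, int],
                       prefix: str = "recon_om_pending_delete") -> str:
    """Render the totals as gauges in the Prometheus text exposition format."""
    lines = []
    for name in ("keys", "blocks", "bytes"):
        metric = f"{prefix}_{name}"  # hypothetical metric name
        lines.append(f"# HELP {metric} Pending delete {name} in the OM DB deleted table.")
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {totals[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        DeletedKeyEntry("/vol1/bucket1/key1", block_count=2, size_bytes=1024),
        DeletedKeyEntry("/vol1/bucket1/key2", block_count=1, size_bytes=512),
    ]
    print(to_prometheus_text(summarize_pending_deletes(sample)), end="")
```

With the sample entries this prints three gauges, ending with `recon_om_pending_delete_bytes 1536`. Doing the table walk in Recon keeps that cost off the OM, while the exported number can still sit next to metrics scraped from other components.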

Comment thread hadoop-hdds/docs/content/design/storage-distribution.md Outdated
@errose28
Contributor

errose28 commented Sep 8, 2025

We should make progress on this doc before introducing related code changes like #8995

@priyeshkaratha
Contributor Author

Hi @errose28, I have updated the document. I agree with your points, but I have the following considerations in favor of going ahead with the Recon approach.

Recon currently maintains synchronization with the OM database and constructs the NSSummary tree, providing established calculation logic for metrics such as openKeysBytes and committedBytes.
Additionally, Recon already provides a comprehensive breakdown of physical and logical capacity through its OM DB Insights component.
These existing capabilities can be effectively leveraged to minimize development effort and ensure consistency.
While certain enhancements to OM are required regardless of the chosen implementation approach (whether CLI-based or Prometheus-driven), the foundational data processing infrastructure is already in place.
The modifications outlined for OM, SCM, and DataNode components remain mandatory across all proposed approaches to ensure complete and accurate storage distribution reporting.

@errose28
Contributor

Hi @priyeshkaratha after looking at #8995 I think we are mostly on the same page about this feature, but the way the doc was written was a communication barrier.

The doc does not make a clear distinction between time series data (tracking deletion over time) and a point-in-time view of deletion. Recon is a good spot to have an overview of the immediate state of pending deletions, like #8995 has currently. Additionally, it is good to expose metrics and create a Grafana dashboard to track deletion progress over time. Currently the doc frames these as two competing ideas, when really they should both be implemented in parallel.

I suggest some improvements to the doc so others are better able to understand the goals:

  • Clarify that a Recon page only provides a point-in-time view of deletion
  • Remove the current last section about Prometheus and Grafana.
    • These are industry standard tools that most large production deployments are already running for Ozone and other components in their stack as well.
    • The existence of metrics and dashboards does not mean that existing issues in Recon cannot be fixed as this section currently implies.
  • Replace the Prometheus and Grafana section with an outline of what pending delete metrics are exposed from each component, and how these can be aggregated into a dashboard to track space usage over time.
    • This should include Recon exposing metrics for things like number of pending delete blocks + bytes in the OM DB, which are too expensive for OM to calculate itself.
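To make the point-in-time versus time-series distinction concrete, here is a minimal sketch assuming each component (OM, SCM, DataNodes) exports a pending-delete bytes gauge that a scraper samples periodically. `latest_by_component` gives the single snapshot a Recon page would show, while `total_pending_over_time` produces the series a dashboard would plot. The sample layout and component labels are hypothetical, and summing across components assumes each one reports a disjoint stage of deletion so bytes are not double counted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (timestamp_seconds, component, pending_delete_bytes), as a scraper might
# collect them from each component's metrics endpoint. Hypothetical layout.
Sample = Tuple[int, str, int]


def latest_by_component(samples: List[Sample]) -> Dict[str, int]:
    """Point-in-time view: the most recent value reported by each component."""
    latest: Dict[str, Tuple[int, int]] = {}
    for ts, component, value in samples:
        if component not in latest or ts > latest[component][0]:
            latest[component] = (ts, value)
    return {component: value for component, (_, value) in latest.items()}


def total_pending_over_time(samples: List[Sample]) -> List[Tuple[int, int]]:
    """Time-series view: pending-delete bytes summed across components per timestamp."""
    per_ts: Dict[int, int] = defaultdict(int)
    for ts, _component, value in samples:
        per_ts[ts] += value
    return sorted(per_ts.items())


if __name__ == "__main__":
    scrapes = [
        (0, "om", 100), (0, "scm", 50), (0, "datanode", 20),
        (60, "om", 40), (60, "scm", 80), (60, "datanode", 30),
    ]
    print(latest_by_component(scrapes))      # {'om': 40, 'scm': 80, 'datanode': 30}
    print(total_pending_over_time(scrapes))  # [(0, 170), (60, 150)]
```

The snapshot alone cannot show that the total dropped from 170 to 150 while SCM's share grew; that trend only appears in the time series.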

@priyeshkaratha
Contributor Author

@errose28 Thanks for the detailed suggestions and corrections. I have addressed your points. Could you please review it again?
cc: @ChenSammi

@priyeshkaratha priyeshkaratha marked this pull request as ready for review September 23, 2025 08:34
@errose28
Contributor

@swamirishi can you review this for correctness in the presence of snapshots?

@ChenSammi
Contributor

ChenSammi commented Sep 30, 2025

@swamirishi can you review this for correctness in the presence of snapshots?

@errose28, we discussed the size of pending-deletion files held by snapshots with @swamirishi some time back. @swamirishi proposed HDDS-13036 (#8587), which is a good way to provide this information.

@errose28
Contributor

@swamirishi made a comment on this during the community sync yesterday, and I tagged him as a reminder to check it out since I had it open. Basically, he wanted to be added as a reviewer, so it seems there is not yet consensus in this area.

Contributor

@errose28 errose28 left a comment


High-level approach looks good. I left a few comments on implementation details. Also, if there are upgrade/downgrade compatibility concerns like this, please define them here.

@priyeshkaratha priyeshkaratha changed the title HDDS-13177. Adding design doc for storage capacity distribution. HDDS-13197. Adding design doc for storage capacity distribution. Oct 16, 2025
@ChenSammi ChenSammi changed the title HDDS-13197. Adding design doc for storage capacity distribution. HDDS-13197. Design doc for storage capacity distribution. Oct 16, 2025
Contributor

@errose28 errose28 left a comment


Thanks for updating the doc. I'm still having difficulty understanding the upgrade requirements though.

@priyeshkaratha priyeshkaratha deleted the HDDS-13177-design branch November 25, 2025 05:56
@priyeshkaratha priyeshkaratha restored the HDDS-13177-design branch November 25, 2025 05:56
@priyeshkaratha
Contributor Author

@errose28 @ChenSammi can you review the revised design doc?

Contributor

@errose28 errose28 left a comment


@priyeshkaratha can you address the previous open comments here and here as well?

@priyeshkaratha
Contributor Author

@errose28 I have addressed all the open comments. Thanks for all the suggestions. Can you take a final look?

Contributor

@errose28 errose28 left a comment


LGTM, thanks for the updates @priyeshkaratha

|------------------|--------|------------------------------------------------|
| datanodeUuid | String | Unique identifier for the DataNode |
| hostName | String | Hostname of the DataNode |
| capacity | Long | Total capacity of the DataNode in bytes. |
Contributor


@priyeshkaratha, can we change this to ozoneCapacity? It's the capacity for Ozone, right?

Contributor Author


In Recon and SCM, the term capacity is consistently used across the codebase. So I think we should keep it as it is. Changing it would require significant refactoring and could also impact existing APIs like /datanodes and /clusterState.

For now, I would prefer to retain the current naming to avoid unnecessary changes and potential side effects.

Contributor

@ChenSammi ChenSammi Feb 24, 2026


Can you change the comment to reflect that it's the configured capacity for Ozone (or the Datanode), not the full disk capacity?

And it's better to reframe the comment for "reserved" to something like

"Configured reserved space in bytes, for non-Ozone usage"
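A minimal sketch of how a report row could carry these semantics, using the field descriptions suggested above. The Python class, its snake_case attributes, and the `to_api_row` helper are hypothetical; only the API field names `datanodeUuid`, `hostName`, `capacity` and the `reserved` field come from this discussion.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class DatanodeStorageReport:
    """Hypothetical Python view of one DataNode storage report row."""
    datanode_uuid: str  # Unique identifier for the DataNode
    host_name: str      # Hostname of the DataNode
    capacity: int       # Configured capacity for Ozone in bytes, not the full disk capacity
    reserved: int       # Configured reserved space in bytes, for non-Ozone usage

    def __post_init__(self) -> None:
        # Both fields are byte counts, so negative values indicate a bad report.
        if self.capacity < 0 or self.reserved < 0:
            raise ValueError("capacity and reserved must be non-negative byte counts")


def to_api_row(report: DatanodeStorageReport) -> Dict[str, Union[str, int]]:
    """Map the attributes onto the camelCase field names used in the API response."""
    return {
        "datanodeUuid": report.datanode_uuid,
        "hostName": report.host_name,
        "capacity": report.capacity,
        "reserved": report.reserved,
    }
```

Keeping the field name `capacity` while spelling out its meaning in the description matches the decision above to avoid renaming it across the /datanodes and /clusterState APIs.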

Contributor Author


Thanks @ChenSammi for the suggestions. I have applied them. Also, a few existing fields that were recently added in master were missing from the DN reports; I have added those too.

Contributor

@ChenSammi ChenSammi left a comment


Thanks @priyeshkaratha for updating the doc, LGTM. Thanks @errose28 for the review.

@ChenSammi ChenSammi merged commit e8ef91a into apache:master Feb 24, 2026
16 checks passed

3 participants