Leverage the existing Recon service to build the dashboard with centralized and efficient data collection.
Recon already stays synchronized with the OM database and builds the NSSummary tree, and it has established calculation logic for metrics such as `openKeysBytes` and `committedBytes`.
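
To illustrate the kind of rollup a namespace summary tree supports, here is a minimal Python sketch. It assumes that each node records only the bytes of files stored directly beneath it, so a committed-bytes figure for any bucket or directory is the sum over its subtree. `NSNode`, `size_of_files`, `total_bytes`, and `per_child_breakdown` are hypothetical names for illustration only. They form a simplified model, not Recon's actual NSSummary classes or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NSNode:
    """Hypothetical, simplified stand-in for a per-directory namespace summary."""
    name: str
    size_of_files: int = 0  # bytes of files stored directly under this node
    children: List["NSNode"] = field(default_factory=list)


def total_bytes(node: NSNode) -> int:
    """Sum file bytes over the whole subtree rooted at `node`.

    Uses an explicit stack instead of recursion so that deep
    namespaces do not hit Python's recursion limit.
    """
    total = 0
    stack = [node]
    while stack:
        current = stack.pop()
        total += current.size_of_files
        stack.extend(current.children)
    return total


def per_child_breakdown(node: NSNode) -> Dict[str, int]:
    """Map each direct child's name to the total bytes in its subtree."""
    return {child.name: total_bytes(child) for child in node.children}


if __name__ == "__main__":
    bucket = NSNode("bucket1", 100, [
        NSNode("dirA", 300, [NSNode("sub", 50)]),
        NSNode("dirB", 25),
    ])
    print(total_bytes(bucket))          # 475
    print(per_child_breakdown(bucket))  # {'dirA': 350, 'dirB': 25}
```

A drill-down view could call `per_child_breakdown` at each level. In a real service, caching subtree totals would avoid recomputing them on every request.
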
Additionally, Recon's OM DB insights component already provides a comprehensive breakdown of physical and logical capacity.
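
The gap between logical and physical capacity comes mostly from replication overhead. The minimal sketch below estimates the physical footprint of a given number of logical bytes. It assumes that Ratis replication stores `factor` full copies and that an EC `rs-k-m` scheme stores `k` data chunks plus `m` parity chunks per stripe. `Replication` and `physical_bytes` are hypothetical names, not an Ozone API. The estimate also ignores partial-stripe padding and container-level overhead, so it understates the real footprint of small EC keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    """Hypothetical, simplified replication descriptor (not an Ozone class)."""
    kind: str        # "RATIS" or "EC"
    factor: int = 3  # number of full replicas for RATIS
    data: int = 0    # EC data chunks per stripe (k)
    parity: int = 0  # EC parity chunks per stripe (m)


def physical_bytes(logical: int, repl: Replication) -> int:
    """Estimate raw bytes on DataNodes for `logical` bytes of user data.

    RATIS keeps `factor` full copies. EC keeps (k + m) / k of the data,
    rounded up. Partial-stripe padding is ignored, so small EC keys are
    underestimated.
    """
    if logical < 0:
        raise ValueError("logical size must be non-negative")
    if repl.kind == "RATIS":
        return logical * repl.factor
    if repl.kind == "EC":
        if repl.data <= 0:
            raise ValueError("EC needs at least one data chunk")
        # Integer ceiling division avoids floating-point rounding error.
        k, m = repl.data, repl.parity
        return (logical * (k + m) + k - 1) // k
    raise ValueError(f"unknown replication kind: {repl.kind}")


if __name__ == "__main__":
    ratis3 = Replication("RATIS", factor=3)
    ec_3_2 = Replication("EC", data=3, parity=2)
    print(physical_bytes(1000, ratis3))  # 3000
    print(physical_bytes(900, ec_3_2))   # 1500
```

Keeping the conversion in one place is one way to make the logical and physical views agree by construction, instead of having each view compute overhead separately.
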
Reusing these capabilities minimizes development effort and keeps metric calculations consistent.
Certain OM enhancements are required regardless of the implementation approach chosen (CLI-based or Prometheus-driven), but the foundational data processing infrastructure is already in place in Recon.
The modifications outlined for OM, SCM, and DataNode components remain mandatory across all proposed approaches to ensure complete and accurate storage distribution reporting.

### Benefits

- **Unified Data Source**: All metrics are aggregated centrally in Recon.

This approach would involve publishing storage distribution metrics directly from individual components (OM, SCM, DataNodes) to Prometheus, with visualization handled entirely through Grafana dashboards.

### Why This Approach Is Not Recommended
#### **1. Customer Adoption and User Experience**

- **Current Reality**: Customers already actively use Recon for storage analysis and monitoring.
- **Existing Feedback**: Users have specifically identified gaps in Recon's current calculations and requested improvements within the existing interface.
- **User Workflow Disruption**: Introducing a completely separate monitoring stack would fragment the user experience.
- **Training and Adoption Overhead**: Teams would need to learn new tools and workflows, creating adoption barriers.

#### **2. Incomplete Current State**
The primary driver for this enhancement is that **customers have identified that Recon's existing calculations are incomplete or incorrect**. Key issues include:

- Inconsistent storage usage calculations across different views
- Missing pending deletion visibility at granular levels
- Lack of real-time correlation between logical and physical storage metrics
- Incomplete breakdown of storage distribution across cluster components

Moving to Prometheus/Grafana would not address these calculation issues. It would simply relocate them to a different platform while requiring significant additional implementation effort.

- **Data Access**: Recon already has optimized access to OM DB, SCM metadata, and DN reports.
- **Calculation Engine**: An existing framework for cross-component metric aggregation and correlation.
- **Web Interface**: An established UI framework for complex data visualization and drill-down.
- **User Base**: An active user community familiar with Recon's interface and capabilities.

## Approach 3: CLI-based (Not Proceeding)
A CLI-based approach was evaluated that would compute a detailed usage and pending-deletion breakdown by analyzing offline OM and SCM database checkpoints and querying DataNodes.
While it offers precise, up-to-date results and independence from Recon, it introduces significant operational overhead.

Given its complexity, dependency on manual execution, and high resource consumption …

# Summary
The proposed dashboard improves visibility into cluster storage dynamics, providing deeper insights for effective debugging and informed decision-making.
Recon is the ideal place to host this feature, given its established role as the central storage overview in Ozone.