HDDS-13579. [Docs] Explain how Ratis write pipelines are calculated #9580
jojochuang wants to merge 4 commits into
Conversation
|
@jojochuang , thanks for working on this! Could you limit the line length to 120 characters like the code? It is easier to comment on it. |
Discovered and corrected the documentation for how the number of EC pipelines is calculated. The previous analysis was incorrect.
- `ErasureCoding.md` is updated to describe the two new properties `ozone.scm.ec.pipeline.minimum` and `ozone.scm.ec.pipeline.per.volume.factor` and the `max()` logic used to determine the target number of pipelines.
- `ProductionDeployment.md` is updated to reference the correct and existing configuration property for tuning EC pipelines.
Change-Id: I393dc60d8745da2b2bb7899530665a108956446d
Change-Id: I0ec667c21155436eb6a0654782b43b48636f75d5
|
thanks for review. updated. |
ashishkumar50
left a comment
There was a problem hiding this comment.
@jojochuang Thanks for writing this up.
| recommended, as it allows the cluster to scale pipeline capacity naturally with its resources. You can use the | ||
| global limit (`ozone.scm.ratis.pipeline.limit`) as a safety cap if needed. A good starting value for | ||
| `ozone.scm.pipeline.per.metadata.disk` is **2**. Monitor the `NumOpenPipelines` metric in SCM to see if the | ||
| actual number of pipelines aligns with your configured targets. |
There was a problem hiding this comment.
I think the `NumOpenPipelines` metric doesn't exist. We can either use the admin command to see the number of open pipelines, or use Recon to see the open pipelines.
There was a problem hiding this comment.
To clarify, the SCM web UI has a section Pipeline Statistics, and it pulls the metrics from:
{
  "name": "Hadoop:service=SCMPipelineManager,name=SCMPipelineManagerInfo",
  "modelerType": "org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl",
  "PipelineInfo": [
    {
      "key": "CLOSED",
      "value": 0
    },
    {
      "key": "ALLOCATED",
      "value": 0
    },
    {
      "key": "OPEN",
      "value": 1
    },
    {
      "key": "DORMANT",
      "value": 0
    }
  ]
},
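For scripted monitoring, the bean above could be reduced to a per-state count. This is only a minimal sketch, assuming the bean has already been retrieved as JSON text (how it is fetched from SCM is not covered here). `SAMPLE_BEAN` and `pipeline_counts` are hypothetical names for illustration, and the sample drops the trailing comma of the excerpt so that it parses on its own:

```python
import json

# Sample bean copied from the SCMPipelineManagerInfo excerpt quoted in this thread.
SAMPLE_BEAN = """
{
  "name": "Hadoop:service=SCMPipelineManager,name=SCMPipelineManagerInfo",
  "modelerType": "org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl",
  "PipelineInfo": [
    {"key": "CLOSED", "value": 0},
    {"key": "ALLOCATED", "value": 0},
    {"key": "OPEN", "value": 1},
    {"key": "DORMANT", "value": 0}
  ]
}
"""


def pipeline_counts(bean_json: str) -> dict:
    """Return a {state: count} mapping built from the bean's PipelineInfo list."""
    bean = json.loads(bean_json)
    return {entry["key"]: entry["value"] for entry in bean["PipelineInfo"]}


counts = pipeline_counts(SAMPLE_BEAN)
print(counts["OPEN"])  # prints 1 for the sample above
```

Comparing the `OPEN` count against the configured limits is one way to check whether the cluster has reached its target.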
Change-Id: I2537905761cf45d23cdb3701b2f0c94e7ff2485a
szetszwo
left a comment
There was a problem hiding this comment.
@jojochuang , thanks for working on this!
The code is quite distributed so it is not easy to understand the exact calculation. Please see the comments inlined (not yet reviewed the example).
|
|
||
| ### Calculating Ratis Pipeline Limits | ||
|
|
||
| The target number of open, FACTOR_THREE Ratis pipelines is controlled by three properties that define the maximum |
There was a problem hiding this comment.
... FACTOR_THREE ...
Let's use ReplicationFactor.THREE, which is a client API.
... three properties ... that define the maximum
three configuration properties ... that limit the
| ### Calculating Ratis Pipeline Limits | ||
|
|
||
| The target number of open, FACTOR_THREE Ratis pipelines is controlled by three properties that define the maximum | ||
| number of pipelines in the cluster at a cluster-wide level, datanode level, and metadata disk level, respectively. |
There was a problem hiding this comment.
... a cluster-wide level, datanode level, and metadata disk level, respectively.
a cluster-wide level and a datanode level.
| 1. **Cluster-wide Limit (`ozone.scm.ratis.pipeline.limit`)** | ||
| * **Description**: An absolute, global limit for the total number of open, FACTOR_THREE Ratis pipelines | ||
| across the entire cluster. This acts as a final cap on the total number of pipelines. | ||
| * **Default Value**: `0` (which means no global limit is enforced by default). |
There was a problem hiding this comment.
... no global limit is enforced by default).
no global limit by default).
| * **Calculation**: If this is set, the target is `(<this value> * <number of healthy datanodes>) / 3`. | ||
|
|
||
| 3. **Datanode-level Dynamic Limit (`ozone.scm.pipeline.per.metadata.disk`)** | ||
| * **Description**: This property is used only when `ozone.scm.datanode.pipeline.limit` is explicitly set to `0`. |
There was a problem hiding this comment.
... This property is used ... is explicitly set to `0`.
This property takes effect ... is not set to a positive number.
|
|
||
| The target number of open, FACTOR_THREE Ratis pipelines is controlled by three properties that define the maximum | ||
| number of pipelines in the cluster at a cluster-wide level, datanode level, and metadata disk level, respectively. | ||
| SCM will create pipelines until the most restrictive limit is met. |
There was a problem hiding this comment.
Replace this sentence with
- "The number of pipelines created by SCM is restricted by these limits."
| SCM will create pipelines until the most restrictive limit is met. | ||
|
|
||
| 1. **Cluster-wide Limit (`ozone.scm.ratis.pipeline.limit`)** | ||
| * **Description**: An absolute, global limit for the total number of open, FACTOR_THREE Ratis pipelines |
There was a problem hiding this comment.
Remove FACTOR_THREE since the code actually includes ReplicationFactor.ONE pipeline; see
| * **Description**: When set to a positive number, this property defines a fixed maximum number of pipelines for | ||
| every datanode. This is one of two ways to calculate a cluster-wide target. | ||
| * **Default Value**: `2` | ||
| * **Calculation**: If this is set, the target is `(<this value> * <number of healthy datanodes>) / 3`. |
There was a problem hiding this comment.
* **Cluster-wide Limit Calculation**: If this property is set,
the number of pipelines in the cluster is additionally limited by
`(<this value> * <number of healthy datanodes>) / 3`. (revised)
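To make the arithmetic concrete, here is a rough sketch of how the two limits discussed in this review might combine. It is not the Ozone implementation: the function names are hypothetical, integer division is an assumption, and taking the smallest positive limit is an assumption based on the wording that the pipeline count is "restricted by these limits". A value of `0` is treated as "no limit", matching the documented default for `ozone.scm.ratis.pipeline.limit`.

```python
from typing import Optional


def datanode_based_limit(datanode_pipeline_limit: int, healthy_datanodes: int) -> int:
    """(<this value> * <number of healthy datanodes>) / 3, using integer division (assumed)."""
    return (datanode_pipeline_limit * healthy_datanodes) // 3


def effective_limit(global_limit: int,
                    datanode_pipeline_limit: int,
                    healthy_datanodes: int) -> Optional[int]:
    """Return the tightest positive limit, or None when no limit is configured.

    global_limit stands in for ozone.scm.ratis.pipeline.limit (0 = no global limit).
    datanode_pipeline_limit stands in for ozone.scm.datanode.pipeline.limit.
    """
    candidates = []
    if global_limit > 0:
        candidates.append(global_limit)
    if datanode_pipeline_limit > 0:
        candidates.append(datanode_based_limit(datanode_pipeline_limit, healthy_datanodes))
    return min(candidates) if candidates else None


# Defaults from the doc (global limit 0, datanode limit 2) on 9 healthy datanodes.
print(effective_limit(0, 2, 9))  # prints 6
# Adding a global cap of 4 makes the cap the tighter limit.
print(effective_limit(4, 2, 9))  # prints 4
```

With the defaults, the datanode-based formula is the only active limit, so the result scales with the number of healthy datanodes.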
| * **Calculation**: The limit for each datanode is | ||
| `(<this value> * <number of metadata disks on that datanode>)`. | ||
| The total cluster-wide target is the sum of all individual datanode limits, divided by 3. |
There was a problem hiding this comment.
Remove this section since it does not seem to be true (or I cannot find the code enforcing it). I do see that the value is used for filtering datanodes.
|
|
||
| 2. **Datanode-level Fixed Limit (`ozone.scm.datanode.pipeline.limit`)** | ||
| * **Description**: When set to a positive number, this property defines a fixed maximum number of pipelines for | ||
| every datanode. This is one of two ways to calculate a cluster-wide target. |
There was a problem hiding this comment.
Remove the sentence "This is one of two ways to calculate a cluster-wide target." since the Dynamic Limit
is not used in RatisPipelineProvider.exceedPipelineNumberLimit(..).
| For most production deployments, using the dynamic per-disk limit (`ozone.scm.datanode.pipeline.limit=0`) is | ||
| recommended, as it allows the cluster to scale pipeline capacity naturally with its resources. You can use the | ||
| global limit (`ozone.scm.ratis.pipeline.limit`) as a safety cap if needed. A good starting value for | ||
| `ozone.scm.pipeline.per.metadata.disk` is **2**. Monitor the section **Pipeline Statistics** in SCM web UI, or run | ||
| the command `ozone admin pipeline list` to see if the actual number of pipelines aligns with your configured targets. |
There was a problem hiding this comment.
I think there is a tradeoff in having a lot of concurrent pipelines. This might be worth documenting.
Here are a few I can think of:
- Each Ratis group takes some resources (e.g. 8MB for the write buffer, IIRC).
- A larger number of pipelines increases the load on the metadata volume, which might cause contention.
- The higher the number of pipelines, the higher the number of concurrent storage containers. If one DN is down and all the pipelines are closed, we might end up with a lot of small containers, which might add overhead in the long term.
Change-Id: Ieff1a5ffe67c4eed32a7c61c655c19ba9f307d26
|
OK, updated based on Nicholas' comment. I am going to go over this again since it's been a while. |
|
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. |
|
Thank you for your contribution. This PR is being closed due to inactivity. Please contact a maintainer if you would like to reopen it. |
What changes were proposed in this pull request?
HDDS-13579. [Docs] Explain how Ratis write pipelines are calculated
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13579
How was this patch tested?
Doc only.