|
| 1 | +# [SAI] Hashing Enhancements for Efficient RoCE Traffic Distribution |
| 2 | +------------------------------------------------------------------------------- |
| 3 | + Title | Hashing Enhancements for Efficient RoCE Traffic Distribution |
| 4 | +-------------|----------------------------------------------------------------- |
| 5 | + Authors | Satheesh Kumar Karra, Ravindranath C K (Marvell) |
| 6 | + Status | In review |
| 7 | + Type | Standards track |
| 8 | + Created | 2025-02-27 |
| 9 | + SAI-Version | 1.16 |
| 10 | +------------------------------------------------------------------------------- |
| 11 | + |
| 12 | +## 1.0 Introduction |
| 13 | + |
| 14 | + |
| 15 | +SAI (Switch Abstraction Interface) supports customization of hash field |
| 16 | +selection through the `saihash` object, allowing users to define hash fields |
| 17 | +based on network requirements. Configured `saihash` objects can be applied |
| 18 | +to different ECMP (Equal-Cost Multi-Path) traffic flows using the following |
| 19 | +SAI switch attributes: |
| 20 | + |
| 21 | + |
| 22 | +1) SAI_SWITCH_ATTR_ECMP_HASH_IPV4 – Specifies the hash object for IPv4 packets in ECMP. |
| 23 | +2) SAI_SWITCH_ATTR_ECMP_HASH_IPV4_IN_IPV4 – Specifies the hash object for IPv4-in-IPv4 encapsulated packets in ECMP. |
| 24 | +3) SAI_SWITCH_ATTR_ECMP_HASH_IPV6 – Specifies the hash object for IPv6 packets in ECMP. |
| 25 | + |
| 26 | +These attributes allow fine-tuned ECMP hashing, optimizing traffic |
| 27 | +distribution based on application needs. Network administrators can create |
| 28 | +custom hash lists using SAI native hash fields and bind them to the switch |
| 29 | +attributes above. SAI provides similar attributes for LAG (Link |
| 30 | +Aggregation Group) hashing, which help balance traffic across |
| 31 | +member links, reducing congestion and improving overall network |
| 32 | +efficiency. |
| 33 | + |
| 34 | +In the current configuration, Remote Direct Memory Access over Converged |
| 35 | +Ethernet (RoCE) traffic utilizes the same ECMP and LAG hash objects as |
| 36 | +standard IP traffic. However, this can lead to traffic polarization, |
| 37 | +especially when multiple RoCE streams share the same IP endpoints. |
| 38 | + |
| 39 | + |
| 40 | +## 2.0 Motivation |
| 41 | + |
| 42 | +The packet fields up to the L4 header are mostly identical for different RDMA |
| 43 | +streams between the same endpoints, so all of these streams hash to the same |
| 44 | +member. To improve the hash distribution for RDMA traffic, modern NPUs |
| 45 | +natively support hashing on RDMA header fields. |
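The effect described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: `flow_key_t`, `flow_hash()` and `select_member()` are hypothetical names, FNV-1a stands in for whatever hash function an NPU actually implements, and the two streams are assumed to use the same UDP source port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the fields an IP-only hash sees, plus the
 * destination QP carried in the RDMA Base Transport Header. */
typedef struct {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;  /* 4791 is the RoCEv2 UDP destination port */
    uint8_t  protocol;  /* 17 = UDP */
    uint32_t dest_qp;   /* 24-bit destination queue pair number */
} flow_key_t;

/* FNV-1a, used only as a stand-in for the hardware hash function. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash the IP and L4 fields, and optionally the destination QP. */
static uint32_t flow_hash(const flow_key_t *k, int use_dest_qp)
{
    assert(k != NULL);
    uint32_t h = 2166136261u;
    h = fnv1a(h, &k->src_ip, sizeof k->src_ip);
    h = fnv1a(h, &k->dst_ip, sizeof k->dst_ip);
    h = fnv1a(h, &k->src_port, sizeof k->src_port);
    h = fnv1a(h, &k->dst_port, sizeof k->dst_port);
    h = fnv1a(h, &k->protocol, sizeof k->protocol);
    if (use_dest_qp) {
        h = fnv1a(h, &k->dest_qp, sizeof k->dest_qp);
    }
    return h;
}

/* Map the hash onto one of n_members ECMP or LAG members. */
static uint32_t select_member(const flow_key_t *k, int use_dest_qp,
                              uint32_t n_members)
{
    assert(n_members > 0);
    return flow_hash(k, use_dest_qp) % n_members;
}
```

Two streams that differ only in `dest_qp` always select the same member when `use_dest_qp` is 0. With `use_dest_qp` set to 1 their hash values differ, so they can be placed on different members, although any two particular streams may still share a member after the modulo.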
| 46 | + |
| 47 | +This proposal adds SAI native hash field support for the following fields in the |
| 48 | +RDMA Base Transport Header: |
| 49 | + |
| 50 | +- Destination Queue Pair (QP) number |
| 51 | +- RDMA opcode (operation type) |
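For orientation, the sketch below shows where these two fields sit in a packet. It assumes the 12-byte BTH layout from the InfiniBand Architecture Specification, in which the opcode is the first byte and the destination QP occupies bytes 5 to 7 in network byte order. The names `bth_hash_fields_t` and `bth_extract()` are hypothetical and are not part of SAI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BTH_LEN 12u  /* Base Transport Header length in bytes */

/* Hypothetical container for the two BTH fields used for hashing. */
typedef struct {
    uint8_t  opcode;   /* BTH byte 0 */
    uint32_t dest_qp;  /* BTH bytes 5..7, 24 bits */
} bth_hash_fields_t;

/* Extract the opcode and destination QP from a raw BTH.
 * Returns 0 on success, -1 if the buffer is missing or too short. */
static int bth_extract(const uint8_t *bth, size_t len, bth_hash_fields_t *out)
{
    assert(out != NULL);
    if (bth == NULL || len < BTH_LEN) {
        return -1;
    }
    out->opcode = bth[0];
    out->dest_qp = ((uint32_t)bth[5] << 16) |
                   ((uint32_t)bth[6] << 8) |
                   (uint32_t)bth[7];
    return 0;
}
```

In a switch this extraction is performed by the NPU parser; the code only illustrates which bytes feed the hash when the new fields are selected.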
| 52 | + |
| 53 | + |
| 54 | +## 3.0 SAI Enhancements |
| 55 | + |
| 56 | +1) New hash fields to support RoCE: |
| 57 | + ```c |
| 58 | + |
| 59 | + /** |
| 60 | + * @brief Attribute data for SAI native hash fields |
| 61 | + */ |
| 62 | + typedef enum _sai_native_hash_field_t |
| 63 | + { |
| 64 | + |
| 65 | + ... |
| 66 | + /** Native hash field RDMA packet BTH (Base Transport Header) opcode */ |
| 67 | + SAI_NATIVE_HASH_FIELD_RDMA_BTH_OPCODE, |
| 68 | + |
| 69 | + /** Native hash field RDMA packet BTH destination queue pair */ |
| 70 | + SAI_NATIVE_HASH_FIELD_RDMA_BTH_DEST_QP, |
| 71 | + |
| 72 | + } sai_native_hash_field_t; |
| 73 | + ``` |
| 74 | +2) Switch attributes to configure hashing for RoCE traffic: |
| 75 | + ```c |
| 76 | + /** |
| 77 | + * @brief Attribute Id in sai_set_switch_attribute() and |
| 78 | + * sai_get_switch_attribute() calls. |
| 79 | + */ |
| 80 | + typedef enum _sai_switch_attr_t |
| 81 | + { |
| 82 | + ... |
| 83 | + /** |
| 84 | + * @brief The hash object for IPv4 RDMA packets going through ECMP |
| 85 | + * |
| 86 | + * @type sai_object_id_t |
| 87 | + * @flags CREATE_AND_SET |
| 88 | + * @objects SAI_OBJECT_TYPE_HASH |
| 89 | + * @allownull true |
| 90 | + * @default SAI_NULL_OBJECT_ID |
| 91 | + */ |
| 92 | + SAI_SWITCH_ATTR_ECMP_HASH_IPV4_RDMA, |
| 93 | + |
| 94 | + /** |
| 95 | + * @brief The hash object for IPv6 RDMA packets going through ECMP |
| 96 | + * |
| 97 | + * @type sai_object_id_t |
| 98 | + * @flags CREATE_AND_SET |
| 99 | + * @objects SAI_OBJECT_TYPE_HASH |
| 100 | + * @allownull true |
| 101 | + * @default SAI_NULL_OBJECT_ID |
| 102 | + */ |
| 103 | + SAI_SWITCH_ATTR_ECMP_HASH_IPV6_RDMA, |
| 104 | + |
| 105 | + /** |
| 106 | + * @brief The hash object for IPv4 RDMA packets going through LAG |
| 107 | + * |
| 108 | + * @type sai_object_id_t |
| 109 | + * @flags CREATE_AND_SET |
| 110 | + * @objects SAI_OBJECT_TYPE_HASH |
| 111 | + * @allownull true |
| 112 | + * @default SAI_NULL_OBJECT_ID |
| 113 | + */ |
| 114 | + SAI_SWITCH_ATTR_LAG_HASH_IPV4_RDMA, |
| 115 | + |
| 116 | + /** |
| 117 | + * @brief The hash object for IPv6 RDMA packets going through LAG |
| 118 | + * |
| 119 | + * @type sai_object_id_t |
| 120 | + * @flags CREATE_AND_SET |
| 121 | + * @objects SAI_OBJECT_TYPE_HASH |
| 122 | + * @allownull true |
| 123 | + * @default SAI_NULL_OBJECT_ID |
| 124 | + */ |
| 125 | + SAI_SWITCH_ATTR_LAG_HASH_IPV6_RDMA, |
| 126 | + ... |
| 127 | + } sai_switch_attr_t; |
| 128 | + ``` |
| 129 | + |
| 130 | + |
| 131 | +## 4.0 API Example |
| 132 | + |
| 133 | +### Create Hash Object |
| 134 | + |
| 135 | +```c |
| 136 | + |
| 137 | +hash_count = 0; |
| 138 | +sai_attr_list[0].id = SAI_HASH_ATTR_NATIVE_HASH_FIELD_LIST; |
| 139 | +/* ... (other hash fields) ... */ |
| 140 | +sai_attr_list[0].value.s32list.list[hash_count++] = |
| 141 | +    SAI_NATIVE_HASH_FIELD_RDMA_BTH_OPCODE; |
| 142 | +sai_attr_list[0].value.s32list.list[hash_count++] = |
| 143 | +    SAI_NATIVE_HASH_FIELD_RDMA_BTH_DEST_QP; |
| 144 | +sai_attr_list[0].value.s32list.count = hash_count; |
| 145 | +attr_count = 1; |
| 146 | + |
| 147 | +sai_create_hash_fn( |
| 148 | + &hash_rdma_v4_oid, |
| 149 | + switch_id, |
| 150 | + attr_count, |
| 151 | + sai_attr_list); |
| 152 | +``` |
| 153 | + |
| 154 | +### Configure RDMA Hash on Switch |
| 155 | + |
| 156 | +```c |
| 157 | +attr_count = 0; |
| 158 | +sai_attr_list[attr_count].id = SAI_SWITCH_ATTR_ECMP_HASH_IPV4_RDMA; |
| 159 | +sai_attr_list[attr_count].value.oid = hash_rdma_v4_oid; |
| 160 | + |
| 161 | +sai_set_switch_attribute_fn( |
| 162 | + switch_id, |
| 163 | + sai_attr_list); |
| 164 | +``` |
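The same pattern extends to the other three proposed attributes. The following is a minimal sketch rather than a definitive implementation. Because the proposed attributes are not part of any released SAI header, the types here (`mock_attr_t`, `mock_set_switch_attr_fn`, the local enum) are self-contained stand-ins and `bind_rdma_hash()` is a hypothetical helper. It relies only on the fact that the SAI switch set call takes one attribute per invocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for SAI types so the sketch compiles without SAI headers. */
typedef uint64_t sai_object_id_t;
typedef int32_t  sai_status_t;
#define SAI_STATUS_SUCCESS 0

/* The four proposed attributes, with local placeholder values. */
typedef enum {
    SAI_SWITCH_ATTR_ECMP_HASH_IPV4_RDMA,
    SAI_SWITCH_ATTR_ECMP_HASH_IPV6_RDMA,
    SAI_SWITCH_ATTR_LAG_HASH_IPV4_RDMA,
    SAI_SWITCH_ATTR_LAG_HASH_IPV6_RDMA,
    RDMA_HASH_ATTR_COUNT
} rdma_hash_attr_id_t;

/* Simplified attribute: an ID plus an object ID value. */
typedef struct {
    int32_t         id;
    sai_object_id_t oid;
} mock_attr_t;

/* Same shape as the SAI switch set call: one attribute per invocation. */
typedef sai_status_t (*mock_set_switch_attr_fn)(sai_object_id_t switch_id,
                                                const mock_attr_t *attr);

/* Bind an IPv4 and an IPv6 RDMA hash object to both ECMP and LAG.
 * Stops at the first failure and returns that status. */
static sai_status_t bind_rdma_hash(mock_set_switch_attr_fn set_attr,
                                   sai_object_id_t switch_id,
                                   sai_object_id_t hash_v4,
                                   sai_object_id_t hash_v6)
{
    const mock_attr_t attrs[] = {
        { SAI_SWITCH_ATTR_ECMP_HASH_IPV4_RDMA, hash_v4 },
        { SAI_SWITCH_ATTR_ECMP_HASH_IPV6_RDMA, hash_v6 },
        { SAI_SWITCH_ATTR_LAG_HASH_IPV4_RDMA,  hash_v4 },
        { SAI_SWITCH_ATTR_LAG_HASH_IPV6_RDMA,  hash_v6 },
    };
    assert(set_attr != NULL);
    for (size_t i = 0; i < sizeof attrs / sizeof attrs[0]; i++) {
        sai_status_t status = set_attr(switch_id, &attrs[i]);
        if (status != SAI_STATUS_SUCCESS) {
            return status;
        }
    }
    return SAI_STATUS_SUCCESS;
}

/* Recording stub standing in for the vendor implementation. */
static sai_object_id_t recorded[RDMA_HASH_ATTR_COUNT];

static sai_status_t record_set(sai_object_id_t switch_id,
                               const mock_attr_t *attr)
{
    (void)switch_id;
    recorded[attr->id] = attr->oid;
    return SAI_STATUS_SUCCESS;
}
```

Calling `bind_rdma_hash(record_set, switch_id, hash_rdma_v4_oid, hash_rdma_v6_oid)` issues four set calls, where `hash_rdma_v6_oid` is a hypothetical second hash object created in the same way as `hash_rdma_v4_oid`. A deployment may prefer separate hash objects for ECMP and LAG; sharing them here only keeps the sketch short.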