Skip to content

Commit 099a9d7

Browse files
authored
stats: optimize StatName hot paths in symbol table (envoyproxy#44448)
Optimizations to StatName to make it more friendly for use as a key in a map. Specifically we improve `operator==` and similar calls and clean up some duplicate unnecessary work across the file. (nb: if preferred i can resubmit the commits as separated PRs, do let me know) Key changes: - Inline `dataSize()`, `decodeNumber()`, `encodingSizeBytes()` into the header to make inlining easier - Fast-path `decodeNumber()` for values <= 127 - skip loop for values with length less than 127 - Branch-free `encodingSizeBytes()` via `std::countl_zero` - `StatNameList::front()` + `MetricHelper::statName()` bypass - avoid `std::function` allocation and iterator setup for the common "get first element" - `operator==` fast-path pointer-identity check and early size comparison before `memcmp`; `AbslHashValue` also avoids redundant varint decode - `empty()` checks first byte instead of full varint decode - `0x00` uniquely encodes zero length (`d040567`) Cumulative benchmark results (`--benchmark_min_time=5s`): | Benchmark | Before | After | Improvement | |---|---|---|---| | `BM_StatsNoTls` | 5697 us | 5077 us | **-10.9%** | | `BM_StatsWithTls` | 6023 us | 5453 us | **-9.5%** | | `BM_StatsWithTlsAndRejectionsWithDot` | 2889 us | 2715 us | **-6.1%** | | `BM_StatsWithTlsAndRejectionsWithoutDot` | 5486 us | 4866 us | **-11.3%** | | `bmCreateRace` | 49.2 ms | 49.7 ms | ~flat | | `bmJoinStatNames` | 101 ns | 87.6 ns | **-13.3%** | | `bmJoinElements` | 233 ns | 208 ns | **-10.7%** | | `bmCompareElements` | 34.3 ns | 33.5 ns | **-2.3%** | | `bmSortByStatNames` | 49976 us | 50547 us | ~flat | | `bmStdSort` | 60993 us | 56058 us | **-8.1%** | | `bmStatNameMapLookup` | 22.4 ns | 8.52 ns | **-62.0%** | | `bmStatNameMapLookupDifferentPointer` | 22.4 ns | 11.8 ns | **-47.3%** | | `bmStatNameMapLookupMiss` | 12.1 ns | 6.70 ns | **-44.6%** | | `bmSortStrings` | 20443 us | 20488 us | ~flat | | `bmSetStrings` | 38079 us | 38100 us | ~flat | Optimizations were developed and benchmarked incrementally (see individual commit messages). The map-lookup benchmarks were added in the first commit to establish a baseline. Generative AI (Claude) was used as a development aid for this change. I understand all code in the PR. Risk Level: Medium - all changes are to internal implementation details with no API or behavioral changes. The varint encoding format is unchanged. But, it is a core component. Testing: - Existing `bazel test //test/common/stats/...` passes incl config=asan. - New `StatNameEqualityFastPaths` test covers pointer-identity, null-vs-zero-length, and null-vs-non-empty cases - Benchmarked with `symbol_table_benchmark` and `thread_local_store_speed_test` Docs Changes: N/A Release Notes: N/A Platform Specific Features: N/A --------- Signed-off-by: Jason Koch <jkoch@netflix.com>
1 parent 3f5a818 commit 099a9d7

5 files changed

Lines changed: 696 additions & 67 deletions

File tree

source/common/stats/metric_impl.cc

Lines changed: 2 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -32,33 +32,9 @@ MetricHelper::MetricHelper(StatName name, StatName tag_extracted_name,
3232
symbol_table.populateList(names.begin(), num_names, stat_names_);
3333
}
3434

35-
StatName MetricHelper::statName() const {
36-
StatName stat_name;
37-
stat_names_.iterate([&stat_name](StatName s) -> bool {
38-
stat_name = s;
39-
return false; // Returning 'false' stops the iteration.
40-
});
41-
return stat_name;
42-
}
35+
StatName MetricHelper::statName() const { return stat_names_.at(0); }
4336

44-
StatName MetricHelper::tagExtractedStatName() const {
45-
// The name is the first element in stat_names_. The second is the
46-
// tag-extracted-name. We don't have random access in that format,
47-
// so we iterate through them, skipping the first element (name),
48-
// and terminating the iteration after capturing the tag-extracted
49-
// name by returning false from the lambda.
50-
StatName tag_extracted_stat_name;
51-
bool skip = true;
52-
stat_names_.iterate([&tag_extracted_stat_name, &skip](StatName s) -> bool {
53-
if (skip) {
54-
skip = false;
55-
return true;
56-
}
57-
tag_extracted_stat_name = s;
58-
return false; // Returning 'false' stops the iteration.
59-
});
60-
return tag_extracted_stat_name;
61-
}
37+
StatName MetricHelper::tagExtractedStatName() const { return stat_names_.at(1); }
6238

6339
void MetricHelper::iterateTagStatNames(const Metric::TagStatNameIterFn& fn) const {
6440
enum { Name, TagExtractedName, TagName, TagValue } state = Name;

source/common/stats/symbol_table.cc

Lines changed: 0 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -14,13 +14,6 @@
1414
namespace Envoy {
1515
namespace Stats {
1616

17-
// Masks used for variable-length encoding of arbitrary-sized integers into a
18-
// uint8-array. The integers are typically small, so we try to store them in as
19-
// few bytes as possible. The bottom 7 bits hold values, and the top bit is used
20-
// to determine whether another byte is needed for more data.
21-
static constexpr uint32_t SpilloverMask = 0x80;
22-
static constexpr uint32_t Low7Bits = 0x7f;
23-
2417
// When storing Symbol arrays, we disallow Symbol 0, which is the only Symbol
2518
// that will decode into uint8_t array starting (and ending) with {0}. Thus we
2619
// can use a leading 0 in the first byte to indicate that what follows is
@@ -31,13 +24,6 @@ static constexpr uint32_t Low7Bits = 0x7f;
3124
static constexpr Symbol FirstValidSymbol = 1;
3225
static constexpr uint8_t LiteralStringIndicator = 0;
3326

34-
size_t StatName::dataSize() const {
35-
if (size_and_data_ == nullptr) {
36-
return 0;
37-
}
38-
return SymbolTable::Encoding::decodeNumber(size_and_data_).first;
39-
}
40-
4127
#ifndef ENVOY_CONFIG_COVERAGE
4228
void StatName::debugPrint() {
4329
// TODO(jmarantz): capture this functionality (always prints regardless of
@@ -105,17 +91,6 @@ void SymbolTable::Encoding::addSymbols(const std::vector<Symbol>& symbols) {
10591
}
10692
}
10793

108-
std::pair<uint64_t, size_t> SymbolTable::Encoding::decodeNumber(const uint8_t* encoding) {
109-
uint64_t number = 0;
110-
uint64_t uc = SpilloverMask;
111-
const uint8_t* start = encoding;
112-
for (uint32_t shift = 0; (uc & SpilloverMask) != 0; ++encoding, shift += 7) {
113-
uc = static_cast<uint32_t>(*encoding);
114-
number |= (uc & Low7Bits) << shift;
115-
}
116-
return std::make_pair(number, encoding - start);
117-
}
118-
11994
SymbolVec SymbolTable::Encoding::decodeSymbols(StatName stat_name) {
12095
SymbolVec symbol_vec;
12196
symbol_vec.reserve(stat_name.dataSize());

source/common/stats/symbol_table.h

Lines changed: 70 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -192,8 +192,31 @@ class SymbolTable final {
192192
*
193193
* @param The encoded byte array, written previously by appendEncoding.
194194
* @return A pair containing the decoded number, and the number of bytes consumed from encoding.
195+
*
196+
* Defined inline in the header to allow the compiler to inline on hot paths.
195197
*/
196-
static std::pair<uint64_t, size_t> decodeNumber(const uint8_t* encoding);
198+
static std::pair<uint64_t, size_t> decodeNumber(const uint8_t* encoding) {
199+
// fast-path for the case 127 or less
200+
if (encoding[0] < SpilloverMask) {
201+
return {encoding[0], 1};
202+
}
203+
204+
uint64_t number = 0;
205+
uint64_t uc = SpilloverMask;
206+
const uint8_t* start = encoding;
207+
for (uint32_t shift = 0; (uc & SpilloverMask) != 0; ++encoding, shift += 7) {
208+
uc = static_cast<uint32_t>(*encoding);
209+
number |= (uc & Low7Bits) << shift;
210+
}
211+
return std::make_pair(number, encoding - start);
212+
}
213+
214+
// Masks used for variable-length encoding of arbitrary-sized integers into a
215+
// uint8-array. The integers are typically small, so we try to store them in as
216+
// few bytes as possible. The bottom 7 bits hold values, and the top bit is used
217+
// to determine whether another byte is needed for more data.
218+
static constexpr uint32_t SpilloverMask = 0x80;
219+
static constexpr uint32_t Low7Bits = 0x7f;
197220

198221
private:
199222
friend class StatName;
@@ -587,6 +610,14 @@ class StatName {
587610
uint64_t hash() const { return absl::Hash<StatName>()(*this); }
588611

589612
bool operator==(const StatName& rhs) const {
613+
if (size_and_data_ == rhs.size_and_data_) {
614+
return true;
615+
}
616+
617+
if (size_and_data_ == nullptr || rhs.size_and_data_ == nullptr) {
618+
return empty() && rhs.empty();
619+
}
620+
590621
return dataAsStringView() == rhs.dataAsStringView();
591622
}
592623
bool operator!=(const StatName& rhs) const { return !(*this == rhs); }
@@ -595,7 +626,7 @@ class StatName {
595626
* @return size_t the number of bytes in the symbol array, excluding the
596627
* overhead for the size itself.
597628
*/
598-
size_t dataSize() const;
629+
size_t dataSize() const { return dataAsStringView().size(); }
599630

600631
/**
601632
* @return size_t the number of bytes in the symbol array, including the
@@ -624,7 +655,8 @@ class StatName {
624655
* @param mem_block_builder the builder to receive the storage.
625656
*/
626657
void appendDataToMemBlock(MemBlockBuilder<uint8_t>& storage) {
627-
storage.appendData(absl::MakeSpan(data(), dataSize()));
658+
auto sv = dataAsStringView();
659+
storage.appendData(absl::MakeSpan(reinterpret_cast<const uint8_t*>(sv.data()), sv.size()));
628660
}
629661

630662
#ifndef ENVOY_CONFIG_COVERAGE
@@ -635,18 +667,19 @@ class StatName {
635667
* @return A pointer to the first byte of data (skipping over size bytes).
636668
*/
637669
const uint8_t* data() const {
638-
if (size_and_data_ == nullptr) {
639-
return nullptr;
640-
}
641-
return size_and_data_ + SymbolTable::Encoding::encodingSizeBytes(dataSize());
670+
return reinterpret_cast<const uint8_t*>(dataAsStringView().data());
642671
}
643672

644673
const uint8_t* dataIncludingSize() const { return size_and_data_; }
645674

646675
/**
647676
* @return whether this is empty.
648677
*/
649-
bool empty() const { return size_and_data_ == nullptr || dataSize() == 0; }
678+
bool empty() const {
679+
// Avoid a full varint decode: it is sufficient to know the first byte,
680+
// since 0x00 uniquely encodes zero
681+
return size_and_data_ == nullptr || size_and_data_[0] == 0;
682+
}
650683

651684
/**
652685
* Determines whether this starts with the prefix. Note: dynamic segments
@@ -660,13 +693,16 @@ class StatName {
660693

661694
private:
662695
/**
663-
* Casts the raw data as a string_view. Note that this string_view will not
664-
* be in human-readable form, but it will be compatible with a string-view
665-
* hasher and comparator.
696+
* Casts the raw data as a string_view, decoding the varint length prefix.
697+
* All accessors that need the data pointer and/or size should delegate to
698+
* this method so the decode happens in exactly one place.
666699
*/
667700
absl::string_view dataAsStringView() const {
668-
return {reinterpret_cast<const char*>(data()),
669-
static_cast<absl::string_view::size_type>(dataSize())};
701+
if (size_and_data_ == nullptr) {
702+
return {};
703+
}
704+
const auto [data_size, prefix_size] = SymbolTable::Encoding::decodeNumber(size_and_data_);
705+
return {reinterpret_cast<const char*>(size_and_data_ + prefix_size), data_size};
670706
}
671707

672708
const uint8_t* size_and_data_{nullptr};
@@ -863,6 +899,27 @@ class StatNameList {
863899
*/
864900
void clear(SymbolTable& symbol_table);
865901

902+
/**
903+
* @return the number of StatNames in the list, or 0 if not populated.
904+
*/
905+
uint32_t size() const { return populated() ? storage_[0] : 0; }
906+
907+
/**
908+
* @return the StatName at the given index. This is O(index): each element
909+
* has a variable-length encoding, so callers that need to visit
910+
* many elements should prefer iterate().
911+
*
912+
* Requires: populated() && index < size().
913+
*/
914+
StatName at(uint32_t index) const {
915+
ASSERT(populated() && index < storage_[0]);
916+
const uint8_t* p = &storage_[1];
917+
for (uint32_t i = 0; i < index; ++i) {
918+
p += StatName(p).size();
919+
}
920+
return StatName(p);
921+
}
922+
866923
private:
867924
friend class SymbolTable;
868925

0 commit comments

Comments
 (0)