The String module provides comprehensive string handling capabilities for the Malterlib framework, supporting multiple character encodings (ANSI, UTF-8, UTF-16, UTF-32/Unicode) with efficient algorithms and flexible storage implementations. This module offers high-performance string operations, formatting, parsing, and Unicode support.
- CStr - UTF-8 string (most common, default)
- CStrAnsi - ANSI/ASCII string (single-byte, no Unicode)
- CWStr - UTF-16 string (Windows wide string compatible)
- CUStr - UTF-32/Unicode string (full Unicode codepoint per character)
- gc_Str - Compile-time constant strings (constexpr)
- NonTracked variants - No memory tracking (e.g., CStrNonTracked)
- Secure variants - Secure memory wiping (e.g., CStrSecure)
- VMem variants - Virtual memory backed strings
- Span types - Non-owning string views (e.g., CStrSpan)
- Dynamic (TCStrImp_Dynamic) - Heap-allocated, growable
- Fixed (TCStrImp_Fixed) - Stack-allocated, fixed size
- Pointer (TCStrImp_Pointer) - References external memory
- Virtual (TCStrImp_Virtual) - Abstract interface for custom storage
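
As a rough illustration of how one string front end can sit on top of interchangeable storage, the sketch below uses standard C++ only. The names FixedStorage, PointerStorage, and BasicStr are hypothetical stand-ins invented for this example; they are not the TCStrImp_* classes, and the real implementations will differ.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical fixed storage: characters live inside the object, no heap use.
template <std::size_t N>
struct FixedStorage {
    char data[N + 1] = {};
    std::size_t length = 0;

    // Appends as much of `text` as fits; returns the number of characters stored.
    std::size_t append(std::string_view text) {
        std::size_t room = N - length;
        std::size_t count = text.size() < room ? text.size() : room;
        if (count > 0)
            std::memcpy(data + length, text.data(), count);
        length += count;
        data[length] = '\0';
        return count;
    }

    std::string_view view() const { return {data, length}; }
};

// Hypothetical pointer storage: refers to memory owned by someone else.
struct PointerStorage {
    const char* data = "";
    std::size_t length = 0;

    std::string_view view() const { return {data, length}; }
};

// The string front end delegates all data access to the storage it was given.
template <typename Storage>
struct BasicStr {
    Storage storage;

    std::size_t size() const { return storage.view().size(); }

    bool starts_with(std::string_view prefix) const {
        return storage.view().substr(0, prefix.size()) == prefix;
    }
};
```

With this shape, BasicStr<FixedStorage<8>> truncates appends at eight characters without allocating, while BasicStr<PointerStorage> only observes memory it does not own.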
Extensive algorithm library with modular design:
- Text Operations: Find, Replace, Split, Trim, Capitalize, Case conversion
- Comparison: Compare, StartsWith, EndsWith, WildcardMatch
- Encoding: UTF-8/16/32 conversion, ANSI conversion
- Hashing: DJB2, Murmur3, SDBM algorithms
- Special: FuzzyMatch, Escape (Bash), Sanitize
- Formatters: Integer, Float, Binary, Time, String formatting
- Parsers: Integer, Float, String parsing with pattern matching
- Format Utils: Printf-style formatting with type safety
- Character Iterators: Navigate by characters/codepoints
- Unicode Iterator: Proper Unicode grapheme cluster iteration
- UTF Encode Iterators: Convert between encodings during iteration
- Output Iterators: Write encoded data during iteration
- Primary namespace: NMib::NStr
- System integration: NMib::NSys::NStr
- Private implementations: NMib::NStr::NPrivate
- String classes: C[Encoding]Str[Variant] (e.g., CStr, CWStrSecure)
- Template classes: TC[Component] (e.g., TCStr, TCFormat)
- String traits: CStrTraits_[Type] (e.g., CStrTraits_CStr)
- Algorithms: Direct names (e.g., Find, Replace, Trim)
- Character types: ch8, ch16, ch32 (signed), uch8, uch16, uch32 (unsigned)
// Signed character types
ch8 - UTF-8/ANSI character
ch16 - UTF-16 character
ch32 - UTF-32/Unicode codepoint
// Unsigned variants
uch8, uch16, uch32
// Zero-on-destruction variants (secure)
zuch8, zuch16, zuch32

enum EStrType {
    EStrType_Ansi,      // Single-byte ANSI/ASCII
    EStrType_Unicode,   // Full Unicode (UTF-32)
    EStrType_UTF,       // Variable-width UTF (8 or 16)
    EStrType_Undefined
};

- Core - Basic types and platform abstractions
- Container - Vector for string storage
- Memory - Allocator interfaces
- Algorithm - Sorting and searching primitives
- Iterator - Iterator base classes
- Encoding - Character encoding conversions
// Strings use template-based storage implementations
template <typename t_CStrTraits>
struct TCStr {
    // Storage delegated to implementation class
    typename t_CStrTraits::CImpl m_Impl;
};

// Dynamic implementation example
template <typename t_CStrTraits>
struct TCStrImp_Dynamic {
    ch8* m_pData;
    umint m_Capacity;
    umint m_Length;
};

Each algorithm is in a separate header for compilation efficiency:
- Malterlib_String_Algorithm_[Name].h - Interface
- Malterlib_String_Algorithm_[Name].hpp - Implementation

// Type-safe formatting with compile-time checking
auto Result = "Value: {}, Float: {}"_f << 42 << 3.14f;
// Or using CFormat directly
CStr Result2 = CStr::CFormat("Value: {}, Float: {}") << 42 << 3.14f;

- Full Unicode support with proper grapheme cluster handling
- Automatic encoding conversion between string types
- Iterator-based encoding transformation
- Normalization and case folding support
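
To make the idea of encoding conversion concrete, here is a minimal sketch of decoding UTF-8 into UTF-32 code points using only standard C++. The function name DecodeUtf8 and the choice of U+FFFD for malformed input are assumptions of this example; it does not use the module's iterators, and it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Malformed or truncated sequences yield U+FFFD.
inline std::u32string DecodeUtf8(std::string_view bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(U'\uFFFD');
            ++i;
            continue;
        }

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;      // sequence ended early; resume at this byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : U'\uFFFD');
    }
    return out;
}
```

Each output element holds one full code point, which is the property that makes a UTF-32 string convenient when direct code point access is needed.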
CStr Str("Hello World");
Str.f_Replace("World", "Malterlib");
Str.f_ToUpperCase();
if (Str.f_StartsWith("HELLO")) {
    auto Pos = Str.f_Find("MALTERLIB");
}

// Create formatted string directly
CStr Formatted = CStr::CFormat("Value: {}") << 42;
// Multiple values
CStr UserInfo = CStr::CFormat("User: {}, ID: {}") << Username << UserID;
// With format specifiers
CStr HexFormat = CStr::CFormat("{nh}") << 255; // Outputs hex without 0x prefix
// Format into existing string by concatenation
CStr Result;
Result += CStr::CFormat("Temperature: {}°C") << Temperature;

// Using the _f suffix for format strings
auto Result = "User: {}, ID: {}"_f << Username << UserID;
// Works with different string types
auto WideResult = u"Value: {}"_f << 42; // UTF-16
auto UnicodeResult = U"Value: {}"_f << 42; // UTF-32
// In test paths and debugging
DMibTestPath("{}"_f << TestValue);

// Generic format function
CStr Result = fg_Format("Temperature: {}°C", Temperature);
// Format with specific return type
CWStr WideResult = fg_Format<CWStr>("Value: {}", Value);
// Used with format modifiers
auto HexStr = fg_Format("Hex: {}", fg_FormatIntFormat<16>(255));
// Format integer with specific radix
auto Binary = fg_Format("Binary: {}", fg_FormatIntFormat<2>(42));

// Parse with format string
CStr Input("42 3.14 Hello");
int32 IntVal;
float FloatVal;
CStr StrVal;
aint nParsed = 0;
(CStr::CParse("{} {} {}") >> IntVal >> FloatVal >> StrVal).f_Parse(Input, nParsed);
// Parse with delimiters
CStr String1, String2, String3;
(CStr::CParse("{}...{}...{}") >> String1 >> String2 >> String3).f_Parse("Test1...Test2...Test3", nParsed);
// Parse escaped strings
CStr QuotedStr;
(CStr::CParse("{se}") >> QuotedStr).f_Parse("\"Hello World\"", nParsed); // {se} = string escaped

CStr UTF8String("Hello 世界");
CWStr UTF16String = UTF8String; // Automatic conversion
CUStr UnicodeString = UTF8String; // Full Unicode
// Manual conversion
CAnsiStr AnsiStr;
fg_SystemEncodeAnsiStr(UTF8String, AnsiStr, '?'); // '?' for unmappable chars

// Fixed-size stack string
TCStrFixed<256> StackStr("Stack allocated");
// Secure string (wiped on destruction)
CStrSecure Password("secret");
// String span (non-owning view)
CStrSpan View(SomeString.f_GetArray(), SomeString.f_GetLen());
// Compile-time constant strings
constexpr auto& MyConstStr = gc_Str<"Compile-time constant">.m_Str; // CStr const
constexpr auto& MyWideStr = gc_Str<str_utf16("Wide string")>.m_Str; // CWStr const
constexpr auto& MyUnicodeStr = gc_Str<str_utf32("Unicode")>.m_Str; // CUStr const
// Interoperability with runtime strings
CStr RuntimeStr = gc_Str<"Hello">.m_Str; // Seamless usage
RuntimeStr += " World";
// Same gc_Str instance across translation units (singleton)
auto& ConstRef1 = gc_Str<"Same">.m_Str;
auto& ConstRef2 = gc_Str<"Same">.m_Str; // Same address as ConstRef1

// Function returns EMatchWildcardResult enum
if (NStr::fg_StrMatchWildcard(Filename, "*.txt") == NStr::EMatchWildcardResult_WholeStringMatchedAndPatternExhausted) {
    // Process text file
}
// Wildcard patterns:
// ? - matches single character
// * - matches zero or more characters
auto Result = NStr::fg_StrMatchWildcard("test.txt", "*.txt");
auto Result2 = NStr::fg_StrMatchWildcard("file123.doc", "file???.doc");

# Build tests
MalterlibBuildShowProgress=false ./mib build Tests
# Run all string tests
/opt/Deploy/Tests/RunAllTests --paths '["Malterlib/String/*"]'
# Run specific algorithm tests
/opt/Deploy/Tests/RunAllTests --paths '["Malterlib/String/Algorithm/Compare", "Malterlib/String/Algorithm/Find"]'
# Run format/parse tests
/opt/Deploy/Tests/RunAllTests --paths '["Malterlib/String/Container/Format/*", "Malterlib/String/Container/Parse"]'

- Include/Mib/String/String - Main string classes
- Include/Mib/String/Algorithm - Algorithm interfaces
- Include/Mib/String/Formatters/* - Formatting components
- Include/Mib/String/Parsers/* - Parsing components
- Include/Mib/String/Implementations/* - Storage implementations

- Source/Malterlib_String.h/cpp - Main string implementation
- Source/Malterlib_String_Container.h/cpp - Container base
- Source/Malterlib_String_Types.h - Type definitions and traits

- Source/Malterlib_String_Algorithm_*.h/hpp - Individual algorithms
- Source/Malterlib_String_Algorithm_Common.h - Shared algorithm code

- Source/Malterlib_String_Container_Imp*.h - Storage backends
  - Dynamic, Fixed, Virtual, Pointer implementations

- Source/Malterlib_String_Container_Format_*.h - Formatters
- Source/Malterlib_String_Container_Parse_*.h - Parsers
- Source/Malterlib_String_FormatUtils.h/hpp - Format utilities

- Source/Malterlib_String_Iterator_Unicode.h/hpp - Unicode iteration
- Source/Malterlib_String_Iterator_UTF*.h/hpp - UTF encoding iterators
- Source/Malterlib_String_UnicodeConversion.h - Conversion utilities
- Source/Malterlib_String_AnsiConversion.h - ANSI conversion

- Source/Malterlib_String_FuzzyMatch.h/cpp - Fuzzy string matching
- Source/Malterlib_String_MultiReplace.h/hpp - Batch replacements
- Source/Malterlib_String_Appender.h/hpp - Efficient string building

- Small String Optimization (SSO) in dynamic implementation
- Copy-on-Write (COW) for efficient string copies
- Algorithm complexity:
  - Find/Replace: O(n*m) worst case, optimized for common cases
  - Hash functions: O(n) with good distribution
  - Case conversion: O(n) with Unicode support
  - Wildcard match: O(n*m) with optimization for simple patterns
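
For the O(n) figure on hashing, a minimal sketch of the classic DJB2 hash (multiply by 33, add the next byte, seeded with 5381) shows the single pass over the input. The function name Djb2 and the 32-bit result width are assumptions of this example; the module's own hash functions may use a different variant or signature.

```cpp
#include <cstdint>
#include <string_view>

// Classic DJB2: one multiply-add per input byte, so the cost is linear in the length.
// Unsigned arithmetic wraps on overflow, which is the intended behavior here.
constexpr std::uint32_t Djb2(std::string_view text) {
    std::uint32_t hash = 5381u;
    for (char c : text)
        hash = hash * 33u + static_cast<unsigned char>(c);
    return hash;
}
```

Because the function is constexpr, a hash of a literal can be computed at compile time, for example as a case label in a switch over precomputed hashes.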
- UTF-8 is default and recommended for most use cases
- UTF-16 for Windows API compatibility
- UTF-32 when direct codepoint access needed
- ANSI for legacy system compatibility
- Automatic conversion between types with potential data loss warnings
- Dynamic strings use exponential growth (typically 1.5x)
- NonTracked variants bypass memory tracking system
- Secure variants clear memory on destruction
- Fixed variants never allocate heap memory
- String objects are NOT thread-safe for modification
- Read-only access from multiple threads is safe
- Use synchronization for concurrent modifications
- Consider thread-local strings for performance
- Full Unicode 15.0 support
- Proper handling of:
  - Combining characters
  - Surrogate pairs (UTF-16)
  - Grapheme clusters
  - Normalization forms
- Case operations are Unicode-aware
- Type-safe compile-time checking
- Custom format specifiers:
  - {} - Default formatting
  - {nh} - Hex without prefix (no 0x)
  - Custom radix via fg_FormatIntFormat<radix>
- Three formatting approaches:
  - CFormat - Object-oriented, best for building complex strings
  - _f literal - Concise, best for inline formatting
  - fg_Format - Functional, best for simple format operations
- Automatic type deduction
- Efficient buffer management
- Supports all basic types and custom types with format traits
- Wildcard matching doesn't support full regex (use separate Regex module)
- ANSI conversion may lose data for Unicode strings
- Fixed strings have compile-time size limits
- Some algorithms not optimized for very long strings (>1MB)
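
The wildcard semantics described earlier (? matches a single character, * matches zero or more) can be sketched with a small backtracking matcher in standard C++. WildcardMatch is a hypothetical name and returns a plain bool; it is not fg_StrMatchWildcard, which returns an EMatchWildcardResult value, and it makes no claim about how the module implements matching.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical whole-string matcher for patterns containing '?' and '*'.
// Worst case is O(n*m) because each '*' may be retried at every text position.
inline bool WildcardMatch(std::string_view text, std::string_view pattern) {
    constexpr std::size_t npos = std::string_view::npos;
    std::size_t t = 0, p = 0;
    std::size_t starP = npos;  // index of the most recent '*' in the pattern
    std::size_t starT = 0;     // text index where that '*' started matching

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            // Record the star and first try letting it match nothing.
            starP = p++;
            starT = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == text[t])) {
            ++t;
            ++p;
        } else if (starP != npos) {
            // Mismatch: let the last '*' absorb one more character and retry.
            p = starP + 1;
            t = ++starT;
        } else {
            return false;
        }
    }

    // Text is exhausted; only trailing '*' may remain in the pattern.
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}
```

Anything beyond these two metacharacters, such as character classes or alternation, is outside what this kind of matcher handles.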
- Use CStr (UTF-8) as default string type
- Use CStrSpan for function parameters to avoid copies
- Prefer algorithms over manual iteration
- Use secure strings for sensitive data (passwords, keys)
- Reserve capacity when final size is known
- Use MultiReplace for batch replacements (more efficient)
- Consider Fixed strings for stack allocation in performance-critical code
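
The advice about secure strings can be illustrated with a small buffer that zeroes its contents before releasing them. This is a sketch in standard C++, assuming a hypothetical SecureBuffer type and a write loop through a volatile pointer to discourage the compiler from eliding the wipe; it does not show how CStrSecure is implemented.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Overwrites memory through a volatile pointer so each store is an observable access.
inline void SecureWipe(char* data, std::size_t size) {
    volatile char* p = data;
    for (std::size_t i = 0; i < size; ++i)
        p[i] = 0;
}

// Hypothetical owning buffer that wipes its bytes in the destructor.
class SecureBuffer {
public:
    explicit SecureBuffer(std::string_view secret)
        : m_Data(secret.begin(), secret.end()) {}

    // Copying is disabled so the secret does not spread to unwiped buffers.
    SecureBuffer(const SecureBuffer&) = delete;
    SecureBuffer& operator=(const SecureBuffer&) = delete;

    ~SecureBuffer() { Wipe(); }

    // Zeroes the stored bytes; the length is left unchanged.
    void Wipe() { SecureWipe(m_Data.data(), m_Data.size()); }

    std::string_view View() const { return {m_Data.data(), m_Data.size()}; }

private:
    std::vector<char> m_Data;
};
```

The sketch covers only the final buffer; temporaries created while building the secret would need the same treatment, which is one reason to rely on a dedicated secure string type instead.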
Compilation Error with _f Operator
The _f string formatting operator requires the NMib::NStr namespace to be in scope:
// INCORRECT - will cause compilation error
auto Result = "Value: {}"_f << 42;
// CORRECT - add using directive before formatting
using namespace NMib::NStr;
auto Result = "Value: {}"_f << 42;

Solution: Add using namespace NMib::NStr; at the top of the function, before the first use of the _f formatting operator. In a .cpp file, the using directive can instead go at the top of the file.
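
The error follows from ordinary C++ lookup rules: a literal suffix is resolved by unqualified lookup and cannot carry a namespace qualifier, so the literal operator has to be visible in the current scope. The sketch below reproduces the situation with a hypothetical namespace NDemo and a hypothetical _tag suffix; it is standard C++ written for this example, not Malterlib code.

```cpp
#include <cstddef>
#include <string>

namespace NDemo {
    // Hypothetical literal operator declared inside a namespace,
    // in the same way _f is described as living in NMib::NStr.
    inline std::string operator""_tag(const char* text, std::size_t length) {
        return "[" + std::string(text, length) + "]";
    }
}

// At this point, writing "value"_tag would fail to compile:
// no operator""_tag is visible to unqualified lookup.

inline std::string MakeTag() {
    // Function-local directive: makes the suffix visible here
    // without affecting the rest of the translation unit.
    using namespace NDemo;
    return "value"_tag;
}
```

A narrower alternative in standard C++ is a using-declaration for the operator alone (using NDemo::operator""_tag;), which brings in the suffix without the rest of the namespace.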