hammingDistanceCodePoints

Calculate the Hamming distance between two equal-length strings by comparing Unicode code points.

Usage

var hammingDistanceCodePoints = require( '@stdlib/string/base/distances/hamming-code-points' );

hammingDistanceCodePoints( s1, s2 )

Calculates the Hamming distance between two equal-length strings by comparing Unicode code points.

var dist = hammingDistanceCodePoints( 'frog', 'from' );
// returns 1

dist = hammingDistanceCodePoints( 'tooth', 'froth' );
// returns 2

dist = hammingDistanceCodePoints( 'cat', 'cot' );
// returns 1

dist = hammingDistanceCodePoints( '', '' );
// returns 0

// Emoji are treated as single Unicode code points:
dist = hammingDistanceCodePoints( '👋', '🌍' );
// returns 1

dist = hammingDistanceCodePoints( 'a👋b', 'c🌍d' );
// returns 3
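The following is a minimal sketch of how a code-point-based Hamming distance can be computed using only standard ECMAScript 2015 features. It is not the stdlib source, and the name hammingCodePointsSketch is hypothetical. The approach splits both strings into code points with Array.from(), returns the sentinel -1 when the counts differ, and otherwise counts mismatched positions.

```javascript
/**
* Hypothetical sketch of a code-point Hamming distance (not the stdlib source).
*
* @param {string} s1 - first input string
* @param {string} s2 - second input string
* @returns {number} Hamming distance, or -1 if the code point counts differ
*/
function hammingCodePointsSketch( s1, s2 ) {
    // Array.from() consumes the string iterator, which yields one element per
    // Unicode code point, so a surrogate pair becomes a single array element:
    var a = Array.from( s1 );
    var b = Array.from( s2 );
    var count;
    var i;

    // The distance is undefined for unequal code point counts; return the sentinel:
    if ( a.length !== b.length ) {
        return -1;
    }
    count = 0;
    for ( i = 0; i < a.length; i++ ) {
        if ( a[ i ] !== b[ i ] ) {
            count += 1;
        }
    }
    return count;
}

console.log( hammingCodePointsSketch( 'frog', 'from' ) ); // => 1
console.log( hammingCodePointsSketch( 'a\u{1F44B}b', 'c\u{1F30D}d' ) ); // => 3
console.log( hammingCodePointsSketch( 'cat', 'cats' ) ); // => -1
```

Because Array.from() allocates an array per input, this sketch favors clarity over performance; a production implementation might instead step through both strings with codePointAt().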

Notes

  • If the two strings contain different numbers of Unicode code points, the Hamming distance is undefined. In that case, the function returns a sentinel value of -1.
  • Unlike @stdlib/string/base/distances/hamming, which compares UTF-16 code units, this function compares Unicode code points. Consequently, a surrogate pair (used to encode characters outside the Basic Multilingual Plane, such as most emoji) is treated as a single unit of comparison. For example, the emoji '👋' (U+1F44B) is encoded as the UTF-16 surrogate pair \uD83D\uDC4B and thus has a String.length of 2, but this function treats it as a single code point.
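  • The function is not grapheme-cluster aware. Characters composed of multiple Unicode code points (e.g., family emoji built from several code points joined by Zero Width Joiners, or letters followed by combining diacritical marks) are compared code point by code point, so a single user-perceived character may contribute more than one to the distance or cause the code point counts to differ.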
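To make the distinction between code units, code points, and grapheme clusters concrete, the sketch below uses only built-in JavaScript. The helper countMismatches is hypothetical (not part of stdlib) and works on any indexable sequence: indexing a string compares UTF-16 code units, whereas indexing the output of Array.from() compares code points.

```javascript
// Hypothetical helper (not the stdlib implementation): counts positions at
// which two equal-length indexable sequences differ, or returns -1 otherwise.
function countMismatches( a, b ) {
    var n = 0;
    var i;
    if ( a.length !== b.length ) {
        return -1;
    }
    for ( i = 0; i < a.length; i++ ) {
        if ( a[ i ] !== b[ i ] ) {
            n += 1;
        }
    }
    return n;
}

var wave = '\u{1F44B}';  // WAVING HAND SIGN, stored as \uD83D\uDC4B
var globe = '\u{1F30D}'; // EARTH GLOBE EUROPE-AFRICA, stored as \uD83C\uDF0D

// String.length counts UTF-16 code units; Array.from() yields code points:
console.log( wave.length ); // => 2
console.log( Array.from( wave ).length ); // => 1

// Code units: both surrogate halves differ. Code points: one mismatch.
console.log( countMismatches( wave, globe ) ); // => 2
console.log( countMismatches( Array.from( wave ), Array.from( globe ) ) ); // => 1

// Not grapheme-aware: MAN + ZWJ + WOMAN + ZWJ + GIRL renders as one family
// emoji on most platforms, but it is 5 code points (8 code units):
var family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';
console.log( Array.from( family ).length ); // => 5

// Precomposed U+00E9 vs. 'e' + U+0301 COMBINING ACUTE ACCENT look identical,
// but they are 1 vs. 2 code points, so the counts differ and -1 is returned:
console.log( countMismatches( Array.from( '\u00E9' ), Array.from( 'e\u0301' ) ) ); // => -1
```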

Examples

var hammingDistanceCodePoints = require( '@stdlib/string/base/distances/hamming-code-points' );

var dist = hammingDistanceCodePoints( 'algorithms', 'altruistic' );
// returns 7

dist = hammingDistanceCodePoints( 'elephant', 'hippopod' );
// returns 7

dist = hammingDistanceCodePoints( 'javascript', 'typescript' );
// returns 4

dist = hammingDistanceCodePoints( 'hamming', 'ladybug' );
// returns 5

// Emoji strings (each emoji = 1 Unicode code point):
dist = hammingDistanceCodePoints( '👋🌍🎉', '🌟💫✨' );
// returns 3

// Mixed ASCII and emoji:
dist = hammingDistanceCodePoints( 'hello👋', 'hallo🌍' );
// returns 2
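When checking results like the ones above, it can help to see where two strings differ, not just how many positions differ. The helper below, mismatchPositions, is a hypothetical debugging aid and not part of stdlib. It assumes the same code-point segmentation via Array.from() and returns null, rather than -1, when the code point counts differ, because its normal result is an array rather than a count.

```javascript
// Hypothetical helper (not part of stdlib): returns the code point indices at
// which two strings differ, or null when their code point counts differ.
function mismatchPositions( s1, s2 ) {
    var a = Array.from( s1 );
    var b = Array.from( s2 );
    var out;
    var i;
    if ( a.length !== b.length ) {
        return null;
    }
    out = [];
    for ( i = 0; i < a.length; i++ ) {
        if ( a[ i ] !== b[ i ] ) {
            out.push( i );
        }
    }
    return out;
}

console.log( mismatchPositions( 'algorithms', 'altruistic' ) );
// => [ 2, 3, 4, 6, 7, 8, 9 ]

console.log( mismatchPositions( 'hello\u{1F44B}', 'hallo\u{1F30D}' ) );
// => [ 1, 5 ]
```

The length of each returned array matches the distance reported for the same pair in the examples above (7 and 2, respectively).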