Commit fe19519

update docs, parse integer arrays for benchmark
Do a benchmark that actually uses a JsonParser to parse arrays of integers, rather than just repeatedly parsing a number outside of a JsonParser.
1 parent 99b60f8 commit fe19519

7 files changed

Lines changed: 127 additions & 77 deletions

CHANGELOG.md

Lines changed: 0 additions & 1 deletion
@@ -36,7 +36,6 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
 
 ### To Be Fixed
 
-- Consider adding bounds checks for integer addition and subtraction in RemesPath, and coercing any results to doubles if they would cause 64-bit int overflow.
 - Make sure there aren't any easily-triggered race conditions induced by [automatic parsing and validation after editing](/docs/README.md#automatically-check-for-errors-after-editing).
     - In 6.1.1.18, there is no longer a global shared JsonParser, which was the main potential source of race conditions.
 - Fix issue where pretty-printing or compressing causes tree view position tracking to be out of sync with the document until a query is issued or the `Refresh` button is hit.

JsonToolsNppPlugin/Properties/AssemblyInfo.cs

Lines changed: 2 additions & 2 deletions
@@ -28,5 +28,5 @@
 // Build Number
 // Revision
 //
-[assembly: AssemblyVersion("8.4.0.1")]
-[assembly: AssemblyFileVersion("8.4.0.1")]
+[assembly: AssemblyVersion("8.4.0.2")]
+[assembly: AssemblyFileVersion("8.4.0.2")]

JsonToolsNppPlugin/Tests/Benchmarker.cs

Lines changed: 55 additions & 0 deletions
@@ -348,6 +348,61 @@ private static string GenerateRandomIntegerStr(bool isHex)
             return new string(chars);
         }
 
+        public static bool BenchmarkParseIntegerArray(int numInts = 256, int numTrials = 100)
+        {
+            var watch = new Stopwatch();
+            var parser = new JsonParser();
+            foreach (bool greaterThanUintMax in new[] { true, false })
+            {
+                var ticksEachTrial = new long[numTrials];
+                string previewStr = "";
+                for (int ii = 0; ii < numTrials; ii++)
+                {
+                    var sb = new StringBuilder();
+                    sb.Append('[');
+                    for (int jj = 0; jj < numInts; jj++)
+                    {
+                        long val = RandomJsonFromSchema.random.Next(int.MaxValue);
+                        // The internal implementation of BigIntegers uses an array to store extra bits
+                        // of any number with value greater than uint.MaxValue.
+                        // This array is only initialized if the number is greater than uint.MaxValue,
+                        // so this test is bifurcated to see if big integers are significantly slower to parse
+                        // than small integers.
+                        if (greaterThanUintMax)
+                            val <<= 32;
+                        if (RandomJsonFromSchema.random.Next(2) == 1)
+                            sb.Append('-');
+                        sb.Append(val);
+                        sb.Append(',');
+                        if (ii == 0 && jj == 7 && !greaterThanUintMax)
+                            previewStr = sb.ToString() + "...";
+                    }
+                    sb.Remove(sb.Length - 1, 1);
+                    sb.Append(']');
+                    var s = sb.ToString();
+                    watch.Reset();
+                    watch.Start();
+                    try
+                    {
+                        var json = parser.Parse(s);
+                    }
+                    catch (Exception ex)
+                    {
+                        Npp.AddLine($"While parsing an array of integers, got exception {ex}");
+                        return true;
+                    }
+                    watch.Stop();
+                    ticksEachTrial[ii] = watch.ElapsedTicks;
+                }
+                var description = "integers with absolute value " + (greaterThanUintMax ? "greater" : "less") + " than 2**32 - 1";
+                (double mean, double sd) = GetMeanAndSd(ticksEachTrial);
+                Npp.AddLine($"In {numTrials} trials, it took {ConvertTicks(mean, "mus", 0)} +/- {ConvertTicks(sd, "mus", 0)} μs to parse an array of {numInts} {description}.");
+                if (previewStr.Length > 0)
+                    Npp.AddLine("Here is a preview of a representative example of the integer arrays parsed: " + previewStr);
+            }
+            return false;
+        }
+
         public static bool BenchmarkBigIntegerVersusLongParse(int numInts = 50, int numHexInts = 25, int trialsPerInt = 60)
         {
             var watch = new Stopwatch();
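
The new benchmark reports a mean and standard deviation of `Stopwatch` ticks per trial, but the `GetMeanAndSd` and `ConvertTicks` helpers it calls are not part of this diff. The following is only a minimal sketch of what such helpers might compute. The names `TimingStatsSketch`, `MeanAndSd`, and `TicksToMicroseconds` are hypothetical, and the choice of the sample standard deviation (dividing by n - 1) is an assumption, not something the commit confirms.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TimingStatsSketch
{
    // Hypothetical stand-in for the GetMeanAndSd helper called in the diff.
    // ASSUMPTION: uses the sample standard deviation (n - 1 denominator).
    public static (double mean, double sd) MeanAndSd(long[] ticks)
    {
        if (ticks == null || ticks.Length == 0)
            throw new ArgumentException("need at least one trial", nameof(ticks));
        double mean = ticks.Average();
        if (ticks.Length == 1)
            return (mean, 0.0);
        // Sum of squared deviations from the mean
        double sumSq = ticks.Sum(t => (t - mean) * (t - mean));
        return (mean, Math.Sqrt(sumSq / (ticks.Length - 1)));
    }

    // Converts Stopwatch ticks to microseconds.
    // Stopwatch.Frequency is the number of ticks per second on this machine.
    public static double TicksToMicroseconds(double ticks)
    {
        return ticks * 1_000_000.0 / Stopwatch.Frequency;
    }

    public static void Main()
    {
        var (mean, sd) = MeanAndSd(new long[] { 1, 2, 3, 4, 5 });
        Console.WriteLine($"{TicksToMicroseconds(mean)} +/- {TicksToMicroseconds(sd)} microseconds");
    }
}
```

Reporting the spread alongside the mean matters here because a single slow trial (for example, one interrupted by garbage collection) can shift the mean noticeably when only 100 trials are run.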

JsonToolsNppPlugin/Tests/TestRunner.cs

Lines changed: 2 additions & 2 deletions
@@ -113,8 +113,8 @@ public static async Task RunAll()
                     "JsonParser performance",
                     true, false
                 ),
-                (() => Benchmarker.BenchmarkBigIntegerVersusLongParse(),
-                    "performance of parsing longs versus BigIntegers",
+                (() => Benchmarker.BenchmarkParseIntegerArray(),
+                    "performance of parsing an array of integers",
                     false, false),
                 (() => Benchmarker.BenchmarkAndFuzzParseAndFormatDoubles(40, 5500),
                     "performance and correctness of parsing and dumping arrays of non-integer numbers",

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ You'll notice that icons appear next to the nodes in the tree. They are as follo
 * <span style="color:blue">Blue</span> square braces: __array__
 * <span style="color:green">Green</span> curly braces: __object__
 * ☯️ (yin-yang, half-black, half-white circle): __boolean__
-* <span style="color:red">123</span>: __integer__ (represented by 64-bit integer)
+* <span style="color:red">123</span>: __integer__ (*NOTE*: these can be arbitrarily large, but prior to [v9.0.0](/CHANGELOG.md#900---unreleased-yyyy-mm-dd), these were 64-bit integers)
 * <span style="color:red">-3.5</span>: __float__ (represented by 64-bit floating point number)
 * abc: __string__
 * <span style="color:grey">grey</span> square: __null__

docs/RemesPath.md

Lines changed: 9 additions & 9 deletions
@@ -117,7 +117,7 @@ The `not` operator introduced in [5.4.0](/CHANGELOG.md#540---2023-07-04) (which
 
 ### WARNING about arithmetic with integers ###
 
-Since JsonTools stores integers as 64-bit integers, overflow and underflow are possible when doing arithmetic. For example, `int(4e18)*3` incorrectly returns `-6446744073709551616` because the true result would be greater than `2**63 - 1`, the largest 64-bit integer. Use caution when doing arithmetic on very large integers.
+*Prior to [v9.0.0](/CHANGELOG.md#900---unreleased-yyyy-mm-dd)*, JsonTools stored integers as 64-bit integers, overflow and underflow are possible when doing arithmetic. For example, `int(4e18)*3` incorrectly returns `-6446744073709551616` because the true result would be greater than `2**63 - 1`, the largest 64-bit integer. Use caution when doing arithmetic on very large integers.
 
 ### Truthiness ###
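
The overflow described in the warning above can be reproduced outside of RemesPath. The sketch below is not JsonTools code. It only shows how unchecked 64-bit multiplication in C# wraps to the value quoted in the warning, and how `System.Numerics.BigInteger` gives the exact product instead. The class and method names are hypothetical.

```csharp
using System;
using System.Numerics;

public static class IntegerOverflowSketch
{
    // Multiplies two 64-bit integers with overflow checking disabled,
    // so a result outside the long range silently wraps modulo 2**64.
    public static long WrappingMultiply(long a, long b)
    {
        return unchecked(a * b);
    }

    // Computes the same product with arbitrary-precision integers, which cannot overflow.
    public static BigInteger ExactMultiply(long a, long b)
    {
        return new BigInteger(a) * b;
    }

    public static void Main()
    {
        long big = 4_000_000_000_000_000_000L; // the value of int(4e18)
        Console.WriteLine(WrappingMultiply(big, 3)); // -6446744073709551616
        Console.WriteLine(ExactMultiply(big, 3));    // 12000000000000000000
    }
}
```

The wrapped value is the exact product minus `2**64`, which is why the result is negative even though both operands are positive.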

@@ -793,8 +793,8 @@ __Note:__
 ----
 `int(x: number | string) -> int`
 
-* If x is a boolean or integer: returns a 64-bit integer equal to x.
-* If x is a float: returns the closest 64-bit integer to x.
+* If x is a boolean or integer: returns an integer equal to x.
+* If x is a float: returns the closest integer to x.
     * Note that this is *NOT* the same as the Python `int` function, because __if x is halfway between two integers, the nearest *even integer* is returned.__
 * If x is a __*decimal* string representation of an integer__: returns the integer that is represented. *This means hex numbers can't be parsed by this function, and you should use `num` below instead for that.*
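
The rounding rule described for `int` above (ties go to the nearest even integer) matches the behavior of `Math.Round` with `MidpointRounding.ToEven` in C#. The sketch below illustrates that rule on a few values. It is an assumption that this mirrors how JsonTools rounds internally, and `RoundHalfEvenSketch` and `NearestInteger` are hypothetical names.

```csharp
using System;

public static class RoundHalfEvenSketch
{
    // Rounds to the closest integer; exact halves go to the nearest even integer.
    public static long NearestInteger(double x)
    {
        return (long)Math.Round(x, MidpointRounding.ToEven);
    }

    public static void Main()
    {
        Console.WriteLine(NearestInteger(0.5));  // 0
        Console.WriteLine(NearestInteger(1.5));  // 2
        Console.WriteLine(NearestInteger(2.5));  // 2
        Console.WriteLine(NearestInteger(-1.5)); // -2
        Console.WriteLine(NearestInteger(2.7));  // 3
    }
}
```

By contrast, truncation toward zero (what Python's `int` does with a float) would turn `2.7` into `2` and `-1.5` into `-1`.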

@@ -887,7 +887,7 @@ The query `parse(@)` will return
 
 `x` must be an integer or a floating-point number, *not* a boolean.
 
-* If sigfigs is 0: Returns the closest 64-bit integer to `x`.
+* If sigfigs is 0: Returns the closest integer to `x`.
 * If sigfigs > 0: Returns the closest 64-bit floating-point number to `x` rounded to `sigfigs` decimal places.
 
 ----
@@ -989,18 +989,18 @@ The fourth argument and any subsequent argument must all be the number of a capt
 
 __SPECIAL NOTES FOR `s_fa`:__
 1. *`s_fa` treats `^` as the beginning of a line and `$` as the end of a line*, but elsewhere in JsonTools (prior to [v7.0](/CHANGELOG.md#700---2024-02-09)) `^` matches only the beginning of the string and `$` matches only the end of the string.
-2. Every instance of `(INT)` in `pat` will be replaced by a regex that captures a decimal number or (a hex integer preceded by `0x`), optionally preceded by a `+` or `-`. A noncapturing regex that matches the same thing is available through `(?:INT)`.
-3. Every instance of `(NUMBER)` in `pat` will be replaced by a regex that captures a decimal floating point number or (a hex integer preceded by `0x`). A noncapturing regex that matches the same thing is available through `(?:NUMBER)`. *Neither `(NUMBER)` nor `(?:NUMBER)` matches `NaN` or `Infinity`, but those can be parsed if desired.*
+2. Every instance of `(INT)` in `pat` will be replaced by a regex that captures a decimal number or (a 64-bit hex integer preceded by `0x`), optionally preceded by a `+` or `-`. A noncapturing regex that matches the same thing is available through `(?:INT)`.
+3. Every instance of `(NUMBER)` in `pat` will be replaced by a regex that captures a decimal floating point number or (a 64-bit hex integer preceded by `0x`). A noncapturing regex that matches the same thing is available through `(?:NUMBER)`. *Neither `(NUMBER)` nor `(?:NUMBER)` matches `NaN` or `Infinity`, but those can be parsed if desired.*
 4. *`s_fa` may be very slow if `pat` is a function of input,* because the above described regex transformations need to be applied every time the function is called instead of just once at compile time.
 
 __Examples:__
 1. ``s_fa(`1 -1 +2 -0xF +0x1a 0x2B`, `(INT)`)`` will return `["1", "-1", "+2", "-0xF", "+0x1a", "0x2B"]`
-2. ``s_fa(`1 -1 +2 -0xF +0x1a 0x2B 0x10000000000000000`, `(?:INT)`,false, 0)`` will return `[1, -1, 2, -15, 26, 43, "0x10000000000000000"]` because passing `0` as the fourth arg caused all the match results to be parsed as integers, except `0x10000000000000000`, which stayed as a string because its numeric value was too big for the 64-bit integers used in JsonTools.
+2. ``s_fa(`1 -1 +2 -0xF +0x1a 0x2B 0x10000000000000000`, `(?:INT)`,false, 0)`` will return `[1, -1, 2, -15, 26, 43, "0x10000000000000000"]` because passing `0` as the fourth arg caused all the match results to be parsed as integers, except `0x10000000000000000`, which stayed as a string because __only 64-bit hex integers can be parsed in this way.__
 3. ``s_fa(`a 1.5 1\r\nb -3e4 2\r\nc -.2 6`, `^(\w+) (NUMBER) (INT)\r?$`,false, 1)`` will return `[["a",1.5,"1"],["b",-30000.0,"2"],["c",-0.2,"6"]]`. Note that the second column but not the third will be parsed as a number, because only `1` was passed in as the number of a capture group to parse as a number.
 4. ``s_fa(`a 1.5 1\r\nb -3e4 2\r\nc -.2 6`, `^(\w+) (NUMBER) (INT)\r?$`,false, -2, 2)`` will return `[["a",1.5,1],["b",-30000.0,2],["c",-0.2,6]]`. This time the same input is parsed with numbers in the second-to-last and third columns because `-2` and `2` were passed as optional args.
 5. ``s_fa(`a 1.5 1\r\nb -3e4 2\r\nc -.2 6`, `^(\w+) (?:NUMBER) (INT)\r?$`,false, 1)`` will return `[["a",1],["b",2],["c",6]]`. This time the same input is parsed with only two columns, because we used a noncapturing version of the number-matching regex.
-6. 1. ``s_fa(`a1 b+2 c-0xF d+0x1a`, `[a-z](INT)`, true, 1)`` will return `[["a1",1],["b+2",2],["c-0xF",-15],["d+0x1a",26]]` because the third argument is `true` and there is one capture group, meaning that the matches will be represented as two-element subarrays, with the first element being the full text of the match, and the second element being the captured integer parsed as a number.
-7. 1. ``s_fa(`a1 b+2 c-0xF d+0x1a`, `[a-z](?:INT)`, true)`` will return `["a1","b+2","c-0xF","d+0x1a"]` because the third argument is `true` but there are no capture groups, so an array of strings is returned instead of 1-element subarrays.
+6. ``s_fa(`a1 b+2 c-0xF d+0x1a`, `[a-z](INT)`, true, 1)`` will return `[["a1",1],["b+2",2],["c-0xF",-15],["d+0x1a",26]]` because the third argument is `true` and there is one capture group, meaning that the matches will be represented as two-element subarrays, with the first element being the full text of the match, and the second element being the captured integer parsed as a number.
+7. ``s_fa(`a1 b+2 c-0xF d+0x1a`, `[a-z](?:INT)`, true)`` will return `["a1","b+2","c-0xF","d+0x1a"]` because the third argument is `true` but there are no capture groups, so an array of strings is returned instead of 1-element subarrays.
 
 ----
 `s_find(x: string, sub: regex | string) -> array[string]`
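
The `s_fa` notes in the hunk above say that only 64-bit hex integers are parsed as numbers, and that a larger hex literal such as `0x10000000000000000` stays a string. The sketch below shows one way such a fallback could work. It is not the JsonTools implementation: `HexParseSketch` and `ParseHexOrKeep` are hypothetical names, and treating magnitudes above `long.MaxValue` as unparseable is an assumption.

```csharp
using System;
using System.Globalization;

public static class HexParseSketch
{
    // Returns a boxed long if text is an optionally signed 0x-prefixed hex integer
    // that fits in 64 bits; otherwise returns the original string unchanged.
    public static object ParseHexOrKeep(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;
        bool negative = text[0] == '-';
        string body = (negative || text[0] == '+') ? text.Substring(1) : text;
        if (!body.StartsWith("0x", StringComparison.Ordinal))
            return text;
        string digits = body.Substring(2);
        // ulong.TryParse fails (rather than wrapping) when there are too many hex digits.
        // ASSUMPTION: magnitudes above long.MaxValue are also left as strings.
        if (ulong.TryParse(digits, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out ulong magnitude)
            && magnitude <= (ulong)long.MaxValue)
        {
            return negative ? -(long)magnitude : (long)magnitude;
        }
        return text;
    }

    public static void Main()
    {
        Console.WriteLine(ParseHexOrKeep("-0xF"));                // -15
        Console.WriteLine(ParseHexOrKeep("+0x1a"));               // 26
        Console.WriteLine(ParseHexOrKeep("0x10000000000000000")); // 0x10000000000000000
    }
}
```

Parsing the magnitude as `ulong` rather than `long` avoids the two's-complement reinterpretation that `long.TryParse` applies to hex input, where `FFFFFFFFFFFFFFFF` would come back as `-1`.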
