Commit 25b56c2

fix(scanner): document and test ~ disambiguation vs ISTRING (issue #23)
The scanner has two rules that both start with '~':

- ISTRING: ~"..." (interned string literal)
- BITWISE_NOT: ~ (unary operator)

In re2c, the longest-match rule resolves this unambiguously: when ~ is followed by " the ISTRING path is taken; when ~ is followed by anything else (e.g. an identifier or function call) the 1-character BITWISE_NOT rule wins. The generated DFA in scanner.c has always implemented this correctly, but the disambiguation was undocumented and had no tests.

Changes:

- Add an explanatory comment in scanner.re near the ISTRING definition, referencing this issue, so future maintainers understand why both rules can coexist without conflict.
- Add tests/operators/issue23-bitwise-not-fcall.phpt: regression test for the original bug report - '~' before a bare function call (~umask()) and combined with binary-AND (0666 & ~umask()).
- Add tests/operators/issue23-istring-regression.phpt: guard that ISTRING (~"...") still parses correctly after the fix, i.e. the '~' in front of a double-quote is not misclassified as BITWISE_NOT.

Closes #23
1 parent 2e0fdee commit 25b56c2

3 files changed

Lines changed: 257 additions & 1 deletion

parser/scanner.re

Lines changed: 8 additions & 1 deletion
@@ -505,7 +505,14 @@ int xx_get_token(xx_scanner_state *s, xx_scanner_token *token) {
 			return 0;
 		}
 
-		/* interned strings, allowing to instantiate strings */
+		/* interned strings, allowing to instantiate strings.
+		 * ISTRING begins with ~" (tilde IMMEDIATELY followed by a double-quote).
+		 * The tilde-only case (~identifier, ~fcall()) is handled separately by
+		 * the "~" rule below, which emits XX_T_BITWISE_NOT. re2c resolves the
+		 * ambiguity via longest-match: when ~ is not followed by ", only the
+		 * 1-character BITWISE_NOT rule matches, so no ~ is ever misclassified as
+		 * the start of an interned string. See: github.com/zephir-lang/php-zephir-parser/issues/23
+		 */
 		ISTRING = ([~]["] ([\\]["]|[\\].|[\001-\377]\[\\"])* ["]);
 		ISTRING {
 			start++; /* ~ */
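
The disambiguation documented in the comment above can be sketched in plain C. This is only an illustration under stated assumptions, not the DFA that re2c generates: the names `tilde_token`, `match_istring`, and `scan_tilde` are hypothetical and do not appear in scanner.re, and the hand-written matcher approximates the ISTRING rule by treating a NUL byte as end of input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch only. The real scanner emits
 * XX_T_BITWISE_NOT for "~"; the ISTRING token id is not shown in the diff. */
typedef enum { TOK_NONE, TOK_BITWISE_NOT, TOK_ISTRING } tilde_token;

/* Return the length of an ISTRING match at s, or 0 if the rule fails.
 * Approximates: [~]["] ([\\]["] | [\\]. | [\001-\377]\[\\"])* ["] */
static size_t match_istring(const char *s)
{
    size_t i;

    if (s[0] != '~' || s[1] != '"')
        return 0;                   /* rule requires ~ directly before " */

    i = 2;
    while (s[i] != '\0') {
        if (s[i] == '"')
            return i + 1;           /* closing quote: rule matched */
        if (s[i] == '\\') {
            /* [\\]["] or [\\]. : "." does not match a newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i += 1;                 /* any other byte, newline included */
        }
    }
    return 0;                       /* unterminated literal: no match */
}

/* Longest match between the two rules that start with '~'.
 * An ISTRING match is at least 3 bytes, so it always beats the
 * 1-byte BITWISE_NOT rule whenever it matches at all. */
static tilde_token scan_tilde(const char *s, size_t *len)
{
    size_t n;

    assert(s != NULL && len != NULL);
    *len = 0;
    if (s[0] != '~')
        return TOK_NONE;

    n = match_istring(s);
    if (n > 1) {
        *len = n;
        return TOK_ISTRING;
    }
    *len = 1;                       /* fall back to the 1-character rule */
    return TOK_BITWISE_NOT;
}
```

With this sketch, `scan_tilde("~umask()", &n)` yields `TOK_BITWISE_NOT` with `n == 1`, while `scan_tilde("~\"hello\";", &n)` yields `TOK_ISTRING` with `n == 8`. These are the two inputs the new tests exercise.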
tests/operators/issue23-bitwise-not-fcall.phpt

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
+--TEST--
+Issue #23: bitwise NOT (~) before function call and combined with binary operators
+--SKIPIF--
+<?php include(__DIR__ . '/../skipif.inc'); ?>
+--FILE--
+<?php
+
+$code =<<<ZEP
+function test() {
+    let a = ~umask();
+    let b = 0666 & ~umask();
+}
+ZEP;
+
+var_dump(zephir_parse_file($code, '(eval code)'));
+
+?>
+--EXPECT--
+array(1) {
+  [0]=>
+  array(6) {
+    ["type"]=>
+    string(8) "function"
+    ["name"]=>
+    string(4) "test"
+    ["statements"]=>
+    array(2) {
+      [0]=>
+      array(5) {
+        ["type"]=>
+        string(3) "let"
+        ["assignments"]=>
+        array(1) {
+          [0]=>
+          array(7) {
+            ["assign-type"]=>
+            string(8) "variable"
+            ["operator"]=>
+            string(6) "assign"
+            ["variable"]=>
+            string(1) "a"
+            ["expr"]=>
+            array(5) {
+              ["type"]=>
+              string(11) "bitwise_not"
+              ["left"]=>
+              array(6) {
+                ["type"]=>
+                string(5) "fcall"
+                ["name"]=>
+                string(5) "umask"
+                ["call-type"]=>
+                int(1)
+                ["file"]=>
+                string(11) "(eval code)"
+                ["line"]=>
+                int(2)
+                ["char"]=>
+                int(21)
+              }
+              ["file"]=>
+              string(11) "(eval code)"
+              ["line"]=>
+              int(2)
+              ["char"]=>
+              int(21)
+            }
+            ["file"]=>
+            string(11) "(eval code)"
+            ["line"]=>
+            int(2)
+            ["char"]=>
+            int(21)
+          }
+        }
+        ["file"]=>
+        string(11) "(eval code)"
+        ["line"]=>
+        int(3)
+        ["char"]=>
+        int(7)
+      }
+      [1]=>
+      array(5) {
+        ["type"]=>
+        string(3) "let"
+        ["assignments"]=>
+        array(1) {
+          [0]=>
+          array(7) {
+            ["assign-type"]=>
+            string(8) "variable"
+            ["operator"]=>
+            string(6) "assign"
+            ["variable"]=>
+            string(1) "b"
+            ["expr"]=>
+            array(6) {
+              ["type"]=>
+              string(11) "bitwise_and"
+              ["left"]=>
+              array(5) {
+                ["type"]=>
+                string(3) "int"
+                ["value"]=>
+                string(4) "0666"
+                ["file"]=>
+                string(11) "(eval code)"
+                ["line"]=>
+                int(3)
+                ["char"]=>
+                int(18)
+              }
+              ["right"]=>
+              array(5) {
+                ["type"]=>
+                string(11) "bitwise_not"
+                ["left"]=>
+                array(6) {
+                  ["type"]=>
+                  string(5) "fcall"
+                  ["name"]=>
+                  string(5) "umask"
+                  ["call-type"]=>
+                  int(1)
+                  ["file"]=>
+                  string(11) "(eval code)"
+                  ["line"]=>
+                  int(3)
+                  ["char"]=>
+                  int(28)
+                }
+                ["file"]=>
+                string(11) "(eval code)"
+                ["line"]=>
+                int(3)
+                ["char"]=>
+                int(28)
+              }
+              ["file"]=>
+              string(11) "(eval code)"
+              ["line"]=>
+              int(3)
+              ["char"]=>
+              int(28)
+            }
+            ["file"]=>
+            string(11) "(eval code)"
+            ["line"]=>
+            int(3)
+            ["char"]=>
+            int(28)
+          }
+        }
+        ["file"]=>
+        string(11) "(eval code)"
+        ["line"]=>
+        int(4)
+        ["char"]=>
+        int(1)
+      }
+    }
+    ["file"]=>
+    string(11) "(eval code)"
+    ["line"]=>
+    int(1)
+    ["char"]=>
+    int(9)
+  }
+}
+
tests/operators/issue23-istring-regression.phpt

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+--TEST--
+Issue #23: ISTRING (~"...") still scanned correctly after bitwise-NOT fix
+--SKIPIF--
+<?php include(__DIR__ . '/../skipif.inc'); ?>
+--FILE--
+<?php
+
+$code =<<<ZEP
+function test() {
+    let a = ~"hello";
+}
+ZEP;
+
+var_dump(zephir_parse_file($code, '(eval code)'));
+
+?>
+--EXPECT--
+array(1) {
+  [0]=>
+  array(6) {
+    ["type"]=>
+    string(8) "function"
+    ["name"]=>
+    string(4) "test"
+    ["statements"]=>
+    array(1) {
+      [0]=>
+      array(5) {
+        ["type"]=>
+        string(3) "let"
+        ["assignments"]=>
+        array(1) {
+          [0]=>
+          array(7) {
+            ["assign-type"]=>
+            string(8) "variable"
+            ["operator"]=>
+            string(6) "assign"
+            ["variable"]=>
+            string(1) "a"
+            ["expr"]=>
+            array(5) {
+              ["type"]=>
+              string(7) "istring"
+              ["value"]=>
+              string(5) "hello"
+              ["file"]=>
+              string(11) "(eval code)"
+              ["line"]=>
+              int(2)
+              ["char"]=>
+              int(19)
+            }
+            ["file"]=>
+            string(11) "(eval code)"
+            ["line"]=>
+            int(2)
+            ["char"]=>
+            int(19)
+          }
+        }
+        ["file"]=>
+        string(11) "(eval code)"
+        ["line"]=>
+        int(3)
+        ["char"]=>
+        int(1)
+      }
+    }
+    ["file"]=>
+    string(11) "(eval code)"
+    ["line"]=>
+    int(1)
+    ["char"]=>
+    int(9)
+  }
+}
+
