Commit c4887ad

Add XSD anyURI validator and tests
1 parent f605a9c commit c4887ad

4 files changed

Lines changed: 367 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -168,6 +168,7 @@ Validator | Description
 **isTime(str [, options])** | check if the string is a valid time e.g. [`23:01:59`, new Date().toLocaleTimeString()].<br/><br/> `options` is an object which can contain the keys `hourFormat` or `mode`.<br/><br/>`hourFormat` is a key and defaults to `'hour24'`.<br/><br/>`mode` is a key and defaults to `'default'`. <br/><br/>`hourFormat` can contain the values `'hour12'` or `'hour24'`, `'hour24'` will validate hours in 24 format and `'hour12'` will validate hours in 12 format. <br/><br/>`mode` can contain the values `'default', 'withSeconds', 'withOptionalSeconds'`, `'default'` will validate `HH:MM` format, `'withSeconds'` will validate the `HH:MM:SS` format, `'withOptionalSeconds'` will validate `'HH:MM'` and `'HH:MM:SS'` formats.
 **isTaxID(str, locale)** | check if the string is a valid Tax Identification Number. Default locale is `en-US`.<br/><br/>More info about exact TIN support can be found in `src/lib/isTaxID.js`.<br/><br/>Supported locales: `[ 'bg-BG', 'cs-CZ', 'de-AT', 'de-DE', 'dk-DK', 'el-CY', 'el-GR', 'en-CA', 'en-GB', 'en-IE', 'en-US', 'es-AR', 'es-ES', 'et-EE', 'fi-FI', 'fr-BE', 'fr-CA', 'fr-FR', 'fr-LU', 'hr-HR', 'hu-HU', 'it-IT', 'lb-LU', 'lt-LT', 'lv-LV', 'mt-MT', 'nl-BE', 'nl-NL', 'pl-PL', 'pt-BR', 'pt-PT', 'ro-RO', 'sk-SK', 'sl-SI', 'sv-SE', 'uk-UA']`.
 **isURL(str [, options])** | check if the string is a URL.<br/><br/>`options` is an object which defaults to `{ protocols: ['http','https','ftp'], require_tld: true, require_protocol: false, require_host: true, require_port: false, require_valid_protocol: true, allow_underscores: false, host_whitelist: false, host_blacklist: false, allow_trailing_dot: false, allow_protocol_relative_urls: false, allow_fragments: true, allow_query_components: true, disallow_auth: false, validate_length: true }`.<br/><br/>`protocols` - valid protocols can be modified with this option.<br/>`require_tld` - If set to false isURL will not check if the URL's host includes a top-level domain.<br/>`require_protocol` - **RECOMMENDED** if set to true isURL will return false if protocol is not present in the URL. Without this setting, some malicious URLs cannot be distinguished from a valid URL with authentication information.<br/>`require_host` - if set to false isURL will not check if host is present in the URL.<br/>`require_port` - if set to true isURL will check if port is present in the URL.<br/>`require_valid_protocol` - isURL will check if the URL's protocol is present in the protocols option.<br/>`allow_underscores` - if set to true, the validator will allow underscores in the URL.<br/>`host_whitelist` - if set to an array of strings or regexp, and the domain matches none of the strings defined in it, the validation fails.<br/>`host_blacklist` - if set to an array of strings or regexp, and the domain matches any of the strings defined in it, the validation fails.<br/>`allow_trailing_dot` - if set to true, the validator will allow the domain to end with a `.` character.<br/>`allow_protocol_relative_urls` - if set to true protocol relative URLs will be allowed.<br/>`allow_fragments` - if set to false isURL will return false if fragments are present.<br/>`allow_query_components` - if set to false isURL will return false if query components are present.<br/>`disallow_auth` - if set to true, the validator will fail if the URL contains an authentication component, e.g. `http://username:password@example.com`.<br/>`validate_length` - if set to false isURL will skip string length validation. `max_allowed_length` will be ignored if this is set as `false`.<br/>`max_allowed_length` - if set, isURL will not allow URLs longer than the specified value (default is 2084, the maximum URL length in IE).<br/>
+**isXsdAnyURI(str)** | check if the string conforms to the [XML Schema `anyURI` type](https://www.w3.org/TR/xmlschema-2/#anyURI). Leading/trailing XML whitespace is collapsed before validation and any non-ASCII characters are percent-encoded via `encodeURI` semantics before the RFC 3986 rules are applied. Both absolute and relative references (including query-only or fragment-only references) are supported.
 **isULID(str)** | check if the string is a [ULID](https://github.com/ulid/spec).
 **isUUID(str [, version])** | check if the string is an RFC9562 UUID.<br/>`version` is one of `'1'`-`'8'`, `'nil'`, `'max'`, `'all'` or `'loose'`. The `'loose'` option checks if the string is a UUID-like string with hexadecimal values, ignoring RFC9562.
 **isVariableWidth(str)** | check if the string contains a mixture of full and half-width chars.
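
The preprocessing that the new README entry describes (whitespace collapse followed by percent-encoding) can be tried in isolation. The following is a minimal sketch under stated assumptions, not the committed implementation: `preprocessAnyURI` is a hypothetical name, and it restores brackets by reversing `%5B`/`%5D` after `encodeURI`, whereas the commit swaps brackets for placeholder strings before encoding.

```javascript
// Hypothetical helper (not part of validator.js): sketches the normalization
// applied before the RFC 3986 grammar checks.
function preprocessAnyURI(input) {
  // XML whitespace collapse: tab, LF and CR become spaces, runs of spaces
  // shrink to a single space, and the result is trimmed.
  const collapsed = input
    .replace(/[\t\n\r]/g, ' ')
    .replace(/ {2,}/g, ' ')
    .trim();

  // encodeURI percent-encodes spaces and non-ASCII characters (as UTF-8).
  // It also encodes '[' and ']', which IP-literal hosts need, so those are
  // restored afterwards. An existing '%' is re-encoded as '%25'.
  // encodeURI throws URIError on lone surrogates; callers should catch it.
  return encodeURI(collapsed)
    .replace(/%5B/g, '[')
    .replace(/%5D/g, ']');
}

console.log(preprocessAnyURI(' \t\nhttps://example.com/caf\u00e9\r\n'));
// https://example.com/caf%C3%A9
console.log(preprocessAnyURI('http://[2001:db8::1]:443/a b'));
// http://[2001:db8::1]:443/a%20b
```

Because `encodeURI` turns every `%` into `%25`, a malformed escape such as `%zz` would look valid after encoding; this is why the commit rejects bad percent sequences before calling `encodeURI`.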

src/index.js

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,7 @@ import matches from './lib/matches';

 import isEmail from './lib/isEmail';
 import isURL from './lib/isURL';
+import isXsdAnyURI from './lib/isXsdAnyURI';
 import isMACAddress from './lib/isMACAddress';
 import isIP from './lib/isIP';
 import isIPRange from './lib/isIPRange';
@@ -143,6 +144,7 @@ const validator = {
   matches,
   isEmail,
   isURL,
+  isXsdAnyURI,
   isMACAddress,
   isIP,
   isIPRange,

src/lib/isXsdAnyURI.js

Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
+import assertString from './util/assertString';
+import isIP from './isIP';
+
+const MULTIPLE_SPACES_REGEX = / {2,}/g;
+const INVALID_PERCENT_REGEX = /%(?![0-9A-Fa-f]{2})/;
+const SCHEME_REGEX = /^[A-Za-z][A-Za-z0-9+.-]*$/;
+const BACKSLASH_REGEX = /\\/;
+const DISALLOWED_ASCII_REGEX = /["<>^`{}|]/;
+const OPEN_BRACKET_PLACEHOLDER = '__VALIDATOR_OPEN_BRACKET__';
+const CLOSE_BRACKET_PLACEHOLDER = '__VALIDATOR_CLOSE_BRACKET__';
+
+const HEX_DIGIT = '[0-9A-Fa-f]';
+const PCT_ENCODED = `%${HEX_DIGIT}{2}`;
+const UNRESERVED = 'A-Za-z0-9\\-._~';
+const SUB_DELIMS = "!$&'()*+,;=";
+const PCHAR = `(?:[${UNRESERVED}]|${PCT_ENCODED}|[${SUB_DELIMS}:@])`;
+const SEGMENT = `(?:${PCHAR})*`;
+const SEGMENT_NZ = `(?:${PCHAR})+`;
+const SEGMENT_NZ_NC = `(?:${PCT_ENCODED}|[${UNRESERVED}${SUB_DELIMS}@])+`;
+
+const PATH_ABEMPTY_REGEX = new RegExp(`^(?:/${SEGMENT})*$`);
+const PATH_ABSOLUTE_REGEX = new RegExp(`^/(?:${SEGMENT_NZ}(?:/${SEGMENT})*)?$`);
+const PATH_ROOTLESS_REGEX = new RegExp(`^${SEGMENT_NZ}(?:/${SEGMENT})*$`);
+const PATH_NOSCHEME_REGEX = new RegExp(`^(?:${SEGMENT_NZ_NC})(?:/${SEGMENT})*$`);
+const QUERY_FRAGMENT_REGEX = new RegExp(`^(?:${PCHAR}|[/?])*$`);
+const USERINFO_REGEX = new RegExp(`^(?:${PCT_ENCODED}|[${UNRESERVED}${SUB_DELIMS}:])*$`);
+const REG_NAME_REGEX = new RegExp(`^(?:${PCT_ENCODED}|[${UNRESERVED}${SUB_DELIMS}])*$`);
+const IPV_FUTURE_REGEX = /^v[0-9A-F]+\.[A-Za-z0-9._~!$&'()*+,;=:-]+$/i;
+
+function collapseXmlWhitespace(input) {
+  let normalized = '';
+
+  for (let i = 0; i < input.length; i += 1) {
+    const code = input.charCodeAt(i);
+
+    if (code === 0x09 || code === 0x0a || code === 0x0d) {
+      normalized += ' ';
+    } else {
+      normalized += input[i];
+    }
+  }
+
+  return normalized.replace(MULTIPLE_SPACES_REGEX, ' ').trim();
+}
+
+function containsForbiddenControl(value) {
+  for (let i = 0; i < value.length; i += 1) {
+    const code = value.charCodeAt(i);
+
+    if (
+      (code >= 0x00 && code <= 0x08) ||
+      code === 0x0b ||
+      code === 0x0c ||
+      (code >= 0x0e && code <= 0x1f) ||
+      code === 0x7f
+    ) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
+function hasInvalidPercentEncoding(input) {
+  return INVALID_PERCENT_REGEX.test(input);
+}
+
+function isIPvFuture(address) {
+  return IPV_FUTURE_REGEX.test(address);
+}
+
+function isValidAuthority(authority, options) {
+  const allowEmptyAuthority = Boolean(options && options.allowEmptyAuthority);
+  if (authority === '') {
+    return !!allowEmptyAuthority;
+  }
+
+  let hostPort = authority;
+  let userinfo = '';
+  const atIndex = authority.lastIndexOf('@');
+
+  if (atIndex !== -1) {
+    userinfo = authority.slice(0, atIndex);
+    hostPort = authority.slice(atIndex + 1);
+
+    if (!USERINFO_REGEX.test(userinfo)) {
+      return false;
+    }
+  }
+
+  let host = hostPort;
+  let port = null;
+  let hasHost = false;
+
+  if (hostPort.startsWith('[')) {
+    const closingIndex = hostPort.indexOf(']');
+
+    if (closingIndex === -1) {
+      return false;
+    }
+
+    const address = hostPort.slice(1, closingIndex);
+
+    if (!isIP(address, 6) && !isIPvFuture(address)) {
+      return false;
+    }
+
+    const remainder = hostPort.slice(closingIndex + 1);
+
+    if (remainder) {
+      if (!remainder.startsWith(':')) {
+        return false;
+      }
+
+      port = remainder.slice(1);
+    }
+
+    host = '';
+    hasHost = true;
+  } else {
+    const firstColon = hostPort.indexOf(':');
+    const lastColon = hostPort.lastIndexOf(':');
+
+    if (firstColon !== lastColon) {
+      return false;
+    }
+
+    if (lastColon !== -1) {
+      host = hostPort.slice(0, lastColon);
+      port = hostPort.slice(lastColon + 1);
+    }
+
+    if (host) {
+      hasHost = true;
+
+      if (!isIP(host, 4) && !REG_NAME_REGEX.test(host)) {
+        return false;
+      }
+    }
+  }
+
+  if (!hasHost) {
+    return false;
+  }
+
+  if (port !== null) {
+    if (port === '' || !/^[0-9]+$/.test(port)) {
+      return false;
+    }
+
+    const portNumber = parseInt(port, 10);
+
+    if (Number.isNaN(portNumber) || portNumber > 65535) {
+      return false;
+    }
+  }
+
+  return true;
+}
+
+function isValidPath(path, { hasAuthority, hasScheme }) {
+  if (hasAuthority) {
+    return PATH_ABEMPTY_REGEX.test(path);
+  }
+
+  if (hasScheme) {
+    if (path === '') {
+      return true;
+    }
+
+    if (path.startsWith('/')) {
+      return PATH_ABSOLUTE_REGEX.test(path);
+    }
+
+    return PATH_ROOTLESS_REGEX.test(path);
+  }
+
+  if (path === '') {
+    return true;
+  }
+
+  if (path.startsWith('/')) {
+    return PATH_ABSOLUTE_REGEX.test(path);
+  }
+
+  return PATH_NOSCHEME_REGEX.test(path);
+}
+
+function isValidQueryOrFragment(value) {
+  return value === '' || QUERY_FRAGMENT_REGEX.test(value);
+}
+
+function isValidUriReference(value) {
+  let rest = value;
+  let scheme = null;
+  let hadScheme = false;
+
+  const colonIndex = rest.indexOf(':');
+
+  if (colonIndex > 0) {
+    const potentialScheme = rest.slice(0, colonIndex);
+
+    if (SCHEME_REGEX.test(potentialScheme)) {
+      scheme = potentialScheme;
+      hadScheme = true;
+      rest = rest.slice(colonIndex + 1);
+    }
+  }
+
+  let fragment = '';
+  const hashIndex = rest.indexOf('#');
+
+  if (hashIndex !== -1) {
+    fragment = rest.slice(hashIndex + 1);
+    rest = rest.slice(0, hashIndex);
+
+    if (!isValidQueryOrFragment(fragment)) {
+      return false;
+    }
+  }
+
+  let query = '';
+  const questionIndex = rest.indexOf('?');
+
+  if (questionIndex !== -1) {
+    query = rest.slice(questionIndex + 1);
+    rest = rest.slice(0, questionIndex);
+
+    if (!isValidQueryOrFragment(query)) {
+      return false;
+    }
+  }
+
+  let hasAuthority = false;
+  let authority = '';
+  let path = rest;
+
+  if (rest.startsWith('//')) {
+    hasAuthority = true;
+    rest = rest.slice(2);
+    const nextSlash = rest.indexOf('/');
+
+    if (nextSlash === -1) {
+      authority = rest;
+      path = '';
+    } else {
+      authority = rest.slice(0, nextSlash);
+      path = rest.slice(nextSlash);
+    }
+
+    const allowEmptyAuthority = Boolean(hadScheme && scheme && scheme.toLowerCase() === 'file');
+    const authorityOptions = allowEmptyAuthority
+      ? { allowEmptyAuthority: true }
+      : undefined;
+
+    if (!isValidAuthority(authority, authorityOptions)) {
+      return false;
+    }
+  }
+
+  return isValidPath(path, { hasAuthority, hasScheme: hadScheme });
+}
+
+export default function isXsdAnyURI(input) {
+  assertString(input);
+
+  let value = collapseXmlWhitespace(input);
+
+  if (value === '') {
+    return true;
+  }
+
+  if (
+    containsForbiddenControl(value) ||
+    hasInvalidPercentEncoding(value) ||
+    BACKSLASH_REGEX.test(value) ||
+    DISALLOWED_ASCII_REGEX.test(value)
+  ) {
+    return false;
+  }
+
+  let encoded;
+
+  try {
+    const bracketSafeValue = value
+      .replace(/\[/g, OPEN_BRACKET_PLACEHOLDER)
+      .replace(/\]/g, CLOSE_BRACKET_PLACEHOLDER);
+
+    const encodedWithPlaceholders = encodeURI(bracketSafeValue);
+
+    encoded = encodedWithPlaceholders
+      .split(OPEN_BRACKET_PLACEHOLDER)
+      .join('[')
+      .split(CLOSE_BRACKET_PLACEHOLDER)
+      .join(']');
+  } catch (err) {
+    return false;
+  }
+
+  return isValidUriReference(encoded);
+}
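
To see how `isValidUriReference` carves a reference into components before each one is checked, the splitting order can be reproduced on its own. This is a hedged sketch, not code from the commit: `splitUriReference` is a hypothetical helper, it performs no validation, and it reports absent components as `null` where the commit uses empty strings.

```javascript
// Hypothetical helper mirroring the splitting order used in isValidUriReference:
// scheme first, then fragment, then query, then authority and path.
function splitUriReference(value) {
  let rest = value;
  let scheme = null;

  // A scheme is the text before the first ':' if it matches the scheme grammar.
  const colonIndex = rest.indexOf(':');
  if (colonIndex > 0 && /^[A-Za-z][A-Za-z0-9+.-]*$/.test(rest.slice(0, colonIndex))) {
    scheme = rest.slice(0, colonIndex);
    rest = rest.slice(colonIndex + 1);
  }

  // The fragment starts at the first '#'.
  let fragment = null;
  const hashIndex = rest.indexOf('#');
  if (hashIndex !== -1) {
    fragment = rest.slice(hashIndex + 1);
    rest = rest.slice(0, hashIndex);
  }

  // The query starts at the first '?' in what remains.
  let query = null;
  const questionIndex = rest.indexOf('?');
  if (questionIndex !== -1) {
    query = rest.slice(questionIndex + 1);
    rest = rest.slice(0, questionIndex);
  }

  // '//' introduces an authority that runs up to the next '/'.
  let authority = null;
  let path = rest;
  if (rest.startsWith('//')) {
    const afterSlashes = rest.slice(2);
    const nextSlash = afterSlashes.indexOf('/');
    authority = nextSlash === -1 ? afterSlashes : afterSlashes.slice(0, nextSlash);
    path = nextSlash === -1 ? '' : afterSlashes.slice(nextSlash);
  }

  return { scheme, authority, path, query, fragment };
}

console.log(splitUriReference('https://example.com:8080/path?query=1#frag'));
console.log(splitUriReference('urn:isbn:0451450523'));
console.log(splitUriReference('?queryOnly=true'));
```

Splitting at the first `#` explains why a value such as `http://example.com#frag#extra` fails: the fragment becomes `frag#extra`, and `#` is not permitted by the query/fragment grammar.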

test/validators.test.js

Lines changed: 63 additions & 0 deletions
@@ -1017,6 +1017,69 @@ describe('Validators', () => {
     });
   });

+  it('should validate XML Schema AnyURI values', () => {
+    test({
+      validator: 'isXsdAnyURI',
+      valid: [
+        'http://example.com',
+        'https://example.com:8080/path?query=1#frag',
+        'mailto:user@example.com',
+        'urn:isbn:0451450523',
+        'data:text/plain;charset=utf-8,Hello%20World',
+        '../relative/path',
+        '/absolute/path',
+        '//cdn.example.com/libs.js',
+        '#fragment-only',
+        '?queryOnly=true',
+        'file:///C:/Program%20Files/MyApp/app.exe',
+        'http://[2001:db8::1]:443/path',
+        'http://[v7.fe80::abcd]/resource',
+        'https://user:pa%20ss@example.com:8443/resource',
+        ' https://example.com/with-space ',
+        ' \t\nhttps://example.com/resource\r\n',
+        'foo%20bar/baz',
+        'tel:+123456789',
+        'foo:',
+        'foo:/bar',
+        'file:///var/log',
+        'http://[2001:db8::1]:1234',
+        '',
+        'file:///',
+        '//example.com/path#frag',
+      ],
+      invalid: [
+        'http://example.com:99999',
+        'http://example.com:port',
+        'http://example.com:-1',
+        'http://[::1',
+        'http://example.com#frag#extra',
+        'foo%zz',
+        'foo%2',
+        'http://user@:8080',
+        'http://user[info@example.com',
+        '\\server\\share',
+        'http://example.com/pa|th',
+        'http://example.com/path\u0006',
+        '//:8080/path',
+        'http:///path',
+        'file://user@',
+        'http://example.com/%',
+        'foo#frag%2',
+        'http://example.com/%ZZ',
+        'http://example.com/?q=abc^123',
+        'http://example.com?foo[bar',
+        'foo://?query',
+        'foo%2/bar',
+        'http://[::g]/path',
+        'http://[::1]foo',
+        'http://host:80:123/path',
+        'http://exa[mple.com',
+        'http://example.com/\ud800',
+        'foo<bar',
+      ],
+    });
+  });
+
   it('should validate MAC addresses', () => {
     test({
       validator: 'isMACAddress',