Commit f82c1fc

Update regex HOWTO for re.prefixmatch
1 parent 289f19a commit f82c1fc

1 file changed, +41 -40 lines changed

Doc/howto/regex.rst

Lines changed: 41 additions & 40 deletions
@@ -362,20 +362,20 @@ for a complete listing.
 +------------------+-----------------------------------------------+
 | Method/Attribute | Purpose                                       |
 +==================+===============================================+
-| ``match()``      | Determine if the RE matches at the beginning  |
-|                  | of the string.                                |
-+------------------+-----------------------------------------------+
 | ``search()``     | Scan through a string, looking for any        |
 |                  | location where this RE matches.               |
 +------------------+-----------------------------------------------+
+| ``prefixmatch()``| Determine if the RE matches at the beginning  |
+|                  | of the string.                                |
++------------------+-----------------------------------------------+
 | ``findall()``    | Find all substrings where the RE matches, and |
 |                  | returns them as a list.                       |
 +------------------+-----------------------------------------------+
 | ``finditer()``   | Find all substrings where the RE matches, and |
 |                  | returns them as an :term:`iterator`.          |
 +------------------+-----------------------------------------------+

-:meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If
+:meth:`~re.Pattern.search` and :meth:`~re.Pattern.prefixmatch` return ``None`` if no match can be found. If
 they're successful, a :ref:`match object <match-objects>` instance is returned,
 containing information about the match: where it starts and ends, the substring
 it matched, and more.
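A short sketch that runs each method from the updated table on one string. It assumes, as the table describes, that `prefixmatch()` behaves like the older `match()`, and falls back to `Pattern.match` on Python versions that lack the new name; the sample string is made up for illustration.

```python
import re

p = re.compile(r'[a-z]+')
text = '42 apples and 7 pears'  # sample input made up for this sketch

# Assumption: Pattern.prefixmatch() is the renamed Pattern.match();
# fall back to match() on Python versions that do not have it.
prefixmatch = getattr(p, 'prefixmatch', p.match)

first = p.search(text)        # scans forward; first run of letters is 'apples'
anchored = prefixmatch(text)  # tries index 0 only, where '4' is not in [a-z]
words = p.findall(text)       # all non-overlapping matches, as a list of str
spans = [m.span() for m in p.finditer(text)]  # iterator of match objects

print(first)     # <re.Match object; span=(3, 9), match='apples'>
print(anchored)  # None
print(words)     # ['apples', 'and', 'pears']
print(spans)     # [(3, 9), (10, 13), (16, 21)]
```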
@@ -391,21 +391,21 @@ Python interpreter, import the :mod:`re` module, and compile a RE::
    >>> p
    re.compile('[a-z]+')

-Now, you can try matching various strings against the RE ``[a-z]+``. An empty
+Now, you can try searching various strings against the RE ``[a-z]+``. An empty
 string shouldn't match at all, since ``+`` means 'one or more repetitions'.
-:meth:`~re.Pattern.match` should return ``None`` in this case, which will cause the
+:meth:`~re.Pattern.search` should return ``None`` in this case, which will cause the
 interpreter to print no output. You can explicitly print the result of
-:meth:`!match` to make this clear. ::
+:meth:`!search` to make this clear. ::

-   >>> p.match("")
-   >>> print(p.match(""))
+   >>> p.search("")
+   >>> print(p.search(""))
    None

 Now, let's try it on a string that it should match, such as ``tempo``. In this
-case, :meth:`~re.Pattern.match` will return a :ref:`match object <match-objects>`, so you
+case, :meth:`~re.Pattern.search` will return a :ref:`match object <match-objects>`, so you
 should store the result in a variable for later use. ::

-   >>> m = p.match('tempo')
+   >>> m = p.search('tempo')
    >>> m
    <re.Match object; span=(0, 5), match='tempo'>

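Since `+` demands at least one repetition, `search()` on an empty string gives `None`, which is falsy. The contrast with `[a-z]*` below is an illustration added here rather than part of the hunk:

```python
import re

plus = re.compile(r'[a-z]+')
star = re.compile(r'[a-z]*')

# '+' needs one or more letters, so an empty string cannot match.
print(plus.search(''))         # None
# '*' allows zero repetitions, so it matches the empty string at index 0.
print(star.search('').span())  # (0, 0)

m = plus.search('tempo')
print(m.group(), m.span())     # tempo (0, 5)
```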
@@ -437,27 +437,28 @@ Trying these methods will soon clarify their meaning::

 :meth:`~re.Match.group` returns the substring that was matched by the RE. :meth:`~re.Match.start`
 and :meth:`~re.Match.end` return the starting and ending index of the match. :meth:`~re.Match.span`
-returns both start and end indexes in a single tuple. Since the :meth:`~re.Pattern.match`
-method only checks if the RE matches at the start of a string, :meth:`!start`
-will always be zero. However, the :meth:`~re.Pattern.search` method of patterns
-scans through the string, so the match may not start at zero in that
-case. ::
+returns both start and end indexes in a single tuple.
+The :meth:`~re.Pattern.search` method of patterns
+scans through the string, so the match may not start at zero.
+However, the :meth:`~re.Pattern.prefixmatch`
+method only checks if the RE matches at the start of a string, so :meth:`!start`
+will always be zero in that case. ::

-   >>> print(p.match('::: message'))
-   None
    >>> m = p.search('::: message'); print(m)
    <re.Match object; span=(4, 11), match='message'>
    >>> m.group()
    'message'
    >>> m.span()
    (4, 11)
+   >>> print(p.prefixmatch('::: message'))
+   None

 In actual programs, the most common style is to store the
 :ref:`match object <match-objects>` in a variable, and then check if it was
 ``None``. This usually looks like::

    p = re.compile( ... )
-   m = p.match( 'string goes here' )
+   m = p.search( 'string goes here' )
    if m:
        print('Match found: ', m.group())
    else:
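A sketch of the store-then-check idiom applied to both methods. `describe()` is a hypothetical helper written for this example, and `prefixmatch` falls back to `Pattern.match` where the new name is not available (an assumption that the two behave alike):

```python
import re

p = re.compile(r'[a-z]+')
# Assumption: prefixmatch() mirrors the older match(); fall back if missing.
prefixmatch = getattr(p, 'prefixmatch', p.match)

def describe(m):
    """Hypothetical helper: a match object is truthy, None is falsy."""
    if m:
        return f'{m.group()!r} at {m.start()}..{m.end()}'
    return 'no match'

for text in ('message', '::: message'):
    print(repr(text), describe(p.search(text)), describe(prefixmatch(text)), sep=' | ')
# 'message' | 'message' at 0..7 | 'message' at 0..7
# '::: message' | 'message' at 4..11 | no match
```

The second row shows the point made in the hunk: `search()` reports a start of 4, while the prefix match fails because index 0 holds a colon.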
@@ -495,15 +496,15 @@ Module-Level Functions
 ----------------------

 You don't have to create a pattern object and call its methods; the
-:mod:`re` module also provides top-level functions called :func:`~re.match`,
-:func:`~re.search`, :func:`~re.findall`, :func:`~re.sub`, and so forth. These functions
+:mod:`re` module also provides top-level functions called :func:`~re.search`,
+:func:`~re.prefixmatch`, :func:`~re.findall`, :func:`~re.sub`, and so forth. These functions
 take the same arguments as the corresponding pattern method with
 the RE string added as the first argument, and still return either ``None`` or a
 :ref:`match object <match-objects>` instance. ::

-   >>> print(re.match(r'From\s+', 'Fromage amk'))
+   >>> print(re.prefixmatch(r'From\s+', 'Fromage amk'))
    None
-   >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS
+   >>> re.prefixmatch(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS
    <re.Match object; span=(0, 5), match='From '>

 Under the hood, these functions simply create a pattern object for you
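The module-level functions take the RE string as an extra first argument. A small sketch checking that both spellings agree on the examples from this hunk; it assumes `re.prefixmatch()` mirrors `re.match()` and falls back to the latter on Python versions without the new name:

```python
import re

pattern = r'From\s+'
line = 'From amk Thu May 14 19:12:10 1998'

# Assumption: re.prefixmatch() mirrors re.match(); fall back where it is missing.
prefixmatch = getattr(re, 'prefixmatch', re.match)

compiled = re.compile(pattern)
compiled_prefixmatch = getattr(compiled, 'prefixmatch', compiled.match)

via_module = prefixmatch(pattern, line)   # RE string passed as the first argument
via_pattern = compiled_prefixmatch(line)  # same call on an explicit pattern object

print(via_module.span(), via_pattern.span())  # (0, 5) (0, 5)
print(prefixmatch(pattern, 'Fromage amk'))    # None
```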
@@ -812,7 +813,7 @@ of a group with a quantifier, such as ``*``, ``+``, ``?``, or
 ``ab``. ::

    >>> p = re.compile('(ab)*')
-   >>> print(p.match('ababababab').span())
+   >>> print(p.search('ababababab').span())
    (0, 10)

 Groups indicated with ``'('``, ``')'`` also capture the starting and ending
@@ -825,7 +826,7 @@ argument. Later we'll see how to express groups that don't capture the span
 of text that they match. ::

    >>> p = re.compile('(a)b')
-   >>> m = p.match('ab')
+   >>> m = p.search('ab')
    >>> m.group()
    'ab'
    >>> m.group(0)
@@ -836,7 +837,7 @@ to determine the number, just count the opening parenthesis characters, going
 from left to right. ::

    >>> p = re.compile('(a(b)c)d')
-   >>> m = p.match('abcd')
+   >>> m = p.search('abcd')
    >>> m.group(0)
    'abcd'
    >>> m.group(1)
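Counting opening parentheses from left to right gives the group numbers. A minimal sketch using the same pattern as the hunk, with spans added to make the nesting visible:

```python
import re

p = re.compile(r'(a(b)c)d')
m = p.search('abcd')

# Groups are numbered by their opening parenthesis, left to right:
# group 1 is '(a(b)c)', group 2 is the nested '(b)'.
print(m.group(0))            # abcd  (the whole match)
print(m.group(1))            # abc
print(m.group(2))            # b
print(m.group(2, 1, 2))      # ('b', 'abc', 'b')
print(m.groups())            # ('abc', 'b')
print(m.span(1), m.span(2))  # (0, 3) (1, 2)
```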
@@ -912,10 +913,10 @@ but aren't interested in retrieving the group's contents. You can make this fact
 explicit by using a non-capturing group: ``(?:...)``, where you can replace the
 ``...`` with any other regular expression. ::

-   >>> m = re.match("([abc])+", "abc")
+   >>> m = re.search("([abc])+", "abc")
    >>> m.groups()
    ('c',)
-   >>> m = re.match("(?:[abc])+", "abc")
+   >>> m = re.search("(?:[abc])+", "abc")
    >>> m.groups()
    ()

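The sketch below repeats the two calls from this hunk and adds one illustrative pattern (made up for this example) showing that a non-capturing group does not use up a group number:

```python
import re

# A capturing group inside a repetition keeps only its last iteration.
print(re.search(r'([abc])+', 'abc').groups())    # ('c',)
# The non-capturing version repeats the same way but records nothing.
print(re.search(r'(?:[abc])+', 'abc').groups())  # ()

# A non-capturing group takes no group number, so the capturing group
# after it is still group 1. (Illustrative pattern, not from the HOWTO.)
m = re.search(r'(?:\d+)-(\w+)', 'order 42-widget')
print(m.group(1))                                # widget
```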
@@ -949,7 +950,7 @@ given numbers, so you can retrieve information about a group in two ways::
 Additionally, you can retrieve named groups as a dictionary with
 :meth:`~re.Match.groupdict`::

-   >>> m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
+   >>> m = re.search(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
    >>> m.groupdict()
    {'first': 'Jane', 'last': 'Doe'}

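A sketch of `groupdict()` next to numeric access. The optional middle-initial group is a hypothetical pattern added to show the `default` argument of `groupdict()`, which fills in named groups that did not participate in the match:

```python
import re

p = re.compile(r'(?P<first>\w+) (?P<last>\w+)')
m = p.search('Jane Doe')

print(m.groupdict())                # {'first': 'Jane', 'last': 'Doe'}
# Named groups are still numbered, so both lookups reach the same text.
print(m.group('last'), m.group(2))  # Doe Doe

# Hypothetical pattern: the middle initial is optional and absent here,
# so groupdict() substitutes the given default instead of None.
opt = re.search(r'(?P<first>\w+)(?: (?P<middle>\w\.))?', 'Jane')
print(opt.groupdict(default=''))    # {'first': 'Jane', 'middle': ''}
```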
@@ -1274,18 +1275,18 @@ In short, before turning to the :mod:`re` module, consider whether your problem
 can be solved with a faster and simpler string method.


-match() versus search()
------------------------
+prefixmatch() versus search()
++-----------------------------

-The :func:`~re.match` function only checks if the RE matches at the beginning of the
+The :func:`~re.prefixmatch` function only checks if the RE matches at the beginning of the
 string while :func:`~re.search` will scan forward through the string for a match.
-It's important to keep this distinction in mind. Remember, :func:`!match` will
+It's important to keep this distinction in mind. Remember, :func:`!prefixmatch` will
 only report a successful match which will start at 0; if the match wouldn't
-start at zero, :func:`!match` will *not* report it. ::
+start at zero, :func:`!prefixmatch` will *not* report it. ::

-   >>> print(re.match('super', 'superstition').span())
+   >>> print(re.prefixmatch('super', 'superstition').span())
    (0, 5)
-   >>> print(re.match('super', 'insuperable'))
+   >>> print(re.prefixmatch('super', 'insuperable'))
    None

 On the other hand, :func:`~re.search` will scan forward through the string,
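One way to make the distinction concrete: a `search()` anchored with `\A` should agree with a prefix match. This is a sketch, not how the `re` module implements either function. `anchored_search()` is a hypothetical helper, the `(?:...)` wrapper keeps alternations anchored as a whole, and `prefixmatch` falls back to `re.match` on older Pythons on the assumption that the two behave the same:

```python
import re

# Assumption: re.prefixmatch() mirrors re.match(); fall back where it is missing.
prefixmatch = getattr(re, 'prefixmatch', re.match)

def anchored_search(pattern, string):
    # Hypothetical helper: \A pins search() to the start of the string, and
    # the (?:...) wrapper keeps an alternation like 'a|b' anchored as a whole.
    return re.search(r'\A(?:' + pattern + ')', string)

for text in ('superstition', 'insuperable'):
    p_m = prefixmatch('super', text)
    a_m = anchored_search('super', text)
    s_m = re.search('super', text)
    print(text, p_m and p_m.span(), a_m and a_m.span(), s_m.span())
# superstition (0, 5) (0, 5) (0, 5)
# insuperable None None (2, 7)
```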
@@ -1296,7 +1297,7 @@ reporting the first match it finds. ::
    >>> print(re.search('super', 'insuperable').span())
    (2, 7)

-Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*``
+Sometimes you'll be tempted to keep using :func:`re.prefixmatch`, and just add ``.*``
 to the front of your RE. Resist this temptation and use :func:`re.search`
 instead. The regular expression compiler does some analysis of REs in order to
 speed up the process of looking for a match. One such analysis figures out what
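Apart from the speed argument made here, prepending `.*` also changes what the match object reports: the span starts at 0 rather than where the interesting text begins. A sketch, assuming `re.prefixmatch()` mirrors `re.match()` and falling back to it where the new name is missing:

```python
import re

# Assumption: re.prefixmatch() mirrors re.match(); fall back where it is missing.
prefixmatch = getattr(re, 'prefixmatch', re.match)

text = 'insuperable'
# Prepending '.*' forces a match at index 0, so the span includes the prefix...
print(prefixmatch('.*super', text).span())  # (0, 7)
# ...while search() reports where 'super' itself starts.
print(re.search('super', text).span())      # (2, 7)
```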
@@ -1322,9 +1323,9 @@ doesn't work because of the greedy nature of ``.*``. ::
    >>> s = '<html><head><title>Title</title>'
    >>> len(s)
    32
-   >>> print(re.match('<.*>', s).span())
+   >>> print(re.prefixmatch('<.*>', s).span())
    (0, 32)
-   >>> print(re.match('<.*>', s).group())
+   >>> print(re.prefixmatch('<.*>', s).group())
    <html><head><title>Title</title>

 The RE matches the ``'<'`` in ``'<html>'``, and the ``.*`` consumes the rest of
@@ -1340,7 +1341,7 @@ example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
 when it fails, the engine advances a character at a time, retrying the ``'>'``
 at every step. This produces just the right result::

-   >>> print(re.match('<.*?>', s).group())
+   >>> print(re.prefixmatch('<.*?>', s).group())
    <html>

 (Note that parsing HTML or XML with regular expressions is painful.
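A sketch contrasting the greedy and non-greedy patterns from these hunks. It calls `search()` so it runs on any Python version; because `s` starts with `'<'`, the results match what a prefix match reports. The `finditer()` line is an addition here showing that the non-greedy form can walk every tag:

```python
import re

s = '<html><head><title>Title</title>'

greedy = re.search('<.*>', s)   # '.*' runs to the end, then backtracks to the last '>'
lazy = re.search('<.*?>', s)    # '.*?' stops at the first '>' that completes the match

print(greedy.span(), lazy.group())  # (0, 32) <html>
# finditer() with the non-greedy pattern yields each tag in turn.
print([m.group() for m in re.finditer('<.*?>', s)])
# ['<html>', '<head>', '<title>', '</title>']
```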
