Commit bdb0b36

hugovk and hauntsaninja authored
gh-86519: Update docs for prefixmatch (#148096)
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
1 parent 356a031 commit bdb0b36

5 files changed, +58 -59 lines changed

Doc/howto/logging-cookbook.rst

Lines changed: 1 addition & 1 deletion
@@ -3877,7 +3877,7 @@ subclassed handler which looks something like this::
38773877
def format(self, record):
38783878
version = 1
38793879
asctime = dt.datetime.fromtimestamp(record.created).isoformat()
3880-
m = self.tz_offset.match(time.strftime('%z'))
3880+
m = self.tz_offset.prefixmatch(time.strftime('%z'))
38813881
has_offset = False
38823882
if m and time.timezone:
38833883
hrs, mins = m.groups()
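This hunk only renames the call; how `tz_offset` is compiled is outside the shown context. The following is a minimal sketch of splitting a `%z`-style offset into hours and minutes. It assumes a hypothetical `TZ_OFFSET` pattern `([+-]\d{2})(\d{2})$` that is not taken from the cookbook. Because `Pattern.prefixmatch()` only exists from Python 3.15, the sketch falls back to `Pattern.match()`, its previous name, on older interpreters.

```python
import re

# Hypothetical pattern: the hunk above does not show how tz_offset is built.
TZ_OFFSET = re.compile(r'([+-]\d{2})(\d{2})$')

# Pattern.prefixmatch() is new in Python 3.15; match() is the older name
# for the same anchored-at-start operation, so use it as a fallback.
_prefixmatch = getattr(TZ_OFFSET, 'prefixmatch', TZ_OFFSET.match)


def split_offset(offset):
    """Split an offset such as '+0530' into ('+05', '30'), or return None."""
    m = _prefixmatch(offset)
    return m.groups() if m else None


print(split_offset('+0530'))  # ('+05', '30')
print(split_offset('-0800'))  # ('-08', '00')
print(split_offset(''))       # None: no offset present
```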

Doc/howto/regex.rst

Lines changed: 50 additions & 53 deletions
@@ -362,20 +362,21 @@ for a complete listing.
362362
+------------------+-----------------------------------------------+
363363
| Method/Attribute | Purpose |
364364
+==================+===============================================+
365-
| ``match()`` | Determine if the RE matches at the beginning |
366-
| | of the string. |
367-
+------------------+-----------------------------------------------+
368365
| ``search()`` | Scan through a string, looking for any |
369366
| | location where this RE matches. |
370367
+------------------+-----------------------------------------------+
368+
| ``prefixmatch()``| Determine if the RE matches at the beginning |
369+
| | of the string. Previously named :ref:`match() |
370+
| | <prefixmatch-vs-match>`. |
371+
+------------------+-----------------------------------------------+
371372
| ``findall()`` | Find all substrings where the RE matches, and |
372373
| | return them as a list. |
373374
+------------------+-----------------------------------------------+
374375
| ``finditer()`` | Find all substrings where the RE matches, and |
375376
| | return them as an :term:`iterator`. |
376377
+------------------+-----------------------------------------------+
377378

378-
:meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If
379+
:meth:`~re.Pattern.search` and :meth:`~re.Pattern.prefixmatch` return ``None`` if no match can be found. If
379380
they're successful, a :ref:`match object <match-objects>` instance is returned,
380381
containing information about the match: where it starts and ends, the substring
381382
it matched, and more.
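The updated table lists four pattern methods. Here is a short sketch of how their results differ on one input. The pattern and string are made up for illustration. The `getattr` fallback to `match()` is an assumption added so the script also runs on Python versions before 3.15, where `prefixmatch()` does not exist.

```python
import re

p = re.compile(r'[a-z]+')
text = '12 apples and 3 pears'

# prefixmatch() (match() before Python 3.15) only tries position 0.
prefixmatch = getattr(p, 'prefixmatch', p.match)
print(prefixmatch(text))          # None: text starts with a digit

# search() scans forward and returns the first match object it finds.
m = p.search(text)
print(m.group(), m.span())        # apples (3, 9)

# findall() returns every matching substring as a list.
print(p.findall(text))            # ['apples', 'and', 'pears']

# finditer() yields match objects, so positions are available too.
print([x.span() for x in p.finditer(text)])  # [(3, 9), (10, 13), (16, 21)]
```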
@@ -393,19 +394,19 @@ Python interpreter, import the :mod:`re` module, and compile a RE::
393394

394395
Now, you can try matching various strings against the RE ``[a-z]+``. An empty
395396
string shouldn't match at all, since ``+`` means 'one or more repetitions'.
396-
:meth:`~re.Pattern.match` should return ``None`` in this case, which will cause the
397+
:meth:`~re.Pattern.search` should return ``None`` in this case, which will cause the
397398
interpreter to print no output. You can explicitly print the result of
398-
:meth:`!match` to make this clear. ::
399+
:meth:`!search` to make this clear. ::
399400

400-
>>> p.match("")
401-
>>> print(p.match(""))
401+
>>> p.search("")
402+
>>> print(p.search(""))
402403
None
403404

404405
Now, let's try it on a string that it should match, such as ``tempo``. In this
405-
case, :meth:`~re.Pattern.match` will return a :ref:`match object <match-objects>`, so you
406+
case, :meth:`~re.Pattern.search` will return a :ref:`match object <match-objects>`, so you
406407
should store the result in a variable for later use. ::
407408

408-
>>> m = p.match('tempo')
409+
>>> m = p.search('tempo')
409410
>>> m
410411
<re.Match object; span=(0, 5), match='tempo'>
411412

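The tutorial's `[a-z]+` pattern can show where each method is able to report a match. The sketch below compares `search()` and `prefixmatch()` on the strings used in these hunks. The `getattr` fallback to `match()` is an addition for Python versions before 3.15.

```python
import re

p = re.compile('[a-z]+')
# Use prefixmatch() where available (Python 3.15+), else its old name match().
prefixmatch = getattr(p, 'prefixmatch', p.match)

for s in ['', 'tempo', '::: message']:
    found = p.search(s)
    anchored = prefixmatch(s)
    print(repr(s),
          found.span() if found else None,
          anchored.span() if anchored else None)

# prints:
# '' None None
# 'tempo' (0, 5) (0, 5)
# '::: message' (4, 11) None
```

A successful `prefixmatch()` result always has `start() == 0`. A `search()` result can start anywhere in the string.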
@@ -437,27 +438,28 @@ Trying these methods will soon clarify their meaning::
437438

438439
:meth:`~re.Match.group` returns the substring that was matched by the RE. :meth:`~re.Match.start`
439440
and :meth:`~re.Match.end` return the starting and ending index of the match. :meth:`~re.Match.span`
440-
returns both start and end indexes in a single tuple. Since the :meth:`~re.Pattern.match`
441-
method only checks if the RE matches at the start of a string, :meth:`!start`
442-
will always be zero. However, the :meth:`~re.Pattern.search` method of patterns
443-
scans through the string, so the match may not start at zero in that
444-
case. ::
441+
returns both start and end indexes in a single tuple.
442+
The :meth:`~re.Pattern.search` method of patterns
443+
scans through the string, so the match may not start at zero.
444+
However, the :meth:`~re.Pattern.prefixmatch`
445+
method only checks if the RE matches at the start of a string, so :meth:`!start`
446+
will always be zero in that case. ::
445447

446-
>>> print(p.match('::: message'))
447-
None
448448
>>> m = p.search('::: message'); print(m)
449449
<re.Match object; span=(4, 11), match='message'>
450450
>>> m.group()
451451
'message'
452452
>>> m.span()
453453
(4, 11)
454+
>>> print(p.prefixmatch('::: message'))
455+
None
454456

455457
In actual programs, the most common style is to store the
456458
:ref:`match object <match-objects>` in a variable, and then check if it was
457459
``None``. This usually looks like::
458460

459461
p = re.compile( ... )
460-
m = p.match( 'string goes here' )
462+
m = p.search( 'string goes here' )
461463
if m:
462464
print('Match found: ', m.group())
463465
else:
@@ -495,15 +497,15 @@ Module-level functions
495497
----------------------
496498

497499
You don't have to create a pattern object and call its methods; the
498-
:mod:`re` module also provides top-level functions called :func:`~re.match`,
499-
:func:`~re.search`, :func:`~re.findall`, :func:`~re.sub`, and so forth. These functions
500+
:mod:`re` module also provides top-level functions called :func:`~re.search`,
501+
:func:`~re.prefixmatch`, :func:`~re.findall`, :func:`~re.sub`, and so forth. These functions
500502
take the same arguments as the corresponding pattern method with
501503
the RE string added as the first argument, and still return either ``None`` or a
502504
:ref:`match object <match-objects>` instance. ::
503505

504-
>>> print(re.match(r'From\s+', 'Fromage amk'))
506+
>>> print(re.prefixmatch(r'From\s+', 'Fromage amk'))
505507
None
506-
>>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS
508+
>>> re.prefixmatch(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS
507509
<re.Match object; span=(0, 5), match='From '>
508510

509511
Under the hood, these functions simply create a pattern object for you
@@ -812,7 +814,7 @@ of a group with a quantifier, such as ``*``, ``+``, ``?``, or
812814
``ab``. ::
813815

814816
>>> p = re.compile('(ab)*')
815-
>>> print(p.match('ababababab').span())
817+
>>> print(p.search('ababababab').span())
816818
(0, 10)
817819

818820
Groups indicated with ``'('``, ``')'`` also capture the starting and ending
@@ -825,7 +827,7 @@ argument. Later we'll see how to express groups that don't capture the span
825827
of text that they match. ::
826828

827829
>>> p = re.compile('(a)b')
828-
>>> m = p.match('ab')
830+
>>> m = p.search('ab')
829831
>>> m.group()
830832
'ab'
831833
>>> m.group(0)
@@ -836,7 +838,7 @@ to determine the number, just count the opening parenthesis characters, going
836838
from left to right. ::
837839

838840
>>> p = re.compile('(a(b)c)d')
839-
>>> m = p.match('abcd')
841+
>>> m = p.search('abcd')
840842
>>> m.group(0)
841843
'abcd'
842844
>>> m.group(1)
@@ -912,10 +914,10 @@ but aren't interested in retrieving the group's contents. You can make this fact
912914
explicit by using a non-capturing group: ``(?:...)``, where you can replace the
913915
``...`` with any other regular expression. ::
914916

915-
>>> m = re.match("([abc])+", "abc")
917+
>>> m = re.search("([abc])+", "abc")
916918
>>> m.groups()
917919
('c',)
918-
>>> m = re.match("(?:[abc])+", "abc")
920+
>>> m = re.search("(?:[abc])+", "abc")
919921
>>> m.groups()
920922
()
921923

@@ -949,7 +951,7 @@ given numbers, so you can retrieve information about a group in two ways::
949951
Additionally, you can retrieve named groups as a dictionary with
950952
:meth:`~re.Match.groupdict`::
951953

952-
>>> m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
954+
>>> m = re.search(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
953955
>>> m.groupdict()
954956
{'first': 'Jane', 'last': 'Doe'}
955957

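The group doctests in these hunks now call `search()`. Each pattern already matches at position 0 of its input, so switching from `match()` does not change the results. The sketch below collects the same group behaviours into one runnable script, reusing the patterns and inputs from the hunks above.

```python
import re

# A capturing group repeated by + keeps only its last iteration.
m = re.search(r'([abc])+', 'abc')
print(m.group(0), m.groups())    # abc ('c',)

# A non-capturing group matches the same text but records nothing.
m = re.search(r'(?:[abc])+', 'abc')
print(m.group(0), m.groups())    # abc ()

# Groups are numbered by their opening parenthesis, left to right.
m = re.search(r'(a(b)c)d', 'abcd')
print(m.group(0, 1, 2))          # ('abcd', 'abc', 'b')

# Named groups are available by name and as a dictionary.
m = re.search(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
print(m.group('last'), m.groupdict())  # Doe {'first': 'Jane', 'last': 'Doe'}
```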
@@ -1274,40 +1276,35 @@ In short, before turning to the :mod:`re` module, consider whether your problem
12741276
can be solved with a faster and simpler string method.
12751277

12761278

1277-
match() versus search()
1278-
-----------------------
1279+
.. _match-versus-search:
12791280

1280-
The :func:`~re.match` function only checks if the RE matches at the beginning of the
1281-
string while :func:`~re.search` will scan forward through the string for a match.
1282-
It's important to keep this distinction in mind. Remember, :func:`!match` will
1283-
only report a successful match which will start at 0; if the match wouldn't
1284-
start at zero, :func:`!match` will *not* report it. ::
1281+
prefixmatch() (aka match) versus search()
1282+
-----------------------------------------
12851283

1286-
>>> print(re.match('super', 'superstition').span())
1284+
:func:`~re.prefixmatch` was added in Python 3.15 as the :ref:`preferred name
1285+
<prefixmatch-vs-match>` for :func:`~re.match`. Before this, it was only known
1286+
as :func:`!match` and the distinction with :func:`~re.search` was often
1287+
misunderstood.
1288+
1289+
:func:`!prefixmatch` aka :func:`!match` only checks if the RE matches at the
1290+
beginning of the string while :func:`!search` scans forward through the
1291+
string for a match. ::
1292+
1293+
>>> print(re.prefixmatch('super', 'superstition').span())
12871294
(0, 5)
1288-
>>> print(re.match('super', 'insuperable'))
1295+
>>> print(re.prefixmatch('super', 'insuperable'))
12891296
None
12901297

1291-
On the other hand, :func:`~re.search` will scan forward through the string,
1298+
On the other hand, :func:`~re.search` scans forward through the string,
12921299
reporting the first match it finds. ::
12931300

12941301
>>> print(re.search('super', 'superstition').span())
12951302
(0, 5)
12961303
>>> print(re.search('super', 'insuperable').span())
12971304
(2, 7)
12981305

1299-
Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*``
1300-
to the front of your RE. Resist this temptation and use :func:`re.search`
1301-
instead. The regular expression compiler does some analysis of REs in order to
1302-
speed up the process of looking for a match. One such analysis figures out what
1303-
the first character of a match must be; for example, a pattern starting with
1304-
``Crow`` must match starting with a ``'C'``. The analysis lets the engine
1305-
quickly scan through the string looking for the starting character, only trying
1306-
the full match if a ``'C'`` is found.
1307-
1308-
Adding ``.*`` defeats this optimization, requiring scanning to the end of the
1309-
string and then backtracking to find a match for the rest of the RE. Use
1310-
:func:`re.search` instead.
1306+
This distinction is important to remember when using the old :func:`~re.match`
1307+
name in code requiring compatibility with older Python versions.
13111308

13121309

13131310
Greedy versus non-greedy
@@ -1322,9 +1319,9 @@ doesn't work because of the greedy nature of ``.*``. ::
13221319
>>> s = '<html><head><title>Title</title>'
13231320
>>> len(s)
13241321
32
1325-
>>> print(re.match('<.*>', s).span())
1322+
>>> print(re.prefixmatch('<.*>', s).span())
13261323
(0, 32)
1327-
>>> print(re.match('<.*>', s).group())
1324+
>>> print(re.prefixmatch('<.*>', s).group())
13281325
<html><head><title>Title</title>
13291326

13301327
The RE matches the ``'<'`` in ``'<html>'``, and the ``.*`` consumes the rest of
@@ -1340,7 +1337,7 @@ example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
13401337
when it fails, the engine advances a character at a time, retrying the ``'>'``
13411338
at every step. This produces just the right result::
13421339

1343-
>>> print(re.match('<.*?>', s).group())
1340+
>>> print(re.prefixmatch('<.*?>', s).group())
13441341
<html>
13451342

13461343
(Note that parsing HTML or XML with regular expressions is painful.

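The rewritten section notes that code supporting older Python versions must still call `match()`. The sketch below shows one way to write such code. It assumes a module-level `getattr` fallback, which the commit itself does not add, and reuses the section's `superstition`/`insuperable` examples and the greedy/non-greedy `<.*>` demonstration.

```python
import re

# re.prefixmatch() is new in Python 3.15; on older versions the same
# behaviour is available under its previous name, re.match().
prefixmatch = getattr(re, 'prefixmatch', re.match)

# Anchored at the start: succeeds only if the match begins at index 0.
print(prefixmatch('super', 'superstition').span())  # (0, 5)
print(prefixmatch('super', 'insuperable'))          # None

# search() scans forward and reports the first match it finds.
print(re.search('super', 'insuperable').span())     # (2, 7)

# Greedy .* runs to the last '>'; non-greedy .*? stops at the first one.
s = '<html><head><title>Title</title>'
print(prefixmatch('<.*>', s).group())   # <html><head><title>Title</title>
print(prefixmatch('<.*?>', s).group())  # <html>
```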
Doc/library/fnmatch.rst

Lines changed: 3 additions & 2 deletions
@@ -103,7 +103,8 @@ functions: :func:`fnmatch`, :func:`fnmatchcase`, :func:`.filter`, :func:`.filter
103103
.. function:: translate(pat)
104104

105105
Return the shell-style pattern *pat* converted to a regular expression for
106-
using with :func:`re.match`. The pattern is expected to be a :class:`str`.
106+
using with :func:`re.prefixmatch`. The pattern is expected to be a
107+
:class:`str`.
107108

108109
Example:
109110

@@ -113,7 +114,7 @@ functions: :func:`fnmatch`, :func:`fnmatchcase`, :func:`.filter`, :func:`.filter
113114
>>> regex
114115
'(?s:.*\\.txt)\\z'
115116
>>> reobj = re.compile(regex)
116-
>>> reobj.match('foobar.txt')
117+
>>> reobj.prefixmatch('foobar.txt')
117118
<re.Match object; span=(0, 10), match='foobar.txt'>
118119

119120

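The `translate()` output in the example ends with `\z` but has no start anchor. It therefore relies on an anchored-at-start call such as `prefixmatch()`. A sketch of what goes wrong with `search()` follows. The `a*.txt` pattern is a made-up input, and `match()` stands in for `prefixmatch()` on Python versions before 3.15.

```python
import fnmatch
import re

# translate() anchors the end of the regex but not the start.
regex = fnmatch.translate('a*.txt')
reobj = re.compile(regex)
prefixmatch = getattr(reobj, 'prefixmatch', reobj.match)

print(bool(prefixmatch('abc.txt')))   # True
print(bool(prefixmatch('xa.txt')))    # False: the name does not start with 'a'
print(reobj.search('xa.txt').span())  # (1, 6): search() finds 'a.txt' inside it
print(fnmatch.fnmatchcase('xa.txt', 'a*.txt'))  # False, agreeing with prefixmatch
```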
Doc/library/glob.rst

Lines changed: 3 additions & 2 deletions
@@ -140,7 +140,8 @@ The :mod:`!glob` module defines the following functions:
140140
.. function:: translate(pathname, *, recursive=False, include_hidden=False, seps=None)
141141

142142
Convert the given path specification to a regular expression for use with
143-
:func:`re.match`. The path specification can contain shell-style wildcards.
143+
:func:`re.prefixmatch`. The path specification can contain shell-style
144+
wildcards.
144145

145146
For example:
146147

@@ -150,7 +151,7 @@ The :mod:`!glob` module defines the following functions:
150151
>>> regex
151152
'(?s:(?:.+/)?[^/]*\\.txt)\\z'
152153
>>> reobj = re.compile(regex)
153-
>>> reobj.match('foo/bar/baz.txt')
154+
>>> reobj.prefixmatch('foo/bar/baz.txt')
154155
<re.Match object; span=(0, 15), match='foo/bar/baz.txt'>
155156

156157
Path separators and segments are meaningful to this function, unlike

Doc/library/typing.rst

Lines changed: 1 addition & 1 deletion
@@ -3797,7 +3797,7 @@ Aliases to other concrete types
37973797
Match
37983798

37993799
Deprecated aliases corresponding to the return types from
3800-
:func:`re.compile` and :func:`re.match`.
3800+
:func:`re.compile` and :func:`re.search`.
38013801

38023802
These types (and the corresponding functions) are generic over
38033803
:data:`AnyStr`. ``Pattern`` can be specialised as ``Pattern[str]`` or
