gh-86519: Update docs for prefixmatch
#148096
Open
hugovk wants to merge 11 commits into python:main from hugovk:3.15-howto-regex-re.match
Commits (11)
- f82c1fc Update regex HOWTO for re.prefixmatch (hugovk)
- 3362193 Update fnmatch to prefixmatch (hugovk)
- 8d0571f Update glob to prefixmatch (hugovk)
- 2fd6236 Update typing to search (hugovk)
- ca8b571 Update logging-cookbook to prefixmatch (hugovk)
- 1f263e9 Preserve HTML ID (hugovk)
- 4090e69 Mention previous name (hugovk)
- ff22eb2 Revise 'prefixmatch() versus search()' (hugovk)
- 920a695 Trim a bit more from 'prefixmatch() versus search()' (hugovk)
- 96d80f4 Improve title (hugovk)
- d2bdc8e Fix underline length to fix docs build (hugovk)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -362,20 +362,20 @@ for a complete listing. | |
| +------------------+-----------------------------------------------+ | ||
| | Method/Attribute | Purpose | | ||
| +==================+===============================================+ | ||
| | ``match()`` | Determine if the RE matches at the beginning | | ||
| | | of the string. | | ||
| +------------------+-----------------------------------------------+ | ||
| | ``search()`` | Scan through a string, looking for any | | ||
| | | location where this RE matches. | | ||
| +------------------+-----------------------------------------------+ | ||
| | ``prefixmatch()``| Determine if the RE matches at the beginning | | ||
| | | of the string. | | ||
| +------------------+-----------------------------------------------+ | ||
| | ``findall()`` | Find all substrings where the RE matches, and | | ||
| | | returns them as a list. | | ||
| +------------------+-----------------------------------------------+ | ||
| | ``finditer()`` | Find all substrings where the RE matches, and | | ||
| | | returns them as an :term:`iterator`. | | ||
| +------------------+-----------------------------------------------+ | ||
|
|
||
| :meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If | ||
| :meth:`~re.Pattern.search` and :meth:`~re.Pattern.prefixmatch` return ``None`` if no match can be found. If | ||
| they're successful, a :ref:`match object <match-objects>` instance is returned, | ||
| containing information about the match: where it starts and ends, the substring | ||
| it matched, and more. | ||
|
|
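The method table above can be exercised directly on one string. This is a minimal sketch, assuming `Pattern.prefixmatch()` is the newer name for `Pattern.match()` as this PR describes; the code falls back to `match()` on Python versions that lack it. The sample text is illustrative.

```python
import re

p = re.compile(r"[a-z]+")
text = "42 apples and 7 pears"

# Assumption: Pattern.prefixmatch() is the newer name for Pattern.match();
# fall back to match() on Python versions that do not have it.
p_prefixmatch = getattr(p, "prefixmatch", p.match)

anchored = p_prefixmatch(text)  # None: the string starts with "42", not a letter
first = p.search(text)          # scans forward to the first run of letters
words = p.findall(text)         # every matching substring, as a list
spans = [m.span() for m in p.finditer(text)]  # finditer() yields match objects

print(anchored)                     # None
print(first.group(), first.span())  # apples (3, 9)
print(words)                        # ['apples', 'and', 'pears']
print(spans)                        # [(3, 9), (10, 13), (16, 21)]
```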
@@ -391,21 +391,21 @@ Python interpreter, import the :mod:`re` module, and compile a RE:: | |
| >>> p | ||
| re.compile('[a-z]+') | ||
|
|
||
| Now, you can try matching various strings against the RE ``[a-z]+``. An empty | ||
| Now, you can try searching various strings against the RE ``[a-z]+``. An empty | ||
| string shouldn't match at all, since ``+`` means 'one or more repetitions'. | ||
| :meth:`~re.Pattern.match` should return ``None`` in this case, which will cause the | ||
| :meth:`~re.Pattern.search` should return ``None`` in this case, which will cause the | ||
| interpreter to print no output. You can explicitly print the result of | ||
| :meth:`!match` to make this clear. :: | ||
| :meth:`!search` to make this clear. :: | ||
|
|
||
| >>> p.match("") | ||
| >>> print(p.match("")) | ||
| >>> p.search("") | ||
| >>> print(p.search("")) | ||
| None | ||
|
|
||
| Now, let's try it on a string that it should match, such as ``tempo``. In this | ||
| case, :meth:`~re.Pattern.match` will return a :ref:`match object <match-objects>`, so you | ||
| case, :meth:`~re.Pattern.search` will return a :ref:`match object <match-objects>`, so you | ||
| should store the result in a variable for later use. :: | ||
|
|
||
| >>> m = p.match('tempo') | ||
| >>> m = p.search('tempo') | ||
| >>> m | ||
| <re.Match object; span=(0, 5), match='tempo'> | ||
|
|
||
|
|
@@ -437,27 +437,28 @@ Trying these methods will soon clarify their meaning:: | |
|
|
||
| :meth:`~re.Match.group` returns the substring that was matched by the RE. :meth:`~re.Match.start` | ||
| and :meth:`~re.Match.end` return the starting and ending index of the match. :meth:`~re.Match.span` | ||
| returns both start and end indexes in a single tuple. Since the :meth:`~re.Pattern.match` | ||
| method only checks if the RE matches at the start of a string, :meth:`!start` | ||
| will always be zero. However, the :meth:`~re.Pattern.search` method of patterns | ||
| scans through the string, so the match may not start at zero in that | ||
| case. :: | ||
| returns both start and end indexes in a single tuple. | ||
| The :meth:`~re.Pattern.search` method of patterns | ||
| scans through the string, so the match may not start at zero. | ||
| However, the :meth:`~re.Pattern.prefixmatch` | ||
| method only checks if the RE matches at the start of a string, so :meth:`!start` | ||
| will always be zero in that case. :: | ||
|
|
||
| >>> print(p.match('::: message')) | ||
| None | ||
| >>> m = p.search('::: message'); print(m) | ||
| <re.Match object; span=(4, 11), match='message'> | ||
| >>> m.group() | ||
| 'message' | ||
| >>> m.span() | ||
| (4, 11) | ||
| >>> print(p.prefixmatch('::: message')) | ||
| None | ||
|
|
||
| In actual programs, the most common style is to store the | ||
| :ref:`match object <match-objects>` in a variable, and then check if it was | ||
| ``None``. This usually looks like:: | ||
|
|
||
| p = re.compile( ... ) | ||
| m = p.match( 'string goes here' ) | ||
| m = p.search( 'string goes here' ) | ||
| if m: | ||
| print('Match found: ', m.group()) | ||
| else: | ||
|
|
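To see the `start()` difference and the `if m:` idiom side by side, here is a small sketch using the `tempo` and `::: message` strings from the text. `describe()` is a hypothetical helper, and `prefixmatch()` again falls back to `match()` where it is unavailable.

```python
import re

p = re.compile(r"[a-z]+")
# Assumption: prefixmatch() is the newer name for match(); fall back if absent.
p_prefixmatch = getattr(p, "prefixmatch", p.match)

def describe(m):
    # Hypothetical helper using the usual idiom: keep the match object,
    # then test it, since a failed match returns None.
    if m:
        return f"Match found: {m.group()!r} at {m.span()}"
    return "No match"

for s in ["tempo", "::: message"]:
    print(f"{s!r}: search -> {describe(p.search(s))}; "
          f"prefixmatch -> {describe(p_prefixmatch(s))}")
```

A successful `prefixmatch()` always reports a span starting at 0, while `search()` reports wherever the first match begins.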
@@ -495,15 +496,15 @@ Module-Level Functions | |
| ---------------------- | ||
|
|
||
| You don't have to create a pattern object and call its methods; the | ||
| :mod:`re` module also provides top-level functions called :func:`~re.match`, | ||
| :func:`~re.search`, :func:`~re.findall`, :func:`~re.sub`, and so forth. These functions | ||
| :mod:`re` module also provides top-level functions called :func:`~re.search`, | ||
| :func:`~re.prefixmatch`, :func:`~re.findall`, :func:`~re.sub`, and so forth. These functions | ||
| take the same arguments as the corresponding pattern method with | ||
| the RE string added as the first argument, and still return either ``None`` or a | ||
| :ref:`match object <match-objects>` instance. :: | ||
|
|
||
| >>> print(re.match(r'From\s+', 'Fromage amk')) | ||
| >>> print(re.prefixmatch(r'From\s+', 'Fromage amk')) | ||
| None | ||
| >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS | ||
| >>> re.prefixmatch(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS | ||
| <re.Match object; span=(0, 5), match='From '> | ||
|
|
||
| Under the hood, these functions simply create a pattern object for you | ||
|
|
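A sketch checking that the module-level function and the method on an explicitly compiled pattern agree, reusing the `From\s+` example. `span_or_none()` is a hypothetical helper, and the `prefixmatch` fallback to `match()` is an assumption for older interpreters.

```python
import re

# Assumption: re.prefixmatch() / Pattern.prefixmatch() are the newer names for
# re.match() / Pattern.match(); fall back to match() where they are missing.
prefixmatch = getattr(re, "prefixmatch", re.match)

pattern = r"From\s+"
compiled = re.compile(pattern)
compiled_prefixmatch = getattr(compiled, "prefixmatch", compiled.match)

def span_or_none(m):
    # Hypothetical helper: compare results by span, since match objects
    # from separate calls are distinct objects.
    return m.span() if m else None

lines = ["Fromage amk", "From amk Thu May 14 19:12:10 1998"]
results = []
for line in lines:
    via_module = prefixmatch(pattern, line)   # RE string passed as first argument
    via_pattern = compiled_prefixmatch(line)  # method on the precompiled pattern
    results.append((span_or_none(via_module), span_or_none(via_pattern)))

print(results)  # [(None, None), ((0, 5), (0, 5))]
```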
@@ -812,7 +813,7 @@ of a group with a quantifier, such as ``*``, ``+``, ``?``, or | |
| ``ab``. :: | ||
|
|
||
| >>> p = re.compile('(ab)*') | ||
| >>> print(p.match('ababababab').span()) | ||
| >>> print(p.search('ababababab').span()) | ||
| (0, 10) | ||
|
|
||
| Groups indicated with ``'('``, ``')'`` also capture the starting and ending | ||
|
|
@@ -825,7 +826,7 @@ argument. Later we'll see how to express groups that don't capture the span | |
| of text that they match. :: | ||
|
|
||
| >>> p = re.compile('(a)b') | ||
| >>> m = p.match('ab') | ||
| >>> m = p.search('ab') | ||
| >>> m.group() | ||
| 'ab' | ||
| >>> m.group(0) | ||
|
|
@@ -836,7 +837,7 @@ to determine the number, just count the opening parenthesis characters, going | |
| from left to right. :: | ||
|
|
||
| >>> p = re.compile('(a(b)c)d') | ||
| >>> m = p.match('abcd') | ||
| >>> m = p.search('abcd') | ||
| >>> m.group(0) | ||
| 'abcd' | ||
| >>> m.group(1) | ||
|
|
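Group numbering and per-group spans can be sketched with `search()` on a string that does not start with the match, so the offsets are visible. This relies only on long-standing `re` behaviour: `group(n)`, `span(n)`, and the rule that a repeated group keeps only its last repetition.

```python
import re

# Groups are numbered by counting opening parentheses from left to right:
# group 1 is (a(b)c) and group 2 is the inner (b); group 0 is the whole match.
m = re.search(r"(a(b)c)d", "xxabcd")
print(m.group(0), m.group(1), m.group(2))  # abcd abc b
print(m.span(1), m.span(2))                # (2, 5) (3, 4)

# A quantified group matches several times, but it only remembers its last
# repetition; the overall match still covers every repetition.
r = re.search(r"(ab)*", "ababababab")
print(r.span(), r.group(1), r.span(1))     # (0, 10) ab (8, 10)
```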
@@ -912,10 +913,10 @@ but aren't interested in retrieving the group's contents. You can make this fact | |
| explicit by using a non-capturing group: ``(?:...)``, where you can replace the | ||
| ``...`` with any other regular expression. :: | ||
|
|
||
| >>> m = re.match("([abc])+", "abc") | ||
| >>> m = re.search("([abc])+", "abc") | ||
| >>> m.groups() | ||
| ('c',) | ||
| >>> m = re.match("(?:[abc])+", "abc") | ||
| >>> m = re.search("(?:[abc])+", "abc") | ||
| >>> m.groups() | ||
| () | ||
|
|
||
|
|
@@ -949,7 +950,7 @@ given numbers, so you can retrieve information about a group in two ways:: | |
| Additionally, you can retrieve named groups as a dictionary with | ||
| :meth:`~re.Match.groupdict`:: | ||
|
|
||
| >>> m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe') | ||
| >>> m = re.search(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe') | ||
| >>> m.groupdict() | ||
| {'first': 'Jane', 'last': 'Doe'} | ||
|
|
||
|
|
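Non-capturing and named groups combine naturally. The sketch below uses an illustrative name pattern (not from the HOWTO) with `(?:...)` wrapping an optional middle name, reads the result with `groupdict(default=...)`, and checks `Pattern.groups`, which counts only capturing groups.

```python
import re

# (?: ...)? makes the " middle" part optional without adding a numbered
# group; only the three named groups capture. The pattern is illustrative.
name_re = re.compile(r"(?P<first>\w+)(?: (?P<middle>\w+))? (?P<last>\w+)")

results = []
for full_name in ["Jane Doe", "Jane Q Doe"]:
    m = name_re.search(full_name)
    # groupdict() maps names to matched text; default= replaces the None
    # that a group which did not participate would otherwise report.
    results.append(m.groupdict(default=""))

print(results)
print(name_re.groups)  # count of capturing groups: 3
```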
@@ -1274,18 +1275,20 @@ In short, before turning to the :mod:`re` module, consider whether your problem | |
| can be solved with a faster and simpler string method. | ||
|
|
||
|
|
||
| match() versus search() | ||
| ----------------------- | ||
| .. _match-versus-search: | ||
|
|
||
| prefixmatch() versus search() | ||
|
||
| ----------------------------- | ||
|
|
||
| The :func:`~re.match` function only checks if the RE matches at the beginning of the | ||
| The :func:`~re.prefixmatch` function only checks if the RE matches at the beginning of the | ||
| string while :func:`~re.search` will scan forward through the string for a match. | ||
| It's important to keep this distinction in mind. Remember, :func:`!match` will | ||
| It's important to keep this distinction in mind. Remember, :func:`!prefixmatch` will | ||
| only report a successful match which will start at 0; if the match wouldn't | ||
| start at zero, :func:`!match` will *not* report it. :: | ||
| start at zero, :func:`!prefixmatch` will *not* report it. :: | ||
|
|
||
| >>> print(re.match('super', 'superstition').span()) | ||
| >>> print(re.prefixmatch('super', 'superstition').span()) | ||
| (0, 5) | ||
| >>> print(re.match('super', 'insuperable')) | ||
| >>> print(re.prefixmatch('super', 'insuperable')) | ||
| None | ||
|
|
||
| On the other hand, :func:`~re.search` will scan forward through the string, | ||
|
|
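The distinction can be checked programmatically. This sketch assumes `re.prefixmatch()` behaves like `re.match()` (falling back to `re.match()` on older Pythons); `first_span()` is a hypothetical helper. It also shows that a leading `\A` anchors a `search()` pattern to the start of the string in the same way.

```python
import re

# Assumption: re.prefixmatch() is the newer name for re.match(); fall back if absent.
prefixmatch = getattr(re, "prefixmatch", re.match)

def first_span(finder, pattern, text):
    # Hypothetical helper: span of the match, or None when there is none.
    m = finder(pattern, text)
    return m.span() if m else None

for text in ["superstition", "insuperable"]:
    print(text, first_span(prefixmatch, "super", text),
          first_span(re.search, "super", text))
# superstition (0, 5) (0, 5)
# insuperable None (2, 7)

# \A only matches at the very start of the string, so this search() is anchored too.
print(first_span(re.search, r"\Asuper", "insuperable"))  # None
```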
@@ -1296,7 +1299,7 @@ reporting the first match it finds. :: | |
| >>> print(re.search('super', 'insuperable').span()) | ||
| (2, 7) | ||
|
|
||
| Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*`` | ||
| Sometimes you'll be tempted to keep using :func:`re.prefixmatch`, and just add ``.*`` | ||
|
||
| to the front of your RE. Resist this temptation and use :func:`re.search` | ||
| instead. The regular expression compiler does some analysis of REs in order to | ||
| speed up the process of looking for a match. One such analysis figures out what | ||
|
|
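Besides the speed argument, prefixing `.*` also changes the result: the reported match now includes the skipped text, and because `.` does not match a newline by default, the workaround can miss matches on later lines. A small sketch, with the same assumption that `re.prefixmatch()` falls back to `re.match()` where it is missing:

```python
import re

# Assumption: re.prefixmatch() is the newer name for re.match(); fall back if absent.
prefixmatch = getattr(re, "prefixmatch", re.match)

text = "insuperable"
padded = prefixmatch(r".*super", text)  # the tempting workaround
scanned = re.search(r"super", text)     # the recommended approach

print(padded.span(), padded.group())    # (0, 7) insuper
print(scanned.span(), scanned.group())  # (2, 7) super

# '.' does not match a newline unless re.DOTALL is given, so the workaround
# misses a match that starts on a later line; search() still finds it.
multiline = "first line\nsuperb"
print(prefixmatch(r".*super", multiline))     # None
print(re.search(r"super", multiline).span())  # (11, 16)
```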
@@ -1322,9 +1325,9 @@ doesn't work because of the greedy nature of ``.*``. :: | |
| >>> s = '<html><head><title>Title</title>' | ||
| >>> len(s) | ||
| 32 | ||
| >>> print(re.match('<.*>', s).span()) | ||
| >>> print(re.prefixmatch('<.*>', s).span()) | ||
| (0, 32) | ||
| >>> print(re.match('<.*>', s).group()) | ||
| >>> print(re.prefixmatch('<.*>', s).group()) | ||
| <html><head><title>Title</title> | ||
|
|
||
| The RE matches the ``'<'`` in ``'<html>'``, and the ``.*`` consumes the rest of | ||
|
|
@@ -1340,7 +1343,7 @@ example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and | |
| when it fails, the engine advances a character at a time, retrying the ``'>'`` | ||
| at every step. This produces just the right result:: | ||
|
|
||
| >>> print(re.match('<.*?>', s).group()) | ||
| >>> print(re.prefixmatch('<.*?>', s).group()) | ||
| <html> | ||
|
|
||
| (Note that parsing HTML or XML with regular expressions is painful. | ||
|
|
||
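The greedy and non-greedy patterns from the HOWTO, written with `search()` and extended with `findall()` to collect every tag. The negated class `<[^>]*>` is shown as a commonly used alternative to `.*?`; neither replaces a real HTML parser.

```python
import re

s = "<html><head><title>Title</title>"

greedy = re.search(r"<.*>", s)  # .* runs to the end, then backs off to the last '>'
lazy = re.search(r"<.*?>", s)   # .*? stops at the first '>' that completes the match
print(greedy.span())            # (0, 32)
print(lazy.group())             # <html>

# findall() repeats the non-greedy match across the string; the negated
# character class [^>]* is a common alternative that cannot run past a '>'.
print(re.findall(r"<.*?>", s))  # ['<html>', '<head>', '<title>', '</title>']
print(re.findall(r"<[^>]*>", s) == re.findall(r"<.*?>", s))  # True
```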
I'd at least leave a little mention of "Previously named match()" with a link to the canonical doc section explaining the renaming and soft deprecation in here.
Added.