Bug report
Bug description:
There appears to be a bug in the findall() function of the re library: two patterns that describe the same set of strings do not produce the same results. For example, when searching a DNA sequence for a GC-rich region, re.findall("[GC]{12,}", dnaString) and re.findall("(G|C){12,}", dnaString) return different results:
import re
dnaString = "GCCGCGGGGGCCCCCGCGCCCGGGGATATTATAAAGGGGGGGGCCCCCCCCCCCCCCCCCCCCGC"
allGCrich = re.findall("[GC]{12,}", dnaString)
print(allGCrich)
# prints "['GCCGCGGGGGCCCCCGCGCCCGGGG', 'GGGGGGGGCCCCCCCCCCCCCCCCCCCCGC']" as desired
allGCrich = re.findall("(G|C){12,}", dnaString)
print(allGCrich)
# prints "['G', 'C']" which doesn't appear to be correct
This does not appear to occur in some of the other functions in the re library, such as search(): re.search("(G|C){12,}", dnaString) and re.search("[GC]{12,}", dnaString) produce the same results, as desired.
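For comparison, here is a minimal sketch of a possible explanation. The re documentation says that findall() returns the contents of the capturing group, not the whole match, when the pattern contains exactly one group. A repeated group keeps only its last iteration. The snake_case variable names and the non-capturing variant (?:G|C) are additions for illustration and are not part of the original report.

```python
import re

dna_string = "GCCGCGGGGGCCCCCGCGCCCGGGGATATTATAAAGGGGGGGGCCCCCCCCCCCCCCCCCCCCGC"

# Character class: the pattern has no groups, so findall() returns full matches.
char_class = re.findall(r"[GC]{12,}", dna_string)

# Capturing group: findall() returns what the group captured. A repeated group
# keeps only its last iteration, i.e. the final character of each match.
capturing = re.findall(r"(G|C){12,}", dna_string)

# Non-capturing group (?:...): same alternation, but findall() returns
# full matches again because there is no group to report.
non_capturing = re.findall(r"(?:G|C){12,}", dna_string)

# finditer() yields match objects; group(0) is always the whole match,
# even when the pattern contains a capturing group.
via_finditer = [m.group(0) for m in re.finditer(r"(G|C){12,}", dna_string)]

print(char_class)
print(capturing)
print(non_capturing == char_class)
print(via_finditer == char_class)
```

If this reading is right, it would also account for search() looking consistent: a match object's group(0) is the whole match regardless of any groups in the pattern.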
CPython versions tested on:
3.14
Operating systems tested on:
macOS