Commit c2acc2b

fix: Boyer-Moore bad character shift was dead code in for-loop
The bad_character_heuristic() method used a for-loop and assigned the computed shift to the loop variable i. The next iteration immediately overwrote that value, so the shift never took effect and every alignment was checked: the search lost its best-case O(n/m) skipping and behaved like an O(n*m) naive search. Changed to a while-loop so the shift actually takes effect. Added a max(i + 1, shift) guard to prevent backward skips when the rightmost occurrence of the mismatched text character in the pattern lies to the right of the mismatch position. Added edge-case doctests.
1 parent 6c04620 commit c2acc2b
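
The dead-store behaviour described in the commit message can be reproduced in isolation. The snippet below is only an illustrative sketch; `visited_with_for` and `visited_with_while` are hypothetical helper names that do not appear in the repository. It shows that reassigning the loop variable inside a `for` over `range()` does not change which indices are visited, whereas the same assignment in a `while`-loop does.

```python
def visited_with_for(n: int, jump: int) -> list[int]:
    """Try to skip ahead by reassigning the loop variable inside a for-loop."""
    visited = []
    for i in range(n):
        visited.append(i)
        i = i + jump  # dead store: range() supplies the next value regardless
    return visited


def visited_with_while(n: int, jump: int) -> list[int]:
    """Skip ahead with a while-loop, where the assignment drives the iteration."""
    visited = []
    i = 0
    while i < n:
        visited.append(i)
        i = i + jump  # takes effect: the loop condition re-reads i
    return visited


print(visited_with_for(6, 3))    # [0, 1, 2, 3, 4, 5]
print(visited_with_while(6, 3))  # [0, 3]
```

The for-loop version visits every index, which is the situation the old code was in: the computed shift was discarded on each iteration.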

1 file changed

Lines changed: 16 additions & 4 deletions

strings/boyer_moore_search.py

@@ -83,18 +83,30 @@ def bad_character_heuristic(self) -> list[int]:
         >>> bms = BoyerMooreSearch(text="ABAABA", pattern="AB")
         >>> bms.bad_character_heuristic()
         [0, 3]
+
+        >>> bms = BoyerMooreSearch(text="AAAAA", pattern="AB")
+        >>> bms.bad_character_heuristic()
+        []
+
+        >>> bms = BoyerMooreSearch(text="ABABAB", pattern="ABA")
+        >>> bms.bad_character_heuristic()
+        [0, 2]
+
+        >>> bms = BoyerMooreSearch(text="", pattern="AB")
+        >>> bms.bad_character_heuristic()
+        []
         """

         positions = []
-        for i in range(self.textLen - self.patLen + 1):
+        i = 0
+        while i <= self.textLen - self.patLen:
             mismatch_index = self.mismatch_in_text(i)
             if mismatch_index == -1:
                 positions.append(i)
+                i += 1
             else:
                 match_index = self.match_in_pattern(self.text[mismatch_index])
-                i = (
-                    mismatch_index - match_index
-                )  # shifting index lgtm [py/multiple-definition]
+                i = max(i + 1, mismatch_index - match_index)
         return positions
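
For readers who want to run the fixed loop outside the class, here is a minimal standalone sketch. It is not the repository's implementation: `bad_character_search` is a hypothetical function name, and a dictionary of rightmost character positions stands in for the class's `match_in_pattern()` and `mismatch_in_text()` helpers, on the assumption that those return the rightmost pattern index of a character and the text index of the first right-to-left mismatch (or -1 in each case). The loop structure and the `max(i + 1, ...)` guard follow the diff above.

```python
def bad_character_search(text: str, pattern: str) -> list[int]:
    """Return every start index where pattern occurs in text (bad-character rule only)."""
    positions = []
    # Rightmost index of each character in the pattern (assumed stand-in
    # for match_in_pattern(); later occurrences overwrite earlier ones).
    last = {ch: idx for idx, ch in enumerate(pattern)}
    i = 0
    while i <= len(text) - len(pattern):
        # Compare right to left, stopping at the first mismatch.
        j = len(pattern) - 1
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1
        if j < 0:
            positions.append(i)
            i += 1
        else:
            mismatch_index = i + j  # position in the text, as in the diff
            match_index = last.get(text[mismatch_index], -1)
            # Guard: never move backward, even if the rightmost occurrence
            # of the mismatched character lies to the right of the mismatch.
            i = max(i + 1, mismatch_index - match_index)
    return positions


print(bad_character_search("ABAABA", "AB"))  # [0, 3]
print(bad_character_search("AABA", "BA"))    # [2]
```

The second call exercises the guard: at alignment 0 the mismatch is at text index 0 with character "A", whose rightmost position in "BA" is 1, so the unguarded shift would be -1.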
