Commit 28655bf

feat(strings): count anagrams
1 parent ad930f7 commit 28655bf

5 files changed

Lines changed: 215 additions & 0 deletions

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Count Anagrams

You are given a string s containing one or more words. Every consecutive pair of words is separated by a single space ' '.

A string t is an anagram of string s if the `ith` word of t is a permutation of the `ith` word of s.

For example, "acb dfe" is an anagram of "abc def", but "def cab" and "adc bef" are not.

Return the number of distinct anagrams of s. Since the answer may be very large, return it modulo 10^9 + 7.

## Examples

Example 1:

```text
Input: s = "too hot"
Output: 18
Explanation: Some of the anagrams of the given string are "too hot", "oot hot", "oto toh", "too toh", and "too oht".
```

Example 2:

```text
Input: s = "aa"
Output: 1
Explanation: There is only one anagram possible for the given string.
```

## Constraints

- 1 <= s.length <= 10^5
- s consists of lowercase English letters and spaces ' '.
- There is a single space between consecutive words.

## Topics

- Hash Table
- Math
- String
- Combinatorics
- Counting

## Hints

- For each word, can you count the number of permutations possible if all characters are distinct?
- How can you reduce overcounting when letters are repeated?
- The product of the counts of distinct permutations of all words will give the final answer.

## Solution

An anagram of the entire string means that each word can be rearranged independently, but the order of the words stays
the same. This implies that we can find a string’s total number of distinct anagrams by calculating how many valid
permutations exist for each word. The number of permutations for each word depends on its length and the frequency of its
letters: a word of length n whose letters occur c1, c2, ... times has `n! / (c1! * c2! * ...)` distinct permutations.
By multiplying all words’ permutations, we get the total number of distinct anagrams for the whole string.
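
As a sanity check on this reasoning, here is a minimal sketch that is not part of the committed solution. The helper names `brute_force_count` and `formula_count` are illustrative only. It enumerates the distinct permutations of each word directly and compares the product with the per-word count `n! / (c1! * c2! * ...)`. The brute-force version is only practical for short words.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def brute_force_count(s: str) -> int:
    """Count distinct anagrams by enumerating every permutation of each word."""
    total = 1
    for word in s.split(" "):
        # A set removes arrangements that differ only in the order of equal letters.
        total *= len(set(permutations(word)))
    return total


def formula_count(s: str) -> int:
    """Count distinct anagrams with n! / (c1! * c2! * ...) per word."""
    total = 1
    for word in s.split(" "):
        count = factorial(len(word))
        for freq in Counter(word).values():
            count //= factorial(freq)  # exact: the quotient is always an integer
        total *= count
    return total


print(brute_force_count("too hot"), formula_count("too hot"))  # 18 18
```

Both functions use exact integers and skip the modulo, which is fine for small inputs but does not scale to the full constraints.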
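
![Illustration 1](./images/solutions/count_anagrams_illustration_1.png)

Computed naively, this product has two problems: factorials are recomputed for every word, and their values quickly grow
very large. Let’s explore a few optimizations we can apply to avoid these issues.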

- **Precompute factorials**: Calculating factorials from scratch for each word is slow. Precompute factorials up to the
  maximum possible word length (10^5, the upper bound on s.length) once, so that each later lookup takes constant time.
- **Take modular inverses**: As mentioned in the problem statement, we need to reduce results modulo 10^9 + 7 when
  computing factorial values. This prevents integer overflow and keeps the numbers manageable, as factorials grow extremely
  fast. Applying the modulo is straightforward for words with no duplicate characters, but it becomes challenging when a
  word has duplicate characters, because a division is involved: `n! / (c1! * c2! * ...)`. Ordinary division isn’t
  valid in modular arithmetic, so we multiply by the modular inverse instead, which turns the division into a
  multiplication. Therefore, when precomputing factorials, we also precompute their modular inverses,
  i.e. (c1!)^-1, (c2!)^-1, (c3!)^-1, using Fermat’s little theorem.

> **Fermat’s little theorem** states that if p is a prime number and a is an integer not divisible by p, then
> a^(p-1) ≡ 1 (mod p). It follows that the modular inverse of a is:
>
> a^-1 ≡ a^(p-2) mod p
>
> In our case, MOD = 10^9 + 7 (a prime number), and we need modular inverses of c1!, c2!, c3! and so on. Each count cx is
> at most 10^5, so cx! is not divisible by MOD, and the modular inverse of cx! is computed as:
>
> (cx!)^-1 ≡ (cx!)^(MOD-2) mod MOD
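
To make the formula concrete, here is a small sketch using Python's built-in three-argument `pow`. The helper name `mod_inverse` and the values 2! and 5! are chosen only for illustration. It checks that multiplying by the inverse gives the same result as exact division when the division is exact.

```python
MOD = 10**9 + 7  # prime modulus from the problem statement


def mod_inverse(a: int, p: int = MOD) -> int:
    """Modular inverse of a via Fermat's little theorem (p prime, a % p != 0)."""
    return pow(a, p - 2, p)


inv_two_fact = mod_inverse(2)    # inverse of 2! = 2
print(inv_two_fact)              # 500000004
print(2 * inv_two_fact % MOD)    # 1
print(120 * inv_two_fact % MOD)  # 60, i.e. 5! / 2! for the word "hello"
```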

In the example illustration above, the final answer becomes a more manageable number after applying the modulo operation.

![Illustration 2](./images/solutions/count_anagrams_illustration_2.png)

Let’s look at the algorithm steps of this solution:

- **Set up the constants and variables**:
  - We define a large prime number, MOD = 10**9 + 7, to keep calculations manageable.
  - We define a variable, MAX_LEN = 10**5, representing the maximum possible word length. Because s.length <= 10^5, no
    word can be longer than this.
  - We create two lists, factorials and inv_factorials, to store precomputed values that make calculations faster later.
  - We create a variable, result, to store the final result.
- **Precompute factorials and inverse factorials**: This step is done once at the start to speed things up later. A
  function pre_compute() is used to:
  - Compute the factorials, modulo MOD, for all numbers up to MAX_LEN.
  - Compute the modular inverses of these factorials (used for division in modular arithmetic). First, the modular inverse
    of factorials[MAX_LEN] is computed. Then, the inverse factorial of every smaller number down to 1 is derived from it,
    using inv_factorials[i] = inv_factorials[i + 1] * (i + 1), taken modulo MOD.
  - Store them in the lists factorials and inv_factorials, so we don’t have to recompute them every time.
- **Count permutations for each word**: The function count_permutations(word) calculates how many unique ways we can
  rearrange a word. It uses the precomputed factorials and inverse factorials for fast calculations.
  - First, it counts how many times each letter appears in the word, letter_count.
  - Then, it starts from the number of ways to arrange the letters as if they were all distinct, total_permutations = n!.
  - For each letter, it multiplies by the inverse factorial of that letter’s count, so that arrangements differing only
    in the order of identical letters are counted once.
- **Count anagrams of the whole string**: The function count_anagrams(s) works as follows:
  - It splits the string into words, words.
  - It finds the number of unique arrangements of each word using count_permutations(word).
  - It multiplies these counts together, modulo MOD, and stores the product in result.
  - After processing all words, it returns result, the total number of distinct anagrams of the entire string.
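
The backward pass in the precomputation step relies on the identity (i!)^-1 = ((i+1)!)^-1 * (i+1). The following minimal sketch uses a small illustrative limit of 10 and a hypothetical helper name, `build_tables`. It builds both tables and checks that every pair multiplies to 1 modulo MOD.

```python
MOD = 10**9 + 7


def build_tables(limit: int) -> tuple[list[int], list[int]]:
    """Return (factorials, inverse factorials) modulo MOD for 0..limit."""
    fact = [1] * (limit + 1)
    for i in range(1, limit + 1):
        fact[i] = fact[i - 1] * i % MOD

    inv_fact = [1] * (limit + 1)
    # One modular exponentiation for the largest entry...
    inv_fact[limit] = pow(fact[limit], MOD - 2, MOD)
    # ...then each smaller inverse costs a single multiplication.
    for i in range(limit - 1, -1, -1):
        inv_fact[i] = inv_fact[i + 1] * (i + 1) % MOD
    return fact, inv_fact


fact, inv_fact = build_tables(10)
print(all(f * g % MOD == 1 for f, g in zip(fact, inv_fact)))  # True
print(fact[5] * inv_fact[2] % MOD)  # 60
```

Only one call to `pow` is needed. Running it once per table entry would also work, but it adds a logarithmic factor to the precomputation.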

### Time Complexity

Let’s break down and analyze the time complexity of this solution:

- **Precomputation step**: This step takes O(n) time, where n = MAX_LEN (the maximum possible word length in the input
  string s). Let’s see how:
  - **Factorials calculation**: Computing factorials for all numbers up to MAX_LEN takes O(n) time.
  - **Inverse factorials calculation**: Calculating inverse factorials in a backward loop also takes O(n).
- **Processing each word**: This step takes O(m) time, where m is the length of the word. Let’s see how:
  - **Counting characters**: Counting the characters of a word takes O(m) time.
  - **Calculating permutations for each word**: We loop over character counts (at most 26 for lowercase English letters),
    which takes O(1).
  - **Modular multiplications**: This takes O(1) per operation.
- **Iterating through all words**: Let L represent the total length of the input string. As we split the input into words
  and process each word, this takes O(L).

If we sum these up, the per-word cost O(m) is absorbed into O(L), since the word lengths add up to at most L, and the
overall time complexity simplifies to:

`O(n) + O(m) + O(L) = O(n+L)`
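
To show where each term comes from, here is a sketch of a variant, not the committed implementation. The name `count_anagrams_sized` is hypothetical. It sizes the tables to the longest word actually present in s, so the precomputation loops account for the O(n) term and the loop over words accounts for the O(L) term.

```python
from collections import Counter

MOD = 10**9 + 7


def count_anagrams_sized(s: str) -> int:
    """Variant that sizes the tables to the longest word actually present."""
    words = s.split(" ")
    n = max(len(word) for word in words)

    # O(n): factorials modulo MOD
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % MOD

    # O(n): inverse factorials via one exponentiation and a backward loop
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], MOD - 2, MOD)
    for i in range(n - 1, -1, -1):
        inv_fact[i] = inv_fact[i + 1] * (i + 1) % MOD

    # O(L): each character is counted once across all words
    result = 1
    for word in words:
        ways = fact[len(word)]
        for freq in Counter(word).values():
            ways = ways * inv_fact[freq] % MOD
        result = result * ways % MOD
    return result


print(count_anagrams_sized("too hot"))  # 18
```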

### Space Complexity

Let’s break down and analyze the space complexity of this solution:

- **Factorials array**: It stores factorials from 0 to MAX_LEN, so the space required is O(n).
- **Inverse factorials array**: It stores modular inverses of factorials from 0 to MAX_LEN, so the space required is O(n).

As no additional significant space is used during the calculations, the total space complexity becomes:

`O(n) + O(n) = O(n)`
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
from collections import Counter


def count_anagrams(s: str) -> int:
    # Define modulo constant for large number calculations
    mod = 10**9 + 7
    # Define maximum possible word length (s.length <= 10^5)
    max_len = 10**5

    # Arrays to store precomputed factorials and their modular inverses
    factorials = [1] * (max_len + 1)
    inv_factorials = [1] * (max_len + 1)

    def pre_compute():
        """
        Precomputes factorials and their modular inverses using Fermat's little theorem.
        This allows efficient computation of permutations involving repeated characters.
        """
        # Compute factorials modulo MOD; factorials[1] keeps its initial value of 1
        factorials[0] = 1
        for i in range(2, max_len + 1):
            factorials[i] = factorials[i - 1] * i % mod

        # Compute modular inverse of MAX_LEN! using Fermat's Little Theorem
        inv_factorials[max_len] = pow(factorials[max_len], mod - 2, mod)

        # Compute inverse factorials from MAX_LEN-1 down to 1; index 0 keeps its initial value of 1
        for i in range(max_len - 1, 0, -1):
            inv_factorials[i] = inv_factorials[i + 1] * (i + 1) % mod

    def count_permutations(w: str) -> int:
        """
        Computes the number of distinct permutations of a given word, considering repeated characters.
        """
        # Count occurrences of each character
        letter_count: Counter = Counter(w)

        # Compute n! for total characters
        total_permutations = factorials[len(w)]

        # Multiply by the inverse factorial of each character count to account for duplicates
        for freq in letter_count.values():
            total_permutations = (total_permutations * inv_factorials[freq]) % mod

        return total_permutations

    result = 1
    pre_compute()
    words = s.split()

    # Multiply the permutations of each word to get the final count
    for word in words:
        result = (result * count_permutations(word)) % mod

    return result
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
import unittest
from parameterized import parameterized
from pystrings.anagram.count_anagrams import count_anagrams

COUNT_ANAGRAMS_TEST_CASES = [
    ("too hot", 18),
    ("aa", 1),
    ("all good", 36),
    ("a a a b b", 1),
    ("hello world", 7200),
    ("excel", 60),
    ("ab ab cd cd ef ef", 64),
]


class CountAnagramsTestCase(unittest.TestCase):
    @parameterized.expand(COUNT_ANAGRAMS_TEST_CASES)
    def test_count_anagrams(self, s: str, expected: int):
        actual = count_anagrams(s)
        self.assertEqual(expected, actual)


if __name__ == "__main__":
    unittest.main()
