
Commit 306a56a (2 parents: d960d9b + 3bc898f)

Merge pull request #149 from BrianLusina/feat/algorithms-graphs-topological-sort

feat(algorithms, graphs, topological-sort): possible recipes from supplies

File tree: 45 files changed (+772, −2 lines)


DIRECTORY.md (9 additions & 1 deletion)

```diff
@@ -83,6 +83,8 @@
   * Happy Number
     * [Test Happy Number](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/fast_and_slow/happy_number/test_happy_number.py)
 * Graphs
+  * Alien Dictionary
+    * [Test Alien Dictionary](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/graphs/alien_dictionary/test_alien_dictionary.py)
   * Cat And Mouse
     * [Test Cat And Mouse](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/graphs/cat_and_mouse/test_cat_and_mouse.py)
   * Course Schedule
@@ -106,6 +108,8 @@
     * [Union Find](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/graphs/number_of_islands/union_find.py)
   * Number Of Provinces
     * [Test Number Of Provinces](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/graphs/number_of_provinces/test_number_of_provinces.py)
+  * Recipes Supplies
+    * [Test Find All Possible Recipes](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/graphs/recipes_supplies/test_find_all_possible_recipes.py)
   * Reconstruct Itinerary
     * [Test Reconstruct Itinerary](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/graphs/reconstruct_itinerary/test_reconstruct_itinerary.py)
   * Reorder Routes
@@ -203,6 +207,11 @@
 * Stack
   * Daily Temperatures
     * [Test Daily Temperatures](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/stack/daily_temperatures/test_daily_temperatures.py)
+* Subsets
+  * Cascading Subsets
+    * [Test Cascading Subsets](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/subsets/cascading_subsets/test_cascading_subsets.py)
+  * Find All Subsets
+    * [Test Find All Subsets](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/subsets/find_all_subsets/test_find_all_subsets.py)
 * Taxi Numbers
   * [Taxi Numbers](https://github.com/BrianLusina/PythonSnips/blob/master/algorithms/taxi_numbers/taxi_numbers.py)
 * Two Pointers
@@ -963,7 +972,6 @@
 * [Test Array From Permutation](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/arrays/test_array_from_permutation.py)
 * [Test Array Pair Sum](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/arrays/test_array_pair_sum.py)
 * [Test Build Tower](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/arrays/test_build_tower.py)
-* [Test Cascading Subsets](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/arrays/test_cascading_subsets.py)
 * [Test Dynamic Array](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/arrays/test_dynamic_array.py)
 * [Test Find Unique](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/arrays/test_find_unique.py)
 * [Test Highest Rank](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/arrays/test_highest_rank.py)
```
New file (289 additions & 0 deletions)
# Alien Dictionary

You are given a list of words written in an alien language, where the words are sorted lexicographically by the rules of this language. Surprisingly, the aliens also use English lowercase letters, but possibly in a different order.

Given a list of words written in the alien language, return a string of unique letters sorted in the lexicographical order of the alien language as derived from the list of words.

If there is no solution, that is, no valid lexicographical ordering, you can return an empty string "".

If multiple valid orderings exist, you may return any of them.

> Note: A string a is considered lexicographically smaller than a string b if:
> 1. At the first position where they differ, the character in a comes before the character in b in the alien alphabet.
> 2. If one string is a prefix of the other, the shorter string is considered smaller.

## Constraints

- 1 <= `words.length` <= 10^3
- 1 <= `words[i].length` <= 20
- All characters in `words[i]` are English lowercase letters

## Examples

![Example 1](./images/examples/alien_dictionary_example_1.png)
![Example 2](./images/examples/alien_dictionary_example_2.png)
![Example 3](./images/examples/alien_dictionary_example_3.png)
![Example 4](./images/examples/alien_dictionary_example_4.png)
![Example 5](./images/examples/alien_dictionary_example_5.png)
## Solution

We can solve this problem using the topological sort pattern. Topological sort finds a linear ordering of elements that have dependencies on, or priority over, each other. For example, if A depends on B, or B has priority over A, then B is listed before A in topological order.

Using the list of words, we identify the relative precedence of the letters and generate a graph to represent this ordering. We can then traverse the graph with breadth-first search to find the letters' order.

We can essentially map this problem to a graph problem, but before exploring the exact details of the solution, there are a few things to keep in mind:

1. The letters within a single word tell us nothing about the relative order. For example, the word "educative" in the list doesn't tell us that the letter "e" comes before the letter "d."
2. The input can contain a word followed by its prefix, such as "educated" and then "educate." These cases can never produce a valid alphabet, because in a valid alphabet prefixes always come first. Our solution needs to detect these cases correctly.
3. There can be more than one valid alphabet ordering. It's fine for our algorithm to return any one of them.
4. The output must contain all unique letters from the words list, including those that could appear in any position within the ordering. It shouldn't contain any letters that weren't in the input.

### Step-by-step solution construction

We can break this graph problem into three parts:

1. Extract the dependency rules from the words. For example, in the words ["patterns", "interview"], the letter "p" comes before "i."
2. Put these dependency rules into a directed graph with the letters as nodes and the dependencies (order) as the edges.
3. Topologically sort the graph nodes to generate the letter ordering (dictionary).

Let's look at each part in more depth.

#### Part 1: Identifying the dependencies

Let's start with example words and observe the initial ordering through simple reasoning:

`["mzosr", "mqov", "xxsvq", "xazv", "xazau", "xaqu", "suvzu", "suvxq", "suam", "suax", "rom", "rwx", "rwv"]`

As in an English dictionary, where all the words starting with "a" come first, followed by the words starting with "b," "c," "d," and so on, we can expect the first letters of the words to appear in alphabetical order:

`["m", "m", "x", "x", "x", "x", "s", "s", "s", "s", "r", "r", "r"]`

Removing the duplicates, we get the following:

`["m", "x", "s", "r"]`

Following the intuition explained above, we can assume that these first letters are in alphabetical order:

![Solution 1](./images/solutions/alien_dictionary_solution_1.png)

We now know the relative order of these letters, but not how they fit in with the rest of the letters. To get more information, we look further into our English dictionary analogy. The word "dirt" comes before "dorm" because, when the first letters are the same, we compare the second letters: "i" comes before "o" in the alphabet.

We can apply the same logic to our alien words and look at the first two words, "mzosr" and "mqov." As the first letter is the same in both words, we look at the second letter. The first word has "z," and the second one has "q." Therefore, we can safely say that "z" comes before "q" in this alien language. We now have two fragments of the letter order:

![Solution 2](./images/solutions/alien_dictionary_solution_2.png)

> Note: We didn't record rules such as "m -> a." This is fine because we can derive this relation from "m -> x" and "x -> a."

This is it for the first part. Let's put the pieces that we have in place.
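The pairwise scan described in this part can be sketched in Python (the function name and structure are my own, not necessarily what the repository uses):

```python
def extract_relations(words):
    """Derive ordering rules by comparing each adjacent pair of words.

    Returns a list of (before, after) letter pairs, or None when a word
    is followed by its own prefix, which makes a valid ordering impossible.
    """
    relations = []
    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                # The first differing position fixes the relative order.
                relations.append((a, b))
                break
        else:
            # No differing position: reject "educated" followed by "educate".
            if len(second) < len(first):
                return None
    return relations
```

For example, `extract_relations(["mzosr", "mqov"])` yields `[("z", "q")]`, the first fragment derived above.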
#### Part 2: Representing the dependencies

We now have a set of relations giving the relative order of pairs of letters:

`["z -> q", "m -> x", "x -> a", "x -> v", "x -> s", "z -> x", "v -> a", "s -> r", "o -> w"]`

How can we put these relations together? It might be tempting to start chaining them. Let's look at a few possible chains:

![Solution 3](./images/solutions/alien_dictionary_solution_3.png)

Some letters appear in more than one chain, so putting the chains into the output list one after the other won't work: letters would be duplicated, resulting in an invalid ordering. Let's instead visualize the relations as a graph. The nodes are the letters, and an edge from "x" to "y" means that "x" comes before "y" in the alien words.

![Solution 4](./images/solutions/alien_dictionary_solution_4.png)
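Given such a list of relations, the directed graph can be stored as an adjacency list. A minimal sketch (the names are my own), which also records the in-degree count that becomes useful in the next part:

```python
from collections import defaultdict

def build_graph(relations, letters):
    """Build a forward adjacency list and an in-degree count per letter."""
    adjacency = defaultdict(set)
    in_degree = {letter: 0 for letter in letters}
    for before, after in relations:
        if after not in adjacency[before]:  # skip duplicate edges
            adjacency[before].add(after)
            in_degree[after] += 1
    return adjacency, in_degree
```

Using sets for the neighbours means a relation seen twice contributes only one edge, so in-degrees stay accurate.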
#### Part 3: Generating the dictionary

As we can see from the graph, four of the letters have no incoming arrows. This means that no letters have to come before any of these four.

> Remember: There could be multiple valid dictionaries, and if there are, it's fine for us to return any of them.

Therefore, a valid start to the ordering we return would be as follows:

`["o", "m", "u", "z"]`

We can now remove these letters and their edges from the graph, because any other letters that required them first now have that requirement satisfied.

![Solution 5](./images/solutions/alien_dictionary_solution_5.png)

Three letters in this new graph now have no incoming arrows. We can add these to our output list:

`["o", "m", "u", "z", "x", "q", "w"]`

Again, we can remove these from the graph:

![Solution 6](./images/solutions/alien_dictionary_solution_6.png)

Then, we add the two new letters with no incoming arrows:

`["o", "m", "u", "z", "x", "q", "w", "v", "s"]`

This leaves the following graph:

![Solution 7](./images/solutions/alien_dictionary_solution_7.png)

We can place the final two letters in our output list and return the ordering:

`["o", "m", "u", "z", "x", "q", "w", "v", "s", "a", "r"]`

Let's now review how we can implement this approach.

Identifying the dependencies and representing them as a graph is pretty straightforward: we extract the relations and insert them into an adjacency list:

![Solution 8](./images/solutions/alien_dictionary_solution_8.png)

Next, we need to generate the dictionary from the extracted relations by identifying the letters (nodes) with no incoming links. Determining whether a particular letter has any incoming links from our adjacency-list format can be a little complicated. A naive approach is to repeatedly iterate over the adjacency lists of all the other nodes and check whether or not they contain a link to that particular node.

This naive method would be fine for our case, but perhaps we can do better.

An alternative is to keep two adjacency lists:

- one with the same contents as the one above, and
- one reversed, showing the incoming links.

This way, every time we traverse an edge, we can remove the corresponding edge from the reversed adjacency list:

![Solution 9](./images/solutions/alien_dictionary_solution_9.png)

Can we do better still? Instead of tracking which letters link into a particular letter, we can track only the count of incoming edges. We keep this in-degree count for every letter alongside the forward adjacency list.

> In-degree is the number of incoming edges of a node.

It will look like this:

![Solution 10](./images/solutions/alien_dictionary_solution_10.png)

Now, instead of removing an edge from a reverse adjacency list, we decrement the in-degree count of the target node. When a node's in-degree reaches 0, it has no incoming links left.

We perform BFS on all the letters that are reachable, that is, the letters whose in-degree count is zero. A letter only becomes reachable once all the letters that must come before it have been added to the output, result.

We use a queue to keep track of reachable nodes and perform BFS on them. Initially, we enqueue the letters with zero in-degree, and we keep enqueuing letters as their in-degree counts drop to zero.

We continue until the queue is empty. Then we check whether all the letters in the words have been added to the output. If some letters still have incoming edges left, there is a cycle, and we return an empty string.

> Remember: There can be multiple letters with no incoming edges at the same time. This can result in different orderings for the same set of words, and that's all right.
### Solution summary

To recap, the solution to this problem can be divided into the following parts:

1. Build a graph from the given words and keep track of the in-degree of each letter in a dictionary.
2. Add the sources (letters with in-degree 0) to a result list.
3. Remove the sources and update the in-degrees of their children. If a child's in-degree becomes 0, it is the next source.
4. Repeat until all letters are covered.
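The four steps above can be put together into one function. This is a sketch of the approach (my own implementation of the pattern, not necessarily the repository's code):

```python
from collections import defaultdict, deque

def alien_order(words):
    """Return one valid letter ordering for the alien alphabet, or "" if none exists."""
    adjacency = defaultdict(set)
    in_degree = {letter: 0 for word in words for letter in word}

    # Step 1: build the graph from each adjacent pair of words.
    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                if b not in adjacency[a]:
                    adjacency[a].add(b)
                    in_degree[b] += 1
                break
        else:
            # A word followed by its own prefix: no valid ordering.
            if len(second) < len(first):
                return ""

    # Steps 2-4: BFS from the sources, emitting letters as their
    # in-degree drops to zero (Kahn's algorithm).
    queue = deque(letter for letter, count in in_degree.items() if count == 0)
    result = []
    while queue:
        letter = queue.popleft()
        result.append(letter)
        for neighbour in adjacency[letter]:
            in_degree[neighbour] -= 1
            if in_degree[neighbour] == 0:
                queue.append(neighbour)

    # Letters left with incoming edges indicate a cycle.
    return "".join(result) if len(result) == len(in_degree) else ""
```

For example, `alien_order(["wrt", "wrf", "er", "ett", "rftt"])` returns a valid ordering of the letters w, e, r, t, f, while `alien_order(["abc", "ab"])` returns `""` because a word is followed by its prefix.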
### Time Complexity

There are three parts to the algorithm:

- Identifying all the relations.
- Putting them into an adjacency list.
- Converting it into a valid alphabet ordering.

In the worst case, the identification and initialization parts require checking every letter of every word, which is O(c), where c is the total length of all the words in the input list added together.

For the generation part, recall that breadth-first search has a cost of O(v + e), where v is the number of vertices and e is the number of edges. Our algorithm has the same cost as BFS because it visits each edge and node once.

> Note: A node is visited once all of its edges are visited, unlike traditional BFS, where it's visited once any edge is visited.

Therefore, determining the cost of our algorithm requires determining how many nodes and edges there are in the graph.

**Nodes**: There is one vertex for each unique letter, that is, O(u) vertices, where u is the total number of unique letters in words. While u is limited to 26 in our case, we still look at how it would impact the complexity if this weren't the case.

**Edges**: We generate each edge by comparing two adjacent words in the input list. There are n − 1 pairs of adjacent words, and only one edge can be generated from each pair, where n is the total number of words in the input list. We can look back at the English dictionary analogy to make sense of this:

- "dirt"
- "dorm"

The only conclusion we can draw is that "i" comes before "o." This is the reason "dirt" appears before "dorm" in an English dictionary; the remaining letters, "rt" and "rm," are irrelevant for determining the alphabetical ordering.

> Remember: We only generate rules for adjacent words and don't add the "implied" rules to the adjacency list.

So with this, we know that there are at most n − 1 edges.

We can place one additional upper limit on the number of edges, since it's impossible to have more than one edge between each pair of nodes. With u nodes, this means there can't be more than u^2 edges.

Because the number of edges is bounded by both n − 1 and u^2, it's at most the smaller of the two values: min(u^2, n − 1).

We can now substitute the number of nodes and the number of edges into our breadth-first search cost:

- v = u
- e = min(u^2, n − 1)

This gives us the following:

> O(v + e) = O(u + min(u^2, n − 1)) = O(u + min(u^2, n))

Finally, we combine the three parts: O(c) for the first two parts and O(u + min(u^2, n)) for the third part. Since the parts are independent, we can add them and look at the final formula to see whether we can identify any relation between the terms. Combining them, we get the following:

> O(c) + O(u + min(u^2, n)) = O(c + u + min(u^2, n))

So, what do we know about the relative values of n, c, and u? We can deduce that both n, the total number of words, and u, the total number of unique letters, are smaller than c, the total number of letters, because each word contains at least one character and there can't be more unique characters than there are characters.

We know that c is the biggest of the three, but we don't know the relation between n and u.

Since the u term is insignificant compared to c, we can simplify:

> O(c + u + min(u^2, n)) -> O(c + min(u^2, n))

Let's now consider two cases to simplify it a little further:

- If u^2 is smaller than n, then min(u^2, n) = u^2. Since u^2 < n and n < c, u^2 is definitely less than c. This leaves us with O(c).
- If u^2 is larger than n, then min(u^2, n) = n. Because c > n, we're left with O(c).

So in all cases, c > min(u^2, n). This gives us a final time complexity of O(c).
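For intuition, the three quantities can be computed directly for the sample word list from Part 1 (a throwaway check, not part of the solution):

```python
words = ["mzosr", "mqov", "xxsvq", "xazv", "xazau", "xaqu", "suvzu",
         "suvxq", "suam", "suax", "rom", "rwx", "rwv"]

c = sum(len(word) for word in words)  # total letters across all words
n = len(words)                        # number of words
u = len(set("".join(words)))          # unique letters

# Here u^2 > n, so min(u^2, n) = n, which is smaller than c: O(c) dominates.
print(c, n, u)
```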
### Space Complexity

In general, the space complexity is O(u + min(u^2, n)). The adjacency list uses O(v + e) memory, and in the worst case the number of edges is min(u^2, n), as explained in the time complexity analysis. So in total, the adjacency list takes O(u + min(u^2, n)) space, which is the space complexity for a large number of letters. However, for our use case, u is fixed at a maximum of 26 and the number of relations is bounded by 26^2, so O(min(26^2, n)) = O(26^2) = O(1), and the space complexity reduces to O(1).
