-
Notifications
You must be signed in to change notification settings - Fork 75
Expand file tree
/
Copy pathchapterHeap.tex
More file actions
186 lines (160 loc) · 6.04 KB
/
chapterHeap.tex
File metadata and controls
186 lines (160 loc) · 6.04 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
% !TEX root = algo-quicksheet.tex
\chapter{Heap}
\section{Introduction}
Heap-ordered. Binary heap is one of the implementations of Priority Queue (ADT). The core relationship of elements in the heap:
$A_{2i} \leq A_{i} \geq A_{2i+1}$.
\begin{figure}[hbtp]
\centering
\subfloat{\includegraphics[width=0.7\linewidth]{heap}}
\caption{Heap}
\label{fig:heap}
\end{figure}
\section{Applications}
\subsection{Python Heapq}
Python only has built in min-heap. To use max-heap, you can:
\begin{enumerate}
\item Invert the number: 1 becomes -1.
(usually the best solution)\item Wrap the data into another class and override \textbf{comparators}: \_\_cmp\_\_ or \_\_lt\_\_
\end{enumerate}
The following code presents the wrapping method:
\begin{python}
from dataclasses import dataclass
@dataclass
class HeapValue:
val: int
deleted: bool = False # lazy delete
def __lt__(self, other): # reverse for max-heap
return not self.val < other.val
\end{python}
Normally the deletion by value in Python is $O(n)$, to achieve $O(\lg n)$ we can use \textbf{lazy deletion}. Before take the top of the heap, we do the following:
\begin{python}
while heap and heap[0].deleted:
heapq.heappop(heap)
\end{python}
\subsection{Java Priority Queue}
\begin{java}
// min-heap
PriorityQueue<Integer> pq = new PriorityQueue<>(
(o1, o2) -> o1-o2
);
// max-heap
PriorityQueue<Integer> pq = new PriorityQueue<>(
(o1, o2) -> o2-o1
);
\end{java}
\subsection{Stale-Checking Heap for Update}
If introduceing \textbf{update} to the values in the heap, we can do stale checking as lazy validation.
Consider the following: $freq_i$ represents the frequency of $A_i$ at step $i$. We want to keep track of the most frequent number at each step $i$. $A$ contains duplicagtes. $freq_i$ can be negative.
\begin{python}
def most_frequent_num(self, A, freq):
counter = defaultdict(int)
h = []
ret = []
for n, cnt in zip(A, freq):
counter[n] += cnt
heapq.heappush(h, (-counter[n], n))
while True:
c, maxa = h[0]
if c != -counter[maxa]: # stale check
heapq.heappop(h)
else:
ret.append(maxa)
break
return ret
\end{python}
\subsection{Maintain min max in Sliding Window}
Count number of continuous subarrays in $A$ s.t. within the subarray $|A_i - A_j| \leq 2, \forall i, j$.
\rih{Core clues}
\begin{enumerate}
\item Need to maintain max and min in the sliding window $\Ra$ heap
\item In sliding window $i, j$, count of all valid subarrays ending AT j $\Ra j - i + 1$
\end{enumerate}
\begin{python}
def continuousSubarrays(self, A):
ret = 0
i = 0 # starting index of the window
j = 0 # ending index of the window
h_min = [] # (a, idx)
h_max = [] # (-a, idx)
while j < len(A):
heapq.heappush(h_min, (A[j], j))
heapq.heappush(h_max, (-A[j], j))
while -h_max[0][0] - h_min[0][0] > 2:
# shrink
i += 1
while h_min[0][1] < i:
heapq.heappop(h_min)
while h_max[0][1] < i:
heapq.heappop(h_max)
ret += j-i+1
j += 1
return ret
\end{python}
Note: 3-layred nested \pyinline{while} loop.
\subsection{Find $k$ smallest/largest}
Partial Quicksort \ref{find-k-th}
\section{Derivatives}
\subsection{Heap of Linked Lists}
Maintain a heap of linked lists, pop the min head, and push the head's next back to the heap.
\section{Inside the Library}
Assume the root \textbf{starts} at $a[1]$ rather than $a[0]$.
\\
Basic operations:
\begin{enumerate}
\item sink()/ sift\_down() - recursive
\item swim()/ sift\_up() - recursive
\item build()/ heapify() - bottom-up sink()
\end{enumerate}
\subsection{General}
The self-implemented binary heap's index usually starts at 1 rather than 0.
The array representation of heap is in \textbf{level-order}.
The main reason that we can use an array to represent the heap-ordered tree in a binary heap is because the tree is \textbf{complete}.
Suppose that we represent a BST containing N keys using an array, with $a[0]$ empty, the root at $a[1]$. The two children of $a[k]$ will be at $a[2k]$ and $a[2k+1]$. Then, the length of the array might need to be as large as $2^N$.
It is possible to have 3-heap. A 3-heap is an array representation (using 1-based indexing) of a complete 3-way tree.
The children of $a[k]$ are $a[3k-1]$, $a[3k]$, and $a[3k+1]$.
\begin{figure}[hbtp]
\centering
\subfloat{\includegraphics[width=0.9\linewidth]{heapRepr}}
\caption{Heap representation}
\label{fig:heap}
\end{figure}
\subsection{Sink (sift\_down)}
Core clue: compare parent to the \textit{larger} child (because we want to maintain the heap invariant).
\begin{python}
def sink(self, idx):
while 2*idx <= self.N:
c = 2*idx
if c+1 <= self.N and self.less(c, c+1):
c += 1
if not self.less(idx, c):
return
self.swap(idx, c)
idx = c
\end{python}
We can return the \pyinline{idx} at the end to report the final index of the element.
\subsection{Swim (sift\_up)}
Core clue: compare child to its parent.
\begin{python}
def swim(self, idx):
while idx > 1 and self.less(idx/2, idx):
pi = idx/2
self.swap(pi, idx)
idx = pi
\end{python}
\subsection{Heapify}
Core clue: bottom-up sink().
\begin{python}
def heapify(self):
for i in range(self.N/2, 0, -1):
self.sink(i);
\end{python}
\runinhead{Complexity.} Heapifying \textbf{a sorted array} is the worst case for heap construction, because the root of each subheap considered sinks all the way to the bottom. The worst case complexity $\sim 2N$.
Building a heap is $O(N)$ rather than $O(N \lg N)$. Intuitively, the deeper the level, the more the nodes, but the less the level to sink down.
At most $\big\lceil\frac{n}{2^{h+1}}\big\rceil$ nodes of any height $h$.
Proof:
\begin{align*}
\because \sum_{i=0}^{+\infty} {ix^i} =\frac{x}{(1-x)^2} \\
\therefore \sum_{h=0}^{\lfloor\lg n\rfloor}{\Big\lceil\frac{n}{2^{h+1}}\Big\rceil
O(h)} &= O\Bigg(n\sum_{h=0}^{\lfloor\lg n\rfloor}{\frac{h}{2^h}}\Bigg) \\
&= O(n)
\end{align*}