Shiv After Dark

[NOTES] CS 6515: Intro to Graduate Algorithms Recap

Update: Finally got an A in CS-6515.

You'll know soon enough what this class is about. I'll instead tell y'all what it's like to take this class:

  1. It's referred to as the "final-boss" of the MS CS degree at GaTech.
  2. To graduate, you need a minimum grade of B.
  3. A lot of students attempt this class twice (or more) to get a B.
  4. This class often causes severe burnout, and many students change their specialization entirely to avoid it.
  5. This was the grade distribution in the summer semester: 45.8% of the class either failed or dropped the class.
I

I have started writing this post on 5th December, 2024; five days before the CS-6515 final exam. Here are some reasons why I shouldn't be taking this exam:

  1. I already have a B in this class so effectively the final is optional.
  2. I start a new job soon after this exam. I could instead spend the time relaxing and doing fun things.
  3. No one will care about my final grade.
  4. Several of my classmates and TAs have advised against taking this exam because I need an 85+ on the final to get an A. This is an especially hard target because the final exam covers the entire syllabus and hence is significantly harder.
  5. I'm running on fumes after a year of studying. It's mentally arduous and I don't think I can study even for a minute now.

I'm left with few convincing reasons to take this exam. After an hour of contemplation these are the reasons why I WILL take the exam:

  1. I don't think I can score an 85+ in the finals. Attempting the final is futile. But I feel it is my duty to give my best and it would bring great honour to the legacy of Chatrapati Shivaji Maharaj, if I uphold my duties.
  2. I'm severely burnt out and I cannot continue studying. Or maybe this is a lie. Only one way to find out.
II

Note: These are meant to be rough/short-hand notes that will help me retain information that I've already studied in extreme depth previously. It's difficult for me to study right now because there isn't enough mental stimulation/challenge to enter a flow-state. Writing notes in LaTeX and actively trying to summarize the syllabus slightly increases the challenge and helps maximize the flow-state. To the general audience, these will not be very useful.

1. Notations

  1. Big-O notation: f(n) = O(g(n)). Important: f(n) <= g(n)
  2. small-o notation: f(n) = o(g(n)). Important: f(n) < g(n)
  3. Big-Omega notation: f(n) = Ω(g(n)). Important: f(n) >= g(n)
  4. small-omega notation: f(n) = ω(g(n)). Important: f(n) > g(n)

2. Proofs by Induction

This is not going to be tested directly, but there are standard patterns for proofs by induction that you can simply reuse in your exams.

  1. Base Case: The easiest part, where you argue why the algorithm is right for input size 1 or 0 (or whatever the base-level input is). You WILL get partial marks here (so don't fumble this)
  2. Hypothesis: Claim that the algo holds true for inputs up to size n.
  3. Induction Step: Using the hypothesis, argue that the claim also holds for inputs of size n+1
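The pattern above can be applied verbatim; for example, a textbook proof of 1 + 2 + ... + n = n(n+1)/2:

```latex
\textbf{Claim:} $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ for all $n \ge 1$.

\textbf{Base case ($n=1$):} $\sum_{i=1}^{1} i = 1 = \frac{1 \cdot 2}{2}$.

\textbf{Hypothesis:} assume the claim holds for some $n \ge 1$.

\textbf{Induction step:}
$\sum_{i=1}^{n+1} i
  = \underbrace{\frac{n(n+1)}{2}}_{\text{hypothesis}} + (n+1)
  = \frac{(n+1)(n+2)}{2}$,
which is exactly the claim for $n+1$. $\blacksquare$
```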

3. Master's Theorem

I just need to remember some basic results to solve short questions. This is mostly used to quickly figure out runtimes of some recursive functions.

T(n) = a * T(n/b) + O(n^c)
  1. When we unroll the recursion tree, the input size at level i of the tree is n/b^i.
  2. When we unroll the recursion tree, the number of nodes/vertices at level i is a^i.
  3. When we unroll the recursion tree, the total time spent at level i is:
O((n/b^i)^c * a^i)
  4. Across all the levels and nodes, the total time spent is:
O(n^c * sum_{i=0}^{log_b(n)} (a/b^c)^i)

There are three cases for Master's Theorem:

  1. a = b^c: O(n^c * log(n))
  2. a < b^c: O(n^c)
  3. a > b^c: O(n^(log_b(a)))
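The three cases can be checked mechanically; here is a small helper (written for these notes, not part of the course material) that classifies a recurrence T(n) = a*T(n/b) + O(n^c):

```python
import math

def master_theorem(a, b, c):
    """Asymptotic solution of T(n) = a*T(n/b) + O(n^c), as a string."""
    if math.isclose(a, b ** c):
        return f"O(n^{c} log n)"
    elif a < b ** c:
        return f"O(n^{c})"
    else:
        return f"O(n^{math.log(a, b):.2f})"

# Merge sort: T(n) = 2T(n/2) + O(n)  ->  a = b^c
print(master_theorem(2, 2, 1))  # O(n^1 log n)
# Binary search: T(n) = T(n/2) + O(1)  ->  a = b^c
print(master_theorem(1, 2, 0))  # O(n^0 log n), i.e. O(log n)
# Karatsuba: T(n) = 3T(n/2) + O(n)  ->  a > b^c
print(master_theorem(3, 2, 1))  # O(n^1.58), i.e. O(n^(log2 3))
```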

4. Polynomial Multiplication and FFT

  1. Time Complexity: O(d * log(d)), where d is the degree of the polynomial
  2. Product Representation:
(sum_{i=0}^{d} f_i * x^i) * (sum_{i=0}^{d} g_i * x^i) = sum_{i=0}^{2d} x^i * (sum_{k=0}^{i} f_k * g_{i-k})
  3. Where k runs from 0 up to i while i−k runs from i down to 0
  4. Data structure is roughly this:
  5. Imp Observation:
  6. Some FFT Tricks:
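The product representation above is just a convolution of the coefficient vectors. A naive O(d^2) version (my own sketch) makes the formula concrete; FFT computes the same coefficients in O(d log d) by evaluating at roots of unity:

```python
def poly_multiply(f, g):
    """Coefficient product: h[i] = sum_k f[k] * g[i-k] (naive O(d^2))."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj  # x^i * x^j contributes to coefficient i+j
    return h

# (1 + x) * (1 + x) = 1 + 2x + x^2
print(poly_multiply([1, 1], [1, 1]))  # [1, 2, 1]
```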

5. DFS, BFS and Topological Order

  1. DFS/BFS Runtime: O(|V| + |E|)
  2. Topological Order: Order of vertices so that all edges go "left to right". Sorting by post-order in descending direction gives a valid topological order.
  3. Edge Types: (note that you will have to refer to the algorithm at the end of this section)
  1. The algorithm roughly looks like this:
preorder = [0] * n
postorder = [0] * n
precounter = 1   # start at 1 so that postorder[u] == 0 means "u not yet finished"
postcounter = 1
visited = [False] * n

def DFS(v):
    visited[v] = True
    preorder[v] = precounter
    precounter += 1
    for neighbour u of v:
        if not visited[u]:
            DFS(u)  # (v, u) is a tree edge
        elif postorder[u] == 0:
            pass  # (v, u) is a back edge (u is an ancestor, still on the stack)
        elif preorder[u] > preorder[v]:
            pass  # (v, u) is a forward edge
        else:
            pass  # (v, u) is a cross edge
    postorder[v] = postcounter
    postcounter += 1
  1. DFS uses a STACK (or recursion); BFS uses a QUEUE.
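The topological-order claim above (sort by descending post-order) can be sketched as a self-contained function; the tiny DAG in the example is made up:

```python
def topological_order(adj):
    """Order vertices of a DAG by descending DFS post-order."""
    n = len(adj)
    visited = [False] * n
    order = []  # vertices appended in increasing post-order

    def dfs(v):
        visited[v] = True
        for u in adj[v]:
            if not visited[u]:
                dfs(u)
        order.append(v)  # v is finished: record its post-order position

    for v in range(n):
        if not visited[v]:
            dfs(v)
    return order[::-1]  # descending post-order = valid topological order

# DAG: 0 -> 1 -> 3, 0 -> 2 -> 3
print(topological_order([[1, 2], [3], [3], []]))  # [0, 2, 1, 3]
```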

6. Search Algorithms

Note that these results can be directly used in exams.

  1. Theorem: Dijkstra's Algorithm finds the shortest path from the start vertex to EVERY OTHER VERTEX if the graph has only non-negative edge weights.
  2. Dijkstra's Algorithm uses distance as a priority in priority queue
  3. The algorithm is roughly this:
def Dijkstra(S):
    distance[S] = 0
    queue.add((S, None, 0))  # (current, parent, distance/priority)
    while queue not empty:
        v, parent, d = queue.pop()  # pop the LOWEST priority value (shortest distance)
        if visited[v]:
            continue
        visited[v] = True
        distance[v] = d  # d already equals distance[parent] + cost(parent, v)
        for neighbour u of v:
            queue.add((u, v, distance[v] + cost(v, u)))  # v is the parent, last field is the priority
  1. Bottleneck path: find the s-t path that maximizes the minimum edge weight along the path, i.e., the path that can carry the max amount of flow
  2. Algo is roughly this:
def Bottleneck(S):
    distance[S] = infinity  # bottleneck of the empty path, so min() with the first edge gives that edge
    queue.add((S, None, infinity))  # (current, parent, bottleneck/priority)
    while queue not empty:
        v, parent, b = queue.pop()  # pop the HIGHEST priority value (widest bottleneck)
        if visited[v]:
            continue
        visited[v] = True
        distance[v] = b  # min edge weight along the best path to v
        for neighbour u of v:
            queue.add((u, v, min(distance[v], cost(v, u))))  # v is the parent, last field is the priority
  1. A Min Spanning Tree of G is a spanning tree of G with the smallest possible sum of edge weights
  2. Prim's algo is roughly this:
def PRIM(S):
    queue.add((S, None, 0))  # (current, parent, edge cost/priority)
    while queue not empty:
        v, parent, _ = queue.pop()  # pop the LOWEST priority value (cheapest edge)
        if visited[v]:
            continue
        visited[v] = True
        if parent is not None:
            T = T union {(parent, v)}  # add the tree edge
        for neighbour u of v:
            queue.add((u, v, cost(v, u)))  # v is the parent, last field is the priority
  1. A* is a heuristic-guided search algorithm: it behaves like Dijkstra's, but prioritizes vertices by distance-so-far plus a heuristic estimate of the remaining distance
  2. Algorithm is roughly this:
def AStar(S):
    distance[S] = 0
    queue.add((S, None, heuristic(S)))  # (current, parent, priority f = distance + heuristic)
    while queue not empty:
        v, parent, _ = queue.pop()  # pop the LOWEST priority value (smallest f-value)
        if visited[v]:
            continue
        visited[v] = True
        for neighbour u of v:
            if distance[v] + cost(v, u) < distance[u]:
                distance[u] = distance[v] + cost(v, u)  # found a shorter path to u
                queue.add((u, v, distance[u] + heuristic(u)))  # v is the parent, last field is the priority
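To sanity-check the Dijkstra pseudocode, here is a minimal self-contained version; the toy graph is made up for illustration:

```python
import heapq

def dijkstra(adj, s):
    """Shortest distances from s; adj[v] is a list of (u, weight) pairs."""
    dist = {s: 0}
    visited = set()
    queue = [(0, s)]  # (distance/priority, vertex)
    while queue:
        d, v = heapq.heappop(queue)  # pop the closest unvisited vertex
        if v in visited:
            continue
        visited.add(v)
        for u, w in adj[v]:
            if d + w < dist.get(u, float('inf')):
                dist[u] = d + w
                heapq.heappush(queue, (dist[u], u))
    return dist

# Toy graph: a -> b (1), a -> c (4), b -> c (2), b -> d (6), c -> d (3)
adj = {
    'a': [('b', 1), ('c', 4)],
    'b': [('c', 2), ('d', 6)],
    'c': [('d', 3)],
    'd': [],
}
print(dijkstra(adj, 'a'))  # {'a': 0, 'b': 1, 'c': 3, 'd': 6}
```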

7. Matchings

  1. Definition: An augmenting path given M is a path that alternates between edges outside M and edges in M, starting and ending at unmatched (free) vertices
  2. M is a max-matching iff there is no augmenting path
  3. We can find a max-matching efficiently if we can find an augmenting path in polytime
  4. How to find augmenting paths? Use Edmonds' Blossom algorithm (implementation details not on the test)
  5. Theorem: We can find augmenting paths in a bipartite graph in O(|E| + |V|) time
  6. Max Matching in bipartite graphs runtime: O((m+n) * n) (time to find a path * max number of augmenting paths)
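A minimal sketch of augmenting-path matching for the bipartite case; the `augment` helper and the toy graph are my own illustration, not course code:

```python
def bipartite_max_matching(adj, n_left, n_right):
    """adj[l] = list of right-vertices adjacent to left vertex l."""
    match_right = [-1] * n_right  # match_right[r] = left vertex matched to r

    def augment(l, seen):
        """Try to find an augmenting path starting at left vertex l."""
        for r in adj[l]:
            if r in seen:
                continue
            seen.add(r)
            # r is free, or r's current partner can be rematched elsewhere
            if match_right[r] == -1 or augment(match_right[r], seen):
                match_right[r] = l
                return True
        return False

    matching = 0
    for l in range(n_left):
        if augment(l, set()):
            matching += 1  # each successful augmenting path grows M by one
    return matching

# Left {0,1,2}, right {0,1}: any matching has at most 2 edges
print(bipartite_max_matching([[0, 1], [0], [1]], 3, 2))  # 2
```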

8. Max Flow and Min Cut

  1. Flow Definition: assign a value f_e to every edge e such that 0 <= f_e <= c_e (capacity constraint) and, at every vertex other than s and t, flow in equals flow out (conservation)
  2. The Max Flow problem is to find a flow f that maximizes sum_u f_{su}, the total flow out of the source s
  3. We can solve bipartite matching using max-flow (add a source and a sink vertex; the source connects to all vertices in bipartite vertex set A and the sink connects to all vertices in bipartite vertex set B)
  4. Residual Graphs: G^f has a forward edge with capacity c_e − f_e for every unsaturated edge of G, and a backward edge with capacity f_e for every edge carrying flow
  5. Ford-Fulkerson Algorithm: repeatedly find an s-t path in the residual graph G^f and push the path's bottleneck amount of flow along it; stop when no s-t path remains
  6. Runtime for Ford-Fulkerson: O((m+n) * max_flow)
  7. Theorem: Ford-Fulkerson returns the max-flow
  8. Theorem: Max-flow = Min-cut
  9. There exists a polynomial-time algorithm that finds the max s-t flow in O((m+n) * m * log(C)) time (the Capacity Scaling algorithm)
  10. The Capacity Scaling Algorithm is roughly as follows:
f = 0
delta = C  # C is the maximum edge capacity
while delta >= 1:
    # 1. Construct the residual graph G^f(delta): keep only edges of G^f
    #    whose residual capacity is AT LEAST delta
    # 2. If an s-t path P exists in G^f(delta):
    #        push delta units of flow along P (f <- f + delta along P), go to 1
    # 3. Else (no s-t path found):
    #        delta <- delta / 2
return f
  11. Runtime of Capacity Scaling: O((m+n) * m * log(C))
  12. Some observations:
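A runnable sketch of the plain max-flow loop, using BFS to find augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson); the adjacency-matrix graph is made up for illustration:

```python
from collections import deque

def max_flow(cap, s, t):
    """cap[u][v] = capacity of edge (u, v); returns the max s-t flow value."""
    n = len(cap)
    flow = 0
    while True:
        # BFS for an s-t path in the residual graph (cap doubles as residual)
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left: flow is maximum
        # Bottleneck along the path, then update residual capacities
        b, v = float('inf'), t
        while v != s:
            b = min(b, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= b
            cap[v][parent[v]] += b  # backward residual edge
            v = parent[v]
        flow += b

# Two disjoint s-t paths of capacity 3 and 2 -> max flow 5
cap = [[0, 3, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))  # 5
```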

9. Linear Programming

  1. Feasible Region: the set of all points x satisfying all constraints
  2. Halfspace: the set of points satisfying a single linear inequality
  3. Is the feasible region of an LP always connected? -> YES (it is convex)
  4. Equalities are allowed in LPs. Strict inequalities are not allowed!
  5. Definition of a Convex Region/Set: a set S ⊆ R^n is convex if for any x, y ∈ S we have α*x + (1−α)*y ∈ S for all α ∈ [0, 1]
  6. Any LP can be solved in time that is polynomial in the number of input bits
  7. Will a poly-sized LP always have a poly-sized optimal solution? -> YES
  8. Theorem: For any LP, there exists some non-negative linear combination of constraints s.t.

10. LP Duality

  1. Primal LP Objective: max c^T x
  2. Primal LP Constraints: Ax <= b and x >= 0
  3. Primal LP has n variables and m constraints
  4. Dual LP Objective: min b^T y
  5. Dual LP Constraints: A^T y >= c and y >= 0
  6. Dual LP has m variables and n constraints
  7. Weak Duality Theorem: for any primal-feasible x and dual-feasible y, c^T x <= b^T y
  8. Weak Duality Corollary: if c^T x = b^T y, then x and y are both optimal
  9. Strong LP Duality Theorem: if the primal LP has a bounded optimum, then the dual optimum exists and equals the primal optimum
  10. Farkas Lemma: exactly one of the following holds: either Ax = b has a solution x >= 0, or there exists y with A^T y >= 0 and b^T y < 0
  11. Theorem: If there exists a polytime algorithm to check LP-feasibility, then there exists a polytime algorithm for LP-Optimization
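Weak duality is easy to check numerically. The small LP below (matrix, feasible points, and all numbers) is made up for illustration:

```python
# Primal: max c^T x  s.t. Ax <= b, x >= 0
# Dual:   min b^T y  s.t. A^T y >= c, y >= 0
A = [[1, 2],
     [3, 1]]
b = [4, 6]
c = [3, 2]

x = [1, 1]  # primal-feasible: 1+2 = 3 <= 4 and 3+1 = 4 <= 6
y = [1, 1]  # dual-feasible:   1+3 = 4 >= 3 and 2+1 = 3 >= 2

# check feasibility explicitly
assert all(sum(A[i][j] * x[j] for j in range(2)) <= b[i] for i in range(2))
assert all(sum(A[i][j] * y[i] for i in range(2)) >= c[j] for j in range(2))

primal_value = sum(ci * xi for ci, xi in zip(c, x))  # c^T x
dual_value = sum(bi * yi for bi, yi in zip(b, y))    # b^T y
assert primal_value <= dual_value  # weak duality
print(primal_value, dual_value)  # 5 10
```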

11. Dynamic Programming

Common Patterns on Test:

  1. LIS: Runtime is O(n^2)
    def LIS(nums):
        L = len(nums)
        dp = [1] * L  # dp[i] = length of the LIS ending at index i
        max_dp = 1
        for i in range(1, L):
            for j in range(i):
                if nums[i] > nums[j]:
                    dp[i] = max(dp[i], 1 + dp[j])
            max_dp = max(max_dp, dp[i])
        return max_dp
  2. Max Independent Sets on Trees: Runtime is O(n)
M[v] = max(
    1 + sum(M[u] for u in grandchildrenOf(v)),  # v is included in the set
    sum(M[u] for u in childrenOf(v)),           # v is excluded
)
  3. Tip: For the exam, present your solution as: table structure, base case, induction step, and runtime complexity.
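The M[v] recurrence can be sketched as a memoized recursion; the example tree below is made up:

```python
from functools import lru_cache

# children[v] = list of children of v in a rooted tree:
#       0
#      / \
#     1   2
#    / \
#   3   4
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}

@lru_cache(maxsize=None)
def M(v):
    """Size of the max independent set in the subtree rooted at v."""
    include = 1 + sum(M(g) for c in children[v] for g in children[c])
    exclude = sum(M(c) for c in children[v])
    return max(include, exclude)

print(M(0))  # 3, e.g. the independent set {2, 3, 4}
```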

12. Knapsack using Dynamic Programming

  1. Note that Knapsack is an NP-hard problem, and hence we don't expect it to be solvable in polynomial time
  2. Polytime vs Pseudo-Polytime Algorithms:
  3. Pseudo-polytime algo for Knapsack:
  4. Algo 1: Assume B is small (B = poly(n)) => all s_i are small
  5. Algo 2: Assume all values are small: all v_i <= v_max
  6. PTAS vs FPTAS:
  7. Knapsack FPTAS Algo:
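A minimal sketch of the pseudo-polynomial O(n*B) knapsack DP (the "Algo 1" flavor, where the budget B is small); the toy instance is the classic textbook example:

```python
def knapsack(values, sizes, B):
    """Max total value with total size <= B. Pseudo-polynomial: O(n*B)."""
    n = len(values)
    # dp[b] = best value achievable with capacity b using the items seen so far
    dp = [0] * (B + 1)
    for i in range(n):
        # iterate capacities downward so each item is used at most once
        for b in range(B, sizes[i] - 1, -1):
            dp[b] = max(dp[b], dp[b - sizes[i]] + values[i])
    return dp[B]

print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220 (take items 2 and 3)
```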

13. Markov Decision Process (MDP)

  1. Definition: Finite state space S
  2. Transition Probabilities: P_a(s, s') for a ∈ A and s, s' ∈ S
  3. Goal?: Given a time horizon T, design an algorithm (policy) that plays actions to maximize E[total reward]
  4. Policy Definition: An algorithm (policy) π tells us what action a_t^π ∈ A to take at step t
  5. Denote the state of policy π at step t as S_t^π
  6. Goal: Given time horizon T, compute max_π E[sum_{t=1}^{T} r_{a_t^π}(S_t^π)]
  7. We know, for any policy π:
V^π(s_1, T) = r_{a_1^π}(s_1) + sum_{s' ∈ S} p_{a_1^π}(s_1, s') * V^π(s', T−1)
(reward of the 1st step) + (expected reward of the remaining T−1 steps)
  8. The optimal value satisfies:
V*(s, T) = max_a { r_a(s) + sum_{s' ∈ S} p_a(s, s') * V*(s', T−1) }
  9. We can calculate V*(s, T) using Dynamic Programming over the time horizon and the states. Space: O(|S| * T). Time: O(|A| * |S|^2 * T)
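The DP above is finite-horizon value iteration. A minimal sketch on a two-state MDP (the states, actions, rewards, and transition probabilities below are made up for illustration):

```python
# Rewards: R[(a, s)] = reward for playing action a in state s
R = {('stay', 0): 1.0, ('stay', 1): 0.0, ('go', 0): 0.0, ('go', 1): 2.0}
# Transitions: P[(a, s)][s2] = probability of moving s -> s2 under action a
P = {
    ('stay', 0): [1.0, 0.0], ('stay', 1): [0.0, 1.0],
    ('go', 0):   [0.0, 1.0], ('go', 1):   [1.0, 0.0],
}
states, actions = [0, 1], ['stay', 'go']

def value_iteration(T):
    """V[s] = max expected total reward over T remaining steps from state s."""
    V = [0.0] * len(states)  # horizon 0: no reward left to collect
    for _ in range(T):
        # Bellman update: V*(s, t) = max_a { r_a(s) + sum_s' p_a(s,s') V*(s', t-1) }
        V = [max(R[(a, s)] + sum(P[(a, s)][s2] * V[s2] for s2 in states)
                 for a in actions)
             for s in states]
    return V

print(value_iteration(2))  # [2.0, 3.0]
```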



III

Okay, typing out notes got boring real quick as well. I need more mental stimulation. I will live-stream my suffering to further stimulate my brain. At this point I am sleeping only 3-4 hours. I am clearly mentally fatigued but let's fucking keep going fellas:






IV
To retain some theorems and important results, I also made a lot of Flashcards. I'll just upload them here:
  1. Grad Algos Part-1
  2. Grad Algos Part-2
V

Now that the exam is done, here is what happened: I got an A.

We did it fellas. Note that I did not get an 85+. In fact, I got just an 82. My final grade was 89.37% and the cutoff for an A was 90%. However, more than 80% of the class completed the CIOS survey, so the grade cutoff was relaxed by exactly 1%. Everyone who had an 89% got an A.

Final boss of grad-school is defeated. Can't wait to get back to working on real world engineering problems now.

#CS 6515 #georgia-tech #grad-algos