Introduction to Graph Structures

Overview of the Course:

In this section, we will cover the following topics:

But first! What is a graph in the world of computer science?

A graph is a fundamental data structure that consists of two key components:

A graph is often defined as:
$$ G = { V, E } $$
Where:

Directed Graph (Digraph): Edges have a direction, represented as an ordered pair $(u, v)$.
Undirected Graph: Edges do not have a direction, represented as an unordered pair ${u, v}$.

Additional Notes:

The set of vertices, $V$, is finite.
Edges, $E$, can be thought of as a binary relation, where each edge connects two vertices.

Cycle: A path that starts and ends at the same vertex, with no other repeated vertices.
Acyclic: A graph that contains no cycles.
Incident: The relationship between an edge and the two nodes it connects.
Adjacent: Two nodes are adjacent if there is an edge connecting them.
Node Degree: The number of edges incident to a particular node.
Regular Graph: A graph where every node has the same degree.
Sparse Graph: A graph with relatively few edges compared to the number of vertices.

In a disconnected graph, the minimum number of edges is 0 (no connections between vertices).
In a connected graph, the minimum number of edges is approximately $n-1$ (where $n$ is the number of vertices), and the maximum number of edges is approximately $n(n-1)/2$ for an undirected graph (dense graphs).

A path of length $k$ from vertex $a$ to vertex $b$ is a sequence of vertices:
$$ V_0, V_1, V_2, ..., V_k $$
Where:
$$ a = V_0, ;;; b = V_k $$

$(V_i, V_{i+1}) \in E \text{ for all } i = 0, 1, ..., k-1$ (i.e., each pair of consecutive vertices is connected by an edge).

If there exists a path from $a$ to $b$, then $b$ is said to be reachable from $a$.

A connected component is a subset of a graph where there is a path between every pair of nodes in the component.

For example:

Graph $G$ with $V = {a, b, c, d}$ and $E = {{a, b}, {b, c}, {c, d}}$ is connected because every node is reachable from every other node.
Graph $G$ with $V = {a, b, c, d}$ and $E = {{a, b}, {c, d}}$ has two connected components: one containing nodes $a$ and $b$, and the other containing nodes $c$ and $d$. Nodes $a$ and $d$ are not connected.

Directed graphs may have cycles, where you can traverse from one vertex and return to it by following directed edges.
Self-loops: A directed graph can have edges where both ends are the same vertex. This is not relevant for our example in maze graphs.
A strongly connected subgraph is a subgraph where any vertex can reach any other vertex within the subgraph. For example, in the cycle $a \rightarrow b \rightarrow c \rightarrow a$, all nodes are mutually reachable.

There are several ways to represent graphs in memory. Below are the two most commonly used representations:

Adjacency Lists are a preferred representation, especially for sparse graphs (graphs with few edges compared to the number of vertices).

For each node, an array or list contains its adjacent nodes (nodes connected to it by an edge).
Space Complexity: $O(E)$, where $E$ is the number of edges.
Time Complexity: To traverse all adjacent nodes, the time complexity is $O(n)$ (where $n$ is the number of vertices).

For the graph $G$ with $V = {a, b, c, d}$ and $E = {{a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}}$, the adjacency list representation would be:

An Adjacency Matrix is a 2D array where the cell at row $i$, column $j$ contains 1 if there is an edge between vertex $i$ and vertex $j$, and 0 if there is no edge.

For the same graph $G$:

Space Complexity: $O(n^2)$, where $n$ is the number of vertices (because every pair of vertices is represented by an entry in the matrix).
Time Complexity: $O(1)$ for checking if an edge exists between two vertices.

Efficient for dense graphs: If a graph is dense (many edges), the adjacency matrix is efficient as it allows $O(1)$ time complexity for checking if two vertices are connected.
Memory Use: Can become inefficient if the graph is sparse, as it requires $O(n^2)$ memory.