Data Structures and Algorithms/Trees and Graphs

From Wikiversity

Tree

In computer science, a tree is a widely used abstract data type (ADT), or a data structure implementing this ADT, that simulates a hierarchical tree structure. It has a root value and subtrees of children, each with a parent node, and is represented as a set of linked nodes.

A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root.
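
The recursive definition above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the names `TreeNode`, `add_child`, and `count_nodes` are assumptions introduced here, and the sketch does not enforce the constraints that no reference is duplicated and that none points to the root.

```python
class TreeNode:
    """A node holding a value and an ordered list of references to children."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = list(children) if children is not None else []

    def add_child(self, child):
        """Append a child node and return it (hypothetical helper)."""
        self.children.append(child)
        return child


def count_nodes(node):
    """Count a node plus, recursively, every node in its subtrees."""
    return 1 + sum(count_nodes(child) for child in node.children)


# Build a small tree: A has children B and C, and B has child D.
root = TreeNode("A")
b = root.add_child(TreeNode("B"))
root.add_child(TreeNode("C"))
b.add_child(TreeNode("D"))
```

Because each node is itself the root of a subtree, functions such as `count_nodes` can follow the same recursive shape as the definition.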

Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree, with a value assigned to each node. Both perspectives are useful: a tree can be analyzed mathematically as a whole, but when represented as a data structure it is usually handled node by node (rather than as a set of nodes and an adjacency list of edges between nodes, as one might represent a digraph, for instance). For example, looking at a tree as a whole, one can talk about "the parent node" of a given node, but as a data structure a given node generally contains only the list of its children and no reference to its parent (if any).
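
To illustrate the point about parents, the sketch below assumes nodes that store only their children, so the parent of a node has to be found by searching from the root. The names `Node` and `find_parent` are hypothetical and chosen only for this example.

```python
class Node:
    """A node that knows its children but not its parent."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def find_parent(root, target):
    """Return the parent of target in the tree rooted at root, or None."""
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child is target:
                return node
            stack.append(child)
    return None  # target is the root itself, or is not in the tree


leaf = Node("D")
mid = Node("B", [leaf])
tree = Node("A", [mid, Node("C")])
```

The search may visit every node in the worst case, which is one reason some implementations choose to store a parent reference in each node as well.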

Definition

A tree is a data structure made up of nodes (or vertices) and edges, with no cycles. The tree with no nodes is called the null or empty tree. A tree that is not empty consists of a root node and potentially many levels of additional nodes that form a hierarchy.

Terminology used in trees

Root
The top node in a tree.
Child
A node directly connected to another node when moving away from the root.
Parent
The converse notion of a child.
Siblings
A group of nodes with the same parent.
Descendant
A node reachable by repeatedly proceeding from parent to child.
Ancestor
A node reachable by repeatedly proceeding from child to parent.
Leaf (less commonly called external node)
A node with no children.
Internal node
A node with at least one child.
Degree
The number of subtrees of a node.
Edge
The connection between one node and another.
Path
A sequence of nodes and edges connecting a node with a descendant.
Level
The level of a node is defined as 1 + (the number of connections between the node and the root).
Height of node
The height of a node is the number of edges on the longest path between that node and a leaf.
Height of tree
The height of a tree is the height of its root node.
Depth
The depth of a node is the number of edges from the tree's root node to the node.
Forest
A forest is a set of n ≥ 0 disjoint trees.
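
Several of these measures can be computed directly from their definitions. The following Python sketch is one possible way to do so; the `Node` class and the function names `height`, `depth`, and `level` are assumptions made for illustration.

```python
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def height(node):
    """Edges on the longest downward path from node to a leaf (0 for a leaf)."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def depth(root, target, current=0):
    """Edges from root to target, or None if target is not in the tree."""
    if root is target:
        return current
    for child in root.children:
        found = depth(child, target, current + 1)
        if found is not None:
            return found
    return None


def level(root, target):
    """Level as defined above: 1 + number of connections to the root."""
    d = depth(root, target)
    return None if d is None else d + 1


# A is the root; B and C are siblings; D is a leaf below B.
d = Node("D")
b = Node("B", [d])
c = Node("C")
a = Node("A", [b, c])
```

In this example the height of the tree is `height(a)`, and the degree of a node is simply `len(node.children)`.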

Graph

In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from mathematics, specifically the field of graph theory.

A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as edges, arcs, or lines for an undirected graph and as arrows, directed edges, directed arcs, or directed lines for a directed graph. The vertices may be part of the graph structure, or may be external entities represented by integer indices or references.
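
The difference between unordered and ordered pairs can be sketched in Python. Modelling an unordered pair as a `frozenset` and an ordered pair as a tuple is an assumption made for this example, as are all the names used.

```python
vertices = {"a", "b", "c"}

# Undirected graph: each edge is an unordered pair, modelled as a frozenset.
undirected_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}

# Directed graph: each edge is an ordered pair (source, destination).
directed_edges = {("a", "b"), ("b", "c")}


def has_undirected_edge(x, y):
    """True if {x, y} is an edge; the order of the arguments does not matter."""
    return frozenset((x, y)) in undirected_edges


def has_directed_edge(x, y):
    """True only if there is an edge from x to y, in that direction."""
    return (x, y) in directed_edges
```

With these definitions, the pair ("b", "a") counts as an edge of the undirected graph but not of the directed one.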

A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).

Operations

The basic operations provided by a graph data structure G usually include:

  • adjacent(G, x, y): tests whether there is an edge from the vertex x to the vertex y;
  • neighbors(G, x): lists all vertices y such that there is an edge from the vertex x to the vertex y;
  • add_vertex(G, x): adds the vertex x, if it is not there;
  • remove_vertex(G, x): removes the vertex x, if it is there;
  • add_edge(G, x, y): adds the edge from the vertex x to the vertex y, if it is not there;
  • remove_edge(G, x, y): removes the edge from the vertex x to the vertex y, if it is there;
  • get_vertex_value(G, x): returns the value associated with the vertex x;
  • set_vertex_value(G, x, v): sets the value associated with the vertex x to v.
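
As a rough illustration, the basic operations could be sketched in Python for a directed graph as shown here. The class name `Graph`, the use of sets of successors, and the choice to add missing endpoints in `add_edge` are assumptions of this sketch; the graph G of the list above corresponds to `self`.

```python
class Graph:
    """Directed graph stored as a mapping from each vertex to its successors."""

    def __init__(self):
        self._adj = {}      # vertex -> set of vertices it has an edge to
        self._values = {}   # vertex -> associated value (None by default)

    def adjacent(self, x, y):
        return y in self._adj.get(x, ())

    def neighbors(self, x):
        return set(self._adj.get(x, ()))

    def add_vertex(self, x):
        if x not in self._adj:
            self._adj[x] = set()
            self._values[x] = None

    def remove_vertex(self, x):
        if x in self._adj:
            del self._adj[x]
            del self._values[x]
            # Also drop every edge that pointed at the removed vertex.
            for successors in self._adj.values():
                successors.discard(x)

    def add_edge(self, x, y):
        # Assumption: endpoints that are missing are added implicitly.
        self.add_vertex(x)
        self.add_vertex(y)
        self._adj[x].add(y)

    def remove_edge(self, x, y):
        if x in self._adj:
            self._adj[x].discard(y)

    def get_vertex_value(self, x):
        return self._values[x]

    def set_vertex_value(self, x, v):
        if x not in self._adj:
            raise KeyError(x)
        self._values[x] = v


g = Graph()
g.add_edge("x", "y")
g.add_edge("x", "z")
g.set_vertex_value("x", 42)
```

Note that `remove_vertex` has to scan every successor set, which reflects the cost of finding all edges that touch the removed vertex.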

Structures that associate values to the edges usually also provide:

  • get_edge_value(G, x, y): returns the value associated with the edge (x, y);
  • set_edge_value(G, x, y, v): sets the value associated with the edge (x, y) to v.
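
Edge values can be supported by storing, for each vertex, a mapping from successor to value. The sketch below is a minimal, self-contained Python illustration; the class name `WeightedGraph` and the decision to raise `KeyError` for a missing edge are assumptions, not part of the operations as listed.

```python
class WeightedGraph:
    """Directed graph where each edge (x, y) carries a value."""

    def __init__(self):
        self._adj = {}  # x -> {y: value of the edge (x, y)}

    def add_edge(self, x, y, v=None):
        self._adj.setdefault(x, {})[y] = v
        self._adj.setdefault(y, {})  # make sure y is known as a vertex

    def get_edge_value(self, x, y):
        return self._adj[x][y]

    def set_edge_value(self, x, y, v):
        if y not in self._adj.get(x, {}):
            raise KeyError((x, y))
        self._adj[x][y] = v


wg = WeightedGraph()
wg.add_edge("x", "y", 5)   # e.g. a cost of 5 for the edge (x, y)
wg.set_edge_value("x", "y", 7)
```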

Representations

Different data structures for the representation of graphs are used in practice:

Adjacency list
Vertices are stored as records or objects, and every vertex stores a list of adjacent vertices. This data structure allows the storage of additional data on the vertices. Additional data can be stored if edges are also stored as objects, in which case each vertex stores its incident edges and each edge stores its incident vertices.
Adjacency matrix
A two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices.
Incidence matrix
A two-dimensional Boolean matrix, in which the rows represent the vertices and columns represent the edges. The entries indicate whether the vertex at a row is incident to the edge at a column.
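
To make the three representations concrete, the following Python sketch builds each of them for the same small directed graph. The vertex names, the edge list, and the use of Boolean entries in the adjacency matrix are assumptions chosen for illustration.

```python
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]   # (source, destination)
index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)

# Adjacency list: every vertex stores a list of adjacent vertices.
adj_list = {v: [] for v in vertices}
for x, y in edges:
    adj_list[x].append(y)

# Adjacency matrix: rows are source vertices, columns are destinations.
adj_matrix = [[False] * n for _ in range(n)]
for x, y in edges:
    adj_matrix[index[x]][index[y]] = True

# Incidence matrix: rows are vertices, columns are edges; an entry is
# True when the vertex at that row is incident to the edge at that column.
inc_matrix = [[False] * len(edges) for _ in range(n)]
for j, (x, y) in enumerate(edges):
    inc_matrix[index[x]][j] = True
    inc_matrix[index[y]][j] = True
```

The adjacency matrix always has |V| × |V| entries and the incidence matrix |V| × |E|, whereas the adjacency list only stores entries for edges that exist.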

The following table gives the time complexity of performing various operations on graphs for each of these representations, with |V| the number of vertices and |E| the number of edges. In the matrix representations, the entries encode the cost of following an edge. The cost of an edge that is not present is assumed to be ∞.

Operation | Adjacency list | Adjacency matrix | Incidence matrix
Store graph | | |
Add vertex | | |
Add edge | | |
Remove vertex | | |
Remove edge | | |
Query: are vertices x and y adjacent? (assuming that their storage positions are known) | | |
Remarks | Slow to remove vertices and edges, because it needs to find all vertices or edges | Slow to add or remove vertices, because matrix must be resized/copied | Slow to add or remove vertices and edges, because matrix must be resized/copied

Adjacency lists are generally preferred because they efficiently represent sparse graphs. An adjacency matrix is preferred if the graph is dense, that is, if the number of edges |E| is close to the number of vertices squared, |V|², or if one must be able to quickly look up whether there is an edge connecting two vertices.
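
The trade-off can be made tangible with a rough count of stored entries. In this Python sketch the function names are hypothetical, and the counts ignore constant factors: an adjacency list grows with |V| + |E|, while an adjacency matrix always needs |V|² entries.

```python
def storage_entries(num_vertices, num_edges):
    """Approximate number of stored entries for each representation."""
    return {
        "adjacency_list": num_vertices + num_edges,
        "adjacency_matrix": num_vertices ** 2,
    }


def density(num_vertices, num_edges):
    """Fraction of the |V|^2 possible ordered vertex pairs that are edges."""
    if num_vertices == 0:
        return 0.0
    return num_edges / num_vertices ** 2


sparse = storage_entries(1000, 3000)
dense = storage_entries(1000, 900_000)
```

For the sparse example the list needs far fewer entries than the matrix, while for the dense example the two are of similar size and the matrix additionally offers a direct lookup for any pair of vertices.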