Data Structures and Algorithms/Arrays, Lists and Vectors

Arrays

What is an array

If you have taken the prerequisites to this course or have done any more than the very basics of programming you will have come across arrays. You will know an array is simply an area of memory allocated for a set number of elements of a known size. You can access these elements by their index (position in the array) and also set or retrieve the value stored. Arrays are not to be confused with Associative Arrays, which will be covered in Hash Tables lesson.

Characteristics of an array

An array is always of a fixed size; it does not grow as more elements are required. The programmer must ensure that only valid values in the array are accessed, and must remember the location in the array of each value. Arrays are basic types in most programming languages and have a special syntax for their use.

The computational complexity for writing to and accessing an array is O(1). No matter the number of elements in the array, the calculation to find the element in the array is single multiplication and addition.

element_memory_location = start_memory_location + (size_of_element * index_in_array)

The computational complexity of inserting an element in the middle of an array is O(N), where N is the number of elements in the array. The elements in the array must all be shifted up one index after the insertion, or all the elements must be copied to a new array big enough to hold the inserted element.

  function insert(array, insertion_index, value)
     for (index from array_length to insertion_index - 1)
        array[index] = array[index - 1]
     end {for}
     array[index] = value
  end {function}

The memory usage of an array is simply the number of elements in the array multiplied by the size of each element.

Lists

Linked Lists

A linked list is made up of a linear series of nodes (For non-linear arrangements of nodes, see Trees and Graphs). These nodes, unlike the elements in an array, do not have to be located next to each other in memory in order to be accessed. The reason is that each node contains a link to another node. The most basic node would have a data field and just one link field. This node would be a part of what is known as a singly linked list, in which all nodes contain only a next link. This is different than a doubly linked list, in which all nodes have two links, a next and a previous. While a linked-list API may exist in one's programming library, linked lists are often implemented by the programmer and customized for particular applications.

Characteristics of a Linked List

As mentioned above, a node in a linked list must have a data field and either one or two links. This allows the linked list to grow or shrink in constant O(1) time, as all that needs be done to insert or delete a node is change a few links. This is much better than the time for inserting or deleting an element in an array.

However, the linked list requires linear O(N) time to find or access a node, because there is no simple formula as listed above for the array to give the memory location of the node. One must traverse all links from the beginning until the requested node is reached. If nodes are to be inserted at the beginning or end of a linked list, the time is O(1), since references or pointers, depending on the language, can be maintained to the head and tail nodes. If a node should be inserted in the middle or at some arbitrary position, the running time is not actually O(1), as the operation to get to the position in the list is O(N).

Importantly, linked lists tend to suffer from severe slowdown due to CPU cache misses during traversal, caused by nodes being stored in a non-contiguous fashion. This slowdown is mostly absent when nodes are stored in contiguous memory (as is normal when they are initialized) but worsens when nodes are stored apart from each other (which commonly occurs as more nodes are added later on). This slowdown is often enough to warrant the use of another data structure, although linked lists may still be preferred in cases where data is inserted/deleted frequently and the list is traversed sparingly.

   function '''insert'''(''list, insertion_index, value'')
      ''current'' = ''list''->head
      for (''index'' from 0 to ''insertion_index'')
         ''current'' = ''current''->next
      end {for}
      ''new_node'' = new node
      ''new_node''->value = ''value''
      ''new_node''->next = ''current''->next
      ''current''->next = ''new_node''
      # If this is a doubly-linked list, then also:
      ''new_node''->previous = ''current''
      ''new_node''->next->previous = ''new_node''
   end {function}

Linked lists use more memory than an array containing the same data due to each element not only having a value, but also one or two links.

A simple linked list example.

include <bits/stdc++.h>

using namespace std;

class Node { public:

   int data; 
   Node* next;

};

// This function prints contents of linked list // starting from the given node

void printList(Node* n) {

   while (n != NULL) { 
       cout << n->data << " "; 
       n = n->next; 
   }

}

// Driver code

int main() {

   Node* head = NULL; 
   Node* second = NULL; 
   Node* third = NULL; 
 
   // allocate 3 nodes in the heap

   head = new Node(); 
   second = new Node(); 
   third = new Node(); 
 
   head->data = 1;

   // assign data in first node

   head->next = second; 
   
   // Link first node with second 
 
   second->data = 2;
   
    // assign data to second node 
   
   second->next = third; 
 
   third->data = 3;

   // assign data to third node 
   
   third->next = NULL; 
 
   printList(head); 
 
   return 0;

}

ref:geeksforgeeks.org

Vectors

Vectors are much like arrays. Operations on a vector offer the same big O as their counterparts on an array. Like arrays, vector data is allocated in contiguous memory.

Unlike static arrays, which are always of a fixed size, vectors can be grown. This can be done either explicitly or by adding more data. In order to do this efficiently, the typical vector implementation grows by doubling its allocated space (rather than incrementing it) and often has more space allocated to it at any one time than it needs. This is because reallocating memory is usually an expensive operation.

Vectors are simply arrays which have wrapped grow/shrink functions.