Introduction to Data Structures

Why should we learn data structures?

The code of a well-designed algorithm using data structure is just like a structure of a good house. For designing a good house, one has to put all the components of a house together to make it a comfortable living place. To do that, one needs a good knowledge of the principles of construction techniques. In the same way, a design of an algorithm must be based on a good understanding of data structures: properties, structure, implementation techniques, and efficiency of its critical operations. The fact is: Data structures are the core building blocks of algorithms and real-life applications. Think! 

The maximum benefit from good data structures results from designing algorithms around them in the first place. One of the questions that we face most of the time: how to choose a data structure for the efficient implementation of an algorithm? If we observe closely, changing the data structure does not change the correctness of an algorithm because we replace the correct implementation with another correct implementation. But different data structures take different times to execute various operations that help us improve the algorithm performance. It is just like a patient in need of an organ transplant where only one part needs to be replaced to fix the problem. Think!

We expect that some of the learners have already studied data structures to some basic level. Later, we will share a separate comprehensive blog on each data structure.

Primitive Data Structures

Primitive data structure predefined ways of storing data that can hold a single value in a specific location. Primitive types are: boolean(true or false), char(16 bits), byte(8bits), short(16 bits), int(32 bits), long(64 bits), float(32 bits), double(64 bits). A primitive variable is full of bits representing the actual value of a variable.

  • It can be operated directly on the data and machine instructions.
  • The programming languages define primitive data structures, or we can say that it is built-in.
  • Its the programmer's responsibility to provide value to the primitive data structure.
  • The primitive data structure is a kind of data structure that stores data of only one type. The size depends on the type of data structure. The primitive data structure can be used to call the methods.
  • Depending on the language and its implementation, primitive data types may or may not have a one-to-one correspondence with objects in the computer's memory.

Non-Primitive Data Structures

Non-primitive data structures are derived from primitive data structures. They emphasize grouping the same or different data items with the relationship between each data item.

  • The non-primitive data structure is a kind of data structure that can hold multiple values either in a contiguous or random location.
  • The non-primitive data structure is a type of data structure that can store data of more than one type.
  • In the case of a non-primitive data structure, size is not fixed.
  • The non-primitive data types are defined by the programmer.
  • Non-primitive data structures are classified into two categories: Linear and Non-Linear data structures

Linear Data Structures

Data structures where elements are arranged linearly and linked one after another are called linear data structures. Here elements can be traversed or accessed from one time(Single run).

  • Array
  • Linked List
  • Stack
  • Queue

Non-Linear Data Structure

Data structures in which elements are not arranged sequentially are called non-linear data structures. Here data elements are attached to more than one element exhibiting the hierarchical relationship between the elements and basic operations like traversal, insertion, and deletion are not done sequentially.

  • Tree: Binary Tree, Binary Search Tree, Heap, Trie, Segment Tree, B-Tree, etc.
  • Graph: Directed Graph, Undirected Graph

Sequential vs Linked Data Structure

Another simple classification of data structures is the contiguous structure and linked structure, depending upon whether they are based on arrays or pointers.

  • Sequential Data Structures: It is a continuous block of memory like an array, matrix, array-based implementation of stack, array-based implementation of queue, heap, array-based tree implementation, a hash table implementation using the open addressing method, graph using the adjacency matrix representation, etc.
  • Linked Data Structures: Distinct block of memory connected via pointers like a linked list, linked-list implementation of stack, linked-list implementation of queue, binary tree, binary search tree, trie, graph using the adjacency list representation, a hash table implementation using the chaining method, etc.

Sequential vs. Linked Data Structures

  • Linked data structures require extra space for storing pointers and do not allow random access to elements.
  • Insertions and deletions are simpler in the linked data structures.
  • Overflow problems never occur on linked data structures unless the memory is full.
  • With large records, moving pointers is easier and faster than moving the items themselves.
  • Arrays allow better memory locality and cache performance than random pointer jumping.

Data Structures as a Dynamic set

Data sets manipulated by algorithms can grow, shrink, or change over time. We call such a data structure dynamic set. Algorithms may require several different types of operations to be performed on sets. The best way to implement a dynamic set depends upon the operations that must be supported.

Operations on any data structure can be grouped into two categories: Query Operations, Modification Operations. These operations are most frequent in real-life applications, and we should study the design, code, and analysis of each operation for each data structure. If preparing for an interview, then you could encounter many other specific operations during the problem-solving.

  • Query Operations: These operations will return some information about the data structure. For example search(S, k), findMax(S), findMin(S), findSuccessor(S, x), etc. are the query operations on a dynamic set.
  • Modification Operations: These operations will update or change the data structure. For example, insert(S, x), delete(S, x), sort(D) etc. are modification operations on a dynamic set.

We can also categories the dynamic set into three categories:

  • Containers: We use the term container to denote a data structure that allows storage and retrieval of data items independent of content. For example, array, linked list, stack, queue, binary tree, graph, etc, work as a container.
  • Dictionaries: many algorithms need only the ability to insert elements into, delete elements from, and search in a set. We call a dynamic set that supports these operations a dictionary. For example, we can implement a dictionary using an array, linked list, BST, and hash table. Here the hash table is the effcient implementation of a dictionary where each operation works in O(1) time complexity on average.
  • Priority Queues: many algorithms support the operations of searching, inserting, and deleting the element with maximum or minimum priority from a data set. For example, we can implement a priority queue using an array, linked list, BST, and heap. Here the heap is the effcient implementation of a priority queue where findMin() or findMax() operation works in O(1) time complexity and delete and insert operations in O(logn) time complexity.

Note: if there is a requirement in an application to process data using both dictionary and priority queue, then balanced BST could be a perfect choice. Here all operations work in O(logn) time complexity.

Here are some critical ideas related to the dynamic set:

  • Elements of a dynamic set In a typical implementation of a dynamic set, each element is represented by an object whose attributes can be examined and manipulated if we have a pointer to the object.
  • Some kinds of dynamic sets assume that one of the object’s attributes is an identifying key. If the keys are all different, we can think of the dynamic set as being a set of key values. The object may contain satellite data, which are carried around in other object attributes but are otherwise unused by the set. It may also have attributes that are manipulated by the set operations; these attributes may contain data or pointers to other objects in the set.
  • Some dynamic sets presuppose that the keys are drawn from a totally ordered set, such as the real numbers, or the set of all words under the usual alphabetic ordering. A total ordering allows us to define the minimum element of the set, for example, or to speak of the next element larger than a given element in a set.

Data Structure as an Abstract Data Type

The basic idea in data structures starts from the study of an abstract data type. Usually, when we write a program, we need to specify the data type like integers, reals, characters, etc. But, in some cases, the data type is not critical for designing an effcient algorithm.

For example, we may want to maintain a first-in-first-out (FIFO) queue of elements. The required operations are the insertion and deletion of elements from the queue. In case of deletion, the elements must be removed in the same order in which they were inserted. It is more convenient to design the effcient code for these operations without specifying the data type of the elements. We only need to specify the required operations. We call the abstract data type that supports these operations a FIFO Queue.

Another example of an abstract data type is a queue where elements are associated with priorities. Here deletion will not happen according to the order of insertions but according to the priorities. That is, the first element to be removed in each step is the item of highest priority among the elements in the queue. This abstract data type is called a Priority Queue. Again, we do not specify the data type of the items.

So the most important part of an abstract data type is a list of operations that we want to support. We concentrate on the critical operations of a data structure and design an effcient algorithm for each operation. In simple words: abstract data types allow us to make the algorithm-design process more modular.

Real-life applications of data structures

Application of array 

  • Arranging a particular type of data in a sequential arrangement: storing contacts on our phone, storing speech signals in speech processing, etc.
  • Implementing of Stack and Queue, Adjacency matrix representation of Graphs, Implementing hash tables and heaps.

Application of linked list

  • Singly-linked list: Maintaining a directory of names, Performing arithmetic operations on long integers, Manipulation of polynomials, Implementing of Stack and Queue, Adjacency list representation of Graphs, Hashing by chaining method, Dynamic memory allocation
  • Doubly-linked list: Implementing LRU Cache, Implementing Fibonacci heap, Thread scheduler in operating systems, Representing deck of cards game, Implement undo and redo functionality in applications, Previous and next page in a web browser.
  • Circular linked list: Processes scheduling in the operating system, Keep track of the turn in a multi-player game

Application of stack

  • Storing browser history, UNDO/REDO options in a text editor
  • Process scheduling, Static memory allocation
  • Storing function calls in recursion
  • In IDE or a compiler to know are missing braces
  • Expression evaluation, Syntax Parsing

Application of queue

  • Process scheduling in operating systems (CPU and IO scheduling)
  • Interprocess communications
  • Waiting list in Applications
  • Customer care calls management

Application of hashing

  • Accessing website using keywords in Search Engines
  • Finding phone numbers in mobile devices, Employees Information System
  • Spelling checkers in word processing software, Symbol table in a compiler
  • Encrypting critical data(Cryptography)

Application of tree data structure

  • Binary tree: Store hierarchical data, like folder structure, organization structure, File Systems in Linux, Document Object Model to represent documents with a logical tree
  • Binary search tree: Used in applications where continuous insertion and deletion of data is happening, To implement map and set objects in languages' libraries.
  • Trie data structure: Autocompletion in google search, spell-checking in word processors, Contact search on a phone book, swipe features in your mobile keypad, Network browser history of the URL's, Longest prefix matching used by routers in internet protocols, Predictive text technology (T9 dictionary in mobile phones)
  • Heap data structure: Used in implementing efficient priority queues, scheduling processes in operating systems, Dynamic memory allocation, Order Statistics
  • Advanced data structures: Compiler design for parsing of the syntax (Syntax Tree), Wireless networking and a memory allocation (Treap), Range query processing on a data set (Segment Tree or Binary Index Tree), finding the nearest neighbor to a particular point (K Dimensional Tree), To store data on the drive (B-tree), 3D computer graphics(BSP tree), Fast full-text search in word processors (Suffix tree), Database design (B-Tree)

Application of graph data structure

  • Shortest path algorithm: Network routing protocols, Google maps, Shipment routing in e-commerce, Robot path planning
  • Breadth-first search (BFS): Social network websites to find people with a given distance k from a person, Web crawling in a google search (BFS), Finding all neighbor nodes in the peer to peer network like BitTorrent (BFS), Network Broadcasting, Finding neighboring places via GPS navigation
  • Depth-first search (DFS): Detecting cycle in a graph, Bipartiteness test in a graph, Finding strongly connected components, Solving maze puzzles, Topological sorting, Path-finding 
  • Topological sorting: Used in many different scenarios that involve scheduling several tasks with inter-dependencies. For example, 1) Generates a dependency graph of the dependent modules in IDE Build-Systems 2) Determining order to create complex database tables with interdependencies 3) Determining order to take courses and their pre-requisites to complete a degree.
  • Other important applications: Assigning fastest pick-ups to Uber drivers (Hungarian algorithm), Facebook’s Friend Suggestion algorithm, Google Page Ranking algorithm where web pages are considered to be the vertices, Resource Allocation Graph (Operating System), Transaction graphs in cryptocurrency (Blockchain, which is a large graph), Artificial neural networks, Facebook Graph Search, Google Knowledge Graph, Product Recommendation Graphs (Amazon Recommendation System)

References

  • Algorithms by CLRS
  • Algorithm design manual by Skiena
  • Algorithms by Robert Sedgewick

Enjoy Learning, Enjoy Thinking!

We welcome your comments

Subscribe Our Newsletter

Get well-designed application and interview centirc content on ds-algorithms, machine learning, system design and oops. Content will be delivered weekly.