What is Data Structure? Types, Classification and Applications

Why should we learn data structures?

Data structures are the core building blocks of algorithms and real-life applications. In terms of an analogy, a well-designed algorithm using data structure is just like a structure of a good house. For example, to design a good house, we need to put all components of a house together based on principles of good construction techniques. In the same way, design of an algorithm must be based on a good understanding of data structures:

Properties and implementation techniques
Implementation and efficiency of various operations
Various tradeoffs and comparisons to choose a correct data structure

The maximum benefit from good data structures results from designing algorithms around them in the first place. If we observe closely, changing the data structure does not change the correctness of our code because we replace a correct implementation with another correct implementation.

But different data structures take different times to execute various operations. So we compare the efficiency of operations provided by multiple data structures and choose the correct data structure among them to improve our code performance.

Updating an efficient data structure in a slow program can work in the same way an organ transplant does in a sick patient.

Definition of data structure

The data structure is an idea to organize various types of data in memory. In other words, data structures are several ways to efficiently organize data in memory to perform several operations. We use it to manage, process, and efficiently get relevant information.

There will be two primary components in every data structure: data and various operations working on that data. Data is information, and operations are algorithms working on that data to get valuable insights.

Data Structures = Data + Permissible operations on the data

Types of operations on data structures

Operations on any data structure can be grouped into two types: Query Operations and Modification Operations. These operations are frequent in real-life applications, and we should study the design, code, and analysis of critical operations for each data structure.

Query operations: We perform this operation to get some information available in a data structure. For example, search(S, k), findMax(S), findMin(S), findSuccessor(S, x), etc., are the query operations on a data structure S.
Modification operations: We perform these operations to update or change information stored in a data structure. For example, insert(S, x), delete(S, x), sort(D), etc., are modification operations on a data structure S.

Types and classification of data structures

We can classify data structures in several ways based on the use case and properties.

Primitive data structures

Primitive data structure is a fundamental building block of data structures. This is a predefined way of storing data of only one type. In other words, this can hold a single value in a specific memory location. Some examples of primitive data structures are: boolean (true or false), char (16 bits), byte (8bits), short (16 bits), int (32 bits), long (64 bits), float (32 bits), double (64 bits).

A primitive data structure is full of bits representing the actual value of a variable.
The size of a primitive data structure depends on its type.
The programming languages define primitive data structures, or we can say that they are built-in.
It's programmer responsibility to declare and initialize a value of primitive data structure.

Non-primitive data structures

Non-primitive data structures are derived from primitive data structures. This is a user-defined data structure that stores data of different types as a single entity with a relationship between each data item.

It can store more than one type of data in a contiguous or random memory location.
It's programmer responsibility to define non-primitive data structures.
The size of non-primitive data structure is not fixed. In other words, the size of such data structures depends on the specification of input.

Based on the order of organizing and accessing data, non-primitive data structures can be classified into two categories: Linear data structures and Non-linear data structures.

Linear data structures: Here, elements are arranged or linked linearly with one after another. In other words, elements can be traversed or accessed in a single direction. Examples of linear data structures are array, linked list, stack, and queue.

Non-linear data structure: Here, elements are not arranged sequentially. In other words, each data element can be linked to more than one data element. It can exhibit a hierarchical relationship between elements, and there can be multiple directions to go from one data to another. Examples of non-linear data structures are tree, graph, set, etc.

Based on the memory representation of data, depending upon whether they are based on arrays or pointers, non-primitive data structures can be further classified into sequential and linked data structures.

Sequential data structures: Here, data is stored and organized in a contiguous memory block. Examples of sequential data structures are array, matrix, array implementation of stack, array implementation of queue, heap, binary indexed tree, B-Tree, hashing implementation using open addressing, graph representation using adjacency matrix, etc.
Linked data structures: Here, data is stored and organized as a distinct block of memory connected via pointers or references. Examples of linked data structures are singly linked list, doubly linked list, linked-list implementation of stack, linked-list implementation of queue, binary tree, binary search tree, trie, adjacency list representation of a graph, hashing implementation using chaining method, etc.

Comparison: Sequential vs Linked Data Structures

Since sequential data structures can only be stored in contiguous blocks of memory, so its size cannot be altered at runtime. However, in linked data structures, each data element points to the next one such that data can exist at scattered or non-contiguous addresses. This allows for a dynamic size that can change at runtime, and overflow problems never occur unless memory is full.
For the same number of elements, linked data structures use more memory because a pointer or reference to the next data element is also stored along with data.
In sequential data structures, memory equivalent to the upper limit on the size has to be allocated, but linked data structures can increase their sizes proportionately to the amount of data. In simple words, linked data structures are useful when there is uncertainty about size or significant variations in the size of data elements.
Some operations are faster on sequential data structures, and some are faster on linked data structures. For example, insertions and deletions are simpler in the linked list.
Moving pointers is easier and faster with large records than moving items themselves. But sequential representation allows better memory locality and cache performance than random pointer jumping.

Data structures types and classification

Data structures as a dynamic set

Data sets manipulated by algorithms can grow, shrink, or change over time. We call such a data structure dynamic set. From another perspective: algorithms may perform several different operations on dynamic sets. The best way to implement a dynamic set depends upon the operations that must be supported.

We can categorize the dynamic set into three categories:

Containers: We use a container to denote a data structure that allows storing and retrieving data items independent of content. For example, array, linked list, stack, queue, binary tree, graph, etc., work as a container.

Dictionaries: Many algorithms need only the ability to insert, delete, and search elements in a set. We call a dynamic set that supports these operations a dictionary. For example, we can implement dictionary operations using an array, linked list, BST, and hash table. Here hash table is the efficient implementation of a dictionary where each operation works in O(1) time on average.

Priority queues: Many algorithms support the operations of searching, inserting, and deleting the element with maximum or minimum priority from a data set. For example, we can implement a priority queue using an array, linked list, BST, and heap. Here, heap provide an efficient implementation of priority queue where findMin() or findMax() operation works in O(1) time complexity and deleteMax() or deleteMin() operations in O(logn) time complexity.

Note: If an application requires processing data using both dictionary and priority queue, then balanced BST could be a perfect choice. Here all operations work in O(logn) time complexity.

Data structure as an Abstract Data Type

The basic idea of data structures starts from studying abstract data types. Abstract Data type (ADT) is a class whose behaviour is defined by a set of values and operations.

Abstract data types only provide an interface about what operations can be performed but not how these operations will be implemented. In other words, this does not specify how data will be organized in memory and what algorithms will be used for implementing the operations.

From another perspective, we can think of ADT as a black box that hides the inner structure and design of the data type. It is a model of abstraction to provide an implementation-independent view, where a user only needs to know what a data type can do and does not need to know how that data type is implemented.

For example, we may want to maintain a first-in-first-out (FIFO) queue of elements. The required operations are the insertion and deletion of elements from the queue. In case of deletion, the elements must be removed in the same order in which they were inserted. It is more convenient to design efficient code for these operations and provide an interface to the user without specifying implementation details. We call the abstract data type that supports these operations a FIFO Queue.

Another example of an abstract data type is a queue where elements are associated with priorities. Here deletion will not happen according to the order of insertions but according to the priorities. That is, the first element to be removed in each step is the item of highest or lowest priority among elements in the queue. This abstract data type is called a Priority Queue.

So the most crucial part of an abstract data type is a list of operations that we want to support. We concentrate on critical operations of data structure and design an efficient algorithm for each operation. In simple words: abstract data types allow us to make the algorithm-design process more modular.

What is abstract data type in data structures?

Real-life applications of data structures

Application of array

Storing any types of sequential records (Like contacts in our phone, speech signals in speech processing. etc.), Implementing stack, queue, hash tables, heaps, Fenwick tree, adjacency matrix representation of a graph, etc.

Application of linked list

Singly-linked list: Maintaining a directory of names, Performing arithmetic operations on long integers, Manipulation of polynomials, Implementing stack and queue, Adjacency list representation of a graph, Hashing by chaining method, Dynamic memory allocation, etc.

Doubly-linked list: Implementing LRU Cache and Fibonacci heap, Thread scheduler in operating systems, Representing deck of cards game, Implementing undo and redo functionality in applications, Previous and next page in a web browser, etc.

Circular linked list: Processes scheduling in the operating system, Keep track of the turn in a multiplayer game, etc.

Application of stack

Storing browser history, UNDO/REDO options in a text editor, Process scheduling, Static memory allocation, Storing function calls in recursion, Identify missing braces in IDE or a compiler, Expression evaluation, Syntax parsing, etc.

Application of queue

Process scheduling in operating systems (CPU and IO scheduling), Interprocess communications, Waiting list in Applications, Customer care calls management

Application of hash table

Accessing websites using keywords in search engines, Finding phone numbers on mobile devices, Employees information system, Spelling checkers in word processing software, Symbol table in a compiler, Load balancing, Database partitioning, Encrypting critical data in cryptography, etc.

Application of tree

Binary tree: Storing hierarchical data (folder structure, organization structure, etc.), Document Object Model to represent documents with a logical tree, etc.

Binary search tree: Used in applications where continuous insertion and deletion of data is happening, Implementing ordered map and set in programming languages libraries, etc.

Heap: Implementing priority queues, Scheduling processes in operating systems, Dynamic memory allocation, Order Statistics like finding kth smallest or largest, etc.

Trie: Autocompletion in google search, Spell-checking in word processors, Swipe features in mobile keypad, Network browser history of the URLs, Longest prefix matching used by routers in internet protocols, Predictive text technology (T9 dictionary in mobile phones), etc.

Advanced tree data structures

Compiler design for parsing of the syntax (Syntax Tree), Wireless networking and a memory allocation (Treap), Range query processing (Segment tree or binary index tree), Finding the nearest neighbour to a particular point (K dimensional tree), Storing data on the drive (B-tree), 3D computer graphics (BSP tree), Fast full-text search in word processors (Suffix tree), Database design (B-Tree), etc.

Application of graph

Shortest path algorithm: Network routing , Google maps, Shipment routing in e-commerce, Robot path planning, etc.

Breadth-first search (BFS): Finding people within a given distance k from a person on social network websites, Web crawling in a google search, Finding all neighbour nodes in the peer-to-peer network like BitTorrent, Finding neighbouring places via GPS navigation, etc.

Depth-first search (DFS): Detecting cycle in a graph, Bipartiteness test in a graph, Finding strongly connected components, Path-finding and solving maze puzzles, Topological sorting, etc.

Topological sorting: Generating a dependency graph of the dependent modules in IDE build-systems, Determining the order to create complex database tables with interdependencies, Determining order to take courses and their prerequisites to complete a degree, etc.

Other important applications of graph

Assigning the fastest pick-ups to Uber drivers (Hungarian algorithm), Facebook’s friend suggestion and graph search algorithm, Google page ranking algorithm and Knowledge graph, Resource allocation graph in the operating system, Transaction graphs in cryptocurrency (Blockchain, which is a large graph), Artificial neural networks, Product recommendation graphs (Amazon recommendation system), etc.

References

Algorithms by CLRS
Algorithm design manual by Skiena
Algorithms by Robert Sedgewick

Enjoy learning, Enjoy algorithms!

Data Structure Types, Classification and Real-life Applications