Why should we learn data structures?
Data structures are the core building blocks of algorithms and real-life applications. In terms of an analogy, a well-designed algorithm using data structure is like a good structure of a house. For example, to design a good house, we need to put all the components of a house together based on the principles of good construction techniques. In the same way, a design of an algorithm must be based on a good understanding of data structures:
- Properties and implementation techniques
- Implementation and efficiency of various operations
- Various tradeoffs and comparisons to choose a correct data structure
The maximum benefit from good data structures results from designing algorithms around them in the first place. If we observe closely, changing the data structure does not change the correctness of our code because we replace the correct implementation with another correct implementation.
But different data structures take different times to execute various operations. So we compare the efficiency of operations provided by multiple data structures and choose a correct data structure among them to improve our code performance.
Definition of Data Structure
The data structure is an idea to organize various types of data in memory. In other words, data structures are several ways to organize data in memory to perform several operations efficiently. We use it to manage, process, and get relevant information in an effcient manner.
There will be two primary components in every data structure: data and various operations working on that data. Data is information, and operations are algorithms working on that data to get valuable insights.
Data Structures = Data + Permissible operations on the data
Types of Operations on Data Structures
Operations on any data structure can be grouped into two categories: Query Operations and Modification Operations. These operations are frequent in real-life applications, and we should study the design, code, and analysis of critical operations for each data structure. Even we can encounter many other specific operations during the problem-solving.
- Query operations: These operations will return some information about the data structure S. For example, search(S, k), findMax(S), findMin(S), findSuccessor(S, x), etc., are the query operations on a data structure.
- Modification operations: These operations will update or change the data structure S. For example, insert(S, x), delete(S, x), sort(D), etc., are modification operations on a data structure.
Types and Classification of Data Structures
Based on the use case and properties, we can classify data structures in several ways:
Primitive data structure
The primitive data structure is a fundamental building block of data structures. It is a predefined way of storing data of only one type. In other words, it can hold a single value in a specific memory location. Some examples of primitive data structures are: boolean (true or false), char (16 bits), byte (8bits), short (16 bits), int (32 bits), long (64 bits), float (32 bits), double (64 bits).
- A primitive data structure is full of bits representing the actual value of a variable.
- The size of a primitive data structure depends on its type.
- The programming languages define primitive data structures, or we can say that they are built-in.
- It's the programmer's responsibility to declare and initialize a value of the primitive data structure.
Non-Primitive Data Structures
Non-primitive data structures are derived from primitive data structures. It is a user-defined data structure that stores the data of different types as a single entity with the relationship between each data item.
- It can store more than one type of data in a contiguous or random memory location.
- The programmer defines the non-primitive data structures.
- The size is not fixed. In other words, the size of the data structure depends on the specification of the input.
Based on the order of organizing and accessing data, Non-primitive data structures can be classified into two categories: Linear data structures and Non-Linear data structures.
Linear data structures: Here, elements are arranged or linked linearly with one after another. In other words, elements can be traversed or accessed in a single direction. Examples of linear data structures are array, linked list, stack, and queue.
Non-Linear Data Structure: Here, elements are not arranged sequentially. In other words, each data element can be linked to more than one data element. It exhibits a hierarchical relationship between the elements, and there can be multiple directions to go from one data to another. Examples of linear data structures are tree, graph, set, etc.
Based on the memory representation of data, depending upon whether they are based on arrays or pointers., Non-primitive data structures can be further classified into two categories: Sequential data structures and Linked data structures.
- Sequential data structures: Here, data is stored and organized in a contiguous memory block. Examples of sequential data structures are array, matrix, array implementation of stack, array implementation of queue, heap, Binary Indexed Tree, B-Tree, hashing using open addressing, graph representation using adjacency matrix, etc.
- Linked data structures: Here, data is stored and organized as a distinct block of memory connected via pointers or references. Examples of linked data structures are singly linked list, doubly linked list, linked-list implementation of stack, linked-list implementation of queue, binary tree, binary search tree, trie, adjacency list representation of a graph, a hashing using the chaining method, etc.
Comparison: Sequential vs. Linked Data Structures
- Since sequential data structures can only be stored in contiguous blocks of memory, so its size cannot be altered at runtime. However, in linked data structures, each data element points to the next one such that data can exist at scattered or non-contiguous addresses. This allows for a dynamic size that can change at runtime, and overflow problems never occur unless the memory is full.
- For the same number of elements, linked data structures use more memory because a pointer or reference to the next data element is also stored along with the data.
- In sequential data structures, memory equivalent to the upper limit on the size has to be allocated, but linked data structures can increase their sizes proportionately to the amount of data. In simple words, Linked data structures are useful when there is uncertainty about size or significant variations in the size of data elements.
- Some operations are faster on sequential data structures, and some are faster on linked data structures. For example, insertions and deletions are simpler in the linked data structures.
- Moving pointers is easier and faster with large records than moving the items themselves. But sequential representation allows better memory locality and cache performance than random pointer jumping.
Data Structures as a Dynamic set
Data sets manipulated by algorithms can grow, shrink, or change over time. We call such a data structure dynamic set. From another perspective: algorithms may perform several different types of operations on dynamic sets. The best way to implement a dynamic set depends upon the operations that must be supported.
We can categorize the dynamic set into three categories:
- Containers: We use a container to denote a data structure that allows storage and retrieval of data items independent of content. For example, array, linked list, stack, queue, binary tree, graph, etc., work as a container.
- Dictionaries: Many algorithms need only the ability to insert, delete, and search elements in a set. We call a dynamic set that supports these operations a dictionary. For example, we can implement dictionary operations using an array, linked list, BST, and hash table. Here the hash table is the effcient implementation of a dictionary where each operation works in O(1) time complexity on average.
- Priority Queues: Many algorithms support the operations of searching, inserting, and deleting the element with maximum or minimum priority from a data set. For example, we can implement a priority queue using an array, linked list, BST, and heap. But heap is the effcient implementation of a priority queue where findMin() or findMax() operation works in O(1) time complexity and deleteMax() or deleteMin() operations in O(logn) time complexity.
Note: if there is a requirement in an application to process data using both dictionary and priority queue, then balanced BST could be a perfect choice. Here all operations work in O(logn) time complexity.
Here are some critical ideas related to the dynamic set:
- In a typical implementation of a dynamic set, each data element is represented by an object whose attributes can be examined and manipulated if we have a pointer to the object.
- Some dynamic sets assume that one of the data object attributes is an identifying key. If the keys are all different, we can think of the dynamic set as a set of key-value pairs.
- Some dynamic sets suppose that the keys are drawn from an ordered set, such as the real numbers or the set of all words under the usual alphabetic ordering.
Data Structure as an Abstract Data Type
The basic idea in data structures starts from the study of abstract data types. Abstract Data type (ADT) is a type (or class) for objects whose behavior is defined by a set of values and a set of operations.
- It does not specify how data will be organized in memory and what algorithms will be used for implementing the operations. In other words, abstract data types only provide an interface about what operations can be performed but not how these operations will be implemented.
- We can think of ADT as a black box that hides the inner structure and design of the data type. It is a model of abstraction to provide an implementation-independent view, where a user only needs to know what a data type can do and does not need to know how that data type is implemented.
For example, we may want to maintain a first-in-first-out (FIFO) queue of elements. The required operations are the insertion and deletion of elements from the queue. In case of deletion, the elements must be removed in the same order in which they were inserted. It is more convenient to design the effcient code for these operations and provide an interface to the user without specifying the implementation details. We call the abstract data type that supports these operations a FIFO Queue.
Another example of an abstract data type is a queue where elements are associated with priorities. Here deletion will not happen according to the order of insertions but according to the priorities. That is, the first element to be removed in each step is the item of highest or lowest priority among the elements in the queue. This abstract data type is called a Priority Queue.
So the most crucial part of an abstract data type is a list of operations that we want to support. We concentrate on the critical operations of a data structure and design an effcient algorithm for each operation. In simple words: abstract data types allow us to make the algorithm-design process more modular.
Real-life applications of data structures
Application of array
- Storing any types of sequential records like contacts in our phone, speech signals in speech processing. etc.
- Implementing stack, queue, hash tables, heaps, Fenwick tree, etc.
- Adjacency matrix representation of a graph
Application of linked list
- Maintaining a directory of names
- Performing arithmetic operations on long integers, Manipulation of polynomials
- Implementing of stack and queue
- Adjacency list representation of a graph, Hashing by chaining method
- Dynamic memory allocation
- Implementing LRU Cache and Fibonacci heap
- Thread scheduler in operating systems
- Representing deck of cards game
- Implementing undo and redo functionality in applications
- Previous and next page in a web browser
Circular linked list
- Processes scheduling in the operating system
- Keep track of the turn in a multi-player game
Application of stack
- Storing browser history, UNDO/REDO options in a text editor
- Process scheduling, Static memory allocation
- Storing function calls in recursion
- In IDE or a compiler to know are missing braces
- Expression evaluation, Syntax Parsing
Application of queue
- Process scheduling in operating systems (CPU and IO scheduling)
- Interprocess communications
- Waiting list in Applications
- Customer care calls management
Application of hashing
- Accessing website using keywords in search engines
- Finding phone numbers in mobile devices, Employees Information System
- Spelling checkers in word processing software
- Symbol table in a compiler
- Encrypting critical data in cryptography
Application of tree data structure
- Store hierarchical data like folder structure, organization structure, etc.
- Document Object Model to represent documents with a logical tree
Binary search tree
- Used in applications where continuous insertion and deletion of data is happening
- To implement ordered map and set in languages' libraries.
Heap data structure
- Used in implementing efficient priority queues
- Scheduling processes in operating systems
- Dynamic memory allocation
- Order Statistics like finding kth smallest or largest
Trie data structure
- Autocompletion in google search
- Spell-checking in word processors
- Swipe features in your mobile keypad
- Network browser history of the URL's
- Longest prefix matching used by routers in internet protocols
- Predictive text technology (T9 dictionary in mobile phones)
Advanced tree data structures
- Compiler design for parsing of the syntax (Syntax Tree)
- Wireless networking and a memory allocation (Treap)
- Range query processing (Segment Tree or Binary Index Tree)
- Finding the nearest neighbor to a particular point (K Dimensional Tree)
- To store data on the drive (B-tree)
- 3D computer graphics(BSP tree)
- Fast full-text search in word processors (Suffix tree)
- Database design (B-Tree)
Application of graph data structure
Shortest path algorithm
- Network routing
- Google maps
- Shipment routing in e-commerce
- Robot path planning
Breadth-first search (BFS)
- Find people with a given distance k from a person on social network websites
- Web crawling in a google search
- Finding all neighbor nodes in the peer to peer network like BitTorrent
- Finding neighboring places via GPS navigation
Depth-first search (DFS)
- Detecting cycle in a graph
- Bipartiteness test in a graph
- Finding strongly connected components
- Path-finding and solving maze puzzles
- Topological sorting
- Generating a dependency graph of the dependent modules in IDE build-systems
- Determining the order to create complex database tables with interdependencies
- Determining order to take courses and their pre-requisites to complete a degree.
Other important applications
- Assigning fastest pick-ups to Uber drivers (Hungarian algorithm)
- Facebook’s friend suggestion and graph search algorithm
- Google page ranking algorithm and Knowledge graph
- Resource allocation graph in the operating system
- Transaction graphs in cryptocurrency (Blockchain, which is a large graph)
- Artificial neural networks
- Product recommendation graphs (Amazon recommendation system)
- Algorithms by CLRS
- Algorithm design manual by Skiena
- Algorithms by Robert Sedgewick
Enjoy Learning, Enjoy Thinking!