Longest Substring Without Repeating Characters

Difficulty: Medium, Asked-in: Microsoft, Amazon, Morgan Stanley

Key takeaway: An excellent problem to learn time complexity optimization using sliding window approach. We can solve a lot of problems using similar idea.

Let’s understand the problem

Write a program to find the length of longest substring without repeating characters.

  • Substring is a continuous sub-part of the string, where we aim to determine the maximum substring which has all unique characters.
  • Input string consists of English letters, digits, and symbols.
  • The answer must be continuous substring, not a subsequence of the string.
  • There can be more than one longest substring (without repeating characters) of equal length. So, we just need to return the length of longest substring.

Example 1

Input: str = "abcbbcab", Output: 3

Explanation: There are three substrings of unique characters with the longest length 3: "abc", "bca", and "cab". So we return 3 as an output.

Example 2

Input: "bbbbb", Output: 1

Explanation: There is only one unique character. So output is 1.

Example 3

Input: "enjoyalgorithms", Output: 11

Explanation: "o" is the only repeating character, and the answer is "yalgorithms".

Discussed solution approaches

  • Brute force approach  using two nested loops
  • Using sliding window approach
  • Optimized sliding window approach

Brute force approach  using two nested loops

Solution idea and steps

This is a simple approach where idea is to explore all substrings, and for each substring, we check whether it contains unique characters or not. For string with n characters, there will be n*(n + 1)/2 substrings.

  1. We initialize variable maxLength to keep track of the longest substring with unique characters. maxLength = 0
  2. Now we explore all the possible substrings using two nested loops: outer loop from i = 0 to n - 1 and inner loop from j = i to n - 1. The idea is to pick starting character via outer loop and explore all substrings starting from that character using inner loop.

    • For each substring str[i, j] i.e. starting from index i and ending at index j, we use function areUnique(str, i, j) to check if all characters in the substring are unique or not. It will return true if all characters are unique, otherwise false.
    • If function areUnique(str, i , j) return true, we compare maxLength with length of the substring. If(maxLength > j - i + 1), we update variable maxLength. Note: Length of substring str[i, j] = j - i + 1.
  3. By the end of nested loops, we return value stored in maxLength.

Implementation of function areUnique(str, i, j)

  • To check whether a substring has duplicate characters or not, we use a set visited[256] to store status of each possible character in the substring i.e. either true or false.
  • We go through each character from index i to j using loop. For each character, we check if it is already present in set visited[] or not. If so, we return false. Otherwise, we update the status of that character true in set visited[].
  • By the end of loop, we return true i.e. characters are unique in the substring.

Solution pseudocode

int longestSubstring(string str[], int n)
{
    int maxLength = 0
    for (int i = 0; i < n; i = i + 1)
    {
        for (int j = i; j < n; j = j + 1)
        {
            if (areUnique(str, i, j) == true)
                maxLength = max(maxLength, j - i + 1)
        }
    }
    return maxLength
}

bool areUnique(string str[], int i, int j)
{
    bool visited[256]
    for (int k = i; k < j; k = k + 1)
    {
        if (visited[str[k]] == true)
            return false
        visited[str[k]] = true
    }
    return true
}

Time and space complexity analysis

  • We are exploring all n(n + 1)/2 substrings using nested loops.
  • For each substring, function areUnique(str, i, j) will take O(j - i + 1) time to check substring str[i, j] has unique characters or not.
  • Time complexity = n(n + 1)/2 * O( j - i + 1) = O(n^2) * O( j - i + 1). For upper bound of the worst-case analysis, we consider O( j - i +1) as O(n). Why? Think! For better understanding of such loop analysis, refer to this blog. So overall time complexity in the worst case = O(n^2) * O(n) = O(n^3)
  • We are using few extra variables and constant size set visited[]. Space complexity = O(1).

Using  sliding window technique

Solution idea

Now a critical question is: How can we optimize time complexity of brute force approach? Think! The idea would be to apply sliding window approach and track repetition of characters in each substring i.e., whenever we found repetition in the current window, we slide the window and move to the next substring.

Here is optimization insight: In brute force idea, we repeatedly check a substring starting from character str[i] to see if it has duplicate character or not. We can optimize it further because, during the loop, if substring str[i, j-1] is already checked to have no duplicate characters, then for substring str[i, j], we only need to check if str[j] is already present in substring str[i, j-1] or not.

We can use hash set or extra array to check whether character str[j] is there in substring str[i, j-1] or not, which can be done in O(1) time complexity. How do we implement this? Let's think!

Note: Sliding window is an abstract concept commonly used in array/string problems. A window is a range of elements in the array/string defined by the start and ends indices, where window “slides” in a forward direction by incrementing the left or right end.

Solution steps

  1. We initialize variable maxLength to keep track of the longest substring with unique characters. maxLength = 0
  2. We run an outer loop till i < n to explore substring window starting from character str[i].
  3. Now inside loop, we initialize the right end of window i.e. j = i. We also use an array visited[256] as hash set to store status of characters in the current window [i, j). Initially i = j = 0.
  4. Now we run inner loop to check each substring in window [i, j). This loop will run till j < n && visited[str[j]] == false.

    • If str[j] is not present in window [i, j) i.e. if(visited[str[j]] == false): we mark str[j] in visited[] and keep track of the longest substring length. Now we slide the current window to one right by incrementing j and continue the same process in inner loop till str[j] is not present in window [i, j).
    • If character str[j] is already present or visited in the current window, we increment i by 1 and go to the next iteration of outer loop. Now we check the new window of substring starting from index i + 1. Note: Before moving to the next iteration of outer loop, we should set status of the character str[i] in set visited[] i.e. visited[str[i]] = false. (Think!)
  5. The outer loop will stop when the left end of sliding window reaches the end of string i.e., i = n. At this stage, we return the value stored in variable maxLength.

Solution pseudocode

int longestSubstring(string str[], int n)
{
    int maxLength = 0
    int i = 0
    While(i < n)
    {
        bool visited[256]
        int j = i
        while(j < n && visited[str[j]] == false)
        {
            maxLength = max(maxLength, j - i + 1)
            visited[str[j]] = true
            j = j + 1
        }
        visited[str[i]] = false
        i = i + 1
    }
    return maxLength
}

Time and space complexity analysis

We are using nested loops and performing an O(1) operation at each iteration. The worst-case occur when inner loop runs till the end every time. Or in other words, worst-case will occur when all characters in string are unique. Think! In such situation, nested loop will explore all possible substrings i.e. n(n + 1)/2. Time complexity = n(n + 1)/2 * O(1) = O(n^2).

We are using constant size set visited[] of size 256, so space complexity = O(1).

Optimized sliding window solution using hash table

Solution idea

Now critical questions are: can we optimize the above approach further and solve this problem in O(n) time complexity? Can we think of removing inner loop of the previous approach? Here is a hint:

  • At each iteration of outer loop in the previous approach, we ignore the previous window [i, j) and reset j = i to start a new window. In other words, we are starting with new window size 1 at each iteration of outer loop.
  • If characters in window [i, j) are unique, then characters in window [i + 1, j) will also be unique. So in the next iteration of outer loop, there is no need to start with a new window size 1 or reset j = i (Think!).

Solution steps

  • Scan string from left to right using two window pointers i and j. Also, keep track of the longest length of substring using variable maxLength and maintain a hash table to keep track of visited characters.
  • If character str[j] is not present in hash table, we insert it into hash table and slide the current window right by incrementing pointer j. We also update the longest length of substring with unique characters i.e. maxLength = max (maxLength, j - i).
  • If character str[j] is already present in hash table, we delete character str[j] from hash table and slide the window right by incrementing pointer i.
  • This process will end when any one of the pointers reaches the end of string i.e. i = n or j = n. At this stage, we return value stored in variable maxLength.

Solution pseudocode

int longestSubstring(string str[], int n) 
{
    if (n == 0) 
        return 0
    HashTable H
    int maxLength = 0
    int i = 0, j = 0  
    while (i < n && j < n)
    {
        if (H.search(str[j]) == false)
        {
            H.insert(str[j])
            j = j + 1 
            maxLength = max(maxLength, j - i)
        }
        else
        {
            H.delete(str[i])
            i = i + 1
        }    
    }
    return maxLength
}

Time and space complexity analysis

At each iteration of while loop, we either increment pointer i or pointer j, based on the comparison if(H.search(str[j]) == false). So loop will run n times and perform constant number of operations at each iteration. Time complexity = O(n).

The algorithm requires O(n) space for hash table. Space complexity = O(n)

Critical ideas to think!

  • Can we optimize the efficient approach further and solve it using O(1) extra space? Instead of hash table, can we use a set visited[256] to store status of each character in a substring window?
  • How do we modify the above approaches to print all longest substrings in lexicographic order?
  • Try to explore the precise analysis of brute force approach?
  • Explore patterns of problem-solving using the sliding window.

Comparison of time and space complexities

Longest substring without repeating characters time complexity comparison of solutions

Suggested coding problems to practice

Enjoy learning, Enjoy algorithms!

Share feedback with us

More blogs to explore

Our weekly newsletter

Subscribe to get weekly content on data structure and algorithms, machine learning, system design and oops.

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.