4 ways to determine if a string has all unique characters

featured image

Given a string, determine if it has all unique characters.

/**
 * Determines if str has unique characters
 * @param {string} str 
 * @returns {boolean}
 */
function isUnique(str) {
    // Fill in your solution here
}

Think it’s easy? Try this input, str = "πŸ‘¨β€πŸ‘¦πŸ‘¨".

TL;DR:

  1. Use nested loops
  2. Sorting and comparing
  3. Use an object cache
  4. Using bitwise operators

Examples:

Long unique string
Input: NzEtem04n2vcokuyI5iYZjlfRC639aX-=#[}?/.!@%
Output: true

Mix of alphabets, numbers and special characters
Input: !-&*$abc456$
Output: false

Uppercase and lowercase alphabets
Input: aAbB
Output: true, as uppercase and lowercase alphabets are unique. i.e. case-sensitive

UTF-16 surrogate pairs. Each character is a pair of one or more 16-bit code units. 'πŸ˜€'.length returns 2!
Input: γΊΏΝΏπŸ˜€πŸ˜‘πŸ‘¨β€πŸ‘¦πŸ‘¨ (ΝΏ is the Greek letter Yot and not the Latin-script J)
Output: true

Strings in JavaScript

A string is a sequence of characters used to represent text. Strings are represented as sequences of UTF-16 code units. Each code unit is 16 bits long. This means there are a maximum of 216, or, 65536 possible characters representable as single UTF-16 code units.

However, the entire Unicode character set is much, much bigger than 65536. The extra characters are stored as surrogate pairs, which are pairs of 16-bit code units that represent a single character. Each Unicode character, comprised of one or two UTF-16 code units, is also called a Unicode code point.

"πŸ˜„".split(""); // ['\ud83d', '\ude04']; Shows the two 16 bit sequences that make up the emoji.
πŸ˜„ is a Unicode code point.

1. Use nested loops

This method compares each character with every other character to check for duplicates.

/**
 * Determines if str has unique characters
 * Compares each character with every other character
 * @param {string} str 
 * @returns {boolean}
 */
function isUnique(str) {
    for (let i = 0; i < str.length; i++) {
        for (let j = i + 1; j < str.length; j++) { // j starts from i + 1
            if (str[i] === str[j]) {
                return false // Immediately return false
            }
        }
    }

    return true
}
Time ComplexityO(n2
Space ComplexityO(1)

There’s more to this simple algorithm than meets the eye. It does not work for surrogate pairs. console.log(isUnique('πŸ˜€πŸ˜‘')) returns false!

Why does this happen?

Each emoji is a pair of 16-bit units. So in memory, str ='πŸ˜€πŸ˜‘' is stored as,

str[0]str[1]str[2]str[3]
\uD83D\uDE00\uD83D\uDE11
πŸ˜€ = str[0] and str[1]. πŸ˜‘ = str[2] and str[3]

str[0] is equal to str[2], so the function returns false.

Fixing this bug is not easy. One way would be to convert the string to an array before iterating over it. In fact, Array.from('πŸ˜€πŸ˜‘') does split them correctly into, ['πŸ˜€', 'πŸ˜‘'].

But this fails for other emojis that are compositions of multiple emojis,

// "Family: Man, Boy"
Array.from("πŸ‘¨β€πŸ‘¦"); // ['πŸ‘¨', '‍', 'πŸ‘¦']
"πŸ‘¨β€πŸ‘¦".length; // 5
// splits into the "Man" and "Boy" emoji, joined by a ZWJ, \u200d character

These sequences of Unicode characters, like emojis, that should be treated as one visual unit, are known as grapheme clusters. Iterating through strings containing grapheme clusters will require custom code. Loops like, for, for-of etc will fail for composed emojis.

Length of a tweet on Twitter/X

On X, each tweet can contain up to 280 characters. However, if you’re using emojis like smileys πŸ˜€, each one takes up two characters because they are composed of two UTF-16 code units. Therefore, despite the 280 character limit, you can only fit in 140 smileys!


cracking the coding interview
Don’t miss out on the book everyone is talking about.
Grab your copy before it’s gone!

2. Sorting and Comparing

By sorting the string, we can check for duplicates by comparing each character with its next neighbour.

/**
 * Determines if str has unique characters
 * Converts string to an array, sorts and then compares
 * @param {string} str 
 * @returns {boolean}
 */
function isUnique(str) {
    str = Array.from(str).sort() // Convert to array and sort

    for (let i = 0; i < str.length - 1; i++) {
        if (str[i] === str[i + 1]) {
            return false
        }
    }

    return true
}
Time ComplexityO(n log n) 
Space ComplexityO(n). The size of the array grows linearly with the input string length.

sort vs toSorted Array methods

sort sorts elements of an array in-place. i.e. modifies the input array.
const a = [3, 2, 1]; a.sort(); console.log(a); Logs [1, 2, 3]

toSorted sorts elements and returns a new array without modifying the original.
const a = [3, 2, 1]; a.toSorted(); console.log(a); Logs [3, 2, 1]

3. Using an Object Cache

We traverse the string and use an object to keep track of characters we’ve seen.

/**
 * Determines if str has unique characters
 * Uses a cache to remember characters seen already
 * @param {string} str 
 * @returns {boolean}
 */
function isUnique(str) {
    const cache = {}
    /**
     * It's important to use a for-of loop here as
     * it iterates over Unicode code points
     */
    for (const c of str) {
        if (cache[c]) {
            return false
        }
        cache[c] = true // Remember that this character is processed
    }

    return true
}
Time ComplexityO(n)
Space ComplexityO(1). The size of the cache is constant. In the worst case, it has one entry for each character.
Its size does not increase beyond this limit and is independent of input size.

Note that this solution fails for the string, “πŸ‘¨β€πŸ‘¦πŸ‘¨” as the for-of loop iterates over Unicode code points. πŸ‘¨β€πŸ‘¦ is composed of 2 Unicode code points and a ZWJ character, so the cache will hold, { 'πŸ‘¨': true, '‍': true, 'πŸ‘¦': true } after the first emoji is processed.


introduction to algorithms
Join over a million readers in mastering algorithms with the classic,
must-have guideβ€”your definitive bible for learning.

4. Using Bitwise Operators

Similar to the previous method, we remember characters that are seen while traversing the string. But this time, we allocate a single bit to each character.

Shift operators return a 32-bit number. So we have 32 bits of memory. Associate each character with a one of the 32 bits. The algorithm works like this,

  1. Checking for Seen Characters: When encountering a character, check if it has been seen before. This is done by bitwise AND operation with a bit mask corresponding to that character’s bit position.
MemoryBit (Incoming char)Memory & BitDuplicate character
010001000100Yes
010000100000No
  1. Marking Seen Characters: If the character is new (not seen before), mark its corresponding bit in the memory. The left shift operator << comes in handy here. Say we want to mark the 10th bit in memory as 1, we first create a bit mask, bit = 1 << 9; and then store it in memory using bitwise OR, memory = memory | bit;
MemoryBitNew Memory = Memory | Bit.
100000101010

This method effectively manages memory using just 32 bits, suitable for scenarios with a limited character set like lowercase English alphabets [a-z].

/**
 * Determines if str has unique characters
 * Uses bitwise operators to remember characters seen by 
 * setting bits in a 32 bit number
 * @param {string} str 
 * @returns {boolean}
 */
function isUnique(str) {
    /**
     * codePointAt returns a non-negative integer that is the Unicode code point 
     * value of the character
     */
    const codePointOfa = 'a'.codePointAt(0)
    let memory = 0

    for (const c of str) {
        // Subtract from "a" to get values between 0 and 25
        const codePoint = c.codePointAt(0) - codePointOfa 

        const bit = 1 << codePoint

        if ((memory & bit) > 0) {
            return false
        }

        memory = memory | bit // Save in memory
    }

    return true
}
Time ComplexityO(n)
Space ComplexityO(1). The size of the memory variable is fixed to 32 bits.

This algorithm can be extended to more characters by using a different number for each bucket of 32 characters.

Conclusion

Determining if a string has all unique characters can vary in complexity depending on the character set and the handling of special characters like emojis and grapheme clusters. Understanding these nuances ensures accurate and efficient solutions.


Your time is limited, so don’t waste it living someone else’s life.

Steve Jobs

0 0 votes
Article Rating
guest
3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Anonymous
Anonymous
7 months ago

Book recommendation is good.

Anonymous
Anonymous
7 months ago

Cover pic is perfect πŸ‘Œ

Anonymous
Anonymous
7 months ago

Waiting for next blog. Keep writing