Search for repeated characters in a string in Go

The brute force method:

Run two loops with variables i and j. Compare str[i] and str[j]. If they become equal at any point, return false.

For characters which are represented by one byte in UTF-8.

func hasUniqueCharacters1(str string) (string, bool) {
	for i := 0; i < len(str); i++ {
		for j := i + 1; j < len(str); j++ {
			if str[i] == str[j] {
				return str, false
			}
		}
	}
	return str, true
}

To support Unicode, you must convert the string to []rune

func hasUniqueCharacters2(str string) (string, bool) {
	chars := []rune(str)
	length := len(chars)
	for i := 0; i < length; i++ {
		for j := i + 1; j < length; j++ {
			if chars[j] == chars[i] {
				return str, false
			}
		}
	}
	return str, true
}

Sorting values

To avoid having to implement methods of the sort.Interface, you can use the ready-made sortutil package.

func hasUniqueCharacters3(str string) (string, bool) {
	chars := sortutil.RuneSlice(str)
	length := len(chars)
	sort.Sort(chars)

	for i := 0; i < length-1; i++ {
		if chars[i] == chars[i+1] {
			return str, false
		}
		continue
	}
	return str, true
}

A variant of sorting without implementing interface methods.

func hasUniqueCharacters4(str string) (string, bool) {
	chars := []rune(str)
	length := len(chars)
	sort.Slice(chars, func(i int, j int) bool {
		return chars[i] < chars[j]
	})
	for i := 0; i < length-1; i++ {
		if chars[i] == chars[i+1] {
			return str, false
		}
		continue
	}
	return str, true
}

Using an additional data structure

Using big.Int, to store a set of bits:

func hasUniqueCharacters5(str string) (string, bool) {
	var bits big.Int
	for _, char := range str {
		val := int(char)
		if bits.Bit(val) != 0 {
			return str, false
		}
		bits.SetBit(&bits, val, 1)
	}
	return str, true
}

Using map with boolean values

func hasUniqueCharacters6(str string) (string, bool) {
	var bits = make(map[int]bool, len(str))
	for _, char := range str {
		val := int(char)
		if bits[val] {
			return str, false
		}
		bits[val] = true
	}
	return str, true
}

If you need only ASCII values (8 bits).

We need a slice with logical values of a given length – 256. By default all elements of the slice will be equal to false.

const MAX_CHAR = 256

func hasUniqueCharacters7(str string) (string, bool) {
	// If the size is larger than 256,
	// then any characters must have been repeated
	if len(str) > MAX_CHAR {
		return str, false
	}

	chars := make([]bool, MAX_CHAR)
	for _, char := range str {
		val := int(char)
		if chars[val] {
			return str, false
		}
		chars[val] = true
	}
	return str, true
}

Using bitwise operators

Suitable for characters in the range a-z in the ASCII table.

Instead of using arrays, let’s take an int value.

Each bit in an int value can be treated as a flag, so the int ends up being an array of bits (a flag).

int has a fixed size, usually 4 bytes, which means 8 * 4 = 32 bits (flags).

When we iterate over the string, we find the int value of the character with respect to ‘a’.

Then a bit in that int value is set to 1 by a bitwise shift to the left 1 << val .

Then we check if a bit is already set in this value, using the AND & operator. Return false.

The bit mask. 10111011 & 00001000 = 00001000

If a bit is not set in the given place, we update the checker value, using operator OR |

Merging. 00000001 | 00001000 = 00001001

func hasUniqueCharacters8(str string) (string, bool) {
	checker := 0
	for _, char := range str {
		index := char - 'a'
		val := 1 << index
		if checker & val > 0 {
			return str, false
		}
		checker |= val
	}
	return str, true
}

Tests

In tests, the string abcdefghijklmnopqrstuvwxyz is passed to the function arguments.

go test -bench BenchmarkCountLetters -benchtime=10000000x -benchmem
goos: windows
goarch: amd64
pkg: benching
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
BenchmarkCountLetters/Test_1-8          10000000               174.0 ns/op             0 B/op          0 allocs/op
BenchmarkCountLetters/Test_2-8          10000000               310.7 ns/op             0 B/op          0 allocs/op
BenchmarkCountLetters/Test_3-8          10000000               619.9 ns/op           136 B/op          2 allocs/op
BenchmarkCountLetters/Test_4-8          10000000               756.2 ns/op           168 B/op          3 allocs/op
BenchmarkCountLetters/Test_5-8          10000000               400.2 ns/op            48 B/op          1 allocs/op
BenchmarkCountLetters/Test_6-8          10000000              1653 ns/op             459 B/op          3 allocs/op
BenchmarkCountLetters/Test_7-8          10000000                37.31 ns/op            0 B/op          0 allocs/op
BenchmarkCountLetters/Test_8-8          10000000                55.49 ns/op            0 B/op          0 allocs/op

The code and performance tests can be found on the github.