Algorithms

The Bloom filter

In many software engineering problems, we have a set and need to determine if some value belongs to this set. If the possible maximum set cardinality (size; maximum size = total count of elements we consider) is small, the solution is straightforward: just store the set explicitly (for instance, in form of a RB-tree), update it when necessary and check if the set contains elements that we are interested in. But what if maximum set cardinality is large or we need many such sets to operate simultaneously? Or if the set membership test is an expensive operation?

Suppose we want to know if an element belongs to a set. We have decided that it is acceptable to get false positive answers (the answer is “yes”, but the element is not actually in the set) with probability p and not acceptable to get false negative (the answer is “no”, but the element in actually in the set). The data structure that could help us in this situation is called the Bloom filter.

A Bloom filter (proposed by Burton Howard Bloom in 1970) is a bit array of m bits (initially set to 0) and k different hash functions. Each hash function maps a value into a single integer number.

Look at this picture from Wikipedia:

Bloom filter

Finding a cycle in a linked list

There is a popular task at software developer job interviews: having a singly linked list, write a piece of code which tells if the list has a cycle.

In a linked list, each element is a structure which contains the value of an element and the link to the next element. The next-link of the last element has a special value which marks the end (usually, null). If a list has a cycle, the last element points to some element inside the list.

Linked list types

Z algorithm

Introduction

There are some algorithms of exact substring searching (e.g. Knuth-Morris-Pratt, Boyer-Moore etc.) I want to explain one of them which is called Z algorithm in some sources.

Z-boxes and Z-values

Let’s consider the concept of Z-box. Take the string S = “abcxxxabyyy”. We have an internal part “ab” in the string which repeats its prefix. The internal “ab” is a Z-box. It has the beginning at the position with index 6 and the end in 7 (0-based). Z-boxes are substrings which match string prefixes with the same length. For the Z-box, let’s call the corresponding prefix the prototype (it’s my term but I think it’s OK to use it.)

Z algorithm