One of my favorite interview questions goes something like this:
Given an array of 1001 elements which contains integers from 1 to 1000 inclusive. The numbers are randomly stored in the array. Only one number repeats itself. The candidate has to come up with an efficient solution for finding that duplicate given that you can access the elements of an array only once i.e., you can read the elements of the array only once.
First, we need to make a randomized array of our numbers (with one duplicate):
Python:def make_numbers(n): seq = [n] + range(1, 1001) random.shuffle(seq) return seq
Factor:: make-numbers ( n -- seq ) [ 1000 [1,b] ] dip suffix randomize ;
It's worth pointing out that Factor has many "builtin" words that can be helpful for solving this, making the shortest solution:
IN: scratchpad 12 make-numbers duplicates first . 12
While this solution is not linear, it is often the "obvious" first way to solve this problem. It goes something like this: for each element in the list, check if duplicated by any other element:
Python:def find_duplicate(seq): for i, m in enumerate(seq): for n in seq[i+1:]: if m == n: return m return False
Factor:: find-duplicate ( seq -- n ) dup '[ 1 + _ index-from ] find-index nip ;
Our first "linear" solution uses sets to track which elements we've seen, stopping when we encounter our first duplicate. We will gloss over the extra memory and time required to maintain the set.
Python:def find_duplicate(seq): d = set() for m in seq: if m in d: return m d.add(m) return False
Factor:: find-duplicate ( seq -- n ) f fast-set '[ _ ?adjoin not ] find nip ;
: find-duplicate ( seq -- n ) dup length <bit-set> '[ _ ?adjoin not ] find nip ;
Taking advantage of the fact that our numbers are from 1 to 1000, we can compute the
sum of the numbers, and subtract the expected total (using the formula
(n * n+1) / 2) to find the duplicate:
Python:def find_duplicate(seq): n = len(seq) - 1 return sum(seq) - (n*(n+1))/2
Factor:: find-duplicate ( seq -- n ) [ sum ] [ length dup 1 - * 2 / ] bi - ;
What do you think? Any other good, fun, or unusual ways to solve this problem?
Update: Zeev pointed out in the comments that another way would be to XOR all the values in our sequence and the numbers 1 to 1000:
Python:def find_duplicate(seq): n1 = 0 for x in seq: n1 ^= x n2 = 0 for x in range(1, 1001): n2 ^= x return n1 ^ n2
Factor:: find-duplicate ( seq -- n ) 1000 [1,b] [ 0 [ bitxor ] reduce ] bi@ bitxor ;