Recently, I stumbled onto a blog post from 2009 which discussed a way to generate random-looking strings from a series of unique numeric id's, using PHP. The author uses a base-62 encoding ([0-9][A-Z][a-z]
) to convert a number to a unique string using a series of prime numbers to reduce the potential of collisions. Below, I show how it might look in Factor.
First, we implement a simple base-62 encoder:
CONSTANT: CHARS "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" : base62 ( n -- string ) [ dup 0 > ] [ 62 /mod CHARS nth ] "" produce-as reverse nip ;
Next, we define a series of prime numbers (which should be kept "secret" to make the algorithm hard to predict) chosen to be close to the next highest prime from the golden mean of the number of possible permutations for each string length:
CONSTANT: PRIMES { 1 41 2377 147299 9132313 566201239 35104476161 2176477521929 }
Finally, we can implement the "unique hash" function:
:: udihash ( n chars -- string ) chars PRIMES nth n * 62 chars ^ mod base62 chars CHAR: 0 pad-head ;
Try it out and see that it produces the same random-looking results as the original author. For example, a 5-character hash of the numbers 1 through 10:
( scratchpad ) 10 [1,b] [ 5 udihash print ] each cJio3 EdRc6 qxAQ9 TGtEC 5ac2F huKqI KE3eL wXmSO YrVGR BBE4U
You can find a discussion on StackOverflow, a similar approach used in the SilverStripe CMS to shorten URLs, and lots of search results probably inspiring (or inspired by) the original blog post.
If you're curious how to select your prime numbers, you can use math.primes to create your own list:
( scratchpad ) CONSTANT: golden-ratio 1.618033988749894848 ( scratchpad ) 5 [1,b] [ 62 swap ^ golden-ratio /i next-prime ] map . { 41 2377 147299 9132313 566201239 }
The code for this is on my Github.
3 comments:
Wait, why are we using permutations rather than 62^k?
@William You are exactly right, permutations would mean the characters aren't allowed to be re-used, but in this case they are. So instead of "62 swap nPk", we should do "62 swap ^".
I've update the post to reflect that. Thanks for the feedback!
Post a Comment