Monday, May 17, 2010

Creating Fake Data

A few days ago there was a post on Hacker News about a project called Faker.js. Basically, the project is a JavaScript clone of libraries in Ruby and Perl for creating fake data such as names, phone numbers, e-mails, street addresses, and company information.

I took a look at the code, and got inspired to create a Factor version.

First, some useful vocabularies:

USING: ascii combinators fry kernel make math random sequences ;

All three implementations seem to use simple random selection to generate the "fake" information. For example, each has a long list of valid first and last names. It looks sort of like:

CONSTANT: FIRST-NAME {
    "Aaliyah" "Aaron" "Abagail" "Abbey" "Abbie" "Abbigail"
    "Abby" "Abdiel" "Abdul" "Abdullah" "Abe" "Abel" "Abelardo"
    "Abigail" "Abigale" "Abigayle" "Abner" "Abraham" "Ada"
    "Adah" "Adalberto" "Adaline" "Adam" "Adan" "Addie" "Addison"
    "Adela" "Adelbert" "Adele" "Adelia" "Adeline" "Adell"
    ...

CONSTANT: LAST-NAME {
    "Abbott" "Abernathy" "Abshire" "Adams" "Altenwerth"
    "Anderson" "Ankunding" "Armstrong" "Auer" "Aufderhar"
    "Bahringer" "Bailey" "Balistreri" "Barrows" "Bartell"
    "Bartoletti" "Barton" "Bashirian" "Batz" "Bauch" "Baumbach"
    ...

Given a long list of possible names, generating a fake name is no more complicated than:

( scratchpad ) FIRST-NAME random LAST-NAME random " " glue .
"Greyson Barrows"

Similarly, creating phone numbers takes nothing more than a list of typical phone number patterns and a word that replaces each "#" in a pattern with a random digit:

CONSTANT: PHONE-NUMBER {
    "###-###-####"
    "(###)###-####"
    "1-###-###-####"
    "###.###.####"
    "###-###-####"
    "(###)###-####"
    "1-###-###-####"
    ...

: (numbers) ( str -- str' )
    [ dup CHAR: # = [ drop "0123456789" random ] when ] map ;

Generating a fake phone number (without performing any kind of validation on area codes or local numbers) is as easy as:

( scratchpad ) PHONE-NUMBER random (numbers) .
"352-327-9815"
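To make the two steps above reusable, they can be wrapped into a single word. This is only a sketch meant to be typed into the listener: SAMPLE-PHONE-PATTERNS, fill-digits, and sample-fake-phone are names made up for illustration (fill-digits has the same body as (numbers) above), and the pattern list is a short stand-in for the longer one.

```factor
USING: kernel random sequences ;

! Hypothetical short stand-in for the longer PHONE-NUMBER list.
CONSTANT: SAMPLE-PHONE-PATTERNS {
    "###-###-####"
    "(###)###-####"
    "1-###-###-####"
}

! Replace every # with a random digit; all other characters
! pass through unchanged. Same body as (numbers) above.
: fill-digits ( str -- str' )
    [ dup CHAR: # = [ drop "0123456789" random ] when ] map ;

! Pick a random pattern, then fill in its digits.
: sample-fake-phone ( -- str )
    SAMPLE-PHONE-PATTERNS random fill-digits ;
```

Calling sample-fake-phone . in the listener should print a different number each time. Since map builds a sequence of the same type as its input, the result is still a string of the same length as the chosen pattern.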

For added flavor, the author chose to include "business bullshit" generation:

( scratchpad ) 5 [ fake-bs . ] times
"leverage 24/7 models"
"deploy ubiquitous vortals"
"maximize holistic channels"
"exploit real-time niches"
"unleash proactive mindshare"
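The implementation of fake-bs is not shown here, but the output suggests a verb, an adjective, and a noun picked at random and joined with spaces. A minimal sketch under that assumption follows. The three constants and the word sample-fake-bs are hypothetical names, and the word lists contain only a few entries taken from the output above.

```factor
USING: arrays random sequences ;

! Hypothetical word lists, seeded from the sample output above.
CONSTANT: SAMPLE-BS-VERBS { "leverage" "deploy" "maximize" }
CONSTANT: SAMPLE-BS-ADJECTIVES { "24/7" "ubiquitous" "holistic" }
CONSTANT: SAMPLE-BS-NOUNS { "models" "vortals" "channels" }

! Pick one word from each list, collect them into an array,
! and join the array with single spaces.
: sample-fake-bs ( -- str )
    SAMPLE-BS-VERBS random
    SAMPLE-BS-ADJECTIVES random
    SAMPLE-BS-NOUNS random
    3array " " join ;
```

With three entries per list this can only produce 27 distinct phrases. Longer lists give more variety without any change to the word itself.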

And "product catch phrase" generation:

( scratchpad ) 5 [ fake-catch-phrase . ] times
"Reverse-engineered value-added toolset"
"Diverse systemic concept"
"Ergonomic holistic pricing structure"
"Persevering local interface"
"Intuitive human-resource time-frame"

This is useful for scale-testing websites, practical jokes, and probably less innocent purposes. You can see the full version on my GitHub account.
