Tuesday, February 2, 2010

Working with CGI: Part 2

In Part 1, we created a simple debugging script to print out the environment the CGI script is being executed within.

Many CGI scripts are simple "form handlers". These scripts take input via an HTML form and generate a dynamic response. We are going to write a CGI script that will be able to parse input from an HTML form submission.

The most common type of HTTP request is the GET method. The web browser sends a URL to the server and asks it to "get" the contents of that URL and send them back. When HTML forms are submitted using the GET method, the form elements are "URL encoded" and passed to the server as the "query string" part of the URL.

For example, if we had a "calculator" application to add two numbers (e.g., "x+y"), we could imagine getting the result of 2+3 by requesting the application's URL with the query string "x=2&y=3" appended after a "?".
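To make the shape of such a request concrete, here is a small sketch that uses Factor's urls vocabulary to split a URL into its path and its query parameters. The address is hypothetical and chosen only for illustration; it is not the address of any real calculator.

```factor
USING: accessors assocs kernel prettyprint urls ;

! Hypothetical address, chosen only for illustration.
"http://www.example.com/calc?x=2&y=3" >url

dup path>> .            ! the part before "?": "/calc"
query>> "x" swap at .   ! the value of x from the query string: "2"
```

A CGI script never sees the full URL this way. The web server hands it only the part after the "?" in the QUERY_STRING environment variable, which is why the rest of this post works on the bare query string.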


We need a word that will parse the QUERY_STRING and return a map of submitted parameters. Luckily, Factor has such a word in the urls.encoding vocabulary:

( scratchpad ) "x=2&y=3" query>assoc .
H{ { "x" "2" } { "y" "3" } }

For our use case, the query>assoc word isn't quite what we need. For one thing, it handles empty strings in an odd way:

( scratchpad ) "" query>assoc .
H{ { "" f } }

Also, it doesn't treat repeated inputs consistently with single ones. A parameter that appears once maps to a plain string, but a parameter that appears several times in the query string maps to an array of values:

( scratchpad ) "a=2&a=3" query>assoc .
H{ { "a" { "2" "3" } } }

So to "fix" this, we will develop a word that filters out f values, and returns both single and multiple parameters as sequences.

: (query-string) ( string -- assoc )
    query>assoc [ nip ] assoc-filter
    [ dup string? [ 1array ] when ] assoc-map ;
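As a quick check of the new word, the sketch below repeats the definition so it loads on its own and then looks up individual keys instead of printing whole assocs. The USING: line is my assumption about where each word lives, and the expected values in the comments follow from the listener output shown earlier.

```factor
USING: arrays assocs kernel prettyprint strings urls.encoding ;
IN: scratchpad

! Same definition as above, repeated so the snippet loads on its own.
: (query-string) ( string -- assoc )
    query>assoc [ nip ] assoc-filter
    [ dup string? [ 1array ] when ] assoc-map ;

! An empty query string now yields an empty assoc.
"" (query-string) assoc-empty? .       ! expected: t

! A parameter given once comes back wrapped in a one-element array...
"x" "x=2" (query-string) at .          ! expected: { "2" }

! ...so it has the same shape as a parameter given several times.
"a" "a=2&a=3" (query-string) at .      ! expected: { "2" "3" }
```

Because every value is now a sequence, callers can always use sequence words such as first on the result without checking its type first.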

Now that we have our building blocks, we can begin supporting the GET request. Let's start by designing the API. We want to parse the request method, handle the GET method, and return the parameters submitted. The REQUEST_METHOD and QUERY_STRING are available as environment variables:

: parse-get ( -- assoc )
    "QUERY_STRING" os-env "" or (query-string) ;

: <cgi-form> ( -- assoc )
    "REQUEST_METHOD" os-env "GET" or >upper {
        { "GET"  [ parse-get ] }
        [ "Unknown request method" throw ]
    } case ;

Since we will frequently need only the first value of each parameter (ignoring any subsequent values), we can provide a simpler variant for optional use:

: <cgi-simple-form> ( -- assoc )
    <cgi-form> [ first ] assoc-map ;
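The definitions above are shown without a USING: line, so here is one way they might be assembled and tried out without a web server. This is a sketch under assumptions: the vocabulary names are my reading of where each word lives in a recent Factor release (in releases from around the time of this post, >upper lived in unicode.case), and the request is simulated by setting the two environment variables by hand.

```factor
USING: arrays assocs combinators environment kernel prettyprint
sequences strings unicode urls.encoding ;
IN: cgi

: (query-string) ( string -- assoc )
    query>assoc [ nip ] assoc-filter
    [ dup string? [ 1array ] when ] assoc-map ;

: parse-get ( -- assoc )
    "QUERY_STRING" os-env "" or (query-string) ;

: <cgi-form> ( -- assoc )
    "REQUEST_METHOD" os-env "GET" or >upper {
        { "GET"  [ parse-get ] }
        [ "Unknown request method" throw ]
    } case ;

: <cgi-simple-form> ( -- assoc )
    <cgi-form> [ first ] assoc-map ;

! Simulate what the web server would set for a GET request.
"GET" "REQUEST_METHOD" set-os-env
"a=1&a=2&b=3" "QUERY_STRING" set-os-env

"a" <cgi-form> at .          ! expected: { "1" "2" }
"a" <cgi-simple-form> at .   ! expected: "1"
"b" <cgi-simple-form> at .   ! expected: "3"
```

The last three lines are only there for experimenting; in a real vocabulary file they would be removed so that loading the vocabulary has no side effects.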

Putting all of this together, we can build something useful: a Brainfuck interpreter accessible from a web page!

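USING: assocs brainfuck cgi formatting io kernel ;

"Content-type: text/html\n\n" write

! Use the submitted "code" parameter, or a default program when none is given.
! The default below is an assumption: a widely circulated "Hello World!" program.
"code" <cgi-simple-form> at
"""
++++++++++[>+++++++>++++++++++>+++>+<<<<-]
>++.>+.+++++++..+++.>++.<<+++++++++++++++.
>.+++.------.--------.>+.>.
""" or dup get-brainfuck

! Stack is now ( code output ): the first %s receives the code, the second the output.
! The closing tags and the <pre> block for the output are assumed.
"""
<form method='get'>
<textarea id="text" name="code" cols="80" rows="15">%s</textarea>
<input type="submit" value="Submit">
<input type="reset" value="Reset">
</form>
<pre>%s</pre>
""" printf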

If no code is submitted, this will happily compute and print "Hello World!"; otherwise it will run the submitted code and print its result.
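Before wiring the interpreter into a web page, it can help to try get-brainfuck on its own in the listener. The tiny program below is my own illustration rather than part of the script: it builds the number 65 in a memory cell and prints it as a character.

```factor
USING: brainfuck prettyprint ;

! 8 * 8 = 64 in the second cell, plus one more gives 65, the ASCII code for "A".
"++++++++[>++++++++<-]>+." get-brainfuck .   ! expected: "A"
```

Since get-brainfuck returns the program's output as a string instead of printing it, the CGI script is free to place that output wherever it likes in the generated HTML.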


otoburb said...

: (query-string) ( string -- assoc )
query>assoc [ nip ] assoc-filter
[ dup string? [ 1array ] when ] assoc-map ;

I don't get the "[ dup string? [ 1array ] when ] assoc-map" portion of the code?

If I feed this portion of the code H{ { "a" { "2" "3" } } } that code fragment outputs the same hashtable back to me in the listener. What is that part of the code supposed to do?

otoburb said...

Please ignore my comment above. You are returning single parameter values as sequences in the same manner that multiple values for the same parameter are also passed back as sequences -- exactly as you wrote it in the description.