Wednesday, January 20, 2010

Working with CGI: Part 1

While Factor can be used to develop many different kinds of programs, some uses are much less common than others.

One such case is its use to develop CGI scripts. The "Common Gateway Interface" (whose current version is "CGI/1.1") is the documented form of conventions for running programs from a web server, developed in the early 1990s. If you are curious, you can read RFC 3875 for more details.

The way it works is simple. The web server:

  1. receives an HTTP request
  2. parses the request headers and payload, and then
  3. runs your program, passing it the request details in environment variables (and any request body on standard input), which then
  4. writes a response to standard output that the server sends back to the client

When developing and testing CGI scripts, it is useful to understand the environment your program will run in. To see it, we can build a simple Factor program that prints the environment variables it is called with as an HTML page you can view in a web browser.

Our CGI script must be executable and, on a UNIX system (like Mac OS X or Linux), should start with a shebang line indicating which program should process the file's contents; here, that is the Factor interpreter. Note the space after the "#!": most interpreters don't require it, but Factor does:

#! /path/to/factor

The vocabularies that we will be using:

USING: assocs environment kernel io namespaces sequences
sorting ;

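The first thing a CGI script writes is its response headers, followed by a blank line that separates them from the body. When returning a document, the headers must include the type of content being returned. Your script could return any content, including images, audio, or video, but in this case we will just return HTML. Since print appends a newline of its own, one "\n" in the string is enough to produce that blank line:

"Content-type: text/html\n" print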

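To make the header-then-blank-line rule explicit, it can be wrapped in a word. As a minimal sketch, write-content-type below is a hypothetical helper (not part of Factor's library) that writes the header line with print and then uses nl to emit the blank line that ends the header block:

```factor
USING: io ;
IN: cgi-sketch

! Hypothetical helper (not part of Factor): write a CGI Content-type
! header, then the empty line that ends the header block.
! print ends the header line; nl writes the blank separator line.
: write-content-type ( type -- )
    "Content-type: " write print nl ;
```

In the script, "text/html" write-content-type could then take the place of the header line.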
We can then print the start of the HTML document and open the body, wrapping the output in a <pre> block:

"""
<html>
<head>
<title>Debug</title>
</head>
<body>
<pre>
""" print

Next, we will get all the environment variables available to our process and print them, sorted alphabetically:

os-envs >alist sort-keys [
    [ "<b>" write first write "</b>" write ]
    [ " = " write second write nl ] bi
] each
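Because this loop reads os-envs directly, its output depends on whatever environment happens to run it. One way to make the formatting easier to check, sketched here under that assumption, is the hypothetical word print-env-html below: it is the same loop, but it takes the name/value pairs as an input assoc, so it can be tried on made-up data:

```factor
USING: assocs io kernel sequences sorting ;
IN: cgi-sketch

! Hypothetical word: the same loop as in the script, but taking the
! name/value pairs as input instead of calling os-envs itself.
! Each pair is printed as "<b>NAME</b> = VALUE" on its own line.
: print-env-html ( assoc -- )
    >alist sort-keys [
        [ "<b>" write first write "</b>" write ]
        [ " = " write second write nl ] bi
    ] each ;
```

In the script, the loop would then become os-envs print-env-html.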

And then finish the HTML document with closing tags:

"""
</pre>
</body>
</html>
""" print

If you run this program from the shell, it prints your own user environment. When it is run by a web server, however, it prints the environment the server has set up for the CGI script. The CGI specification defines a set of environment variables that pass the HTTP request details to the program, and many servers (such as Apache) add some of their own. Some of the commonly used ones include:

DOCUMENT_ROOT
The root directory of your server
HTTP_COOKIE
The visitor's cookie, if one is set
HTTP_HOST
The hostname the visitor requested (from the Host header)
HTTP_REFERER
The URL of the page that linked to your program, if the browser sent one
HTTP_USER_AGENT
The browser type of the visitor
HTTPS
"on" if the request was made over a secure (SSL/TLS) connection
PATH
The search path for executables that your server is running with
QUERY_STRING
The part of the requested URL after the "?", if any
REMOTE_ADDR
The IP address of the visitor
REMOTE_HOST
The hostname of the visitor (if your server has reverse-name-lookups on; otherwise this is the IP address again)
REMOTE_PORT
The port on the visitor's machine that the connection comes from
REMOTE_USER
The visitor's username (for .htaccess-protected pages)
REQUEST_METHOD
The HTTP method of the request, such as GET or POST
REQUEST_URI
The path of the request, including any query string, as sent by the visitor
SCRIPT_FILENAME
The full pathname of the current CGI
SCRIPT_NAME
The interpreted pathname of the current CGI (relative to the document root)
SERVER_ADMIN
The email address for your server's webmaster
SERVER_NAME
Your server's fully qualified domain name
SERVER_PORT
The port number your server is listening on
SERVER_SOFTWARE
The server software you're using (e.g. Apache)
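A script usually needs only a few of these variables. As a sketch, the hypothetical words cgi-var and post-request? below read single variables with os-env, which returns f for a variable that is not set; falling back to an empty string is an assumption of this sketch, not something the CGI specification requires:

```factor
USING: environment kernel ;
IN: cgi-sketch

! Hypothetical word: look up one CGI variable by name. os-env
! returns f when the variable is not set, so fall back to "".
: cgi-var ( name -- value )
    os-env "" or ;

! Hypothetical predicate: was this request sent with POST?
: post-request? ( -- ? )
    "REQUEST_METHOD" cgi-var "POST" = ;
```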

This is useful for testing: you can simulate the request a web server would send to your CGI script by setting the appropriate environment variables yourself before running it. More to come on that later...
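As a minimal sketch of that idea, the hypothetical word simulate-get below sets REQUEST_METHOD and QUERY_STRING the way a server would for a GET request, then runs a quotation and captures what it prints with with-string-writer. It modifies the real process environment, so it is meant for throwaway test runs:

```factor
USING: environment io.streams.string kernel ;
IN: cgi-sketch

! Hypothetical helper: pretend the web server sent a GET request
! with the given query string by setting the variables it would set,
! then run quot and return everything it printed as a string.
! Note: set-os-env changes the real process environment.
: simulate-get ( query-string quot -- output )
    [
        "GET" "REQUEST_METHOD" set-os-env
        "QUERY_STRING" set-os-env
    ] dip with-string-writer ; inline
```

For example, "a=1" [ "QUERY_STRING" os-env print ] simulate-get returns "a=1\n". The word is marked inline so that Factor's compiler can see the quotation passed to with-string-writer.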