Web-Database Programming: CGI and Java Servlets

NOTE: This document assumes a basic knowledge of HTML. We will not be providing documentation for HTML coding apart from the creation of forms. There are dozens of tutorials available online. You might check out the NCSA Beginner's Guide to HTML.

Overview
Retrieving Input from the User

Forms
Server-Side Input Handling - CGI
Server-Side Input Handling - Java

Returning Output to the User

CGI Output
Java Output

Sample Code and Coding Tips

CGI Sample Code
CGI Setup
CGI Debugging
Java Sample Code
Java Compilation in Unix
Servlet Setup
Handling Special Characters

Overview

CGI or Common Gateway Interface is a means for providing server-side services over the web by dynamically producing HTML documents, other kinds of documents, or performing other computations in response to communication from the user. In this assignment, students who want to interface with the Oracle database using Oracle's Pro*C precompiled language will be using CGI.

Java Servlets are the Java solution for providing web-based services. They provide a very similar interface for interacting with client queries and providing server responses. As such, discussion of much of the input and output in terms of HTML will overlap. Students who plan to interface with Oracle using JDBC will be working with Java Servlets.

Both CGI and Java Servlets interact with the user through HTML forms. CGI programs reside in a special directory, or in our case, a special computer on the network (cgi-courses.stanford.edu), and provide service through a regular web server. Java Servlets are separate network object altogether, and you'll have to run a special Servlet program on a specific port on a Unix machine.

Retrieving Input from the User

Input to CGI and Servlet programs is passed to the program using web forms. Forms include text fields, radio buttons, check boxes, popup boxes, scroll tables, and the like.

Thus retrieving input is a two-step process: you must create an HTML document that provides forms to allow users to pass information to the server, and your CGI or Servlet program must have a means for parsing the input data and determining the action to take. This mechanism is provided for you in Java Servlets. For CGI, you can either code it yourself, find libraries on the internet that handle CGI input, or use the following example code that we put together for you: cgiparse.c.

Forms

Forms are designated within an HTML document by the fill-out form tag:

<FORM METHOD = "POST" ACTION = "http://form.url.com/cgi-bin/cgiprogram">
  ... Contents of the form ...
</FORM>

The URL given after ACTION is the URL of the CGI program (your program). The METHOD is the means of transferring data from the form to the CGI program. In this example, we have used the "POST" method, which is the recommended method. There is another method called "GET", but there are common problems associated with this method. Both will be discussed in the next section.

Within the form you may have anything except another form. The tags used to create user interface objects are INPUT, SELECT, and TEXTAREA.

The INPUT tag specifies a simple input interface:

<INPUT TYPE="text" NAME="thisinput" VALUE="default" SIZE=10 MAXLENGTH=20>

<INPUT TYPE="checkbox" NAME="thisbox" VALUE="on" CHECKED>

<INPUT TYPE="radio" NAME="radio1" VALUE="1">

<INPUT TYPE="submit" VALUE="done">

<INPUT TYPE="radio" NAME="radio1" VALUE="2" CHECKED>

<INPUT TYPE="hidden" NAME="notvisible" VALUE="5">

Which would produce the following form:

The different attributes are mostly self-explanatory. The TYPE is the variety of input object that you are presenting. Valid types include "text", "password", "checkbox", "radio", "submit", "reset", and "hidden". Every input but "submit" and "reset" has a NAME which will be associated with the value returned in the input to the CGI program. This will not be visible to the user (unless they read the HTML source). The other fields will be explained with the types:

"text" - refers to a simple text entry field. The VALUE refers to the default text within the text field, the SIZE represents the visual length of the field, and the MAXLENGTH indicates the maximum number of characters the textfield will allow. There are defaults to all of these (nothing, 20, unlimited).
"password" - the same as a normal text entry field, but characters entered are obscured.
"checkbox" - refers to a toggle button that is independently either on or off. The VALUE refers to the string sent to the CGI server when the button is checked (unchecked boxes are disregarded). The default value is "on".
"radio" - refers to a toggle button that may be grouped with other toggle buttons such that only one in the group can be on. It's essentially the same as the checkbox, but any radio button with the same NAME attribute will be grouped with this one.
"submit" and "reset" - these are the pushbuttons on the bottom of most forms you'll see that submit the form or clear it. These are not required to have a NAME, and the VALUE refers to the label on the button. The default names are "Submit Query" and "Reset" respectively.
"hidden" - this input is invisible as far as the user interface is concerned (though don't be fooled into thinking this is some kind of security feature -- it's easy to find "hidden" fields by perusing a document source or examining the URL for a GET method). It simply creates an attribute/value binding without need for user action that gets passed transparently along when the form is submitted.

The second type of interface is the SELECT interface, which includes popup menus and scrolling tables. Here are examples of both:

<SELECT NAME="menu">
  <OPTION>option 1
  <OPTION>option 2
  <OPTION>option 3
  <OPTION SELECTED>option 4
  <OPTION>option 5
  <OPTION>option 6
  <OPTION>option 7
</SELECT>

<SELECT NAME="scroller" MULTIPLE SIZE=7>
  <OPTION SELECTED>option 1
  <OPTION SELECTED>option 2
  <OPTION>option 3
  <OPTION>option 4
  <OPTION>option 5
  <OPTION>option 6
  <OPTION>option 7
</SELECT>

Which will give us:

The SIZE attribute determines whether it is a menu or a scrolled list. If it is 1 or it is absent, the default is a popup menu. If it is greater than 1, then you will see a scrolled list with SIZE elements. The MULTIPLE option, which forces the select to be a scrolled list, signifies that a more than one value may be selected (by default only one value can be selected in a scrolled list).

OPTION is more or less self-explanatory -- it gives the names and values of each field in the menu or scrolled table, and you can specify which are SELECTED by default.

The final type of interface is the TEXTAREA interface:

<TEXTAREA NAME="area" ROWS=5 COLS=30>
Mary had a little lamb.
A little lamb?
A little lamb!
Mary had a little lamb.
It's fleece was white as snow.
</TEXTAREA>

As usual, the NAME is the symbolic reference to which the input will be bound when submitted to the CGI program. The ROWS and COLS values are the visible size of the field. Any number of characters can be entered into a text area.

The default text of the text area is entered between the tags. Whitespace is supposedly respected (as between <PRE> HTML tags), including the newline after the first tag and before the last tag.

Server-Side Input Handling -- CGI

The form contents will be assembled into an encoded query string. Using the GET method, this string is available in the environment variable QUERY_STRING. It is actually passed to the program through the URL -- examine the URL for the first of the forms above:

http://asdf.asdf.asdf/asdf?thisinput=default&thisbox=on&radio1=2

Everything after the '?' is the query string. You'll see that a number of expressions appear concatenated by & symbols -- each expression assigns a string value to each form object. In this case, the text field named "thisinput" has the value "default", which is what was typed into the field, the checkbox "thisbox" has the value "on", and the radio button group "radio1" has the value "2" (the second button is checked -- note that this is the value I gave it, not a default value. The default is "on").

Let's look at another example from the second form:

http://zxcv.zxcv.zxcv/zxcv?menu=option+4&scroller=option+1&scroller=option+2

The menu has option 4 selected, and the scroller has option 1 and option 2 selected. Note that spaces are converted to '+' symbols in the URL string. The character '+' is converted to its hex value %2B. Other characters similarly converted are & (to %26), % (to %25), and $ (to %24). This conversion is automatic.

Using GET is not recommended, however. Some systems will truncate the URL before passing it to the CGI program, and thus the QUERY_STRING environment variable will contain only a prefix of the actual query string. Instead, you should use the POST method.

The POST query string is encoded in precisely the same form as the GET query string, but instead of being passed in the URL and read into the QUERY_STRING variable, it is given to the CGI program as standard input, which you can thus read using ANSI functions or regular character reading functions. The only quirk is that the server will not send EOF at the end of the data. Instead, the size of the string is passed in the environment variable CONTENT_LENGTH, which can be accessed using the normal stdlib.h function:

   char *value;
   int length;

   value = getenv("CONTENT_LENGTH");
   sscanf(value, "%d", &length);

Decoding the data is thus just a question of walking through the input and picking out the values. These values can then be used to determine what the user wants to see.

We have written a very simple, linear-search-based mechanism for parsing the input string. These are located, as mentioned above, at cgiparse.c. You might want to cut and paste these into your own code or to use the .h file provided. You can use this in your CGI programs by calling Initialize() at the beginning of your code, and then calling GetFirstValue(key) and GetNextValue(key) to retreive the bindings for each of the FORM parameters. See the comments in the file for more details.

Server-Side Input Handling -- Java

Java handles GET and POST slightly differently. The parsing of the input is done for you by Java, so you are separated from the actual format of the input data completely. Your program will be an object subclassed off of HttpServlet, the generalized Java Servlet class for handling web services.

Servlet programs must override the doGet() or doPost() messages, which are methods that are executed in response to the client. There are two arguments to these methods, HttpServletRequest request and HttpServletResponse response. Let's take a look at a very simple servlet program, the traditional HelloWorld (this time with a doGet method):

import java.io.*;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Hello extends HttpServlet {
   public void doGet(HttpServletRequest request,
      HttpServletResponse response)
   throws IOException, ServletException {
      response.setContentType("text/html");
      PrintWriter out = response.getWriter();
      out.println("<html>");
      out.println("<head>");
      String title = "Hello World";
      out.println("<title>" + title + "</title>");
      out.println("</head>");
      out.println("<body bgcolor=white>");
      out.println("<h1>" + title + "</h1>");
      String param = request.getParameter("param");
                
      if (param != null)
         out.println("Thanks for the lovely param='" + param + "' binding.");

      out.println("");
      out.println("");
   }
}

We'll discuss points in this code again in the section on Java Output, but for now, we will focus on the input side. The argument HttpServletRequest request represents the client request, and the values of the parameters passed from the HTML FORM can be retrieved by calling the HttpServletRequest getParameter method. This method takes as its argument the name of the parameter (the name of the HTML INPUT object), and returns as a Java String the value assigned to the parameter. In cases where the parameter may have multiple bindings, the method getParameterValues can be used to retrieve the values in an array of Java Strings -- note that getParameter will return the first value of this array. It is through these mechanisms that you can retrieve any of the values entered or implicit in the form.

As might be inferred from the example above, Java returns null if the parameter for whose name you request does not have a value. Recall that unchecked buttons' bindings are not passed in a POST message -- you can check for null to determine when buttons are off.

Returning Output to the User

In your project, you are going to be concerned with returning HTML documents to the user. The documents will be dynamically created based on the output of the query. You can format it however you like, using ordinary HTML formatting routines

CGI Output

The only work you have to do apart from constructing an HTML document on the fly with the output from the query is to add a short header at the top of the file. Your header will represent the MIME type for HTML, and consists of a single line of text followed by a blank line:

content-type: text/html

<HTML> ... file ... </HTML>

There are, of course, many other types that you can return, but this is all you'll need to return your database queries.

CGI returns the HTML document to the user through standard output from the program, so you can just use a regular printf function in your C programs. The format for setting the content type is just:

printf("content-type: text/html\n\n");

Java Output

Let's look back at our Java code example. You'll see a number of differences between the Servlet code and the CGI approach. Output is all handled by the HttpServletResponse object, which allows you to set the content type through the setContentType method. Instead of printing the HTTP header yourself, you tell the HttpServletResponse object that you want the content type to be "text/html" explicitly.

All HTML is returned to the user through a PrintWriter object, that is retrieved from the response object using the getWriter method. HTML code is then returned line by line using the println method.

Assuming that you all have a basic background in Java, so we won't provide a detailed treatment of exceptions here, but do note that IOException and ServletException both must either be handled or thrown.

Sample Code and Coding Tips

I recommend that everyone attempt to play around a little bit with both of the methods, Java Servlet and CGI, if you have the time and inclination (though you only have to implement your database interface in one of them, of course).

CGI Sample Code

Here is a demonstration of a PRO*C CGI program.
You can also check out the source code.
The HTML page demonstrates a few input features, though the only ones that do anything are the username and password fields. These are used to log onto your Oracle account when the CGI program is executed, create a table, do some insertions, demonstrate the production of HTML formatting through queries on the data (including a demonstration of constructing a new form, which may provide some of you with ideas of how to make a really cool interface), and then drop the table from your database. You may freely cannibalize whatever portions you find useful.

CGI Setup

Your CGI script will be run from cgi-courses.stanford.edu. The URL for your CGI executable will be: http://cgi-courses.stanford.edu/~username/cgi-bin/scriptname

You will need to perform the following actions before a CGI program will run:

Get an account on cgi-courses.stanford.edu.
Create a directory in your home folder to hold your cgi binary executables:
```
mkdir cgi-bin
```
Set access levels on your cgi-bin directory for your cgi-courses.stanford.edu account ("username" should be replaced with your username):
```
fs setacl cgi-bin username.cgi write
```
Make sure that your executeables correctly set all environment variables (normally this is done by /usr/class/cs145/all.env, but this is not available on the cgi machine, so you have to do it explicitly). Here is an example function that you should run before attempting to connect to the database (this function is in C, but you can pretty much just lift the settings and paste them into Perl or PHP, which also need them to connect to the database):
```
      void SetEnvs(void) {
        putenv("ORACLE_SID=SHR1_PRD");
        putenv("ORACLE_HOME=/usr/pubsw/apps/oracle/8.1.7");
        putenv("ORACLE_TERM=xsun5");
        putenv("TNS_ADMIN=/usr/class/cs145/sqlnet");
        putenv("TWO_TASK=SHR1_PRD");
      }
```
Move the executable into your cgi-bin folder.
Change the permissions on your cgi executable:
```
chmod 701 scriptname
```
Use HTML forms to access your new program at http://cgi-courses.stanford.edu/~username/cgi-bin/scriptname.

Here is the homepage of the leland CGI service, which has a FAQ and gives some information about the capabilities of the system. Please check here first if your CGI programs are giving you errors.

CGI Debugging

Due to popular demand, a new cgi debugging feature was just added to the cgi service. It's not in the leland CGI docs yet. If you access your script like so:

http://cgi-courses/cgi-bin/sboxd/~username/scriptname

The script will execute with extra debug info:

All STDERR goes to the browser
A header is included, so lack of any output or lack of Content Type will not cause Internal Server Error.

If still receiving Internal Server Error, consult the cgi FAQ or look in the server log: http://cgi-courses/logs/error_log.

Note, the log shows only several recent entries, due to system issues.

An alternative method is to run your cgi program from command-line, without using the web browser. Put your CGI input into the environment variable QUERY_STRING and run your program. For example (assuming your program is called cgiprog and expects two parameters name1 and name2):

cd ~/cgi-bin
setenv QUERY_STRING 'name1=abc&name2=def'
cgiprog

Note: If you want to use debugging tools such as dbx or gdb, you need to modify Makefile to add the flag -g after cc or g++.

Java Sample Code

You can provide your HTML FORMs on permanent webpages in your personal WWW directory -- though this isn't recommended because you then have to hard code the Servlet addresses -- or in the webpages subdirectory where you run your Servlet (see below in the Servlet setup section). Alternatively (or additionally) you can integrate FORMs into Servlets by creating a FORM on the fly in your Servlet program, which will be invoked when doPost() or doGet() are invoked by the client. An example of a program that creates a FORM on the fly can be found at RequestParamExample.java.

An example that uses JDBC to implement an interface for querying information about a certain US state (based on the JDBC example programs provided in PDA assignment 5) can be found at StateQuerier.java.

The following two examples implement the state query, but it separates the query form from the answer form, providing these services with two different Servlets: StateQueryForm.java and StateQueryAns.java.

You can find the very simple example given above in the text at Hello.java.

One last example demonstrates the concept of a Session, which we do not cover in this handout, but you can use to liven up your interface can be found at HelloSession.java.

Java Compilation in Unix

Compiling Servlets in UNIX requires a few changes to your PATH and CLASSPATH environment variables. These changes have been made for you in the source file /afs/ir/class/cs145/all.env. They include the following additions:

setenv PATH /afs/ir/class/cs145/jsdk2.1:/usr/pubsw/apps/jdk1.2/bin:${PATH}
setenv CLASSPATH /afs/ir/class/cs145/jsdk2.1/servlet.jar:$CLASSPATH

If there are any difficulties, let us know. These have been tested on the elaine machines and are assumed to be operational on the leland Sparc machines (elaine, myth, epic, saga).

You also have to set up a specific directory structure to provide Servlets. The directory structure required by Servlets is essentially:

[anydir]
   [servletdir]
      webpages
         WEB-INF
            servlets

A shell script to build this hierarchy is provided at /afs/ir/class/cs145/code/bin/buildServletDirectory (after you run source /afs/ir/class/cs145/all.env (which you probably should just add to your .cshrc file), you can run buildServletDirectory by just typing the command).

You can store .html documents in your webpages directory, and they will be accessible at your Servlet address (see below), while all Servlets you write have to be located in the servlets directory to be recognized.

Further information on the Java Servlet API can be found at Servlet Package Documentation page.

Servlet Setup

The directory structure for your servlets and HTML documents was outlined in the previous section. Static HTML documents may be placed in the webpages directory and are accessible from the web at the address http://machinexx:portnum/page.html, where machinexx refers to the machine from which you're running the webserver (e.g. elaine12, saga22, myth7, etc.), portnum is a specific port (see below), and page.html is the name of the HTML page that you are serving. You may find it useful to create a static HTML document or a hierarchy of static documents to serve as the jumping off point for your Servlets, where your HTML FORMs that start the interaction with the database are found.

Servlets will be found in the directory servletdir/webpages/WEB-INF/servlets, and will just be the .class files that you compile from your .java files using javac. These may be reached on the web using the URL http://machinexx:portnum/servlet/servletname. Note that the servlet directory is singular in the URL but plural in Unix, while the Servlet itself loses its .class in the URL. HTML and other documents contained in the servlets directory cannot be accessed over the web.

Once you have your directory set up and your Servlets compiled, you have to run the Java JSDK 2.1 webserver manually on a specific leland machine in order to provide these documents over the web. The steps involved in starting the server are as follows:

Choose a port number in the range 5000-65000. This will bind your server application to that port for the machine on which you're running your server. Try to choose a random number and remember it -- you will be the only person on that machine who can use that port, and you will need it to have access over the web.
From the root of your servlet directory (if you run our buildServletDirectory script, then it will be called servletdir), start the server by calling startserver -port portnum from the Unix command line, where portnum is the port number you chose above. The server will begin in the background, and you can see it using the ps command. If you do not enter a port number, the default port number, 8080, will be chosen for you (you can actually set the default yourself -- after you've run the server once, it will create a configuration file called "default.cfg" for you -- it finds the default port number here).
From your browser, enter the URL of a webpage or servlet contained in your servletdir hierarchy using the address structure mentioned above. Now you can play with your interface.
If you would like to stop the server, issue the command stopserver.
If you want to recompile your servlets, you have to stop the server and restart it again. Static HTML pages that you are hosting from the webpages directory, however, can be changed at will.

Handling Special Characters

The special characters &, <, and >, need to be escaped as &, <, and >, respectively in HTML text (see NCSA Beginner's Guide to HTML). Moreover, special characters appearing in URL's need to be escaped, differently than when they appear in HTML text. For example, if you link on text with special characters and want to embed them into extended URLs as parameter values, you need to escape them: convert space to + or %20, convert & to %26, convert = to %3D, convert % to %25, etc. (In general, any special character can be escaped by a percent sign followed by the character's hexadecimal ASCII value.) Important: Do NOT escape the & that actually separates parameters! For example, if you want two parameters p1 and p2 to have the values 3 and M&M, you should write something like:

http://cgi-courses.stanford.edu/~username/cgi-bin/cgiprog?p1=3&p2=M%26M

Be careful not to confuse the escape strings for HTML text with those for URL's.

This document was written by Nathan Folkert (with help from Vincent Chu) for Prof. Jennifer Widom's CS145 class in Spring 2000; revised by Calvin Yang for Prof. Widom's CS145 class in Spring 2002.