[Chapter 8] Forms

HTML: The Definitive Guide

Forms

Contents:
Form Fundamentals
Form Input Elements
Multiline Text Areas
Multiple Choice Elements
Creating Effective Forms
Forms Programming

Forms, forms, forms, forms: we fill 'em out for nearly everything, from the moment we're born, 'til the moment we die. So what's to explain all the hoopla and excitement over HTML forms? Simply this: they make HTML truly interactive.

When you think about it, except for the limited input from users available through the <isindex> tag, HTML's interactivity is basically a lot of button pushing: click here, click there, go here, go there; there's no real user feedback, and it's certainly not personalized. Applets provide extensive user-interaction capability, but they can be difficult to write and are still not standardized for all browsers. Forms, on the other hand, are supported by almost every browser and make it possible to create documents that collect and process user input, and formulate personalized replies.

This powerful mechanism has far-reaching implications, particularly for electronic commerce. It completes an online catalog by giving buyers a way to immediately order products and services. It gives nonprofit organizations a way to sign up new members. It gives market researchers a way to collect user data. It gives you an automated way to interact with your HTML document readers.

Mull over the ways you might want to interact with your readers while we take a look at both the client- and server-side details of creating forms.

8.1 Form Fundamentals

Unlike the <isindex> tag, the <form> tag may appear more than once in a single document. And unlike with an <isindex> document, users can ignore the embedded forms, reading content and interacting with the document's links just as with a form-less document. [<isindex>, 6.6.1]

Forms consist of one or more text-input boxes, clickable (radio) buttons, multiple-choice checkboxes, and even pull-down menus and clickable images, all placed inside the <form> tag. Within a form, you may also put regular body content, including text and images. The text is particularly useful for providing instructions to the users on how to fill out the form and for form element labels and prompts.

Once users fill out the various fields in the form, they click a special "Submit" button (or, sometimes, press the Return key) to submit the form to a server. The form-supporting browser packages up the user-supplied values and choices and sends them to a server.[1] The server then passes the information along to a supporting program or application that processes the information and creates a reply, usually in HTML. The reply may be a simple thank you, or it may prompt the user to fill out the form correctly or to supply missing fields. The server sends the reply to the browser client, which presents it to the user.

[1] Some browsers, Netscape in particular, may also encrypt the information, securing it from credit-card thieves, for example. However, the encryption facility must also be supported on the server side: contact the browser manufacturer for details.

The server-side data-processing aspects of forms are not part of the HTML standard; they are defined by the server's software. While a complete discussion of server-side forms programming is beyond the scope of this book, we'd be remiss if we did not include at least a simple example to get you started. To that end, we've included at the end of this chapter a few skeletal programs that illustrate the common styles of server-side forms programming.

The <form> Tag

You place a form anywhere inside the body of an HTML document with its elements enclosed by the <form> tag and its respective end tag </form>. You may, and we recommend you often do, include regular body content inside a form to specially label user-input fields and to provide directions, for example.

Browsers flow the special form elements into the containing paragraphs as if they were small images embedded into the text. There aren't any special layout rules for form elements, so you need to use other HTML elements, like the <br> and <p> tags, to control the placement of elements within the text flow. [<p>, 4.1.2] [<br>, 4.7.1]

All of the form elements within a <form> tag make up a single form. The browser sends all of the values of these elements (blank, default, or user-modified) when the user submits the form to the server.

You must define at least two special form attributes, which provide the name and address of the form's processing server and the method by which the parameters are to be sent to the server. A third, optional attribute lets you change how the parameters get encoded for transmission over the network.

The action attribute

The required action attribute for the <form> tag gives the URL of the application that is to receive and process the form's data.

Most webmasters keep their forms-processing applications in a special directory on their Web server, usually named cgi-bin, which stands for Common Gateway Interface[2] binaries. Keeping these special forms-processing programs and applications in one directory makes it easier to manage and secure the server.

[2] The Common Gateway Interface (CGI) defines the protocol by which servers interact with programs that process form data.

A typical <form> tag with the action attribute looks like this:

<form action="http://www.kumquat.com/cgi-bin/update">
...
</form>

The example URL tells the browser to contact the server named www.kumquat.com and pass along the user's form values to the application named update located in the cgi-bin directory.

In general, if you see a URL that references a document in a directory named cgi-bin, you can be pretty sure that the document is actually an application that creates the desired page dynamically each time it's invoked.

The enctype attribute

The browser specially encodes the form's data before it passes that data to the server so it does not become scrambled or corrupted during the transmission. It is up to the server to either decode the parameters or to pass them, still encoded, to the application.

The standard encoding format is the Internet Media Type named "application/x-www-form-urlencoded." You can change that encoding with the optional enctype attribute in the <form> tag. If you do elect to use an alternative encoding, the only other supported format is "multipart/form-data."

The reality is that you'll rarely if ever see the enctype attribute used. The only format common among the popular browsers and Web servers is the default application/x-www-form-urlencoded. Netscape is the only browser that currently supports the multipart/form-data alternative, which is required only for those forms that contain file-selection fields. Unless your forms need file-selection fields, you probably should ignore this attribute and simply rely upon the browser and your processing server to use the default encoding type. [file-selection fields, 8.2.2.3]

The standard encoding--application/x-www-form-urlencoded--converts any spaces in the form values to a plus sign (+), nonalphanumeric characters into a percent sign (%) followed by two hexadecimal digits that are the ASCII code of the character, and the line breaks in multiline form data into %0D%0A.

The standard encoding also includes a name for each field in the form. (A "field" is a discrete element in the form, whose value can be nearly anything, from a single number to several lines of text, such as the user's address.) Each field is sent as a name=value pair, and the pairs are separated by ampersands ("&"). If a field has more than one value, each value is sent as its own pair under the same name.

For example, here's what the browser sends to the server after the user fills out a form with two input fields labeled name and address; the former field has just one line of text, while the latter field has several lines of input:

name=O%27Reilly+%26+Associates&address=103+Morris+Street%0D%0A
Sebastopol%2C%0D%0ACA+95472

We've broken the value into two lines for clarity in this book, but in reality, the browser sends the data in an unbroken string. The value of the name field is "O'Reilly & Associates" and the value of the address field, complete with embedded newline characters, is:

103 Morris Street
Sebastopol,
CA 95472
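
The encoding rules above can be tried out with a short script. What follows is a minimal sketch using Python's standard urllib.parse module, not the code any particular browser runs; the field values are taken from the example. Note that the library also escapes the apostrophe, the ampersand, and the comma inside the values, so an ampersand that is part of a value cannot be mistaken for a field separator.

```python
from urllib.parse import urlencode, parse_qs

# Field values from the example above; "\r\n" is the line break a
# multiline field carries (it becomes %0D%0A on the wire).
fields = {
    "name": "O'Reilly & Associates",
    "address": "103 Morris Street\r\nSebastopol,\r\nCA 95472",
}

# Encode: spaces become "+", other nonalphanumeric characters become %XX,
# and the name=value pairs are joined with "&".
encoded = urlencode(fields)
print(encoded)

# Decode, as a server or form-processing application would.
# parse_qs returns a list of values for each field name.
decoded = parse_qs(encoded)
print(decoded["address"][0])
```

Decoding the string restores the original values, including the embedded line breaks in the address.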

The multipart/form-data encoding encapsulates the fields in the form as several parts of a single MIME-compatible compound document. Each field has its own section in the resulting file, set off by a standard delimiter. Within each section, one or more header lines define the name of the field, followed by one or more lines containing the value of the field. Since the value part of each section can contain binary data or otherwise unprintable characters, no character conversion or encoding occurs within the transmitted data.

This encoding format is by nature more verbose and longer than the application/x-www-form-urlencoded format. As such, it can only be used when the method attribute of the <form> tag is set to post, as described below.

A simple example makes it easy to understand this format. Here's our previous example, when transmitted as multipart/form-data:

------------------------------146931364513459
Content-Disposition: form-data; name="name"
  
O'Reilly & Associates
------------------------------146931364513459
Content-Disposition: form-data; name="address"
  
103 Morris Street
Sebastopol,
CA 95472
------------------------------146931364513459--

The first line of the transmission is the delimiter that will appear before each section of the document. The browser chooses the delimiter; in this example it consists of thirty dashes and a long random number that distinguishes it from other text that might appear in actual field values.

The next lines contain the header fields for the first section. There will always be a Content-Disposition field indicating that this section contains form data and providing the name of the form element whose value is in this section. You may see other header fields; in particular, some file-selection fields include a Content-Type header field that indicates the type of data contained in the file being transmitted.

After the headers, there is a single blank line followed by the actual value of the field on one or more lines. The section concludes with a repeat of the delimiter line that started the transmission. Another section follows immediately, and the pattern repeats until all of the form parameters have been transmitted. The end of the transmission is indicated by an extra two dashes at the end of the last delimiter line.
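
To make the section structure concrete, here is a minimal sketch of how a receiving program might take such a transmission apart. It is a hand-rolled illustration, not a production parser: the function name parse_multipart is hypothetical, and the code assumes that lines end with a carriage return and line feed, that the full delimiter line is already known, and that every value is plain text.

```python
import re

CRLF = "\r\n"

def parse_multipart(body, delimiter):
    """Return {field name: value} from a multipart/form-data body.

    `delimiter` is the full delimiter line, as in the example above.
    """
    fields = {}
    # Everything between two delimiter lines is one section.
    for section in body.split(delimiter)[1:]:
        if section.startswith("--"):
            break  # two extra dashes mark the final delimiter
        # Drop the line break after the delimiter and the one before the next.
        section = section[len(CRLF):-len(CRLF)]
        # A blank line separates the header lines from the value.
        headers, _, value = section.partition(CRLF + CRLF)
        match = re.search(r'\bname="([^"]*)"', headers)
        if match:
            fields[match.group(1)] = value
    return fields

# Rebuild the example transmission: thirty dashes plus the random number.
delimiter = "-" * 30 + "146931364513459"
body = CRLF.join([
    delimiter,
    'Content-Disposition: form-data; name="name"',
    "",
    "O'Reilly & Associates",
    delimiter,
    'Content-Disposition: form-data; name="address"',
    "",
    "103 Morris Street",
    "Sebastopol,",
    "CA 95472",
    delimiter + "--",
    "",
])
form = parse_multipart(body, delimiter)
print(form["name"])
```

Because the values are copied through untouched, the ampersand and the line breaks arrive exactly as the user typed them.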

As we pointed out earlier, use multipart/form-data encoding only when your form contains a file-selection field. Here's an example of how the transmission of a file-selection field might look:

------------------------------146931364513459
Content-Disposition: form-data; name="thefile"; filename="test"
Content-Type: text/plain
  
First line of the file
...
Last line of the file
------------------------------146931364513459

The only notable difference is that the Content-Disposition field contains an extra element, "filename," that defines the name of the file being transmitted. There might also be a Content-Type field to further describe the file's contents.

The method attribute

The other required attribute for the <form> tag sets the method by which the browser sends the form's data to the server for processing. There are two ways: the POST method and the GET method.

With the POST method, the browser sends the data in two steps: the browser first contacts the form-processing server specified in the action attribute, and once contact is made, sends the data to the server in the body of the request, separate from the URL.

On the server side, POST-style applications are expected to read the parameters from a standard location once they begin execution. Once read, the parameters must be decoded before the application can use the form values. Your particular server will define exactly how your POST-style applications can expect to receive their parameters.

The GET method, on the other hand, contacts the form-processing server and sends the form data in a single transmission step: the browser appends the data to the form's action URL, separated by the question mark (?) character.
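
As an illustration of the difference on the server side, here is a minimal sketch that assumes a server following the Common Gateway Interface conventions; your server may hand over the data differently. Under CGI, POST data arrives on the program's standard input, with its length in the CONTENT_LENGTH environment variable, while GET data arrives in QUERY_STRING. The function name read_form_data and the sample values are hypothetical.

```python
import io
from urllib.parse import parse_qs

def read_form_data(environ, stdin):
    """Return decoded form fields for a CGI-style request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the encoded data arrives on standard input;
        # CONTENT_LENGTH says how many bytes to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the server takes the data from the URL (after the "?")
        # and passes it in QUERY_STRING.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw, keep_blank_values=True)

# Simulated requests; a real program would pass os.environ and sys.stdin.buffer.
post_body = b"x=27&y=33"
post_form = read_form_data(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(post_body))},
    io.BytesIO(post_body),
)
get_form = read_form_data(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "x=27&y=33"},
    io.BytesIO(b""),
)
print(post_form, get_form)
```

Either way, the application ends up with the same decoded fields; only the place it reads them from differs.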

The common browsers transmit the form information by either method; some servers receive the form data by only one or the other method. You indicate which of the two methods--POST or GET--your forms-processing server handles with the method attribute in the <form> tag. Here's the complete tag including the GET transmission method attribute for the previous form example:

<form method=GET 
   action="http://www.kumquat.com/cgi-bin/update"> 
  ...
</form>

Which one should you use if your form-processing server supports both the POST and GET methods? Here are some rules of thumb:

  • For best form-transmission performance, send small forms with a few short fields via the GET method.

  • Because some server operating systems limit the number and length of command-line arguments that can be passed to an application at once, use the POST method to send forms that have many fields, or ones that have long text fields.

  • If you are inexperienced in writing server-side form-processing applications, choose GET. The extra steps involved in reading and decoding POST-style transmitted parameters, while not too difficult, may be more work than you are willing to tackle.

  • If security is an issue, choose POST. GET places the form parameters directly in the application URL where they easily can be captured by network sniffers or extracted from a server log file. If the parameters contain sensitive information like credit card numbers, you may be compromising your users without their knowledge. While POST applications are not without their security holes, they can at least take advantage of encryption when transmitting the parameters as a separate transaction with the server.

If you want to invoke the server-side application outside the realm of a form, including passing it parameters, use GET because it lets you include form-like parameters as part of a URL. POST-style applications, on the other hand, expect an extra transmission from the browser after the URL, something you can't do as part of a conventional <a> tag.

Passing parameters explicitly

The foregoing bit of advice warrants some explanation. Suppose you had a simple form with two elements, named x and y. When the values of these elements are encoded, they look like this:

x=27&y=33

If the form uses method=GET, the URL used to reference the server-side application looks something like this:

http://www.kumquat.com/cgi-bin/update?x=27&y=33
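
The same URL can be assembled, and taken apart again, in a few lines. This is only a sketch using Python's standard urllib.parse module; the variable names are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

action = "http://www.kumquat.com/cgi-bin/update"
values = {"x": 27, "y": 33}

# GET: the browser appends "?" and the encoded values to the action URL.
url = action + "?" + urlencode(values)
print(url)

# The receiving side can split the URL again to recover the parameters.
query = urlsplit(url).query
params = parse_qs(query)
print(params)
```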

There is nothing to keep you from creating a conventional <a> tag that invokes the application with any parameter value you desire, like so:

<a href="http://www.kumquat.com/cgi-bin/update?x=19&y=104">

The only hitch is that the ampersand that separates the parameters is also the character-entity insertion character. When placed within the href attribute of the <a> tag, the ampersand will cause the browser to replace the characters following it with a corresponding character entity.

To keep this from happening, you must replace the literal ampersand with its entity equivalent, either &#38; or &amp;. With this substitution, our example of the nonform reference to the server-side application looks like this:

<a href="http://www.kumquat.com/cgi-bin/update?x=19&amp;y=104">
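
If you generate such links from a program, the substitution need not be done by hand. The following is a minimal sketch, assuming Python and its standard html and urllib.parse modules; it is one way to produce the escaped tag, not the only one.

```python
from html import escape
from urllib.parse import urlencode

action = "http://www.kumquat.com/cgi-bin/update"
url = action + "?" + urlencode({"x": 19, "y": 104})

# escape() turns "&" into "&amp;" (and also handles <, >, and quotes),
# so the URL is safe to place inside an href attribute.
link = '<a href="' + escape(url, quote=True) + '">'
print(link)
```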

Because of the potential confusion that arises from having to escape the ampersands in the URL, server implementors are encouraged to also accept the semicolon (;) as a parameter separator. You might want to check your server's documentation to see if it honors this convention. See Appendix D, Character Entities.

A simple example

In a moment we'll examine each element of a form in detail. Let's first take a quick look at a simple example to see how forms are put together.

This one (shown in Figure 8-1) gathers basic demographic information about a user:

<form method=POST action="http://www.kumquat.com/demo">
  Name: 
    <input type=text name=name size=32 maxlength=80>
  <p>
  Sex: 
    <input type=radio name=sex value="M"> Male 
    <input type=radio name=sex value="F"> Female
  <p>
  Income: 
    <select name=income size=1>
      <option>Under $25,000
      <option>$25,001 to $50,000
      <option>$50,001 and higher
    </select>
  <p>
  <input type=submit>
</form>

The first line of the example starts the form and indicates we'll be using the POST method for data transmission to the form-processing server. The form's user-input elements follow, most defined by an <input> tag and type attribute; the menu is defined by a <select> tag. There are four elements in the simple example, each contained within its own paragraph.

The first element is a conventional text-entry field, letting the user type in up to 80 characters, but displaying only 32 of them at a time. The next element is a multiple-choice option, which lets the user select only one of two radio buttons. This is followed by a pull-down menu for choosing one of three options. The final element is a simple submission button, which, when clicked by the user, sets the form's processing in motion.
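
To see what this form would hand to the server, here is a minimal sketch of the encoded data for one hypothetical set of answers. The user's name and choices are invented for illustration. The sketch assumes two standard browser behaviors: an <option> without a value attribute sends its text, and a submit button without a name sends nothing.

```python
from urllib.parse import urlencode

# Hypothetical answers a user might give in the form above.
submission = [
    ("name", "Jane Doe"),
    ("sex", "F"),                      # value of the selected radio button
    ("income", "$50,001 and higher"),  # text of the selected <option>
]

# The unnamed submit button contributes no field of its own.
body = urlencode(submission)
print(body)
```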

