Hints for Web Authors

By Warren Steel

I've been working on World Wide Web documents since spring 1994, when I attended a seminar at the Mississippi Center for Supercomputing Research (MCSR). At the University of Mississippi I manage a web site for the department of music with some 120 HTML documents, a site for Sacred Harp singing, and a personal site.

I find the Web world to be complex and in rapid flux. I began web authorship without much to go on, working by trial and error until I had acceptable results. Now I follow the discussions in the newsgroup comp.infosystems.www.authoring.html and try to keep up with changing standards and new browsers, while learning how best to make our documents accessible, clear, and attractive to all who browse them. To this end, I'm always making changes in my documents; in the same spirit, I'd like to offer a few suggestions, in hopes that others may find 'em useful, or reply with hints of their own. This document, made in October 1995, is no longer updated, and has not been revised since 2003.

  1. Introduction
  2. The meaning of authorship
  3. Portable documents
  4. HTML 3.2
  5. HTML 4.0
  6. Links to references cited

1. Introduction

I have two cardinal rules of web authorship. (1) The Web is and should be platform-independent. Documents will convey the same information to users who have various operating systems (Unix, CMS, Windows, NT, Mac), various browsers (graphic, text-only, braille or speaking machines for the blind, etc.) and other devices (webcrawlers, searchers, indexers), and various user settings (monitor resolution, window sizes, fonts, colors, graphics turned on/off). (2) HTML is a content markup language, not a desktop publishing or page presentation environment. Many questions in the authoring newsgroup come from people with desktop publishing experience who want to know "how to do" something like animations, background sounds, fancy fonts and layouts, scrolling marquees, hit counters, or the like, but have no idea how to organize paragraphs, headings, lists, and images for varied platforms and displays. If you start out with good content, you can use tables, images, and other elements to enhance the appearance dramatically. If you start with a "look and feel" concept, it may be too late to pour in coherent content.

Part 2 discusses the concept of authorship, and explains why the web author cannot be obsessed with the final appearance of a web document. Part 3 offers suggestions for making web documents truly portable, without making them look dorky. Parts 4 and 5 discuss the implications of new browsers and new HTML standards, and include suggestions for improving your current web documents so that they can survive the new versions.

2. The meaning of authorship

A web author is not a programmer, nor a typographer, nor a graphic designer. Since the days of Gutenberg, authors have learned to "let go" of their cherished work ("Farewell, sweet book") when they deliver their manuscript to the publisher. The author's manuscript may include chapters, paragraphs, headings, tables, and illustrations, all clearly marked, but it is the editor who chooses the paper, page design, fonts, and other characteristics, according to a "house style." In the same way, a web author prepares a document by marking up the elements, and then "sends it to the publisher" by placing it on a web server. The function of the editor is shared between the browser, which renders the text and graphics on the available hardware, and the human being who views the document. It is the user who configures the browser by choosing the fonts, sizes, and colors and other features of the onscreen appearance. This is the great strength of the World Wide Web. On a non-graphic browser, the user can view text descriptions in place of the invisible images. On a graphic browser, a nearsighted user can control the size of the fonts; a colorblind user can choose colors that offer enough contrast for legibility. A user on a slow line, say a dialup, can disable graphic loading, only displaying graphics individually when they contain essential information. A blind user can listen to a web document when rendered by a speech synthesizer, or read it by means of a braille browser. In every case, the structure of the document is the same, but the renderings are customized for the individual.

Hypertext Markup Language (HTML) is a simple but ingenious concept. If you use it for the purpose for which it was designed, you will learn to love its simplicity and flexibility. If you try to use it for a page description language, or a desktop publishing program, you will learn to hate it, and will be doomed to endless frustration. If the exact appearance of a document is of paramount importance, you have other options—you can scan it and create a bitmapped image of your page, or you can offer it in a sophisticated page description language such as Postscript, which can be viewed on an appropriate viewer. In either case, you will have lost the flexibility in rendering that is the chief advantage of HTML.

3. Portable documents

If you want to reach as wide an audience as possible, you must try to be friendly. If you begin by saying "Internet Explorer 6.0 Enhanced! Get it or get out!" you've dismissed, and maybe offended, people who for reasons of their own are using another browser. Make your message the issue, not the medium. Hypertext markup standards are not religious dogma; they're a common language for communicating. If you follow (with caution) the HTML 4.0 recommendation, you can be reasonably sure that every browser can display your document acceptably. If you use non-standard tags, such as Netscape or Microsoft extensions, you have no guarantee they will work in future versions of Netscape or of any other software; many people with Netscape 1.1 "enhancements" were forced to change their documents to make them viewable in Netscape 2.0, or 3.0, or 4.0. Finally, if you confine yourself to well-documented standards, you have the added advantage that you can "validate" your documents.

a. Start with content and organize it clearly before worrying about exact appearance. The basic kinds of content in HTML documents are paragraphs of text, headings (organized into various levels), and lists (numbered, unnumbered, and definition lists). Tables are another important element (not recognized by all browsers), as are blockquotes, addresses, and others. A heading cannot be part of a paragraph, nor a table part of a heading. One way to ensure that each element maintains its identity is to adopt the recommended practice of enclosing paragraphs within a pair of tags <P>...</P> just like all the other block elements. Align or center each element separately with the ALIGN attribute: <H1 ALIGN=CENTER>Dave's World</H1> You could use the Netscape tag <CENTER>, but it may not be displayed on other browsers, and it encourages bad markup by failing to specify which elements you want to align.

The various HTML elements <P>, <H1>, <TABLE>, <UL>, etc. not only tell the browser how the document is organized; they also supply information that helps search/index programs to pick up keywords or generate a table of contents. If you have a heading, it's important to call it a heading: <H2>Biology Web Sites</H2> Don't call it something else just because you like the typeface better on a particular browser: <FONT SIZE=5>Biology Web Sites</FONT> fails to identify the text as the name of an important section of your document. And <FONT SIZE=6>B</FONT><FONT SIZE=4>iology Web Sites</FONT> may not even contain the searchable keyword 'biology'.
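
As a minimal sketch of the advice above, the fragment below marks up a short page body by structure alone: a centered top-level heading, a second-level heading, a paragraph enclosed in <P>...</P>, and an unnumbered list. The paragraph text and list items are invented for illustration.

```html
<!-- Each block element is aligned on its own; no <CENTER> or <FONT> needed -->
<H1 ALIGN=CENTER>Dave's World</H1>

<!-- A heading is marked as a heading, so indexers can find "Biology" -->
<H2>Biology Web Sites</H2>

<P>These sites are maintained by university departments.</P>

<UL>
<LI>Botany</LI>
<LI>Zoology</LI>
</UL>
```

A browser that ignores ALIGN still shows a recognizable heading; only the centering is lost.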

b. Then add images and links. All inline images <IMG> should be part of another block element, such as a heading, paragraph, or table. The new <OBJECT> element supports multimedia objects, descriptive text with markup, and client-side image mapping; it can stand alone or allow subsequent text to flow around the figure. It has not yet been completely implemented by the commercial browsers, but it should be studied by web authors looking for flexible graphic and multimedia treatments.

A banner or other image that plays the role of a heading should be enclosed in <H1>...</H1> or whatever level tags. <H1 ALIGN=CENTER><IMG SRC="banner.gif" ALT="Dave's World"></H1> Images can also be incorporated in paragraphs, lists and tables. Good HTML authors always show consideration for users of non-graphic browsers by including an ALT= attribute within the <IMG> tag: if verbal explanation could be helpful, write ALT="[UM students cheer on Rebels at the Georgia game]" If the image is merely decorative, write ALT="" so that nothing will be displayed on the non-graphic screen. To speed up loading of documents with images, use the WIDTH= and HEIGHT= attributes to the <IMG> tag, by entering the exact values (in pixels) of the original image; these attributes will enable some browsers to allot screen space for the image so that they can render the subsequent text before the images have loaded: <H1 ALIGN=CENTER><IMG SRC="banner.gif" WIDTH="430" HEIGHT="112" ALT="Dave's World"></H1>

In a well-organized document consisting of headings, paragraphs, tables, and lists, it's a simple matter to add links to words or phrases—almost all browsers have a way of indicating linked text: underlining, color, highlighting, etc. You don't need to say "Click here for information on our graduate programs;" just insert the link into what you were saying: "Our excellent graduate programs ..." Links to large files or unusual formats should be so marked, perhaps in a parenthetical note: "Our stirring fight song (400k .au) ..."
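
The difference might look like this in markup. The file names grad.html and fight.au are hypothetical, and the surrounding sentences are filler.

```html
<!-- Avoid: the link text says nothing about the destination -->
<P>Click <A HREF="grad.html">here</A> for information on our graduate programs.</P>

<!-- Prefer: the link is part of what you were already saying -->
<P>Our excellent <A HREF="grad.html">graduate programs</A> attract students from many states.</P>

<!-- Large files or unusual formats are marked in a parenthetical note -->
<P>Our stirring <A HREF="fight.au">fight song</A> (400k .au) is played at every game.</P>
```

In the preferred forms the linked words still make sense when a browser lists the links out of context, or reads them aloud.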

Links and other attribute values should be checked carefully so that quotation marks are paired. A link such as <a href="/share/copyright.html> (without the closing quotation mark) will work as intended in Netscape 1.0-1.2, but will choke Lynx or Netscape 4.0.

c. Add tables, forms, and image maps if necessary. For some kinds of content, the exact placement or presentation of data is important. For such information, HTML provides a special block element, called preformatted text <PRE>. All characters and spaces between <PRE> and </PRE> will be displayed as entered, usually in a monospaced font. For some kinds of text, such as modern poetry, preformatted text offers the only way to ensure that the text appears as the author intended. For tabular data, that is, data arranged in columns and rows, <PRE> can be used also, but HTML also offers a more flexible and attractive solution: the HTML <TABLE>. A table is an array of cells <TD> arranged in rows <TR>; each cell may contain text, images, links, or any block element. Table attributes may specify the size and alignment of table elements, subject to the capabilities of the browser and display. Some browsers cannot display tables at all. If you choose to work with this powerful feature, you should (1) use tables only to present tabular data, not as a means of forcing a particular layout or appearance; (2) provide a viable alternative, using preformatted blocks or other means, to all pages containing tables. For example, "See my resumé, also available in a version without HTML tables." (1)
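
Here is a rough sketch of the same tabular data marked up both ways. The schedule itself is made up. In practice the <PRE> version would sit on the alternative page rather than next to the table.

```html
<!-- Version for browsers that display tables -->
<TABLE BORDER=1>
<CAPTION>Rehearsal schedule</CAPTION>
<TR><TH>Ensemble</TH><TH>Day</TH><TH>Room</TH></TR>
<TR><TD>Chorus</TD><TD>Monday</TD><TD>101</TD></TR>
<TR><TD>Band</TD><TD>Tuesday</TD><TD>148</TD></TR>
</TABLE>

<!-- Alternative version: columns lined up by hand with spaces -->
<PRE>
Ensemble   Day       Room
Chorus     Monday    101
Band       Tuesday   148
</PRE>
```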

While the World Wide Web has many possibilities for interactive sessions, few of these are built into hypertext markup. Some browsers (e.g., Lynx) have a command to send mail to the author, provided that the e-mail address is properly supplied in the <LINK> element in the document head:
<LINK rev=made href="mailto:mudws@olemiss.edu">
The author can also solicit mail by way of a mailto: link in the document body, provided that the viewer has a properly configured browser and access to a mail server. While requests for files and documents can be processed directly by the server, other kinds of user input require the execution of special programs or scripts on the server machine, usually written according to a standard called Common Gateway Interface (CGI). HTML forms have two basic functions: they provide areas for the viewer to enter specified data, and they invoke the scripts which process this data and act upon it in specified ways, such as updating files, searching a database, returning a custom-made document, sending mail, or other action. Because scripted actions can raise security concerns, an individual author should usually consult with the system administrators who have access to the scripts, and can make changes as needed.
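
A small form might be sketched as follows. The script location /cgi-bin/request-info and the field names are assumptions for illustration; a real form must name a script that your system administrators have actually installed.

```html
<!-- ACTION names the server script that will process the data (hypothetical path) -->
<FORM METHOD="POST" ACTION="/cgi-bin/request-info">
<P>Your e-mail address: <INPUT TYPE="text" NAME="email" SIZE="30"></P>
<P>Program of interest:
<SELECT NAME="program">
<OPTION>Undergraduate</OPTION>
<OPTION>Graduate</OPTION>
</SELECT></P>
<P><INPUT TYPE="submit" VALUE="Send request"> <INPUT TYPE="reset" VALUE="Clear"></P>
</FORM>

<!-- A mailto: link serves readers whose browsers cannot submit forms -->
<P>If the form does not work in your browser, write to
<A HREF="mailto:mudws@olemiss.edu">mudws@olemiss.edu</A>.</P>
```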

The image map is an additional interactive element in wide use in Web documents. Limited to graphic systems, an image map superimposes an image upon an array of linked regions. When the user clicks on the image, the position of the mouse or other device is sent to the server, which returns the appropriate file. While image maps are well suited to retrieving data by clicking on a geographical map, their most frequent use today is as a graphic substitute for a menu or list of links. This is a bad idea: although toolbar maps may be attractive, their use is discouraged. If you really want to use one, you should (1) see that the image is small, both in file size and pixel dimension, (2) see that it is clear and legible to those with impaired vision, even on monochrome graphic systems, and (3) always supplement it with a text alternative.

d. Validate your work! HTML is a public standard; the W3 Consortium (W3C) maintains public specifications for the universal HTML 2.0 (RFC 1866), HTML 3.2, and the newer HTML 4.0. Netscape and Microsoft extensions are not a public standard—there are usage hints, but the specs are not published anywhere, so there's no telling how the extended tags will behave in all cases. One of the strongest reasons for following the standard is that you can validate your documents by running them through software that will reveal hidden errors, and ensure that your work will make sense on every browser. Two online validators are the W3C and the WDG validation services.

4. HTML 3.2

Recent years have seen important developments in World Wide Web communication. These include new HTML proposals and drafts, and the widespread implementation of such features as scripted actions, frames, and style sheets. While many of the new features implemented in Netscape Navigator (versions 2 and 3) and Microsoft Internet Explorer (versions 2 and 3) seem counter to the spirit of interoperability, the consolidation of standards in the HTML 3.2 and 4.0 specifications, and the gradual emergence of style sheets, offer hope that HTML can remain viable as a platform-independent means of communication over the Web.

HTML 3.2 (Wilbur). Previous versions of HTML were developed through suggestions and consultations by the whole Web community (authors, developers, and web users), and submitted as Internet drafts. The HTML 3.0 draft, while innovative and well-suited to authors and users, was largely ignored by major software developers, resulting in a proliferation of mutually incompatible vendor extensions. HTML 3.2, proposed by the W3C's Editorial Review Board, was an attempt to solidify the state of implementation by the "market leaders" as of early 1996, and as such incorporates many vendor extensions, as well as a few widely-used portions of the expired draft, into a formal specification. Authors who mark up and validate their documents at HTML 3.2 can be confident that their work will be viewable on a wide variety of current browsers. On the other hand, HTML 3.2 contains a large amount of presentational markup (elements and attributes) which is deprecated in the newer HTML 4.0 specifications. This includes the character-level <FONT> element, which attempts to control the size and color of specific portions of text. Since current browsers do not offer means of disabling the effects of such markup, <FONT> frequently produces unpredictable conflicts with user settings and defaults, resulting in loss of communication. Authors who want their documents to be accessible to all would do well to avoid the <FONT> element: if you use Style Sheets to suggest fonts and other presentational aspects, your work may not always be presented as you suggest, but at least it will be legible to all, and highly attractive to those with style-capable browsers (see below). The Web Design Group offers a complete guide to HTML 3.2, including a useful overview of every tag and its use.

Client-side image maps. HTML 3.2 allows authors to replace the unreliable server-side image-map with the <MAP> element, which provides both client-side parsing and built-in text alternatives. Browsers that lack this capability will display the image, but the links will not work. Spyglass image-maps may exist as separate files, linked to several documents, but a persistent bug in Netscape prevents the use of linked client-side image maps. While the new maps can speed up processing, and can display the linked URL when the mouse passes over the image, they still must be supplemented by server-side alternatives for those without this capability.
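
One common way to combine the two kinds of map is sketched below: the image carries both USEMAP and ISMAP, so a browser that understands <MAP> uses the client-side map and others fall back to the server. The image, the coordinates, the linked files, and the server map path /cgi-bin/imagemap/toolbar.map are all hypothetical.

```html
<!-- The enclosing <A> and ISMAP provide the server-side fallback -->
<P>
<A HREF="/cgi-bin/imagemap/toolbar.map">
<IMG SRC="toolbar.gif" WIDTH="300" HEIGHT="40" ALT="Site toolbar"
     USEMAP="#toolbar" ISMAP>
</A>
</P>

<!-- The client-side map: each region has its own link and ALT text -->
<MAP NAME="toolbar">
<AREA SHAPE="rect" COORDS="0,0,99,39" HREF="index.html" ALT="Home">
<AREA SHAPE="rect" COORDS="100,0,199,39" HREF="courses.html" ALT="Courses">
<AREA SHAPE="rect" COORDS="200,0,299,39" HREF="faculty.html" ALT="Faculty">
</MAP>

<!-- A plain text menu for everyone else -->
<P>[ <A HREF="index.html">Home</A> | <A HREF="courses.html">Courses</A> |
<A HREF="faculty.html">Faculty</A> ]</P>
```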

5. HTML 4.0

HTML 4.0, approved in December 1997, is an attempt to provide an interworking specification for a World Wide Web with enhanced possibilities for multimedia, scripting, and presentational style. It also contains features to improve accessibility on all platforms and displays, and to disabled users; and provides support for languages and writing systems worldwide. The HTML 4.0 recommendation consists of three different versions. The "strict" definition excludes most presentational markup such as fonts and layout, relegating these to style sheets. The "transitional" definition retains much of the "deprecated" presentational markup from HTML 3.2, as a stopgap measure until better browsers are available. Finally, the "frameset" definition incorporates Netscape's popular though problematic multi-document layout model.
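
An author chooses among the three versions with the document type declaration on the first line of the document. The declarations given in the HTML 4.0 recommendation are shown together here for comparison; a real document carries only one of them.

```html
<!-- Strict: structural markup only, presentation left to style sheets -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
  "http://www.w3.org/TR/REC-html40/strict.dtd">

<!-- Transitional: also admits the deprecated presentational markup -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
  "http://www.w3.org/TR/REC-html40/loose.dtd">

<!-- Frameset: for documents that use FRAMESET in place of BODY -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN"
  "http://www.w3.org/TR/REC-html40/frameset.dtd">
```

A validator reads this declaration to decide which definition to check the document against.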

Internationalization. In HTML 2.0, the default character set was ISO-8859-1, containing nearly two hundred characters and symbols used in Western European languages. Other character sets could be supported, but only with difficulty. HTML 4.0 uses ISO-10646, also known as Unicode, as its basic character set, containing some 34,000 characters in most of the world's languages. The reliable display of Unicode documents requires large font upgrades on older systems. Fortunately, most users will require only a small portion of the entire repertory of characters. Jukka Korpela has written a useful tutorial on character set issues associated with HTML 4.0.
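
In practice this might look like the sketch below: the document states its own character encoding, marks a change of language with the LANG attribute, and writes characters outside plain ASCII as entities or numeric references. The sample sentences are invented.

```html
<HEAD>
<!-- States how the bytes of this file are encoded -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<TITLE>R&eacute;sum&eacute;</TITLE>
</HEAD>
<BODY>
<!-- The same character written as a named entity and as a numeric reference -->
<P>See my r&eacute;sum&eacute;, or equally my r&#233;sum&#233;.</P>

<!-- LANG identifies the language of an element's content -->
<P LANG="fr">Voil&agrave; un paragraphe en fran&ccedil;ais.</P>

<!-- Numeric references are numbered by ISO 10646, whatever the file's encoding -->
<P>The Greek letter alpha: &#945; or &alpha;</P>
</BODY>
```

Whether the last line displays correctly depends on the fonts installed on the reader's system.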

Frames. Frames, introduced in Netscape 2.0, are sub-windows displayed within the main window of the client screen. They may be resizeable and independently scrollable, or they may be fixed in size and position while the rest of the window scrolls, providing a corporate banner or toolbar menu. They are achieved through a new block element <FRAMESET> which replaces <BODY> as the "main" page. Within this element there can be one or more <FRAME> elements with various content and options. Most of the material in a <FRAMESET> is completely invisible to the non-frame-aware browser. A <NOFRAMES> element, included at the end of the <FRAMESET>, supplies a substitute <BODY> for other browsers, analogous to the ALT= attribute of the <IMG> tag. It may comprise a rude comment ("If you don't have frames, get 'em!") or an entire page, complete with images, tables and other elements. As good as frames may look on a large, high-resolution monitor, there are serious problems displaying them on smaller, lower-resolution systems. The viewer quickly resents having to devote valuable real estate to a non-scrolling logo or tool bar that can't be dismissed from an already limited viewing space. Another objection to frames is that a page accessed within a frame cannot easily be printed or added to the user's hotlist or bookmark file. For the foreseeable future, if you use frames, it is essential to provide a clear and complete alternative in the <NOFRAMES> element.
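
A frameset with a complete alternative might be sketched like this. The file names menu.html, welcome.html, courses.html, and faculty.html are hypothetical.

```html
<HTML>
<HEAD>
<TITLE>Department of Music</TITLE>
</HEAD>
<!-- FRAMESET takes the place of BODY: a narrow menu beside a main window -->
<FRAMESET COLS="25%,*">
<FRAME SRC="menu.html" NAME="menu">
<FRAME SRC="welcome.html" NAME="main">
<!-- Everything a reader without frames needs, not a rude comment -->
<NOFRAMES>
<BODY>
<H1>Department of Music</H1>
<P>Welcome to the department. The same pages are available without frames:</P>
<UL>
<LI><A HREF="welcome.html">Welcome</A></LI>
<LI><A HREF="courses.html">Courses</A></LI>
<LI><A HREF="faculty.html">Faculty</A></LI>
</UL>
</BODY>
</NOFRAMES>
</FRAMESET>
</HTML>
```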

Scripting. Though this capability is provided only on some platforms, Netscape, since version 2.0, supports the powerful Java programming language, in which compiled programs (applets) are run on the client system, frequently to perform interactive tasks. Microsoft is promoting its own ActiveX technology, which achieves similar goals, but is more closely integrated with, and limited to, the Windows 95 operating system. Netscape is also promoting JavaScript, a simpler interpreted (non-compiled) form of event programming. Like all scripted actions these can raise security concerns, but they are more often annoying than harmful. For example, one popular use of JavaScript so far has been the introduction of scrolling "ticker-tape" text in the browser's status line; this slows down performance, makes the status line nearly useless for anything else, and relegates "important" messages to a part of the display that appears in non-adjustable small print on a gray background!

Java applets were first embedded in Web documents by means of the HTML 3.2 <APPLET> element, but this is now superseded by the HTML 4.0 <OBJECT> element. JavaScript is rather uneasily integrated into HTML through a <SCRIPT> element, but the actual instructions are embedded in HTML comments <!-- instructions here --> within <SCRIPT> blocks. Any text within a <SCRIPT> element that is not within the comment will be displayed to clients without script capabilities, but ignored by the script engine. To compensate for faulty parsing by popular browsers, the character > must not occur in a comment; compliance with SGML also dictates that the string "--" must not occur until the end of the comment. It is essential that everything in a <SCRIPT> block be checked to make sure it is in the proper location for either execution or public viewing. While the use of scripted actions on the web is increasing rapidly, it is essential that alternative means be provided to access a site's information, since many users prefer to use browsers without these technologies, or to disable them in the user settings.
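
The comment convention might be sketched as follows. The script itself is a trivial invented example. The HTML 4.0 <NOSCRIPT> element, not discussed above, is assumed here as one way of supplying the alternative.

```html
<SCRIPT TYPE="text/javascript">
<!-- hide the script from browsers that do not know SCRIPT
document.write("Last modified: " + document.lastModified);
// end of hiding -->
</SCRIPT>

<!-- Shown only where scripts are unavailable or switched off -->
<NOSCRIPT>
<P>The date of last modification appears at the foot of this page.</P>
</NOSCRIPT>
```

Note that nothing between the opening and closing of the comment contains the character > or the string "--", in keeping with the cautions above.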

The new <OBJECT> element provides a way to link or embed applets or multimedia objects, with provision for multiple fallback options on systems that cannot handle the desired object. For example, an author may specify a Java applet, which may be replaced with an animated video if Java is not available; this in turn may be replaced by a static image, or even a fully marked-up textual alternative. The <OBJECT> element also includes a better model for client-side image maps than that included in HTML 3.2. Unfortunately, browser developers have been slow to implement this useful element, and especially its fallback features.
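
The fallback chain just described can be sketched by nesting one <OBJECT> inside another: a browser that cannot render the outer object tries its contents instead. The applet and file names are hypothetical, and the CODETYPE value is an assumption that may need adjusting for a particular browser.

```html
<!-- First choice: a Java applet -->
<OBJECT CLASSID="java:Metronome.class" CODETYPE="application/java"
        WIDTH="200" HEIGHT="100">
  <!-- Second choice: a video -->
  <OBJECT DATA="metronome.mpeg" TYPE="video/mpeg">
    <!-- Third choice: a static image -->
    <OBJECT DATA="metronome.gif" TYPE="image/gif">
      <!-- Last resort: marked-up text -->
      <P>A metronome ticking at <EM>60 beats per minute</EM>.</P>
    </OBJECT>
  </OBJECT>
</OBJECT>
```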

Style sheets. According to the W3 Consortium:

Style sheets describe how documents are presented on screens, in print, or perhaps how they are pronounced.... By attaching style sheets to structured documents on the web (e.g. HTML), authors and readers can influence the presentation of documents without sacrificing device-independence or adding new HTML tags.

Style sheets have been part of the Web from its inception. Cascading Style Sheets, level 1 (CSS1) are the focus of current efforts to incorporate style on the Web. They have already been implemented by major browsers (partially in Microsoft Internet Explorer 3.0 and Netscape Communicator 4.0, but more fully in Internet Explorer 4.0 and Opera 3.50). CSS1 style sheets may be embedded in an HTML document, or may exist separately, being linked to an entire site or group of documents, providing a distinctive "house style." Linked style sheets eliminate the need to clutter web pages with repetitive <FONT> tags or invisible "spacer images." Authors can use style sheets to specify presentation, including margins, leading, spacing, and font details (size, color, face) for various classes of text; when these classes are named in any document, the browser will use all the stylistic information associated with that class to render the text. Users may define their own style sheets, incorporating their personal browsing preferences. Web authors who are interested in graphic design, typography, and fine presentation should study the CSS Pointers Group site, and the Web Design Group's CSS reference, along with the W3 Consortium's CSS1 and CSS2 recommended specifications.
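
Both approaches are sketched below: a linked sheet for the whole site and a few embedded rules for one page. The file name house.css, the class name "note", and the particular style choices are invented for illustration.

```html
<HEAD>
<TITLE>Biology Web Sites</TITLE>
<!-- Linked sheet: one "house style" shared by many documents -->
<LINK REL="stylesheet" TYPE="text/css" HREF="house.css">
<!-- Embedded sheet: rules for this document only, hidden in a comment -->
<STYLE TYPE="text/css">
<!--
H1 { text-align: center; color: maroon }
P.note { font-size: small; margin-left: 2em }
-->
</STYLE>
</HEAD>
<BODY>
<H1>Biology Web Sites</H1>
<P>These sites are maintained by university departments.</P>
<!-- The class name ties this paragraph to the P.note rule -->
<P CLASS="note">Links are checked at the start of each term.</P>
</BODY>
```

A browser without style sheet support shows an ordinary heading and two ordinary paragraphs, so nothing is lost but the suggested presentation.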

Conclusions. Implementation of new HTML features (objects, table enhancements, internationalization) is still uneven. If you choose to use them, (1) make provisions for those whose clients don't recognize them, (2) be prepared for changes in future releases, and (3) don't be annoying: frames, Java, and JavaScript have a potential for obnoxity that goes far beyond the <BLINK> tag. While some authors may be fascinated by frames and scripts, it is more important than ever that Web authors design not for a single browser or audience, but for everybody. If you design your documents by marking up their structure, if you adhere as closely as possible to the accepted standards at the W3 Consortium, and if you validate your work, you can be confident that your cherished work will be available to the maximum audience, and that future browser developments (including style sheets) will not make your pages obsolete, but can only make them look better.

Warren Steel (mudws@olemiss.edu)

Last modified 28 October 2008
Copyright © 1995-2002 D.W. Steel. All rights reserved.

HTML 4.0 Checked! except for annoying <BLINK> tag in the last paragraph.