One of the most intriguing things about the Internet is one of the facets least reported by the popular media. Not only does easily available global connectivity mean that you can reach your local library, various levels of government, and people from nearly every culture in the world more easily than you can grab a soda from the corner store, it also means you can make information that you are passionate about available to all those people and places as well. In a superlative expression of cutting out the middle man, most people on the Internet are able to make their social views, wants and needs, products, and other favorite things available to anyone with a "network tap." No representative government, economic distributorship, broadcast or print media, or note from your parents is needed. The first question you need to ask yourself is what you want to put out there, quickly followed by "How?"
As a potential Internet information provider, you have a number of methods, limited only by your creativity. Whether you elect to use a WWW, Gopher, FTP, finger, WAIS, e-mail, or other type of server depends on what kind of information you want to publish, how much of it there is, how often it changes, how many people might want it at once, what kind of hardware and software you have available to you, and the speed of the various network connections between you and your intended audience.
If, after considering all the ancillary issues, you still want to put your information on the World Wide Web with a HyperText Transfer Protocol (HTTP) server, next consider whether you should use a server to which you may already have access, or whether you should bring up a server of your own. Usually, the question boils down to an issue of control. Do you have access to a server where you can store your information? Do you have enough influence to get the server configured the way you want? Does that server have access to the resources, such as disk space, databases, and CPU, that you want to use? Is the server managed by reliable folks or by "loose cannons"? Is the fun and experience of running your own Web server worth the headaches?
If your requirements, wants, and needs are unique enough in your environment to merit running your own server, the next thing to decide is which server you want to run and on which platform.
Choosing HTTP server software is not unlike choosing political candidates. Do you choose a candidate based on their personality or do you decide based on "the issues"? Do you choose a hardware platform for your server and then choose which server software to run, or do you pick the software and let that determine the hardware platform?
Web server software, like all software, is a work in progress. Ultimately, that means many promised features are as reliable as some campaign promises. Does this mean that you should make your decision based exclusively on hardware? Probably not: many HTTP servers offer attractive, robust features that make it well worth the possible hit of using weaker hardware. A good strategy is to decide which of your hosts can potentially have access to all the really important resources you want to publish on the Web and then further limit your choice based on the features of (and your initial experience with) the HTTP servers available for those platforms.
On VM, they're called service machines. On VMS they're detached processes. DOS has TSRs, Macintosh Systems have INITs, and UNIX has daemons. Whatever they're called in your Net-neighborhood, there is a cornucopia of choices that can serve up your HTTP.
The UNIX convention of referring to such software as a daemon is probably the most accurate. The term server usually refers to the machine on which a daemon resides, not to the software itself. The use of daemon (the Latin form of the middle-English demon) hails back to the Greek mythological daimon (meaning an attendant spirit or inferior divinity such as a deified hero). Depending on how well your daemon is running, however, you may be more inclined to assign it the more modern diabolical meaning.
For the purposes of this chapter, I'll follow the more popular convention of referring to the software itself as a server.
The current popularity and scalability of UNIX-like systems have led to the development of a variety of HTTP servers for such systems. UNIX is easily the most scalable operating system in common use, running on everything from 80286 PCs to mainframes and supercomputers. Perhaps the scope of its influence stems from the fact that UNIX is now more of a culture from which specific operating systems arise (for example, SunOS, Linux, FreeBSD, SVR4, Dynix, and many others) than one specific product from one company. Technically, Novell may have the rights to the name UNIX, but that is more of a restriction on the name than on the culture and development associated with systems that are UNIX in everything but name.
The benefits of running your HTTP server on a UNIX platform include portability to more (or less) powerful hardware with a minimum of distress to your users, the probability that you'll never be at a loss for system administration talent, and the wealth of freely available software tools on the Internet to micromanage your data and coerce it into the information you want to present.
The risks of using a UNIX server include the persistent inaccessibility of a standard UNIX shell interface for neophyte users who may have learned computing on a PC or Macintosh. This limitation is important only if your information and data maintainers use the UNIX system directly. Many other options can make information available to your UNIX server while insulating your users from UNIX; these options include e-mail robots, FTP, and remotely mounted file systems.
Another risk of using a UNIX system is that the security flaws of many UNIX systems are well known and can be overlooked by overworked, undertrained, or undermotivated system administrators. If this is likely to be true in your case, you may want to look at systems that have more obscure security flaws (and hope that anyone intent on doing harm to your system is correspondingly overworked, undertrained, or undermotivated). Keep in mind that the sheer popularity and widespread use of UNIX have resulted in many fine security tools and an abundance of well-thought-out research. It requires much less effort and talent to use a combination of these tools to shore up your system against attack than it does to try to repair the damage afterward. With a little persistence, any decent UNIX-style system and the network on which it resides can be made secure enough for all but the most demanding applications. Besides, if you're planning on putting ICBM launch codes on an Internet-connected HTTP server, perhaps you ought to rethink your application.
The following sections list some HTTP servers available for download from various sites on the Internet. Please note that this is a list, not a review. The issue of which server is best for you is one only you can decide. What follows is a brief overview of some of the available servers, their most prominent features, and some of the characteristics that distinguish them from other servers.
Server: NCSA HTTPD
Platforms: SunOS 4, Solaris 2, SGI IRIX, HP-UX, AIX, Ultrix, DEC OSF/1, NeXT, Sequent, Linux, A/UX, SCO ODT, SVR4, Amdahl UTS 2.1, HP/Apollo Domain/OS, and possibly others.
From: The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.
URL:
Major Features: Wildcard-based access control, CGI 1.1, server-side includes, enhanced directory listing, script hooks for PEM/PGP-based encryption, imagemaps, user authentication, user-specific HTML source directories, HTTP 1.0 and 0.9 compatibility, HTML2 forms.
Security: Host-based filtering, user ID/password authentication, hooks for external methods of PEM and PGP encryption.
Installation: Precompiled binaries available for some systems. C source code compiles to a single binary that can be installed to run either as a free-standing daemon or launched on demand from the inetd daemon.
Documentation: Copious online documentation in the form of WWW pages at NCSA covering both installation and use in minute detail. Web-based tutorials are online for topics including CGI, HTML2 forms, access control and user authentication, graphical map creation and use, and WAIS/HTTP integration.
Age: Version 0.5 released September 1993.
Status: Current version is 1.3 as of January 1995.
Licensing: Public domain.
Mention a UNIX-based HTTP server to most people in the Web community and the first one likely to come to mind is NCSA HTTPD. The National Center for Supercomputing Applications has made a name for itself on the Internet by producing freely available software for various platforms to make the Internet more accessible and enjoyable to use.
NCSA's HTTPD can be configured either to run as a stand-alone process or to be invoked through the UNIX inetd facility. A link off the main Web page for the server (see the chart at the beginning of this section) walks you through all the steps necessary to get HTTPD up on your machine, from downloading the software through configuration and testing. Precompiled binaries are available for several of the more popular machines, which lets you run the server without having to compile it first. (The variety of hardware platforms used by UNIX means that freely available software from the Internet is often distributed as source code that must be compiled to be made into a usable binary image.)
Once you've downloaded the software (and optionally compiled it), you must configure it. Configuration is accomplished by editing a handful of text files that are thoroughly documented in the NCSA HTTPD Web pages. As a matter of fact, many HTTP servers and Web browsers now use Web-based documentation.
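The difference between the two launch modes is easier to see in code than in prose. What follows is a minimal sketch in Python, not NCSA's implementation: the function names, the port number, and the fixed reply page are all assumptions made for illustration. In stand-alone mode the process owns the listening socket and loops; under inetd, the connection has already been accepted and is handed to the program as standard input and output.

```python
import socket
import sys


def handle(reader, writer):
    """Answer one request with a fixed page (toy logic, for illustration only)."""
    reader.readline()  # request line such as b"GET / HTTP/1.0"; ignored in this sketch
    body = b"<html><body>Hello from a toy server</body></html>\n"
    writer.write(b"HTTP/1.0 200 OK\r\n")
    writer.write(b"Content-Type: text/html\r\n")
    writer.write(b"Content-Length: %d\r\n\r\n" % len(body))
    writer.write(body)
    writer.flush()


def run_standalone(port=8080):
    """Stand-alone mode: this process binds the port and serves forever."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("", port))  # hypothetical port; a real site would likely use 80
        listener.listen(5)
        while True:
            conn, _addr = listener.accept()
            with conn, conn.makefile("rb") as reader, conn.makefile("wb") as writer:
                handle(reader, writer)


def run_from_inetd():
    """inetd mode: inetd accepted the connection and wired it to stdin/stdout."""
    handle(sys.stdin.buffer, sys.stdout.buffer)
```

The trade-off is the usual one: a stand-alone daemon pays its startup cost once, while an inetd-launched server pays it on every connection but consumes nothing when idle.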
TIP When installing an HTTP daemon, it is probably a good idea to do so where you can have at least three windows open on your screen: one for the login session in which you do the installation, one for a Web browser pointed at the online documentation, and one for a Web browser pointed at your server for testing.
You configure NCSA HTTPD in three steps: server, resource, and access configuration. Each step is typically controlled by a series of directives in a different ASCII file. Most folks can get along by changing very few of the directives. Some folks, however, can't sleep at night unless they change everything from its default value (you know who you are).
In the server configuration process, you can manipulate a variety of parameters concerning the way the HTTPD process runs, how it talks to the network, and where it should look for other configuration files. The resource configuration process dictates the specifics of how certain services are offered through your Web server, such as where the daemon should look for HTML documents, whether or not users can have private directories for their own Web publishing, and some specifics having to do with automatically generated file directories.
The access configuration process is where you can make or break the security and usability of your server. NCSA HTTPD provides two methods for controlling access to your Webspace: a global access configuration file and a per-directory access file. Global access-control files are mandatory and control such things as authentication files for users and groups, "sectioning directives" that control which access controls apply to which branches of your file system, default MIME-typing, various fine-tuning options for dynamically generated directory listings, as well as settings for the per-directory access-control files. The capabilities of the per-directory access-control file are nearly the same as for the global file with the exception of being able to limit other access-control files.
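To make the interplay between global and per-directory rules concrete, here is a small Python sketch of wildcard, host-based filtering. It does not use NCSA HTTPD's directive syntax; the rule tables, host names, and paths are invented for illustration, and the "longest matching directory wins" policy is an assumption of this sketch.

```python
from fnmatch import fnmatch

# Hypothetical rule tables: directory prefix -> host patterns allowed to read it.
GLOBAL_RULES = {
    "/": ["*"],                      # whole tree: any host
    "/internal": ["*.example.edu"],  # one branch: campus hosts only
}
PER_DIRECTORY_RULES = {
    "/internal/payroll": ["hr.example.edu"],  # tighter rule set inside the directory
}


def allowed_patterns(path):
    """Return the host patterns attached to the longest matching directory prefix."""
    rules = {**GLOBAL_RULES, **PER_DIRECTORY_RULES}
    best = "/"
    for prefix in rules:
        inside = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if inside and len(prefix) > len(best):
            best = prefix
    return rules[best]


def is_allowed(path, client_host):
    """True if the client's host name matches any pattern governing the path."""
    return any(fnmatch(client_host, pattern) for pattern in allowed_patterns(path))
```

With these tables, `is_allowed("/internal/notes.html", "lab.example.edu")` is true, while the same host is turned away from `/internal/payroll/`, where the per-directory rule takes over.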
One of the greatest strengths of NCSA HTTPD lies in its near universality: it seems like everybody's doin' it. NCSA's server is the McDonald's of Web servers. It pleases most of the people most of the time and manages to come up with something fairly innovative every so often as well.
Server: CERN HTTPD
Platforms: NeXTStep 3.2 (on NeXT and NeXT-386), SunOS 4.1.3, Solaris 2.3, HP-UX/Snake 9.0, IRIX 5.2, ULTRIX 4.3, AIX 3.2, OSF/1 2.0/3.0, VMS, Linux 1.1.29, Apple A/UX 3.1, Pyramid DC/OSx 1.1, NCR 3000 2.01.01, Amdahl UTS 2.1/4.2, SCO 3.0/3.2, DELL SVR4.0 Rev 2, Unisys SVR4, Intergraph Clipper C300/C400.
From: Centre Européen de Recherche Nucléaire (CERN, The European Laboratory for Particle Physics).
URL:
Major Features: Features vary from platform to platform. Server-side scripts, imagemaps, CGI 1.1, CGI index search interface, HTML2 forms.
Security: Host-based filtering, user ID/password authentication, access-control lists, group-based filtering, server functions as a caching multiapplication proxy gateway to serve out services such as remote HTTP, FTP, Gopher, WAIS, and NNTP to firewalled networks.
Installation: Source in C. Binaries available for NeXTStep 3.2 (on NeXT and NeXT-386), SunOS 4.1.3, Solaris 2.3, HP-UX/Snake 9.0, IRIX 5.2, ULTRIX 4.3, AIX 3.2, OSF/1 2.0/3.0, VMS, Linux 1.1.29, SCO 3.0/3.2, Intergraph Clipper C300/C400. May be called from inetd or run as a stand-alone server.
Documentation: User Guide, Installation Guide, bug list, FAQ, and other documents available online at URL http://www.w3.org/hypertext/WWW/Daemon/Status.html.
Age: Version 0.1 released June 1991.
Status: Version 3.0 released September 1994. Current as of February 1995.
Licensing: Public domain.
Not too long ago, a chapter with the scope of this one would have taken only a few pages. In the summer of 1991, just over two years after Tim Berners-Lee proposed the World Wide Web project at CERN and about seven months after he prototyped the first graphical WWW browser on the NeXT, version 0.1 of the CERN HTTPD was released (see the chart at the beginning of this section). CERN HTTPD is the old salt of the UNIX Web server set. A couple of the appeals of CERN's Web server are its staying power and the fact that it comes from CERN, the birthmother of all Web sites.
CERN's HTTPD has more to recommend it than sentimentality. A special search directive in the configuration file permits easy setup of search scripts compliant with the Common Gateway Interface (CGI) 1.1. More about CGI later. For now, keep in mind that CGI is the current platform-independent industry standard for an API that describes how back-end scripts talk to a Web server.
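As a taste of what that API looks like from the script's side, here is a minimal sketch of a CGI search script in Python. Under CGI, the server passes request details in environment variables (QUERY_STRING is one of the standard ones) and relays whatever the script writes to standard output. The query parameter name `q` and the page wording are assumptions for illustration; nothing here is specific to the CERN server.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def render(environ):
    """Build a CGI response: header line, blank line, then an HTML body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    term = query.get("q", [""])[0]  # "q" is a hypothetical parameter name
    body = "<html><body><p>You searched for: %s</p></body></html>\n" % escape(term)
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The server sets the environment and captures whatever reaches stdout.
    sys.stdout.write(render(os.environ))
```

Because the contract is just environment variables in and text out, the same script can, in principle, sit behind any server that honors CGI, which is the whole point of a platform-independent interface.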
Another strong feature of CERN's HTTPD is its ability to sit on a firewall machine and act as a proxy server for other protocols such as HTTP, FTP, NNTP, and WAIS that wouldn't otherwise be able to leave the premises of a sensitive site. Additionally, the server can cache proxied items and act as an internal proxy to an external proxy for sites wanting or needing double-firewalled protection.
If you want to squeeze more functionality out of this server, you can use it as a Wide Area Information Server (WAIS) gateway. Understand that as you pile functions on a single daemon and a machine with limited resources, performance drops proportionately.
The CERN server isn't alone in this regard. Part of the fine art of system administration is determining when you cross the threshold of offering too many services for the resources you have available. Your site may have world-class expertise but be limited to a couple of Sun 3/60s for all your servers, or you may be at the other end of the spectrum and be resource wealthy but lack the time, inclination, or training for strategic Web service. If, like most system administrators, you have a reach that exceeds your grasp, you are well advised to back off a bit and sacrifice variety for consistency. Your users will thank you for it.
Server: GN
Platforms: Generic UNIX-style applications with configuration accommodations made for AIX, AUX, BSD_386, Convex OS, HPUX, Irix, Linux, NeXT, OSF1, Pyramid, SCO, Sequent, Solaris2, Sun OS4, SVR4, and Ultrix.
From: Northwestern University.
URL:
Major Features: Combination Gopher0 and HTTP 1.0 server. Local WAIS searches, but no WAIS gateway; global and per-directory access control based on explicit or wildcarded domain name or IP address, structured file server, automatic decompression of data files, Gopher-style scripts.
Security: Host-based filtering.
Installation: Source is in C. Can be run either as a stand-alone daemon or under inetd. No precompiled binaries available at canonical source. Well-done HTML installation documentation included with source.
Documentation: Moderate documentation at Gopher URL.
Age: Version 1.0 released October 1993.
Status: Current version is 2.21 as of February 1995.
Licensing: GNU public license: free for any use including commercial or distribution.
Pioneers who use any new technology not only have to adapt themselves to the new way of thinking but often have to adapt the previous technologies as well.
A good example is Northwestern University's GN server. GN can serve up your data using Gopher0 protocol to Gopher clients, and HTTP 1.0 to Web browsers. If you like, it can even do both out of the same TCP port! GN looks at the syntax of incoming information requests and behaves accordingly.
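How might a server tell the two protocols apart on one port? The following Python sketch shows one plausible approach; it is not GN's actual logic. It relies on the fact that an HTTP request begins with a method word followed by a path, whereas a Gopher request is only a selector string (possibly empty) ending in CRLF. The function name and the list of methods checked are assumptions of this sketch.

```python
def classify_request(first_line):
    """Guess the protocol from the first line a client sends.

    HTTP: "GET /path" (HTTP 0.9) or "GET /path HTTP/1.0" (HTTP 1.0).
    Gopher: a bare selector string, which may be empty.
    """
    line = first_line.rstrip("\r\n")
    parts = line.split(" ")
    looks_like_method = parts[0] in ("GET", "HEAD", "POST")
    if looks_like_method and len(parts) >= 2 and parts[1].startswith("/"):
        return "http"
    return "gopher"
```

For example, `classify_request("GET /index.html HTTP/1.0\r\n")` yields `"http"`, and an empty line (a Gopher client asking for the top-level menu) yields `"gopher"`. A Gopher selector that happened to begin with "GET /" would fool this sketch, so a real server would need a stricter test.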
Like all the servers described so far, GN can either run as a stand-alone daemon or be called from the UNIX inetd facility. As is par for the course, GN is distributed in C source code and must be compiled to work on your intended machine.
A few global configuration settings are changed in a source code header file before compilation. The rest of GN's capabilities are controlled by menu files in each directory. These files are roughly equivalent to the .Links files in the University of Minnesota's Gopher server.
GN permits proprietors of existing Gopherspace to ease themselves into the WWW painlessly. Once the existing University of Minnesota Gopher tree is converted over to GN format, GN generates HTML from that tree whenever a Web client connects. The resulting menu files can be left as-is or augmented with any desired amount of HyperText Markup Language (HTML), from a few icons added to an existing Gopher-style menu to a full HTML document.
GN also has the attractive feature of being able to serve structured files as a two-layer hierarchy. An e-mail folder, for example, can be split into separate records, each starting with a From header line, and listed in a menu generated on-the-fly by their respective Subject lines. A little creative symbolic linking at the UNIX shell layer can easily result in multiple presentations of the same data based on different fields.
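The idea of a two-layer hierarchy can be sketched in a few lines of Python. This is an illustration of the technique, not GN's code: the function names are invented, and the parsing is deliberately naive (any body line that starts with "From " would be mistaken for a new message).

```python
def split_records(folder_text):
    """Split a UNIX mail folder into messages; each begins with a "From " line."""
    records, current = [], []
    for line in folder_text.splitlines():
        if line.startswith("From ") and current:
            records.append(current)
            current = []
        current.append(line)
    if current:
        records.append(current)
    return records


def subject_of(record):
    """Return the Subject header of one message, scanning only its header block."""
    for line in record:
        if line == "":  # a blank line ends the headers
            break
        if line.lower().startswith("subject:"):
            return line.split(":", 1)[1].strip()
    return "(no subject)"


def build_menu(folder_text):
    """Top layer: one numbered menu entry per record, labeled by its Subject."""
    return [(i, subject_of(rec)) for i, rec in enumerate(split_records(folder_text), 1)]
```

The menu returned by `build_menu` is the first layer; selecting entry `i` would serve the corresponding record as the second. Swapping `subject_of` for a function that reads a different header gives a different presentation of the same folder.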
Another nifty feature of GN is that you can keep your data files compressed and instruct GN to decompress them just before serving them out to the public. Although doing so uses up more bandwidth and time, you may want to uncompress files before serving them up if you use a compression method your target audience may not easily be able to handle.
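A rough sketch of serve-time decompression follows, in Python. It uses gzip as a stand-in for whatever compression method a site actually uses, and the function name and the `client_accepts_gzip` flag are hypothetical; GN's own mechanism is configured through its menu files and is not reproduced here.

```python
import gzip


def read_for_client(path, client_accepts_gzip=False):
    """Return the bytes to send for `path`, inflating .gz files when needed."""
    with open(path, "rb") as f:
        data = f.read()
    if path.endswith(".gz") and not client_accepts_gzip:
        return gzip.decompress(data)  # costs CPU time and sends more bytes
    return data
```

The file stays small on disk either way; the only decision made at serve time is whether the reader's software can be trusted to unpack it.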
COMPARING HTTP SERVERS
The Enterprise Integration Technologies Corporation has put together a rather novel Webmaster's Starter Kit. This is a form-based installation kit for an enhanced version of NCSA HTTPD. The form asks you for a few bits of basic information about you, your site, and your host and then walks you through a Web-automated process that generates a server remotely; downloads, unpacks, and runs a shell archive; and installs and sets up your server.
The EIT HTTPD is also distinguished by a set of enhancements above and beyond the stock NCSA HTTPD. The server has been modified to give "kinder and gentler" error messages to remote users; it can also be tuned to give priority to document-specific or user-specific tasks and can redirect HTTP requests elsewhere during intentional downtime. A handful of enhancements and utilities are packaged in the kit, including a watchdog process to automatically restart the server if it hangs, a form-based home-page generator, a C library of CGI functions, a link verifier to aid in spotting lame links in hypertext documents, and a utility to convert e-mail folders into cross-linked hypertext.
EIT HTTPD can be found at URL http://wsk.eit.com/wsk/doc/.
Plexus is an HTTP server written in perl (which, depending on who you ask and what kind of mood they're in, stands for either Practical Extraction and Report Language or Pathologically Eclectic Rubbish Lister). The perl language has become a utilitarian staple of the UNIX community; the ease of integration with other perl programs must be numbered among Plexus' strengths. Unlike other UNIX HTTP servers, which are written in C and must be compiled, Plexus is interpreted because it's written in perl. Under normal circumstances, this is a significant performance hit, but chances are that you can run Plexus without suffering the performance ills of other interpreted languages.
Plexus was written with easy extensibility in mind; its online documentation includes a tutorial on writing your own gateways. This should make Plexus a strong contender for those sites that have unique data to be presented on the Web in a fairly standardized manner.
Plexus can be found at URL http://bsdi.com/server/doc/plexus.html.
Netsite is the commercial-grade Web server from Netscape Communications Corp. Netsite is a commercial product (it isn't free), but many of the folks who pay for it believe it's worth every bit of the price. The server, which is installed by using HTML2 forms, supports the standard suite of features expected from any high-end Web server and comes in two flavors. The Netsite Communications Server is a standard HTTP server designed to achieve high performance and low host resource utilization. The Netsite Commerce Server is an enhanced version of the Communications Server that purports to offer a bulletproof platform for exchange of the types of information critical to electronic commerce. Through use of RSA public-key cryptographic technology, the Commerce Server offers a combination of data encryption and server authentication that should have a fundamental impact on the amount of secure commerce taking place over the Internet.
Information about acquiring Netsite is available at http://www.mcom.com/MCOM/products_docs/server.html.
Sometimes, less is more. On the Internet, everyone should be able to be both an information provider and an information consumer. Not only should you be able to keep tabs on your local, state, and federal representatives without going through the press, but they should be able to keep in touch with the cultures they represent without relying exclusively on pollsters. On the World Wide Web, this idea implies the availability of personal Web servers: ubiquitous pieces of software that can be dropped into any or all personal workstations to serve out information from or concerning one person.
Personal Web servers aren't the only way this can be accomplished, of course. Many of the UNIX servers described in the first part of this chapter permit access to individual home directories from which users can put their own documents out on the Web. An advantage of this approach is that, although it's not strictly so by virtue of the operating system, UNIX machines often deliver more processing resources to each user than do individual PCs. An advantage of a personal Web server is that having unrestricted control over a server permits you to put items in place that require a level of privileged access and support that may not be practical in some organizations with a shared Web server. Some of these items are clickable maps and HTML2 fill-out forms. A happy medium for many people is to put the lion's share of the Web resources on larger multiuser hosts and to link to personal Web servers for resources that they can't put other places (either for reasons of practicality or resource availability).
Web servers that live on various versions of Microsoft Windows, OS/2, the Mac OS, or other traditionally single-user machines aren't necessarily underpowered relations of their larger Web cousins. Some of the Web servers you've browsed recently may actually have been PCs. So-called personal computers are now both client and server. On such systems, the only significant detraction may be that small personal computer operating systems don't have the rich multiuser tradition that UNIX, VM, or VMS do. Consequently, when pressing such machines into multiuser service, it's not unusual to find systemic weaknesses in drivers or software that aren't easily discovered in single-user circumstances. When you set out to use the machine on your desk as a Web server, be prepared for an interesting (although not necessarily unprofitable) life.
Server: Windows HTTPD
Platforms: Microsoft Windows with a WinSock 1.1 compliant driver.
From: Robert B. Denny
URL:
Major Features: HTTP 0.9/1.0, CGI 1.1 compliance with either DOS Virtual Machine processes or by native Windows applications, forms support, imagemaps, optional multithreading for up to 16 simultaneous users, optional Visual Basic DLL preloading for heavy CGI usage, enhanced directory listing, useful utilities.
Security: Host-based filtering, user ID/password authentication.
Installation: Distributed as a precompiled, ready-to-run MS Windows binary bundled with related files in a zipped archive.
Documentation: Unzipped package includes online HTML documentation.
History: Evolved from NCSA HTTPD for Windows, which grew out of the UNIX HTTPD.
Status: Current version is 1.4 as of January 1995.
Licensing: GNU public license: free for personal or non-commercial use. 30-day free trial followed by registration fee for commercial use.
To say that the Windows HTTPD is surprising is an understatement. A test machine documented in the server's home page handled well over 25,000 requests an hour, averaging around 4K each. Although there were some problems with TCP session management between some WinSock DLLs and HTTPD at press time (see http://www.city.net/win-httpd/tcp-report.html for more information), those problems can be worked around until the WinSock libraries in question are up to snuff.
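It helps to turn that hourly figure into per-second terms. The short Python calculation below does the arithmetic, treating "around 4K" as exactly 4,096 bytes and "well over 25,000" as exactly 25,000 (both simplifying assumptions), and counting only response payload, not protocol overhead.

```python
requests_per_hour = 25_000
avg_response_bytes = 4 * 1024  # "around 4K", taken here as 4,096 bytes

requests_per_second = requests_per_hour / 3600
bytes_per_second = requests_per_second * avg_response_bytes
megabytes_per_hour = requests_per_hour * avg_response_bytes / (1024 * 1024)
kilobits_per_second = bytes_per_second * 8 / 1000

print(f"{requests_per_second:.2f} requests/s")
print(f"{megabytes_per_hour:.2f} MB/hour")
print(f"{kilobits_per_second:.0f} kbit/s of payload")
```

That works out to roughly seven requests per second, a little under 100 MB per hour, and something on the order of 228 kbit/s of sustained payload, which is a respectable load for a desktop PC running Windows.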
Windows HTTPD can easily be characterized as an HTTP server for people who don't like running HTTP servers. On one level, you can download the zipped file, extract it into the default c:\httpd directory, double-click on the HTTPD.EXE from File Manager, and presto-chango, you're running a Web site (albeit one with the default demo Web pages that come with the server). On another level, the rich suite of utilities and features that come with or can be easily acquired for this server definitely make it a major player.
The utilities and added functionalities that, in some cases, have been contributed by the enthusiastic user community that Windows HTTPD has acquired are available through the URL listed in the chart at the beginning of this section. These utilities and added features include VB Server Stats, which generates periodic reports about the use of your server; a guide to using DOS perl for DOS-CGI scripts; and an imagemap editor that makes maintenance of hot regions within clickable imagemaps even easier than on UNIX.
Server: HTTPS
Platforms: Intel, DEC Alpha, or MIPS processor running Windows NT 3.1 final release with TCP/IP installed.
From: European Microsoft Windows NT Academic Centre (EMWAC).
URL: http://emwac.ed.ac.uk/html/internet_toolchest/https/contents.htm
Major Features: HTTP 1.0, CGI, HTML2 forms, installed as a Windows NT service, multithreaded, integrates with WAISTOOLS for Windows NT for local WAIS searches, works with all network interfaces on multi-homed machines.
Security: Multi-homed host support. Additional security features in commercial version.
Installation: Configuration done through the Windows NT control panel.
Documentation: Online at the canonical Website for the software as well as included in numerous formats in the distribution.
Age: Still in beta, version 1.0 not released at press time.
Status: Version 0.96 current (beta version, but is freely available) as of February 1995.
Licensing: Free. Commercial version is also available; see Web page for details.
If you like the HTTPD for Windows but crave more horsepower, try HTTPS from the European Microsoft Windows NT Academic Centre. Not only will you run on the swifter horse of Windows NT, but you have the option of running on DEC Alpha or MIPS hardware as well.
One of the nice features of HTTPS is that you can run it on multi-homed machines (machines with more than one network interface) and serve the Web out of each interface. This makes HTTPS worth examining for sites with non-passable firewalls between two or more networks, although security compartmentalization of your Web tree can't really be done with the freeware version of this software (you'll have to acquire the commercial version to use WWW basic authentication and access control). The commercial version is also more of a full-blown caching firewall for HTTP, Gopher, and FTP services.
Like its Windows 3.1 cousin, HTTPS is a snap to install. Just FTP to emwac.ed.ac.uk and get the zip file containing the software appropriate for your processor from the /pub/https directory. Both DEC Alpha and Intel versions are available for download. The server installs just like any other Windows NT service and is configurable from the control panel.
At the very least, servers like HTTPS are niche software. They provide a robust platform for Web development and publication for shops that haven't yet embraced UNIX and don't have access to other Web-servable host systems such as VM and VMS. It's not unreasonable to look forward to the day in the not-too-distant future when Windows NT-based Web servers will inherit the mantle of VM-based and VMS-based servers, whose data will be available by back-end gateways that talk to the more user-friendly Windows NT.
As Web servers become easier to install and configure, the majority of work is put where it ought to be: providing information. The greater the percentage of time you can spend actually publishing your data, the higher the quality of your product. Certainly, the current growth in HTML authoring tools, gateways between the Web and other information sources, and easily installed Web servers is having a tangible impact on the character of the Web and the accessibility of Web publishing to noncomputer specialists.
Server: OS2HTTPD
Platforms: IBM OS/2 with IBM TCP/IP for OS/2 Base Kit 2.0 or later.
From: Frankie Fan <kfan@netcom.com>
URL:
Major Features: HTTP 0.9/1.0, CGI 1.0/1.1/HTBIN, HTML2 forms, imagemaps, access control, server-side includes, HPFS and FAT filenames supported, REXX and OS/2 executable server scripts.
Security: Host-based filtering.
Installation: As with the other servers in this class, simply download the software, unzip the archive into the specified directories, and run it.
Documentation: HTML documentation in the distribution archive as well as Web-based documentation at the canonical URL.
Age: Ported from NCSA HTTPD 1.3
Status: Current version is 1.04 as of February 1995.
Licensing: Free for non-commercial use.
This section wouldn't be complete without including a reference to an HTTP server for OS/2. OS/2 has long been the also-ran of PC operating systems, but with the more recent releases, including OS/2 Warp, OS/2 has developed a sizable and devoted following.
Like many of the servers described so far in this chapter, OS2HTTPD is a port of the NCSA HTTPD, and it seems to have retained a sizable amount of the functionality of the original. Big Iron workers on the local mainframe will feel right at home developing server scripts in REXX.
One of the strong benefits of running a server in OS/2 or Windows NT is that they are true operating systems and provide stronger interprocess insulation than does MS Windows 3.x or DOS. It's much less likely that an errant process will cause your Web server to become "lost in cyberspace." When you run a Web server on Windows 3.1, however, you are subject to a range of reliability levels that are usually directly proportional to the amount of shareware experimentation or resource-envelope pushing in which you engage. For every person who has to reboot his or her Windows machine three or four times a day, there's probably some joker who's had Windows running continuously since it was originally installed. Go figure.
With all the servers available for various platforms, it should come as no surprise that there's one available for the Macintosh. With their built-in AppleTalk ports, Macintoshes have been network animals almost since their inception. For years, AppleTalk was the most pervasive network protocol in the world. Nowadays, Macs are no slouch when it comes to being both clients and servers on the Internet, as demonstrated by the capabilities of the Macintosh HTTP server described in the following section.
Server: MacHTTP
Platforms: 68000-series or Power Macintoshes with System 7 and MacTCP.
From: BIAP Systems, Inc.
URL:
Major Features: Optional FV-Bridge application for online commerce, CGI access to FileMaker Pro, access control by address and domain name, user ID and password authentication, remote management from other WWW browsers using AppleEvents, CGI scripts can be AppleScript or any application that returns AppleEvents (such as HyperCard), tunable performance, and usage restrictions.
Security: Host-based filtering, user ID/password authentication, administratively defined "security realms."
Installation: Drag-and-drop installation. Entire configuration is kept in one file, most of which can be edited indirectly through menu choices in MacHTTP.
Documentation: Extensive on-Web documentation at URL listed above.
Age: Version 1.3 released in May 1994.
Status: Release 2.0 was current as of February 1995.
Licensing: 30-day evaluation followed by license fee. See canonical Web site listed above.
MacHTTP has the kind of features you expect to find on a UNIX server, and a number of Mac-specific features found on no other server (see the chart at the beginning of this section). In fact, the online technical documentation states "Without MacHTTP, you'd be forced to do this on a UNIX box!"
In a roundabout way, that statement points to one of the appeals of MacHTTP. If left-brained people are more likely to use Macintoshes, Amigas, NeXTs, or X servers running on Berkeley-extracted UNIX versions and right-brained people are more likely to use PCs, 3270 terminals attached to VM or MVS boxes, or System V UNIX at the Bourne shell prompt, MacHTTP is likely to be more appealing to the left-brained crowd. Not that there isn't plenty here to appeal to the right-wing, uh, make that right-brain set.
The additional FV-Bridge application works with the server for developing credit-card charge-out applications for Internet-based commerce. The server also permits the standard set of access restrictions available on many other servers: restriction by IP address, by domain name, or based on user ID and password sets. In addition, the on-Web documentation gives examples of how to interface MacHTTP to FileMaker Pro, HyperCard, and MacPerl.
Servers with that kind of capability and expandability can easily become indispensable once pressed into service; what was once a weird little toy in the corner of somebody's office may one day become a critical link in a new service that wasn't previously possible. The real value of Web servers like this one isn't necessarily in the tools they replace but in the entirely new applications that they make practical.
It's always amusing to hear someone say something along the lines of "I've been a VM system programmer for 20 years, but someone in management went to a trade show and now they want to shut down the mainframe and 'go client-server.'" Mainframes and client-server relationships aren't necessarily mutually exclusive. The fact that many of the more popular client-server applications run on PCs and UNIX machines is an accident of history: those systems happened to come into wide acceptance at about the same time the philosophy of client-server computing was catching on. These systems didn't have a lot of critical applications to market them, so it was natural to use PCs and UNIX for client-server applications. Mainframes happened to have been bearing an established load of centralized nonnetworked mission-critical applications, so it did not seem necessary to market mainframes as part of the client-server equation. Big Iron shops have been paying the price for that for quite a while.
Ask anyone what they don't like about mainframes; in with the arguably valid points about the cost of maintenance, cost of storage, and length of the hardware product cycle are an equal number of invalid points about 3270 interfaces, slow response, and centralization as opposed to client-server orientation. An HTTP server is an ideal application to breathe new perceptions into an allegedly tired platform.
When the mainframe is an HTTP server, there's no 3270 interface, just a Web browser. Unless users try to log in to the host or give the host an obvious "mainframy" name like webvm1, there's no way most users can discern a Web server on a mainframe from one on a UNIX box. The 3270 interface problem becomes a non-issue. Slow response time is a function of system management. If users use their PCs, Macs, X terminals, and so on as their front ends and point Web browsers at the mainframe instead of using TN3270, SNA virtual terminals, or 3270 bisync terminals to run applications on the mainframe, a large unnecessary portion of the system load is eliminated. By making mainframe data available over a Web server instead of relying on proprietary VM-based applications, a good mainframe can be heard but not seen.
Server: WebShare
Platforms: CMS 5 or later, with CMS pipelines and REXX/Sockets.
From: Rick Troth <troth@ua1vm.ua.edu>
URL:
Major Features: HTTP 0.9/1.0, CGI 1.0/1.1, personal Web pages, REXX CGI scripts, works on multiple-homed hosts, redirection, imagemaps, uses minidisks and shared file system.
Security: Simple remote user identification through an RFC 1413-type ident transaction.
Installation: REXX source code. Runs "off the shelf" and is available in either VMARC or TAR format.
Documentation: http://ua1vm.ua.edu/htbin/cmshelp?task+httpd as well as additional documentation in the distribution archive.
Age: Release 1.0 distributed in January 1994.
Status: Release 2.2 current as of February 1995.
Licensing: Free software; see http://ua1vm.ua.edu/~troth/rickvmsw/cmshttpd.copyright for restrictions.
Of the VM-based Web servers available, WebShare is the most feature-rich. Through HTML2 forms with a REXX back-end engine, local mainframe data on either minidisks or the shared file system can be served in any way that can be described in REXX. Through use of a multi-homed host, appropriate REXX CGI scripts, and a related package, WebShare supports simple remote user identification by putting the result of an ident service transaction in the appropriate CGI variable.
All things considered, WebShare is a pretty good chance to put your money where your mouth is when defending the life of your mainframe.
The lowest common denominator is a big deal on the WWW. One of the tenets of the WWW philosophy is that users shouldn't be disenfranchised from Web access because of the machine on their desk. Although it's obvious that not all users can achieve parity of access quality, it's a perfectly realizable goal for anyone with any type of access to the Internet to have access to the great majority of information on the Web.
The same month the very first Web browser was developed at CERN (a graphical browser for the NeXT), work began on the first lowest common denominator browser: the Line Mode Browser. Development for low-end platforms has always been a common thread on the Web. Arguments that the WWW can be used only by the privileged few "information haves" (those with graphical displays) and not by the silent majority of "information have-nots" (those with text-only displays or no direct path to the Internet) just don't wash. The belief of some neophytes that the entire World Wide Web is somehow "inside Mosaic" is, naturally, invalid. Like any myth, however, that assertion has a kernel of truth. The sheer Nirvanic joy of a graphical Web browser with adequate CPU, graphics, sound, a decent Web site to browse, backbone bandwidth to get there, and other ergonomic considerations puts a lowest-common-denominator interface like e-mail to shame. Compared to the plethora of resources now available to graphical Web interfaces, many of them may as well be "inside Mosaic" (or Netscape or whatever is your personal preference for a browser).
The Internet has always had to contend with making services available to folks who have only e-mail access to the Internet. In days gone by, that contingent was made up of customers from major providers of structured network services such as CompuServe or Prodigy. As more providers join the structured-provider fray, and they all begin to offer larger amounts of direct or nearly direct Internet access to an unprecedented number of users, most e-mail orphans tend to be users on under-engineered or highly secured corporate networks.
Agora, from the W3 consortium, is a browse-by-proxy e-mail server that lets anyone with e-mail access to the Internet browse the Web from the perspective of the mail server. Hypertext documents are returned as text; links can be browsed by return mail. An informational URL can be found at http://info.cern.ch/hypertext/WWW/Agora/Overview.html. Agora is still being beta tested, but if you want to run it, request a copy of the software by sending e-mail to agora-request@mail.w3.org.
A PASSING REFERENCE TO SERVERS FOR OTHER SYSTEMS
Internet applications are often replete with features, many of which were never imagined when the application was first conceived. The WWW and its associated applications aren't immune to this tendency. As is frequently the case, the more capable an application is, the more its users want "just one more thing" until eventually these additions outnumber the original features. Such is the nature of application growth on the Internet.
The following sections briefly survey many of the features important to WWW applications, especially servers.
Multipurpose Internet Mail Extensions, or MIME, was developed to get more functionality out of RFC 822/SMTP-type e-mail applications. MIME permits single or multipart attachments such as program data, graphics, sound, video, or anything else that can be described within the context of a formal "MIME type."
HTTP provides a framework for browsers to make either simple or full requests from the HTTP server. Simple requests are considered HTTP 0.9 requests; full requests are considered HTTP 1.0 requests.
HTTP 0.9 is provided as a stripped-down version of HTTP for more streamlined and less capable implementations. In an HTTP 0.9 request, a browser requests a document from an HTTP server and makes certain assumptions about content based on context. The mechanics of this request can vary from client to client, but content is often presumed based on the file extension. An extension of .HTML or .HTM is usually assumed to go along with an HTML document; the .TXT extension usually goes with text, and so on. This system works much of the time, but it's far from bulletproof.
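The extension-based guessing described above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser does it: the table `EXTENSION_TYPES`, the function name `guess_type`, and the fallback type are all assumptions made for the example.

```python
import os

# Hypothetical extension table; a real client would carry a much longer one.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
}


def guess_type(path, default="application/octet-stream"):
    """Guess a content type from the file extension, ignoring case."""
    _, ext = os.path.splitext(path)
    return EXTENSION_TYPES.get(ext.lower(), default)


if __name__ == "__main__":
    for name in ("/docs/INDEX.HTM", "/docs/notes.txt", "/docs/README"):
        print(name, "->", guess_type(name))
```

A file with no extension, such as README, falls through to the default type, which is exactly the kind of case where guessing from context breaks down.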
In an HTTP 1.0 request, the browser requests a document and then rattles off a list of MIME types it can accept, such as text/plain, video/QuickTime, audio/basic and so on. If the requested document contains only MIME types listed as acceptable by the browser, it is encapsulated into a MIME-format message and transmitted to the client. If not, a variety of status or error conditions and renegotiation are possible, depending on the implementations of the browser and the server.
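As a rough sketch of that exchange, the following Python builds a full request with one Accept header per type and shows the kind of check a server might make before sending a document. The function names are invented for the example, and real servers differ in how they handle wildcards and failures.

```python
def build_request(path, accept_types):
    """Build an HTTP 1.0 full request with one Accept header per MIME type."""
    lines = ["GET %s HTTP/1.0" % path]
    lines += ["Accept: %s" % mime_type for mime_type in accept_types]
    # A blank line ends the request headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def acceptable(document_type, accept_types):
    """Return True if the document type matches one of the accepted types.

    Handles exact matches plus the wildcard forms type/* and */*.
    """
    main_type = document_type.split("/", 1)[0]
    for accepted in accept_types:
        if accepted in (document_type, "*/*", main_type + "/*"):
            return True
    return False


if __name__ == "__main__":
    accepts = ["text/plain", "video/quicktime", "audio/basic"]
    print(build_request("/index.html", accepts))
    print(acceptable("audio/basic", accepts))
    print(acceptable("image/gif", accepts))
```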
One tangible benefit of using MIME types to define the format of data transmitted from server to client is that the user of the client can micro-manage how individual types of data are handled: the type of GIF or JPEG viewer preferred, whether they want to print PostScript as soon as they receive the file, view it on-screen, save it to a file, and so on. UNIX clients frequently use the Metamail package to define how certain MIME types are handled. PC and Macintosh clients usually have a menu set or a dedicated configuration file to perform this function.
One of the wonderful things about the WWW is that each document can have its own unique user interface. A popular way of doing this is to create an inline graphic with clickable "hot spots" a user can point to, select, and use to control the behavior of the Web page. For example, a page may have a map of a campus; clicking on a given building may display a blueprint for that particular building. Another page may have a graphic with a photographic roster of a board of directors; clicking on a particular director may get you a roster of that director's department. Yet another page may have a fanciful company logo surrounded by custom icons representing various online services, each of which is linked to a telnet://, wais://, or another http:// URL.
Servers that support imagemaps typically have some sort of table defining what coordinates define what hot spots on the map and what action should be taken. Consider this example from the NCSA HTTPD for UNIX imagemap tutorial:
default /X11/mosaic/public/none.html
rect http://cui_www.unige.ch/w3catalog 15,8 135,39
rect gopher://rs5.loc.gov/11/global 245,86 504,143
rect http://nearnet.gnn.com/GNN-ORA.html 117,122 175,158
This example lists the necessary coordinates for three rectangles on a graphic that, if clicked, should result in the Web browser jumping to the specified URL. The default line at top specifies what should happen when the user clicks somewhere on the graphic that isn't within any of the defined areas.
A variety of utilities are available on the Net to assist in the construction of these imagemap tables; the specifics of how the server does image mapping vary between implementations.
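To make the lookup concrete, here is a minimal Python sketch of how a server might turn a click into a URL using the map table shown earlier. It is an illustration under stated assumptions, not the NCSA implementation: the function names are invented, only `default` and `rect` lines are handled, and the first matching rectangle wins.

```python
MAP_TEXT = """\
default /X11/mosaic/public/none.html
rect http://cui_www.unige.ch/w3catalog 15,8 135,39
rect gopher://rs5.loc.gov/11/global 245,86 504,143
rect http://nearnet.gnn.com/GNN-ORA.html 117,122 175,158
"""


def parse_map(text):
    """Parse map lines into (default_url, [(url, x1, y1, x2, y2), ...])."""
    default_url = None
    rects = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if fields[0] == "default":
            default_url = fields[1]
        elif fields[0] == "rect":
            x1, y1 = (int(v) for v in fields[2].split(","))
            x2, y2 = (int(v) for v in fields[3].split(","))
            rects.append((fields[1], x1, y1, x2, y2))
    return default_url, rects


def resolve_click(x, y, default_url, rects):
    """Return the URL of the first rectangle containing (x, y), else the default."""
    for url, x1, y1, x2, y2 in rects:
        if x1 <= x <= x2 and y1 <= y <= y2:
            return url
    return default_url


if __name__ == "__main__":
    default_url, rects = parse_map(MAP_TEXT)
    for point in ((20, 10), (300, 100), (0, 0)):
        print(point, "->", resolve_click(point[0], point[1], default_url, rects))
```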
Not surprisingly, many folks have started using Web browsers as their primary FTP clients. Information providers who know this can incorporate HTTP links to fancy directory listings in their Web pages as a kind of value-added file archive.
If you disable directory listings, users specifying an HTTP URL ending with a trailing slash will either load the designated default page, if there is one (on some systems, this is called index.html), or fail. If directory listings are enabled, and the default page doesn't exist in a given directory, the user will probably see a simple listing showing the parent and child directories as well as each of the visible files in that given directory and a brief description of file type, like text/plain.
Extended or fancy directory listings permit the server administrator to designate an icon bitmap for specific file types, descriptive text to go at the top of the listing for individual subdirectories, as well as descriptive text for individual files like "Don't download this GIF across state lines, please."
Server-side include scripts are an easy way to solve certain challenges, but they can also give users enough rope to both hang themselves and compromise the system. Typically, the name of the script is included in the text of a document and the script is executed each time a given document is retrieved. The convention for such scripts with NCSA HTTPD, for example, is as follows:
<!--#command tag1="value1" tag2="value2" -->
Placing the script in a comment ensures that the HTML source is more portable if moved to another server.
Server-side includes are often used for applications that aren't large enough to warrant spending time putting together a full-blown HTML2 form (such as automatic counters in Web pages or pointers to dynamic events). A benefit of server-side include scripts is that no burden is placed on the Web browser to support HTML2 forms. An obvious detraction is that any time a user can give any type of unrestricted access to outside users, there's the possibility that your system can be compromised.
It is probably a good idea to use this feature only on rare occasions and only for simple applications that have been gone over with a fine-tooth comb before release.
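One cautious way to picture server-side includes is as a substitution pass over the document before it is sent. The Python sketch below assumes the NCSA-style comment form `<!--#command tag="value" -->`; the `counter` command and its `page` tag are hypothetical, invented only for the example. Only commands that the administrator has explicitly registered are expanded, which is one way to keep the rope short.

```python
import re

# Matches <!--#command tag1="value1" tag2="value2" -->
DIRECTIVE = re.compile(r'<!--#(\w+)((?:\s+\w+="[^"]*")*)\s*-->')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def expand_includes(html, handlers):
    """Replace each directive with the output of its registered handler.

    Directives with no registered handler are left untouched, so they
    remain harmless HTML comments.
    """
    def replace(match):
        command = match.group(1)
        tags = dict(ATTRIBUTE.findall(match.group(2)))
        handler = handlers.get(command)
        return handler(tags) if handler else match.group(0)

    return DIRECTIVE.sub(replace, html)


def counter(tags):
    """Hypothetical handler: a real one would read a stored hit count."""
    return "42 visits to " + tags.get("page", "this page")


if __name__ == "__main__":
    page = '<p><!--#counter page="home" --></p><!--#exec cmd="ls" -->'
    print(expand_includes(page, {"counter": counter}))
```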
As anyone who's done any amount of Web publishing can tell you, the world doesn't naturally exist in HTML format. Anytime someone attempts to put some naturally existing information into HTML format, it means one of two things: they're writing HTML by hand or they're using a gateway to "HTML-ify" their information. One method isn't necessarily better than the other.
Some applications call for creative expression, such as system front ends, home pages, personal pages, and so on. Some applications call for consistent formatting, minimal distractions, and few if any hours spent in actual conversion of the data from another format. In the latter case, gateways are most useful.
Gateways abound. If you have any data in a commonly used format, the question is likely whether a gateway is available. Gateways, converters, and filters are available on the Net for getting HTML out of XFind, Hytelnet, e-mail, Usenet news, VMS Help, WAIS, Oracle, Techinfo, Lotus Notes, O2 OQL, finger, PostScript, Microsoft Word, WordPerfect, FrameMaker, troff, LaTeX/BibTeX, Texinfo, DECwrite, Interleaf, QuarkXPress, PageMaker, Scribe, PowerPoint, Linuxdoc, Rainbow, C, C++, Lisp, Fortran, SGML, Emacs Info, and AmigaGuide.
Whether some of these items are browser or server issues depends on how you implement them. If you have a directory full of MS Word documents and a CGI script that dynamically converts them to HTML just before viewing, it is a server issue. Other ways of using these gateways, converters, and filters may make them more appropriate for a browser.
One of the reasons why there are so many archives of extensions and scripts for various Web servers is the Common Gateway Interface, or CGI. CGI is an Application Programming Interface specification that addresses environment variables, command lines, standard input, and standard output issues. In particular, CGI is concerned with how servers communicate with back-end scripts that dynamically produce the output of a given Web page instead of relying on the contents of a static file. CGI scripts, for example, are the most common way of handling HTML2 fill-out forms.
Most servers permit CGI scripts to be written in one of a number of languages. Depending on your server, CGI scripts can be written in perl (the most common language for CGI scripts), C, Visual Basic, DCL, and even DOS Batch (if you're a masochist).
All servers that support the CGI can process HTML2 fill-out forms. Web browsers can use two methods to send the contents of a form to the HTTP server: GET and POST. The difference between the two is how the contents are conveyed to the server. Each HTTP client has its default method, but the method may be explicitly selected within the HTML for the form. (Refer to Chapter 28, "Navigating the World Wide Web," for more information.)
One of the big differences between HTTP 0.9 and HTTP 1.0 is the palette of methods that can be used within the context of an HTTP session to talk between the Web browser client and the HTTP server. With HTTP 0.9, only GET is used; as of the December 1994 second edition, HTTP 1.0 requires the GET and HEAD methods and defines optional methods called PUT, POST, DELETE, LINK, and UNLINK. All this really means is that HTTP 1.0 servers and clients can provide a great deal more flexibility when communicating back and forth.
The GET method means that the Web browser connects to the server and tries to "get" a specific Uniform Resource Locator. The HEAD method works just like the GET method but it doesn't return any actual documents (just header information). The vast majority of Web browsers use the GET method.
The POST method was specifically created for passing information that is "subordinate of the resource identified by the URL in the Request-Line" (in plain talk, this means that if you've got a Web page that requests information from you, you can use the POST method to return it to the server).
Both the GET and POST methods can be used for HTML2-type forms. If the GET method is used, the contents of your form are appended to the right side of the URL when the form is submitted to the server and passed to the CGI script as a variable called QUERY_STRING. If the POST method is used, the length of the data submitted by the form is passed to the CGI script as the variable CONTENT_LENGTH and the data is passed in standard input.
NOTE It is recommended that you use POST instead of GET whenever possible because the method used to pass the subordinate data to the server with GET has a tendency to run into arbitrary URL length restrictions on some servers.
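The difference between the two methods is easiest to see from the script's side. The following Python sketch reads form data the way a CGI script would: from QUERY_STRING for GET, or from standard input (exactly CONTENT_LENGTH characters) for POST. The function name `read_form` is made up for the example, and the environment and input stream are passed in as parameters so the logic can be tried without a server.

```python
import io
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return the submitted form fields as a dict mapping names to value lists."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the server supplies the body length; read only that much.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the form contents arrive appended to the URL.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


if __name__ == "__main__":
    get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Ada&lang=REXX"}
    print(read_form(get_env, io.StringIO("")))

    body = "name=Ada+Lovelace"
    post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
    print(read_form(post_env, io.StringIO(body)))
```

In an actual CGI script, the same function would be called with `os.environ` and `sys.stdin`.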
The WWW is the first killer application on the Internet that has attracted widespread interest in electronic commerce. Already, a plethora of companies are selling their goods individually and through numerous online shopping malls. There is, however, a fly in the ointment.
Communication of financial information over the Internet seems to require that such information pass from the source to the destination in privacy. That requires communication that is either physically or virtually secure. Because the Internet is built on shared communications media, virtual security seems to be the only choice available. Virtual security implies encryption of data, and that's the problem.
The NIST, NSA, and U.S. Commerce Department place restrictions on cryptography when it's used for data encryption. This can very well mean that to use encryption for electronic commerce on the Internet, steps must be taken to ensure the citizenship of any users of such services as well as the national origin of all sites ultimately receiving data from such servers. On the Internet, or even on any complex network a fraction of the size of the Internet, this is a practical impossibility.
Political quandary notwithstanding, here are a few URLs that may help you sort out what your legal obligations are and help you come to a basic understanding of some of the issues involved:
Cryptography Export Control Archives:
http://www.cygnus.com/~gnu/export.html
Cryptography home page:
http://retriever.cs.umbc.edu/~mohan/Work/crypt.html
Technical progress usually outstrips social progress, and the security of the WWW is no exception. Web security has advanced on several fronts; the following sections provide a brief survey of some of the developments.
Most HTTP servers provide access control. IP address-based access control is the ability to block or permit access to a server or individual document based on all or part of the IP address of the requesting host. An example of this is to permit all of the University of North Texas to access a server by explicitly permitting their entire block of IP addresses, which all begin with 129.120, to access the service.
DNS-based access control is the ability to afford the same type of protection based on all or some of the requesting host's Internet domain name. An example of DNS-based access control is to permit the University of North Texas site to access the server by explicitly permitting all hosts that have an Internet domain name ending in unt.edu (such as vaxa.acs.unt.edu or mvssp.coe.unt.edu).
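Both kinds of check are simple to express. The Python sketch below uses the University of North Texas examples from the text; the names `ALLOWED_NETWORKS`, `ALLOWED_DOMAINS`, `ip_allowed`, and `host_allowed` are invented for illustration, and a real server would take these lists from its own configuration file.

```python
import ipaddress

# Assumed configuration: every address beginning with 129.120, and any
# host whose name is unt.edu or ends in .unt.edu.
ALLOWED_NETWORKS = [ipaddress.ip_network("129.120.0.0/16")]
ALLOWED_DOMAINS = ["unt.edu"]


def ip_allowed(address):
    """Return True if the requesting IP address falls in a permitted block."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ALLOWED_NETWORKS)


def host_allowed(hostname):
    """Return True if the host name is a permitted domain or a host within one."""
    name = hostname.lower().rstrip(".")
    return any(name == domain or name.endswith("." + domain)
               for domain in ALLOWED_DOMAINS)


if __name__ == "__main__":
    print(ip_allowed("129.120.1.5"), ip_allowed("129.121.1.5"))
    print(host_allowed("vaxa.acs.unt.edu"), host_allowed("unt.edu.example.com"))
```

Matching on a leading dot keeps a name such as evilunt.edu from slipping past a check meant for unt.edu.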
Because no Internet access can take place without an IP address, IP-address restriction seems to be a good method for ensuring security. Because nearly all hosts at most sites are assigned domain names, DNS-based access control seems to be a sound idea as well.
Although these approaches are great starting points, there are enough problems with them to warrant looking for additional protection. IP address-based filtering and DNS-based filtering seek to protect you from machines that violate your security. Nevertheless, machines don't violate security; people do. People can move from machine to machine and from a blocked site to a trusted site with relative ease. Because there's no shortage of unauthenticated access to the Internet, it's a little like trying to build a dam with chicken wire: nice foundation, but now we need some cement. Additionally, recent Internet history has proven that the expertise required to forge or "spoof" Internet addresses and domain names is not in short supply.
In October 1993, the Access Authorization Basic Protection Scheme was incorporated into the CERN WWW Common Library, which is used for numerous HTTP clients and servers. The AA scheme, sometimes known as WWW Telnet-level access authorization, provides for clear text transmission of user IDs and passwords and has hooks for the use of schemes like Kerberos or RIPEM. On the surface, this approach seems to be an excellent safeguard, but the scope of its protection is narrow. If used with an encryption method, as NCSA has done with PGP/PEM encryption, the AA scheme provides an excellent way of authenticating access to resources over the Net.
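To see why the unencrypted form of the scheme offers only narrow protection, consider the sketch below. It assumes the `user:password` pair is carried in base64 in an Authorization header, the form later standardized as HTTP Basic authentication; the function names are invented for the example. The encoding is reversible by anyone who sees the header, so it hides nothing from an eavesdropper.

```python
import base64


def basic_credentials(user, password):
    """Encode a user ID and password as a Basic Authorization header value."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("ascii"))
    return "Basic " + token.decode("ascii")


def decode_credentials(header_value):
    """Recover the user ID and password; no key or secret is needed."""
    scheme, token = header_value.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    user, password = base64.b64decode(token).decode("ascii").split(":", 1)
    return user, password


if __name__ == "__main__":
    header = basic_credentials("Aladdin", "open sesame")
    print("Authorization:", header)
    print(decode_credentials(header))
```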
A central problem the AA scheme does not address, however, is the security of the document as it passes from one relatively secure site, over umpteen broadcast networks with an unknown level of security, to another relatively secure site.
One of the proposals on the table that would address this problem is the Secure HTTP proposal by Rescorla and Schiffman of Enterprise Integration Technologies. This proposal provides for a spectrum of security measures including digital signatures, public-key encryption, privacy domains, and key exchanges.
Where Web security is concerned, it will be interesting to see how the social challenges presented by the technical capability of the Internet are addressed.