If you've read about the Internet lately, you've probably seen some mention of the World Wide Web. It has been featured in the daily newspapers, network television, major news magazines, and, of course, in the computer industry press. The World Wide Web (often called WWW, W3, or just the Web) is drawing people and organizations to the Internet faster than ever before; this time, reality comes close to living up to the hype. The Web is visually appealing, easy to use, rich with resources both useful and useless (but nearly always fun), and truly worldwide.
The World Wide Web is a distributed hypertext system. This means that documents contain links to other documents (hypertext links) and that documents on different machines, widely separated on the network, can link to each other. The Web is also a hypermedia system in that documents can contain sound clips and images; other media such as video clips are also common. Figure 28.1 shows a typical Web page with hypertext links highlighted and underlined.
FIGURE 28.1. A typical World Wide Web page.
Navigating the Web is easy. While looking at one document, you can follow a link to another document by a mouse click or key press. Your browser (the program you use to explore the Web) will load the new document and present it to you for viewing. You can continue to follow links from there, whether you are doing serious research or just exploring for fun. Usually, the only difficult part of the process is finding your starting point for a topic, but there are Web indexes, catalogs, and search facilities that help with that process, too.
The Web has quickly become a crucial part of the Internet and its culture. If Usenet and mailing lists can be compared to the town square and restaurants where people gather and converse, then the World Wide Web fills the roles of libraries, shopping centers, and museums. That's really just the start of it. As you will see, the Web is being used for such a wide variety of things that the only way to really get a complete feel for it is just to use it: to browse around, hang out, and grow familiar with the World Wide Web.
The Web itself is relatively new (the project was officially begun in 1990), but it has its roots in ideas that go back to 1945. This is when Vannevar Bush, then Director of the U.S. Office of Scientific Research and Development, published an article in The Atlantic Monthly called "As We May Think." In the article, Bush proposed the development of a mechanical filing and knowledge-retrieval system that he called the memex. Physically, Bush's dream bears little resemblance to the Web of today in that the memex was a personal knowledge system rather than a world-wide knowledge sharing system, and it was mechanical, making use of mirrors and microfilm. But if one ignores the physical design, the memex is the first rough design for a hypertext system: a system that permits documents to contain live, active links to other documents or sections of other documents so that relationships and cross-references can be examined instantly. The basic ideas of Bush's device can be seen today in the Web and similar systems, such as HyperCard on the Macintosh. Bush's paper was so influential that most consider him the "father of hypertext." (For those who are interested, "As We May Think" is available on the Web as http://www.csi.uottawa.ca/~dduchier/misc/vbush/as-we-may-think.html.)
In the late 1950s and 1960s, several people began attempting to realize Bush's ideas. The most famous of these people are Douglas Engelbart and Ted Nelson. Engelbart, working at the Stanford Research Institute, developed some experimental hypertext systems that are still amazing to people today. In the process, he invented window systems, the mouse, and several other important innovations. Nelson is known more for his writings about hypertext and his predictions about the eventual uses and forms of hypertext systems. Nelson coined the word "hypertext." At the time (and for many years after), many considered his ideas to be wildly improbable, but the World Wide Web comes quite close to his predictions.
Since Engelbart and Nelson did their first experiments in the 1960s, they and others have continued to work on the concept, and some simple hypertext systems have seen wide use (Microsoft Windows' help system is an example). However, the World Wide Web, partly because it has a simple architecture and partly because it makes use of the Internet, is the first system that has demonstrated the rapid growth in information and connectedness that was envisioned by Bush, Engelbart, and Nelson.
The World Wide Web project was started at CERN, the European Laboratory for Particle Physics (the initials come from its older French name, Conseil Européen pour la Recherche Nucléaire) in Geneva, Switzerland, by a group led by Tim Berners-Lee. It was intended as a vehicle for real-time collaboration among physicists. Partly because of this need and because physicists at CERN wanted to collaborate with each other and also with other physicists around the world, the Web was not designed as an internal CERN system; it was designed to be worldwide. Berners-Lee and others on the WWW team made several decisions that helped the Web succeed where other hypertext systems did not. They made all of the specifications open and publicly available; they designed a linking system that operated over the network and that was inclusive (that is, it could be adapted to use existing and future protocols, not just the ones designed to be a part of the Web). They also created sample implementations of Web servers and browsers and made them freely available.
The Web project started at CERN in 1990 and was showing slow but steady growth in 1992. This is when Marc Andreessen began working on similar collaboration and information sharing at NCSA (National Center for Supercomputing Applications). NCSA is a federally funded organization located at the University of Illinois at Urbana-Champaign (UIUC). Marc was a student who had a part-time job at NCSA developing tools for researchers. He learned about the World Wide Web and realized that he didn't have to start from scratch; the Web was designed to solve the same sorts of problems that Marc was working on. At the time, however, Web browsers were primarily text based, so Marc began writing a new browser that could mix text and graphics, taking full advantage of modern window systems such as X Window, the Macintosh, and Microsoft Windows. His new browser, combined with a few extensions to existing WWW specifications, became NCSA Mosaic. Compared to what had gone before, Mosaic was flashy and dazzling, combining text and graphics in the same documents. Many people began to see the potential of the Web. Mosaic, more than any other development, sparked the current interest in the Web and its explosive growth. (In fact, Mosaic made such an impression that many people think that it is the Web, and speak of "Mosaic pages." But the Web is independent of any one browser.)
Although CERN and NCSA were the primary forces behind the beginnings of the Web, neither controls it any longer. Because the specifications are open and inclusive, the current development of the Web is being driven by numerous forces. These forces include research groups, the Internet Society, information providers, and several new companies that have been started in order to capitalize on the Web's growth and popularity (one of which was co-founded by Mosaic's creator, Marc Andreessen). CERN and the Massachusetts Institute of Technology have started the World Wide Web Organization (W3O), a consortium intended to help coordinate future developments. However, the W3O, as it's called, is also open; its purpose is not to control but to provide a forum for cooperation, to help keep the Web from splintering into incompatible enclaves. The Web has a life of its own.
To explore the Web, you use a program called a Web browser (or sometimes, a Web client). The browser makes connections over the Internet to network services, requesting network resources and displaying them to you. Most commonly, the network services are specialized Web servers, and the resources that are displayed are hypertext documents (often called Web pages). However, one of the things that has made the Web such a success is that it can also connect you to some of the more traditional network services, such as FTP, Telnet, and Gopher, all using the same interface and using the hypertext documents to link them together and provide explanations.
While reading hypertext with a Web browser, certain parts of the text are highlighted or marked in some special way; those marks or highlights are links (refer back to Figure 28.1). A link is a pointer to some other network resource or document. By clicking a link with your mouse (or selecting it in some other way with a terminal-based browser), you direct the browser to follow the link. It will connect to the appropriate machine on the network, request the resource, and display it to you.
Documents on the Web that deal with similar topics are often linked together, and there are indexes that provide searchable catalogs of the available resources. By following links from one document to another, you can explore and research a topic quickly and thoroughly, and often you'll learn about related topics you hadn't even considered. If you've ever spent an hour following "see also" references in an encyclopedia just learning about new things, you'll have an idea of what the Web is like. The Web can be more fun, though, because following a link is easier and faster than looking up a new entry in an encyclopedia; the Web usually has more personality and it's often more thorough. Anyone can publish on the Web, and a great many people are publishing the things that interest them. Whether it's a serious topic or just a fun diversion, chances are that there's information about it on the World Wide Web.
The hypertext documents on the Web are written in Hypertext Markup Language (HTML), version 2.0. HTML allows documents to contain other things besides text and links; for example, pictures, sound and video clips, and forms with fields that you can fill out and send back to the document's author. However, not all browsers support all the features of HTML 2.0. Today, it's hard to get the most out of the Web if your browser doesn't support forms and inline images. HTML 3.0 is currently being designed and will support features such as tables and mathematics.
Before you can start exploring the Web, you'll need to get a Web browser. It also helps to have a few "helper applications" your browser can use to display special document types.
Whether your computer is a PC running Windows, a Macintosh, or a UNIX machine, you have several browsers to choose from. There are browsers available for nearly every computer environment; some are free and some are licensed for a fee. There are two primary types of browsers: text-based, for use on ordinary terminals or terminal emulators, and graphical, for use on systems with a graphical user interface.
The following sections list some of the most popular browsers and explain how to acquire them. (A list of WWW browsers is maintained by the World Wide Web Organization and can be found at http://www.w3.org/hypertext/WWW/Clients.html.)
The Web offers several browsers that come in versions for different platforms. This section describes those browsers. One of the following may be a good choice for you if you occasionally must access the Web from different types of machines:
In addition to the multiplatform browsers mentioned in the preceding section, there are two full-featured browsers, Cello and WinWeb, designed specifically for Windows (see Table 28.1).
Browser | Location
Netscape Navigator |
NCSA Mosaic |
AIR Mosaic (demo version) |
Quadralay WebWorks Mosaic |
Cello |
WinWeb |
In addition to the multiplatform browsers, there are two other browsers that run only on the Macintosh: MacWeb, a full-featured browser similar to WinWeb for Windows, and Samba, a basic browser from CERN that does not support some of the newer features, such as forms (see Table 28.2).
Browser | Location
Netscape Navigator |
NCSA Mosaic |
AIR Mosaic |
Samba |
MacWeb |
All multiplatform browsers run on UNIX under the X Window system, in addition to the following:
See Table 28.3 for a list of UNIX browsers and their locations.
Browser | Location
Netscape Navigator |
NCSA Mosaic |
AIR Mosaic |
Quadralay WebWorks Mosaic |
tkWWW |
Arena |
Finally, for UNIX users without access to X, there are several text-mode browsers:
See Table 28.4 for a list of text-mode browsers and their locations.
Browser | Location
Lynx |
Emacs w3-mode |
CERN line-mode browser |
In addition to browsers, it also helps to have some external viewers or helper applications that the browser can use to show you documents that it doesn't understand. For example, most browsers understand one or two graphics formats, but if they find an unusual format that they can't display directly, they can call a helper application to show the file.
You probably already have some programs that can be used as helper applications on your system: image viewers, PostScript previewers, and audio and video players. However, if you need others, it helps to know where to find them. Netscape has a page that describes the helper applications available for various systems; it's located at http://home.mcom.com/assist/helper_apps/index.html.
The way to configure browsers so that they know which helper applications to call for particular document types depends on what kind of computer you use and on the browser itself. Most UNIX browsers consult a system configuration file called mailcap (because it's also used for electronic mail configuration). Some Windows browsers also use a mailcap file, but others store this information in their INI files and have dialog-based configuration procedures. Macintosh browsers also have a different mechanism. Check your browser's documentation for information on this.
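To make the idea of a mailcap lookup more concrete, here is a minimal sketch in Python. It assumes the usual mailcap layout of one entry per line, with a content type, a semicolon, and a command in which %s stands for the file name. The sample entries and the viewer commands named in them are illustrative assumptions, not a recommended configuration, and real mailcap files support more fields than this sketch handles.

```python
# A minimal sketch of a mailcap-style lookup. The sample entries and the
# viewer commands in them are illustrative assumptions, not a real config.
SAMPLE_MAILCAP = """\
# type/subtype; command (%s stands for the file name)
image/*; xv %s
application/postscript; ghostview %s
audio/basic; showaudio %s
"""


def parse_mailcap(text):
    """Return a list of (content type pattern, command template) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        content_type, _, command = line.partition(";")
        entries.append((content_type.strip().lower(), command.strip()))
    return entries


def find_helper(entries, content_type, filename):
    """Return the command line for the first matching entry, or None."""
    content_type = content_type.lower()
    major = content_type.split("/")[0]
    for pattern, command in entries:
        # An entry matches exactly, or as a wildcard such as image/*.
        if pattern == content_type or pattern == major + "/*":
            return command.replace("%s", filename)
    return None


entries = parse_mailcap(SAMPLE_MAILCAP)
print(find_helper(entries, "image/gif", "logo.gif"))
print(find_helper(entries, "video/mpeg", "clip.mpg"))
```

The second lookup finds no entry, which corresponds to the case in which the browser has no helper application for a document type.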
The Web is still a relatively new technology and is changing rather rapidly. It's worthwhile to try to keep your browser current so that you can use new features of the Web as they become available. Check in at your browser's home page every month or two to see what the current version status is.
There is a "Browser Checkup page" on the Web: just go to the page and it will figure out what browser and what version you're using. The page lets you know whether the browser you're using is the most up-to-date version available or not. The Browser Checkup page is located at http://www.city.net/checkup.cgi.
Although many different browsers are available, they are remarkably similar in their interfaces. After having learned to use one browser, you should be able to learn to use a different one very easily.
In the following sections, you'll learn most of the things that browsers can do. I'll cover both graphical and textual browsers. In general, I won't get too specific about how particular browsers do things, but I will occasionally provide specific examples. For graphical browsers, I'll refer to Netscape Navigator for Windows (which is the browser used for most of the screen shots in this chapter), and I'll also give some specific examples of the textual browsers because their interfaces differ more than those of the graphical browsers.
When you start your browser, it will first load an initial page, usually called a home page. ("Home page" is sort of a confusing term because you'll see later that it's used to mean several different things. But in this section, I always use it to refer to the initial page that your browser loads on startup.) Most browsers let you configure the Uniform Resource Locator (URL), which is used to load the home page; each browser has a default. Many of the early browsers use the CERN World Wide Web Project page, but most of the newer generation of browsers have their own home pages that introduce you to the browser and its features. All default browser home pages, however, have links to important Web pages that lead to the rest of the Web. Figure 28.2 shows Netscape Navigator's home page.
FIGURE 28.2. Netscape Navigator's home page.
Unless your browser comes with its own home page loaded on to your local disk, you'll probably notice that loading the default home page is a little slow. This might be because it is far away from you on the network or because your link to the network is relatively slow. However, part of the reason is that the machines that hold those home pages are busy; thousands or millions of people all over the network are using that same browser and whenever they start up, they all connect to the same machine and load the same document. This is one reason that after you've gained some familiarity with the Web, you may want to find another page that is closer to use as a home page or one that is on a machine that's less busy. You may even want to create your own home page (see Chapter 36, "Creating Web Pages with HTML," for information on how to build your own Web pages).
If you are in an organization that accesses the Internet from behind a firewall, you may have to use a proxy for your Web access. Firewalls are special machines designed to shield an organization's internal network from the Internet, only permitting certain kinds of trusted traffic through. A Web proxy is a special program running on the firewall machine that forwards your Web requests to the Internet and returns the requested documents to your browser. If you try to use the Web and always have trouble connecting to network hosts, you may be behind a firewall. Your browser will have to ask the proxy machine for each document, and the proxy will actually fetch the document from the Internet, passing it to the browser. Different browsers have different ways of configuring this. In every case, you'll need to know the name of your proxy, so ask someone knowledgeable about your organization's Internet connection.
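For readers who write their own Web tools, the same arrangement can be sketched with Python's standard urllib.request module. The proxy address below is a hypothetical placeholder; you would substitute the name and port that your network administrator gives you.

```python
import urllib.request

# Hypothetical proxy address; ask your network administrator for the real one.
PROXY_URL = "http://proxy.example.com:8080"


def make_proxy_opener(proxy_url):
    """Build an opener that sends HTTP and FTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "ftp": proxy_url})
    return urllib.request.build_opener(handler), handler


opener, handler = make_proxy_opener(PROXY_URL)
print(handler.proxies)
# Calling opener.open(...) with a URL would now ask the proxy to fetch
# the document instead of connecting to the remote host directly.
```

Nothing is fetched when this runs; the sketch only shows where the proxy's name is supplied.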
World Wide Web documents, or pages, usually consist of text and pictures, just like a typical page from a book or magazine. The biggest difference is that some parts of the text (or even some of the images) are active; they represent links to other documents. Usually, the links are highlighted in some way so that you can tell they are there. In most graphical browsers, links are represented in a different color than the rest of the text (and image links have a colored border). By clicking the link with the mouse, you can load the other document in place of the one you're currently reading. This is called traversing or selecting a link (see Figure 28.3).
FIGURE 28.3. Traversing a link.
You may notice that a message in a status area of your browser's window changes as you move the mouse or cursor from link to link. If this happens, the browser is showing you the URLs of the documents associated with the links. Most browsers have an option to do this, but in some it is turned off by default. URLs are the "addresses" used on the Web to represent documents (see Chapter 6, "Domain Names and Internet Addresses," for more information about URLs).
If you are using a textual browser, links can't be highlighted by a different color, so they are handled a bit differently. Lynx highlights links in bold and uses the cursor to mark the link that's ready to be selected. You use the up-arrow and down-arrow keys to move from link to link; finally, you traverse one by using the right-arrow key (Return and Enter also work). The CERN line-mode browser marks each link currently visible on-screen with a number in square brackets, like this: [1]. At the bottom of the screen, the browser prompts you for the number of a link to traverse.
Images, of course, are a problem for textual browsers. Well-designed Web pages specify text to be displayed in place of an image if the browser can't display the image. However, not all Web authors bother to specify alternate text for images; if this is the case, a textual browser usually won't display anything at all unless the image is also defined as a link, in which case Lynx displays the link as [IMAGE].
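The rule just described can be sketched in a few lines of Python using the standard html.parser module. This is only an approximation of the behavior described above, not Lynx's actual implementation, and the sample page and file names are made up for illustration.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Collect the text of a page, substituting text for inline images."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.link_depth = 0  # greater than zero while inside a link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)        # the author supplied alternate text
            elif self.link_depth > 0:
                self.parts.append("[IMAGE]")  # the image is a link: keep it selectable
            # otherwise the image is silently dropped

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        self.parts.append(data)


def render_text(html):
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.parts)


page = ('<a href="home.html"><img src="home.gif"></a> '
        '<img src="logo.gif" alt="Acme logo"> '
        '<img src="rule.gif">Welcome')
print(render_text(page))
```

In the sample page, the linked image without alternate text becomes [IMAGE], the logo is replaced by its alternate text, and the decorative rule disappears.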
Sometimes, an image represents a special type of link called an imagemap. With imagemaps, clicking on different parts of the image links you to different places. This is usually used to present a fancy, graphical menu of choices. Usually, the sections of the image are labeled with text in some way to tell you where to click for particular types of information. Figure 28.4 shows an imagemap with clearly labeled regions; Figure 28.5 shows an imagemap that is a literal map; for example, clicking a city name produces a weather forecast for that city.
FIGURE 28.4. An imagemap with clearly marked regions.
FIGURE 28.5. Another imagemap.
If you're using a textual browser, you can't use imagemaps. Some graphical browsers don't support them, either. However, most pages with imagemaps contain a textual link that says something like "go here to see the textual version of these pages" for the benefit of those who can't use the images.
Each link specifies a URL, which is the address of the document to be loaded when the link is traversed. There are several different types of URLs, and each type works a little differently; therefore, it's sometimes helpful to know a little bit about them. Chapter 6, "Domain Names and Internet Addresses," describes URLs in detail, but this section gives a brief overview with an emphasis on how your browser handles each different type.
The most common type of link is HTTP. HTTP is the HyperText Transfer Protocol, designed especially for the World Wide Web at CERN. HTTP is very lightweight and fast and is used to serve most documents found on the Web today. When you traverse an HTTP link, your browser will connect to the appropriate machine, retrieve the document, and close the connection. The HTTP server will tell your browser what type of document it is so that your browser can display it correctly.
The FTP URL type is almost as common. When you traverse an FTP link, your browser connects to a machine using the File Transfer Protocol, retrieves the appropriate document, and closes the connection (see Chapter 22, "FTP: Fetching Files from Everywhere"). The FTP server does not tell the browser about the type of document, so the browser has to guess based on the file's name. Sometimes your browser's best guess may be incorrect.
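The difference between the two cases can be sketched in Python. The function below is a hypothetical helper: it trusts the type declared by the server when there is one (the HTTP case) and otherwise guesses from the file's name with the standard mimetypes module (the FTP case). The file names are examples only.

```python
import mimetypes


def document_type(filename, declared_type=None):
    """Return the document type, preferring what the server declared."""
    if declared_type:
        # HTTP case: the server said what kind of document this is.
        return declared_type
    # FTP case: no type information, so guess from the file's name.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


print(document_type("paper.ps"))
print(document_type("logo.gif"))
print(document_type("README"))
print(document_type("data.bin", declared_type="image/gif"))
```

A name such as README carries no extension, so the guess comes back empty; this is the kind of case in which a browser has to fall back on a default and may get it wrong.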
If an FTP URL specifies a directory instead of an actual file, the browser will get a listing of that directory and construct a special document for display. The document is the directory listing with each file or directory name being an FTP link to that file or directory and with a special link at the top to the parent directory (see Figure 28.6). By using this facility, your Web browser can be used to explore anonymous FTP sites on the Internet.
FIGURE 28.6. Browsing an FTP site using a Web browser.
The file URL type is much like FTP except that it specifies a file on a local disk accessible to the browser. As you may guess, file URLs aren't very common on the Web at large, although you may find them useful in your own Web documents.
The Gopher URL type is used to refer to information available on Gopher servers (see Chapter 25, "Using and Finding Gophers"). Gopher is also a typed protocol, like HTTP, so your browser can do different things with documents based on their types. Typically, Gopher URLs refer to Gopher menus or text files. Your browser displays a Gopher menu as a menu of links, again with a special link at the top to take you to the parent menu.
The news URL type consists of two kinds, both of which are used to access Usenet news (see Chapter 17, "Reading and Posting the News: Using Usenet"). One type links to an individual news article that will be retrieved from your news server and displayed for you. The other type links to an entire newsgroup, and your browser will show you a list of the messages in that newsgroup with each line linked to the corresponding article. Unlike most other URL types, news URLs don't contain the name of a machine; your standard news server is used instead. This means that if you don't have access to Usenet already, you can't access it through the Web. Also, news articles eventually expire and are removed from news servers, so news URLs that refer to articles are not as long-lived as most other URLs. For this reason, you don't often see news URLs in Web documents.
A Telnet URL is unusual in that it doesn't retrieve a document for you; it connects you to a service. The URL contains the name of a machine to connect to using the Internet's Telnet protocol (see Chapter 23, "Logging in to Other Computers with Telnet and Rlogin"). When you traverse a Telnet link, usually your browser will start a Telnet session (possibly in another window) with which you can log in to the other machine. Often, a Telnet URL is used to connect you to a library catalog system or some other information server that is accessed by using Telnet. Another URL type, TN3270, is similar, but it uses a version of Telnet that works with mainframe applications designed for IBM's 3270-style terminals.
A relatively new URL type, mailto, is another type that doesn't actually retrieve a document. When you traverse a mailto URL, if your browser supports this type, you are prompted to compose a mail message that will be sent to the electronic mail address contained in the URL.
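To summarize the URL types covered in this section, here is a small Python sketch of how a program might decide what to do with a link by looking at the first part of its URL. The table of actions simply paraphrases the descriptions above, and apart from the W3 address quoted earlier in this chapter, the example URLs are invented for illustration.

```python
from urllib.parse import urlsplit

# What a browser does for each URL type, paraphrased from this section.
ACTIONS = {
    "http": "fetch the document from a Web server",
    "ftp": "fetch the file or directory listing by FTP",
    "file": "read the file from a local disk",
    "gopher": "fetch the menu or file from a Gopher server",
    "news": "ask the local news server for the article or newsgroup",
    "telnet": "start a Telnet session",
    "tn3270": "start a 3270-style Telnet session",
    "mailto": "prompt for a mail message to send",
}


def describe(url):
    """Return (URL type, machine name or None, what the browser does)."""
    parts = urlsplit(url)
    action = ACTIONS.get(parts.scheme, "unsupported URL type")
    return parts.scheme, parts.hostname, action


for url in ("http://www.w3.org/hypertext/WWW/Clients.html",
            "telnet://library.example.edu/",
            "news:comp.infosystems.www.misc",
            "mailto:someone@example.com"):
    print(describe(url))
```

Note that the news and mailto examples yield no machine name, which matches the descriptions above: neither type names a host to connect to.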
Most documents you find on the Web are in HTML (HyperText Markup Language) format, described in Chapter 36, "Creating Web Pages with HTML"; however, other types of documents exist. In some cases, you see plain text documents, images, video clips, sound recordings, Gopher menus, PostScript documents, and others. Some of these special formats are understood directly by your browser, but most of them must be displayed by helper applications. If you don't have the appropriate viewer for a document type, or if your browser doesn't know about it, you may not be able to display some documents.
As you traverse links from one document to another, most browsers keep track of your path through the Web: the list of documents you have gone through to get to your current document. By using a button, menu entry, or command (usually named "Back"), you can back up through those documents, eventually returning to your home page. After backing up through a few documents, some browsers allow you to move forward again, retracing your steps to the point at which you started backing up, without having to find and reselect the same links you traversed to get there.
This capability is a great aid to exploring. If you find a page that looks interesting, you can explore any of the links that lead from that page without worrying about losing it. If you branch off in one direction and eventually exhaust that lead, or if it turns out to be a blind alley, you can just back up and start again on another trail.
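The bookkeeping behind Back and Forward can be sketched as a list of pages plus a position marker. This Python sketch is one plausible design, not a description of any particular browser; in particular, it assumes that following a new link after backing up discards the pages you had backed away from. The page names are placeholders.

```python
class History:
    """Track the path taken through the Web, starting from the home page."""

    def __init__(self, home_page):
        self.pages = [home_page]
        self.position = 0

    @property
    def current(self):
        return self.pages[self.position]

    def visit(self, url):
        # Assumption: following a new link discards any "forward" pages.
        del self.pages[self.position + 1:]
        self.pages.append(url)
        self.position += 1

    def back(self):
        if self.position > 0:
            self.position -= 1
        return self.current

    def forward(self):
        if self.position < len(self.pages) - 1:
            self.position += 1
        return self.current


history = History("home")
history.visit("museums")
history.visit("dinosaurs")
print(history.back())      # museums
print(history.forward())   # dinosaurs
```

Backing up repeatedly stops at the home page, which is why you can always find your way back to where you started.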
Another aid to exploration is the bookmark (some browsers use hotlists instead, but the ideas are very similar). Like bookmarks for paper books, Web bookmarks are markers that permit you to quickly return to places of interest. You can save bookmarks to pages that are particularly interesting or that you may want to return to later. After you've saved the bookmark, it's a simple matter to return to that page later by choosing the bookmark from your list. Some browsers let you choose a bookmark from a dialog box, others have a menu of bookmarks, but the principle is the same.
Some browsers also support hierarchical bookmarks so that you can organize a long list of bookmarks into categories, making them easier to manage. Figure 28.7 shows the process of selecting a bookmark from one of the categories I've established in my copy of Netscape Navigator.
FIGURE 28.7. Selecting a bookmark.
Many Web pages contain forms, which you can fill out with different kinds of information. In some cases, these forms just collect information for a survey; but in other cases, the forms permit you to enter information for a database query, request information on a topic, draw a picture, or play a game. Figure 28.8 shows a form used for searching the Web for pages devoted to a particular topic.
FIGURE 28.8. A simple Web form.
Not all browsers support forms, but most do; even the textual browser Lynx has good support for forms.
Forms can contain several different types of elements that will be familiar to users of window-based applications. Entries are blanks in which you can enter text. Check boxes are boxes with labels, and you can check as many as apply. Radio buttons are like check boxes, but they come in groups, and you can check only one at a time. Menus present a list of choices from which you can select one (and sometimes more). Finally, forms contain a Submit button, used to submit the form, and sometimes a Reset button, to clear the form and start over.
Different browsers handle forms in different ways, and textual browsers have to be especially creative. It can take some experimenting to figure out how to work all the different gadgets, but it's all right to experiment; remember that you usually can't actually submit the form until you select the Submit button. (The exception is some forms that contain only one entry: pressing Return in the entry box can submit the form.) Furthermore, if you decide not to submit the form after you've started, just back up out of the page that contains the form, choose another link, and your browser will forget all about it.
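This chapter doesn't go into what happens when you select Submit, but as a rough sketch, a browser commonly packages the form's contents as a series of name=value pairs. The Python below shows that encoding with the standard urllib.parse module; the field names and values are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical form contents: one text entry, one radio button group,
# and two check boxes that share a name.
fields = [
    ("query", "hypertext history"),  # text entry
    ("format", "short"),             # radio buttons: only one value
    ("topic", "web"),                # check boxes: one pair per box checked
    ("topic", "gopher"),
]

encoded = urlencode(fields)
print(encoded)
```

Nothing is sent anywhere by this sketch; it only shows the form's contents as a single string of the kind a browser would hand to the server on submission.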
If you run into trouble or need to know how to do something new, it helps to be able to read your browser's documentation. All graphical browsers have a Help menu, and the textual browsers also have help commands that let you view help files.
Some browsers come with documentation that you install on your disk, but others simply access their current documentation from the Web. This is usually a benefit because you always see the most up-to-date documents. If you're having trouble getting anything to work, though, it's a problem. All browsers come with preliminary documentation that you'll need to get up and running; but if you can't get things to work, and you have a browser that fetches the rest of its help files over the Net, you'll have to get a friend with a connection to help you.
If your access to the Web is over a slow link (or even if it's not), sometimes pages load slowly and you want to speed things up a bit.
One easy way to make things seem faster is to delay the loading of images. Most graphical browsers have this facility; once enabled, your browser will load all the text for each page but not the images. Small icons are usually shown in place of the images, and you can click an icon to request that a particular image be loaded; if a page looks interesting enough, you can request that all the images be loaded. Inline images can use up a lot of bandwidth, so cutting them out can save a lot of time, at the expense of less exciting Web pages. The good part of the tradeoff is that you don't have to choose all or nothing. You get to decide when you want to take the time for the images and when you don't.
Another strategy for speeding up your access is to make use of your browser's cache. Most browsers cache the documents they retrieve. When you move on to a new page, the browser saves the old page for a while, rather than throwing it away, in case you choose to revisit the page soon. In some cases, even if you don't revisit the same page, the cache will help because often pages at the same Web site share images used for markers, bullets, or dividers. If your browser loads those images into its cache for the first page at that site, subsequent pages load more quickly.
Some browsers cache old documents in memory, others use disk space; most often, it's a combination of both. Enabling the cache (if it's not enabled by default) or increasing the size of the cache can lead to increased performance.
Be careful with the cache, however, because the cache takes memory and disk storage that could be used by other applications, and you could end up slowing things down further if you make the cache too large. Be reasonable. If you have a PC with 8M of memory, using 4M for the Web cache will probably cause more problems than it will solve. Likewise, a disk cache uses disk space you might wish you had for other things. If you are running your Web browser on a multiuser machine, such as a UNIX system, you should go easy on the cache (and it might be a good idea not to use the disk cache at all) so that you don't cause too much trouble for other users.
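The tradeoff between a helpful cache and an oversized one is easier to see in a small sketch. The Python below is only an illustration, not how any particular browser works: the class name PageCache, the byte limit, and the choice to evict the least recently used page first are all assumptions made for the example.

```python
from collections import OrderedDict


class PageCache:
    """Toy in-memory page cache with a size limit (hypothetical, for illustration)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._pages = OrderedDict()  # URL -> page contents, oldest first

    def get(self, url):
        """Return the cached page, or None if it must be fetched again."""
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)  # mark as most recently used
        return self._pages[url]

    def put(self, url, content):
        """Store a page, evicting the least recently used pages if needed."""
        if url in self._pages:
            self.used_bytes -= len(self._pages.pop(url))
        if len(content) > self.max_bytes:
            return  # too large to cache at all
        self._pages[url] = content
        self.used_bytes += len(content)
        while self.used_bytes > self.max_bytes:
            _, evicted = self._pages.popitem(last=False)
            self.used_bytes -= len(evicted)


# Example URLs are placeholders, not real pages.
cache = PageCache(max_bytes=10)
cache.put("http://example.org/a", b"12345")
cache.put("http://example.org/b", b"12345")
cache.get("http://example.org/a")            # touch a, so b is now the oldest
cache.put("http://example.org/c", b"12345")  # over the limit: b is evicted
```

A larger max_bytes keeps more pages around for revisits, but every byte it holds is a byte other programs can't use, which is the reason to be reasonable about the setting.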
If you're using the Web from within an organization with many Web users, such as a company or university, ask one of the network administrators if the organization has a shared Web cache. Shared caches look and act like proxies (refer to "Proxies," earlier in this chapter), and they cache Web documents locally for a group of users. If your slow link is between you and the cache, it won't really help very much; but often the slow link is outside the organization, and a shared cache could help a lot. Because it's shared, it's less wasteful to allocate a lot of disk space to the cache because it will only cache a single copy of a document retrieved by multiple users. The cache might decrease congestion on that slow link to the outside world, which would be a win for everyone in the organization, Web surfers or not.
One Web cache your organization might want to install is a part of the Harvest system, a system for building topical indexes of Web sites (http://harvest.cs.colorado.edu/harvest/). Harvest is discussed in detail in Chapter 30, "New Tools: FSP, Harvest, and Hyper-G."
Caches, whether local to your browser or shared with others, sometimes cause problems. Some documents on the Web change frequently; for example, pages that contain live images of some event. Such pages are supposed to identify themselves as short-lived so that they won't be cached. However, not all such pages are set up properly, and some caches have bugs; therefore, you might find that each time you reload the page, you get the same picture. If you're using a browser cache, you can usually clear the cache; but if you're using a shared cache, you'll need to reconfigure your browser to bypass the cache if you want to see the updates regularly. (You may need to shut down your browser to do that.)
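The stale-picture problem comes down to how a cache decides that a stored copy is still good. Here is a minimal Python sketch of that decision, assuming each cached page carries an optional expiry time. The names CachedPage and is_fresh and the one-hour default lifetime are invented for the example; real caches use their own rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedPage:
    content: bytes
    fetched_at: float            # seconds since some fixed starting point
    expires_at: Optional[float]  # None means the page gave no expiry hint


def is_fresh(page, now, default_lifetime=3600.0):
    """Decide whether a cached copy may be reused without refetching."""
    if page.expires_at is not None:
        return now < page.expires_at
    # No hint from the page: fall back to a fixed lifetime (an assumption).
    return now - page.fetched_at < default_lifetime


# A live image that marks itself as expiring immediately is never reused.
live_image = CachedPage(b"...", fetched_at=0.0, expires_at=0.0)
# The same kind of page with no hint is reused for the default lifetime,
# which is how you can end up seeing the same picture on every reload.
unmarked = CachedPage(b"...", fetched_at=0.0, expires_at=None)
```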
Most browsers also provide a host of other functions. All browsers provide the capability to type the URL you want to open (this is usually done by using a button or menu entry marked Go or Open). You can also save documents to disk, print them, or mail them to a friend. After you've grown comfortable with the basics, it's worth spending an hour or so investigating the other possibilities offered by your browser.
After you are familiar with your browser, you may not want to see the browser's home page each time you start up. You may have another home page that you prefer (such as your organization's home page), or you may want to build one of your own, filled with links to sites you like to visit often. To do this, you need to configure your browser to use a different startup page.
The way to do this depends on your system and browser. Many browsers now have menu options or dialogs for setting the home page. If you can't find such an option or if you're using Windows or UNIX, try setting the environment variable WWW_HOME to the URL of the desired page. In either case, your browser's documentation should explain what to do.
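To make the environment-variable idea concrete, here is a small Python sketch of how a program might choose its startup page. It is an illustration only: the function name startup_page and the fallback URL are invented, and whether a given browser actually consults WWW_HOME is something to check in its documentation.

```python
import os

# Placeholder fallback, not any real browser's default home page.
DEFAULT_HOME = "http://www.example.org/"


def startup_page(environ=os.environ):
    """Return the URL to open first: WWW_HOME if set, else the default."""
    return environ.get("WWW_HOME") or DEFAULT_HOME


# Passing a plain dictionary stands in for the real environment here.
chosen = startup_page({"WWW_HOME": "http://www.yahoo.com/"})
fallback = startup_page({})
```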
Because computers and people are involved, errors do occur on the Web. Sometimes machines are down, or there's a network problem that keeps you from reaching the machine you're interested in. At other times, a Web link points to a document that has moved, or possibly one that never existed in the first place; these are called broken links. Also, links occasionally point to documents you're not allowed to read; some documents on the Web are restricted to people within certain organizations.
The best way to handle such situations is to be prepared for them and not panic. If the problem looks like it might go away after a while, save a bookmark to the document and try again later (most browsers let you save a bookmark to a page even if you didn't load it successfully). In the case of a broken link, see whether the page that contained the link (or some other page at that site) contains the e-mail address of the author or maintainer of that page. Web authors usually appreciate hearing about problems with their documents, and they might reply with the correct URL. But you should also be prepared for the occasional disappointment. Sometimes, documents are withdrawn from the Web because the people or organizations responsible for them no longer have time to keep them updated, or for other reasons. The Web is shifting and changing, not just growing.
Webspace
To begin navigating the World Wide Web, it's important to have some understanding of the way it all fits together and where to start.
Many people, when they start exploring the Web for the first time, expect a unified structure, much like a directory structure, with a top-level entry point that "contains" the rest of the Web. However, like most of the Internet, the Web is not that well organized. There are "catalog" pages that attempt to categorize and index the entire Web, and there are introductory pages that attempt to provide good starting points for further exploration (as most of the browser home pages do), but none of them are complete. There is no official entry point to the World Wide Web.
You may find it helpful to think of the Web as a collection of islands in a vast ocean. Each island has some organization of its own. There are large islands and small islands, and there are connections between the islands, but there is no overall authority; there is no "official" organization to the Web.
Newcomers to the Web find this confusing at times, but it is also one of the Web's (and the Internet's) strengths: A global organizing authority would slow things down and be a barrier to the growth of the Web. Instead of a global organizing authority, several groups are attempting to survey and categorize the entire Web on a regular basis. None of the indexes produced by these groups is complete, but at some point in the future, one may emerge as the catalog of the Web.
Think of it this way: In the conventional publishing world, there is no real catalog of everything that is published. Books in Print covers only the United States, and there is a British equivalent that covers the U.K. and some of the former British colonies. There is a system of International Standard Book Numbers (ISBNs) and International Standard Serial Numbers (ISSNs), but there are still books and periodicals published all over the world that are not registered according to either system. The closest thing to a global catalog that we have is the United States Library of Congress, and even that has no official international status; all it has is prestige. Because of that prestige, most publishers desire to have their publications cataloged by the Library of Congress, so they register a copy there.
Perhaps soon, one or another of the unofficial World Wide Web indexes or catalogs will achieve a similar level of prestige so that nearly all the people and organizations who publish information on the Web will register their pages with that organization. Until then, though, things are much less formal.
Although there are no "official" catalogs to the World Wide Web, there are several unofficial ones, and you can also build your own catalog of the pages that are interesting to you. If your browser has a "hotlist" or "bookmarks" so that you can save pointers to pages, use that tool. When you encounter a page that is particularly interesting, save a pointer to it. I find this particularly useful when I'm looking for one kind of information but stumble on another interesting page in the process. Rather than getting sidetracked on the new topic, I can just save a bookmark for later reference.
Other users maintain their own Web pages with links to their favorite pages on the Web. With the full structuring power of HTML at their disposal, such users have more freedom to organize their collection of links in a useful way. You may want to investigate this option if you find your collection of bookmarks getting unwieldy or if your browser doesn't allow you to organize your bookmarks hierarchically.
You usually don't have to store a reference to every page that is interesting to you. Most of the islands, or sites, in the Web have their own organization; from the site's home page, it is usually quite easy to find your way back to any other page at that site. In fact, most sites are organized in a loose hierarchy and provide links on each page to help you move up, down, backward, or forward in the tree of pages at that site.
Each site has slightly different ways of doing this, however, so you should learn to recognize the signs. Usually, there are links using the words Next or Forward, Previous or Backward, Back or Up, and Home. Following the Next links walks you through the pages at the site in a planned order, like reading the pages of a book in sequence. The links marked Previous let you go backward through that order. Up lets you move up a level in the hierarchy (usually this is equivalent to going back to the start of a major section). Links marked Home take you to the welcome page for the site. If you follow a link from somewhere else and land in the middle of a site that's organized in this way, it's easy to get to the main page for the site.
Of course, in addition to these special navigation links, there are still embedded links within the text of the pages. These might be cross-references to other pages at the same site, but remember that they can also take you away from the site completely, to related information at another site.
As you browse through the Web, you travel from place to place through pages scattered all over the Internet, building a trail as you go. Some people feel disoriented if the trail gets too long. It's easy to lose track of where you are, where you've been, and how to get back to where you started.
There are several ways to avoid disorientation. For newcomers to the Web, it's best to try to keep your trail from getting too long. When you find a place that's very interesting, with leads to a lot of other interesting places, save a bookmark to that place, back up to your starting point, and jump straight to your new treasure trove to begin further exploration.
If it's important to you to know where you are, turn on the option in your browser that displays the URL of the current document. Because most URLs contain the name of the machine where the document is located, you can usually tell something about "where you are" (akebono.stanford.edu is at Stanford University, for example). However, be prepared to go to companies, universities, and countries you've never heard of.
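If you ever want a program to do the same trick of reading "where you are" from a URL, Python's standard library can pull out the machine name. This is just a sketch; the helper name host_of is made up for the example.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the machine name in a URL, or None if it has none."""
    return urlsplit(url).hostname


host = host_of("http://akebono.stanford.edu/")
# The last label of the name hints at the kind of site (edu, com, a country code).
suffix = host.rsplit(".", 1)[-1]
```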
As you gain experience with the Web, you'll learn an easier way to avoid becoming disoriented; you'll realize that concepts like "where you are" and "how to get back" don't really matter. You're always at your keyboard, looking at a document from out in the Web someplace. Save references to places you might want to visit again, and just keep exploring. Occasionally, when I've reached the end of a long exploration and feel that I've exhausted all my leads, I start clicking Back just for fun, to see where I've been. Sometimes there are 30 or 40 links between my starting point and the eventual endpoint, and that's not counting the pages that turned out to be blind alleys. When I first started exploring the Web, I wanted to always know how far I was from my home page, but now I rarely know, and I don't feel lost. It just takes a while to get used to the vastness of the Web.
After you find a Web page on a particular topic, it's usually easy to find other related information because pages tend to have links to other pages on the same topic, but how do you find that first page?
Earlier, I mentioned indexes or catalogs of pages on the Web. There are several of these, and they can be used to search the Web for the information you need. I'm going to give you an example using the catalog I find most useful, called Yahoo (http://www.yahoo.com/). Other indexes use different ways of gathering and organizing information and different styles of searching, and one of the others might be a better match for the way you think. Two good alternatives are Lycos (http://lycos.cs.cmu.edu/) and WebCrawler (http://webcrawler.cs.washington.edu/WebCrawler/). One of the nice things about Yahoo is that it keeps a list of other Web catalogs, and there's a link to that list on almost every page in Yahoo. Therefore, if you want to try one of the alternatives, just go to Yahoo and follow the links to the other catalogs from the Yahoo welcome page.
Figure 28.9 is a picture of the Yahoo welcome page. You can see that it is organized into categories, and you can choose the category that seems most likely to contain the information you're looking for. Often, you can find information quickly this way, and it's a good way to explore when you don't know exactly what you're looking for.
FIGURE 28.9. Some of Yahoo's subject categories.
If you are uncertain about what category might be appropriate for your topic, or if it's something that could reasonably fall under more than one category, it might be better to use Yahoo's search facility. The next few figures illustrate the search process, starting with Figure 28.10, the Yahoo search form. I'm interested in learning about the comet that hit Jupiter a while back, but I can't remember the name of it, so I just enter a couple of keywords that seem likely to get me what I want.
FIGURE 28.10. Starting a Yahoo search.
When I submit the form, Yahoo searches its entire catalog for pages that match the keywords I entered. Figure 28.11 shows the results of the search, and there are several pages about the comet impact to choose from. The result page has useful information about each of the matching pages, including a summary and a direct link to the page, so it's easy to take a look at the real thing after you've done the search. The search result also tells you how the page is classified in Yahoo's category system and includes a link to that category page so that you can investigate other pages listed in the same category. For the example, I'll follow a link to one of the comet pages that looks particularly interesting (see Figure 28.12).
FIGURE 28.11. The result page of a Yahoo search.
FIGURE 28.12. One of the pages found in the Yahoo search.
As you explore the Web, you'll build your own list of interesting places, and those places will lead to others. A little exploration tends to lead to more. To get started, though, here are a few pointers to places you may want to investigate. Some of them are just for fun, others have a wealth of serious and useful information, and still others fall somewhere in between.
Place | Address
Web Catalogs |
Lycos | http://lycos.cs.cmu.edu/
WebCrawler | http://webcrawler.cs.washington.edu/WebCrawler/
Yahoo | http://www.yahoo.com/
Arts and Entertainment |
Hollyweb |
Internet Underground |
MusicArchive |
JazzWeb |
Movie Database Browser |
OTIS |
Computers and the Internet |
Best of the Web |
Cool Site of the Day |
Economics and the Internet |
MPEG Movie Archive |
SGI Silicon Studio |
Xearth |
Fun |
Cybersight |
Dr. Fun |
World Birthday Web |
Hobbies, Crafts, and Sports |
America's Cup '95 |
Juggling Information Service |
LEGO Information |
Origami Page |
The Sports Server |
Wine.com |
Libraries, Museums, and Information Sources |
French National Center for Art and Culture Georges Pompidou |
Library of Congress |
Raleigh News and Observer (NandO) |
The United States Holocaust Memorial Museum |
Usenet FAQs (Frequently Asked Questions lists) |
The WebMuseum |
Science |
Entomology at Colorado State University |
Face of Venus |
The Nine Planets | http://seds.lpl.arizona.edu/nineplanets/nineplanets/nineplanets.html
Physics e-Print archive |
Purdue Weather Processor |
Travel |
Guide to Australia |
Paris |
Yukon Visitor's Guide |
Miscellaneous |
CIA |
PAWWS Portfolio Management Challenge |
The White House |
In spite of all of the current excitement, and in spite of the fact that the Web is more dazzling and comes closer to fulfilling the promise of the Internet than anything that has gone before, it is clear that the World Wide Web is in its infancy. There are pressures to provide new functionality on the Web, to improve its performance and ease of use, and to use the Web for new purposes. There is a huge amount of research and development in progress with regard to the Web. It's impossible to predict what might happen, but in this section, I'll try to give you a snapshot of some of the directions people are talking about and some of the things that may come.
Some of these new technologies and ideas are actually in production, although they are not widespread and are still being refined. Others are currently being developed; still others are no more than gleams in someone's eye. Each of them, however, stands a good chance of someday becoming an important part of the World Wide Web.
Most documents on the Web are written in HTML, the HyperText Markup Language. HTML is quite simple, and its simplicity has made it easy for the Web to get started. However, people are starting to want more from HTML than it currently offers, and alternatives are being developed.
It's unlikely, though, that any one alternative will be chosen as the next generation HTML; that's because people disagree about what a document markup language should look like. Those who are using the Web for publishing technical information wish that HTML provided more versatile ways of describing the structure of their documents; those who are doing marketing and public relations work on the Web wish that HTML gave them more control over appearance. People disagree on how to accomplish these goals, and there are many who consider them incompatible; therefore, there are several different groups working on possible solutions. The alternatives mentioned in the following sections are prominent examples, but there are others.
Work is proceeding on a new version of HTML, called HTML 3.0 (the version most widely supported as of this writing is HTML 2.0). HTML 3.0 attempts to correct some of the most glaring flaws in the current HTML, without going too far (most people agree that HTML should remain relatively simple and easy to implement). The most important new features of version 3.0 are tables and mathematics. The new version also cleans up some of the messy or inconsistent details of the current version.
HTML 3.0 is currently supported by two browsers that run on UNIX: Arena (which was designed specifically as a testbed for new HTML ideas) and Emacs-w3, which implements all of HTML 3.0 except for mathematics. New versions of Netscape Navigator also support many HTML 3.0 features, including tables.
Many of those who are doing actual publishing on the Web want more versatile and flexible ways of describing the logical structure of their documents. This is particularly true for people publishing technical information that must be indexed and searched, in addition to being casually browsed by Web explorers. Many such people are hoping that the Web soon will support the Standard Generalized Markup Language (SGML).
SGML is actually a way of describing the structure of a whole class of documents. After that is done, you can mark up a particular document to show how it matches that structure. HTML is actually one simple application of SGML designed for typical textual documents; but those with more specialized documents (such as phone books and bibliographies) want to have other SGML document types to choose from when they publish on the Web.
Some information providers have chosen to store master copies of their documents in other SGML formats for searching and maintenance purposes, translating those formats to HTML for Web browsing. Other groups, however, are hoping to add SGML knowledge directly to Web browsers. Mosaic version 2.5 is supposed to come with a free version of a product called Panorama from SoftQuad Corporation. Panorama is capable of understanding any SGML document type and will work with Mosaic as a helper application.
Users who are more concerned with the appearance of their documents than the logical structure are hoping for better ways to control that appearance. Currently, HTML gives an author very little control over how the document will be displayed.
Most of the work in this direction centers on style sheets. Rather than turning HTML into a full-function formatting language, the goal is to come up with a second style language that can be used to specify how various parts of an HTML document should be formatted. Some of the proposals concentrate solely on HTML, while other style sheet proposals are meant to be useful with arbitrary SGML documents. One such proposal is DSSSL-Lite, which is based on the SGML-related DSSSL (Document Style Semantics and Specification Language) standard.
If the new document types are to become useful, there will have to be new browsers that understand them. Additionally, people are working on browsers that support different modes of use and more interactive documents.
Earlier, I mentioned that many Web browsers ship with only minimal documentation, with the rest being available on the Web. Other applications are starting to do the same thing. Several large software systems are now shipping without extensive user documentation. When you ask the application for help, it fires up a Web browser to access the documentation from the network. This can be wonderful if you're connected to the Net because the documentation is always up to date, and you don't have to waste shelf or disk space for documentation you might not use very often. Soon, we'll see browsers designed to make it easy for other applications to use the browsers this way. Mosaic for UNIX already has this capability.
Currently, most applications that find their documentation on the Web are designed for UNIX systems because it's very common for UNIX systems to be connected to the network. However, as network connectivity becomes more common in the PC and Macintosh worlds, you can expect to see many more programs use Web browsers as help systems. (Of course, there will always be systems that don't have network connections. Software developers should make sure that you can get a copy of the documentation for your local disk if you need to.)
It's already clear that browsers are going to have a hard time keeping up with all the things people want to do with the Web, and all the different kinds of documents out there. The browser you get tomorrow, which supports all the best new features, might be slow to pick up the next big advance; but not many people want to change browsers every six months.
One possible solution is to design browsers as componentware: instead of one big application that has a built-in understanding of many different types of documents, these browsers will be loose confederations of cooperating components, each of which understands one type of document. When the main part of the browser encounters something it doesn't understand, it looks for a component that does understand it, gives it an area of the screen to draw on (if required), and hands the document over. To support a new type of document, you won't have to upgrade your whole browser; you'll just install a new component. You may already be familiar with non-Web applications that work this way; applications that cooperate by using Microsoft's OLE (Object Linking and Embedding) system or the Apple-IBM OpenDoc system are examples of this idea.
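The componentware idea is essentially a lookup table from document type to the piece of code that knows how to display it. The Python below is a rough sketch under that assumption. The class name ComponentBrowser, its methods, and the use of strings such as "image/gif" to label document types are all invented for illustration and don't describe any real browser.

```python
class ComponentBrowser:
    """Toy browser core that delegates each document type to a component."""

    def __init__(self):
        self._components = {}  # document type -> function that renders it

    def install(self, doc_type, component):
        """Add support for a document type without touching the core."""
        self._components[doc_type] = component

    def show(self, doc_type, data):
        """Hand the document to a matching component, if one is installed."""
        component = self._components.get(doc_type)
        if component is None:
            return "cannot display " + doc_type
        return component(data)


browser = ComponentBrowser()
browser.install("text/plain", lambda data: data.decode("ascii"))
before = browser.show("image/gif", b"GIF89a")   # no component installed yet
browser.install("image/gif", lambda data: "[image, %d bytes]" % len(data))
after = browser.show("image/gif", b"GIF89a")    # handled by the new component
```

The core never changes between the two calls to show; only the table of installed components does, which is the whole appeal of the approach.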
Sun Microsystems Laboratories is developing HotJava, a new Web browser based partially on this philosophy. It can readily load new components to handle document types it doesn't understand. Interestingly, HotJava's components are written in a special language, called Java, designed to be safe so that the components cannot do unpleasant things to your system, like erasing your files or installing viruses. As a result, HotJava users can trust new components they get from the network. HotJava has the capability to automatically locate new components on the network, download them, and install them, with no requirement for the user to even know what's happening. When using HotJava, upgrades happen transparently.
Figure 28.13 shows HotJava (running on a UNIX system) displaying a page with active content: When I clicked the three graphs, each began an animation of a sort algorithm, moving the bars around as I watched. The image in the figure was taken just after QSort finished, with the other two still running. HotJava doesn't have built-in knowledge of how to display that page; when it finds something it doesn't understand, it searches for the extension and loads it from the network.
FIGURE 28.13. WebRunner (the original name of HotJava) displaying a page with active content.
The HotJava authors have written several other interesting Web applications that demonstrate a level of interactivity not found elsewhere on the Web. It's an interesting browser that's worth watching for. The home page for more information is http://java.sun.com/.
Another strategy, similar to the componentware approach, is to give helper applications a way to interact with the browser. Currently, documents viewed by helper applications are rather static (they can't contain links to other documents, for example). The Mosaic team is developing a system called the Common Client Interface (CCI), which allows helper applications to control the browser itself. This system will permit much more interactive documents and may be another way to achieve the same kinds of interesting applications the componentware browsers hope to achieve.
The current, loose organization of the Web has its advantages, but it has its problems, too. We're certain to see many different attempts to give the Web a little more structure.
Currently, links on the Web are done in terms of URLs, or Uniform Resource Locators. As the name implies, URLs contain information about the location of a document. If a document moves, URLs that used to be valid suddenly become invalid. (These are called broken links.)
One solution currently being worked on is to use Uniform Resource Names (URNs), which would not contain information about the document's location at all. URN servers on the network would help applications translate from URNs to URLs. After this system is in place, an information provider will be able to move a document without breaking all the links to it, as long as the information provider informs the URN servers of the change.
Another application for URNs is to provide "mirrored" resources. Consider the WWW server at CERN, the home of the Web. That server has a lot of useful information and is very popular. Partly as a result of that popularity, it's also overloaded a lot of the time, so retrievals from there are slow. Additionally, it's in Europe, which slows it down even more if you are trying to get to it from the United States. It would be great if there were a copy of all of the CERN Web pages (a mirror) in the U.S. as well, to be closer to a large group of users and to spread the load around. With URLs alone, however, a mirror wouldn't do a lot of good because each link has to point to one site or the other, and most links would probably continue to point to CERN because it seems more "authoritative."
URNs may solve this problem. URN servers will translate an URN into multiple URLs, each of them pointing to a mirror site at a different location. Then the browser can decide which site is closer and retrieve the document from there.
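Here is how that two-step lookup might look in miniature. This Python sketch is purely hypothetical: the URN, the mirror host names, and the distance figures are invented, and a real URN server would be a network service rather than a dictionary.

```python
from urllib.parse import urlsplit

# Hypothetical URN-to-URL table; the URN and the mirror hosts are invented.
URN_TABLE = {
    "urn:example:web-overview": [
        "http://mirror.example.ch/web/overview.html",
        "http://mirror.example.com/web/overview.html",
    ],
}

# Made-up cost of reaching each host from the reader's machine (lower is closer).
DISTANCE = {"mirror.example.ch": 9, "mirror.example.com": 2}


def resolve(urn):
    """Return every known location for a URN (an empty list if unknown)."""
    return list(URN_TABLE.get(urn, []))


def nearest(urls):
    """Pick the URL whose host has the lowest distance estimate."""
    return min(
        urls,
        key=lambda url: DISTANCE.get(urlsplit(url).hostname, float("inf")),
    )


choice = nearest(resolve("urn:example:web-overview"))
```

Note that moving or adding a mirror changes only the table, never the links that name the URN.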
In "Shared Caches," earlier in this chapter, I mentioned the use of caches to help speed access to commonly accessed documents. The Harvest cache system supports a hierarchy of caches with local caches first trying larger regional caches to see whether they have a copy of the document. Then the caches try a huge statewide cache before finally attempting to retrieve the document from its source. It will be interesting to see whether states or organizations begin setting up such groups of caches for Web access.
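A hierarchy of caches boils down to trying the nearest cache first and working outward. The Python below sketches that search order with plain dictionaries standing in for the caches. It is not the Harvest software or its interface; the function name lookup and the rule of copying a found document into every nearer cache are assumptions made for the example.

```python
def lookup(url, caches, fetch_from_source):
    """Try each cache from nearest to farthest, then the original server."""
    for level, cache in enumerate(caches):
        if url in cache:
            content = cache[url]
            break
    else:
        # Every cache missed, so go to the source.
        level = len(caches)
        content = fetch_from_source(url)
    # Copy the document into every nearer cache that missed (an assumption).
    for cache in caches[:level]:
        cache[url] = content
    return content


# Placeholder URLs and contents, for illustration only.
local, regional, statewide = {}, {}, {"http://example.org/": b"page"}
fetched = []


def origin(url):
    fetched.append(url)  # record each trip to the original server
    return b"from source"


first = lookup("http://example.org/", [local, regional, statewide], origin)
second = lookup("http://example.org/new", [local, regional, statewide], origin)
```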
In addition to the indexes that attempt to catalog the entire Web, servers are starting to appear that have a more limited goal: They provide a clearinghouse for WWW information on a particular topic. These sites are already quite useful. One service they do not yet provide (but may in the future) is a notification service. It would be very nice if the clearinghouse site on, say, anthropology were able to notify you (through electronic mail, for example) when it learned of a new anthropological resource available on the Web.
Almost certainly, many of you will have read about the availability of pornographic material on the Internet. This is of great concern to many parents, and also to many schools that are trying to introduce their students to the Internet. Some researchers have proposed the creation of "rating servers" that could provide information on the "suitability" of information at a particular site. Browsers could be configured to check with a rating server first and to allow retrieval of a document only if it were approved by the rating server (or, more likely, if it were not disapproved).
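The difference between "approved" and "not disapproved" matters most for pages the rating server has never seen. The following Python sketch shows both policies side by side. Everything in it is hypothetical: the ratings table, the URLs, and the function name may_retrieve are invented, since no such service is specified here.

```python
# Hypothetical answers from a rating server; the URLs are placeholders.
RATINGS = {
    "http://example.org/kids/": "approved",
    "http://example.org/adult/": "disapproved",
}


def may_retrieve(url, ratings, require_approval=False):
    """Apply one of the two policies to a URL."""
    rating = ratings.get(url)  # None when the server has no opinion
    if require_approval:
        return rating == "approved"    # strict: unrated pages are blocked
    return rating != "disapproved"     # lenient: unrated pages are allowed


unrated = "http://example.org/new-page/"
lenient = may_retrieve(unrated, RATINGS)
strict = may_retrieve(unrated, RATINGS, require_approval=True)
```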
Some people have been concerned that this proposal would engender censorship, but the nature of the Web means that adherence to the recommendations of such servers will almost certainly be voluntary. Furthermore, such rating servers will probably be useful for other things. One example is the question of scientific peer review. Currently, scientific papers published in journals are reviewed by other scientists to ensure that the research is valid and that it follows accepted scientific practices. On the Web, where anyone can publish whatever information they like, some similar system is needed so that readers can tell whether they are reading serious science or quackery. It may be that scientific organizations (such as the American Medical Association) will set up rating servers to inform readers of those papers that have been reviewed and found to be acceptable.
One of the ways in which the Web differs from Ted Nelson's early vision of hypertext is that links are not bidirectional; that is, when you're reading one document, you can't find out what other documents have linked to it. Sometimes it would be nice to know. Suppose that you were reading a document on the Web about a revolutionary new medical technique. Wouldn't it be nice to know what other people had to say about it? If others have read the paper, certainly they would link to it if they found it useful, or write a critique (also with a link to the original) if they found the research to be flawed. It would be nice to be able to follow those links back from the current document to all the other documents that refer to it.
The current Web design has links stored inside the source document so that you can't find a link from the destination document at all. However, it is possible that future developments may have link information stored separately in link servers so that you can learn about links based on either the source or the destination document. Obviously, backward links would work a little differently, but this capability will be useful, nonetheless.
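A link server of this kind would need to index every link twice, once by its source and once by its destination. The Python sketch below shows the idea with an invented LinkServer class and made-up document names; it is an illustration of the data structure, not a description of any existing system.

```python
from collections import defaultdict


class LinkServer:
    """Toy link store that can be queried from either end of a link."""

    def __init__(self):
        self._forward = defaultdict(set)   # source -> destinations
        self._backward = defaultdict(set)  # destination -> sources

    def add_link(self, source, destination):
        """Record one link in both indexes."""
        self._forward[source].add(destination)
        self._backward[destination].add(source)

    def links_from(self, source):
        return sorted(self._forward.get(source, ()))

    def links_to(self, destination):
        return sorted(self._backward.get(destination, ()))


# Hypothetical documents: a paper, and two pages that refer to it.
server = LinkServer()
server.add_link("critique.html", "paper.html")
server.add_link("summary.html", "paper.html")
server.add_link("paper.html", "data.html")
who_cites_paper = server.links_to("paper.html")
```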
Finally, people will certainly start to use the Web in new ways. Consider the following examples.
Already, URLs are the most common way for people on the Internet to tell each other the location of interesting resources. People send URLs around in mail and Usenet posts all the time. It's inevitable that the Web and e-mail will be tied together a little more tightly. One example is the X-URI header tag people are already adding to their e-mail and Usenet posts. Usually, it contains the URL of the sender's home page. Also, at least one e-mail program can automatically recognize URLs in the text of a message and highlight them so that the reader can click on them; the URL is then automatically passed to a Web browser to be displayed.
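Recognizing URLs in the text of a message is mostly a pattern-matching job. Here is a rough Python sketch of one way to do it. The pattern is a simplification I made up for the example (it knows only three schemes and uses a crude rule for trailing punctuation), so it doesn't describe how any particular mail program works.

```python
import re

# Simplified pattern: a known scheme, "://", then anything up to whitespace.
URL_PATTERN = re.compile(r'\b(?:http|ftp|gopher)://[^\s<>"]+')


def find_urls(text):
    """Return URLs found in text, without trailing sentence punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


message = (
    "The map viewer is at http://pubweb.parc.xerox.com/map/, "
    "and Yahoo is at http://www.yahoo.com/."
)
urls = find_urls(message)
```

A mail program could then highlight each match and pass it to a Web browser when the reader clicks on it.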
In the future, it may be common to send HTML in the body of e-mail messages, with mail programs that understand how to display it properly.
Almost from the beginning of the Web, people have found ways to interface existing programs and applications to it. These interfaces are called gateways, and there are gateways to interesting systems all over the Web (two examples are the Xerox map viewer, http://pubweb.parc.xerox.com/map/, and the Stock Quote server, http://www.secapl.com/cgi-bin/qs). Such gateways will continue to be an important part of the Web, and they will probably become even more common.
As completely new services come to the Internet, such as video conferencing or video on demand, we may see the Web serve as another kind of "gateway." We may use the Web as a directory of available services and the interface for accessing those services. It's easy to imagine a video catalog on the Web that you can browse and search, finally selecting a movie and arranging a time for it to be sent over the network to your home.
Perhaps the most important change we may see on the Web is not so much a change in function, but a change in perception. In reality, the Web is just a gigantic, shared network file system, similar in many ways to the file system on your hard disk. Currently, it's difficult for most people to actually use it that way because fast, full-time connections to the Internet are still somewhat uncommon, but that may not be the case for long.
After you have easy access to the Internet whenever you're using your computer, it becomes easy to use the Web as an extension of your hard disk. You don't have to download a personal copy of every document that interests you. Instead, you can just store a pointer to it and retrieve it whenever necessary. There are already people who are doing this on a regular basis. It's very useful to have many applications (not just specialized "browsers") that know how to retrieve files from the Web.
For now, when you are using the World Wide Web, you're aware of it: You're using a Web browser, flying around and exploring, and traversing new links. However, when other applications use the Web in the course of helping you do your work, it won't be so obvious. You may use the Web so often that you don't even think about it; the Web will become an extension of your computer.