On the Internet, people have a remarkable desire to share knowledge. Why altruism should be a feature of cyberspace is anyone's guess, but the pioneer spirit may have something to do with it. Just as the Wild West campfire always had room for a stranger (in contrast to today's urban scene), the database always has room for another terminal. One of the great tools for finding useful stuff in many databases is WAIS.
The Wide Area Information Server (WAIS, pronounced ways) attempts to harness the vast data resources of the Internet by making it easy to search for and retrieve information from remote databases, called sources in WAIS terminology.
Sources are collections of files that consist mostly of textual material. For example, if chemistry is your forte, you can find several journals on the subject through WAIS. WAIS servers not only help you find the right source, they also handle your access to it.
Like Gopher, WAIS systems use the client-server model to make navigating around data resources easy. Unlike Gopher, WAIS does the searching for you. Currently, more than 520 sources are available through WAIS servers. A WAIS client (run either on your own computer or on a remote system through Telnet) talks to a WAIS server and asks it to perform a search for data containing a specific word or words.
Most WAIS servers are free, which means that the data is occasionally eccentric and erratic. The data can also have great gaps in coverage on some subjects and more coverage than you can believe on others. For example, you can find tons of material in WAIS about chemistry and computer science, but sources on, say, art history or the theory of juggling, are nonexistent at the moment. New WAIS servers and sources are created from time to time, so a library of Van Gogh's writings may yet be established.
WAIS is simple to use, although its text-based interface is a little user hostile. The X Window client is much easier to use, but requires that you run X Window (of course). WAIS clients are available for Macintoshes, PCs, and even supercomputers.
WAIS was one of the first programs to be based on the Z39.50 standard. The American National Standard Z39.50, Information Retrieval Service Definition and Protocol Specification for Library Applications, revised by the National Information Standards Organization (NISO), attempts to provide interconnection of computer systems despite differences in hardware and software.
WAIS was the first database system to use this standard (which may well become a universal data-search format). Unfortunately, WAIS was based on an old version of the Z39.50 standard. The newer standard is somewhat incompatible with the older one. There have been discussions about making WAIS clients and servers that can use both protocols, however.
NOTE Z39.50 is similar in some respects to Structured Query Language (SQL), but it is simplified. Although this makes Z39.50 less powerful, it consequently makes it more general, so Z39.50 is likely to gain wide acceptance.
Z39.50 is an important step in making information sources on the Internet more accessible. Today, most Internet databases are accessed in ways completely different from each other. They use different standards for storing data and different tools to access that data. Although Z39.50 may change that, it is not yet clear when or how.
For example, one library catalog system may use find as its search command for a subject heading; another may use subject. Still another may use topic. If they all conformed to a standard, life would be much simpler. Z39.50-compliant systems all use the same format to construct queries. You don't have to know anything special to search a WAIS database. You just use whatever word you think may be used in relevant documents because WAIS indexes all the text in a source.
After you run a search that identifies any documents, you receive a list of hits, or ranked document titles. The WAIS server ranks the hits from the most-relevant to the least-relevant document. Each document is scored, with the best-fitting document awarded 1,000 points. All other scores are relative to the top score.
WAIS ranks documents by the number of search words that occur in the document and the number of times those words appear.
WAIS servers also take into consideration the length of the document. WAIS servers are smart enough to exclude common words, called stop words, to make the search manageable. Words such as a, about, above, across, after, the, and so on should be excluded from your search because the frequency of their appearance in most documents makes them irrelevant in most searches. For example, if you search for Who is Richard Simmons?, who and is are excluded from the search because they are stop words.
NOTE Stop words are controlled by the administrator of each WAIS server. In addition to generally common words, many words common to a database may become stop words. For example, the word WAIS may be a stop word in the database of a WAIS newsgroup; the word Internet may be a stop word in a database of Internet protocols.
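To make the scoring idea concrete, here is a minimal sketch in Python. It is not the actual WAIS ranking code: the text only says that the number of matching words, their frequency, and document length all count, so the formula, the stop list, and the function names below are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list; a real server's list is set by its administrator.
STOP_WORDS = {"a", "about", "above", "across", "after", "and", "is", "the", "who"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def raw_score(query_terms, doc_text):
    """Score one document (assumed formula, not the real WAIS one)."""
    words = tokenize(doc_text)
    if not words:
        return 0.0
    counts = Counter(words)
    matched = [t for t in set(query_terms) if counts[t] > 0]
    occurrences = sum(counts[t] for t in matched)
    # More distinct query words and more occurrences raise the score;
    # dividing by length keeps long documents from winning on bulk alone.
    return len(matched) * occurrences / len(words)

def rank(query, docs):
    """Return (score, title) pairs, best first, with the best hit at 1000."""
    terms = tokenize(query)
    raw = {title: raw_score(terms, text) for title, text in docs.items()}
    best = max(raw.values(), default=0.0)
    if best == 0:
        return []
    hits = [(round(1000 * s / best), title) for title, s in raw.items() if s > 0]
    return sorted(hits, key=lambda h: (-h[0], h[1]))
```

Calling rank("Who is Richard Simmons?", docs) first reduces the query to richard and simmons, then scales every other document's score against the best one, which is why the top hit always shows 1,000.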
You cannot use Boolean logic in most WAIS searches. That is, you can't do anything other than find a single word or several words. A search for cow and farm searches for documents that contain cow and/or and and/or farm, so the and should be left out of the search. Notice that the search is "and/or," not just "and." A search for cow farm gives you all documents that contain cow, farm, or both.
This limitation won't always be the way of things; a newer version of WAIS called freeWAIS (get it? freeways?) already supports Boolean searches.
Also, no wildcard searching is available in WAIS. This means that you can't specify that you would accept cows as well as cow.
Unlike many regular database searches, WAIS searches can't be expanded to include articles that may talk about similar topics or to retrieve all articles that have those words (for example, cars or automobiles or trucks or motorcycles). Neither can you exclude words in a search (for example, cars but not trucks).
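The matching rules described above (any word counts, no Boolean operators, no wildcards) can be sketched in a few lines of Python. This is an illustration only, not WAIS code; the function name and the small default stop list are assumptions.

```python
def matches(query, doc_text, stop_words=frozenset({"and", "or", "the", "a"})):
    """Return True if the document contains at least one non-stop query word."""
    terms = {w for w in query.lower().split() if w not in stop_words}
    doc_words = set(doc_text.lower().split())
    # Whole-word comparison only: "cow" does not match "cows",
    # and there is no way to demand both words or to exclude one.
    return bool(terms & doc_words)
```

With this sketch, matches("cow farm", "a brown cow") is true even though farm never appears, while matches("cow", "two cows grazing") is false because no wildcard or stemming is applied.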
You can, however, increase the number of relevant documents by using more specific terms in a search. A search for car automobile crash statistics may retrieve more pertinent documents on the subject you want.
The sources available through WAIS are as varied as the groups that communicate over the Internet: Renaissance music, beer brewing, Aesop's fables, software reviews, recipes, ZIP code information, a thesaurus, environmental reports, and many other databases are available.
The WAIS system for Thinking Machines alone gives access to more than 60,000 documents, including weather maps and forecasts, the CIA World Factbook, a collection of molecular biology abstracts, Usenet's Info Mac digests, and the Connection Machine's FORTRAN manual (a must for pipe-stress freaks and crystallography addicts). The Massachusetts Institute of Technology makes a compendium of classical and modern poetry available through WAIS.
WAIS was developed by Thinking Machines Corporation, Apple Computer, and Dow Jones; access to the system is available free from Thinking Machines by connecting to telnet://quake.think.com/WAIS.
As an alternative, WAIS client software (both executable and source) is available through anonymous FTP at Thinking Machines (use the same Internet address) in the pub/wais/ directory. WAIS clients are available for a number of operating systems (X Window, DOS, Macintosh, and others), but they do require that your computer have some kind of TCP/IP connection to the Internet.
You can access WAIS in three ways. You can Telnet to quake.think.com and log in as wais, or you can run a local WAIS client. Your system administrator may have set up your system so that typing wais automatically connects you to whatever WAIS service is available. Another way to get to WAIS is through Gopher. You'll find an entry on Gopher menus such as Other Gopher and Information Servers that will lead you eventually to WAIS.
The first screen you see on WAIS is a list of the WAIS servers and sources available. At the time of this writing, 529 WAIS sources are available through the WAIS client at Thinking Machines, starting with aarnet-resource-guide and ending with zipcodes.
The following example screen gives you a reference number for each source, the location of the WAIS server in brackets, the name of the server, and the cost of searching that library. At this time, all WAIS servers available through Thinking Machines are free.
# Server Source Cost
001: [ archie.au] aarnet-resource-guide Free
002: [ munin.ub2.lu.se] academic_email_conf Free
003: [wraith.cs.uow.edu.au] acronyms Free
004: [ archive.orst.edu] aeronautics Free
005: [ bloat.media.mit.edu] Aesop-Fables Free
006: [ ftp.cs.colorado.edu] aftp-cs-colorado-edu Free
007: [nostromo.oes.orst.ed] agricultural-market-news Free
008: [ archive.orst.edu] alt.drugs Free
009: [ wais.oit.unc.edu] alt.gopher Free
010: [sun-wais.oit.unc.edu] alt.sys.sun Free
011: [ wais.oit.unc.edu] alt.wais Free
012: [alfred.ccs.carleton.] amiga-slip Free
013: [ munin.ub2.lu.se] amiga_fish_contents Free
014: [ coombs.anu.edu.au] ANU-Aboriginal-Studies $0.00/minute
015: [ coombs.anu.edu.au] ANU-Asian-Computing $0.00/minute
016: [ coombs.anu.edu.au] ANU-Asian-Religions $0.00/minute
017: [ coombs.anu.edu.au] ANU-CAUT-Projects $0.00/minute
018: [ coombs.anu.edu.au] ANU-French-Databanks $0.00/minute
Keywords:
<space> selects, w for keywords, arrows move, <return> searches, q quits, or ?
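If you capture a screen like this one and want to work with it in a script, each line can be split into its parts. The following Python sketch assumes the layout shown above (number, optional asterisk for a selected source, server in brackets, source name, cost); the SourceEntry type and parse_source_line function are made-up names, not part of any WAIS client.

```python
import re
from typing import NamedTuple

class SourceEntry(NamedTuple):
    number: int
    selected: bool   # True when the line carries the "*" selection mark
    server: str      # may be cut off; the screen truncates long host names
    source: str
    cost: str

# number ":" optional "*" "[server]" source cost
LINE = re.compile(r"^(\d+):\s*(\*)?\s*\[\s*([^\]]+?)\s*\]\s+(\S+)\s+(\S+)\s*$")

def parse_source_line(line):
    """Parse one source-list line, or return None for headers and prompts."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    number, star, server, source, cost = m.groups()
    return SourceEntry(int(number), star is not None, server, source, cost)
```

For example, parse_source_line("014: [ coombs.anu.edu.au] ANU-Aboriginal-Studies $0.00/minute") yields source number 14 with a cost of $0.00/minute, while the column-header line returns None.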
You are now ready to conduct a search. As is true with Gopher, the problem with using WAIS is deciding which of the 529 libraries to search. An added problem is that the names of the servers don't necessarily describe what they contain. Fortunately, a directory of servers is available that contains short abstracts of the contents of each server and other information about the source of the server. Until you know exactly which server you want to search, you should start with the directory of servers.
How do you get there? The preceding screen looks like an alphabetical list of WAIS servers, so using the down-arrow key can do the trick but may take a while. Issuing the ? (help) command to reveal the online help that comes with this client displays the following information:
SWAIS Source Selection Help Page: 1
j, down arrow, ^N Move Down one source
k, up arrow, ^P Move Up one source
J, ^V, ^D Move Down one screen
K, <esc> v, ^U Move Up one screen
### Position to source number ##
/sss Search for source sss
<space>, <period> Select current source
= Deselect all sources
v, <comma> View current source info
<ret> Perform search
s Select new sources (refresh sources list)
w Select new keywords
X, - Remove current source permanently
o Set and show swais options
h, ? Show this help display
H Display program history
q Leave this program
Press any key to continue
This help screen tells you how to move through the screens of the source directory. WAIS uses UNIX editor commands for moving around (the j and J, for example, are UNIX editor commands for moving down by line or by screen). Try your Page Down and arrow keys; they may work if you're using VT-100 terminal emulation. The /sss command is important because it quickly moves the pointer to a source by name, so you don't have to page through the whole list. Also note that the space or period selects a source; the equal sign deselects all sources.
NOTE Unless your terminal emulator does a good VT-100 emulation, don't bother with swais; you'll go crazy trying to figure out what's going on.
TIP Here's a feature not covered in the swais help screen: use the spacebar or period on a selected source to deselect it.
It's too bad that the directory of servers isn't the first item on the list of sources. You know the name, so use a forward slash with the name of the server to get there. Type /dir to get close; after the screen is refreshed with names of new sources, use the down arrow key or type j once to highlight the directory of servers.
SWAIS Source Selection Sources: 429
# Server Source Cost
145: [ ds.internic.net] ddbs-info Free
146: [ irit.irit.fr] directory-irit-fr Free
147: [ quake.think.com] directory-of-servers Free
148: [ zenon.inria.fr] directory-zenon-inria-fr Free
149: [ zenon.inria.fr] disco-mm-zenon-inria-fr Free
150: [ wais.cic.net] disi-catalog Free
151: [ munin.ub2.lu.se] dit-library Free
152: [ ridgisd.er.usgs.gov] DOE_Climate_Data Free
153: [ wais.cic.net] domain-contacts Free
154: [ wais.cic.net] domain-organizations Free
155: [ ftp.cs.colorado.edu] dynamic-archie Free
156: [ wais.wu-wien.ac.at] earlym-l Free
157: [ bio.vu.nl] EC-enzyme Free
158: [ kumr.lns.com] edis Free
159: [ ivory.educom.edu] educom Free
160: [ wais.eff.org] eff-documents Free
161: [ wais.eff.org] eff-talk Free
162: [ quake.think.com] EIA-Petroleum-Supply-Monthly Free
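The help screen doesn't say exactly how /sss picks its target, but the behavior seen here (typing /dir lands near the directory entries) is consistent with a simple prefix match. Here is a Python sketch under that assumption; position_to_source is a hypothetical name, and the case-insensitive comparison is a guess.

```python
def position_to_source(names, prefix):
    """Return the index of the first source whose name starts with prefix, or None."""
    wanted = prefix.lower()
    for index, name in enumerate(names):
        if name.lower().startswith(wanted):
            return index
    return None
```

Given the sources on this screen, a prefix of dir would stop at directory-irit-fr, one line above directory-of-servers, which is why one more press of j is needed.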
Remember that you are not searching a huge database containing source materials but a database of descriptions of source databases. The terms you choose should reflect what the author or owner of the database would probably use to describe it.
The following example search uses the words wais and Z39.50 to find information on the NISO standard and how WAIS uses it. WAIS retrieves the source descriptions that contain those words (see the following example). The results are returned in ranked order, with the items WAIS thinks are most likely to contain your information listed first. The first item, scored 1,000, is the one WAIS thinks is most likely to contain what you're looking for.
SWAIS Search Results Items: 40
# Score Source Title Lines
001: [1000] (directory-of-se) cool-cfl 76
002: [ 953] (directory-of-se) dynamic-archie 59
003: [ 858] (directory-of-se) wais-docs 24
004: [ 834] (directory-of-se) wais-talk-archives 18
005: [ 810] (directory-of-se) alt.wais 18
006: [ 810] (directory-of-se) wais-discussion-archives 18
007: [ 691] (directory-of-se) cool-net 50
008: [ 572] (directory-of-se) aftp-cs-colorado-edu 144
009: [ 476] (directory-of-se) bionic-directory-of-servers 31
010: [ 452] (directory-of-se) cicnet-wais-servers 55
011: [ 381] (directory-of-se) cool-lex 59
012: [ 333] (directory-of-se) IUBio-INFO 71
013: [ 333] (directory-of-se) directory-of-servers 32
014: [ 333] (directory-of-se) sample-pictures 23
015: [ 333] (directory-of-se) utsun.s.u-tokyo.ac.jp 32
016: [ 309] (directory-of-se) journalism.periodicals 58
017: [ 309] (directory-of-se) x.500.working-group 38
018: [ 286] (directory-of-se) ANU-Theses-Abstracts 89
This search resulted in some irrelevant sources. For example, cool-cfl is a database of files from a group concerned with conservation in libraries, archives, and museums. This might be a bug in WAIS (not improbable; Internet software is being developed and improved continuously).
The second source, dynamic-archie, discusses a Dynamic WAIS prototype at the University of Colorado that performs Archie searches with WAIS. This could be useful...and so could the next four sources. The rest don't seem to be relevant.
The information that describes the sources in WAIS is determined by the owners of the source. Some sources, such as ERIC databases, give detailed information that makes the directory of servers a valuable tool in finding out which sources are relevant. Other sources have minimal descriptions that aren't very useful or won't be found through the directory of servers. Such sources are probably of use only to people who already know they are available through WAIS.
From here, press the letter s to return to the list of sources, use the /wais command to jump to the sources with wais in their names, and select the three of them. Selected sources are marked with an asterisk.
SWAIS Source Selection Sources: 429
# Server Source Cost
415: * [ quake.think.com] wais-discussion-archives Free
416: * [ quake.think.com] wais-docs Free
417: * [ quake.think.com] wais-talk-archives Free
418: [hermes.ecn.purdue.ed] water-quality Free
419: [ quake.think.com] weather Free
420: [ sunsite.unc.edu] White-House-Papers Free
421: [ wais.nic.ddn.mil] whois Free
422: [ sunsite.unc.edu] winsock Free
423: [ cmns-moon.think.com] world-factbook Free
424: [ quake.think.com] world91a Free
425: [ wais.cic.net] wuarchive Free
426: [ wais.cic.net] x.500.working-group Free
427: [wais.unidata.ucar.ed] xgks Free
428: [ cs.widener.edu] zen-internet Free
429: [ quake.think.com] zipcodes Free
You could also select the alt.wais group (the one ranked fifth in your initial search), but these three will work. Using only Z39.50 as the search term simplifies the search; the word wais is probably scattered throughout most of the documents, lessening its relevance to the search. To enter the search text, select the sources you want to search; you are then prompted for keywords. After typing the keywords, press Enter; WAIS searches each selected source and ranks the results according to their relevance.
SWAIS Search Results Items: 39
# Score Source Title Lines
001: [1000] ( wais-docs) z3950-spec 2674
002: [1000] (wais-talk-archi) Edward Vie Re: [wald@mhuxd.att.com: more 383
003: [1000] (wais-discussion) Clifford L Re: The Z39.50 Protocol: Ques 325
004: [ 939] (wais-discussion) Brewster K Re: online version of the z39 2659
005: [ 893] (wais-discussion) akel@seq1. Re: Net resource list model(s 347
006: [ 823] ( wais-docs) waisprot 1004
007: [ 800] (wais-discussion) Michael Sc Re: Dynamic WAIS prototype an 27
008: [ 338] (wais-discussion) harvard!ap Re: Z39.50 Product Announceme 51
009: [ 333] ( wais-docs) protspec 915
010: [ 331] (wais-discussion) Unknown Subject 6
011: [ 331] (wais-discussion) uriel wile Re: poetry server is up [most 31
012: [ 313] (wais-talk-archi) brewster@q Re: Re: Information about z39 69
013: [ 313] (wais-talk-archi) ses@cmns.t Re: Z39.50 1992 171
014: [ 313] (wais-talk-archi) ses@cmns.t Re: Z39.50 1992 90
015: [ 308] (wais-discussion) Brewster K Re:Hooking up WAIS with othe 66
016: [ 292] (wais-discussion) Brewster K Re: [morris@Think.COM: it's s 25
017: [ 286] (wais-talk-archi) mitra@pand Re:Z39.50 1992 71
018: [ 284] (wais-discussion) Brewster K Re: WAIS-discussion digest #6 18
The results look promising. The first item, z3950-spec, is scored 1,000; in fact, the first three items all score 1,000 and seem to be relevant. Each line gives the name of the information source, along with the title of the document. In this case, most titles appear to come from e-mail message subject headings. Finally, the screen gives the number of lines contained in each document.
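Notice that each of the three sources contributes an item scored 1,000. That suggests (this is an inference from the screen, not documented behavior) that each source scores its own documents and the client simply merges the lists by score. A Python sketch of such a merge, with the hypothetical name merge_results, might look like this:

```python
def merge_results(per_source_hits):
    """Combine (score, title) lists from several sources into one ranked list."""
    merged = []
    for source, hits in per_source_hits.items():
        for score, title in hits:
            merged.append((score, source, title))
    # Highest score first; ties are ordered by source and title so the
    # output is predictable (the real client's tie order is unknown).
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return merged
```

One consequence of merging this way is that a score of 1,000 means only "best in its own source," so top-scored items from different sources are not necessarily equally relevant.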
From here, you can read each result and have pertinent results e-mailed to you or even to another person. At the search result screen, type the letter m to receive a prompt asking for an e-mail address. If none of the documents are relevant, you can go back to the sources and redefine the search strategies or add additional appropriate sources to search. The sample documents contain the desired information, so this search has worked.
Because WAIS uses natural language query in its search mode and searches the full-text index of the source, changing any of the search words produces different results. Using a natural language search such as how does wais use Z39.50 protocol produces the following results:
SWAIS Search Results Items: 39
# Score Source Title Lines
001: [1000] ( wais-docs) z3950-spec 2674
002: [1000] (wais-talk-archi) Edward Vie Re: [wald@mhuxd.att.com: more 383
003: [1000] (wais-discussion) Michael Sc Re: Dynamic WAIS prototype an 27
004: [ 998] (wais-discussion) Brewster K Re: online version of the z39 2659
005: [ 777] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #4 554
006: [ 675] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #3 535
007: [ 640] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #3 636
008: [ 629] (wais-talk-archi) brewster@t Re: WAIS-discussion digest #5 749
009: [ 608] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #4 601
010: [ 607] (wais-talk-archi) fad@think. Re: WAIS Corporate Paper " 424
011: [ 607] (wais-talk-archi) composer@b Re: WAIS, A Sketch of an Over 449
012: [ 589] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #4 621
013: [ 549] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #3 575
014: [ 524] (wais-talk-archi) brewster@t Re: WAIS-discussion digest #4 682
015: [ 515] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #3 521
016: [ 510] (wais-talk-archi) news-mail- Re: WAIS-discussion digest #4 480
017: [ 507] (wais-discussion) akel@seq1. Re: Net resource list model(s 347
018: [ 495] (wais-discussion) Unknown Subject 6
Although many of the results are duplicates of the search using just the text Z39.50, some new documents are listed. An extensive search for all relevant documents may mean using different search strategies and a variety of WAIS source servers.
In addition to its search features, WAIS also functions as a data-indexing tool. WAIS can take large amounts of information, index it, and make the resultant Z39.50-compliant database searchable. You can build an indexed database for your own use as a stand-alone database or, if you have a TCP/IP connection, you can make your WAIS database public by registering it with wais.com and listing it in the Directory of Sources.
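The heart of any full-text indexer is a table that maps each word to the documents containing it. The Python sketch below shows that idea only; it is not the WAIS indexer, produces nothing a WAIS server could read, and the names build_index and lookup are made up for the example.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def lookup(index, query):
    """Return the names of documents containing any word of the query."""
    found = set()
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        found |= index.get(word, set())
    return sorted(found)
```

Building the table once up front is what makes searching fast: a query only has to look up its few words rather than read every document again.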
To obtain the WAIS software, go to ftp://think.com/wais. This is the main distribution site for WAIS software and WAIS documentation. Both the WAIS server code and client code are available from think.com. You can find more WAIS software at ftp://ftp.cnidr.org/pub/NIDR.tools/wais, ftp://ftp.wais.com/pub, and ftp://sunsite.unc.edu/pub/wais.
Getting WAIS up and running is no trivial matter. Because it's very complicated, we'll leave that as an exercise for more daring users with time on their hands and a good supply of Valium.
The use of WAIS is growing slowly on the Internet. WAIS provides a convenient and efficient way to index and search large amounts of information using standards that are starting to be generally accepted on the Internet.
WAIS faces some tough competition, however. The WWW, Harvest, and Hyper-G tools offer facilities that may replace WAIS. Even so, it is likely that WAIS will survive and be used in niche database applications.