4. Scanning Data Bases To Identify Emerging Global Issues

4.1. Introduction

The Millennium Project uses three primary sources of information to identify and explore future issues and opportunities:

In this year's activity, priority was placed on the panel; scenarios were explored at lower priority, and scanning was begun in order to evaluate its potential and the means of implementing it effectively. The results of our early scanning work are presented here.

Other methods of environmental scanning were addressed previously, during the Millennium Project's feasibility study (note 6). This report therefore focuses on publications on the Internet, using the World Wide Web as the data base and commercially available search engines as the tools. We asked, in effect: could scanning of the Internet with currently available tools contribute to the objectives of the Millennium Project?

4.2. The World Wide Web

By now, anyone interested in computers and communications has heard of and probably used the Internet and the World Wide Web. Beginning as a system that connected computers at universities and defense contractors (ARPANET), the Internet has emerged as a phenomenon of our times, available for research or surfing, digital conversation, serious inquiry, or trivial pursuits. Initially, the Internet provided information only in the form of text. The World Wide Web (WWW) introduced user-friendly color graphics that have greatly contributed to its popularity and growing use. Although most on-line users are in the United States, the WWW is accessible almost everywhere in the world, and use by people outside the United States is growing. Web pages are predominantly written in English, but this too is changing, and the WWW is becoming increasingly multilingual. Many of the major search engines on the WWW offer search options in multiple languages.

The amount of information available on the WWW is seemingly endless and constantly changing. Identifying specific articles of interest in this wealth of information is not always simple. The problem is that seemingly reasonable search strategies can produce volumes of "noise," burying the few articles and reports that are on target. Another problem is that demand has been continually underforecast and much use is "free," so corporations and governments are now installing huge new capacity at considerable cost. Access costs may begin to rise, along with advertising.

4.3. Background Information on Major Search Engines

There are a dozen or more search engines available to the average user. Each focuses on a different set of databases and matches the key words provided by users in different ways. Without attention to the search strategy, the returns from a simple search can be overwhelming; one search on global issues, for example, found over three million matches. Clearly no one would even begin to manually sort through a stack that high to find the few nuggets that it might contain.

Below is a list of the most common and useful search engines and their respective URLs.

AltaVista: http://altavista.digital.com

CUI: http://cuiwww.unige.ch/meta-index.html

CyberHound: http://www.cyberhound.com

Excite: http://www.excite.com

EZ Connect: http://www.ezconnect.com

Infoseek: http://www2.infoseek.com

Lycos: http://www.lycos.com

Medexplorer: http://www.medexplorer.com

Medsearch: http://www.medsearch.com

Open Text: http://www.opentext.com/omw/f-omw.html

Ultraseek: http://www.ultraseek.com

Webcrawler: http://www.webcrawler.com

Yahoo: http://www.yahoo.com

The following URLs will lead to library reference sites that offer helpful comparisons of individual search engines:

http://www.indiana.edu/~librcsd/search

http://www.cnet.com/Content/Reviews/Compare/Search/ss3a.html

http://www.state.ia.us/educate/depteduc/offtech/search.html

Recently several "meta-search engines" have become available. Meta search engines are able to search multiple individual search engine databases with the launch of a single "meta" search. Meta search engines are becoming more and more popular as they can save users from jumping between multiple search engines. The following is a list of four Meta search engines currently available on the World Wide Web.

IBM InfoMarket: http://www.infomkt.ibm.com/

MetaCrawler: http://MetaCrawler.cs.washington.edu:8080/index.html

Profusion: http://www.designlab.ukans.edu/ProFusion.html

SavvySearch: http://cage.cs.colostate.edu:1969/
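The way a meta-search engine fans one query out to several engines and merges the returns can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a description of how MetaCrawler or any other product named above actually works: the function meta_search, the stub engines engine_a and engine_b, and the example.org addresses are all hypothetical, and no network access is involved.

```python
from typing import Callable, Dict, List

# Assumption: each "engine" is modeled as a function from a query string
# to a list of result URLs. Real engines are reached over the network.
Engine = Callable[[str], List[str]]


def normalize(url: str) -> str:
    """Treat URLs that differ only by a trailing slash as the same page."""
    return url.rstrip("/")


def meta_search(query: str, engines: Dict[str, Engine]) -> Dict[str, List[str]]:
    """Send one query to every engine and merge the results.

    Returns a mapping from each URL to the names of the engines that
    returned it, in the order the URLs were first seen.
    """
    merged: Dict[str, List[str]] = {}
    for name, engine in engines.items():
        for url in engine(query):
            found_by = merged.setdefault(normalize(url), [])
            if name not in found_by:
                found_by.append(name)
    return merged


# Hypothetical stub engines with made-up results, for illustration only.
def engine_a(query: str) -> List[str]:
    return ["http://example.org/a", "http://example.org/b/"]


def engine_b(query: str) -> List[str]:
    return ["http://example.org/b", "http://example.org/c"]
```

Calling meta_search("emerging global issues", {"A": engine_a, "B": engine_b}) with these stubs lists http://example.org/b once, credited to both engines, which is the kind of duplicate a meta-search engine can collapse into a single entry.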

Yet a third level of search tools has recently been added to the search galaxy: personal Net agent software. Net or Web agents are, in essence, meta-search tools that run on the user's personal computer and are far more customizable than even the best of the on-line search or meta-search engines. After running a search, agent software will typically either create a detailed summary of all of the pages found or download the actual pages, with all inline images and Java scripts, for review. Agent software can be programmed to update searches automatically at weekly or daily intervals. All of this can be performed in unattended mode, late at night for example, to make the best use of the user's time and resources.

Two contemporary examples of agent software are NetAttache and WebCompass. Excellent reviews of these and other agent-based programs can be found at:

http://www.stroud.com/sagents.html

Also in this class, though more specialized and limited in scope, are news agents. These deliver specific newsfeeds on a regular basis to your computer based on detailed selection algorithms which "hit" search engines and wire services. While they will not search the Web per se, they will provide highly customized current events in hundreds of selectable fields and industries. An example is

PointCast: (http://www.pointcast.com)

4.4. Preliminary Test of the Internet for the Millennium Project

There are two primary reasons for scanning data bases in the Millennium Project. The first is to identify newly emerging global issues and opportunities, and the second is to uncover more information about issues already identified by other means. To explore the usefulness of the Internet in addressing both types of questions, certain key words were chosen, and the World Wide Web was searched using the Meta search engine MetaCrawler (which searches using several engines, including Lycos, WebCrawler, AltaVista, Yahoo, and Galaxy) and the search engine Infoseek.

The searches were conducted using key words such as "Emerging Global Issues" and "Aging Nuclear Power Plants." Here are the results:

Using "Emerging Global Issues" (with a plus sign in front of each word to require its presence), MetaCrawler found a total of 38 matches. These included some good hits and some material that was far off the mark. For example, for reasons that are not understood, the off-target references pointed us to the home page of the accounting firm of Coopers and Lybrand, the Cornell Law Library, and the same-sex sodomy law in Arkansas. But the hits were encouraging, and included the following:

Medium-Term Target of ODA (outline):

1. Aim. The stability and sustained growth of developing countries are essential to the creation of a post-Cold War framework for peace and

http://www.nttls.co.jp/infomofa/oda/sum1995/fifth.html

The Emerging Global Village:

The Emerging Global Village: Issues & Problems. The Emerging Global Village web site will use multimedia to explore the impact of globalization on the world community.
http://www-plateau.cs.berkeley.edu/globalvillage/issues.html

Watson Institute Publications List:

The Watson Institute publishes a book series on Emerging Global Issues, an Occasional Paper Series, Briefings, ...

http://www.brown.edu/Departments/Watson_Institute/Publications/WIIS_PUB_comp.shtml

Emerging Global Issues:

Watson Institute Emerging Global Issues Book Series Third World Security in the Post-Cold War Era edited by Thomas G. Weiss and ...

http://www.brown.edu/Departments/Watson_Institute/Publications/egi.shtm

Full House:

Reassessing the Earth's Population Carrying Capacity by Lester R. Brown and Hal Kane. Population outrunning the earth's carrying capacity..

http://csf.colorado.edu/authors/hanson/page28.htm

World Issues Forum:

Home | Contents | Journal | Corporate | Gopher | Members | Education | Diplomacy | Related. World Issues Forum. AFSA Brings the World to You. The World...

http://www.afsa.org/educ/wdissue.html

We note parenthetically that the Millennium Home Page was not found in this search.

Changing the search strategy on MetaCrawler to "+new +world +issues" produced 20 responses, again with the variety seen earlier. Some but not many items were duplicated between the two lists. Some of the interesting additional hits included:

Contemporary Forms of Genocide:

Mirghani Mohamed and Mona Mohamed, The National Islamic Front (NIF) Genocidal Policy in Sudan. Perspective. Israel Charny, A Classification of Denials of Holocaust and Genocide: From Culpability to Corruption to "Innocence" in Celebration of Violence.

http://www.unl.edu/conted/acpp/genocide/index.html

And at least one enigmatic item appeared that might be followed up:

New issues of the world - November 1996:

Every month Cronaca Filatelica reproduces, in its central black pages, new issues from all over the world in color with descriptions in Italian. [Eder Home Page] [Main menu] [Cronaca Filatelica] [New issues of the world Index].

http://www.intecs.it/eder/crofil/nere/uknov96.htm

Using "+emerging +global +issues" with Infoseek produced a total of 18 references. Most of these dealt with global investing; however, one hit was of possible interest:

Global Issues:

The global issue theme focuses discussion on a broad range of environmental, social, political, and economic issues with concern and affect the...

http://www.iearn.org/lcguide/gi/gi.html

Using the key words "+new +world +issues" on Infoseek produced bizarre results. Somehow the program associated the word "cigars" with this search request. It obtained five hits, none of which were useful. However, an earlier search from Paris using the same terms produced 36 matches, of which 3 were judged particularly relevant:

Emerging Infectious Diseases:

Tracking trends and analyzing new and reemerging issues around the world

http://www.cdc.gov/ncidod/EID/eid.htm

Connected:

news and view from the connected society

http://www.access.ch/e-news/

Foreign Policy Association

http://www.fpa.org/

When the key words "+aging +nuclear +plants" were used with MetaCrawler, a total of 49 matches were produced, most of which were on target. The references included pointers to material degradation, specific power plants that had been shut down because of malfunction or age, the nature of the risk involved in aging power plants, radiation effects, documents from the Nuclear Regulatory Agency pertaining to the subject, articles on decommissioning procedures and experience, operational difficulties with a specific Russian plant, nuclear safety research, etc.

This search even found a reference to the Millennium Project's Issue #13, "Aging Nuclear Plants," identified in the Project's third round and posted on its home page. The specific search engine used by MetaCrawler to find this reference was Excite.

With Infoseek, 22 matches were found and most were useful. There was no overlap with the articles retrieved from the MetaCrawler search.

Infoseek was used a second time with a slightly different designation. This search was performed using quotation marks, thus establishing the key words as a phrase: "aging nuclear power plants." Now a total of 28 matches were retrieved. The results were generally good, but the misses were more abundant than in the MetaCrawler search.

4.5. Analysis

It is apparent that the capability to search enormous data bases on the Internet will be invaluable to the Millennium Project. This preliminary work has barely scratched the surface but a scratch was made deep enough to show the potential.

Clearly, the searches were more efficient and productive in providing information on the focused topic of aging nuclear power plants than in searching for newly emerging global issues. The problem of discovering some issue that is truly new seems to stem from the following conundrum: if the subject has been written about and can be found on the Internet, it may no longer be new.

Perhaps other search terms would be more rewarding in uncovering issues of future importance. Replacing "+new" with "unexpected" in the MetaCrawler search produced some promising results. In addition to the usual "noise," this search produced articles on global warming, evidence-based medicine, massively distributed systems, trends in nanotechnology, and unexpected threats to US interests.

There are yet no clear criteria by which to select a best search engine or search strategy, except some general rules of thumb. In our limited work, the Meta search engine outperformed the single search engine in terms of its reach and accuracy. Meta search engines offer an efficient alternative to searching multiple individual search engines. However, Meta search engines do not incorporate all individual search engines. For example, MetaCrawler did not query Infoseek in its searches. As a result, individual searches on Infoseek revealed matches that MetaCrawler had not found.

Without employing some of the engine-specific search focusing techniques, the number of items retrieved generally was so large that the searches became essentially meaningless. The researcher should become familiar with the peculiarities of each individual searching tool. A few of the searching tips provided by the home pages of the search engines are listed in the appendix. Finally, one refined search is far more efficient in time and information than an unrefined search that produces hundreds of irrelevant matches.

At this point, the best search strategy appears to be one of "refined" trial and error. The searches themselves sometimes reveal the next search terms to use and the ones to avoid. For example, using the term "futures" for the purpose of finding articles on "futures research" also elicits the futures commodity market; so the next logical search would input "futures, NOT commodities." The term "aging nuclear plants" gave additional retrievals associated with how biologically growing plants aged under nuclear environments; the next search could be worded to exclude such items.
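The refinements just described (required words, excluded words, and exact phrases) can be made concrete with a small filter. This is a rough sketch of the general idea only; it does not reproduce the matching rules of Infoseek, MetaCrawler, or any other engine, whose syntax differs from one to the next. The function names tokenize and matches and the sample titles are hypothetical.

```python
import re
from typing import List, Optional, Sequence


def tokenize(text: str) -> List[str]:
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(
    text: str,
    required: Sequence[str] = (),
    excluded: Sequence[str] = (),
    phrase: Optional[str] = None,
) -> bool:
    """Return True if text passes all three kinds of refinement.

    required: every word must appear (like a leading plus sign).
    excluded: no word may appear (like NOT).
    phrase:   the words must appear together, in order (like quotation marks).
    """
    words = tokenize(text)
    word_set = set(words)
    if any(term.lower() not in word_set for term in required):
        return False
    if any(term.lower() in word_set for term in excluded):
        return False
    if phrase is not None:
        target = tokenize(phrase)
        n = len(target)
        # Look for the phrase as a contiguous run of words.
        if not any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            return False
    return True


# Hypothetical titles, for illustration only.
FUTURES_RESEARCH = "Futures research methods"
COMMODITIES = "Commodities futures prices"
BOTANY = "Aging of plants in nuclear environments"
```

With these sample titles, requiring "futures" and excluding "commodities" keeps the first title and drops the second. The botany title satisfies the three required words "aging," "nuclear," and "plants" but fails the phrase "aging nuclear plants," which is one way such off-target items could be screened out.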

The best place to start with a search on the WWW is with a precise subject. If the subject matter initially turns out to be too specific and leads to no matches, gradually broaden the scope of the subject. Beginning with a very focused search, then widening the parameters of the search will prove much more effective than the inverse approach.
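The narrow-then-broaden procedure can be written as a short loop. The sketch below assumes that the search terms are listed from most to least important and that broadening means dropping the least important term; both are our assumptions for illustration, as are the names broaden_until_hits and toy_search and the three-line sample collection.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a search function takes a list of required terms and
# returns the matching documents.
Search = Callable[[List[str]], List[str]]


def broaden_until_hits(
    terms: Sequence[str], search: Search, min_hits: int = 1
) -> Tuple[List[str], List[str]]:
    """Start with every term required, then drop the last (least
    important) term until the search returns at least min_hits results.

    Returns the terms finally used and the results they produced, or
    two empty lists if even a single term finds nothing.
    """
    active = list(terms)
    while active:
        results = search(active)
        if len(results) >= min_hits:
            return active, results
        active.pop()  # broaden: require one term fewer
    return [], []


# Hypothetical miniature collection, for illustration only.
DOCS = [
    "aging nuclear power plants in russia",
    "nuclear power plant decommissioning",
    "nuclear safety research",
]


def toy_search(terms: List[str]) -> List[str]:
    """Return the documents that contain every term as a whole word."""
    return [doc for doc in DOCS if all(t in doc.split() for t in terms)]
```

Starting from the terms "nuclear," "plants," "aging," and "finland," the first pass finds nothing in this sample; dropping "finland" then returns the single document on aging plants rather than every document that mentions "nuclear."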

4.6. Future Research

Ideally, a future search system would operate autonomously and ring an alarm every time it recognized that a new topic of interest had appeared in the data base. This recognition would be based on the novelty of the item, the plausibility of its emergence, and its potential importance. (In prior Millennium work, "importance" was defined on the basis of the number of people likely to be affected, the severity and permanence of the effects, and imminence.) Such an automatic system is not yet close at hand, but the Project has discussed with personnel at the Maui High Performance Computer Center how one might be constructed. The plan is to set up a number of search strategies and review in detail the nature of the material retrieved. Reviewers would judge the value of each strategy by how close it came to finding new global issues and opportunities. The strategies that did best would be combined (mated) in pairs or appropriate permutations, and an instruction would be added: find other articles similar to those that rated high. Similarity would be specified on the basis of many different factors: author, publishing organization, length, publication date, etc. The retrieved material would then be judged again and the winning pairs mated. By the fifth generation the search criteria would presumably no longer be recognizable, but the agents would be bringing back precise material of great interest.

In such an approach, of course, other reviewers would have trained their agents to recognize material that was satisfying to themselves; hence there is not likely to be an absolute convergence between different search institutions.
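The judge-and-mate cycle described above resembles what is generally called an evolutionary or genetic search. The following is a highly simplified sketch of that general technique, not the system discussed with the Maui center: it treats a search strategy as nothing more than a set of key words, and it replaces the human reviewers with a scoring function supplied by the caller. The names mate, evolve, and toy_score, the target word set, and the starting strategies are all hypothetical.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

# Assumption: a strategy is just a set of key words. The plan in the text
# also weighs author, organization, length, and date, omitted here.
Strategy = FrozenSet[str]


def mate(a: Strategy, b: Strategy, rng: random.Random) -> Strategy:
    """Combine two strategies: keep the words they share and pick each
    remaining word with probability one half."""
    shared = a & b
    optional = sorted(a ^ b)  # sorted so results depend only on the seed
    chosen = {term for term in optional if rng.random() < 0.5}
    child = shared | chosen
    return frozenset(child) if child else a


def evolve(
    population: Sequence[Strategy],
    score: Callable[[Strategy], float],
    generations: int = 5,
    keep: int = 4,
    seed: int = 0,
) -> Strategy:
    """Repeat: rank strategies by score, keep the best, mate them in pairs."""
    rng = random.Random(seed)
    pop: List[Strategy] = list(population)
    for _ in range(generations):
        parents = sorted(pop, key=score, reverse=True)[:keep]
        children = [
            mate(a, b, rng)
            for i, a in enumerate(parents)
            for b in parents[i + 1:]
        ]
        pop = parents + children  # the best parents always survive
    return max(pop, key=score)


# Hypothetical stand-in for the reviewers' judgment, for illustration only.
TARGET = frozenset({"unexpected", "world", "issues"})


def toy_score(strategy: Strategy) -> float:
    """Reward words in TARGET and penalize words outside it."""
    return len(strategy & TARGET) - len(strategy - TARGET)


START = [
    frozenset({"new", "world", "issues"}),
    frozenset({"emerging", "global", "issues"}),
    frozenset({"unexpected", "threats"}),
    frozenset({"world", "futures"}),
]
```

Because the best-scoring parents are carried into each new generation, the strategy returned by evolve(START, toy_score) can never score lower than the best starting strategy. In practice the scoring would come from reviewers, so different reviewers would steer the same starting strategies toward different results.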

Whether such an enterprise can be pursued depends on funding and the availability of interested and capable researchers.

