Digital libraries and electronic publishing today
Digital libraries and electronic publishing are here. They are not an academic concept to debate, or a dream of utopia. In this last chapter, the temptation is to predict what lies ahead, but the history of computing shows how fruitless such predictions can be. Rather than attempt to predict the future, it is more profitable to celebrate the present.
This is not to ignore the future. The work that is in progress today will be in production soon. Not all new projects enter the mainstream, but understanding what is happening today helps comprehension of the potential that lies ahead. This book was written over a twelve-month period in 1997 and 1998. Even during that period, digital libraries developed at a rapid pace. This final chapter looks at some of the current trends, not to forecast the future, but to understand the present.
A reason that predictions are so difficult is that, at present, the technology is more mature than the uses being made of it. In the past, forecasts of basic technology have proved to be reasonably accurate. Every year, semiconductors and magnetic devices are smaller, cheaper, faster, and with greater capacity. There are good engineering reasons to expect these trends to continue for the next five or ten years. Funds are already committed for the high-speed networks that will be available in a few years' time.
Predictions about the new applications that will develop have always been much less accurate. Seminal applications such as spreadsheets, desktop publishing, and web browsers emerged from obscurity with little warning. Even if no such breakthrough takes place, forecasts about how the applications will be used and the social effect of new technology are even less dependable. The only reliable rule is that, when a pundit starts out with the words, "It is inevitable that," the statement is inevitably wrong.
A possible interpretation of the current situation is that digital libraries are at the end of an initial phase and about to begin a new one. The first phase can be thought of as a movement of traditional publications and library collections to digital networks. Online newspapers, electronic versions of scientific journals, and the conversion of historic materials all fall into this category. Fundamentally, they use new technology to enhance established types of information. If the thinking is correct, the next phase will see new types of collections and services that have no analog in traditional media. The forms that they will take are almost impossible to anticipate.
A myopic view of digital libraries
Looking back on any period of time, trends and key events become apparent that were not obvious at the time. Those of us who work in digital libraries can only have a myopic view of the field, but here are some observations about how the field of digital libraries appears to an insider.
A useful metaphor is the contrast between a rowing boat and a supertanker. A rowing boat can change direction quickly and accelerate to full speed, but has little momentum. The early digital library projects, such as our Mercury project at Carnegie Mellon University, were like rowing boats. They made fast progress, but when the initial enthusiasm ended or funding expired they lost their momentum. Established libraries, publishers, and commercial corporations are like supertankers. They move more deliberately and changing direction is a slow process, but once they head in a new direction they move steadily in that direction. In digital libraries and electronic publishing, success requires attention to the thousands of details that transform good ideas into practical services. As the Internet and the web mature, many organizations are making these long-term investments in library collections, electronic publications, and online services. The supertankers are changing direction.
Consider these two years, 1997 and 1998, as typical. During this short period, many developments matured that had been in progress for several years. Online versions of newspapers reached a high quality and the level of readership of some of them began to rival the readership of printed editions. In 1997 and 1998, major scientific publications first became available online, from both commercial and society publishers. It was also an important time for conversion projects in libraries, such as JSTOR and the Library of Congress, as the volume of materials available from these projects accelerated sharply. During 1997 and 1998, the Dublin Core approach to metadata and Digital Object Identifiers for electronic publications both gained momentum; they appear to have made the transition from the fringe to the mainstream.
On the technical front, products for automatic mirroring and caching reached the market, tools for web security became available, and the Java language at last became widely used. These developments all began in earlier years and none can be considered research, yet in aggregate they represent tremendous progress.
In the United States, electronic commerce on the Internet grew rapidly. It has become widely accepted to buy books, airline tickets, stock market investments, and automobiles through Internet transactions. The Internal Revenue Service now urges people to pay their income tax online. Congress passed a reasonable revision of the copyright law. Funds are available for entrepreneurial investments; markets for new products are open for the right idea. Internet stocks are the favorite speculation of the stock market. Low-cost personal computers, under $1,000, are selling vigorously, bringing online information to an ever broader range of people.
No year is perfect and pessimists can point out a few worrying events. During 1997, junk electronic mail reached an annoying level. Leading manufacturers released incompatible versions of the Java programming language. The United States policy on encryption continued to emulate an ostrich. These are short-term problems. Hopefully, by the time that this book is published, junk electronic mail will be controlled and Java will be standardized. (There appears to be less hope for the United States' encryption policy.)
For the next few years, incremental developments similar to those in 1997 and 1998 can be expected. They can be summarized succinctly. Large numbers of energetic people are exploiting the opportunities provided by the Internet to provide new products and services.
From a myopic viewpoint, it is easy to identify the individual activities. It is much harder to see the underlying changes of which they are part. In the long term, the most fundamental trend is perhaps the most difficult to measure in the short term. How are people's habits changing? Many people are writing for the web. Graphic designers are designing online materials. People are reading these materials. Who are these people? What would they be doing otherwise?
Habits clearly have changed. Visit a university campus or any organization that uses information intensively and it is obvious that people spend hours every week in front of computers, using online information. At home, there is evidence that part of the time people use the web is time that used to be spent watching television. At work, are people reading more, or are they substituting online information for traditional activities such as visits to the library? Ten years from now, when we look back at this period of change, the answers to these questions may be obvious. Today we can only hypothesize or extrapolate wildly from small amounts of data. Here are some guesses, based on personal observation and private hunches.
The excitement of online information has brought large numbers of new people into the field. People who would have considered librarianship dull and publishing too bookish are enthralled by creating and designing online materials. The enthusiasm and energy that these newcomers bring is influencing the older professions more fundamentally than anybody might have expected. Although every group has its Luddites, many people are reveling in their new opportunities.
When the web began, Internet expertise was in such short supply that anybody with a modicum of skill could command a high salary. Now, although real experts are still in great demand, the aggregate level of skill is quite high. A sign of this change is the growth in programs that help mid-career people to learn about the new fields. In the United States, every community college is running courses on the Internet and the web. Companies that provide computer training programs are booming. Specialist programs in digital libraries are over-subscribed.
The new generation
In 1997, a Cornell student who was asked to find information in the library reportedly said, "Please can I use the web? I don't do libraries." More recently, a faculty member from the University of California at Berkeley mused that the term "digital library" is becoming a tautology. For the students that she sees, the Internet is the library. In the future, will they think of Berkeley's fine conventional libraries as physical substitutes for the real thing?
Are these the fringe opinions of a privileged minority or are they insights about the next generation of library users? Nobody knows. The data is fragmentary and often contradictory. The following statistics have been culled from a number of sources. They should be treated with healthy skepticism.
A survey in Pittsburgh found that 56 percent of people aged eighteen to twenty-four used the Internet, but only 7 percent of those over fifty-five did. Another poll, in 1997, found that 61 percent of American teenagers use the web. Although boys outnumber girls, the difference in usage is only 66 to 56 percent. In 1996, a third study found that 72 percent of children aged eight to twelve had spent time on a computer during the last month.
The network that these young people are beginning to use as their library had about 5.3 million computers in 1998. A careful study in late 1996 estimated that one million web site names were in common usage, on 450,000 unique host machines, of which 300,000 appear to be stable, with about 80 million HTML pages on public servers. Two years later, the number of web pages was estimated at about 320 million. Whatever the exact numbers, everybody agrees that they are large and growing fast.
A pessimist would read these figures as a statement that, in the United States, the young people have embraced online information, the mid-career people are striving to convert to the new world, and the older people who make plans and control resources are obsolete. Observation, however, shows that this analysis is unduly gloomy. The fact that so many large organizations are investing heavily in digital information shows that at least some of the leaders embrace the new world.
Although many organizations are investing in digital libraries and electronic publications, nobody can be sure what sort of organizations are likely to be most successful. In some circumstances, size may be an advantage, but small, nimble organizations are also thriving.
A 1995 article in The Economist described control of the Internet in terms of giants and ants. The giants are big companies, such as the telephone companies and the media giants. The ants are individuals; separately, each has tiny power, but in aggregate they have consistently succeeded in shaping the Internet, often in direct opposition to the perceived interests of the giants. In particular, the community of ants has succeeded in keeping the Internet and its processes open. During the past few years, both digital libraries and electronic publishing have seen consolidation in ever larger organizational units. In libraries this is seen as the movement to consortia; in publishing it has been a number of huge corporate mergers. Yet, even as these giants have been formed, the energy of the ants has continued. At present, it appears that giants and ants can coexist and both are thriving. Meanwhile, some of the ants, such as Yahoo and Amazon.com, are becoming the new giants.
Collections and access
Two factors that will greatly influence the future of digital libraries are the rate at which well-managed collections become available on the Internet and the business models that emerge. Currently, materials are being mounted online at an enormous rate. The growth of the web shows no sign of slowing down.
The number of good online sites is clearly growing. The sites run by newspapers and news agencies are fine examples of what is best and what is most vulnerable. There are many online news services, from the Sydney Morning Herald to the New York Times and CNN. These sites provide up-to-date, well-presented news at no charge to the users. The readership probably exceeds that of any American newspaper. As a source of current information they are excellent, but ephemeral. The information is changed continually and at the end of the day most of it disappears. Conventional libraries collect newspapers and store them for centuries, usually on microform. No library or archive is storing these web sites.
Among standard library materials, it is hard to estimate which disciplines have the largest proportion of their material available through digital libraries. Large portions of the current scientific and technical literature are now available, as is much government information. Legal information has long been online, albeit at a steep price. Business and medical information are patchy. Public libraries play an important role in providing current information such as newspapers, travel timetables, job advertisements, and tax forms. These are mainly available online, usually with open access.
In many situations, current information is in digital form but not historic materials, though projects to convert traditional materials to digital format and mount them in digital libraries are flourishing. Libraries are converting their historic collections; publishers are converting their back-runs. Established projects are planning to increase their rate of conversion and new projects are springing up. Several projects have already converted more than a million pages. The first plan to convert a billion pages was made public in 1998.
Currently, the biggest gap is in commercial entertainment. Some of the major creators of entertainment - films, television, radio, novels, and magazines - have experimented with ways to use the Internet, but with little impact. Partly, their difficulty comes from the technical limitations of the Internet. Most people receive much better images from cable television than can be delivered over the network or rendered on moderately priced personal computers. Partly, the rate of change is dictated by business practices. Entertainment is big business and has not yet discovered how to use the Internet for its profit.
Open access appears to be a permanent part of digital libraries, but few services have yet discovered good business models for providing open access material. A few web sites make significant money from advertising, but most are supported by external funds. Digital libraries and electronic publishing require skilled professionals to create, organize, and manage information; they are expensive. Ultimately, an economic balance will emerge, with some collections open access and others paid for directly by their users, but it is not obvious yet what this balance will be.
The web technology, which has fueled so much recent growth, is maturing. During the middle years of the 1990s, the web developed so rapidly that people coined the term "web year" for a short period of time, packed with so much change that it seemed like a full calendar year, though in fact it was much shorter. As the web has matured, the pace of change in the technology has slowed down to the normal rate in computing. Every year brings incremental change and, in combination, over a few years these incremental changes are substantial, but the hectic pace has slowed down.
This does not mean that the growth in the web has ended; far from it. The number of web sites continues to grow rapidly. Busy sites report that the volume of usage is doubling every year. The quality of graphics and the standards of service improve steadily. For digital libraries and electronic publishing, several emerging technologies show promise: persistent names such as handles and Digital Object Identifiers, XML mark-up, the Resource Description Framework, and Unicode. The success of these technologies will depend upon the vagaries of the marketplace. Widespread acceptance of any or all would be highly beneficial to digital libraries.
Finally, the growth in performance of the underlying Internet remains spectacular. The past couple of years have seen a series of governmental and commercial initiatives that aim to provide leaps in performance, reliability, and coverage over the next few years. We cannot predict how digital libraries will use this performance, but it provides remarkable opportunities.
Research and development
Digital libraries are now an established field of research, with the usual paraphernalia of workshops and conferences. There have even been attempts to establish printed journals about digital libraries. More importantly, at least one thousand people consider that their job is to carry out research in the field. The immediate impact of this work is difficult to evaluate, but it is clearly significant. Examples of projects funded by the NSF and DARPA appear throughout this book. Many of the more recent ones were funded explicitly as digital libraries research.
In March 1997, the National Science Foundation sponsored a workshop in Santa Fe, New Mexico to discuss future research in digital libraries. It was part of a planning process that subsequently led to the announcement of a major new digital libraries research program, the Digital Libraries Initiative, Phase 2. The meeting was fascinating because it was an opportunity for the people who carry out research to describe how they saw the development of digital libraries and the research opportunities.
Many of the people at the meeting had been part of the first Digital Libraries Initiative or other federally funded research projects. Naturally, they were interested in continuing their research, but they did not simply recommend a continuation of the same programs. These early projects constructed digital library test collections and used them for research, mainly on technical topics. Some people at the workshop argued that a valuable use of government funds would be to build large digital libraries. Many people agreed that archiving is a vital topic, worthy of serious research.
Most of the discussion, however, was about making existing collections more usable. The people at the workshop are senior researchers. They need not only to find information, but also to escape from too much information. The discussions sometimes suggested that the central problem of digital libraries research is information overload. How can automatic methods be used to filter, extract, and consolidate information? The discussions embraced methods by which individuals can manage their private libraries or groups can carry out collaborative work over the Internet. Digital libraries are managed collections of information. How can they be managed for the convenience of the users?
Social, economic, and legal issues were also fully discussed. As ever, in these areas, the difficulty is how to articulate a coherent research strategy. While nobody denies the importance of these areas, there remains skepticism about whether they can be tackled by large research projects.
Technical people at the meeting pointed out that digital libraries have become one of the major uses of supercomputers. In 1986, when the national supercomputing centers were established, the Internet backbone ran at 56 kbits/second, shared by all users. Today, this speed is provided by low-cost, dial-up modems used by individuals. Today's laptop computers have the performance of the supercomputers of twelve years ago. In twelve years' time, ever smaller computers will have the performance of today's supercomputers.
After a few years, rapid incremental growth adds up to fundamental changes. At the Santa Fe workshop, we were asked to explore the assumptions about digital libraries that are so deeply rooted in our thinking that we take them for granted. By challenging such assumptions, the aim was to stimulate a creative agenda for the next generation of digital library research.
Finally, here is a personal footnote. In writing this book, I looked at hundreds of sources. Most were primary material, descriptions written by researchers or the builders of digital libraries about their own work. One source was an exhibit at the U.S. Copyright Office; one was an out-of-print book; for a couple of topics, I sent electronic mail to friends. For everything else, the source of the material was the Internet. Many of the materials do not exist in conventional formats. In the field of digital libraries, the Internet is already the library.
A dream of future libraries combines everything that we most prize about traditional methods with the best that online information can offer. Sometimes we have nightmares in which the worst aspects of each are combined. In the first years of this century, the philanthropy of Andrew Carnegie brought public libraries to the United States. Now a new form of library is emerging. Hopefully, digital libraries will attract the same passion and respect, and serve the same deep needs that have long been associated with the best of libraries and publishing.
Last revision of content: January 1999
Formatted for the Web: December 2002
(c) Copyright The MIT Press 2000