
Part of the book series: Science and Fiction ((SCIFICT))


Abstract

In his description of a “Universal Library” Lasswitz asks us to try and make sense of an almost unimaginable amount of information. Even in our increasingly digital world we don’t have to deal with anything remotely resembling a Universal Library. Nevertheless, as citizens navigating our connected world, we are all of us creating ever-increasing amounts of digital data. Algorithms are now routinely mining big data sets and finding hitherto unknown relationships. It’s a field of immense promise, but also one of great threat.



Appendices

Note on the Translation

Kurd Lasswitz published this story in Germany in 1904. The German–American science writer Willy Ley published an English translation, which appeared in the popular 1958 book Fantasia Mathematica, edited by Clifton Fadiman. Ley’s translation, however, focused strongly on the mathematics, and he chose to omit numerous elements of the story while adding some of his own material. I’ve therefore presented an original translation here, which I hope better retains the feel of a turn-of-the-century German story. (In this I have been guided by my wife Heike, who is a native German speaker.)

Some specific notes follow.

  • Characters—four characters appear in the story: Professor Wallhausen; his wife; their niece, Miss Susanne Briggen (whom the Wallhausens sometimes refer to by the diminutive “Suse”); and Wallhausen’s friend, an editor, Max Burkel. In the original story, the professor’s wife is referred to as “Frau Professor” and “die Hausfrau”; we’ve translated these throughout by the phrase Mrs Wallhausen. We aren’t told the given name of either the professor or his wife.

  • Kulmbacher—a beer. The Kulmbacher brewery, based in a small town near Bayreuth, is still producing beer. I can strongly recommend it.

  • table of logarithms—a more modern translation would probably use “smartphone” rather than “table of logarithms” because smartphone apps can handle advanced mathematics. Before people had such technological wonders available to them they’d perform tedious mathematical calculations with the help of log tables.

  • the book with the train timetables—the original story talks about the Reichskursbuch, or Imperial timetables. This was a book of train schedules and ship and canal connections. The first such book was published in 1850.

  • look something up in Faust—Ley’s translation, for an English-speaking audience, employed Shakespeare as the author whose name everyone would know. The closest equivalent in Germany is perhaps Goethe, and his play Faust is the best-known literary work in German.

  • begin ‘History of the Thirty Years War’—Lasswitz deliberately mixes up various events to demonstrate why it’s impossible to trust a history text taken from the Universal Library. The Thirty Years War (1618–1648) was a deadly conflict in Central Europe, which began as an altercation between Protestant and Catholic states and then escalated into more general war. Prince Blücher fought against Napoleon at the Battle of Waterloo, some two hundred years after the Thirty Years War began. The kingdom of Dahomey was annexed by the French in 1894; it’s now Benin. It’s unlikely Blücher would have ever met the Queen of Dahomey; and even if he had, he wouldn’t have married her at Thermopylae, which is best known for the battle that took place there in 480 BC during the second Persian invasion of Greece!

  • the opening lines of Iphigenia in Tauris, as I once recited them—When Susanne recited the opening lines of Goethe’s play Iphigenia in Tauris she mixed up well-known lines from three different German plays. “Out in your shadows, brisk treetops” is correct—it’s the first line from Goethe’s play; but “Obeying need, not my own desires” comes from Schiller’s The Bride of Messina; and the line “I want to sit down on this stone bench” comes from Schiller’s Wilhelm Tell. An English version of this passage might be from Hamlet’s soliloquy: “To be or not to be, that is the question. To live is the rarest thing in the world; most people exist, that is all. What’s breaking into a bank compared with founding a bank?”

  • a trillion is a nice number—Lasswitz has the professor use the old definition of a trillion: a million million million, or a one followed by eighteen zeros. This definition is still sometimes used, but nowadays the usual definition of a trillion is a million million: a one followed by twelve zeros. In the context of the number of volumes in the Universal Library it really doesn’t matter which version of a trillion you care to use.

  • You all know that light travels—the speed of light had been carefully measured in the decades before this story was written. At the time of its publication, a few scientists were speculating that the speed of light might be a limiting velocity in dynamics (see next chapter).

  • the farthest visible galaxies—Lasswitz uses the term Nebelflecken, which is the German word for ‘nebulae’. I’ve translated it as ‘galaxies’, which is what Lasswitz meant in spirit. At the time he was writing, however, astronomers didn’t understand the nature of galaxies.

  • his wife softly quoted—Mrs Wallhausen gives the second verse of Goethe’s poem “Limits of humanity”. I’m sure there are much better translations than the one given here!

Note

As this book was going to press, I became aware of a recent English translation of “The Universal Library”. See Born (2017).

Commentary

For my undergraduate physics degree, many years ago, the final set of exams contained a general paper in which we students could be asked anything. The sky is dark at night: explain. Or: How many molecules of Caesar’s last breath do you inhale each time you take a breath? Or: Derive a lower limit on the proton lifetime from the fact of your own existence. The purpose wasn’t to test whether we could remember physics facts—books were allowed into the exam hall, so we could look up whatever facts our books made available. Rather, the exam tested our ability to strip down a problem to its essence and apply mathematical arguments to reach a reasonable, if not exact, conclusion. (This was in the years B.I., before the internet. Nowadays, if smartphones were allowed, students could google the answer even to seemingly random problems such as those above.) One of the questions on my final paper essentially required us to recreate the argument made by Professor Wallhausen. I aced the paper, so I’ve had a soft spot for the concept of a universal library ever since.

I first came across the concept in fiction in “The Library of Babel” by Jorge Luis Borges. The Borges tale was published in 1941, almost four decades after “The Universal Library”, but it’s far more celebrated than Lasswitz’s story. (Incidentally, Borges wrote his story while he was working—unhappily—as a librarian.) Where the scientifically-trained Lasswitz took a rigorously mathematical approach to the question of a universal library, the philosophically inclined Borges took a more metaphysical approach. In “The Library of Babel”, Borges imagines a universe filled with planes of interlocking hexagonal rooms, each room having four of its walls lined with bookshelves. Spiral staircases connect the planes. Each book on the shelves is different. For Borges, each book contains 410 pages; there are 40 lines per page, 80 characters per line, and 25 different possible typographical symbols. If you run the numbers in the same way as Lasswitz did, it’s easy to calculate that Borges’ Library of Babel contains about $1.95 \times 10^{1834097}$ different books. This is a huge number, incomparably bigger than the number of particles in the observable universe. It is, however, vastly smaller than the number of all possible books, calculated by Lasswitz, which is $10^{2000000}$. It’s difficult to comprehend the size of the numbers contained in “The Universal Library” and “The Library of Babel”. Perhaps a comparison with some real-world attempts at a universal library can put them into perspective.
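The count for Borges’ library can be checked with a few lines of arithmetic. The following is a minimal sketch rather than the author’s own working: it takes the page, line, character, and symbol counts quoted above, and works with logarithms because the number itself is far too long to print. The variable names are illustrative only.

```python
import math

# Figures quoted in the text for a single book in Borges' library.
PAGES, LINES_PER_PAGE, CHARS_PER_LINE, SYMBOLS = 410, 40, 80, 25

# Every book has the same number of character positions.
chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE

# The number of distinct books is SYMBOLS ** chars_per_book.
# Work with log10 so the result can be shown as mantissa x 10^exponent.
log10_books = chars_per_book * math.log10(SYMBOLS)
babel_exponent = math.floor(log10_books)
babel_mantissa = 10 ** (log10_books - babel_exponent)

# Exponent quoted in the text for Lasswitz's Universal Library.
LASSWITZ_EXPONENT = 2_000_000
gap_in_orders = LASSWITZ_EXPONENT - log10_books

print(f"Characters per book: {chars_per_book:,}")
print(f"Library of Babel: about {babel_mantissa:.2f} x 10^{babel_exponent} books")
print(f"Universal Library is larger by a factor of about 10^{gap_in_orders:.0f}")
```

The exponent should match the figure quoted above, and the gap between the two libraries is itself a number with well over a hundred thousand digits.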

The original universal library was the ancient Library of Alexandria, the tragic destruction of which through fire meant manuscripts of immense cultural significance were lost to the ages. The library’s index was also lost in the conflagration, so it’s not known for certain how many books were housed at Alexandria—experts suggest the number of scrolls would have been between 40,000 and 400,000. If the number of scrolls were at the top end of the range then the Library of Alexandria would have housed a significant fraction of the ancient world’s knowledge. (It’s interesting to note that an important function of a library is to classify and organize knowledge. An ancient library wouldn’t necessarily have been organized in the same way as a modern library because the ancients viewed the world in a quite different way to us. What we might classify as poetry, for example, the ancients might have classified as natural science.) Moving forward in time, the present British Library contains some of the world’s most significant books and manuscripts, items that are priceless. In addition to the quality of its collections, the British Library stands out in terms of quantity: it has more books than any other library except the American Library of Congress. The LOC, the world’s largest library, has 32 million books and many more millions of photographs, maps, and manuscripts. Of course, even the British Library and the Library of Congress (see Fig. 7.1) now have a rival: the internet. The internet can be thought of as a library containing not just text, but images, sounds, videos, and simulations. (Indeed, one website even provides a simulation of Borges’ Library of Babel—visit https://libraryofbabel.info for a disorientating glimpse of what the Library contains; see Fig. 7.2. You’ll struggle to find anything of interest in it, however. As both Lasswitz and Borges emphasised, there’s a problem with indexing a universal library.)

Fig. 7.1

Reading rooms of the Library of Congress (top) and the British Library (bottom). The LOC and BL are among the largest libraries in the world, but their combined storage capacity would be insufficient to house the printed output of the Web—let alone the unimaginable vastness of the Universal Library (Credit: top—Carol M. Highsmith; bottom—Diliff)

Fig. 7.2

Jonathan Basile, creator of the online universal library, standing in front of a page of text from the library (Credit: Alan Levine)

The Library of Alexandria, the British Library, the Library of Congress, and the internet. If you were to collect all the items contained in these libraries and throw in all the items from all the other libraries, public and private, that people have put together throughout history—well, the collection would fill only a tiny fraction of Borges’ Library of Babel. It would be vanishingly small compared to Lasswitz’s Universal Library. The real world is much smaller than the world of mathematical possibility. Nevertheless, you can’t deny our technological civilization is producing data at an unprecedented rate. And this opens up numerous challenges and opportunities. Let’s see why.

Suppose we wanted to print out all the text that appears on the Web. How many books would we need? It’s impossible to give a definitive answer, of course, but we can make an informed guess.

In 2014, scientists estimated that the internet housed 1 billion websites; the number of sites fluctuates because new websites are created and old websites are retired, but a round billion is a reasonable figure. The number of websites by itself doesn’t help us, because each site can contain multiple pages, but it’s possible to account for that. In 2016, scientists estimated that there were 4.66 billion web pages. (These estimates ignored material on the so-called “Deep Web”—a corner of the internet which, for many different reasons, both legitimate and illegitimate, is not indexed by search engines. By definition, it’s difficult to calculate the size of the Deep Web but experts estimate that it contains orders of magnitude more material than appears in traditionally indexed pages. For simplicity, though, let’s agree to omit the Deep Web from our calculations.) In the spirit of Professor Wallhausen, let’s suppose the internet contains 5 billion web pages and each web page, if printed out on paper, corresponds to 10 book pages. (I have no idea how realistic this estimate is, but it doesn’t seem too unreasonable.) In this case, if you were to make a hard copy of the Web you’d end up with 50 billion printed pages. If we assume an average book has a page length of 500 then we know how big a library would have to be in order to house the “Surface” Web: the library would have to hold 100 million books. Neither the British Library nor the Library of Congress would suffice.
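The back-of-envelope estimate above is easy to reproduce and to vary. This is only a sketch of the arithmetic in the text: the inputs are the rounded figures and admitted guesses given above, the Library of Congress holding is the 32 million books quoted earlier in the chapter, and the variable names are illustrative.

```python
# Inputs taken from the text; the second one is the author's admitted guess.
WEB_PAGES = 5_000_000_000         # rounded-up count of indexed web pages
BOOK_PAGES_PER_WEB_PAGE = 10      # printed pages per web page (a guess)
PAGES_PER_BOOK = 500              # assumed average book length
LOC_BOOKS = 32_000_000            # Library of Congress holding quoted earlier

printed_pages = WEB_PAGES * BOOK_PAGES_PER_WEB_PAGE
books_needed = printed_pages // PAGES_PER_BOOK
loc_equivalents = books_needed / LOC_BOOKS

print(f"Printed pages: {printed_pages:,}")
print(f"Books needed: {books_needed:,}")
print(f"Equivalent to {loc_equivalents:.1f} Libraries of Congress")
```

Changing any one of the guessed inputs by a factor of two or three shifts the answer by the same factor, so the conclusion that the Surface Web outgrows both libraries is fairly robust.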

But our online world consists of more than just static webpages. With their tweets, Twitter users generate the equivalent of about 25,000 500-page books each day; Facebook users share 78 million links each day; around the world, about 200 billion emails are sent each day. It’s as if the general public is regularly filling the Library of Congress with content. And of course digital content isn’t restricted to textual material of the sort that interested Professor Wallhausen: there are graphics, maps, photos, songs, videos, simulations … all sorts of information is now in digital form. Text-based data constitutes only a small fraction of what is stored on the internet. It’s worth reiterating that any numbers we use to capture the size of the internet can’t compare with the ungraspably large numbers discussed by Lasswitz in “The Universal Library”. But by most real-world standards we are surely justified in saying the internet is big, and getting bigger with each passing year. In order to better quantify this, though, and understand why this trend carries with it both challenges and opportunities, we need to look at how computer scientists measure storage requirements for data.

Ultimately, computers work with binary digits—bits: 0 or 1, on or off, north or south. When discussing data, however, a more useful unit is the byte, which is eight binary digits long. Most computers use a byte to represent a single character—letter, number, typographical symbol. In the early days of personal computing, when people worried about how much memory was inside their machine and about the file size of their documents, units such as kilobyte and megabyte entered the common parlance. (Note that the terms have two different but equally valid definitions, so there is some confusion here. A kilobyte can be 1000 bytes, as the name implies; but in computing it’s convenient to work in powers of 2 so a kilobyte is often $2^{10}$ = 1024 bytes. A megabyte can be 1,000,000 bytes; but it can also be $2^{20}$ = 1,048,576 bytes. The difference between the two definitions increases as the size increases, but for the purposes of this discussion we needn’t be concerned. We are interested in orders of magnitude, not in precise numbers.) Anyway, as many aspects of computer technology advanced along the exponential curve known as Moore’s Law—with a doubling every 18 months or so—people began talking about the gigabyte (a billion bytes) and then the terabyte (a trillion bytes). My current computer contains a terabyte hard disk, a luxury unthinkable back when home computers typically came with a 360-kilobyte floppy disk drive. The pattern whereby each named unit of data is one thousand (or 1024) times greater than the previous unit continues: after the terabyte we have the petabyte, then the exabyte, zettabyte, and yottabyte. I’ve even seen mention of the brontobyte and geobyte. Phew.
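To see how the two definitions drift apart as the units grow, here is a small illustrative sketch. It covers only the units from kilobyte to yottabyte; the function names are hypothetical and chosen for this example.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def unit_sizes(power):
    """Return (decimal_bytes, binary_bytes) for the power-th prefix (1 = kilo)."""
    return 1000 ** power, 1024 ** power

def relative_gap(power):
    """Fraction by which the binary definition exceeds the decimal one."""
    decimal, binary = unit_sizes(power)
    return (binary - decimal) / decimal

for power, prefix in enumerate(PREFIXES, start=1):
    print(f"{prefix}byte: 10^{3 * power} vs 2^{10 * power} bytes, "
          f"gap {relative_gap(power):.1%}")
```

The gap starts at 2.4% for the kilobyte and grows with every step, but it stays well within a single order of magnitude, which is why it can be ignored here.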

For someone who remembers having to transfer data from computer to computer on 5.25-inch floppy disks, a unit such as the zettabyte seems ludicrously inappropriate for any realistic computing task. And yet last year, as I write, internet traffic exceeded a zettabyte; this year there’s been even more traffic; next year it will be greater still. Individuals, businesses, universities, research teams … it seems as if the world is becoming a factory for generating data. As mentioned above, handling such a flood of data presents challenges—conceptual, technical, and ethical. But if we can tame the deluge then the opportunities are immense.

Let me give just one example of the challenges and opportunities of so-called Big Data. The example happens to come from astronomy, but I could have taken an example from other areas of science—or from healthcare, retail, technology … indeed, from most aspects of human endeavour.

We are entering a golden age of observational astronomy and cosmology. Consider, for example, the Large Synoptic Survey Telescope (LSST). When this wonderful telescope commences operations in 2022 it will consist essentially of three very large mirrors, behind which a 3.2-gigapixel digital camera will take 15-second exposures of the sky every 20 seconds. Scientists expect the camera to take 1.28 petabytes of data every year. Human astronomers simply won’t be able to process that amount of data: there aren’t enough eyes and brains for the task. And the LSST is just one of many observatories—operating not just throughout the electromagnetic spectrum, but also using gravitational waves and particle detectors. Some have already seen “first light”, some are soon to come online. Each of them will generate such vast quantities of data that human astronomers would drown if they tried to process it manually. But if data scientists could store and index the observational data in an efficient way then machines would be able to mine the data for us—and make discoveries much more quickly than humans would be able to do. Indeed, machines might make serendipitous discoveries that humans themselves would miss.
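A rough consistency check on these figures is possible, though it needs one input that the text does not give. The sketch below assumes each pixel is stored as 2 bytes (a 16-bit raw value); that assumption, and the variable names, are mine rather than anything stated about the LSST here.

```python
# Figures quoted in the text.
PIXELS_PER_EXPOSURE = 3.2e9        # 3.2-gigapixel camera
SECONDS_PER_CYCLE = 20             # one exposure every 20 seconds
BYTES_PER_YEAR = 1.28e15           # 1.28 petabytes per year

# ASSUMPTION (not from the text): 2 bytes of raw data per pixel.
BYTES_PER_PIXEL = 2

bytes_per_exposure = PIXELS_PER_EXPOSURE * BYTES_PER_PIXEL
exposures_per_year = BYTES_PER_YEAR / bytes_per_exposure
observing_hours_per_year = exposures_per_year * SECONDS_PER_CYCLE / 3600

print(f"Data per exposure: {bytes_per_exposure / 1e9:.1f} GB")
print(f"Exposures per year: {exposures_per_year:,.0f}")
print(f"Implied observing time: {observing_hours_per_year:,.0f} hours per year")
```

Under that assumption the annual total corresponds to a couple of hundred thousand exposures, each several gigabytes in size, which gives a sense of why manual inspection is out of the question.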

This approach is already bearing fruit.

As I began to write this commentary, astronomers published a paper explaining how they trained an AI (the same sort of algorithm that Google DeepMind used to beat the world Go champion, as discussed in the previous chapter) to search for gravitational lenses. A gravitational lens can be seen when light rays from a distant galaxy are bent by the gravitational influence of an intervening galaxy; instead of observing a small disk we instead see the light of the distant galaxy smeared into arcs. The AI was trained to recognise known gravitational lenses and then asked to find lenses in a much larger data set containing millions of astronomical images. The AI quickly discovered 56 new gravitational lenses. At present, there remains a large element of human intervention in this work. Eventually, though, there’ll be no need for visual inspection by humans. The discovery process will speed up immensely.

The same approach will, I’m sure, be taken in all those other areas of endeavour I mentioned above, all those other areas in which Big Data is being generated. In other words, artificial intelligence will be applied everywhere.

Perhaps this will be the future of artificial intelligence: not androids walking around with us but AIs analysing data to help us make scientific discoveries, guide political decisions, improve human health. Asimov’s robot stories are typically remembered as being about androids—machines in human form. But he also wrote stories about a supercomputer called Multivac. The all-powerful Multivac was essentially a machine that had learned to navigate a useful corner of the Universal Library and excelled at Big Data problems. It acted as humanity’s guide. Perhaps humans and machines will go forward together—with humans asking the questions and machines providing the answers?

I have to end this chapter with mention of Asimov’s personal favourite of his own stories: “The Last Question”. In the story, a technician asks Multivac a question involving the basic laws of physics: can the universal increase in entropy be reversed? Multivac ponders, then replies: “Insufficient data for meaningful answer”. But Multivac doesn’t forget the question, and considers it through the aeons. I won’t spoil the story for you, except to say that Multivac does eventually present an answer.

The notion explored in “The Last Question”—whether it’s possible, even for a super-advanced AI, to circumvent the laws of physics—leads us nicely into Chapter 8. The next story asks: is it possible to travel faster than light?

Notes and Further Reading

  • Caesar’s last breath—Norman Thompson (1987) collected 137 problems asked of Bristol final-year undergraduate students over a 25-year period. The questions in the book reflect a particular local tradition—short, unstructured problems that often require subtle mathematical analysis—so the book won’t be for all tastes. But the questions do force you to think. Incidentally, the question regarding Caesar’s last breath provided the title to a fascinating book about Earth’s atmosphere; see Kean (2017).

  • the “Library of Babel” by Jorge Luis Borges—Lasswitz’s (1904) tale clearly influenced Borges’ own story. Borges read German literature in the original, and in his essay “The total library”, which was published a year before “Library of Babel”, Borges explicitly refers to Lasswitz as being the first exponent of the concept. Borges’ story can be found in the collection Labyrinths (Borges 1962); his essay can be found in the collection The Total Library (Borges 2000).

  • difficult to comprehend the size of the numbers—Although the numbers appearing in this tale seem huge beyond comprehension, mathematicians routinely work with quantities that dwarf the $10^{2,000,000}$ which Lasswitz gives as the number of possible books. In combinatorics, for example, mathematicians often use the uparrow, or ↑, to denote exponentiation. Thus 2↑2 = $2^2$ = 4; 3↑4 = $3^4$ = 81; and so on. A pair of arrows, ↑↑, represents a tower of exponents. Thus 3↑↑3 represents 3 to the power of 3 to the power of 3, or $3^{27}$, which is 7,625,597,484,987. The number 3↑↑4 is thus $3^{7,625,597,484,987}$, which already dwarfs the number Lasswitz calculated. For the sorts of problems in which the uparrow notation is used, even 3↑↑4 would be considered insignificantly small. A discussion of the uparrow notation can be found in Webb (2018).

  • the ancients might have classified as natural science—For more information on ancient libraries, see Nicholls (n.d.).

  • the internet housed 1 billion websites—Netcraft, a UK-based internet services company, provides research data and analysis on many aspects of the internet. It has been surveying the internet since August 1995. The first time the survey exceeded one billion websites was in September 2014 (Netcraft 2014). The site internetlivestats.com (n.d.) contains a live estimate of the number of websites; at the time of writing, the number is approaching two billion.

  • 4.66 billion web pages—This estimate of the number of web pages was made by Van den Bosch et al. (2016).

  • regularly filling the Library of Congress with content—The program SCIgen, created by scientists at MIT, adds a small but enjoyable trickle to this flood of textual information. An editor such as Max Burkel would dread SCIgen: the program generates nonsense that nevertheless possesses a level of superficial plausibility. The program’s authors developed it to autogenerate submissions to conferences that one suspects have low submission standards. The program works. An early paper (SCIgen 2005)—with an abstract consisting of the sentences “Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public/private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable.”—was accepted by a conference. Since then, SCIgen output has fooled many others. Visit https://pdos.csail.mit.edu/archive/scigen/ to generate your own computer science paper, complete with fake graphs!

  • astronomers published a paper—See Petrillo et al. (2017).

  • Asimov’s personal favourite—Several Multivac stories, including “The Last Question”, appear in Robot Dreams (Asimov 1986).
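The uparrow note above can be made concrete with a short sketch. It evaluates the double-arrow tower directly only for tiny inputs, and estimates the size of 3↑↑4 through logarithms, because the number itself could never be written out. The function name is hypothetical.

```python
import math

def double_arrow(base, height):
    """Return base↑↑height, a tower of `height` copies of base.

    Only feasible for very small inputs; do not call it for 3↑↑4.
    """
    result = 1
    for _ in range(height):
        result = base ** result
    return result

tower_3_3 = double_arrow(3, 3)            # 3^(3^3) = 3^27

# 3↑↑4 = 3^(3↑↑3); count its decimal digits without computing it.
digits_3_4 = math.floor(tower_3_3 * math.log10(3)) + 1

# The text's figure for Lasswitz's library, 10^2,000,000, has this many digits.
digits_lasswitz = 2_000_000 + 1

print(f"3↑↑3 = {tower_3_3:,}")
print(f"3↑↑4 has about {digits_3_4:.2e} digits")
print(f"10^2,000,000 has {digits_lasswitz:,} digits")
```

Lasswitz’s number could be printed in a few thick volumes, whereas merely writing down 3↑↑4 would take trillions of digits.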


Copyright information

© 2019 Springer Nature Switzerland AG


Cite this chapter

Webb, S. (2019). Big Data. In: New Light Through Old Windows: Exploring Contemporary Science Through 12 Classic Science Fiction Tales. Science and Fiction. Springer, Cham. https://doi.org/10.1007/978-3-030-03195-4_7
