21st century databases
David Newbury, Museums as 21st Century Databases, Studio – Carnegie Museum, 6 October 2015
The 21st century is now well underway, and many of our institutions are entering their second century. Some of us are approaching our third. What does it take to support a museum conceived in a time before television, operating in a world completely revolutionized by digital technology? How do we use our institutional experience to build infrastructure that thrives in our modern world of connected databases?
It is important to remember that museums have always been databases. From the founding of the first museum, our goal has been to take items that are culturally significant and protect them, catalog them, research them, and love them. We treat our collections not as objects stored on a shelf, but rather as the physical embodiment of a vast repository of data describing our cultures and our histories. In this, museums were ahead of their time. As industry has grown around us, it has begun to realize the value of stored knowledge, and to talk about that knowledge it has borrowed our language and our concepts. Provenance, once a dusty word of art dealers and historians, is now the focus of a W3C working group; developers store the history of their code in repositories.
It is important, however, to remember that museums have a fundamentally different view of their data than industry. Industry views data as a means to an end. Companies store sale histories to grease the wheels of future purchases; they store search histories to improve future searches. In industry, data’s raison d’être is to make money, and because of this, vast sums of money have been spent building tools to manipulate data. We benefit from that work, since we could never afford to construct such elegant tools ourselves, but we should never forget that the tools we use were designed to solve a problem other than ours.
Our ultimate goal is “… the increase and diffusion of knowledge.”1 In pursuit of that goal, our data is not stored for monetization, but because data is the root of knowledge. The knowledge of our cultures and our sciences is intrinsically valuable. Data becomes knowledge when it is connected: connected to people, to events, to the stories of our humanity. Museums are not merely vaults; we store objects and their data in trust for the public. This does not mean that we shouldn’t care about preservation: quite the opposite! But we should remember that preservation is not the goal, and that we preserve in service of our mission to put knowledge in the hands of those we serve. We educate, we preserve, and we provide access to objects that are not ours. We are merely the caretakers.
This wealth of information, combined with tools from industry to store and analyze it, provides us with an unprecedented opportunity to connect and disseminate knowledge. But we cannot do it alone. Even at the largest museums, the firehose of information overwhelms. We must come together to share knowledge and expertise. Together, we can focus on making explicit the connections between the data, not just the facts themselves. There are innumerable stories in those connections, and the connections are stronger, the stories richer and more varied, when we all contribute to the pool of data.
Given all that, what are the characteristics of a 21st Century digital infrastructure?
It is available. Too often our information is locked within systems that are individually useful, but difficult to access. We should provide secure, consistent access to everyone who needs the information, both within our individual museums and, when appropriate, across the world.
It is consistent. Without a universal way to access and understand this information, every system and every institution’s data remains intelligible only to itself. We should provide aggregated information, while respecting the nuances and expertise of each institution and department.
It is connected. Too often, the rich connections between our institutions’ data are hidden, even to the expert. We should make explicit these connections, allowing the richness of our history and our staff’s expertise to inform across domains and institutions.
It is global. As the world becomes smaller and as technologies make information more readily available, museums need to join together to fulfill our collective mission of holding our culture’s artifacts and data in public trust. We should share our expertise with the world and we should also leverage the strengths of other authorities to enrich our knowledge and experiences.
Much of this was true in the 20th century, and the 19th, and the 18th. Why do these conversations seem to be happening now, and not 50 years ago, or 50 years in the future? One answer is the inconceivable amount of information available. More information is created and stored every day than existed a century ago. Almost two billion photos are taken daily, dwarfing even the most prolific museum’s collecting. One of those billions of photos will be a masterpiece, but how will we find it?
The story of the 20th century was that of data collection; the story of the 21st is going to be that of filtering the data. Storage and preservation are still great challenges for our sector, but they are quickly being eclipsed by the problems of access, discovery, and filtering. From industry, we can borrow tools: machine learning, semantic technologies, graph theory, linked data, and distributed systems. However, we must work to understand not only the capabilities these tools provide but also the nuances of the problems we want them to solve. The tools can help make the connections, but the stories must come from ourselves.
1 James Smithson, 1829, Smithsonian Bequest