US scientists archive data
Dana Varinsky, Scientists are archiving government data to protect it from the Trump administration, Business Insider, 19 January 2017
Bergman, a PhD student in applied physics at Harvard, knows firsthand how much research is conducted by government scientists at agencies like the Environmental Protection Agency and the National Oceanic and Atmospheric Administration (NOAA). And he understands how heavily scientists around the world depend on it.
As of now, many government-produced reports about climate change, greenhouse gases, ocean temperatures and more are available to the public online. But Bergman realised that if Trump’s administration does indeed slash research budgets as promised, or even if a new agency head doesn’t want data on a topic like climate science to be as easily accessible, scientists might soon have a much harder time getting the data they need.
“Explicit threats in our mind have been made to research programs, regulatory programs, and data collection programs that the scientific community — through many conversations I’ve had — has said are crucial to the work that we do,” Bergman tells Business Insider. “For example, there’s a number of climate models that run on NOAA’s computers — really complex, fluid, dynamical models that essentially are only running on government computers.”
In November, Bergman got together with some friends and other PhD students at Harvard and started brainstorming how to preserve the data that scientists think is valuable. They soon found other scientists and policymakers with the same concerns, and joined a group called the Environmental Data and Governance Initiative (EDGI), which formed after the election and is now working to archive publicly available information and monitor changes in government websites.
The group has approximately 50 members and is working rapidly to download and store the government’s scientific data. The members are interviewing scientists, policymakers, and current and former agency employees to determine the top-priority websites and data to protect for the scientific community.
They have proprietary software that crawls government websites and downloads everything on them: what the sites say and how they look, plus the PDFs, links, and reports they host. For things the crawler can’t catch, such as online databases or interactive platforms, the group is assembling volunteers and paid coders to capture and archive the data.
For reports and information that aren’t available online, EDGI and the Sierra Club are both working to submit FOIA requests. According to a Bloomberg report, the requests could ensure that agencies don’t get rid of files, since they aren’t permitted to destroy files that are pending public release.
It’s a massive project — one that won’t be complete by Friday, when Trump is inaugurated. But the group doesn’t expect data to disappear immediately. Plus, Bergman explains that EDGI’s long-term goal goes beyond fears about Trump’s agenda. The group plans to eventually create an open, independent server that stores the government’s scientific data and allows anyone to search through it and find what they need, regardless of the administration in power. (They’re currently raising funds to further that goal.)
“Our work is motivated by making sure that scientific works that we know are being done or have been done, and data out there that is so important, are preserved in a way that can be useful for ongoing research and ongoing regulatory practices,” Bergman says. “We believe that this stuff should be stored on servers throughout the country that are public, so that nothing can ever be removed from the internet.”
Bergman says EDGI doesn’t yet have reason to believe that Trump’s administration will intentionally burn or delete data on, say, climate change or pollution. Instead, he says, scientific reports might simply become harder to find because an agency lacks funding to keep servers running. Or if projects get combined or moved, data sets might slip through the cracks in that shift. And eventually, if that research ceases to be listed online, future researchers might not even know it exists.
“When you’re defunding something, sometimes you intend to reduce capacity — you believe that the expense is not worthwhile and you genuinely don’t want a supercomputer on as much because it costs money. In that way, even though it doesn’t have a political impact, it does have the intention of reducing the amount of scientific research that can be done,” Bergman says.