Te Papa botanical specimen data now on GBIF

Lucy Schrader, Te Papa’s botanical specimen data now on GBIF, Te Papa Blog, 19 February 2024

Where can you find harakeke outside Aotearoa New Zealand? What species of forget-me-not live on Banks Peninsula?* Answering these questions is now a lot easier because our herbarium’s botanical specimen data – 250,000 records about plants, with information about what they are, where they’re from and lots more – have been released on the world’s biggest database of living things, GBIF. Kaitūhono Hora Raraunga | Digital Channels Outreach Manager Lucy Schrader tells you about making it happen.

What’s GBIF?

The Global Biodiversity Information Facility (GBIF) is the go-to place for anyone who wants to know what’s living where: plants, animals, fungi, and whatever slime moulds are. It’s a valuable resource for anyone interested in exploring and safeguarding the wonders of our natural world.

It’s based on the data collection already done by scientists, research organisations, and citizen science platforms like iNaturalist, who record where and when specimens are found, their characteristics, and what species they are.

Instead of having to ask every single source for their data before you can do some science, you can find it all shared on GBIF, a centralised repository that saves time and effort while giving you a more complete picture of what’s out there.

Data from around the globe is also available to anyone – access is open and free for scientists, policymakers, and the general public. If you’re interested in the diversity of life on our planet, you can use it.

What might you use it for?

Scientific Research: Researchers use GBIF data to study and understand the distribution, behaviour, and characteristics of various species. This helps in scientific discoveries, biodiversity conservation, and ecological studies.
Conservation: Conservationists and policymakers use GBIF to make informed decisions about protecting endangered species and their habitats. It provides essential information for creating effective conservation strategies.
Education: Students, educators, and the general public can access GBIF to learn more about the incredible diversity of life. It supports educational programs and raises awareness about the importance of biodiversity.
Climate Change Studies: Understanding the distribution of species over time is crucial for studying the impact of climate change on different ecosystems. GBIF data can help track changes in species distribution and abundance.

GBIF is built to get this information to all these people in a FAIR way – meaning Findable, Accessible, Interoperable, and Reusable. It standardises what can be massively diverse datasets and practices so you can search everything at once, grab what you need, and ask useful questions that span time and space.

Te Papa’s natural history collections

A screenshot of an entry on GBIF showing taxonomic, location, dataset, reference, and image data, as well as a section of a map with a blue pin showing the location of specimen collection. — Thelymitra cyanea (Lindl.) Benth. (Blue sun-orchid) from our collections on GBIF. Click through and scroll down to see all the images and structured data we’ve shared.

Our natural history collections are significant, both because they include many important specimens (like types, the specimens linked to the scientific name of a species), and due to the long history they cover.

Our botany collection (called WELT by people in the know – short for WELLINGTON) includes plants collected by Banks and Solander on Captain Cook’s voyages around Aotearoa, and represents how our natural world has changed since. With specimens from Aotearoa, the Pacific, and the wider world, our data are an important piece of the global and historical picture.

By adding these records to GBIF and making them FAIRer it’s now much easier to incorporate that piece. Researchers can combine our data with those from places like Auckland Museum, Manaaki Whenua’s Allan Herbarium and Museums Victoria in Australia and learn more than any one source could tell them.

What we’re sharing

Our dataset is what’s called occurrence data – the record of a particular specimen existing at a defined place and time.

We provide an identification (which species it is), and information about when and where it was collected. Go to one of our records and scroll down to see what else we include.

When available, we also share images. For the botany collection these are often herbarium sheets but more recent collections often have photos from the field too. These are great for showing the living plant and the environment it lived in.

All the data and images are shared openly with a Creative Commons Attribution license, because we want them to be used.

Some of the more than 90,000 images that have been published along with our botanical records.

What it’s useful for

It’s hard to sum up just how useful getting data into GBIF is, because it’s helpful in such a massive range of ways. Here are just a few examples.

Search and mapping let you do stuff like finding all the specimens within a rohe or on an island. It’s also a great tool when you just need to look up whether a particular critter is likely to exist in your area.
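If you’d rather script that kind of search than click through the map, here is a minimal sketch of building a query for GBIF’s public occurrence search API. It only constructs the URL and doesn’t fetch anything. The `scientificName`, `geometry` and `limit` parameters are part of that API, but the helper name and the rectangle (a rough, made-up box around Banks Peninsula) are assumptions for illustration, not anything Te Papa publishes.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"

# Approximate, illustrative box around Banks Peninsula (not an official boundary).
# WKT uses longitude-latitude order, listed anticlockwise and closed on the start point.
BANKS_PENINSULA_BOX = (
    "POLYGON((172.6 -43.95, 173.2 -43.95, 173.2 -43.6, 172.6 -43.6, 172.6 -43.95))"
)


def build_search_url(scientific_name, wkt_polygon=None, limit=20):
    """Build an occurrence search URL for a name, optionally limited to a polygon."""
    params = {"scientificName": scientific_name, "limit": limit}
    if wkt_polygon is not None:
        params["geometry"] = wkt_polygon
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


# Forget-me-nots (genus Myosotis) inside the illustrative box.
url = build_search_url("Myosotis", BANKS_PENINSULA_BOX)
print(url)
```

Paste the printed URL into a browser, or pass it to an HTTP client, to get the matching records back as JSON.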

As mentioned, more complete information supports better research, in areas like taxonomy (how many species are there, and how are they distinguished?) and ecology (when do different plants flower, and is this changing? What species grow together?).

GBIF data is used by Bionomia, which links up data about specimens and the people who collected them. Not only does this help assign credit where it’s due and raise the profile of working scientists, it has also supported research revealing the contributions of women collectors.

Keeping data useful – updates and improvements

A section of a map with a blue outline around a location and yellow dots in different places within the outline. — GBIF’s mapping tools make it easy to see what comes from specific places, like the nearly 700 plants in Te Papa that come from Whakatōhea’s rohe. This map uses boundaries from native-land.ca.

As you can see throughout this blog, we’re always collecting new specimens. We’re also identifying them, taking images of them, doing research on them – plus adding more information about the older collections.

If our data on GBIF doesn’t keep up with these changes, it’ll become less complete and possibly even misleading – it’ll be less useful. So every two weeks our data will get refreshed with all the latest additions, updates, and new images.

We’re also going to keep working on improving what we share. The current dataset doesn’t have all the information from our collection management system, so we’re looking at the possible cleanup, standardisation, and processing we could do to get it out there as well.

Next year we’ll start releasing our zoological datasets the same way – look forward to 250 years of birds, insects, molluscs and more!

How it happens – the tech stuff

As mentioned, our specimen records are held in our collection management system, EMu. They’re made available online through our API, which is partially compliant with Darwin Core, the data standard used by GBIF.

Darwin Core provides a breakdown of how to structure information, and what labels to use for each part. For example, to let people know who collected a plant you use the recordedBy field. If everyone does this the same way, people or applications reading the data always know where to look for names of collectors.
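To make the idea of shared labels concrete, here is a small sketch of renaming a record’s fields to Darwin Core terms. The right-hand names (`recordedBy`, `scientificName`, `eventDate`, `catalogNumber`) are real Darwin Core terms. The left-hand names and the sample record are invented for illustration and aren’t the actual field names used in EMu or in our API.

```python
# Hypothetical internal field names mapped to real Darwin Core terms.
FIELD_MAP = {
    "collector": "recordedBy",
    "species": "scientificName",
    "date_collected": "eventDate",
    "registration_number": "catalogNumber",
}


def to_darwin_core(record, field_map=FIELD_MAP):
    """Return a copy of the record keyed by Darwin Core terms.

    Fields missing from the source record are skipped, and fields with
    no mapping are left out of the result.
    """
    return {dwc: record[src] for src, dwc in field_map.items() if src in record}


# Invented sample record: placeholder values, not a real specimen.
internal_record = {
    "collector": "A. Collector",
    "species": "Myosotis umbrosa",
    "date_collected": "2010-01-15",
    "storage_location": "not shared",  # no mapping, so it is dropped
}

dwc_record = to_darwin_core(internal_record)
print(dwc_record["recordedBy"])
```

Anything reading `dwc_record` can now look for collectors under `recordedBy` without knowing how the source system labelled them.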

A black screen with green text written as code on numbered lines. — Some of the event metadata for a specimen of Myosotis umbrosa using Darwin Core fields.

To export the data and do the processing that makes it fully compliant, we use Blastochor, a Python application that interacts with the API. Its GitHub repository includes the config and mapping files for the GBIF export, if you’re curious.

What comes out is a set of spreadsheets, where each column is a Darwin Core field. We currently use an occurrence core file, with two extension files:

Occurrence core – everything we have about a specimen occurring at a place and time
Identification extension – the full identification history of each specimen
Media extension – metadata and links for images we’re sharing
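As a rough sketch of how a core file and an extension file fit together, the example below writes one occurrence row and two identification rows that point back to it through a shared ID. This is not Blastochor’s code: the helper, the column selection and the sample values are assumptions for illustration, and which column acts as the link is something the archive’s own descriptor declares rather than something fixed by the column name.

```python
import csv
import io


def write_table(rows, fieldnames):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One core row per specimen (invented placeholder values).
occurrences = [
    {"occurrenceID": "example-1", "scientificName": "Myosotis umbrosa",
     "recordedBy": "A. Collector"},
]

# Many extension rows can share one occurrenceID: here, an identification history.
identifications = [
    {"occurrenceID": "example-1", "scientificName": "Myosotis sp.",
     "dateIdentified": "2010-01-20"},
    {"occurrenceID": "example-1", "scientificName": "Myosotis umbrosa",
     "dateIdentified": "2018-06-01"},
]

occurrence_csv = write_table(
    occurrences, ["occurrenceID", "scientificName", "recordedBy"])
identification_csv = write_table(
    identifications, ["occurrenceID", "scientificName", "dateIdentified"])
print(identification_csv)
```

The media file would follow the same pattern: one row per image, each carrying the ID of the specimen it belongs to.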

We zip up the exported spreadsheets and load them into our Integrated Publishing Toolkit (IPT), where we also add all the useful metadata that describes the dataset. When we’re ready, we hit the button and the IPT talks to GBIF, publishing the records for all to use.
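The zipping step is the simplest part, and a sketch with Python’s standard `zipfile` module might look like this. The file names and contents are placeholders rather than our real export, and the sketch stops at the zip itself: describing the dataset and publishing it happen in the IPT, as above.

```python
import io
import zipfile


def zip_tables(tables):
    """Bundle named text files into a single zip archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for filename, text in tables.items():
            archive.writestr(filename, text)
    return buffer.getvalue()


# Placeholder file names and header-only contents, for illustration.
tables = {
    "occurrence.csv": "occurrenceID,scientificName\n",
    "identification.csv": "occurrenceID,dateIdentified\n",
    "media.csv": "occurrenceID,identifier\n",
}

archive_bytes = zip_tables(tables)

# Reopen the archive to confirm what went in.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = sorted(archive.namelist())
print(names)
```

Building the archive in memory keeps the sketch self-contained; writing `archive_bytes` to a file gives you something you could upload.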

Drop me a line on [email protected] if you have feedback about the data or our tools.