Can AI appraise Te Papa’s public records?

Gareth Watkins & Jennifer Twist, Can AI appraise Te Papa’s public records?, Te Papa, 3 February 2026.

Could Artificial Intelligence really appraise Te Papa’s public records? Collections Data Manager Gareth Watkins and Archivist Jennifer Twist tested it on thousands of records and got results that were faster, less resource-intensive, and more consistent than expected – until they weren’t. This post unpacks the experiment, the limits we encountered, and why curatorial context still matters.

At first glance, it seemed simple: teach AI to help appraise 32,000 public records. The idea was appealing – AI processing thousands of records at speed while remaining consistent across a highly repetitive task. But as we quickly discovered, that is easier said than done.

Some of Te Papa’s organisational records, colloquially known as “the Museum Archive”, are a set of mainly paper-based records with the special legal status of public records. Created by museum staff from 1865 onwards, they document over 150 years of the museum’s operations. From memoranda, photographs, and exhibition files to sketches and original research, each class of records must be appraised against approved disposal authorities. This appraisal process identifies which classes of records are of permanent value, either for business or archival purposes. Records with archival value must eventually be transferred to Te Rua Mahara o te Kāwanatanga Archives New Zealand.

This mahi is part of a wider programme of work required under the Public Records Act and the mandatory standard. As a public sector organisation, Te Papa must implement disposal across our information holdings, both current and inactive, in a systematic and consistent way. The outcome of disposal implementation can be long-term retention, destruction, or transfer to Archives NZ.

A black and white photo of five dogs tied up to a wooden fence. One is lying down and the others are sitting up.
Detail of Sled dogs in quarantine on Quail Island, 1907–1909, Wellington, by James McDonald. Te Papa (MU000523/002/0254)

Each series of records must fall within a class of records recorded in the Te Papa Disposal Authority (approved by Archives NZ in March 2025) or the broader General Disposal Authorities issued by Archives NZ. These authorities, more commonly called “retention and disposal schedules”, are legal instruments written in formal, policy-style language.

By contrast, the Museum Archive has been registered in EMu, Te Papa’s collection management system, with numerous staff having input in the registration and management process over the years, using different vocabularies, formats, and conventions.

So, our first challenge was to see whether we could bridge the language gap between the disposal authorities and the descriptive records.

Smart word matching without AI

Our early attempts did not use AI at all, as we wanted to see whether the task could be done purely through traditional programming. Gareth wrote a Python script to perform smart word matching between the EMu records and the schedules.

However, the results were disappointing. There simply was not enough shared vocabulary between the two sources. Even when the same words did appear, their meanings were sometimes completely different. For example, “donation” could refer to a financial contribution in one context, or a collection acquisition in another.
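The kind of overlap scoring our script attempted can be sketched in a few lines. This is an illustrative reconstruction, not the actual script: the class descriptions and field wording below are invented, and the similarity measure (Jaccard overlap of word tokens) is one plausible choice among several.

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))

def best_class_match(record_text, classes):
    """Score each disposal class by shared vocabulary
    (Jaccard similarity) and return the top (score, class_id)."""
    record_tokens = tokens(record_text)
    scored = []
    for class_id, description in classes.items():
        class_tokens = tokens(description)
        overlap = record_tokens & class_tokens
        union = record_tokens | class_tokens
        score = len(overlap) / len(union) if union else 0.0
        scored.append((score, class_id))
    return max(scored)

# Illustrative classes -- not the real schedule wording.
classes = {
    "7.4": "records of original research undertaken by museum staff",
    "8.1": "records documenting donations and financial contributions",
}
score, class_id = best_class_match("Research notes on fish specimens", classes)
```

Even when the “right” class wins, the score stays tiny – here only the single word “research” is shared – which is exactly the vocabulary gap that sank this approach.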

We quickly concluded that this was not a task about finding matching words, but about finding meaning. What we needed was a tool that could understand context. This is where artificial intelligence entered the picture.

A watercolour of a fish with red fins and a long head fin that fans out.
Oar fish, circa 1880, maker unknown. Te Papa (MU000424/002/0004)

Governance and cultural considerations

Helpfully, Te Papa had already implemented an AI governance policy, which allowed us to clearly and confidently experiment with AI in a safe way. The policy provides a framework for documentation, risk mitigation, and alignment with key principles such as transparency, human oversight, environmental sustainability, and Māori rights and interests.

With the significant amount of Māori-related information held within the Museum Archive, we met with Dr Amber Aranui, Curator Mātauranga Māori, who reviewed the proposal and discussed potential cultural and privacy considerations. We were able to exclude fields such as provenance notes and associated people and places, focusing only on the core metadata required for appraisal.

Enter AI (and our first surprise)

We then began our AI experiment using OpenAI’s Assistants platform. This enabled us to load the schedules as knowledge files and exchange multiple messages with the model – effectively running an automated “chat” for each record. No record data was retained by the model or used for OpenAI training.

In summary, we told the AI model: “You are a strict appraiser of public records in the museum’s care. Match the user input (our database record) to the most appropriate class of records in your knowledge and return the class, confidence level, and reasoning for your decision”.
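In code terms, each appraisal ran as a one-shot chat. The sketch below shows how such a request might be assembled; the instruction wording is paraphrased from above, and the generic system/user message format and the `build_appraisal_messages` helper are our illustration, not Te Papa’s actual script.

```python
SYSTEM_INSTRUCTIONS = (
    "You are a strict appraiser of public records in the museum's care. "
    "Match the user input to the most appropriate class of records in "
    "your knowledge and return the class, confidence level, and "
    "reasoning for your decision."
)

def build_appraisal_messages(record):
    """Turn one EMu record (a dict of field name -> value) into the
    system/user message pair sent to the model for appraisal."""
    record_text = "\n".join(
        f"{field}: {value}" for field, value in record.items() if value
    )
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},
        {"role": "user", "content": record_text},
    ]

messages = build_appraisal_messages(
    {"Title": "Outward letter: Purchase of specimens", "Date": "1865"}
)
```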

Sadly, the first results were poor. The AI unexpectedly, yet confidently, returned classes that did not exist, and sometimes mashed up class definitions.

We concluded that the model didn’t truly understand the hierarchical nature of the schedules we submitted as structured JSON. So, we changed our approach and provided a simplified, flat CSV representation of the classes. This dramatically improved the AI’s ability to maintain the relationships between class definitions.
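Flattening a nested schedule means carrying each class’s parent headings down into its own row, so the hierarchy survives as plain text the model can read. A minimal sketch, with an invented schedule structure (the real JSON layout and field names may differ):

```python
import csv
import io

def flatten_classes(node, path=()):
    """Walk a nested schedule and yield one flat row per class,
    carrying the full parent path so hierarchy survives flattening."""
    for child in node.get("classes", []):
        here = path + (child["title"],)
        if "classes" in child:           # a grouping level -- recurse
            yield from flatten_classes(child, here)
        else:                            # a leaf class -- emit a row
            yield {"class_id": child["id"],
                   "hierarchy": " > ".join(here),
                   "description": child.get("description", "")}

# Illustrative schedule fragment, not the real disposal authority.
schedule = {"classes": [
    {"title": "Research", "classes": [
        {"id": "7.4", "title": "Staff research",
         "description": "Original research by museum staff."},
    ]},
]}

rows = list(flatten_classes(schedule))
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["class_id", "hierarchy", "description"])
writer.writeheader()
writer.writerows(rows)
```

Each CSV row is self-contained, so the model no longer has to infer parent–child relationships from nesting it could not reliably follow.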

A scan of a handwritten letter on lined paper.
Outward letter: Colonial Secretary : Page 115: Purchase of specimens, 25 October 1865, by Sir James Hector. Te Papa (MU000013/001/0001/0092)

The four levers that improved performance

Through iterative testing, we identified four key levers that shaped the AI model’s performance.

First, clear system instructions were essential, as they set the rules and expectations for how the model should think and respond. Second, providing hints to the AI proved valuable; subtle nudges in the supplied text helped guide it toward the most appropriate appraisal decision. Third, setting boundaries by limiting the data the AI could see ensured that record-level detail was not overshadowed by generic, high-level information. Finally, validation checks were added to the Python script to identify obvious errors and automatically prompt the AI to try again.
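The second lever – automated keyword hints – can be sketched as a simple scan over the record text. The keyword-to-hint pairs below are illustrative stand-ins, not the production mapping:

```python
HINT_RULES = {
    # keyword found in the record text -> hint prepended for the model
    # (illustrative pairs only)
    "exhibition": "Hint: likely relates to exhibition development records.",
    "research": "Hint: likely relates to staff research records.",
    "donation": "Hint: check whether this is a financial or a collection donation.",
}

def hints_for(record_text):
    """Return the hints triggered by keywords found in the record text."""
    lowered = record_text.lower()
    return [hint for keyword, hint in HINT_RULES.items() if keyword in lowered]

def with_hints(record_text):
    """Prefix the record text with any generated hints."""
    return "\n".join(hints_for(record_text) + [record_text])
```

The hint never decides the class itself; it just narrows the model’s attention before the record text follows.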

The workflow

Using these levers, we developed a more structured workflow. Each record exported from EMu was converted into a single text block containing the title, description, and other relevant fields. Our Python script prefixed this with a hint generated through automated keyword scanning. When results were returned from the AI model, the script checked whether the class of records existed in the schedules and, if not, prompted the AI to retry.
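The validation-and-retry step of that workflow can be sketched as a small loop. Here `ask_model` is a stand-in for the real API call, and the retry wording and `max_retries` limit are our assumptions:

```python
def appraise_with_validation(record_text, ask_model, valid_classes, max_retries=2):
    """Send a record to the model and re-prompt if the returned class
    does not exist in the schedule. `ask_model` stands in for the real
    API call and returns a dict with a 'class' key."""
    prompt = record_text
    for _ in range(max_retries + 1):
        result = ask_model(prompt)
        if result.get("class") in valid_classes:
            return result
        prompt = (record_text +
                  f"\nYour previous answer '{result.get('class')}' is not "
                  "a valid class in the schedules. Try again.")
    return None  # give up and flag the record for human review

# A stub model that hallucinates a class once, then corrects itself.
answers = iter([{"class": "DA470 9.9"}, {"class": "DA470 7.4"}])
result = appraise_with_validation(
    "Research notes", lambda prompt: next(answers), {"DA470 7.4"}
)
```

Returning `None` after the retries are exhausted keeps hallucinated classes out of the results without silently accepting them.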

These levers raised the success rate in our tests to around 75%. From a resourcing perspective, the approach looked very promising, with 1,000 records appraised under test conditions in 3.5 hours (around 13 seconds per record) at a total cost of approximately NZD$12.
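Those test figures, and a naive linear extrapolation to the full 32,000 records (an assumption – real throughput and pricing could differ at scale), work out as follows:

```python
records_tested = 1_000
test_hours = 3.5
test_cost_nzd = 12.0

seconds_per_record = test_hours * 3600 / records_tested        # 12.6 s/record
full_archive = 32_000
full_run_hours = full_archive * seconds_per_record / 3600      # ~112 hours
full_run_cost = full_archive * test_cost_nzd / records_tested  # ~NZD$384
```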

What we learnt

However, there was one key learning that stopped us from proceeding with AI in this instance. The AI model did not understand the wider context or cultural ecosystem in which Te Papa operates. For example, if the AI had selected the class DA470 7.4 Research, Te Papa would retain the record, whereas if it had selected class GDA 6, 8.1.8 Records of Agency History/Social Development, the record would be transferred to Archives NZ. The record itself might look similar in either case – but the downstream consequences are completely different. A human curatorial eye was essential to consider the implications of applying a specific class of record, and how that appraisal could affect our relationships with other organisations and researchers.

We knew that appraisal would have a direct impact on how curators, external researchers, and other cultural institutions access, use, and interpret our public records. Assessing those consequences required contextual judgement and was not something that could be safely delegated to AI alone.

So, while we did not proceed with using AI at this time, the work taught us some valuable lessons. Language matters: the AI only knows what we teach it, and small changes in wording can dramatically alter meaning. Structure matters: clean, consistent data enables the AI to form more reliable connections. Human context matters: AI can assist and recommend, but judgement and responsibility must remain with people.

A black and white photo of a waterside quay with buildings in the background.
Customhouse Quay, Wellington, 1902, Wellington, by James McDonald. Te Papa (MU000523/002/0158)

Further reading

Read more: Annual Reports and key documents