Project

ArchiveGPT: Psychological and technical perspectives on the use of multimodal large language models in archives

Workgroup: Perception and Action
Duration: 09/2024 – open
Funding: IWM budget resources
Project description

Multimodal large language models (LLMs) generate text based on image inputs. This makes them attractive for a wide range of applications in which large amounts of image data need to be processed. One such application is the cataloguing of archival images. ArchiveGPT therefore focuses on applying a multimodal LLM to archaeological photo material provided by the Leibniz-Zentrum für Archäologie (LEIZA) in Mainz.


We investigate the following questions: How does a multimodal LLM perform when confronted with archaeological objects and terms that are often unfamiliar to the model? How do archive experts (compared to non-experts) perceive the quality of the model’s image descriptions? Can they tell these AI-generated descriptions apart from descriptions written by archive experts at all? How well can they estimate their ability to distinguish the two beforehand? And what role does trust in AI play in its use?
For the first study on these questions, we created the experimental material in close collaboration with LEIZA. Based on photocards from the image archive, the multimodal LLM and LEIZA archivists each produced, for every photocard, a metadata template usable in an archival cataloguing process.
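As a rough illustration of what such a step can look like in practice, the sketch below shows how a vision-capable model can be prompted to draft a metadata template for a single photocard image. It is not the pipeline actually used in the project: the OpenAI client, the model name, the template fields, and the file name are all assumptions made for the example.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes an API key is available in the environment

def describe_photocard(image_path: str) -> str:
    """Ask a multimodal model for a draft metadata template of one photocard (illustrative only)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model; chosen here as an assumption
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Describe this archaeological photocard and fill in a cataloguing "
                            "template: object type, material, period, find location (if visible), "
                            "and a short free-text description."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

# Hypothetical file name used only for the example
print(describe_photocard("photocard_0001.jpg"))
```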

Cooperation partners
  • Mag. Dominik Kimmer, Leibniz-Zentrum für Archäologie (LEIZA), Mainz