Project

ArchiveGPT: Psychological and technical perspectives on the use of multimodal large language models in archives

Workgroup: Perception and Action
Duration: 09/2024 – open
Funding: IWM budget resources
Project description

Multimodal large language models (LLMs) generate text based on image inputs. This makes them attractive for a wide range of applications in which large amounts of image data need to be processed. One such application is the cataloguing of archival images. ArchiveGPT therefore focuses on applying a multimodal LLM to archaeological photo material provided by the Leibniz-Zentrum für Archäologie (LEIZA) in Mainz.


We investigate the following questions: How does a multimodal LLM perform when confronted with archaeological objects and terms that are often unfamiliar to the model? How do archive experts (compared to non-experts) perceive the quality of the model’s image descriptions? Can they tell these AI-generated descriptions apart from descriptions written by archive experts at all? How well can they estimate their ability to distinguish the two beforehand? And what role does trust in AI play in its use?
For the first study on these questions, we created the experimental material in close collaboration with LEIZA. Based on photocards from the image archive, the multimodal LLM and LEIZA archivists each produced, for every photocard, a metadata template usable in an archival cataloguing process.
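As a rough illustration of what such a step can look like in practice, the sketch below shows how a vision-capable model can be prompted to draft a metadata template for a single photocard image. It is not the pipeline actually used in the project: the OpenAI client, the model name, the template fields, and the file name are all assumptions made for the example.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes an API key is available in the environment

def describe_photocard(image_path: str) -> str:
    """Ask a multimodal model for a draft metadata template of one photocard (illustrative only)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model; chosen here as an assumption
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Describe this archaeological photocard and fill in a cataloguing "
                            "template: object type, material, period, find location (if visible), "
                            "and a short free-text description."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

# Hypothetical file name used only for the example
print(describe_photocard("photocard_0001.jpg"))
```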

Cooperation partners
  • Mag. Dominik Kimmer, Leibniz-Zentrum für Archäologie (LEIZA), Mainz