The Daily Zurich

Zurich news, every day

Zurich's Digital Archives Race to Fix a Duplicate Image Crisis — Here's Where Things Stand This Week

A technical flaw affecting thousands of digitised records has pushed ETH Zurich's library and the city's municipal archive to accelerate a joint remediation effort that began in earnest this spring.

By Zurich News Desk · Published 4 July 2026, 9:16 pm

Thousands of digitised photographs and historical documents stored across Zurich's major public archives contain duplicate image entries — in some cases the same scan indexed under multiple file identifiers — and institutions are now scrambling to clean up the problem before it compounds further. This week, the remediation project entered a new phase, with ETH-Bibliothek and the Stadtarchiv Zürich both confirming they have begun deploying automated deduplication pipelines across their shared metadata repositories.

The issue matters now because both institutions are in the middle of a broader push to migrate legacy collections onto the Swiss national aggregator platform Helveticat and its successor infrastructure. Every duplicate record that travels into the new system creates downstream errors — broken permalinks, double-counted holdings statistics, and, most critically, misdirected user queries. Researchers at ETH Zurich's main campus on Rämistrasse who rely on those archives for primary-source work have been flagging the problem for the better part of eighteen months.

What Triggered the Urgency This Week

The immediate trigger was a routine data-quality audit completed at the end of June. The audit, conducted internally by ETH-Bibliothek's digital preservation team, found that roughly 4,200 image objects within a single historical photography collection — acquired in batches between 2019 and 2024 — carried duplicate persistent identifiers. That figure does not include the Stadtarchiv Zürich's own holdings on Neumarkt, where a parallel audit is still under way. The two institutions share an interoperability agreement signed in 2022, which means a contaminated record in one system can propagate to the other during routine synchronisation cycles that run every 72 hours.
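As a rough illustration of the kind of check such an audit performs, the sketch below groups records by persistent identifier and reports any identifier attached to more than one image object. The record layout and the field names `pid` and `object_id` are assumptions made for this example, not ETH-Bibliothek's actual schema.

```python
from collections import defaultdict


def find_duplicate_pids(records):
    """Return {pid: [object_id, ...]} for every pid shared by more than one object."""
    by_pid = defaultdict(list)
    for record in records:
        by_pid[record["pid"]].append(record["object_id"])
    # Keep only identifiers that appear on two or more objects.
    return {pid: ids for pid, ids in by_pid.items() if len(ids) > 1}


# Hypothetical sample records; a real audit would read a repository export.
sample = [
    {"object_id": "img-001", "pid": "pid-A"},
    {"object_id": "img-002", "pid": "pid-B"},
    {"object_id": "img-003", "pid": "pid-A"},
]

duplicates = find_duplicate_pids(sample)
print(duplicates)
```

Running a check like this before each synchronisation cycle would, in principle, stop a contaminated record from crossing between the two systems.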

The Stadtarchiv, which holds civil registration documents, city council minutes, and photographic collections stretching back to the mid-nineteenth century, paused its synchronisation feed to ETH-Bibliothek on 30 June as a precautionary step. That pause is expected to remain in place until at least 11 July, according to a technical notice posted to the archive's public portal this week.

Staff at both institutions are working with a Python-based deduplication tool developed initially for the Zentralbibliothek Zürich on Zähringerplatz, which ran a smaller but structurally similar cleanup operation on its newspaper microfilm catalogue in late 2024. That earlier project took eleven weeks to resolve approximately 900 conflicting records, giving archivists a benchmark — and a warning — about how labour-intensive the current, larger effort is likely to be.
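The internals of that tool have not been published. One common approach to this kind of cleanup, shown here only as a minimal sketch, is to hash the bytes of each scan and group the file identifiers that point at identical content. The function names and sample identifiers below are hypothetical, and the method catches only byte-identical files; a scan that has been re-encoded or resized would need a different technique.

```python
import hashlib
from collections import defaultdict


def group_by_content(files):
    """Map SHA-256 digest -> sorted identifiers whose bytes are identical."""
    groups = defaultdict(list)
    for identifier, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(identifier)
    # Only groups with two or more identifiers are duplicates.
    return {d: sorted(ids) for d, ids in groups.items() if len(ids) > 1}


def plan_merges(groups):
    """Keep the first identifier in each group; list the rest for redirection."""
    return {ids[0]: ids[1:] for ids in groups.values()}


# Hypothetical in-memory stand-ins for scanned image files.
files = {
    "scan-0001": b"bytes-A",
    "scan-0002": b"bytes-B",
    "scan-0417": b"bytes-A",
}

merge_plan = plan_merges(group_by_content(files))
print(merge_plan)
```

Producing a merge plan rather than deleting records outright leaves room for an archivist to review each group, which is consistent with how labour-intensive the earlier cleanup proved to be.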

What Comes Next for Researchers and the Public

For anyone using the online portals right now, the practical advice is straightforward: if a permalink to an archival image returns a 404 error or redirects unexpectedly, the record is likely among those under active remediation. Both institutions are asking users to report broken links through their standard helpdesk channels rather than assuming material has been withdrawn or destroyed.

The longer-term fix involves adopting the IIIF Presentation API version 3.0 standard across both repositories by the end of 2026, a deadline that was already on the roadmap but has now become more urgent. IIIF, the International Image Interoperability Framework, provides a structured way to assign and validate unique manifests for digitised objects, which makes the kind of duplication now causing problems structurally much harder to introduce.
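In IIIF Presentation 3.0, each manifest carries an `id` that must be an HTTP(S) URI and a `type` of `Manifest`. The sketch below shows how a batch of manifests might be checked for those two properties and for reused identifiers. It is an illustration under those assumptions rather than a full validator, and the example.org URIs are placeholders, not real archive addresses.

```python
from collections import Counter
from urllib.parse import urlparse


def check_manifests(manifests):
    """Return a list of problems found in a batch of manifest dicts."""
    problems = []
    id_counts = Counter(m.get("id") for m in manifests)
    for index, m in enumerate(manifests):
        manifest_id = m.get("id")
        if m.get("type") != "Manifest":
            problems.append(f"manifest {index}: type is not 'Manifest'")
        if not manifest_id or urlparse(manifest_id).scheme not in ("http", "https"):
            problems.append(f"manifest {index}: id is not an HTTP(S) URI")
        elif id_counts[manifest_id] > 1:
            problems.append(f"manifest {index}: id {manifest_id} is reused")
    return problems


# Placeholder manifests, trimmed to the fields this check looks at.
batch = [
    {"id": "https://example.org/iiif/photo-1/manifest", "type": "Manifest"},
    {"id": "https://example.org/iiif/photo-1/manifest", "type": "Manifest"},
    {"id": "https://example.org/iiif/photo-2/manifest", "type": "Manifest"},
]

for problem in check_manifests(batch):
    print(problem)
```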

Zurich's situation is not unique in Switzerland. The Swiss Federal Archives in Bern flagged comparable metadata integrity problems in a published review last year, noting that rapid digitisation drives during the pandemic years created conditions in which quality controls were sometimes compressed. The current effort in Zurich is, in that sense, part of a national reckoning with the costs of speed-over-accuracy digitisation decisions made between 2020 and 2022.

Archivists at both Neumarkt and Rämistrasse say they expect a full remediation report to be published by September. For as long as the synchronisation pause between the two institutions' systems remains in place, new acquisitions will continue to be catalogued locally but will not be visible to cross-institutional searches, a limitation that researchers will simply have to work around for now.

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
