Thousands of digitised photographs and historical documents stored across Zurich's major public archives contain duplicate image entries — in some cases the same scan indexed under multiple file identifiers — and institutions are now scrambling to clean up the problem before it compounds further. This week, the remediation project entered a new phase, with ETH-Bibliothek and the Stadtarchiv Zürich both confirming they have begun deploying automated deduplication pipelines across their shared metadata repositories.
The issue matters now because both institutions are in the middle of a broader push to migrate legacy collections onto the Swiss national aggregator platform Helveticat and its successor infrastructure. Every duplicate record that travels into the new system creates downstream errors — broken permalinks, double-counted holdings statistics, and, most critically, misdirected user queries. Researchers at ETH Zurich's main campus on Rämistrasse who rely on those archives for primary-source work have been flagging the problem for the better part of eighteen months.
What Triggered the Urgency This Week
The immediate trigger was a routine data-quality audit completed at the end of June. The audit, conducted internally by ETH-Bibliothek's digital preservation team, found that roughly 4,200 image objects within a single historical photography collection — acquired in batches between 2019 and 2024 — carried duplicate persistent identifiers. That figure does not include the Stadtarchiv Zürich's own holdings on Neumarkt, where a parallel audit is still under way. The two institutions share an interoperability agreement signed in 2022, which means a contaminated record in one system can propagate to the other during routine synchronisation cycles that run every 72 hours.
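The kind of check such an audit performs can be sketched in a few lines of Python. This is a minimal illustration, not the institutions' actual tooling: the record shape (an identifier paired with the raw image bytes) and the function name `find_duplicates` are assumptions made for the example. It reports both readings of "duplicate": one scan filed under several identifiers, and one identifier attached to several different scans.

```python
import hashlib
from collections import defaultdict


def find_duplicates(records):
    """Group records by content hash and by identifier.

    `records` is an iterable of (identifier, image_bytes) pairs.
    This shape is hypothetical and chosen only for illustration.
    """
    ids_by_digest = defaultdict(set)
    digests_by_id = defaultdict(set)
    for identifier, image_bytes in records:
        digest = hashlib.sha256(image_bytes).hexdigest()
        ids_by_digest[digest].add(identifier)
        digests_by_id[identifier].add(digest)

    # Same scan (byte-identical content) indexed under several identifiers.
    same_scan_many_ids = {
        d: sorted(ids) for d, ids in ids_by_digest.items() if len(ids) > 1
    }
    # Same identifier pointing at several different scans.
    same_id_many_scans = {
        i: sorted(ds) for i, ds in digests_by_id.items() if len(ds) > 1
    }
    return same_scan_many_ids, same_id_many_scans


sample = [("img-001", b"scan A"), ("img-002", b"scan A"),
          ("img-003", b"scan B"), ("img-003", b"scan C")]
by_scan, by_id = find_duplicates(sample)
print(list(by_scan.values()))  # identifiers that share one scan
print(list(by_id.keys()))      # identifiers reused for different scans
```

A cryptographic hash only catches byte-identical files. Scans that were re-encoded or resized between batches would need a perceptual comparison instead, which this sketch does not attempt.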
The Stadtarchiv, which holds civil registration documents, city council minutes, and photographic collections stretching back to the mid-nineteenth century, paused its synchronisation feed to ETH-Bibliothek on Monday, 30 June, as a precautionary step. That pause is expected to remain in place until at least 11 July, according to a technical notice posted to the archive's public portal this week.
Staff at both institutions are working with a Python-based deduplication tool developed initially for the Zentralbibliothek Zürich on Zähringerplatz, which ran a smaller but structurally similar cleanup operation on its newspaper microfilm catalogue in late 2024. That earlier project took eleven weeks to resolve approximately 900 conflicting records, giving archivists a benchmark — and a warning — about how labour-intensive the current, larger effort is likely to be.
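As a rough back-of-envelope reading of that benchmark, the figures can be scaled linearly. The linear scaling is an assumption made here for illustration only: the larger project may be staffed differently, and the 4,200 figure excludes the Stadtarchiv's holdings still under audit. The function name `estimate_weeks` is hypothetical.

```python
def estimate_weeks(flagged_records, ref_records=900, ref_weeks=11):
    """Scale the earlier project's pace linearly to a new record count.

    Defaults are the Zentralbibliothek figures cited in the article.
    """
    records_per_week = ref_records / ref_weeks  # roughly 82 per week
    return flagged_records / records_per_week


weeks = estimate_weeks(4200)
print(f"{weeks:.1f} weeks at the earlier pace")
```

At the earlier pace of roughly 82 records a week, 4,200 records would take about 51 weeks, close to a full year.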
What Comes Next for Researchers and the Public
For anyone using the online portals right now, the practical advice is straightforward: if a permalink to an archival image returns a 404 error or redirects unexpectedly, the record is likely among those under active remediation. Both institutions are asking users to report broken links through their standard helpdesk channels rather than assuming material has been withdrawn or destroyed.
The longer-term fix involves adopting the IIIF Presentation API version 3.0 standard across both repositories by the end of 2026, a deadline that was already on the roadmap but has now gained sharper urgency. IIIF, the International Image Interoperability Framework, provides a structured way to assign and validate unique manifests for digitised objects, which makes the kind of duplication now causing headaches structurally much harder to introduce.
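To make the idea concrete: in Presentation API 3.0, each manifest is a JSON document carrying an `id` (an HTTP(S) URI) and a `type` of `Manifest`. A minimal sketch of a uniqueness check over already-parsed manifests might look like the following. The function name and the example.org URIs are hypothetical, and a real pipeline would also have to fetch and validate the manifests, which is not shown.

```python
from collections import Counter


def duplicate_manifest_ids(manifests):
    """Return manifest ids that occur more than once.

    `manifests` is a list of dicts parsed from IIIF Presentation 3.0
    JSON. Only the top-level `id` and `type` properties are inspected.
    """
    ids = [
        m["id"]
        for m in manifests
        if m.get("type") == "Manifest" and "id" in m
    ]
    return sorted(i for i, count in Counter(ids).items() if count > 1)


# Hypothetical manifests; the URIs are placeholders, not real records.
manifests = [
    {"id": "https://example.org/iiif/photo-1/manifest", "type": "Manifest"},
    {"id": "https://example.org/iiif/photo-2/manifest", "type": "Manifest"},
    {"id": "https://example.org/iiif/photo-1/manifest", "type": "Manifest"},
]
print(duplicate_manifest_ids(manifests))
```

Run on the three placeholder manifests, the check reports the one URI that appears twice.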
Zurich's situation is not unique in Switzerland. The Swiss Federal Archives in Bern flagged comparable metadata integrity problems in a published review last year, noting that rapid digitisation drives during the pandemic years created conditions in which quality controls were sometimes compressed. The current effort in Zurich is, in that sense, part of a national reckoning with the costs of speed-over-accuracy digitisation decisions made between 2020 and 2022.
Archivists at both Neumarkt and Rämistrasse say they expect a full remediation report to be published by September. For as long as the synchronisation pause between the two institutions' systems remains in place, new acquisitions will continue to be catalogued locally but will not be visible to cross-institutional searches, a limitation that, for now, researchers simply have to work around.