Thousands of duplicate images are clogging the digital archives of Zurich's public institutions, driving up storage costs, slowing research workflows and threatening the integrity of the city's cultural record. The problem, long treated as a minor administrative nuisance, has gained new urgency as institutions push toward fully digitised collections by the end of 2027.
The timing matters. The Stadtarchiv Zürich, located near the Rathaus on the right bank of the Limmat, completed a major scanning drive in late 2025 that added hundreds of thousands of images to its public-facing portal. Digital preservation teams then found that a significant share of the newly ingested files duplicated material already in the system: sometimes three or four versions of the same photograph, scanned at different resolutions by different departments over the years. The result is bloated repositories, inconsistent metadata and real money spent on redundant cloud storage contracts.
What the Experts Are Saying
Specialists in digital preservation at ETH Zurich's main campus on Rämistrasse have been studying the problem across Swiss institutions for several years. Researchers there argue that the issue is structural, not accidental: organisations typically lack a single ingest protocol, so images enter archives through multiple doors — donated collections, internal photography teams, scanned print runs — without any automated deduplication step at the point of entry. The longer institutions wait, the more expensive the cleanup becomes, both in computing time and in human curatorial hours needed to verify which version of a duplicate is canonical.
The Zentralbibliothek Zürich on Zähringerplatz, one of the largest research libraries in the German-speaking world, with holdings dating to the 16th century, has acknowledged the challenge in its own digitisation roadmap. Librarians there have publicly discussed adopting perceptual hash algorithms, techniques that flag visually near-identical images even when file names or metadata differ, as part of a broader platform upgrade planned for 2026. The library manages more than 10 million digital objects, a figure that makes manual deduplication effectively impossible.
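To show what a perceptual hash does in practice, here is a minimal sketch of one common variant, the difference hash (dHash). It is not the Zentralbibliothek's method, which has not been described in that level of detail. The input format (a grayscale image as rows of 0 to 255 brightness values), the function names and the 10-bit match threshold are illustrative assumptions.

```python
from typing import List

# Hypothetical input format: a grayscale image as rows of 0-255 brightness values.
Gray = List[List[float]]


def resize_box(img: Gray, width: int, height: int) -> Gray:
    """Shrink an image by averaging the block of source pixels behind each target pixel."""
    src_h, src_w = len(img), len(img[0])
    out: Gray = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)  # always cover at least one row
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)  # always cover at least one column
            block = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: shrink to (hash_size + 1) x hash_size pixels, then record one bit
    per horizontal neighbour pair (1 if the left pixel is brighter than the right)."""
    small = resize_box(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # hash_size * hash_size bits, i.e. 64 bits by default


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 10) -> bool:
    """Treat two images as likely duplicates if their hashes differ in only a few bits.
    The 10-bit threshold is an illustrative assumption, not a standard."""
    return hamming(dhash(a), dhash(b)) <= max_distance


if __name__ == "__main__":
    # Synthetic stand-in for one photograph scanned at two resolutions:
    # the same left-to-right darkening gradient, at 90x80 and at 45x40 pixels.
    scan_a = [[250 - 2 * x for x in range(90)] for _ in range(80)]
    scan_b = [[250 - 4 * x for x in range(45)] for _ in range(40)]
    print(is_near_duplicate(scan_a, scan_b))  # True
```

Because the fingerprint is computed from a heavily shrunk copy of the image, two scans of the same photograph at different resolutions tend to land within a few bits of each other, whereas a byte-level checksum would treat them as unrelated files. A pair under the threshold is only a candidate; deciding which copy is canonical still falls to a curator.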
Private-sector voices have joined the conversation. Data engineering firms clustered in the Zürich West district, many of them working with pharma clients in Basel and financial institutions in the Paradeplatz area, have pitched AI-assisted deduplication tools to public-sector clients. Industry practitioners argue that machine-learning pipelines can cut duplicate rates in large archives by more than 60 percent in a first pass, though they caution that human review remains essential for historically significant images where two similar photographs may document genuinely distinct moments.
What Comes Next
The City of Zurich's Departement der Industriellen Betriebe, which oversees parts of the city's digital infrastructure, is expected to publish updated data governance guidelines before the end of the third quarter of 2026. Archive professionals are watching those guidelines closely, hoping they will mandate deduplication standards across city-linked institutions rather than leaving each body to devise its own approach.
For researchers and members of the public who rely on Zurich's online collections, from historians using the Stadtarchiv portal to journalists pulling images from the Zentralbibliothek's e-manuscripta platform, the practical advice for now is straightforward: before treating a retrieved image as a unique source, check it against at least one other catalogue entry. Duplicate entries often carry conflicting dates or attribution notes, and relying on a single version without checking the others can introduce errors into published work.
The broader lesson, on which archivists and data specialists largely agree, is that deduplication cannot be retrofitted cheaply after a collection scales. Every month that ingestion continues without a deduplication gate adds to a backlog that becomes progressively harder and more expensive to clear. For a city that has staked considerable civic pride on the quality and accessibility of its historical record, getting this right is not optional.