Swiss cultural and research institutions hold tens of millions of digitised images across their combined archives, and a growing body of evidence suggests that between 15 and 30 percent of those files are exact or near-exact duplicates that inflate storage costs and complicate search and retrieval. The problem is not new. What is new is the scale at which it is now being measured.
The issue has gained urgency in 2026 partly because Switzerland's Federal Office of Culture, based in Bern, extended its digitisation funding framework through 2028, releasing a fresh tranche of grants to cantonal institutions. That influx of newly scanned material is landing in repositories that are already struggling with redundancy inherited from earlier, less coordinated digitisation campaigns. For Zurich — home to more major archival institutions than almost any other Swiss city — the timing is uncomfortable.
What the Data Actually Shows
ETH Zurich's library, the ETH-Bibliothek on Rämistrasse, manages one of the largest openly accessible image collections in the German-speaking world. Internal audits at comparable European research libraries, including the Bavarian State Library in Munich and the Bibliothèque nationale de France, have found duplicate rates of 12 to 28 percent in large-scale digitisation projects when cross-collection matching is applied. Storage on institutional-grade archival servers in Switzerland currently costs roughly CHF 0.04 to CHF 0.08 per gigabyte per month, depending on redundancy tier. For a mid-sized cantonal archive storing 200 terabytes, a 20 percent duplication rate means about 40 terabytes of redundant files, or roughly CHF 19,000 to CHF 38,000 in avoidable annual expenditure.
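The arithmetic behind that estimate can be sketched in a few lines of Python. This is an illustrative calculation only: the function name is hypothetical, and it assumes decimal units (1 terabyte = 1,000 gigabytes) and a flat per-gigabyte price, neither of which the figures above specify.

```python
def avoidable_annual_cost_chf(archive_tb, duplicate_rate, chf_per_gb_month, gb_per_tb=1000):
    """Yearly storage cost attributable to duplicate files, in CHF.

    Assumes decimal terabytes and a flat monthly price per gigabyte.
    """
    duplicate_gb = archive_tb * gb_per_tb * duplicate_rate
    return duplicate_gb * chf_per_gb_month * 12


# 200 TB archive, 20 percent duplicates, priced at both ends of the quoted range.
low = avoidable_annual_cost_chf(200, 0.20, 0.04)
high = avoidable_annual_cost_chf(200, 0.20, 0.08)
print(f"CHF {low:,.0f} to CHF {high:,.0f} per year")
# prints: CHF 19,200 to CHF 38,400 per year
```

Real contracts often add tiered pricing, backup copies, and egress fees, so an institution's own figure could sit above or below this range.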
The Stadtarchiv Zürich, located in the Rathaus complex near the Limmat, and the Zentralbibliothek Zürich on Zähringerplatz are both participants in the Swiss national aggregation platform Helveticat and feed records into the broader Europeana network. Every duplicate image that enters those pipelines gets indexed, cross-linked, and in some cases licensed, creating downstream confusion for researchers, journalists, and the general public searching for authoritative versions of historical photographs or city planning documents.
Duplicate detection is not a simple yes-or-no check for identical files. Perceptual hashing tools, software that compares visual fingerprints rather than raw file data, can identify near-duplicates that differ only in resolution, crop, or compression level. When the Kantonsbibliothek Graubünden piloted such a tool on a subset of its postcard collection in 2024, it found that roughly one in five images in that subset matched another file already in the system. Extrapolating that ratio to entire collections is imprecise, but the directional signal is consistent across institutions that have run similar audits.
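To make the idea of a visual fingerprint concrete, here is a minimal sketch of one common perceptual hashing scheme, the average hash, written in plain Python. It is not the tool any of the institutions named here used. The function names, the 8x8 grid, and the distance threshold of 5 are illustrative assumptions, and the "images" are synthetic grids of brightness values rather than real scans.

```python
def average_hash(pixels, hash_size=8):
    """Fingerprint a grayscale image given as rows of brightness values (0-255).

    Assumes the image is at least hash_size pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to hash_size x hash_size by averaging each block.
    for cy in range(hash_size):
        y0, y1 = cy * height // hash_size, (cy + 1) * height // hash_size
        for cx in range(hash_size):
            x0, x1 = cx * width // hash_size, (cx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming_distance(average_hash(pixels_a), average_hash(pixels_b)) <= max_distance


# Synthetic test images (hypothetical data, not real archive scans).
original = [[x * 4 for x in range(64)] for _ in range(64)]       # 64x64 left-to-right gradient
rescan = [[x * 8 + 3 for x in range(32)] for _ in range(32)]     # half resolution, slightly brighter
unrelated = [[y * 4 for _ in range(64)] for y in range(64)]      # 64x64 top-to-bottom gradient

print(is_near_duplicate(original, rescan))     # True
print(is_near_duplicate(original, unrelated))  # False
```

Because the fingerprint is built from coarse brightness patterns rather than file bytes, the half-resolution copy matches the original even though the two files share no data. The same property means the threshold has to be tuned per collection: too loose and distinct postcards of the same view get merged, too strict and heavily cropped copies slip through.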
Why Zurich Institutions Face Particular Pressure
Zurich's archival infrastructure expanded rapidly during the 2015–2022 period, when several city departments digitised physical records in parallel, with limited central coordination. The Bauarchiv, the Fotoarchiv of the Stadtarchiv, and private collections donated to the Schweizerisches Sozialarchiv on Stadelhoferstrasse each ran largely independent workflows. That created the conditions for structural duplication: the same photograph of, say, Bellevueplatz in the 1960s might exist in three separate repositories under three different identifiers, each assigned its own metadata record and storage allocation.
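A rough sketch of how such structural duplication could be surfaced once records from several repositories are pooled: group files by a content digest and report any digest held by more than one repository. The function name, repository labels, identifiers, and file contents below are all hypothetical, and a SHA-256 digest only catches byte-identical files, so re-scanned or re-compressed copies would need a perceptual comparison instead.

```python
import hashlib
from collections import defaultdict


def find_cross_repository_duplicates(records):
    """Group (repository, identifier, file_bytes) records by SHA-256 digest.

    Returns one list of (repository, identifier) pairs for each file that
    appears in more than one repository.
    """
    holders_by_digest = defaultdict(list)
    for repository, identifier, file_bytes in records:
        digest = hashlib.sha256(file_bytes).hexdigest()
        holders_by_digest[digest].append((repository, identifier))
    return [
        holders
        for holders in holders_by_digest.values()
        if len({repository for repository, _ in holders}) > 1
    ]


# Hypothetical records: the same scan filed under three different identifiers.
records = [
    ("Bauarchiv", "BA-000123", b"scan of Bellevueplatz photograph"),
    ("Fotoarchiv", "FA-7781", b"scan of Bellevueplatz photograph"),
    ("Sozialarchiv", "SOZ-F-0456", b"scan of Bellevueplatz photograph"),
    ("Fotoarchiv", "FA-7782", b"scan of a different photograph"),
]

duplicates = find_cross_repository_duplicates(records)
for holders in duplicates:
    print(holders)
```

With the sample data, the three Bellevueplatz records are reported as one group and the fourth record is left alone. In practice the harder step is the one this sketch skips: getting the files and identifiers from independent workflows into a single comparable list in the first place.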
The Swiss memory institutions consortium mem-o-ria, which coordinates best-practice guidelines for the sector, has flagged deduplication as a priority action item for the current funding cycle. The consortium's position, outlined in its 2025 annual report, calls for shared tooling and harmonised metadata schemas as preconditions for effective cross-institutional deduplication — neither of which currently exists in a standardised form across Zurich's main archives.
For institutions planning their next storage procurement — typically on three-year hardware refresh cycles — the practical advice from the sector is straightforward: run a perceptual hash audit before signing the next server contract, not after. For the Stadtarchiv and the Zentralbibliothek, whose next budget submissions to the city council are expected in autumn 2026, the window to make that case with clean numbers is right now.