Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Startling
A quiet crisis in civic data management is costing Zurich institutions measurable storage capacity, staff hours, and taxpayer money.

At least 34 percent of all image files held across Zurich's municipal digital repositories are estimated to be duplicates: identical or near-identical copies that consume server space, slow archive searches, and inflate IT budgets. That figure, derived from internal audits conducted by the city's Stadtarchiv on Neumarkt and cross-referenced with benchmarking work done at ETH Zürich's Data Management Services unit, has quietly alarmed records managers across the city's public sector.
The issue is not unique to Zurich, but the scale here is particularly visible because of how aggressively the city digitised its holdings between 2018 and 2023. That five-year push — driven partly by pandemic-era pressure to make civic records accessible remotely — generated enormous image libraries without consistent deduplication protocols. The result is swollen storage pools and a retrieval problem that staff describe in straightforward terms: finding the authoritative version of any given image can mean sifting through a dozen copies.
ETH Zürich's Data Management Services team published a working paper in March 2026 examining image redundancy across Swiss higher-education and public-sector archives. The paper found that a typical civic institution with a 50-terabyte image store carries between 12 and 18 terabytes of duplicated content. At current Zurich data-centre pricing (roughly CHF 180 per terabyte per year for managed storage), that translates to between CHF 2,160 and CHF 3,240 in wasted annual spend per institution, before factoring in backup costs, which typically double the effective price.
Across the dozen or so municipal departments that maintain independent image repositories — including the Baugeschichtliches Archiv on Neumarkt, the Zentralbibliothek Zürich on Zähringerplatz, and the communications divisions of several Stadtkreis offices — the cumulative waste runs into tens of thousands of francs annually. Small numbers individually; collectively, they represent a budget argument that IT procurement boards are starting to take seriously.
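The arithmetic behind these figures can be checked with a short sketch. The price and the duplicate volumes come from the working paper as quoted above; the function name, the count of exactly 12 institutions, the assumption that each holds a typical 50-terabyte store, and the backup factor of 2 are illustrative assumptions rather than reported data.

```python
# Figures quoted in the article; names and the institution count are illustrative.
PRICE_CHF_PER_TB_YEAR = 180      # managed storage price
DUPLICATE_TB_RANGE = (12, 18)    # duplicated content in a 50 TB store
INSTITUTIONS = 12                # assumption: "a dozen or so", all of typical size
BACKUP_FACTOR = 2                # assumption: backups double the effective price


def annual_waste_chf(duplicate_tb, institutions=1, backup_factor=1):
    """Yearly cost in CHF of storing the given volume of duplicate images."""
    return duplicate_tb * PRICE_CHF_PER_TB_YEAR * institutions * backup_factor


for label, n, factor in [
    ("Per institution, primary storage", 1, 1),
    ("Per institution, with backups", 1, BACKUP_FACTOR),
    ("All institutions, primary storage", INSTITUTIONS, 1),
    ("All institutions, with backups", INSTITUTIONS, BACKUP_FACTOR),
]:
    low, high = (annual_waste_chf(tb, n, factor) for tb in DUPLICATE_TB_RANGE)
    print(f"{label}: CHF {low:,} to CHF {high:,}")
```

Under those assumptions the citywide total comes to between CHF 25,920 and CHF 38,880 a year for primary storage, and twice that once backups are counted, which is consistent with the "tens of thousands" description.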
The duplication problem compounds in organisations that have absorbed legacy collections. Zentralbibliothek Zürich, which holds one of the largest historical photographic collections in the German-speaking world, saw its digital holdings expand by roughly 40 percent between 2020 and 2025 following several digitisation partnerships. Without automated deduplication, every batch upload risks layering new copies over existing ones.
Two forces are pushing the issue into sharper focus in mid-2026. First, the Swiss federal government's updated E-Government Strategy, which runs through 2027, requires cantonal and municipal archives to meet interoperability standards that include metadata consistency — standards that are impossible to meet cleanly when multiple versions of the same image carry different metadata tags. Second, the commercial software market has matured: tools that use perceptual hashing to identify near-duplicate images — not just byte-identical copies — have dropped in price by roughly 60 percent since 2022, making procurement decisions easier to justify.
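The difference between byte-identical and near-duplicate detection can be illustrated with a toy example. The sketch below uses an "average hash", one simple form of perceptual hashing; the article does not say which algorithm the commercial tools use, so this is an assumption, and the 4x4 pixel grids are invented stand-ins for real images. Production tools would normally downscale and convert each image to greyscale first, a step omitted here.

```python
import hashlib


def average_hash(pixels):
    """Perceptual hash: one bit per pixel, set where the pixel exceeds the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes (detects exact copies only)."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Hypothetical 4x4 greyscale images (values 0-255).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
brightened = [[min(255, p + 5) for p in row] for row in original]  # re-exported copy
different = [
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
]

print("Exact match by SHA-256:", sha256_of(original) == sha256_of(brightened))
print("Perceptual distance, brightened copy:",
      hamming(average_hash(original), average_hash(brightened)))
print("Perceptual distance, unrelated image:",
      hamming(average_hash(original), average_hash(different)))
```

The slightly brightened copy fails a byte-level comparison yet sits at a perceptual distance of zero from the original, while the unrelated image differs in half of its 16 bits. That gap is what lets such tools catch re-exported or lightly edited copies.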
Several Zurich institutions are now piloting or evaluating deduplication workflows. The Stadtarchiv has been running a six-month pilot since January 2026 using open-source perceptual hashing tools integrated into its existing Axiell Collections management system. Preliminary results, shared at a February archivists' forum at the Rietberg Museum, suggested the pilot identified duplicates at a rate of one in every three files processed — closely tracking the broader 34 percent estimate.
For institutions considering action, archivists and IT managers suggest a phased approach: begin with automated flagging using perceptual hash comparisons, then move to human-in-the-loop review before any deletion, given that what looks like a duplicate sometimes carries unique metadata or provenance information. The cost of a wrongly deleted historical image is not measurable in francs. Running the numbers first, however, is increasingly non-negotiable — storage budgets across Zurich's public sector are under pressure, and the case for cleaning up before expanding capacity is now arithmetic, not aspiration.
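A minimal sketch of the first phase of that approach follows. It flags candidate pairs for human review and never deletes anything. The record structure, the field names, the sample hashes and the distance threshold of 4 bits are all hypothetical, and the perceptual hashes are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ImageRecord:
    record_id: str
    phash: int        # perceptual hash as an integer, computed elsewhere
    metadata: dict    # descriptive and provenance fields


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def flag_candidates(records, max_distance=4):
    """Phase 1: queue likely duplicates for review. Nothing is deleted here."""
    queue = []
    for a, b in combinations(records, 2):
        distance = hamming(a.phash, b.phash)
        if distance <= max_distance:
            queue.append({
                "pair": (a.record_id, b.record_id),
                "distance": distance,
                # Differing metadata is a signal that both copies may matter.
                "metadata_differs": a.metadata != b.metadata,
            })
    return queue


# Hypothetical records: A and B are near-identical scans with different provenance.
records = [
    ImageRecord("A", 0b11110000, {"source": "2019 batch"}),
    ImageRecord("B", 0b11110001, {"source": "partner upload"}),
    ImageRecord("C", 0b00001111, {"source": "2019 batch"}),
]

for item in flag_candidates(records):
    print(item)
```

Here only the pair A and B is queued, and it is marked as carrying different metadata, so a reviewer would see that the two files are not interchangeable before deciding which to keep. Comparing every pair is quadratic in the number of records, so a real archive would likely need an index or bucketing scheme on top of this.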
About this article
Published by The Daily Zurich