The Daily Zurich


Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Startling

A quiet crisis in civic data management is costing Zurich institutions measurable storage capacity, staff hours, and taxpayer money.

By Zurich News Desk · Published 4 July 2026, 9:23 pm

Photo: Natalia Sevruk on Pexels

By one estimate, at least 34 percent of all image files held across Zurich's municipal digital repositories are duplicates, meaning identical or near-identical copies that consume server space, slow archive searches, and inflate IT budgets. That figure, derived from internal audits conducted by Stadt Zürich's Stadtarchiv on Neumarkt and cross-referenced with benchmarking work done at ETH Zürich's Data Management Services unit, has quietly alarmed records managers across the city's public sector.

The issue is not unique to Zurich, but the scale here is particularly visible because of how aggressively the city digitised its holdings between 2018 and 2023. That five-year push — driven partly by pandemic-era pressure to make civic records accessible remotely — generated enormous image libraries without consistent deduplication protocols. The result is swollen storage pools and a retrieval problem that staff describe in straightforward terms: finding the authoritative version of any given image can mean sifting through a dozen copies.

What the Storage Figures Actually Show

ETH Zürich's Data Management Services team published a working paper in March 2026 examining image redundancy across Swiss higher-education and public-sector archives. The paper found that a typical civic institution with a 50-terabyte image store carries between 12 and 18 terabytes of duplicated content. At current Zurich datacenter pricing — roughly CHF 180 per terabyte per year for managed storage — that translates to between CHF 2,160 and CHF 3,240 in wasted annual spend per institution, before factoring in backup costs, which typically double the effective price.
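The arithmetic behind those figures can be checked in a few lines. This is a minimal sketch using only the numbers quoted above; the function name and the factor of two for backups (taken from "typically double") are illustrative assumptions, not part of the working paper.

```python
# Figures quoted in the article; names are illustrative only.
PRICE_CHF_PER_TB_YEAR = 180      # managed storage, per terabyte per year
DUPLICATE_TB_RANGE = (12, 18)    # duplicated content in a 50 TB image store
BACKUP_MULTIPLIER = 2            # assumption: backups double the effective price


def wasted_spend(duplicate_tb, price=PRICE_CHF_PER_TB_YEAR, multiplier=1):
    """Annual cost in CHF of storing duplicate_tb terabytes of redundant data."""
    return duplicate_tb * price * multiplier


low, high = (wasted_spend(tb) for tb in DUPLICATE_TB_RANGE)
low_backup, high_backup = (
    wasted_spend(tb, multiplier=BACKUP_MULTIPLIER) for tb in DUPLICATE_TB_RANGE
)

print(f"Storage only: CHF {low:,} to CHF {high:,}")
print(f"With backups: CHF {low_backup:,} to CHF {high_backup:,}")
```

Under the doubling assumption, the per-institution range becomes CHF 4,320 to CHF 6,480 a year.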

Across the dozen or so municipal departments that maintain independent image repositories — including the Baugeschichtliches Archiv on Neumarkt, the Zentralbibliothek Zürich on Zähringerplatz, and the communications divisions of several Stadtkreis offices — the cumulative waste runs into tens of thousands of francs annually. Small numbers individually; collectively, they represent a budget argument that IT procurement boards are starting to take seriously.

The duplication problem compounds in organisations that have absorbed legacy collections. Zentralbibliothek Zürich, which holds one of the largest historical photographic collections in the German-speaking world, saw its digital holdings expand by roughly 40 percent between 2020 and 2025 following several digitisation partnerships. Without automated deduplication, every batch upload risks layering new copies over existing ones.

Why This Surfaces Now

Two forces are pushing the issue into sharper focus in mid-2026. First, the Swiss federal government's updated E-Government Strategy, which runs through 2027, requires cantonal and municipal archives to meet interoperability standards that include metadata consistency — standards that are impossible to meet cleanly when multiple versions of the same image carry different metadata tags. Second, the commercial software market has matured: tools that use perceptual hashing to identify near-duplicate images — not just byte-identical copies — have dropped in price by roughly 60 percent since 2022, making procurement decisions easier to justify.
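The difference between byte-identical and near-duplicate detection can be illustrated with a toy example. The sketch below is not any vendor's method: it uses a simple "average hash" on hypothetical 4x4 grayscale grids, whereas real tools typically downscale full images first and may use more robust hashes.

```python
import hashlib


def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes; changes if any byte changes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


# Hypothetical 4x4 grayscale thumbnails (values 0-255).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same picture, slightly brightened, as a re-scan or re-export might be.
brightened = [[min(255, v + 3) for v in row] for row in original]

byte_identical = sha256_of(original) == sha256_of(brightened)
distance = hamming_distance(average_hash(original), average_hash(brightened))

print("Byte-identical:", byte_identical)
print("Perceptual hash distance:", distance)
```

A byte-level comparison treats the two grids as unrelated files, while the perceptual hashes agree because the pattern of light and dark is unchanged.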

Several Zurich institutions are now piloting or evaluating deduplication workflows. The Stadtarchiv has been running a six-month pilot since January 2026 using open-source perceptual hashing tools integrated into its existing Axiell Collections management system. Preliminary results, shared at a February archivists' forum at the Rietberg Museum, suggested the pilot identified duplicates at a rate of one in every three files processed — closely tracking the broader 34 percent estimate.

For institutions considering action, archivists and IT managers suggest a phased approach: begin with automated flagging using perceptual hash comparisons, then move to human-in-the-loop review before any deletion, given that what looks like a duplicate sometimes carries unique metadata or provenance information. The cost of a wrongly deleted historical image is not measurable in francs. Running the numbers first, however, is increasingly non-negotiable — storage budgets across Zurich's public sector are under pressure, and the case for cleaning up before expanding capacity is now arithmetic, not aspiration.
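The phased approach described above might look roughly like this in code. It is a sketch under stated assumptions: the record structure, file names, hash values, and the threshold of 5 bits are all hypothetical, and the hashes are assumed to have been computed beforehand. The key property is that the function only flags pairs for review and never deletes anything.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ImageRecord:
    file_id: str
    phash: int        # precomputed perceptual hash (hypothetical values below)
    metadata: tuple   # (key, value) pairs attached to the file


HAMMING_THRESHOLD = 5  # assumed cutoff; a real pilot would tune this


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def flag_for_review(records, threshold=HAMMING_THRESHOLD):
    """Return candidate duplicate pairs for a human reviewer. Nothing is deleted."""
    queue = []
    for left, right in combinations(records, 2):
        distance = hamming_distance(left.phash, right.phash)
        if distance <= threshold:
            queue.append({
                "pair": (left.file_id, right.file_id),
                "distance": distance,
                # Differing metadata is a signal that a copy may carry
                # unique provenance information.
                "metadata_differs": left.metadata != right.metadata,
            })
    return queue


records = [
    ImageRecord("scan_001.tif", 0b1111000011110000, (("source", "1998 survey"),)),
    ImageRecord("scan_001_copy.tif", 0b1111000011110001, (("source", "partner upload"),)),
    ImageRecord("scan_207.tif", 0b0000111100001111, (("source", "1998 survey"),)),
]

review_queue = flag_for_review(records)
for item in review_queue:
    print(item)
```

Comparing every pair is fine for a small batch but scales quadratically, so a large archive would need an indexing strategy on top of this.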


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
