The Daily Zurich

Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

A quiet data crisis in Zurich's public and institutional image libraries is costing storage budgets, distorting search results, and demanding a systematic fix.

By Zurich News Desk · Published 4 July 2026, 8:44 pm

Photo: Mâide Arslan on Pexels

Somewhere between a third and half of all images stored across Zurich's major institutional digital repositories are duplicates — redundant files eating storage space, inflating licensing costs, and undermining the accuracy of digital search tools. That is the working estimate emerging from an ongoing audit coordinated through the Stadtarchiv Zürich, the city's official record-keeping body on Neumarkt, which began a structured review of its digital asset management systems in early 2026.

The timing is not accidental. Zurich's public institutions have spent the past four years accelerating digitisation programmes, partly in response to pandemic-era closures and partly driven by federal mandates under the Swiss national digital strategy. The result has been a rapid accumulation of image files with minimal deduplication discipline. What began as an archiving convenience has quietly metastasised into a measurable financial and administrative problem.

The Scale of the Problem in Zurich's Institutions

The Stadtarchiv holds tens of thousands of digitised photographic records spanning more than a century of the city's built environment. But the duplication problem is not confined to historical collections. ETH Zurich's library services on Rämistrasse, which manage one of the largest research image databases in German-speaking Europe, found in a 2025 internal efficiency review that redundant image files were consuming a disproportionate share of allocated server capacity. Precise figures from that review have not been published, but the library has since begun piloting automated hash-based deduplication software as part of its infrastructure renewal cycle.
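For readers curious what hash-based deduplication looks like in its simplest form, the sketch below groups files whose contents are byte-for-byte identical by comparing SHA-256 digests. It is an illustration only and not a description of the software ETH Zurich is piloting; the function name and the sample file names are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest[digest].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Illustrative data only: two names pointing at the same bytes.
sample = {
    "bahnhofstrasse_1910.tif": b"\x00\x01\x02",
    "scan_0042.tif": b"\x00\x01\x02",
    "limmatquai_1925.tif": b"\x09\x08\x07",
}
duplicate_groups = group_exact_duplicates(sample)
```

An exact digest only catches identical copies. A rescan of the same photograph at a different resolution produces different bytes and therefore a different digest.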

At the Zentralbibliothek Zürich on Zähringerplatz, staff have faced a related but distinct variant of the issue: multiple scans of the same physical document, made at different resolutions and by different operators over time, which then proliferate across backup systems and cloud mirrors. Each redundant file carries a storage cost. Cloud storage pricing for institutional contracts in Switzerland runs to approximately CHF 0.02–0.04 per gigabyte per month depending on the provider and redundancy tier — a figure that sounds trivial until multiplied across hundreds of thousands of duplicate files stored for years.
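To see how the per-gigabyte price adds up, the sketch below multiplies it out for one set of assumed inputs. Only the CHF 0.02 to 0.04 price range comes from the figures above; the file count, average file size, and retention period are hypothetical and do not describe any specific Zurich institution.

```python
def duplicate_storage_cost(n_files, avg_size_mb, months, chf_per_gb_month):
    """Return the cumulative cost in CHF of keeping duplicate files."""
    total_gb = n_files * avg_size_mb / 1000  # decimal gigabytes
    return round(total_gb * chf_per_gb_month * months, 2)


# Hypothetical inputs: 300,000 duplicates, 50 MB each, kept for 5 years.
low = duplicate_storage_cost(300_000, 50, 60, 0.02)
high = duplicate_storage_cost(300_000, 50, 60, 0.04)
print(f"CHF {low:,.0f} to CHF {high:,.0f}")
```

Under those assumptions the duplicates cost roughly CHF 18,000 to CHF 36,000 over five years, before counting any copies held on backup systems or cloud mirrors.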

The problem has a commercial dimension too. Organisations that license image collections — including Zurich's tourism promotion body Zürich Tourismus and several of the city's larger pharmaceutical communications teams operating out of the Zurich North corridor around Oerlikon — risk paying duplicate licensing fees when the same image appears under different file names or metadata tags in procurement databases. Industry estimates in European digital asset management literature suggest between 15 and 25 percent of enterprise image licensing spend in organisations without active deduplication protocols may be redundant.
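Applied to a budget, the 15 to 25 percent estimate works out as in the short sketch below. The annual spend figure is hypothetical, chosen only to make the range concrete.

```python
def redundant_spend_range(annual_spend_chf, low_pct=15, high_pct=25):
    """Return the (low, high) CHF range of possibly redundant licensing spend."""
    return (annual_spend_chf * low_pct / 100, annual_spend_chf * high_pct / 100)


# Hypothetical annual image licensing budget of CHF 200,000.
low, high = redundant_spend_range(200_000)
```

For that assumed budget, between CHF 30,000 and CHF 50,000 a year could be going to images the organisation already holds.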

What Deduplication Actually Requires — and What It Costs

Solving duplicate image accumulation is not simply a matter of running a cleanup script. The standard technical approach involves perceptual hashing — algorithms that identify visually identical or near-identical images even when file names, formats, or metadata differ. Commercial platforms offering this capability for institutional use typically price enterprise licences in the range of CHF 8,000–25,000 annually for mid-sized repositories, based on current Swiss market pricing from vendors active in the DACH region.
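The idea behind perceptual hashing can be shown with one of its simplest variants, the average hash. The sketch below is a minimal illustration under stated assumptions: it takes a small grid of grayscale values that an image library would already have downscaled, and it is not the algorithm used by any particular vendor. The sample grids are invented.

```python
def average_hash(pixels):
    """Compute a simple average hash from a small grayscale grid.

    `pixels` is a list of rows of 0-255 brightness values, assumed to be
    already downscaled (for example to 8x8) by an image library.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # One bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Illustrative 4x4 grids: an original, a slightly brighter copy,
# and an unrelated pattern.
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
brighter = [[value + 20 for value in row] for row in original]
unrelated = [
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
]

near = hamming_distance(average_hash(original), average_hash(brighter))
far = hamming_distance(average_hash(original), average_hash(unrelated))
```

The brighter copy produces the same hash as the original (distance 0), while the unrelated pattern differs in 8 of 16 bits. In practice a small distance threshold decides which pairs are flagged, and where to set it is a judgment call for each collection.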

Human review remains unavoidable for ambiguous cases, particularly in historical archives where two photographs that appear nearly identical may represent meaningfully distinct moments. The Stadtarchiv's methodology, according to its published 2025–2027 strategic plan, allocates a portion of its digitisation budget specifically to data quality review — though the document does not itemise deduplication as a separate line.

For Zurich's institutions, the practical path forward involves three steps that archivists and digital librarians broadly agree on: establish a consistent file naming and metadata standard before ingesting new material, run perceptual hash audits on existing collections to flag duplicates for human review, and integrate deduplication checks into upload workflows so the problem does not rebuild itself. The Stadtarchiv has indicated it expects its current audit phase to conclude by the end of 2026. The Zentralbibliothek's pilot deduplication project is scheduled for evaluation in the fourth quarter of this year. Neither institution will get the storage budget or the search quality they need until the backlog is cleared — and the numbers show that backlog is larger than most administrators previously assumed.
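The third of those steps, checking for duplicates at upload time, can be sketched as a lookup against an index of files already held. The class below is hypothetical and kept in memory for illustration; it uses exact SHA-256 matching only, so near-duplicates such as rescans would still need a perceptual check and human review.

```python
import hashlib


class IngestIndex:
    """Hypothetical in-memory index of content digests already archived."""

    def __init__(self):
        self._seen = {}  # digest -> name of the first file stored

    def check_and_add(self, name, data):
        """Return the existing file name if `data` is a duplicate, else None."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is None:
            self._seen[digest] = name
        return existing


index = IngestIndex()
first = index.check_and_add("grossmuenster_a.jpg", b"image-bytes")
second = index.check_and_add("grossmuenster_copy.jpg", b"image-bytes")
```

The first upload is accepted and recorded. The second returns the name of the file it duplicates, which a workflow could use to flag the upload for review instead of storing it again.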


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
