The Daily Zurich


Zurich's Duplicate Image Problem: The Numbers That Are Costing the City's Archives Millions

A growing mountain of redundant digital files is quietly draining public storage budgets and slowing down institutions from the Stadtarchiv to ETH Zurich, and the data tells a striking story.

By Zurich News Desk · Published 4 July 2026, 9:06 pm



Zurich's public institutions are collectively storing tens of thousands of duplicate digital images they do not need, a problem that researchers and archivists say has compounded sharply since 2022 and now carries a measurable financial cost. The issue sits at the intersection of rapid digitisation drives, fragmented file management systems, and the sheer volume of photographic content generated by municipal departments, universities, and cultural bodies across the city.

The timing matters. Switzerland's federal digitisation strategy, which set a 2025 target for broad public-sector digital transition, pushed dozens of Zurich-based institutions to scan and upload physical collections at speed. Speed, predictably, bred redundancy. Internal audits at institutions including the Zentralbibliothek Zürich on Zähringerplatz have found that between 18 and 35 percent of image files in large digitisation batches are functionally identical or near-identical duplicates — different filenames, same pixel content.

What the Data Actually Shows

Storage is not free. Enterprise-grade archival storage in Switzerland runs at roughly CHF 0.04 to CHF 0.08 per gigabyte per month for on-premise solutions, and higher for cloud-redundant systems. A single high-resolution scan of an A2 document can reach 80 megabytes uncompressed. Multiply that across a collection of 500,000 images, a modest target for a mid-sized cantonal digitisation project, and the collection occupies about 40 terabytes. At a duplicate rate of 20 percent, roughly 8 terabytes of that is wasted capacity, which at the on-premise rates above costs about CHF 3,800 to CHF 7,700 a year for each stored copy. That figure multiplies with every replica, backup tier, and additional collection an institution maintains.
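
The arithmetic behind those figures can be checked with a short Python sketch. It uses only the numbers quoted above (500,000 images, 80 megabytes each, a 20 percent duplicate rate, CHF 0.04 to CHF 0.08 per gigabyte per month) and assumes decimal units and a single stored copy; the function name is illustrative, not taken from any institution's tooling.

```python
def duplicate_storage_cost(images, mb_per_image, duplicate_rate, chf_per_gb_month):
    """Return (wasted terabytes, annual cost in CHF) for one stored copy."""
    total_gb = images * mb_per_image / 1000   # assumption: 1 GB = 1,000 MB
    wasted_gb = total_gb * duplicate_rate     # share of capacity held by duplicates
    annual_chf = wasted_gb * chf_per_gb_month * 12
    return wasted_gb / 1000, annual_chf


# Low and high ends of the quoted on-premise price range.
for rate in (0.04, 0.08):
    wasted_tb, annual_chf = duplicate_storage_cost(500_000, 80, 0.20, rate)
    print(f"CHF {rate:.2f}/GB/month: {wasted_tb:.1f} TB wasted, "
          f"about CHF {annual_chf:,.0f} a year")
```

Under these assumptions the function returns 8 terabytes and roughly CHF 3,840 to CHF 7,680 a year for one copy. Totals scale linearly with collection size and with the number of copies an archive keeps.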

ETH Zurich, ranked consistently among the world's top ten technical universities, has been developing automated duplicate-detection pipelines through its Data Management Services group. The approach uses perceptual hashing algorithms that can identify near-duplicate images even when files have been resaved at different resolutions or with minor colour corrections — a common occurrence when multiple staff members independently scan the same document. Preliminary internal figures from comparable European university archive projects suggest deduplication can reduce image library sizes by 15 to 40 percent, depending on how the collection was assembled.
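
How a perceptual hash tolerates resizing and small tonal changes can be illustrated with a simplified average hash in plain Python. This is a teaching sketch, not ETH Zurich's pipeline and not the API of any particular library: it assumes the image is already decoded into a grid of greyscale values whose sides are multiples of the hash size, and all function names are hypothetical.

```python
def downscale(pixels, size):
    """Shrink a greyscale grid to size x size by averaging equal blocks."""
    block_h, block_w = len(pixels) // size, len(pixels[0]) // size
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=4):
    """One bit per cell: 1 if the cell is brighter than the grid's mean."""
    flat = [v for row in downscale(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


original = [[10, 20, 200, 210], [15, 25, 205, 215],
            [12, 22, 202, 212], [18, 28, 208, 218]]
brightened = [[v + 10 for v in row] for row in original]      # minor tonal shift
upscaled = [[v for v in row for _ in (0, 1)]                  # same scan at 2x size
            for row in original for _ in (0, 1)]
different = [[200, 210, 205, 215], [202, 212, 208, 218],
             [10, 20, 15, 25], [12, 22, 18, 28]]

print(hamming_distance(average_hash(original), average_hash(brightened)))
print(hamming_distance(average_hash(original), average_hash(upscaled)))
print(hamming_distance(average_hash(original), average_hash(different)))
```

The brightened and upscaled copies hash identically to the original, while the unrelated grid differs in half of its 16 bits. Production tools typically use larger hashes and more robust transforms, but the comparison step works the same way.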

The city's own Stadtarchiv, housed near Neumarkt in the Altstadt, manages records stretching back centuries and has been actively digitising fragile photographic collections since at least 2019. The archive does not publish granular storage statistics, but the challenge is structural: municipal departments submit images independently, without a unified intake protocol, meaning the same photograph of, say, a construction project on Langstrasse can arrive from three separate departments with three different filenames and metadata sets.
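
The simplest version of that problem, the same file arriving under several names, can be caught by hashing file contents instead of comparing filenames. The sketch below uses Python's standard hashlib module; the filenames and byte strings are invented for illustration, and a real intake script would read the bytes from disk.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Return lists of filenames whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for name, data in files.items():
        # Identical bytes always yield the same SHA-256 digest.
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests that appear under more than one filename.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical submissions from three departments plus one unrelated scan.
submissions = {
    "tiefbau_langstrasse_01.tif": b"scan-bytes-A",
    "hochbau_IMG_4471.tif": b"scan-bytes-A",
    "presse_baustelle.tif": b"scan-bytes-A",
    "tiefbau_langstrasse_02.tif": b"scan-bytes-B",
}
duplicates = group_exact_duplicates(submissions)
print(duplicates)
```

This only finds byte-identical files. If a department embeds its own metadata in the file, or resaves the image, the digests differ, and a perceptual comparison of the pixel content is needed instead.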

What Happens Next — and What It Costs to Fix It

Licence costs for automated deduplication software vary widely. Open-source tools such as those built on the Python imagehash library cost nothing to deploy but require technical staff to configure and maintain. Commercial solutions marketed to archival institutions carry annual licence fees that typically start around CHF 8,000 and scale with collection size. For the city's larger cultural bodies (the Kunsthaus Zürich on Heimplatz, for instance, is midway through a multi-year digitisation of its graphic arts collection), the investment calculus is straightforward: a one-time deduplication project almost always pays for itself within two years through reduced storage procurement.

The practical advice from digital preservation specialists is consistent: do not wait for a collection to reach critical mass before running deduplication checks. Running a perceptual hash comparison at the point of file ingest — before images are catalogued and cross-referenced — costs a fraction of the processing time required to untangle a mature archive. Several Swiss cantonal libraries adopted this intake-stage approach after a 2023 working group convened by the Schweizerische Nationalbibliothek in Bern recommended standardised ingest protocols for digitisation grants.
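
An intake-stage check of this kind can be sketched as a small index that compares each incoming file's perceptual hash against those already accepted. The class name, the integer hash format, and the distance threshold are all assumptions made for illustration; the hashes would come from whatever perceptual hashing tool an institution has chosen.

```python
class IngestIndex:
    """Flags incoming files whose hash is close to one already ingested."""

    def __init__(self, max_distance=2):
        self.max_distance = max_distance  # assumed tolerance, in differing bits
        self.known = {}                   # filename -> integer perceptual hash

    def check_and_add(self, name, image_hash):
        """Return the name of an earlier near-duplicate, or None if accepted."""
        for other_name, other_hash in self.known.items():
            if bin(image_hash ^ other_hash).count("1") <= self.max_distance:
                return other_name         # flag it; do not store the new file
        self.known[name] = image_hash
        return None


index = IngestIndex(max_distance=2)
first = index.check_and_add("scan_001.tif", 0b0011001100110011)
second = index.check_and_add("scan_001_copy.tif", 0b0011001100110111)
third = index.check_and_add("scan_002.tif", 0b1111111100000000)
print(first, second, third)
```

Each new file is compared only against what is already in the index, which is why checking at ingest is cheaper than comparing every pair in a finished archive. The linear scan shown here is the simplest option; a large collection would likely need a faster lookup structure.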

For Zurich's institutions, the window to act is narrowing. Storage hardware procurement cycles typically run on three-to-five-year contracts, and several major city digitisation grants are approaching their final reporting phases in 2026 and 2027. Once collections are closed and handed to long-term preservation systems, retroactive deduplication becomes significantly more expensive and technically complicated. The numbers argue for urgency. The archives have heard the argument before. The difference now is that the cost of inaction is finally visible in the budget lines.


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
