The Daily Zurich

Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Sobering Story

From ETH Zurich's research libraries to the city's own municipal photo collections, the hidden cost of duplicate digital images is measurable, mounting, and finally getting serious institutional attention.

By Zurich News Desk · Published 4 July 2026, 8:48 pm

Photo: Mâide Arslan on Pexels

Zurich's public and academic institutions collectively store an estimated 40 to 60 percent of their digital image archives as exact or near-exact duplicates, according to data-management researchers at ETH Zurich's Institute for Information Security, which has been studying redundancy in Swiss institutional storage systems since 2023. That figure, unremarkable in isolation, translates into something harder to ignore when you put a price on it: server infrastructure costs for redundant image data across Swiss federal institutions alone run into the tens of millions of francs annually.

The issue matters now because several converging pressures have pushed it from a technical nuisance to a genuine fiscal and governance concern. The Swiss Confederation's federal archive digitisation programme, which accelerated after 2021, has flooded municipal and cantonal repositories with newly scanned material. Zurich's own Stadtarchiv on Neumarkt and the Zentralbibliothek Zürich on Zähringerplatz have both expanded their digital holdings substantially in the past three years. When scanning projects proceed without rigorous deduplication protocols, the same historical photograph can end up stored under four or five different filenames, in multiple resolution variants, across separate departmental servers.

What the Storage Bills Actually Show

Storage is cheap, until it isn't. A single high-resolution archival scan of a 19th-century print runs between 80 and 120 megabytes in TIFF format. Multiply the low end of that range by the Zentralbibliothek's publicly stated figure of over 3.5 million digitised items, apply a conservative 35 percent duplication rate, and the redundant data load exceeds 98 terabytes in raw terms. Cloud and on-premises hybrid storage at institutional rates in Switzerland typically costs between CHF 8 and CHF 15 per terabyte per month, depending on contract tier and redundancy requirements. The arithmetic is uncomfortable.
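For readers who want to check the sums, the estimate can be reproduced in a few lines of Python. This is a rough sketch that uses only the figures quoted above; it assumes decimal units (1 terabyte = 1,000,000 megabytes) and treats every digitised item as a single TIFF scan, which is a simplification.

```python
# Back-of-envelope estimate built only from the figures quoted in the article.
ITEMS = 3_500_000          # digitised items (publicly stated figure)
DUPLICATION_RATE = 0.35    # the "conservative" rate used in the text
MB_PER_TB = 1_000_000      # assumption: decimal units, 1 TB = 1,000,000 MB


def redundant_terabytes(mb_per_scan: float) -> float:
    """Redundant storage in TB if every item were one scan of this size."""
    return ITEMS * mb_per_scan * DUPLICATION_RATE / MB_PER_TB


def monthly_cost_chf(terabytes: float, chf_per_tb: float) -> float:
    """Monthly storage cost in CHF at a given per-terabyte rate."""
    return terabytes * chf_per_tb


low_tb = redundant_terabytes(80)     # 80 MB per scan, low end of the range
high_tb = redundant_terabytes(120)   # 120 MB per scan, high end of the range

print(f"Redundant data: {low_tb:.0f} to {high_tb:.0f} TB")
print(
    f"Monthly cost: CHF {monthly_cost_chf(low_tb, 8):,.0f} "
    f"to CHF {monthly_cost_chf(high_tb, 15):,.0f}"
)
```

With these inputs the redundant load comes out at roughly 98 to 147 terabytes, depending on which end of the file-size range is used.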

ETH Zurich's Computer Science department released a working paper in March 2026 examining perceptual hashing algorithms — tools that identify visually identical or near-identical images even when file metadata differs. The paper tested the approach against a sample dataset from a Swiss cantonal archive and found that automated deduplication reduced storage consumption by 31 percent in a controlled environment, with a false-positive rate of under 0.4 percent. That precision matters enormously in archival contexts, where accidentally flagging two genuinely distinct but visually similar photographs as duplicates and deleting one would represent an irreversible cultural loss.
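The article does not say which perceptual hashing algorithm the working paper tested. As an illustration of the general idea only, the sketch below implements a difference hash (dHash), one widely used variant, on plain grayscale pixel grids. The 5-bit threshold and the synthetic test images are assumptions made for the example, not values from the paper.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(img: Gray, width: int, height: int) -> Gray:
    """Nearest-neighbour resize; production tools use a smoothing filter."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per adjacent-pixel comparison (size*size bits)."""
    small = shrink(img, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Hypothetical cut-off: at most `threshold` differing bits out of 64."""
    return hamming(dhash(a), dhash(b)) <= threshold


# Synthetic images, not archive data: a gradient, a slightly brightened
# copy of it, and a mirrored version that looks quite different.
original = [[255 - 3 * x for x in range(64)] for _ in range(64)]
brighter = [[min(255, p + 5) for p in row] for row in original]
mirrored = [row[::-1] for row in original]

print(is_near_duplicate(original, brighter))  # True
print(is_near_duplicate(original, mirrored))  # False
```

Because the hash records only whether each pixel is brighter than its neighbour, a uniform brightness change leaves it untouched, while the two files would differ byte for byte. The threshold is where the false-positive risk is managed: a looser cut-off catches more variants but raises the chance of pairing genuinely distinct photographs.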

The city of Zurich's own IT services division, Informatik Stadt Zürich on Hagenholzstrasse in Oerlikon, began a pilot deduplication review of the municipal photo database in January 2026. The project covers roughly 1.2 million images collected by city departments between 1995 and 2024, ranging from construction permits to public event documentation. Early internal assessments, cited in a canton-level digital governance report published in April 2026, put the duplication rate in that specific collection at 44 percent.

What Comes Next for Institutions and Individuals

The practical response is taking shape along two tracks. At the institutional level, the Swiss Federal Archives in Bern issued updated guidelines in May 2026 requiring all federally funded digitisation projects above CHF 500,000 to include a deduplication audit as a funded deliverable, not an optional add-on. Zurich's cantonal cultural institutions are expected to align with those standards by the end of 2026.

For smaller organisations — the dozens of neighbourhood historical societies, professional photographers, and design studios clustered around Zurich West's Kreis 5 and the creative businesses near Viadukt — the picture is less structured. Open-source tools such as dupeGuru and rmlint handle consumer-scale deduplication at no cost, and several Zurich-based IT consultancies operating out of the Technopark on Technoparkstrasse now offer archival audits starting at CHF 1,200 for collections under 500 gigabytes.
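The simplest case, byte-identical copies stored under different filenames, can be found by grouping files by a content hash. The sketch below shows that general technique using Python's standard library. It is not a description of how dupeGuru or rmlint work internally, and the filenames in the demo are invented for illustration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` that share identical contents."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


# Demo with invented filenames: two identical files and one distinct file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bridge_1910.tif").write_bytes(b"same pixels")
    (root / "scan_0042.tif").write_bytes(b"same pixels")
    (root / "market_1925.tif").write_bytes(b"other pixels")
    duplicates = find_exact_duplicates(root)
    duplicate_names = [[p.name for p in group] for group in duplicates]

print(duplicate_names)  # [['bridge_1910.tif', 'scan_0042.tif']]
```

This approach only reports candidates and deletes nothing. It also misses resized or re-encoded variants of the same image, which is the gap perceptual hashing is meant to fill.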

The underlying message from the data is simple: digital storage feels free until an institution reaches the scale at which redundancy becomes a budget line. Zurich's archival institutions are already past that threshold. The question for 2026 and beyond is whether deduplication becomes standard practice from the moment of ingestion, or whether the audit bills keep compounding.


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
