The Daily Zurich

Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

A quiet data crisis is consuming storage budgets and slowing workflows across the city's cultural institutions, hospitals and public agencies.

By Zurich News Desk · Published 4 July 2026, 9:16 pm

Zurich's public institutions collectively store what is estimated to be tens of millions of digital image files across their servers, and a growing body of evidence from archival and IT audits suggests that between 20 and 40 percent of those files are exact or near-exact duplicates. The problem is not new, but the cost is becoming impossible to ignore.

The duplicate image problem — files copied, re-uploaded, mislabelled and stored redundantly across departments — is a direct consequence of rapid digitisation drives that accelerated after 2018, when federal mandates pushed cantonal administrations to migrate paper records into digital repositories. The urgency of that push left little room for deduplication protocols, and the backlog has been compounding ever since.

What the Numbers Actually Look Like

Stadtarchiv Zürich, the city's central public records office on Neumarkt, manages photographic and document collections running into the millions of assets. Archival IT specialists working with similar institutions across German-speaking Europe have found duplication rates in comparable civic repositories ranging from 22 percent to 38 percent of total stored image volume, according to findings published by the European Commission's digital preservation working group in March 2025. At those rates, a 10-terabyte image repository effectively wastes between 2.2 and 3.8 terabytes of paid storage capacity.

Storage is not cheap. Enterprise-grade cold storage for public institutions in Switzerland runs at roughly CHF 80 to CHF 120 per terabyte per month when infrastructure, licensing and redundancy costs are factored in, according to pricing benchmarks from Swiss IT trade body ICTswitzerland. A mid-sized cantonal department sitting on 50 terabytes of image data, not unusual for a health or planning authority, could theoretically reclaim roughly CHF 880 to CHF 2,280 per month simply by running a systematic deduplication pass: that is 11 to 19 terabytes of duplicates, at the 22 to 38 percent rates cited above, priced at CHF 80 to CHF 120 per terabyte.
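The arithmetic behind estimates of this kind is simple enough to sketch in a few lines of Python. The example below is illustrative only: the function names are hypothetical, and the inputs are the 10-terabyte repository, the 22 to 38 percent duplication rates and the CHF 80 to CHF 120 price range quoted in this article, not figures from any specific institution.

```python
def wasted_storage_tb(total_tb, dup_rate):
    """Terabytes occupied by duplicates at a duplication rate between 0 and 1."""
    return total_tb * dup_rate


def savings_range(total_tb, rate_low, rate_high, price_low, price_high):
    """Return (lowest, highest) monthly cost in CHF of storing the duplicates.

    The low bound pairs the low duplication rate with the low price,
    the high bound pairs the high rate with the high price.
    """
    low = wasted_storage_tb(total_tb, rate_low) * price_low
    high = wasted_storage_tb(total_tb, rate_high) * price_high
    return low, high


if __name__ == "__main__":
    # 10 TB repository, 22-38 % duplicates, CHF 80-120 per TB per month
    low, high = savings_range(10, 0.22, 0.38, 80, 120)
    print(f"CHF {low:.0f} to CHF {high:.0f} per month")
    # prints "CHF 176 to CHF 456 per month"
```

Any other repository size can be passed in as the first argument. The result is an upper bound on savings, since it assumes every duplicate can actually be removed.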

The University Hospital Zurich, Universitätsspital Zürich on Rämistrasse, is among the institutions grappling most visibly with the issue. Medical imaging workflows generate DICOM files, scan previews and administrative photographs at scale. While clinical imaging systems have long had deduplication tools built in, the ancillary administrative image repositories — staff photos, equipment records, construction documentation for the ongoing campus expansion — remain fragmented and largely unmanaged.

ETH Zurich's IT Services division published internal guidance in January 2026 recommending that research groups conduct bi-annual deduplication audits of their image storage, citing both cost efficiency and data integrity concerns. The guidance noted that duplicate files complicate version control and increase the risk of researchers working from outdated image versions — a reproducibility problem as much as a storage one.

Why Deduplication Is Harder Than It Sounds

The straightforward fix is to run a hash-matching algorithm that identifies byte-identical files and then delete the redundant copies. That handles exact duplicates cleanly. Near-duplicates are far more troublesome. A photograph cropped slightly differently, or a scanned document saved at two different resolutions, will not match on a hash check, because changing even a single byte of a file produces a different hash. Perceptual hashing tools, which compare images visually rather than byte-for-byte, catch more of these cases but require human review queues to handle edge cases, adding labour costs that offset some of the storage savings.
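A minimal sketch of the exact-match approach, using only Python's standard library, might look like the following. It is an illustration under stated assumptions rather than any institution's actual tooling: the function names are hypothetical, SHA-256 is one reasonable choice of hash among several, and the sketch only reports duplicate groups without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large images do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Return groups (lists of paths) of byte-identical files under root."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_digest[file_digest(p)].append(p)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Grouping by file size first avoids hashing files that cannot possibly match, which matters on repositories holding millions of assets.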

Zurich's Stadtbibliothek, which runs digitisation partnerships with the Zentralbibliothek on Zähringerplatz, has been piloting a perceptual deduplication workflow since autumn 2025. The pilot targets its historical postcard and photograph collections, where decades of partial digitisation efforts by different contractors left overlapping image sets with inconsistent metadata.
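To show what perceptual comparison involves in principle, here is a minimal sketch of one well-known technique, the average hash. It is not a description of the Stadtbibliothek pilot, whose tooling is not detailed here. The sketch assumes each image has already been decoded into a grid of grayscale values (a list of rows), and the function names and the distance threshold of 5 are illustrative assumptions.

```python
def average_hash(pixels, hash_size=8):
    """Average hash of a grayscale image given as a 2D list of values.

    The image is reduced to hash_size x hash_size cells by block averaging;
    each cell contributes one bit: 1 if brighter than the mean cell, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0 = r * height // hash_size
        r1 = max((r + 1) * height // hash_size, r0 + 1)
        for c in range(hash_size):
            c0 = c * width // hash_size
            c1 = max((c + 1) * width // hash_size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_like_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """True if the two images' hashes differ in at most max_distance bits.

    The threshold is an assumption; borderline pairs would go to human review.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

Because the hash is computed from a heavily downsampled version of the image, the same picture saved at two resolutions tends to produce the same or nearly the same bits, which is exactly the case a byte-level hash misses.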

The practical upshot for organisations facing similar challenges is to be methodical: audit before spending. Institutions should run a full inventory using open-source tools such as dupeGuru or fdupes before commissioning expensive enterprise solutions. Establishing a single canonical image repository with clear ingestion rules prevents the duplication problem from rebuilding itself after cleanup. And setting file-naming conventions tied to creation date, source department and version number, rather than allowing operating systems to auto-generate filenames, remains the cheapest long-term defence. The numbers argue plainly that the cost of inaction compounds every month a deduplication programme is deferred.
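A naming convention of the kind described above could be enforced at ingestion with a small helper. The sketch below is one possible scheme, not an established standard: the function name, the field order, the underscore separators and the extra description field are all assumptions made for illustration.

```python
import re
from datetime import date


def canonical_filename(created, department, description, version, extension):
    """Build a filename from creation date, source department and version.

    Hypothetical layout: YYYY-MM-DD_department_description_vNN.ext
    Text fields are lowercased and reduced to ASCII letters, digits and
    hyphens, so characters such as umlauts are replaced by hyphens.
    """
    def slug(text):
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lower().lstrip(".")
    return (
        f"{created.isoformat()}_{slug(department)}_"
        f"{slug(description)}_v{version:02d}.{ext}"
    )


if __name__ == "__main__":
    name = canonical_filename(
        date(2026, 7, 4), "Planning Dept", "Site survey north wing", 3, ".JPG"
    )
    print(name)
    # prints "2026-07-04_planning-dept_site-survey-north-wing_v03.jpg"
```

Putting the ISO date first means files sort chronologically in any file browser, and the explicit version suffix makes it obvious which copy is current.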

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
