Somewhere inside the digital holdings of Zurich's Stadtarchiv on Alfred-Escher-Strasse, the same photograph exists in multiple versions. Different file names. Different upload dates. Identical pixels. It is not a unique problem — but the scale, measured for the first time across several Swiss institutions this year, is larger than administrators expected.
A working report circulated among digital asset managers at ETH Zurich's library services division in the first quarter of 2026 found that duplicate or near-duplicate image files can account for between 18 and 34 percent of total stored image assets in large institutional repositories. That range, drawn from benchmark studies of European academic and municipal archives, has prompted renewed urgency in Zurich, where storage costs and indexing integrity are both under pressure.
The timing matters for several reasons. Swiss federal data-retention law requires public bodies to preserve digitised records for a minimum of ten years, which means redundant files compound over time. A file duplicated in 2016 and never cleaned up has now generated a decade of unnecessary backup cycles, version-control confusion and catalogue noise. For institutions already managing the reputational and operational complexity that followed the UBS-Credit Suisse merger — which forced a sweeping consolidation of overlapping digital document systems across the two banks' Swiss operations — the appetite to finally quantify the duplication problem has grown sharply.
The Numbers Driving the Conversation
Storage costs in Switzerland are not trivial. Enterprise-grade cloud storage provisioned through Swiss-domiciled providers (required under cantonal data-sovereignty guidelines for public bodies) runs at roughly CHF 0.03 to CHF 0.06 per gigabyte per month for cold-archive tiers. That sounds modest until the duplication rate is applied. An institution holding 40 terabytes of image assets and carrying a 25 percent duplication rate is paying for 10 terabytes, or 10,000 gigabytes, of redundant data every single month. At the mid-range rate of CHF 0.045 per gigabyte per month, that is CHF 450 a month, or approximately CHF 5,400 per year in pure storage waste, before accounting for processing, bandwidth and manual curation labour.
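The storage arithmetic is easy to rerun for other archive sizes. The following is a minimal sketch, not a costing tool: the function name is illustrative, terabytes are assumed to be decimal (1 TB = 1,000 GB), and only the per-gigabyte storage fee is counted.

```python
def annual_redundant_cost_chf(total_tb, duplication_rate, chf_per_gb_month):
    """Yearly storage cost (CHF) of the duplicated share of an archive.

    Assumes decimal units (1 TB = 1,000 GB) and a flat monthly per-GB rate.
    """
    redundant_gb = total_tb * 1000 * duplication_rate
    return redundant_gb * chf_per_gb_month * 12


# The 40 TB, 25 percent example at the low, mid and high cold-archive rates.
for rate in (0.03, 0.045, 0.06):
    cost = annual_redundant_cost_chf(40, 0.25, rate)
    print(f"CHF {rate:.3f}/GB/month -> CHF {cost:,.0f} per year")
```

Across the quoted price range, the same 10 terabytes of redundant data costs roughly CHF 3,600 to CHF 7,200 a year in storage alone.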
Zurich Tourismus, which maintains one of the city's largest commercially active image libraries for destination marketing, declined to provide internal figures. But the organisation's digital asset workflows, described in a 2025 presentation at the Swiss Digital Communications Forum held at the Kongresshaus on Gotthardstrasse, pointed to the challenge of managing thousands of near-identical shoot variations from locations including Lindenhügel, the Niederdorf and the lakefront promenades. When a photographer delivers 400 bracketed exposures of the same Limmat view, only a fraction are unique images by any algorithmic definition.
Deduplication software has existed for years, but adoption in Swiss public institutions has been uneven. The technology works by generating a perceptual hash, a compact numerical fingerprint, for each image. Two photographs whose hashes fall within a defined distance of each other, typically measured as the number of differing bits, are flagged as probable duplicates. Modern tools can process roughly 100,000 images per hour on mid-range server hardware, which works out to about four million images in a 40-hour working week. For an archive the size of the Stadtarchiv's digitised photographic collection, a full first-pass scan could in theory be completed in a single working week.
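To make the fingerprinting idea concrete, here is a simplified sketch of one well-known perceptual hash, the average hash. It is not the method of any particular product mentioned in this article. It assumes each image has already been reduced to a small grayscale grid (8 by 8 here), which real tools do with an image library, and the threshold of 5 differing bits is an arbitrary illustrative value.

```python
def average_hash(pixels):
    """Fingerprint a grayscale grid: one bit per pixel, set if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def probable_duplicates(pixels_a, pixels_b, max_distance=5):
    """Flag two grids as probable duplicates if their hashes are close.

    max_distance is an assumed threshold chosen for illustration.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Synthetic 8x8 test grids: a gradient, a slightly brighter copy, and a negative.
original = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brighter = [[value + 3 for value in row] for row in original]
inverted = [[252 - value for value in row] for row in original]

print(probable_duplicates(original, brighter))  # slightly brightened copy
print(probable_duplicates(original, inverted))  # tonally reversed image
```

Because the hash records only which pixels are brighter than the image's own average, a uniformly brightened copy keeps the same fingerprint, while the negative flips every bit. That tolerance is what lets such tools catch re-exports and minor edits that a byte-for-byte comparison would miss.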
What Institutions Are Actually Doing About It
ETH Zurich's library and scientific IT units have been piloting a structured deduplication protocol since January 2026, targeting the research image repositories attached to several engineering and life-sciences departments. The pilot, scheduled to run through October, is designed to produce a replicable methodology that smaller cantonal institutions can adapt. Early internal results — not yet published — are understood to show duplication rates at the lower end of the benchmark range in curated scientific collections, and at the higher end in general administrative photo stores.
For organisations that want to act now, the practical path is straightforward: audit before deleting. Automated tools identify candidates; human review confirms which version is canonical before any file is removed. Provenance metadata (date, creator, original file path) must be preserved even when the image itself is retired. The Zurich-based data governance consultancy sector has seen growing demand for exactly this kind of structured remediation work through the first half of 2026, with several firms around the Escher-Wyss district reporting more project inquiries than in the same period last year.
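The audit-before-deleting rule can be expressed as a small planning step that never touches the files themselves. The sketch below is one possible shape for it, under stated assumptions: the record fields mirror the provenance items named above, while the class name, function name, `superseded_by` field and file paths are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ImageRecord:
    path: str     # original file path
    creator: str
    created: str  # ISO date string


def plan_retirement(candidates, canonical_path):
    """Turn a human-reviewed duplicate group into a keep/retire plan.

    Nothing is deleted here. Each retired file yields a provenance entry
    that records its metadata and which file supersedes it.
    """
    keep = next((r for r in candidates if r.path == canonical_path), None)
    if keep is None:
        raise ValueError("canonical file must be one of the reviewed candidates")
    retired = [
        {**asdict(record), "superseded_by": canonical_path}
        for record in candidates
        if record.path != canonical_path
    ]
    return keep, retired


# Hypothetical duplicate group flagged by an automated scan.
group = [
    ImageRecord("scans/2016/limmat_001.tif", "Photographer A", "2016-05-12"),
    ImageRecord("uploads/limmat_copy.tif", "Photographer A", "2016-05-12"),
    ImageRecord("web/limmat_final.tif", "Photographer A", "2016-06-01"),
]
keep, retired = plan_retirement(group, "scans/2016/limmat_001.tif")
print(keep.path, len(retired))
```

Separating the plan from the deletion means the provenance log can be reviewed and archived before any storage is reclaimed.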
The deeper payoff is not just money saved on storage. Clean, deduplicated image libraries produce more accurate search results, reduce the risk of licensing errors when the same image is unknowingly relicensed multiple times, and make eventual migration to new platforms far less painful. For Zurich's institutions, the audit is overdue. The numbers make the case on their own.