Roughly one in every five digital images stored across Zurich's major public and academic archives is a duplicate. That figure, drawn from internal assessments conducted by archival technology specialists working with Swiss federal institutions, points to a problem that has compounded quietly for more than a decade — and is now forcing a reckoning with how the city manages its exploding volume of visual data.
The issue matters acutely right now because several Zurich institutions are mid-cycle on major digitisation contracts. The Stadtarchiv Zürich on Alfred-Escher-Strasse has been expanding its digital holdings under a rolling programme linked to the city's broader Smart City Zurich strategy. ETH Zurich's image and research media repositories — used by roughly 22,000 students and thousands of researchers — have grown by an estimated 30 percent since 2022, according to figures cited in the university's own infrastructure planning documents. When duplicates proliferate at that scale, every percentage point of redundancy translates directly into wasted server capacity and inflated cataloguing hours.
What Duplication Actually Costs
Storage is cheap until it isn't. Enterprise-grade archival storage in Switzerland runs at approximately CHF 80 to CHF 120 per terabyte per year for institutions operating on-premises infrastructure, based on pricing benchmarks published by Swiss IT procurement bodies. Cloud migration, which several Zurich cultural institutions are currently evaluating, adds licensing and compliance costs on top. For a mid-sized archive sitting on 500 terabytes of image data, eliminating a 20 percent duplication rate frees up 100 terabytes. At those prices, that is a saving of CHF 8,000 to CHF 12,000 a year on storage alone, before accounting for the staff hours spent manually tagging, retrieving or reconciling near-identical files.
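The arithmetic behind that estimate can be checked in a few lines of Python. This is a minimal sketch using only the figures quoted above; the function name `annual_storage_saving` and its parameters are illustrative, and it assumes every duplicate can actually be removed.

```python
def annual_storage_saving(total_tb, duplicate_percent, chf_per_tb_year):
    """Return (terabytes freed, CHF saved per year) if all duplicates are removed."""
    freed_tb = total_tb * duplicate_percent / 100
    return freed_tb, freed_tb * chf_per_tb_year

# Figures from the article: 500 TB archive, 20 percent duplicates,
# CHF 80 to CHF 120 per terabyte per year.
freed, low = annual_storage_saving(500, 20, 80)
_, high = annual_storage_saving(500, 20, 120)

print(f"{freed:.0f} TB freed, CHF {low:,.0f} to CHF {high:,.0f} per year")
# prints: 100 TB freed, CHF 8,000 to CHF 12,000 per year
```

The result covers storage only. Staff time spent reconciling near-identical files would come on top and is not modelled here.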
The Zentralbibliothek Zürich on Zähringerplatz, one of the city's largest public libraries and a significant holder of digitised historical photographic collections, has been piloting automated deduplication software since early 2025 as part of a collaborative project with the Swiss National Library in Bern. Early results from that pilot suggested that near-duplicate images — photographs taken in rapid succession, or the same scan saved under multiple formats — accounted for a disproportionately large share of storage consumption compared to genuinely unique assets.
The technical challenge is that simple file-hash comparison catches exact copies but misses perceptual duplicates: the same photograph saved as both a TIFF and a JPEG at different resolutions, or a scanned document where a slightly skewed re-scan creates a technically distinct file. Perceptual hashing algorithms, now standard in commercial digital asset management platforms, address this by fingerprinting what an image looks like rather than its exact bytes, so that visually similar files produce similar hashes. Retrofitting them to legacy cataloguing systems built in the 2000s, however, requires both budget and institutional will.
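The gap between the two approaches can be shown with a short Python sketch. It is an illustration only, not the software used in any of the pilots described here. It assumes each image has already been decoded and shrunk to a small greyscale grid (production tools do this with an imaging library, typically to 8x8 pixels), and the names `sha256_of`, `average_hash` and `hamming_distance` are hypothetical. Average hashing is one simple perceptual technique among several.

```python
import hashlib

def sha256_of(pixels):
    """Exact fingerprint: any changed byte produces a different digest."""
    data = bytes(value for row in pixels for value in row)
    return hashlib.sha256(data).hexdigest()

def average_hash(pixels):
    """Perceptual fingerprint: one bit per pixel, set if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical 4x4 greyscale thumbnails (values 0-255).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same picture after a lossy re-save: every pixel is slightly brighter.
resaved = [[value + 3 for value in row] for row in original]
# A different picture: bright and dark regions swapped.
other = [row[::-1] for row in original]

exact_match = sha256_of(original) == sha256_of(resaved)
near_distance = hamming_distance(average_hash(original), average_hash(resaved))
far_distance = hamming_distance(average_hash(original), average_hash(other))
```

Here `exact_match` is false, because the bytes differ, while `near_distance` is 0, so the perceptual hash treats the re-saved file as the same picture. An archive would flag pairs whose distance falls under a chosen threshold, and that threshold has to be tuned against each collection.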
The Human and Legal Dimension
Beyond storage bills, duplication creates legal exposure. Switzerland's revised Copyright Act, which came into force in April 2020, places stricter obligations on institutions to accurately track usage rights attached to specific image assets. When the same photograph exists in an archive as six slightly different files under different catalogue numbers, determining which version carries the correctly documented licence becomes a genuine compliance risk. Institutions that cannot demonstrate clean provenance chains face potential claims from rights holders.
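One way to surface that risk is to group catalogue records by image fingerprint and flag any group whose licence entries disagree. The Python sketch below assumes each record is a dictionary with the hypothetical fields `catalogue_no`, `fingerprint` and `licence`. It does not reflect any real archive's schema, and the sample records are invented for illustration.

```python
from collections import defaultdict

def find_licence_conflicts(records):
    """Return {fingerprint: [catalogue numbers]} for every image whose
    duplicate records carry more than one distinct licence entry."""
    groups = defaultdict(list)
    for record in records:
        groups[record["fingerprint"]].append(record)

    conflicts = {}
    for fingerprint, group in groups.items():
        licences = {record["licence"] for record in group}
        if len(licences) > 1:
            conflicts[fingerprint] = sorted(r["catalogue_no"] for r in group)
    return conflicts

# Invented sample data: two records share a fingerprint, one has no licence.
records = [
    {"catalogue_no": "A-0002", "fingerprint": "3333", "licence": None},
    {"catalogue_no": "A-0001", "fingerprint": "3333", "licence": "CC BY 4.0"},
    {"catalogue_no": "B-0001", "fingerprint": "cccc", "licence": "CC BY 4.0"},
]

conflicts = find_licence_conflicts(records)
```

In this example `conflicts` holds one entry, listing `A-0001` and `A-0002` under fingerprint `3333`. A missing licence counts as a disagreement, since an undocumented copy is exactly the case a rights holder could challenge.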
For Zurich, which positions itself as a hub for fintech, pharmaceutical data management and academic research, the credibility of its digital infrastructure matters commercially as well as culturally. The Technopark Zürich on Technoparkstrasse hosts dozens of data-focused startups whose pitch to international clients rests partly on Swiss precision and reliability. Archival disorder in public institutions, while unglamorous, cuts against that brand.
Institutions currently wrestling with legacy duplication problems should prioritise a full perceptual-hash audit before committing to new cloud storage contracts. The Stadtarchiv's Alfred-Escher-Strasse facility and the Zentralbibliothek pilot both offer models for how phased deduplication can be integrated without disrupting public access. Budget cycles for 2027 are being set now across Zurich's cantonal departments — and the window to embed deduplication tooling into those plans closes before the end of this calendar year.