Switzerland's federal data infrastructure has a clutter problem. Across Zurich's public-sector digital systems — from the Staatsarchiv on Winkelriedstrasse to the sprawling image databases maintained by ETH Zurich's library services — duplicate image files now account for an estimated 18 to 23 percent of total stored visual data, according to benchmarks published by the European Digital Preservation Coalition in its 2025 annual report. That fraction sounds modest. At scale, it translates into hundreds of terabytes of redundant storage that institutions are actively paying to maintain.
The timing matters. The Canton of Zurich's IT directorate, Amt für Informatik, has been consolidating legacy systems since early 2025 as part of a broader digital modernisation push tied to the cantonal budget cycle running through 2027. Duplicate image data — created when archivists, researchers, and administrative staff upload the same scanned document or photograph multiple times across disconnected platforms — has emerged as one of the most persistent and least glamorous obstacles in that effort.
What the Numbers Actually Show
Storage is not cheap in Switzerland. Enterprise-grade data hosting in the Swiss market runs between CHF 0.04 and CHF 0.12 per gigabyte per month, depending on redundancy and compliance requirements, figures consistent with pricing published by Swiss cloud infrastructure providers including Swisscom and Init7. For a mid-sized cantonal archive holding 500 terabytes of image data, with one-fifth of it duplicated, roughly 100 terabytes are redundant, and at those rates the unnecessary monthly overhead comes to between CHF 4,000 and CHF 12,000. Annualised, that is CHF 48,000 to CHF 144,000: money that could otherwise fund digitisation of physical collections still sitting in acid-free boxes at the Zentralbibliothek Zürich on Zähringerplatz.
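As a rough check on figures like these, the cost of the duplicated share can be worked out directly. The sketch below is illustrative only: it assumes decimal units (1 TB = 1,000 GB) and the per-gigabyte price range quoted above, and the function name and its parameters are hypothetical rather than taken from any institution's tooling.

```python
from decimal import Decimal

GB_PER_TB = 1_000  # assumption: decimal units, not 1,024


def duplicate_overhead_chf(total_tb, duplicate_share, chf_per_gb_month):
    """Return (monthly, annual) cost in CHF of storing the duplicated share."""
    duplicate_gb = Decimal(total_tb) * GB_PER_TB * Decimal(duplicate_share)
    monthly = duplicate_gb * Decimal(chf_per_gb_month)
    return monthly, monthly * 12


# 500 TB archive, one-fifth duplicated, at both ends of the quoted price range.
for rate in ("0.04", "0.12"):
    monthly, annual = duplicate_overhead_chf(500, "0.2", rate)
    print(f"CHF {rate}/GB/month: CHF {monthly:,.0f} per month, CHF {annual:,.0f} per year")
```

Because the result scales linearly with the price per gigabyte, the storage tier an archive actually pays for matters as much as its duplication rate.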
ETH Zurich's Research Collection, one of the largest open-access institutional repositories in the German-speaking world, processed more than 1.2 million file uploads in 2024 according to its published repository statistics. Duplication rates in academic repositories globally tend to run between 12 and 31 percent for image-heavy collections, according to a 2024 study in the Journal of Digital Curation. Applied to ETH's volume at the conservative 12 percent end of that range, that implies well over 100,000 redundant image files may exist within a single repository at any given time.
The problem compounds when multiple departments work in parallel. The city of Zurich's own Stadtarchiv, based in the Neumarkt district, operates separately from cantonal infrastructure. Cross-system deduplication — matching a file held in one database against a copy in another — requires either shared metadata standards or active hash-comparison tools that most institutions have not yet deployed at scale.
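In its simplest form, hash comparison across two systems means indexing every file by a cryptographic digest and intersecting the two indexes. The sketch below is a minimal illustration, assuming each system's files are reachable as a directory tree; the function names and the directory-based setup are hypothetical, and it only catches byte-identical copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_hash(root):
    """Map each SHA-256 digest to the list of files under root that have it."""
    index = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            index.setdefault(sha256_of(path), []).append(path)
    return index


def cross_system_duplicates(root_a, root_b):
    """Return digests present in both trees, with the matching paths on each side."""
    index_a = index_by_hash(root_a)
    index_b = index_by_hash(root_b)
    shared = index_a.keys() & index_b.keys()
    return {digest: (index_a[digest], index_b[digest]) for digest in shared}
```

In practice the two indexes would more likely be exchanged as lists of digests than built from a shared filesystem, which is why agreed metadata standards matter as much as the hashing itself.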
Detection Tools and What Comes Next
Several Swiss institutions have begun piloting perceptual hashing software, which generates a compact numerical fingerprint for each image and flags near-identical files even when filenames differ. The approach differs from simple checksum matching in that it can catch duplicates that have been lightly compressed, resized, or had metadata stripped — the most common real-world scenarios. Zurich-based IT consultancy Netcetera, headquartered in Altstetten, has been involved in digital archiving infrastructure projects for Swiss public-sector clients, though the specifics of individual contracts are not publicly disclosed.
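To show how such a fingerprint works, here is a minimal sketch of one of the simplest perceptual hashes, the average hash. It is not the software used in any of the pilots: it assumes the image has already been decoded to a grayscale grid (a list of rows of 0 to 255 values), and all function names are hypothetical. Production tools decode real image files and typically use more robust variants.

```python
HASH_SIDE = 8  # an 8 x 8 grid gives a 64-bit fingerprint


def shrink(pixels, side=HASH_SIDE):
    """Box-average a grayscale image (list of rows) down to side x side cells."""
    height, width = len(pixels), len(pixels[0])
    if height < side or width < side:
        raise ValueError("image must be at least side x side pixels")
    grid = []
    for i in range(side):
        top, bottom = i * height // side, (i + 1) * height // side
        row = []
        for j in range(side):
            left, right = j * width // side, (j + 1) * width // side
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels):
    """Set one bit per cell: 1 if the cell is brighter than the mean of all cells."""
    cells = [value for row in shrink(pixels) for value in row]
    mean = sum(cells) / len(cells)
    fingerprint = 0
    for value in cells:
        fingerprint = (fingerprint << 1) | (value > mean)
    return fingerprint


def hamming_distance(hash_a, hash_b):
    """Count differing bits; small distances suggest near-identical images."""
    return bin(hash_a ^ hash_b).count("1")


# A synthetic gradient, a copy upscaled to twice the size, and an inverted copy.
original = [[8 * (x + y) for x in range(16)] for y in range(16)]
upscaled = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]
inverted = [[255 - value for value in row] for row in original]

print(hamming_distance(average_hash(original), average_hash(upscaled)))
print(hamming_distance(average_hash(original), average_hash(inverted)))
```

The resized copy yields the same fingerprint as the original, whereas a checksum of the two files would differ completely. The distance threshold below which two images count as duplicates is a tuning decision that each institution would have to make for its own collections.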
At the federal level, the Swiss Federal Archives in Bern updated its file-format recommendations in March 2025 to include stronger guidance on duplicate prevention during ingest workflows — a signal that Bern considers the problem serious enough to address at the policy layer, not just the technical one.
For Zurich's institutions, the practical path forward involves three concrete steps: adopting shared metadata schemas that make cross-system matching possible, scheduling quarterly deduplication audits tied to the existing IT budget review cycle, and training archival staff on upload protocols that check for existing copies before adding new ones. None of this is technically complex. The barrier is coordination across organisations that have historically operated their image databases as self-contained silos. The CHF numbers, at least, are now clear enough to make the case for changing that.