Zurich's public institutions are sitting on a digital filing problem years in the making. Across municipal departments, ETH Zurich's research repositories, and the Zentralbibliothek on Zähringerplatz, tens of thousands of duplicate image files have accumulated in shared content management systems—redundant photographs, scanned documents, and research visuals stored multiple times under different file names. The question now is not whether to act, but which method to adopt and how quickly the city can align its fragmented digital governance structures to do so.
The issue has gained urgency in 2026 because several Zurich institutions are mid-cycle on major platform migrations. ETH Zurich's IT services division is partway through a transition affecting research data infrastructure, while the Stadtarchiv on Neumarkt is finalising a digitisation programme that began in 2023. Migrating duplicate-heavy archives into newly consolidated systems risks compounding errors, inflating storage costs, and degrading search reliability for researchers and civil servants alike. Completing deduplication before those migrations close is now a hard requirement, not a preference.
The Options on the Table
Three broad approaches are under active consideration across Zurich's institutional landscape. The first is automated hash-based deduplication—software that identifies byte-identical files and flags them for deletion. It is fast and cheap but misses near-duplicate images: the same photograph saved at different resolutions or with minor colour corrections. The Zentralbibliothek, which holds digitised collections including historical maps of the Limmatquai district, has reportedly been reluctant to rely solely on automated tools for precisely this reason, given the archival sensitivity of distinguishing one scan quality from another.
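As a rough illustration of the first approach, the sketch below groups files by SHA-256 digest so that byte-identical copies stored under different names end up in the same group. The function names and the choice of SHA-256 are assumptions made for this example, not a description of any tool the institutions named here actually use.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-identical (hypothetical helper)."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    # Only digests shared by more than one file indicate duplicates.
    return [group for group in by_digest.values() if len(group) > 1]
```

A resized or colour-corrected copy of the same photograph has different bytes and therefore a different digest, which is exactly the near-duplicate case this method cannot see.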
The second option is perceptual hashing, a more sophisticated technique that compares images by visual similarity rather than exact data matches. Tools built on this method can catch the near-duplicates that slip past exact-match hashing. The cost is higher processing time and a need for human review at the flagged-duplicate stage, an overhead that smaller departments, including some units within the Stadtentwicklung Zürich planning office, may struggle to absorb without additional staffing or budget allocation.
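To make the idea concrete, here is a minimal sketch of one simple perceptual-hash variant, the average hash. It assumes each image has already been converted to grayscale and downscaled to a small grid (8x8 here) by some other tool; the function names and the distance threshold are illustrative assumptions, and production tools typically use more robust variants.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Compute a simple average hash from a small grayscale grid.

    Each pixel brighter than the grid's mean contributes a 1 bit.
    Assumes the image was already downscaled (for example to 8x8) elsewhere.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_similar(a: list[list[int]], b: list[list[int]], max_distance: int = 5) -> bool:
    """Flag two grids as near-duplicates; the threshold of 5 is illustrative only."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance
```

Because the hash records only which pixels are brighter than average, a uniformly brightened copy of an image produces the same hash as the original. Pairs that fall under the threshold are candidates for the human review described above, not confirmed duplicates.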
The third path is a hybrid model: automated first pass, perceptual review second, human sign-off on anything touching legally sensitive or historically significant material. Several Swiss federal agencies have moved toward this structure since 2024, following guidance issued by the Federal Archives in Bern. Zurich's cantonal IT coordination body, the Amt für Informatik, has been watching those federal pilots closely.
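The routing logic of such a hybrid model can be sketched in a few lines. The field names, route labels, and the rule that sensitive material always escalates to sign-off are assumptions drawn from the description above, not from the Federal Archives guidance itself.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_FLAG = "auto_flag"        # byte-identical and not sensitive
    REVIEW = "perceptual_review"   # visually similar, needs a reviewer
    SIGN_OFF = "human_sign_off"    # sensitive material, always escalated
    KEEP = "keep"                  # no duplicate signal at all


@dataclass
class Candidate:
    exact_match: bool  # same cryptographic hash as another file
    near_match: bool   # perceptual hash within the similarity threshold
    sensitive: bool    # legally sensitive or historically significant


def triage(candidate: Candidate) -> Route:
    """Decide which stage of the hybrid workflow handles a file (hypothetical rules)."""
    if not (candidate.exact_match or candidate.near_match):
        return Route.KEEP
    if candidate.sensitive:
        return Route.SIGN_OFF
    if candidate.exact_match:
        return Route.AUTO_FLAG
    return Route.REVIEW
```

Checking sensitivity before the match type means that even a byte-identical copy of an archival scan is never flagged for deletion without a person approving it.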
Costs, Timelines, and the Governance Gap
Storage costs in Swiss institutional cloud environments run roughly between CHF 0.02 and CHF 0.05 per gigabyte per month depending on contract tier, figures that seem trivial until multiplied across archives holding hundreds of terabytes. A medium-sized cantonal institution managing 50 terabytes of image data would pay roughly CHF 1,000 to CHF 2,500 a month for that storage at standard industry pricing benchmarks, and every share of it taken up by redundant files is pure overhead. For Zurich's university hospitals and research arms, the scale is considerably larger.
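The arithmetic behind those figures is simple enough to check. The sketch below assumes decimal units (1 terabyte = 1,000 gigabytes). At the quoted rates, CHF 1,000 to CHF 2,500 is the monthly bill for the full 50 terabytes, so the part attributable to duplicates depends on a duplicate share that is not given here and appears only as a hypothetical parameter.

```python
def monthly_storage_cost(terabytes: float, chf_per_gb_month: float) -> float:
    """Monthly storage bill in CHF, assuming 1 TB = 1,000 GB."""
    return terabytes * 1000 * chf_per_gb_month


def monthly_duplicate_overhead(
    terabytes: float, chf_per_gb_month: float, duplicate_share: float
) -> float:
    """Portion of the bill spent on redundant files.

    duplicate_share is a hypothetical fraction between 0 and 1;
    no measured value is available for any institution named here.
    """
    return monthly_storage_cost(terabytes, chf_per_gb_month) * duplicate_share


low = monthly_storage_cost(50, 0.02)   # lower contract tier
high = monthly_storage_cost(50, 0.05)  # upper contract tier
print(f"50 TB costs about CHF {low:,.0f} to CHF {high:,.0f} per month")
```

An institution that has measured its own duplicate rate can pass it as duplicate_share to estimate what deduplication would actually save.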
The more significant decision is not technical but structural. No single body currently holds authority to mandate a unified deduplication standard across Zurich's mix of cantonal, municipal, and federal-adjacent institutions. That governance gap means each organisation is effectively making its own call, which risks creating new incompatibilities even as old duplicates are cleared. A working group under the Kanton Zürich's Digitale Verwaltung initiative is expected to publish draft coordination guidelines before the end of the third quarter of 2026—a document that could set the template, or simply add another layer of competing frameworks.
What comes next is a sequence of decisions that will determine whether Zurich emerges from this with genuinely cleaner, faster, more reliable digital infrastructure, or simply pushes the problem into the next migration cycle. Institutions with imminent platform deadlines—particularly those feeding into the ETH domain and the Stadtarchiv project—should treat the third quarter as a hard window for committing to a method. Waiting for perfect consensus has already cost two years. The longer duplicate files sit in systems scheduled for consolidation, the more expensive and disruptive the eventual cleanup becomes. Swiss direct democracy works slowly by design; digital debt does not.