Thousands of duplicate images are clogging the digital archives of Zurich's public institutions, and the scale of the problem is only now becoming measurable. A review of digitisation workflows across several cantonal bodies, including the Staatsarchiv des Kantons Zürich on Winterthurerstrasse and the city's own Stadtarchiv in the Neumarkt district, shows that redundant image files account for a significant share of total stored data. In some cases, nearly one file in five is an exact or near-exact copy of another.
The timing matters. Zurich is midway through an ambitious ten-year plan to digitise its historical record holdings, a programme that began in earnest in 2021 and is scheduled to run through 2031. Storage costs have risen sharply alongside that effort. Commercial cloud storage on Swiss-hosted servers, a requirement under cantonal data protection rules, is priced substantially higher than European Union equivalents, with enterprise contracts for compliant infrastructure typically starting above CHF 0.04 per gigabyte per month. When duplicate files multiply unchecked across interconnected systems, those fractions of a franc add up fast.
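To make the arithmetic concrete, the short Python sketch below multiplies a storage volume by a duplicate share and a per-gigabyte price. Only the CHF 0.04 floor comes from the figures above; the function name and the 50-terabyte archive size are hypothetical, chosen purely for illustration, and because contracts start above that price the result is best read as a lower bound.

```python
def monthly_duplicate_cost(total_gb: float,
                           duplicate_share: float,
                           chf_per_gb_month: float = 0.04) -> float:
    """Return the monthly cost in CHF of storing redundant data.

    chf_per_gb_month defaults to the price floor quoted in the article.
    """
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    return total_gb * duplicate_share * chf_per_gb_month


# Hypothetical archive: 50 TB of images, one file in five redundant.
# The 50 TB volume is an assumption, not a figure from any institution.
monthly = monthly_duplicate_cost(total_gb=50_000, duplicate_share=0.20)
print(f"CHF {monthly:,.2f} per month, CHF {monthly * 12:,.2f} per year")
```

The sum for a single collection is modest. The point is that it recurs every month and scales linearly with every additional system that holds its own copy.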
Where the Redundancy Builds Up
The problem is not unique to government. ETH Zurich's library and research data services team has documented the challenge in academic contexts for years. Researchers uploading scanned source material, photographs, and microscopy images frequently submit the same file through multiple portals — a grant management system, a project repository, a shared drive — without realising each upload creates a separate stored instance. ETH Zurich's research data management guidelines, updated in March 2025, explicitly flag deduplication as a recommended step before any large-scale data deposit.
At the Stadtarchiv, the issue surfaces most visibly in photographic collections from the 1970s through the 1990s, a period when prints were routinely scanned multiple times by different departments with no central coordination. One mid-sized thematic collection on Zurich's Langstrasse neighbourhood, digitised in phases between 2018 and 2023, was found to have a duplicate rate approaching 22 percent across its roughly 14,000 image files, on the order of 3,000 redundant files, according to internal documentation reviewed as part of the cantonal digitisation audit process.
Deduplication software has existed for decades, but its adoption inside Swiss public institutions has been patchy. Open-source tools like dupeGuru and perceptual hashing libraries can identify near-duplicate images (files that look the same to the eye but differ slightly in resolution, compression, or metadata) with accuracy rates above 95 percent on standard photographic collections. The catch is that automated deletion carries real risk in an archival context: a file flagged as a duplicate might carry unique embedded metadata or provenance information absent from its apparent twin.
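For readers curious how perceptual hashing works in principle, here is a minimal sketch of one common variant, the average hash, written in plain Python. It is not the method used by dupeGuru or by any particular library, and it assumes each image has already been decoded into a grid of grayscale values whose dimensions are multiples of eight. The function names and the distance threshold of 5 are illustrative assumptions. In keeping with the archival caution above, the sketch only flags candidate pairs for human review and deletes nothing.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values, 0 to 255


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downsample to size x size by averaging equal blocks.

    Assumes height and width are multiples of `size` (illustrative only).
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Pack one bit per cell: 1 if the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Flag a pair for review; the threshold is an assumed value."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic 16 x 16 test images: a gradient, a brightened copy, a negative.
base = [[x * 17 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in base]
negative = [[255 - v for v in row] for row in base]

print(is_near_duplicate(base, brighter))  # brightened copy is flagged
print(is_near_duplicate(base, negative))  # inverted image is not
```

Because the hash records only whether each region is lighter or darker than the image's own average, a copy that has been brightened or recompressed tends to land on the same or a nearby bit pattern, while a genuinely different picture does not. Embedded metadata plays no part in the comparison, which is precisely why a flagged pair still needs a human check before anything is removed.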
What the Numbers Mean for Budgets and Policy
The financial stakes are real enough to have attracted attention from the canton's Finanzkontrolle, the public audit body based on Walchestrasse. A broader efficiency review of digital infrastructure spending published in late 2024 flagged redundant data storage as a systemic cost driver across multiple cantonal departments, though the report stopped short of quantifying how much of that cost is attributable to image duplication.
For a population already under housing pressure, the connection to public spending is direct. The city has been running structural budget deficits in its infrastructure accounts while simultaneously facing rising costs from the Wohnungsnot, the housing shortage that has also pushed cantonal administrative space costs higher as office leases in central districts like Kreis 1 and Kreis 4 command premium rates. Every franc spent on avoidable cloud storage is, at least arithmetically, a franc not available elsewhere.
Institutions managing large image collections are increasingly being advised to run deduplication audits before migrating data to new platforms, not after. The Staatsarchiv is expected to issue updated technical standards for image ingestion later this year, with mandatory hash-checking at the point of upload among the provisions under discussion. For smaller cultural institutions along Zurich's Limmatquai — galleries, local history societies, smaller museum archives — the practical advice is simpler: run a free perceptual hash tool across any collection before the next storage contract renewal, and expect to find redundancy rates well above ten percent.
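What hash-checking at the point of upload might look like can be sketched in a few lines of Python using the standard library's SHA-256 implementation. This is an illustration under stated assumptions, not the Staatsarchiv's planned standard, whose details have not been published. The class name, method name, and file names are hypothetical, and a real system would persist the index rather than hold it in memory.

```python
import hashlib
from typing import Dict, Optional


class IngestIndex:
    """Hypothetical in-memory index of content digests seen at upload."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # SHA-256 hex digest -> first file name

    def check_and_register(self, name: str, data: bytes) -> Optional[str]:
        """Return the name of an earlier byte-identical file, or None.

        New content is recorded; nothing is ever rejected or deleted here.
        """
        digest = hashlib.sha256(data).hexdigest()
        earlier = self._seen.get(digest)
        if earlier is not None:
            return earlier
        self._seen[digest] = name
        return None


index = IngestIndex()
print(index.check_and_register("scan_a.tif", b"example image bytes"))
print(index.check_and_register("scan_b.tif", b"example image bytes"))
```

The first call prints None and the second prints the name of the earlier file. A cryptographic hash of this kind catches only byte-for-byte copies: a rescan, a recompressed version, or a file with edited metadata produces a different digest, so near-duplicates still require a perceptual comparison.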