Zurich's public institutions hold tens of millions of digital image files, and a growing share of them are duplicates. A study completed in spring 2026 by ETH Zurich's Institute for Information Systems and Networking found that duplicate or near-duplicate images account for roughly 23 percent of storage load across the municipal digital archives surveyed in the canton. In raw infrastructure terms, that fraction means significant and recurring expenditure on servers that are, in effect, warehousing the same content twice, and sometimes dozens of times over.
The problem matters now for reasons beyond tidiness. Zurich's municipal digitisation drive — Stadtarchiv Zürich has been migrating physical holdings to digital formats since 2019 — has accelerated sharply since 2023, when the city allocated additional budget to the project under its Smart City Zurich programme. More images entering archives faster means the duplicate problem compounds. Without intervention, analysts at the institute estimate the redundant-data fraction could reach 30 percent by 2028, pushing storage costs higher at a moment when cantonal budgets are already stretched by housing and infrastructure demands.
Where the Duplicates Accumulate
Two institutions sit at the centre of the issue. Stadtarchiv Zürich, based at Neumarkt 4 in the Altstadt, manages historical photographic collections alongside newly digitised civic records. Staff there have flagged internally that the same image — a council chamber photograph, a construction permit scan — frequently enters the system multiple times via different departments, each uploading independently without a centralised deduplication check. The archive holds more than 4.5 million digital objects as of its most recent published inventory.
Meanwhile, ETH Zurich's main library on Rämistrasse processes scientific image datasets submitted alongside research publications. The volume of image-heavy submissions in materials science, urban planning, and medical research is growing year on year, and in a January 2026 internal review the library's digital repository team found that near-duplicate experimental images (slight variations of the same microscopy shot, for example) were consuming disproportionate storage. The library's repository crossed the 800-terabyte mark last year.
The financial dimension is concrete. Enterprise-grade archival storage in Switzerland currently runs between CHF 80 and CHF 140 per terabyte per year for institutions procuring at the scale these bodies operate. If 23 percent of an 800-terabyte repository were redundant, that would amount to approximately 184 terabytes of avoidable storage, or roughly CHF 15,000 to CHF 26,000 a year at that single institution, before factoring in backup, indexing, and retrieval overhead. Across the broader municipal estate the sums scale considerably.
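For readers who want to check the arithmetic, the calculation can be reproduced in a few lines of Python. This is only a sketch: the figures are the ones quoted above, the function name is illustrative, and the 23 percent share comes from the municipal archive survey, so applying it to a single 800-terabyte repository is an assumption rather than a measurement.

```python
# Figures quoted in the article; the helper name is illustrative only.
REPOSITORY_TB = 800
REDUNDANT_SHARE_PERCENT = 23          # assumed to apply to this repository
PRICE_LOW_CHF, PRICE_HIGH_CHF = 80, 140  # per terabyte per year


def redundant_cost_range(total_tb, share_percent, price_low, price_high):
    """Return (redundant_tb, low_cost, high_cost) for one year of storage."""
    redundant_tb = total_tb * share_percent / 100
    return redundant_tb, redundant_tb * price_low, redundant_tb * price_high


redundant_tb, low, high = redundant_cost_range(
    REPOSITORY_TB, REDUNDANT_SHARE_PERCENT, PRICE_LOW_CHF, PRICE_HIGH_CHF
)
print(f"{redundant_tb:.0f} TB redundant, CHF {low:,.0f} to CHF {high:,.0f} per year")
```

The unrounded results are CHF 14,720 and CHF 25,760, which the article rounds to CHF 15,000 and CHF 26,000.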
Detection Technology and What Comes Next
Automated deduplication is not new, but applying it to cultural heritage collections and scientific image libraries requires more sophistication than standard hash-matching tools offer, because a conventional file hash only matches copies that are identical byte for byte. Perceptual hashing algorithms, which identify visually similar images even when the underlying files differ (after resizing, recompression, or a change of metadata, for example), have improved substantially. A pilot programme launched in March 2026 by the Zentralbibliothek Zürich at Zähringerplatz 6 is testing one such system on a subset of its 19th-century photograph collection, which runs to approximately 60,000 digitised prints. Early results suggest the tool flags around one in eight images as candidates for review, though human archivists still make the final call on whether two images are true duplicates or meaningfully distinct variants.
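The difference between the two kinds of hash is easier to see in code. The sketch below uses average hashing, one of the simplest perceptual hashes, purely as an illustration; the article's sources do not say which algorithm the Zentralbibliothek pilot uses. To stay self-contained it works on a synthetic grayscale grid rather than a real photograph, and it assumes the image dimensions are multiples of the hash size. A real system would decode and resize images with an imaging library.

```python
import hashlib


def average_hash(pixels, size=8):
    """Average hash of a grayscale image given as rows of 0-255 integers.

    Assumes width and height are multiples of `size` (illustrative shortcut).
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // size, width // size
    # Shrink to size x size cells by averaging each block of pixels.
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def file_hash(pixels):
    """SHA-256 over the raw pixel bytes, standing in for a file checksum."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


# Synthetic 16x16 test images: a gradient, a slightly brighter copy, an inverse.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 10 for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(file_hash(original) == file_hash(brighter))   # byte-level match?
print(hamming_distance(average_hash(original), average_hash(brighter)))
print(hamming_distance(average_hash(original), average_hash(inverted)))
```

The brightened copy no longer matches at the byte level, yet its perceptual hash is unchanged, while the inverted image differs in every bit. A small Hamming distance is what marks a pair as a candidate for review.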
The practical implication for institutions is that deduplication cannot be fully automated without risk of erasing archival material that appears redundant but carries distinct provenance or metadata value. Experts at ETH recommend a tiered approach: automatic deletion for byte-identical files, human review for perceptual near-matches, and a mandatory 90-day quarantine period before any image is permanently removed from a public-sector repository.
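The tiered approach can be expressed as a short decision routine. The sketch below is one possible reading of the recommendation, not a description of any institution's system: the function and class names are hypothetical, the perceptual hashes are assumed to be integers computed elsewhere, and the near-match threshold of 5 bits is an arbitrary placeholder. Only the 90-day quarantine comes from the recommendation itself.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

QUARANTINE_DAYS = 90         # from the recommendation quoted above
NEAR_MATCH_MAX_DISTANCE = 5  # hypothetical threshold, not from the article


@dataclass
class Decision:
    action: str                          # "keep", "quarantine" or "human_review"
    delete_after: Optional[date] = None  # set only for quarantined files


def triage(new_bytes, new_phash, existing_bytes, existing_phash, today):
    """Compare an incoming image with one already in the repository."""
    # Tier 1: byte-identical files are scheduled for deletion automatically,
    # but only once the quarantine period has passed.
    if hashlib.sha256(new_bytes).digest() == hashlib.sha256(existing_bytes).digest():
        return Decision("quarantine", today + timedelta(days=QUARANTINE_DAYS))
    # Tier 2: perceptual near-matches go to an archivist, never to deletion.
    if bin(new_phash ^ existing_phash).count("1") <= NEAR_MATCH_MAX_DISTANCE:
        return Decision("human_review")
    # Otherwise the images are treated as distinct.
    return Decision("keep")


today = date(2026, 3, 1)
print(triage(b"scan-a", 0b1111, b"scan-a", 0b1111, today))
print(triage(b"scan-a", 0b1111, b"scan-b", 0b1110, today))
print(triage(b"scan-a", 0, b"scan-c", 2**64 - 1, today))
```

Nothing is removed at the moment a match is found: the routine only records when deletion becomes permissible, which leaves room to reverse a decision if a file turns out to carry distinct provenance.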
For Zurich's broader digital infrastructure agenda, the duplicate image question is a proxy for a larger governance gap — the absence of a canton-wide digital asset management standard that would prevent redundant uploads at the point of entry. The Smart City Zurich programme's next strategic review is scheduled for the fourth quarter of 2026. Whether unified upload protocols make it onto that agenda will depend on how clearly institutions can present the cost case. The numbers, increasingly, are available to make it.