Zurich's public institutions collectively hold tens of thousands of duplicate images in their digital archives — the same photograph catalogued under two different identifiers, filed in separate systems, licensed more than once. Stadtarchiv Zürich, which manages the city's historical records from its premises near Neumarkt, confirmed earlier this year that a systematic deduplication project is underway. The problem did not arrive overnight.
The roots go back to the late 1990s and early 2000s, when Swiss public bodies and private firms alike rushed to digitise analogue collections without coordinating file-naming conventions or metadata standards. Each institution built its own silo. A photograph of Paradeplatz taken in 1978 might exist as a 35mm slide in one cabinet, a low-resolution TIFF scanned in 2001 for one department, and a higher-resolution JPEG rescanned in 2009 for another — none of them cross-referenced. When cloud migration accelerated after 2015, those three files often travelled together into the same bucket, tagged differently and treated as distinct assets.
The Merger Effect and a Missed Opportunity
The UBS absorption of Credit Suisse in 2023 threw the problem into sharper relief for the private sector. Credit Suisse's internal communications archive — including product photography, event images and executive portraits stretching back to the 1980s — had to be reconciled with UBS's own holdings. Industry observers noted at the time that Swiss financial institutions had never developed a shared image-metadata protocol, leaving individual compliance and marketing teams to manage assets independently. The result, by 2025, was redundant storage running into the petabytes across Zurich's Bankenviertel alone.
ETH Zurich's library and archive services, based on Rämistrasse, had actually piloted a duplicate-detection algorithm as early as 2018 as part of a broader digital preservation research programme. The tool used perceptual hashing — a technique that identifies visually similar images even when file names, formats or resolutions differ. ETH published findings internally, but the methodology did not migrate quickly into city government workflows. A follow-on study completed in 2024 found that across a sample of three cantonal institutions, roughly 22 percent of catalogued image assets were either exact duplicates or near-identical variants of the same source file.
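To give a sense of how perceptual hashing works, here is a minimal sketch of one common variant, the "average hash". It is an illustration only, not the ETH tool, whose method the article does not detail. It assumes images have already been decoded into rows of grayscale values; the function names and the synthetic test images are hypothetical.

```python
from typing import List

Gray = List[List[float]]  # rows of grayscale pixel values, 0-255


def shrink(img: Gray, size: int = 8) -> Gray:
    """Downscale to size x size by averaging the pixels in each cell."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if the cell is brighter than the mean."""
    small = [v for row in shrink(img, size) for v in row]
    mean = sum(small) / len(small)
    bits = 0
    for v in small:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for real scans: a 64x64 gradient, a 32x32 rescan of it,
# and a tonally inverted image that should NOT match.
high_res = [[(x + y) * 2 for x in range(64)] for y in range(64)]
low_res = shrink(high_res, 32)
inverted = [[252 - v for v in row] for row in high_res]

same_photo = hamming(average_hash(high_res), average_hash(low_res))
different = hamming(average_hash(high_res), average_hash(inverted))
print(same_photo, different)
```

A small Hamming distance between two hashes suggests the files show the same picture regardless of resolution or format; where to set the threshold is a judgment call that each archive would need to tune on its own holdings.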
The Zurich Cantonal Library on Zähringerplatz faced a related but distinct version of the challenge. Donated private collections, digitised in batches between 2010 and 2022, frequently overlapped with images already held under institutional accessions. Without a unified record-management system, librarians had no automated way to detect these collisions. Staff caught them manually, an approach that became unsustainable as the collections grew.
What the Deduplication Drive Looks Like in Practice
The current push to systematically replace or consolidate duplicate images is driven partly by storage economics and partly by new cantonal data-governance guidelines that took effect on 1 January 2026. Under those rules, publicly funded archives must document provenance for every digital asset, which is nearly impossible when the same image sits in two folders with contradictory metadata. Institutions that cannot demonstrate clean provenance chains by the end of 2027 face restrictions on republishing those assets on public-facing platforms.
For smaller cultural venues — including the Helmhaus on Limmatquai and several neighbourhood Kulturzentren in Kreis 5 — the practical problem is more immediate. Many rely on image libraries built piecemeal from donations, press kits and unlicensed web downloads. A duplicate in their system is not just a storage cost; it may represent a licensing liability if two copies carry different attribution records.
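The licensing risk described here can be checked mechanically for byte-identical copies. The sketch below groups files by SHA-256 digest and reports groups whose attribution records disagree. The tuple layout, the function name and the sample credits are all hypothetical, and the comparison is deliberately crude (case and surrounding whitespace only).

```python
import hashlib
from collections import defaultdict


def conflicting_attributions(assets):
    """Find exact duplicates that carry different attribution records.

    assets: iterable of (asset_id, file_bytes, attribution) tuples
            (a hypothetical layout chosen for this example).
    Returns {sha256_hex: [(asset_id, attribution), ...]} for conflicting groups.
    """
    groups = defaultdict(list)
    for asset_id, data, attribution in assets:
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append((asset_id, attribution))

    conflicts = {}
    for digest, members in groups.items():
        credits = {attr.strip().lower() for _, attr in members}
        # More than one copy AND more than one distinct credit line.
        if len(members) > 1 and len(credits) > 1:
            conflicts[digest] = members
    return conflicts


# Invented sample data: two copies of one file with different credits,
# two copies of another with matching credits, and one unique file.
sample = [
    ("img-001", b"paradeplatz-scan", "Photo: A. Muster"),
    ("img-002", b"paradeplatz-scan", "Press kit, rights unknown"),
    ("img-003", b"limmat-scan", "Photo: B. Beispiel"),
    ("img-004", b"limmat-scan", "photo: b. beispiel "),
    ("img-005", b"unique-scan", "Photo: C. Exempel"),
]
flagged = conflicting_attributions(sample)
for members in flagged.values():
    print(members)
```

A cryptographic digest only catches copies that are identical byte for byte; a rescan or a re-saved JPEG of the same picture would need a perceptual comparison instead.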
Stadtarchiv Zürich's deduplication project is expected to process around 180,000 image files in its first phase, running through the end of 2026. Institutions seeking to get ahead of the 2027 deadline are being advised to adopt the Dublin Core metadata standard, which ETH's library services have promoted for years, and to run perceptual-hash checks on newly digitised collections before migrating them into shared storage. The Swiss Federal Archives in Bern has made its own hashing tools available under an open licence since March 2026, a resource that several Zurich bodies are now, belatedly, beginning to use.
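What such a pre-migration check might look like can be sketched in a few lines. The record below uses six of the fifteen Dublin Core elements (identifier, title, creator, date, rights, source); everything else is an assumption made for illustration, including the class and function names, the identifiers, the 64-bit hash values and the distance threshold of 5. It does not represent the Federal Archives' tools.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DublinCoreRecord:
    """A subset of Dublin Core elements for one catalogued image."""
    identifier: str
    title: str
    creator: str
    date: str
    rights: str
    source: str = ""


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two perceptual hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(
    candidate_hash: int,
    index: List[Tuple[int, DublinCoreRecord]],
    max_distance: int = 5,  # assumed threshold; tune on real holdings
) -> List[DublinCoreRecord]:
    """Return catalogued records whose hash is within max_distance of the candidate."""
    return [rec for h, rec in index if hamming(candidate_hash, h) <= max_distance]


# Hypothetical catalogue: (perceptual hash, record) pairs computed at ingest time.
index = [
    (0xF0F0F0F0F0F0F0F0, DublinCoreRecord(
        "STAZ-0001", "Paradeplatz, 1978", "Unknown", "1978", "Rights under review")),
    (0x0123456789ABCDEF, DublinCoreRecord(
        "STAZ-0002", "Limmatquai at dusk", "Unknown", "1985", "Public domain")),
]

incoming_hash = 0xF0F0F0F0F0F0F0F3  # hash of a file waiting to be migrated
matches = near_duplicates(incoming_hash, index)
for rec in matches:
    print("Possible duplicate of", rec.identifier, "-", rec.title)
```

A hit should prompt a human to compare the two records and decide which provenance chain to keep, rather than trigger an automatic deletion.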