A coordinated effort to purge thousands of duplicate images from Zurich's public digital collections moved into a new operational stage this week, with the Stadtarchiv Zürich and ETH-Bibliothek both confirming that internal reviews of their holdings are under way. The immediate trigger was a joint working session held on Tuesday at the ETH Zürich main building on Rämistrasse, which brought together archivists, computer scientists and cantonal records officials to agree on shared detection protocols.
The problem has been quietly accumulating for years. As Zurich's cultural institutions accelerated their digitisation programmes, particularly after the 2020-era funding injections tied to the city's Smart Zurich strategy, photo collections from different sources were ingested into overlapping databases without consistent deduplication checks. The result is redundant storage costs, cataloguing confusion and, crucially, broken metadata chains that make historical photographs harder to find and attribute correctly.
What Tuesday's Session Actually Produced
Tuesday's working session, which ran for roughly five hours according to the agenda circulated to participating institutions, produced a draft technical specification for a shared hashing standard. The approach would assign each image a perceptual fingerprint derived from its visual content rather than from its raw bytes. Visually similar images therefore receive matching or closely matching fingerprints, allowing automated systems to flag near-identical files even when file names, formats or compression levels differ. The Schweizerisches Nationalmuseum, which holds a substantial Zurich-related photographic collection at its Museumstrasse site, was represented in the discussions and is expected to run a pilot comparison of its digital holdings against the Stadtarchiv's catalogue before the end of August 2026.
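The draft specification has not been published, so the algorithm it prescribes is not known. As a rough illustration of how perceptual fingerprinting works in general, the sketch below uses a difference hash (dHash), one common technique, which is an assumption here and not necessarily the method the institutions chose. It also assumes each image has already been reduced to a small grayscale grid of 8 rows by 9 columns; the function names, the sample grids and the threshold of 5 bits are all hypothetical.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale pixel values, 0-255


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair.

    A bit is 1 when the left pixel is brighter than its right neighbour.
    An 8-row by 9-column grid yields a 64-bit fingerprint.
    """
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag two images when their fingerprints differ in few bits.

    The threshold is an illustrative assumption, not a value from the draft.
    """
    return hamming_distance(dhash(a), dhash(b)) <= threshold


# Hypothetical sample data: a synthetic 8x9 "image", a darkened copy that
# stands in for a re-exported file, and a mirrored copy that stands in for
# a visually different picture.
original = [[(x * 37 + y * 11) % 256 for x in range(9)] for y in range(8)]
darkened = [[p // 2 + 10 for p in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

print(is_near_duplicate(original, darkened))
print(is_near_duplicate(original, mirrored))
```

Because the fingerprint records only whether brightness rises or falls between neighbouring pixels, a uniformly darkened copy keeps the same pattern of bits, while a cryptographic checksum of the two files would differ completely. That property is what lets such a scheme catch duplicates across formats and compression levels.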
For the ETH-Bibliothek, the stakes are particularly high. The library's e-manuscripta and e-rara platforms together host hundreds of thousands of digitised items, and an internal audit completed in spring 2026 identified a duplication rate of roughly 4.2 percent across image-based holdings. That translates to tens of thousands of files that either replicate existing records or overlap substantially with partner-institution uploads. Staff at the Hauptbibliothek on Zähringerplatz have been working through a backlog of flagged items since April.
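The audit's exact item count has not been made public, so the arithmetic behind "tens of thousands" can only be sketched. The snippet below applies the reported 4.2 percent rate to three hypothetical collection sizes; the sizes are illustrative assumptions, not figures from the ETH-Bibliothek.

```python
def estimated_duplicates(total_items: int, rate_percent: float) -> int:
    """Approximate number of duplicate files at a given duplication rate."""
    return round(total_items * rate_percent / 100)


REPORTED_RATE = 4.2  # percent, as reported from the spring 2026 audit

# Hypothetical collection sizes, chosen only to span "hundreds of thousands".
for size in (300_000, 500_000, 900_000):
    print(size, estimated_duplicates(size, REPORTED_RATE))
```

Under these assumed sizes the estimate runs from about 12,600 to about 37,800 duplicate files, which is consistent with the "tens of thousands" described.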
The financial dimension matters too. Cloud storage costs for Swiss public-sector institutions have risen sharply since 2023, and the canton of Zurich's IT directorate has signalled that departments will face tighter per-gigabyte allocations from the 2027 budget cycle. Eliminating verified duplicates is now framed internally not just as a cataloguing hygiene measure but as a cost-containment priority. Rough estimates from comparable European digitisation programmes suggest that duplication rates in the 3-5 percent range can translate to annual storage costs running into six figures for large collections, though Zurich institutions have not published figures of their own.
Why This Matters Beyond the Archives
The push is not confined to institutional back-offices. The city's open-data portal, accessible through the Stadt Zürich Open Government Data platform, draws on several of these same image repositories to populate public-facing historical maps and neighbourhood documentation tools — resources used by schools, journalists and researchers. Duplicated or misattributed images in the source databases surface as errors in those public tools, undermining the portal's credibility.
The Zurich-based digital preservation nonprofit IG digitale Langzeitarchivierung has been lobbying for a cantonal-level deduplication standard since at least 2024, arguing that voluntary coordination between institutions is too slow and that a binding technical framework is needed. Whether Tuesday's draft specification eventually becomes that framework will depend on sign-off from the Stadtrat and the cantonal Bildungsdirektion, a process that, given Zurich's direct-democracy procedural requirements, is unlikely to conclude before early 2027.
For anyone using Zurich's digital collections in the meantime, archivists at the Stadtarchiv on Neumarkt have confirmed that queries about specific image duplications can be submitted through the existing public research request system. The ETH-Bibliothek has also indicated it will publish a progress update on its deduplication work in its next quarterly report, expected in September 2026.