ETH Zurich's library services confirmed this week that a working group focused on duplicate image detection in digitised collections had completed a pilot phase, clearing a backlog of redundant files that had accumulated across multiple scanning campaigns since 2019. The announcement, made Thursday, marks a concrete step towards solving a problem that has quietly drained resources from heritage institutions across Switzerland for years.
The timing matters. Zurich's major public and academic institutions have spent the past three years accelerating digitisation programmes, partly in response to post-pandemic closures that exposed how dependent public collections were on physical access. That rush created a secondary crisis: scanning batches overlapped, metadata was inconsistently applied, and identical or near-identical images ended up catalogued separately, sometimes dozens of times. Storage costs and retrieval times both suffered.
What Happened This Week
The ETH-Bibliothek on Rämistrasse and the Stadtarchiv Zürich on Neumarkt are the two institutions most publicly associated with the cleanup effort. According to documentation circulating among Swiss library networks, the pilot at ETH-Bibliothek applied perceptual hashing, a method that compares compact image fingerprints rather than pixel-by-pixel content, to a corpus of roughly 140,000 digitised items. Early results identified a duplication rate the working group described as higher than initially projected, though a final verified figure has not been officially published.
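For readers unfamiliar with the technique, the following is a minimal sketch of one common perceptual hash, the "average hash", written in plain Python. It is illustrative only: the working group has not published which algorithm or threshold it used, so the function names, the 8x8 grid and the `max_distance` value of 5 are all assumptions made for this example. Images are represented as rows of 0-255 grayscale values so the sketch needs no imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed input format)


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint of a grayscale image."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the hash grid")

    # Shrink the image to a size x size grid by averaging each cell.
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))

    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Flag a pair for human review; the threshold is a hypothetical choice."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance
```

Because the fingerprint depends on the coarse pattern of light and dark rather than exact pixel values, a rescan that is uniformly a little brighter yields the same hash, while a differently composed image yields a distant one. In a setting where deletion requires documented review, a function like `is_probable_duplicate` would only nominate candidate pairs for an archivist to check.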
The Stadtarchiv, which holds civil records, photographs and administrative documents stretching back to the medieval period, has been running a parallel review since January 2026. Staff there have been working through collections related to the 1970s urban planning era in districts like Aussersihl and Wiedikon, areas whose physical transformation generated dense photographic documentation and, consequently, heavy overlap in scanned materials.
Duplicate image management is not a glamorous problem. It sits at the intersection of information science, storage procurement and cataloguing policy, and tends to attract attention only when budgets get squeezed. Switzerland's federal archiving standards, governed under the Archivierungsgesetz, require long-term preservation of originals, which means institutions cannot simply delete duplicates without a documented review process. That legal constraint makes automated bulk deletion impossible and forces a slower, verified workflow.
Why Storage Costs Are Sharpening Minds
Cloud and on-premises storage costs have risen sharply across European institutional procurement since 2023, driven by energy price volatility and hyperscaler pricing shifts. Swiss institutions, which typically hold data domestically for data sovereignty reasons, face costs that do not benefit from the scale economies available to larger national archives in Germany or France. A mid-sized digitised photograph collection at archival resolution can easily run to several terabytes; multiplied across dozens of scanning projects, duplicates represent not just wasted cataloguing effort but measurable annual expenditure.
ETH Zurich's broader digital infrastructure ranks among the most heavily resourced in continental Europe, and the university has repeatedly cited its position in global research rankings (it placed 7th globally in the 2025 QS World University Rankings) as partial justification for sustained investment in digital collections. Even so, internal procurement reviews conducted in late 2025 flagged storage efficiency as a priority line for 2026.
Other Zurich institutions watching the outcome include the Zentralbibliothek Zürich on Zähringerplatz, which manages one of the country's largest cantonal manuscript collections, and the Museum Rietberg in Rieterpark, whose photographic archive of non-European art has grown substantially through donation campaigns over the past decade.
For institutions still working through similar backlogs, the ETH-Bibliothek pilot is expected to produce a publicly available methodology document by the end of the third quarter of 2026. Archivists at smaller cantonal institutions have been waiting on exactly that kind of transferable framework before committing their own digitisation budgets to a comparable review. The full results of the Stadtarchiv review on Neumarkt are scheduled for internal sign-off in September, with a public summary expected before the end of the year.