Thousands of redundant digital images are clogging the storage systems of Zurich's public institutions, and the people responsible for managing those archives say the problem got measurably worse this week as summer digitisation projects ramped up across the city. At the heart of it: no unified standard for detecting and removing duplicate files before they are ingested into permanent collections.
The issue is more than a housekeeping headache. With Switzerland's federal data retention rules tightening and cloud storage costs rising, institutions that maintain large photographic archives — from municipal departments to university libraries — are facing a direct financial and administrative reckoning. The timing matters because several major digitisation contracts, awarded earlier this year, reached active processing phases this week.
What's Happening on the Ground in Zurich
At ETH Zurich's main library on Rämistrasse, staff are midway through a project to digitise historical engineering faculty photographs dating back to the 1880s. The collection runs to more than 40,000 physical items. Duplicate detection at that scale requires automated tooling, and the library has been piloting open-source perceptual hashing software — technology that compares images by visual content rather than file name or metadata — since February 2026. This week, according to the project's publicly posted progress log on the ETH Library website, the team completed processing of the pre-1920 batch and flagged roughly one in eleven images as probable duplicates requiring human review.
Across the Limmat, the Zentralbibliothek Zürich on Zähringerplatz is running a parallel effort tied to its Bildarchiv holdings. The library has been a formal partner in the Swiss National Science Foundation's broader digital preservation framework since 2024. Staff there have described the duplicate image question as one of the more labour-intensive aspects of the current intake cycle, with review queues building faster than they can be cleared during peak submission periods. No formal figure has been released publicly for the Zentralbibliothek's duplicate rate, but the challenge is consistent with what ETH's logs suggest.
The wider context is a Swiss cultural sector that has committed significant public funds to digitisation since the federal government's Kulturbotschaft 2025–2028 programme allocated resources specifically to making archival collections searchable and accessible online. Duplicate image bloat directly undermines that goal: redundant files inflate storage costs, confuse search indexing, and dilute the quality of publicly accessible databases.
The Technology Gap and What Comes Next
The core technical problem is that duplicate detection tools built for consumer photo libraries, the kind that power the de-duplication feature on a smartphone, do not translate cleanly to archival work. Archival duplicates are often near-duplicates: the same photograph scanned twice at different resolutions, or a print and its negative both digitised. Standard hash matching compares files byte for byte, so it misses these: two scans of the same photograph at different resolutions produce entirely different hashes. Perceptual hashing catches more, but it requires calibration and still generates false positives that demand expert review.
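To make the distinction concrete, the following is a minimal sketch of one simple perceptual hash (an "average hash") set against an exact byte hash. It is illustrative only: the article does not say which algorithm or software the libraries use, and the function names, the 8x8 hash size, and the synthetic test images are all assumptions. Images are represented as plain grids of greyscale values so that the example needs no third-party library.

```python
import hashlib

def exact_hash(pixels):
    """SHA-256 over the raw pixel bytes; any byte-level change alters it."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()

def downscale(pixels, size=8):
    """Shrink a greyscale grid to size x size by averaging pixel blocks.
    Assumes the image is at least `size` pixels in each dimension."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def average_hash(pixels, size=8):
    """One bit per cell of the shrunken image: 1 if brighter than the mean."""
    flat = [v for row in downscale(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Hypothetical data: the same horizontal gradient "scanned" at two resolutions,
# plus an unrelated vertical gradient.
small = [[x * 8 for x in range(32)] for _ in range(32)]
large = [[(x // 2) * 8 for x in range(64)] for _ in range(64)]
other = [[y * 8 for _ in range(32)] for y in range(32)]

same_bytes = exact_hash(small) == exact_hash(large)                 # byte comparison
near_dup_distance = hamming(average_hash(small), average_hash(large))
unrelated_distance = hamming(average_hash(small), average_hash(other))
```

In this toy case the byte hashes of the two resolutions differ while their perceptual hashes are a small Hamming distance apart, and the unrelated image sits much further away. The distance below which two images count as probable duplicates is the calibration step the article refers to, and it would need tuning on real scans.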
Researchers at the University of Zurich's Department of Informatics, based at Binzmühlestrasse in Oerlikon, have published work on content-based image retrieval that addresses exactly this gap, though applying academic prototypes to live archival workflows at the scale Zurich's institutions require remains a work in progress.
For institutions managing these collections, the practical advice from archival professionals this week has been consistent: establish deduplication checkpoints at the point of scanning, before files enter the permanent repository, rather than trying to clean up retrospectively. Retroactive cleaning on a collection of tens of thousands of items is expensive and error-prone. Getting the intake workflow right matters far more than any downstream purge.
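A deduplication checkpoint at the point of scanning could look something like the sketch below. It is a hypothetical illustration, not a description of any institution's workflow: the `IntakeCheckpoint` class, its field names, and the threshold of 5 bits are assumptions, and each scan is assumed to arrive with a perceptual hash already computed as an integer.

```python
from dataclasses import dataclass, field

@dataclass
class IntakeCheckpoint:
    """Gate in front of the permanent repository (hypothetical design)."""
    threshold: int = 5                                # assumed value; needs calibration
    accepted: dict = field(default_factory=dict)      # item_id -> perceptual hash
    review_queue: list = field(default_factory=list)  # (new_id, matching_id) pairs

    def submit(self, item_id, phash):
        """Accept a new scan, or hold it for human review if it sits within
        `threshold` bits of something already accepted."""
        for known_id, known_hash in self.accepted.items():
            if bin(phash ^ known_hash).count("1") <= self.threshold:
                self.review_queue.append((item_id, known_id))
                return "review"
        self.accepted[item_id] = phash
        return "accepted"

# Usage with made-up hashes:
checkpoint = IntakeCheckpoint()
first = checkpoint.submit("scan-001", 0b1111)
second = checkpoint.submit("scan-002", 0b1110)   # one bit away from scan-001
```

The point of the design is that a flagged scan never enters `accepted` until a person has looked at it, so the permanent collection does not accumulate duplicates that would later need a retrospective purge. The linear scan over accepted items is kept for clarity; a collection of tens of thousands of items would likely want an index suited to nearest-neighbour lookups.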
The ETH Library project is scheduled to complete its pre-1945 batch by September 2026. How it handles the duplicate review backlog between now and then will be watched by other institutions planning similar projects in Basel and Bern later this year. Zurich, as usual, is running the experiment the rest of Switzerland will learn from.