Zurich Archivists Tackle Duplicate Image Crisis as Digital Collections Balloon
City institutions are racing to clear redundant files from their digital archives this week, with ETH Zurich and the Stadtarchiv leading a coordinated clean-up effort.

Zurich's major cultural and academic institutions moved this week to address a growing backlog of duplicate digital images clogging their shared storage infrastructure, with ETH Zurich's library and the Stadtarchiv Zürich on Neumarkt both confirming active remediation work is underway. The push follows an internal audit completed in late June 2026 that identified tens of thousands of redundant image files spread across interconnected archival systems.
The timing matters. Switzerland's federal digitisation strategy, which allocated CHF 12 million across cantonal institutions between 2023 and 2026 for the accelerated scanning and ingestion of historical collections, has flooded repositories with high-resolution image files faster than quality-control workflows could keep pace. Institutions that rushed to hit digitisation targets ended up ingesting the same photographic prints, maps, and documents multiple times — sometimes under different metadata tags, sometimes through parallel project pipelines that had no mechanism to flag duplicates at point of entry.
At ETH Zürich's library on Rämistrasse, staff have been running deduplication software across an estimated 4.2 million image records since the start of this week. The library's digital collections team — which manages one of Switzerland's largest open-access image repositories — confirmed the process is ongoing but declined to give a precise figure for how many files are expected to be removed. The Stadtarchiv, which holds photographic and cartographic records stretching back to the nineteenth century, is running a parallel process and has already cleared a first batch of files identified as exact duplicates by hash-matching algorithms.
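For readers curious what exact hash-matching involves, the following is a minimal illustrative sketch in Python, not the software either institution is running. It assumes the images sit in an ordinary directory tree. The function names `sha256_of_file` and `find_exact_duplicates` are hypothetical, as is the choice of SHA-256 as the hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` by content hash; return only groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Two files land in the same group only if their bytes are identical, which is why this approach catches re-ingested copies but not a second scan of the same print.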
Redundant images are not simply a question of wasted server space. When researchers query a public-facing archive and retrieve multiple identical images with conflicting metadata — different date stamps, different provenance tags, different rights classifications — it undermines the reliability of the collection. A historian pulling aerial photographs of the Limmatquai district, for example, might unknowingly cite the same image twice under two different catalogue numbers, a mistake that compounds once the error enters published academic work.
The problem has a financial dimension too. Cloud storage costs for Swiss public institutions are calculated per terabyte per month, and archivists working on the project estimate that duplicate images account for between eight and fifteen percent of total storage volume in some collections — a range consistent with findings from comparable digitisation drives in Germany and the Netherlands. At current Swiss hosting rates, trimming even a conservative ten percent of redundant data from a mid-size cantonal archive can free up budget equivalent to several months of a junior archivist's salary.
The University of Zurich's digital humanities unit on Schönberggasse has been developing an open-source deduplication toolkit, called DupliClear, since early 2025. That project reached a stable release at the end of May 2026 and is now being tested by both ETH Zürich and the Stadtarchiv as part of a pilot that could eventually be adopted by the Swiss Federal Archives in Bern. DupliClear uses perceptual hashing rather than exact byte-matching, meaning it can flag near-duplicate images — slightly different scans of the same document, for instance — that conventional tools miss entirely.
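The general idea behind perceptual hashing can be sketched with a simple "average hash". This is an illustration of the technique in general, not DupliClear's actual algorithm, which the project has not detailed here. The sketch assumes each image has already been reduced to a small grayscale grid (for example 8×8) of values from 0 to 255. The function names and the `max_distance` threshold of 5 are hypothetical choices.

```python
def average_hash(pixels):
    """Build a bit fingerprint: 1 where a pixel is brighter than the grid's mean.

    `pixels` is a 2D list of grayscale values (0-255), assumed already downscaled.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Treat two images as near-duplicates if their fingerprints differ in few bits."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

Because the fingerprint records only which regions are lighter or darker than average, a slightly brighter rescan of the same document yields the same or nearly the same bits, whereas a byte-level hash would differ completely.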
Several archive portals will be intermittently unavailable over the coming two weeks as batch deletions and metadata reconciliation run in the background. The ETH Zürich library has posted a maintenance schedule on its website indicating rolling downtime windows between 22:00 and 06:00 on weekday nights through July 18. The Stadtarchiv has not yet published a formal schedule but advised researchers planning visits to its reading room on Neumarkt to confirm access by telephone before travelling.
For individual researchers working with downloaded image sets, archivists recommend running a local deduplication check before beginning citation work — several free tools are available for this — and cross-referencing any catalogue numbers retrieved before July 2026 against the updated metadata once the clean-up completes. The Swiss Library Information Network, Swissbib, is expected to publish a consolidated update to affected collection records by the end of August.
The broader lesson, archivists say, is structural: future digitisation contracts should mandate deduplication checks at the point of ingest rather than treating deduplication as a post-hoc clean-up task. That recommendation is now being discussed at the cantonal level ahead of the next federal funding cycle.
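What a check at the point of ingest might look like can be sketched in a few lines. This is a hypothetical illustration, not a description of any institution's pipeline: the `IngestRegistry` class, its in-memory dictionary, and the sample catalogue numbers are all assumptions made for the example.

```python
import hashlib


class IngestRegistry:
    """Remember the content hash of every accepted file and flag repeats on arrival."""

    def __init__(self):
        # Maps content digest -> catalogue id of the first accepted copy.
        self._seen = {}

    def ingest(self, catalogue_id, data):
        """Accept `data` if it is new and return None.

        If identical bytes were accepted before, store nothing and return the
        catalogue id of the earlier copy so the conflict can be reviewed.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing
        self._seen[digest] = catalogue_id
        return None
```

A parallel project pipeline submitting an already-held scan under a new catalogue number would get the original number back instead of creating a second record.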
About this article
Published by The Daily Zurich