Zurich's Digital Archives Push Forward on Duplicate Image Problem This Week
Libraries, research institutions and civic tech teams across the city are testing new automated tools to clean up years of redundant visual data clogging public databases.

Zurich's main municipal archive, the Stadtarchiv on Alfred-Escher-Strasse, confirmed this week that it has begun a structured review of its digitised photographic holdings after internal assessments flagged a significant proportion of duplicate and near-duplicate images across collections spanning the past four decades. The review, which started in earnest on 1 July 2026, marks the most systematic attempt the institution has made to address redundant visual content since large-scale digitisation accelerated after 2018.
The timing matters. Swiss public institutions are under mounting pressure to make their digital holdings genuinely searchable and usable, not merely uploaded. The Federal Archives Act, updated in 2021, tightened requirements for the accessibility and integrity of publicly held records. For cities like Zurich, where the volume of digitised material has roughly doubled since 2019, according to internal projections reported to the city council, the practical problem of duplicate images is no longer a minor housekeeping issue. It is a measurable obstacle to research, journalism and civic transparency.
ETH Zurich's Computer Vision Laboratory, based on Rämistrasse, has been running a pilot since May 2026 in partnership with two cantonal institutions to test perceptual hashing algorithms, a method that detects visually similar images even when file names, metadata or compression formats differ. The technique can process thousands of image pairs per hour and flags any pair whose similarity score crosses a set threshold; archivists then review the flagged pairs manually. ETH researchers presented preliminary findings at an internal workshop in early June, though those findings have not yet been published.
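For readers curious how perceptual hashing works in principle, the following is a minimal sketch of one common variant, a difference hash. It is not the ETH pilot's method, which has not been published. The function names, the 8x8 hash size and the distance threshold of 10 are illustrative assumptions, and the images are represented as plain grids of greyscale values rather than real files.

```python
from typing import List

Grid = List[List[int]]  # rows of greyscale values, 0-255


def resize_nearest(pixels: Grid, width: int, height: int) -> Grid:
    """Shrink a greyscale grid with nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per adjacent pixel pair in a tiny thumbnail.

    A bit is 1 when the left pixel is brighter than its right neighbour.
    """
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Grid, b: Grid, max_distance: int = 10) -> bool:
    """Flag a pair for human review; the threshold is an assumed value."""
    return hamming(dhash(a), dhash(b)) <= max_distance


# Synthetic 32x32 test images: a left-to-right gradient, a slightly
# brighter copy (as a rescan might produce), and a mirrored version.
original = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[min(255, v + 5) for v in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

print(is_probable_duplicate(original, brighter))  # flagged for review
print(is_probable_duplicate(original, mirrored))  # not flagged
```

Because the hash records only whether brightness rises or falls between neighbouring pixels, a uniformly brighter rescan yields the same hash, while a visually different image yields a distant one. Anything flagged would still go to an archivist.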
The practical results so far suggest that in one mid-sized civic photography collection of roughly 40,000 images, between 12 and 18 percent of files, or roughly 4,800 to 7,200 images, were flagged as probable duplicates on first pass. Archivists described that figure as higher than expected but consistent with what institutions in Hamburg and Amsterdam have reported after similar audits. That range, if confirmed across Zurich's broader holdings, would represent tens of thousands of files requiring human review before any deletion or consolidation takes place.
Alongside ETH's work, the Zentralbibliothek Zürich on Zähringerplatz has been trialling a commercial deduplication platform since the start of the second quarter of 2026. Librarians there are working through a backlog of scanned newspaper photographs and event imagery from the 1980s and 1990s, where physical reprints often mean the same image exists in three or four slightly different scans. The cost of the software licence has not been disclosed publicly, but the library confirmed the trial is funded through a canton-level digitisation grant awarded in late 2025.
For the thousands of researchers, journalists and students who use Zurich's digital collections each year — the Zentralbibliothek alone recorded more than 1.2 million digital access sessions in 2024 — cleaner archives would mean faster, more reliable search results. A duplicate image problem is not merely aesthetic: when search indexes treat each copy as a separate record, relevant material gets buried under noise, and provenance chains become harder to reconstruct.
The Stadtarchiv has indicated it aims to publish a methodology document by September 2026 outlining how it will handle confirmed duplicates: whether files are deleted, merged, or retained with flags indicating redundancy. Archivists have noted that some duplicates carry different annotations or provenance notes on each copy, which means automated deletion alone is not appropriate.
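One way to picture the "merge" option is the hypothetical sketch below, which keeps one record per duplicate group, carries over every distinct annotation, and lists the redundant copies rather than deleting them. The Stadtarchiv has not published its methodology, so the record fields, the `group_id` label and the rule of keeping the first record as canonical are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageRecord:
    record_id: str
    group_id: str  # assumed label shared by copies confirmed as duplicates
    annotations: List[str] = field(default_factory=list)


@dataclass
class ConsolidatedRecord:
    canonical_id: str
    redundant_ids: List[str]
    annotations: List[str]


def consolidate(records: List[ImageRecord]) -> List[ConsolidatedRecord]:
    """Merge each duplicate group into one record without losing notes."""
    groups: Dict[str, List[ImageRecord]] = {}
    for record in records:
        groups.setdefault(record.group_id, []).append(record)

    result = []
    for members in groups.values():
        canonical, *others = members  # assumption: first record is kept
        merged: List[str] = []
        for member in members:
            for note in member.annotations:
                if note not in merged:  # keep each distinct note once
                    merged.append(note)
        result.append(
            ConsolidatedRecord(
                canonical_id=canonical.record_id,
                redundant_ids=[o.record_id for o in others],
                annotations=merged,
            )
        )
    return result


# Invented example records, not real archive data.
records = [
    ImageRecord("A1", "g1", ["Scan from 1987 print"]),
    ImageRecord("A2", "g1", ["Scan from 1987 print", "Caption on reverse"]),
    ImageRecord("B1", "g2", []),
]
consolidated = consolidate(records)
```

In this sketch nothing is discarded: the second copy's extra note survives on the consolidated record, and its identifier stays traceable through `redundant_ids`.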
For anyone who relies on these collections — academic researchers at the University of Zurich on Rämistrasse, journalists working on historical pieces, genealogists, urban planners — the practical advice this week is to document your current search workflows. As deduplication reshapes what appears in results, search behaviours that work today may return different sets of records by the end of the year. The institutions involved have committed to publishing change logs, but checking directly with archivists before major projects begin is the surest way to avoid gaps in your source material.
About this article
Published by The Daily Zurich