The Daily Zurich

Zurich Archives Push to Eliminate Duplicate Digital Images in Major Collection Overhaul

A city-wide effort to clean up redundant photograph records is reshaping how institutions from ETH Zurich to the Stadtarchiv manage their growing digital collections.

By Zurich News Desk · Published 4 July 2026, 9:00 pm

Photo: Marcel Biegger on Pexels

Zurich's archival community moved decisively this week to tackle a problem that has quietly inflated digital storage costs and confused researchers for years: tens of thousands of duplicate images scattered across public and institutional collections. The city's Stadtarchiv, housed on Neumarkt in the Altstadt, confirmed it has begun a phased replacement programme targeting redundant digital image files across its publicly accessible database, with the first tranche of corrections expected to be completed before the end of the third quarter of 2026.

The timing is deliberate. Swiss federal data governance guidelines updated in early 2026 now require cantonal institutions to meet stricter metadata standards by December 31, 2026, or risk losing interoperability status with the national Dodis and Memoriav archival networks. For Zurich, that deadline has concentrated minds.

What Changed This Week

On July 2, the Stadtarchiv published an internal working document, made available to journalists on request, outlining the scope of the cleanup. The institution identified more than 14,000 image records flagged as probable duplicates through automated hash-matching software deployed since March. Of those, roughly 3,200 have already been reviewed by staff and either consolidated into single canonical records or marked for deletion pending a 30-day appeal window open to registered researchers.
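For readers unfamiliar with hash matching, the sketch below shows the general idea in Python: files with byte-for-byte identical contents produce the same cryptographic digest, so grouping records by digest surfaces probable duplicates for staff to review. The Stadtarchiv has not said which software or hash function it uses; SHA-256, the function name and the catalogue numbers here are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(records: dict[str, bytes]) -> list[list[str]]:
    """Group record IDs whose file contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for record_id, content in records.items():
        # Identical bytes always yield the same SHA-256 digest.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(record_id)
    # Only digests shared by two or more records are probable duplicates.
    return [sorted(ids) for ids in by_digest.values() if len(ids) > 1]


# Hypothetical catalogue numbers with stand-in bytes instead of real scans.
sample = {
    "STA-0001": b"scan of tram depot, 1931",
    "STA-0002": b"scan of Limmatquai, 1954",
    "STA-0417": b"scan of tram depot, 1931",
}
duplicate_groups = find_exact_duplicates(sample)
```

A match of this kind only flags candidates. It says nothing about which record carries the correct metadata, which is why a human review step still follows.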

ETH Zurich's library and archive division on Rämistrasse has been running a parallel process since February, using an open-source perceptual hashing tool developed in collaboration with the university's computer science department. The ETH project targets historical photograph collections that were digitised in multiple batches between 2008 and 2019, a period when scanning standards changed at least three times, producing near-identical images stored under different file names and catalogue numbers. The ETH library has not publicly disclosed the total number of duplicates identified, but the project's GitHub repository, updated as recently as June 30, lists 47 completed batch-review cycles covering approximately 280,000 image files.
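Perceptual hashing differs from exact hash matching in that it compares what an image looks like rather than its bytes, so two scans of the same photograph made under different standards can still be paired. The sketch below uses an "average hash", one of the simplest perceptual hashes, on tiny grayscale grids. It is not the ETH tool, whose algorithm the article does not describe, and the pixel grids are invented stand-ins for images that a real tool would first downscale.

```python
def average_hash(grid: list[list[int]]) -> int:
    """Return a bit fingerprint: 1 where a pixel is brighter than the mean."""
    flat = [value for row in grid for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale thumbnail: dark on the left, bright on the right.
scan_2008 = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 198, 208],
    [11, 21, 202, 212],
]
# The same picture rescanned slightly brighter, so the file bytes differ.
scan_2019 = [[min(255, value + 6) for value in row] for row in scan_2008]
# A different picture: the mirror image of the first.
other_photo = [list(reversed(row)) for row in scan_2008]

near = hamming_distance(average_hash(scan_2008), average_hash(scan_2019))
far = hamming_distance(average_hash(scan_2008), average_hash(other_photo))
```

A small distance suggests the two files show the same picture, while a large one suggests different pictures. Where to draw the threshold is a tuning decision that a project of this kind would have to make for its own collections.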

The Zentralbibliothek Zürich on Zähringerplatz is also affected. The institution holds one of the largest photographic collections in the German-speaking world, and its digital portal has long carried warnings that some image sets may contain redundant entries from legacy migration projects. This week, a notice appeared on the portal's landing page confirming that a duplicate-review process is underway and that certain catalogue entries may be temporarily unavailable during verification checks.

Why Storage Costs and Research Integrity Both Depend on Getting This Right

The financial argument is straightforward. Swiss public cloud storage for uncompressed archival-grade TIFF files runs at roughly CHF 0.022 per gigabyte per month under standard cantonal procurement contracts, according to published rates from the cantonal IT service provider OIZ. A collection carrying even 10,000 redundant high-resolution files — each potentially 50 to 100 megabytes — represents a recurring and entirely unnecessary expense. Multiply that across a dozen institutions in the canton, and the cumulative waste across a five-year period becomes significant.
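The figures in that paragraph can be worked through directly. The sketch below is a back-of-envelope calculation, not an official costing: it assumes decimal units (1 GB = 1,000 MB), a flat rate with no tiering or retrieval fees, and, for the canton-wide figure, that each of twelve institutions carries the same 10,000 redundant files.

```python
RATE_CHF_PER_GB_MONTH = 0.022  # rate quoted in the article
REDUNDANT_FILES = 10_000       # example count from the article
MB_PER_GB = 1_000              # assumption: decimal gigabytes
MONTHS = 5 * 12                # five-year period
INSTITUTIONS = 12              # assumption: "a dozen", all equally affected


def monthly_cost_chf(file_count: int, mb_per_file: float) -> float:
    """Monthly storage cost in CHF for files of a given average size."""
    gigabytes = file_count * mb_per_file / MB_PER_GB
    return gigabytes * RATE_CHF_PER_GB_MONTH


low = monthly_cost_chf(REDUNDANT_FILES, 50)    # 500 GB, roughly CHF 11 a month
high = monthly_cost_chf(REDUNDANT_FILES, 100)  # 1,000 GB, roughly CHF 22 a month

five_year_low = low * MONTHS                   # roughly CHF 660
five_year_high = high * MONTHS                 # roughly CHF 1,320

canton_low = five_year_low * INSTITUTIONS      # roughly CHF 7,900
canton_high = five_year_high * INSTITUTIONS    # roughly CHF 15,800
```

The totals scale linearly with the number of redundant files, so a collection carrying ten times as many duplicates would cost ten times as much under the same assumptions.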

But researchers argue the integrity problem is more serious than the cost. Duplicate records with slightly different metadata — a different date stamp, a misspelled photographer's name, a wrong district attribution — generate false leads and distort search results. For historians working on projects tied to Zurich's urban development, particularly in contested areas like Escher-Wyss-Platz or the former Industriequartier, a duplicated image assigned conflicting dates can skew an entire argument about neighbourhood transformation.

The Stadtarchiv's replacement workflow requires human sign-off on every deletion, not just automated flagging. Researchers with active loans or citations tied to a catalogue number that is being retired will receive direct email notification before any record is altered. The Zentralbibliothek is expected to adopt a similar protocol once its own review phase concludes, a process staff described in a brief public statement this week as likely to run through October.

Anyone who has cited a digital image from any of these three institutions in academic work submitted since January 2025 is advised to cross-check their catalogue references against the current portal listings before the end of September. The Stadtarchiv has confirmed it will maintain a redirect table so that deprecated record numbers resolve to their replacement entries for at least three years after the changeover.
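A redirect table of the kind described can be pictured as a simple lookup from retired record numbers to their replacements. The sketch below is an illustration only: the Stadtarchiv has not published its implementation, and the function name and catalogue numbers are hypothetical. It follows chains of redirects, since a replacement record could itself be retired later, and it stops if the table ever loops back on itself.

```python
def resolve(record_id: str, redirects: dict[str, str]) -> str:
    """Follow redirects until reaching a record number that is not retired."""
    seen: set[str] = set()
    current = record_id
    while current in redirects:
        if current in seen:
            # A loop would mean the table itself is inconsistent.
            raise ValueError(f"redirect loop at {current}")
        seen.add(current)
        current = redirects[current]
    return current


# Hypothetical table: each retired number points to its replacement.
redirects = {
    "STA-0417": "STA-0001",
    "STA-0933": "STA-0417",
}

direct = resolve("STA-0417", redirects)     # one hop
chained = resolve("STA-0933", redirects)    # two hops
untouched = resolve("STA-0002", redirects)  # never retired, returned as-is
```

With a table like this, a citation using a retired number still lands on the surviving canonical record, which is the behaviour the three-year commitment is meant to guarantee.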

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
