The Daily Zurich

Zurich's Digital Archives Push Forward on Duplicate Image Crisis — Here's What Changed This Week

Institutions across the city are grappling with how to clean up bloated digital collections as AI-assisted deduplication tools move from pilot phase to live deployment.

By Zurich News Desk · Published 4 July 2026, 8:48 pm

Photo: Kemal Kartal on Pexels

A coordinated effort to eliminate duplicate images from Zurich's major public digital repositories moved into an active new phase this week, with the city's main library network and at least one cantonal research body confirming that automated deduplication pipelines were switched on in production environments. The shift is small in the grand scheme of urban administration, but its implications for how Zurich manages publicly funded visual archives are significant.

The problem has been building for years. Digital photography, drone surveys, social-media harvesting, and the mass digitisation of analogue collections have flooded civic and academic repositories with redundant files. Estimates from comparable European municipal archives suggest duplicate or near-duplicate images can account for between 15 and 40 percent of stored assets, driving up storage costs and degrading search quality. For Zurich, whose institutions hold some of the most comprehensively digitised urban records in the German-speaking world, the administrative overhead has become hard to ignore.

What Actually Happened This Week

The Stadtarchiv Zürich, located on Neumarkt in the Altstadt, confirmed on Wednesday that it had completed the first full automated scan of its photographic holdings using a perceptual-hashing algorithm. The scan covered collections spanning the post-war reconstruction period through the early 2000s, a catalogue that runs to several hundred thousand individual image files. Staff archivists are now reviewing flagged clusters before any deletions are authorised; no images have been permanently removed yet.
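For readers unfamiliar with the technique, the sketch below shows one simple form of perceptual hashing, the "average hash". The Stadtarchiv has not said which algorithm or software it uses, so this is an illustrative assumption rather than a description of its pipeline; the function names and the 8x8 hash size are hypothetical choices.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale pixel values, 0-255


def downscale(pixels: Grid, size: int = 8) -> Grid:
    """Shrink an image to size x size by averaging blocks of pixels.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if brighter than the mean."""
    flat = [v for row in downscale(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# A 16x16 left-to-right gradient, a slightly brighter copy, and a negative.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in original]
negative = [[255 - v for v in row] for row in original]

same_scene = hamming_distance(average_hash(original), average_hash(brighter))
different = hamming_distance(average_hash(original), average_hash(negative))
```

Two files whose hashes differ in only a few bits are likely to show the same scene, which is why a small exposure change leaves the hash largely intact while a visually different image does not.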

Separately, the ETH-Bibliothek on the Hönggerberg campus disclosed that its pilot programme, which began in January 2026, had entered a second stage this week. The library has been testing a combination of perceptual hashing and machine-learning-based similarity scoring on its scientific image collections, which include photographs from field research, laboratory work, and historical geodata surveys. Because ETH Zürich ranks among the world's leading research institutions, its data-management practices tend to set a benchmark that other Swiss universities monitor closely.
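How the two methods are combined at ETH has not been disclosed. One common pattern, sketched here purely as an assumption, is to use the cheap hash comparison as a first filter and then confirm candidates with a cosine similarity between feature vectors produced by a machine-learning model. The thresholds (`max_bits`, `min_cosine`) and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(
    hash_a: int,
    hash_b: int,
    features_a: Sequence[float],
    features_b: Sequence[float],
    max_bits: int = 10,       # hypothetical hash-distance cutoff
    min_cosine: float = 0.9,  # hypothetical similarity cutoff
) -> bool:
    """Two-stage check: cheap hash filter first, model-based score second."""
    if bin(hash_a ^ hash_b).count("1") > max_bits:
        return False  # hashes too far apart; skip the costlier comparison
    return cosine_similarity(features_a, features_b) >= min_cosine


# Feature vectors would come from an image model; these are toy values.
close_match = is_near_duplicate(0b1111, 0b1110, [1.0, 0.0], [1.0, 0.0])
hash_only_match = is_near_duplicate(0b1111, 0b1110, [1.0, 0.0], [0.0, 1.0])
```

The ordering matters for cost: hash comparisons are a handful of bit operations, so running them first keeps the slower model-based scoring to a small set of candidates.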

The Zentralbibliothek Zürich on Zähringerplatz is understood to be further behind in this process, still in procurement discussions for deduplication tooling. Its holdings include the Graphische Sammlung and a large body of digitised periodical imagery, where near-duplicate frames — slightly different exposures of the same scene, for example — are especially common.

Why the Timing Matters

The push is partly technical and partly financial. Cloud storage costs for large image files have risen sharply since 2024, and Swiss public institutions operate under budget constraints that make waste increasingly visible to cantonal auditors. Reducing a collection's effective footprint by even 10 percent can translate into meaningful annual savings on infrastructure contracts.

There is also a legal dimension. Switzerland's revised Datenschutzgesetz, which came fully into force in September 2023, places stricter obligations on institutions holding personal data — including photographs of identifiable individuals taken in public spaces. Duplicate images of people, scattered across multiple folders and backup tiers, complicate compliance. Consolidating them into single canonical files makes it easier to honour deletion requests.

For researchers and journalists who rely on these archives, the week's developments carry a practical warning: collections may look incomplete or return fewer results during the review period. The Stadtarchiv, at Neumarkt 4, has posted a notice on its public access portal advising that some search results may be temporarily suppressed while deduplication clusters are being validated by human reviewers. The archive expects that phase to last through the end of July 2026.

Institutions running similar processes elsewhere in the German-speaking region — including the Staatsarchiv in Basel and municipal collections in Bern — have reported that the human-review bottleneck is consistently the longest stage. Zurich's archivists, working with a team of four dedicated reviewers at the Stadtarchiv, are aiming to clear the backlog faster by restricting the first round to images flagged with a similarity score above 95 percent, leaving borderline cases for a later pass. The practical upshot for anyone accessing these collections before August: expect some gaps, and check back.
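The triage rule described above can be sketched in a few lines. The 95 percent threshold comes from the Stadtarchiv's stated plan; the record layout, file names, and function name are hypothetical and only illustrate the split between the first review round and the later pass.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FlaggedPair:
    image_a: str
    image_b: str
    similarity: float  # 0.0 to 1.0, as reported by the deduplication scan


def triage(
    pairs: List[FlaggedPair], threshold: float = 0.95
) -> Tuple[List[FlaggedPair], List[FlaggedPair]]:
    """Split flagged pairs into a first review round and a deferred pass.

    Pairs scoring strictly above the threshold go to the first round,
    highest score first; everything else is left for later.
    """
    first_round = sorted(
        (p for p in pairs if p.similarity > threshold),
        key=lambda p: p.similarity,
        reverse=True,
    )
    later_pass = [p for p in pairs if p.similarity <= threshold]
    return first_round, later_pass


# Hypothetical flagged pairs for illustration only.
flagged = [
    FlaggedPair("a.tif", "b.tif", 0.99),
    FlaggedPair("c.tif", "d.tif", 0.95),
    FlaggedPair("e.tif", "f.tif", 0.97),
    FlaggedPair("g.tif", "h.tif", 0.80),
]
first_round, later_pass = triage(flagged)
```

Restricting the first round this way trades completeness for speed: reviewers see the most obvious duplicates first, while borderline pairs wait for the second pass.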

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
