The Daily Zurich

Zurich news, every day

Zurich's Digital Archives Push Forward on Duplicate Image Problem This Week

Institutions across the city are grappling with a surge in redundant visual data as AI-assisted cataloguing tools expose decades of duplicated photographic records.

By Zurich News Desk · Published 4 July 2026, 9:16 pm

Photo: Elijah Cobb on Pexels

Zurich's archival and media management community moved this week to address a long-standing problem that has quietly consumed server capacity and curatorial hours across the city: duplicate images embedded in institutional databases, news photo libraries, and public digital archives. The issue came into sharper focus after ETH Zurich's Chair for Information Science published a working paper on 1 July identifying duplicate image proliferation as a primary driver of inefficiency in Swiss institutional data management.

The timing is not accidental. UBS Group's ongoing integration of former Credit Suisse digital assets, a process that began in earnest after the March 2023 emergency merger, has forced large-scale deduplication efforts across financial media libraries and internal communications archives. Compliance teams handling the combined entity's records have flagged duplicate image files as a risk under Swiss financial documentation regulations, because redundant visual records can obscure version histories and audit trails.

What Happened This Week in Zurich

On Wednesday, the Stadtarchiv Zürich on Neumarkt confirmed it had completed the first phase of a deduplication sweep covering roughly 180,000 digitised photographic negatives from the 20th century. Archivists used perceptual hashing software — a technique that generates a fingerprint for each image based on visual content rather than file metadata — to identify near-identical records that had been scanned multiple times over different digitisation campaigns going back to 2004. The sweep identified a duplication rate of approximately 12 percent across the collection, meaning around 21,600 images had at least one redundant copy stored in the system.
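For readers curious how a perceptual hash differs from comparing files byte by byte, the idea can be sketched in a few lines of Python. This is a minimal illustration of one common variant, the average hash, and not the software the Stadtarchiv used, which the archive has not named. The function names, the 8×8 grid, the distance threshold of 5, and the synthetic test images are all assumptions made for the example.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def average_hash(img: Image, size: int = 8) -> int:
    """Return a size*size-bit fingerprint of the image's visual content.

    The image is shrunk to a size x size grid by averaging blocks of pixels,
    then each grid cell contributes one bit: 1 if brighter than the grid mean.
    Assumes the image is at least size pixels wide and tall.
    """
    height, width = len(img), len(img[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Image, b: Image, max_distance: int = 5) -> bool:
    """Treat two images as the same picture if few fingerprint bits differ.

    max_distance is an illustrative threshold, not a recommended setting.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic stand-ins for scans (hypothetical data, 32x32 pixels).
scan_2004 = [[x * 8 for x in range(32)] for _ in range(32)]        # left-to-right gradient
scan_2015 = [[min(255, p + 6) for p in row] for row in scan_2004]  # same picture, slightly brighter
other = [[y * 8 for _ in range(32)] for y in range(32)]            # top-to-bottom gradient

print(is_near_duplicate(scan_2004, scan_2015))  # rescan of the same picture
print(is_near_duplicate(scan_2004, other))      # unrelated picture
```

The two "scans" differ in every pixel value, so a byte-level comparison would treat them as different files, yet their fingerprints match because the overall pattern of light and dark is the same. Production tools typically use more robust variants and tune the threshold per collection.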

Across the Limmat, the Kunsthaus Zürich's digital collections team has been running a parallel exercise since June. The museum, whose extension on Heimplatz opened in 2021, holds one of the largest art image databases in the German-speaking world. A person familiar with the project, who was not authorised to speak publicly, told The Daily Zurich that the museum's library had identified significant duplication arising from multiple vendor digitisation contracts over the years. The Kunsthaus declined to provide specific figures before a planned public update later this month.

The problem is not unique to Zurich's public institutions. Media organisations in the city — including photo agencies operating out of the Medienquartier on Kalanderplatz in Zürich-West — have faced the same challenge as AI-powered image recognition tools have become accessible enough for smaller editorial teams to deploy. Industry estimates circulating at a June media technology event in Basel suggested that mid-sized European news photo archives waste between 15 and 25 percent of their cloud storage budget on duplicate or near-duplicate files.

Why Deduplication Matters Beyond Storage Bills

The practical stakes go beyond tidying up hard drives. Under the revised Swiss Federal Act on Data Protection, which took effect in September 2023, organisations holding personal image data, including photographs of identifiable individuals, carry legal obligations around data minimisation. Keeping multiple copies of the same image of a private person without a documented reason could, legal advisors say, create exposure under those rules. That concern has pushed the deduplication question up the priority list for Zurich's institutional IT departments in the first half of 2026.

ETH Zurich's working paper, authored by researchers in the Data Analytics Lab on Rämistrasse, recommended a city-wide shared deduplication protocol for public cultural institutions, arguing that a federated approach would reduce redundant processing work. The paper proposed a pilot involving at least four Zurich municipal archives starting in the fourth quarter of 2026.

For individual organisations still in the early stages, the practical advice from the ETH paper is straightforward: run a perceptual hash scan before renewing any cloud storage contract, because duplicates inflate current storage figures and distort capacity planning. Several Swiss cloud providers, including those with data centres in canton Zurich, now bundle deduplication audits into enterprise contracts, though pricing for those services varies considerably with collection size and with how much near-duplicate detection is required beyond exact matches.
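The effect of duplicates on capacity planning can be shown with simple arithmetic. The sketch below is illustrative only: the 40 TB archive, the 10 percent annual growth rate, and the three-year contract term are hypothetical, and the 15 and 25 percent figures are borrowed from the industry estimates cited above on the assumption that wasted budget maps directly onto redundant stored volume.

```python
def deduplicated_size_tb(stored_tb: float, redundant_share: float) -> float:
    """Storage needed once redundant copies are removed.

    redundant_share is the fraction of stored volume made up of redundant
    copies, between 0.0 (none) and just under 1.0.
    """
    if not 0.0 <= redundant_share < 1.0:
        raise ValueError("redundant_share must be in [0, 1)")
    return stored_tb * (1.0 - redundant_share)


def projected_need_tb(stored_tb: float, redundant_share: float,
                      annual_growth: float, years: int) -> float:
    """Project storage need at the end of a contract, compounding yearly
    growth on the deduplicated baseline."""
    baseline = deduplicated_size_tb(stored_tb, redundant_share)
    return baseline * (1.0 + annual_growth) ** years


# Hypothetical archive: 40 TB stored today, 10% growth a year, 3-year contract.
naive = projected_need_tb(40.0, 0.00, 0.10, 3)  # plan made without a scan
low = projected_need_tb(40.0, 0.15, 0.10, 3)    # 15% of volume is redundant
high = projected_need_tb(40.0, 0.25, 0.10, 3)   # 25% of volume is redundant

print(f"without scan: {naive:.2f} TB")
print(f"after dedup:  {high:.2f} to {low:.2f} TB")
```

Under these assumptions the gap between the naive and deduplicated projections compounds with every year of growth, which is why the order of operations (scan first, then negotiate capacity) matters.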

The Stadtarchiv's full deduplication report is expected to be published on its website by the end of July. It would be the first publicly available benchmark for Swiss municipal archives attempting the same exercise.

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.