The Daily Zurich

Zurich news, every day

Zurich's Digital Archives Push Forward on Duplicate Image Problem — Here's Where Things Stand This Week

A coordinated effort among the city's leading cultural institutions to identify and remove redundant digital images from public collections has hit a new milestone, but the work is far from finished.

By Zurich News Desk · Published 4 July 2026, 9:25 pm

Swiss cultural institutions managing Zurich's vast publicly accessible digital image collections cleared a significant internal benchmark this week, advancing a multi-year effort to scrub tens of thousands of duplicate photographs, scans, and archival images from repositories used by researchers, journalists, and the general public. The push has been underway since early 2024, but July 2026 marks the first time partner organisations have formally reported progress against a shared deduplication standard.

The issue matters in practical terms. When the same image appears under two or more catalogue entries — common in digitisation projects where a photograph is scanned separately by different departments, or acquired through multiple donor streams — it inflates collection counts, skews search results, and wastes server storage that institutions pay for out of public funds. For an institution running tens of thousands of catalogue records, even a five-percent duplication rate represents a meaningful editorial and financial burden.

Who Is Doing the Work, and Where

The Stadt Zürich's Stadtarchiv, based at Neumarkt 4 in the Altstadt, has been the coordinating body for the deduplication project since its formal launch under the city's Digitale Strategie framework. It has been working alongside the Zentralbibliothek Zürich on Zähringerplatz and the Museum für Gestaltung on Ausstellungsstrasse, both of which maintain large independent image databases that overlap with municipal holdings. The three institutions collectively hold digitised records stretching back to the mid-nineteenth century, and the overlap problem became acute after each ran separate scanning campaigns between 2018 and 2023.

The current phase of the project uses perceptual hashing — a computational technique that generates a short fingerprint from an image's visual content rather than its file metadata — to flag near-identical copies across databases. Staff then review flagged pairs manually before any record is suppressed or merged, a step the institutions insist on to avoid accidentally deleting a genuinely unique variant, such as a photograph printed at a different crop or contrast from a shared negative.
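
The fingerprint-and-compare idea can be illustrated with a short sketch. The institutions have not said which perceptual hash they use, so this example assumes a simple "average hash" as a stand-in; the function names, the 8x8 grid size, and the review threshold of 5 bits are all hypothetical. Images are represented as plain grids of grayscale values (0 to 255) so the sketch needs no imaging library.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255, row by row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging equal blocks.

    Assumption for this sketch: width and height are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Build a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def flag_for_review(hash_a: int, hash_b: int, threshold: int = 5) -> bool:
    """Flag a pair for manual review; nothing is merged or deleted here."""
    return hamming(hash_a, hash_b) <= threshold


if __name__ == "__main__":
    # Synthetic test images: a left-to-right gradient, a uniformly
    # brighter copy of it, and an unrelated top-to-bottom gradient.
    scan_a = [[x * 16 for x in range(16)] for _ in range(16)]
    scan_b = [[value + 10 for value in row] for row in scan_a]
    other = [[y * 16] * 16 for y in range(16)]

    print(hamming(average_hash(scan_a), average_hash(scan_b)))  # 0
    print(hamming(average_hash(scan_a), average_hash(other)))   # 32
```

In this toy case the brighter copy produces an identical fingerprint and would be flagged, while the unrelated image differs in half of its 64 bits and would not. A fingerprint this coarse can also miss a meaningful difference such as a changed crop or contrast, which is consistent with the institutions' decision to keep a manual review step before any record is touched.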

ETH Zürich's Department of Information Technology and Electrical Engineering has contributed technical advisory support to the methodology. ETH Zürich consistently ranks among the top ten universities globally for computer science, and its location on Rämistrasse gives the city institutions unusually direct access to applied research on this kind of classification problem.

Numbers and What They Mean for Users

The Zentralbibliothek's digitised image catalogue exceeded 1.2 million indexed items as of its last published annual report. Staff involved in the project have indicated informally — without citing verified deduplication counts — that early test runs on a subset of the photographic holdings returned flag rates that surprised internal reviewers. No institution has published a final duplicate-removal count for this week's milestone, and The Daily Zurich was unable to independently verify claims about scope before deadline.

What is confirmed: the Stadt Zürich's Stadtarchiv posted a brief update to its project page on Thursday, 2 July 2026, noting that the shared deduplication protocol agreed between the three institutions had passed an internal validation review and would move into a broader rollout covering pre-1950 photographic records across all three collections. That phase is scheduled to run through the end of the third quarter of 2026.

For researchers working at the Lesesaal on Zähringerplatz or accessing the Stadtarchiv's online portal, the practical upshot is cleaner search results and fewer dead-end catalogue entries that turn out to be identical to an image already found elsewhere. Genealogists, urban historians, and architecture students at ETH and at the Zürcher Hochschule der Künste on Pfingstweidstrasse have been among the most vocal users pushing for this kind of catalogue hygiene.

The institutions plan to publish a joint technical report by October 2026, which will include the first publicly available statistics on how many records were merged, suppressed, or flagged for further review. Users who spot duplicates in the meantime can submit corrections through each institution's existing feedback forms — a low-tech safety valve that cataloguers say has already surfaced errors the automated tools missed. The broader lesson Zurich's archivists are drawing is that computational tools accelerate the identification phase, but human judgement remains the last check before anything is removed from the public record.

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.