The Daily Zurich

Zurich's Digital Archives Push Forward on Duplicate Image Problem — Here's Where Things Stand This Week

ETH Zurich and the city's main cultural institutions are closing in on automated tools that can strip redundant photographs from swelling digital collections — but the work is proving harder than expected.

By Zurich News Desk · Published 4 July 2026, 8:44 pm

Zurich's digitisation drive has a clutter problem. Across the city's major cultural repositories — from the Schweizerisches Nationalmuseum on Museumstrasse to the Stadtarchiv Zürich offices near the Rathaus — archivists are contending with digital collections bloated by thousands of duplicate and near-duplicate images, scanned multiple times across different projects over the past decade. This week, a working group coordinating between ETH Zürich's Data Archive unit and the Zentralbibliothek Zürich on Zähringerplatz confirmed it is piloting a new automated deduplication pipeline, targeting a backlog that internal assessments now put at several hundred thousand redundant image files.

The timing matters for several reasons. Swiss federal guidelines under the new Archivierungsverordnung, which came into effect in January 2026, set stricter standards for how public institutions manage digital storage — including obligations to audit for unnecessary duplication before expanding server infrastructure. Zurich's institutions are under pressure to comply before the next federal reporting cycle closes at the end of the third quarter. Storage costs and energy consumption are also under scrutiny: the Zentralbibliothek alone manages a digital collection that has grown to more than four million catalogued objects, according to figures the institution published in its 2025 annual report.

What the New Tools Actually Do

The deduplication system being tested uses perceptual hashing, a technique that generates a compact fingerprint for each image based on visual content rather than on the file's bytes or metadata, combined with a secondary pass using convolutional neural network models trained on archival photograph types common in Swiss collections. The distinction matters because simple file-hash matching, which checks whether two files are byte-for-byte identical, catches only exact copies. Perceptual methods also catch rescanned versions of the same photograph, slightly different crops, and images digitised under different lighting conditions in different years. Those near-duplicates account for most of the problem.
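
For readers who want to see the difference in practice, the sketch below contrasts the two approaches on a toy example. It is an illustration only, not the working group's pipeline: the article does not say which perceptual hash is in use, so the code assumes a simple "average hash", and the function names, the 32 by 32 synthetic images and the brightness shift standing in for a rescan under different lighting are all hypothetical.

```python
import hashlib


def file_hash(pixels):
    """Exact-match fingerprint: SHA-256 over the raw pixel bytes."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()


def average_hash(pixels, size=8):
    """Perceptual fingerprint (assumed 'average hash' variant).

    The image is reduced to size x size block averages; each block
    contributes one bit, set when the block is brighter than the mean.
    Assumes the image is at least size pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    blocks = []
    for by in range(size):
        for bx in range(size):
            y0, y1 = by * height // size, (by + 1) * height // size
            x0, x1 = bx * width // size, (bx + 1) * width // size
            cells = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            blocks.append(sum(cells) / len(cells))
    mean = sum(blocks) / len(blocks)
    bits = 0
    for block in blocks:
        bits = (bits << 1) | (1 if block > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical 32x32 greyscale images (values 0-255).
original = [[(x + y) * 4 for x in range(32)] for y in range(32)]
# Same picture, slightly brighter: a stand-in for a later rescan.
rescanned = [[min(255, v + 5) for v in row] for row in original]
# A genuinely different picture (the tonal inverse).
different = [[255 - v for v in row] for row in original]

print("file hashes equal:", file_hash(original) == file_hash(rescanned))
print("perceptual distance, rescan:", hamming(average_hash(original), average_hash(rescanned)))
print("perceptual distance, different image:", hamming(average_hash(original), average_hash(different)))
```

In this toy case the byte-level hashes of the original and the "rescan" differ, while their perceptual fingerprints are only a few bits apart at most, so a distance threshold would flag them as candidates for review. Where to set that threshold is a tuning decision the sketch does not attempt to answer.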

ETH Zürich's chair for Information Science, which has been collaborating on the project since autumn 2025, has been running test batches on a subset of the Zentralbibliothek's historical postcard collection — some 80,000 items. Early results, shared internally this week, found a duplication rate of roughly 12 percent in that subset, higher than archivists had estimated when the audit began. The next phase extends the same pipeline to photographic holdings at the Stadtarchiv, with a planned start date of September 2026.

The practical challenge is not just detection. Once a duplicate or near-duplicate is flagged, a human archivist must decide which version to retain — a question that turns on image quality, provenance metadata, and whether different copies carry different annotation histories. That curatorial step cannot be automated away, and staffing it is where institutions say the real bottleneck sits. The Zentralbibliothek currently has three permanent posts dedicated to digital collections management, a number that has not changed since 2022 despite the collection's continued growth.

What Comes Next for Collections Across the City

The Stadtarchiv Zürich is expected to publish a short public report on the first phase of the audit before the summer recess ends in mid-August. That document will feed into a broader review by the cantonal culture department, which oversees funding allocations for digitisation work across Zurich's public institutions. A decision on whether to fund additional archivist posts — or contract specialist firms — is anticipated in the cantonal budget discussions scheduled for October 2026.

For researchers and members of the public who regularly use Zurich's digital portals, the most visible effect of the deduplication work should eventually be cleaner search results. Anyone who has searched the online catalogue at e-rara.ch or the Zürich cantonal image portal and been confronted with near-identical results clustered together will recognise the problem. The institutions are not putting a firm public completion date on the full deduplication project, which, given the scale of the collections and the staffing constraints, is a realistic if frustrating position. What has changed this week is that the work has moved from planning documents into live data, and that is a meaningful shift for collections that have been accumulating digital clutter for the better part of fifteen years.

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
