The Daily Zurich

Zurich news, every day

Zurich's Digital Archives Tackle Duplicate Image Problem as Institutions Push for Cleaner Records

A week of coordinated action by Zurich's leading cultural and research bodies to strip redundant duplicate images from public databases is reshaping how the city manages its visual heritage.

By Zurich News Desk · Published 4 July 2026, 9:16 pm

Photo: Mâide Arslan on Pexels

Three of Zurich's most prominent cultural institutions confirmed this week they have begun a joint technical sweep to identify and remove duplicate images from their shared digital catalogues, a project years in the making that has finally reached its operational phase. The initiative, coordinated through a working group tied to ETH Zurich's library infrastructure, targets tens of thousands of redundant files stored across interconnected archive systems since the early digitisation push of the 2010s.

The timing matters. Swiss federal digitisation funding cycles reset in January 2027, and institutions that clean up their metadata and image records before the end of this calendar year stand to qualify for a new tranche of infrastructure grants. Sloppy databases with duplicated assets reduce a collection's assessed quality score under the federal evaluation framework, directly affecting how much money an institution receives.

What Happened This Week

Staff at the Zentralbibliothek Zürich, located on Zähringerplatz in the Altstadt, began running automated hash-comparison scripts across their photograph and illustration holdings on Monday, July 1. The tool flags images that are pixel-for-pixel identical or near-identical — scanned twice, uploaded from different departments, or inherited through collection mergers. By Thursday, the library's digital team had flagged more than 14,000 candidate duplicates from a holdings pool of roughly 380,000 digitised items, according to the project's internal progress tracker shared with partner institutions.
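For readers curious how hash comparison can separate exact copies from near-identical rescans, the following is a minimal sketch and not the library's actual tool, which the project has not published. It assumes each image has already been decoded and shrunk to a small grid of grayscale values from 0 to 255, all of the same size. The function names, the sample grids, and the `max_distance` threshold are hypothetical. Exact matches are found with a SHA-256 digest over the pixel values. Near matches use an average hash, a common perceptual-hashing technique chosen here for illustration.

```python
import hashlib
from itertools import combinations


def exact_key(pixels):
    """SHA-256 over dimensions and pixel values; equal only for pixel-for-pixel identical grids."""
    digest = hashlib.sha256()
    digest.update(f"{len(pixels)}x{len(pixels[0])}".encode())
    for row in pixels:
        digest.update(bytes(row))  # each value must be an int in 0..255
    return digest.hexdigest()


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the grid's mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def flag_duplicates(images, max_distance=1):
    """Return (exact_pairs, near_pairs) for a dict of name -> grayscale grid.

    max_distance is a hypothetical threshold; a real project would tune it.
    Pairwise comparison is fine for a demo but too slow for hundreds of
    thousands of items, where hashes would be indexed instead.
    """
    keys = {name: exact_key(grid) for name, grid in images.items()}
    hashes = {name: average_hash(grid) for name, grid in images.items()}
    exact_pairs, near_pairs = [], []
    for a, b in combinations(sorted(images), 2):
        if keys[a] == keys[b]:
            exact_pairs.append((a, b))
        elif hamming(hashes[a], hashes[b]) <= max_distance:
            near_pairs.append((a, b))
    return exact_pairs, near_pairs


# Hypothetical 2x2 grids standing in for downscaled scans.
sample = {
    "scan_a": [[10, 200], [220, 30]],
    "scan_b": [[10, 200], [220, 30]],   # same pixels, uploaded twice
    "scan_c": [[12, 198], [221, 29]],   # rescan with slightly different values
    "other": [[200, 10], [30, 220]],    # a different image
}
exact_pairs, near_pairs = flag_duplicates(sample)
```

In this sketch the near matches are only candidates. That mirrors the protocol described in this article, under which archivists verify matches before any file is deleted.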

Simultaneously, the Museum Rietberg on Gablerstrasse in Enge launched its own parallel review. The museum's Asian and African art photography archive, which expanded significantly after a 2019 donation of historical expedition imagery, has long been suspected of carrying substantial duplication from the transfer process. A small team of two archivists is manually verifying the highest-confidence matches identified by the software before any file is permanently deleted — a safeguard built into the protocol to prevent accidental loss of variant versions that may carry independent scholarly value.

ETH Zurich's main library on Rämistrasse is the third institutional partner and is acting as the technical lead. The library has handled duplicate-detection projects before, most recently in 2023 when it cleared approximately 9,200 duplicate scientific diagram files from its open-access thesis repository. That earlier exercise reduced storage overhead and improved search result relevance for researchers querying the system.

Why Duplicates Accumulate and What It Costs

The problem is structural rather than a matter of carelessness. Zurich's cultural institutions digitised collections in waves, often using different vendors, different file-naming conventions, and different metadata standards. When those collections were later merged into shared discovery platforms, as happened under the Swiss library consortium network IDS Zürich between 2015 and 2020, duplicates from incompatible systems compounded. A single glass-plate photograph of the Grossmünster, for instance, might exist in six slightly different file versions across three institutions, each with partial metadata, none flagged as the canonical copy.

Storage is not free. Commercial cloud archive rates for cultural heritage institutions in Switzerland have risen sharply since 2022, and maintaining redundant high-resolution image files at 400 to 600 megabytes per scan adds up across hundreds of thousands of records. The federal evaluation metric used for grant scoring measures the ratio of unique, well-described records to total holdings, so duplicates would drag the number down even if storage costs were irrelevant.
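A rough worked example shows the scale involved. It uses the figures reported in this article (about 380,000 digitised items and 14,000 candidate duplicates at the Zentralbibliothek, and 400 to 600 megabytes per scan) and rests on loud assumptions: that every flagged candidate turns out to be a true duplicate, that each one is a full high-resolution scan, and that the score is a plain unique-to-total ratio. The actual federal formula, which also weighs how well records are described, is not given in the source.

```python
# Figures taken from the article.
TOTAL_ITEMS = 380_000
CANDIDATE_DUPLICATES = 14_000
MB_PER_SCAN = (400, 600)  # low and high estimate per file


def unique_ratio(total, duplicates):
    """Share of holdings that are unique, ignoring description quality (a simplification)."""
    return (total - duplicates) / total


def redundant_storage_tb(duplicates, mb_per_scan):
    """Storage held by duplicates, in decimal terabytes (1 TB = 1,000,000 MB)."""
    return duplicates * mb_per_scan / 1_000_000


ratio_before = unique_ratio(TOTAL_ITEMS, CANDIDATE_DUPLICATES)
low_tb, high_tb = (
    redundant_storage_tb(CANDIDATE_DUPLICATES, mb) for mb in MB_PER_SCAN
)

print(f"Unique share before cleanup: {ratio_before:.1%}")
print(f"Redundant storage: {low_tb:.1f} to {high_tb:.1f} TB")
```

Under these assumptions the unique share sits a little above 96 percent before cleanup, and the duplicates alone occupy several terabytes.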

The three-institution working group expects to complete its first-pass review by September 30, ahead of the federal reporting deadline in October. After that, a human verification phase runs through November, with final deletion and catalogue reconciliation scheduled for December. Institutions that complete the full cycle will submit updated holdings data to the national Swiss portal Swisscollections before year-end, strengthening their case for the 2027 digitisation funding round. For anyone researching Zurich's visual history — at the Zentralbibliothek reading room, through the Museum Rietberg's online portal, or via ETH's open-access systems — cleaner search results and faster load times should follow sometime in early 2027.


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
