Three of Zurich's most prominent cultural institutions confirmed this week that they have begun a joint technical sweep to identify and remove duplicate images from their shared digital catalogues, a project years in the making that has finally reached its operational phase. The initiative, coordinated through a working group tied to ETH Zürich's library infrastructure, targets tens of thousands of redundant files that have accumulated across interconnected archive systems since the early digitisation push of the 2010s.
The timing matters. Swiss federal digitisation funding cycles reset in January 2027, and institutions that clean up their metadata and image records before the end of this calendar year stand to qualify for a new tranche of infrastructure grants. Sloppy databases with duplicated assets reduce a collection's assessed quality score under the federal evaluation framework, directly affecting how much money an institution receives.
What Happened This Week
Staff at the Zentralbibliothek Zürich, located on Zähringerplatz in the Altstadt, began running automated hash-comparison scripts across their photograph and illustration holdings on Monday, July 1. The tool flags images that are pixel-for-pixel identical or near-identical — scanned twice, uploaded from different departments, or inherited through collection mergers. By Thursday, the library's digital team had flagged more than 14,000 candidate duplicates from a holdings pool of roughly 380,000 digitised items, according to the project's internal progress tracker shared with partner institutions.
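The article does not say which hashing method the library's scripts use, so the following is only a minimal sketch of the general technique. It assumes a cryptographic hash (SHA-256) of the file bytes for exact copies and a simple "average hash" for near-identical scans. The file names, pixel grids and function names are hypothetical, and the grids stand in for images that have already been reduced to the same small grayscale size.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose byte content is identical.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only groups with more than one member are duplicate candidates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the mean.

    Assumes `pixels` is a small grayscale grid (rows of 0-255 values).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Exact duplicates: two files with identical bytes end up in one group.
files = {
    "zb_0001.tif": b"glass-plate scan",
    "zb_0002.tif": b"glass-plate scan",
    "zb_0003.tif": b"a different scan",
}
exact_groups = group_exact_duplicates(files)

# Near duplicates: a small Hamming distance suggests the same picture.
scan_a = [[10, 200], [220, 30]]
scan_b = [[12, 198], [221, 29]]  # same picture, slightly different scan
scan_c = [[200, 10], [30, 220]]  # different picture
near_ab = hamming_distance(average_hash(scan_a), average_hash(scan_b))
near_ac = hamming_distance(average_hash(scan_a), average_hash(scan_c))
```

In a sketch like this, exact-hash groups can be trusted as identical files, while pairs flagged by the perceptual hash are only candidates, which is why a manual verification step still matters before anything is deleted.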
Simultaneously, the Museum Rietberg on Gablerstrasse in Enge launched its own parallel review. The museum's Asian and African art photography archive, which expanded significantly after a 2019 donation of historical expedition imagery, has long been suspected of carrying substantial duplication from the transfer process. Two archivists are manually verifying the highest-confidence matches identified by the software before any file is permanently deleted, a safeguard built into the protocol to prevent accidental loss of variant versions that may carry independent scholarly value.
ETH Zürich's main library on Rämistrasse is the third institutional partner and is acting as the technical lead. The library has handled duplicate-detection projects before, most recently in 2023 when it cleared approximately 9,200 duplicate scientific diagram files from its open-access thesis repository. That earlier exercise reduced storage overhead and improved search result relevance for researchers querying the system.
Why Duplicates Accumulate and What It Costs
The problem is structural, not the result of carelessness. Zurich's cultural institutions digitised collections in waves, often using different vendors, different file-naming conventions, and different metadata standards. When those collections were later merged into shared discovery platforms, as happened under the Swiss library consortium network IDS Zürich between 2015 and 2020, duplicates from incompatible systems compounded. A single glass-plate photograph of the Grossmünster, for instance, might exist in six slightly different file versions across three institutions, each with partial metadata, none flagged as the canonical copy.
Storage is not free. Commercial cloud archive rates for cultural heritage institutions in Switzerland have risen sharply since 2022, and the cost of maintaining redundant high-resolution image files at 400 to 600 megabytes per scan adds up across hundreds of thousands of records. The federal evaluation metric used for grant scoring measures the ratio of unique, well-described records to total holdings, so duplicates would drag the score down even if storage costs were irrelevant.
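To give a rough sense of scale, the sketch below applies that reasoning to the Zentralbibliothek figures reported this week. It is an illustration under stated assumptions, not the federal formula: it assumes every one of the 14,000 flagged candidates is confirmed as a duplicate, that every remaining record counts as well described, and that each duplicate is a full scan of 400 to 600 megabytes. The function names are hypothetical.

```python
def unique_record_ratio(total_records, duplicate_records):
    """Share of holdings that are unique records (assumed all well described)."""
    unique = total_records - duplicate_records
    return unique / total_records


def redundant_storage_tb(duplicate_records, megabytes_per_scan):
    """Storage taken up by duplicates, in decimal terabytes (1 TB = 1,000,000 MB)."""
    return duplicate_records * megabytes_per_scan / 1_000_000


total = 380_000      # digitised items in the holdings pool
duplicates = 14_000  # flagged candidates, assumed here to be all confirmed

# Before cleanup, duplicates count towards total holdings but not unique records.
ratio_before = unique_record_ratio(total, duplicates)

# After cleanup, the duplicates are gone from both sides of the ratio.
ratio_after = unique_record_ratio(total - duplicates, 0)

# Storage range implied by 400 to 600 megabytes per scan.
storage_low = redundant_storage_tb(duplicates, 400)
storage_high = redundant_storage_tb(duplicates, 600)
```

Under these assumptions the ratio rises from about 0.963 to 1.0, and the duplicates account for somewhere between 5.6 and 8.4 terabytes. The real figures will depend on how many candidates survive manual verification and on how the evaluation framework defines a well-described record.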
The three-institution working group expects to complete its first-pass review by September 30, ahead of the federal reporting deadline in October. After that, a human verification phase runs through November, with final deletion and catalogue reconciliation scheduled for December. Institutions that complete the full cycle will submit updated holdings data to the national Swiss portal Swisscollections before year-end, strengthening their case for the 2027 digitisation funding round. For anyone researching Zurich's visual history — at the Zentralbibliothek reading room, through the Museum Rietberg's online portal, or via ETH's open-access systems — cleaner search results and faster load times should follow sometime in early 2027.