Zurich's major cultural repositories are confronting a concrete bottleneck this week: tens of thousands of duplicate digital images clogging databases maintained by institutions from the Schweizerisches Nationalmuseum on Museumstrasse to the Zentralbibliothek Zürich on Zähringerplatz. The problem, long acknowledged in archival circles, has reached a point where routine digitisation workflows are stalling, according to internal review documents circulating among the city's cultural sector this month.
The timing matters. Switzerland's Federal Office of Culture extended its Digitalisierungsprogramm funding cycle in early 2026, committing resources through 2028. Institutions are now under pressure to demonstrate clean, deduplicated datasets before their next reporting deadline in September. Submitting collections riddled with redundant files risks triggering clawback provisions embedded in the grant agreements, which set minimum data-quality thresholds.
The Scale of the Problem in Zurich
At the Zentralbibliothek, digitisation staff have been working since January on a photo collection covering Zurich's industrial expansion between 1880 and 1940. The library holds more than 1.2 million digitised items across all formats, a figure the institution has cited publicly in its annual reports. Within that corpus, preliminary automated scans conducted this spring found a duplicate-image rate of roughly 8 percent in certain photographic subcollections. That translates into thousands of near-identical files that consume server space and distort search results on the public-facing portal.
ETH Zurich's Data Management unit, based on the Hönggerberg campus, has been developing open-source tooling to address exactly this kind of problem. Their image-fingerprinting pipeline, which uses perceptual hashing to flag visually identical or near-identical files regardless of filename or metadata, has been tested internally since 2025. This week, a working group convened at the ETH main building on Rämistrasse to discuss extending access to that tooling to partner institutions across the canton. The meeting included representatives from the Stadtarchiv Zürich on Neumarkt and from Museum Rietberg in Rieterpark, both of which maintain large digitised collections with known duplication issues.
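For readers unfamiliar with perceptual hashing, the idea can be sketched in a few lines of Python. This is a minimal illustration of one simple variant, the average hash, and not the ETH pipeline itself, whose internals have not been published in the material reviewed here. The function names, the 8x8 grid size and the distance threshold of 5 are all assumptions chosen for the example. It also assumes each image has already been decoded, converted to grayscale and shrunk to 8x8 pixels, a step normally handled by an imaging library.

```python
from typing import List

Grid = List[List[int]]  # 8x8 grayscale values, 0-255 (assumed pre-shrunk)


def average_hash(pixels: Grid) -> int:
    """Return a 64-bit fingerprint: one bit per pixel, set if above the mean."""
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid of grayscale values")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two images as likely duplicates (threshold is illustrative only)."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical test images: a gradient, a brighter copy, and an inverted copy.
original = [[row * 32 + col * 4 for col in range(8)] for row in range(8)]
brighter = [[min(255, value + 10) for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

same_scan = is_near_duplicate(original, brighter)       # brightness shift only
different_image = is_near_duplicate(original, inverted)  # structure reversed
```

Because the fingerprint depends on how each pixel compares with the image's own average brightness, a rescanned or slightly brightened copy produces the same or nearly the same bits, while filenames and metadata play no part. Borderline pairs near the threshold are the ones that typically still need a human reviewer.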
The practical stakes go beyond storage costs. When duplicate images appear in public catalogue systems, researchers waste time, automated rights-clearance tools misfire, and cross-institutional data-sharing agreements become harder to honour. Several Swiss cantons are party to a data-sharing compact signed in Bern in March 2025 that requires participating archives to meet deduplication standards before contributing records to a national discovery platform planned for launch in 2027.
What Happens Next for Zurich's Archives
The ETH working group is expected to publish a technical recommendation document by the end of July. If adopted, Zurich institutions would migrate to a shared deduplication workflow before the September reporting window closes. Staff training sessions are being pencilled in for late August at the Zentralbibliothek's seminar rooms.
The Stadtarchiv Zürich is running a parallel pilot using a commercial deduplication platform licensed through a consortium arrangement. That trial began on 1 June and runs for 90 days, so results will land in late August, just in time to inform the September submissions. Cost is part of what both pilots are weighing. Storage costs in professional-grade cloud archiving have fallen significantly in recent years, but the real expense remains staff time spent reviewing flagged duplicates that automated tools cannot resolve without human judgment.
For Zurich residents with a practical stake, including local historians, genealogy researchers, and students at institutions like the Universität Zürich, the short-term effect is a temporary slowdown in new material appearing on public portals. The Zentralbibliothek's online catalogue, accessible through its Zähringerplatz reading rooms, is currently displaying a notice that certain photographic collections are under review. The library has not said when those collections will be fully reopened to digital search.
The broader lesson Zurich's institutions are drawing from this week's developments is straightforward: deduplication cannot be treated as a final step bolted onto the end of a digitisation project. It needs to be built into the workflow from the first scan. Getting that right before the 2027 national platform launch is now the central goal.