Zurich's flagship digital heritage initiative hit a significant milestone this week when Stadt Zürich's Department of Urban Development confirmed that a structured duplicate-image replacement programme, running across several municipal archives and libraries, had identified more than 340,000 redundant digital files in its first full audit cycle. The announcement, made Tuesday at the Stadtarchiv Zürich on Alfred-Escher-Strasse, marks the first time the city has put a concrete number on a problem archivists have been quietly managing for years.
The timing matters. Across Switzerland, public bodies are under pressure to reduce data-infrastructure costs after the federal government's renewed push for administrative efficiency ahead of next year's budget cycle. Zurich's archives are not alone: the Swiss Federal Archives in Bern flagged similar duplication issues in its 2025 annual report. But the city's scale — and the volume of material digitised during the Covid-era rush to bring collections online — makes the Zurich case particularly instructive.
How the Duplication Problem Grew
The root cause is straightforward. Between 2019 and 2023, at least four separate institutions — the Stadtarchiv, the Zentralbibliothek Zürich on Zähringerplatz, the Museum für Gestaltung in Aussersihl, and the Baugeschichtliches Archiv — ran independent digitisation drives, often scanning the same periodicals, maps and photograph collections without cross-checking against each other's holdings. Each institution stored files under different naming conventions, making automated deduplication difficult until now.
The new programme uses perceptual-hash matching software to compare image files across institutional databases, flagging near-identical copies regardless of filename or metadata. Files confirmed as exact or near-exact duplicates are not immediately deleted: they are tagged and logged, and a single canonical version is designated before the redundant copies are archived offline. The process is designed to protect against accidental loss, a concern raised repeatedly by historians who use the Zentralbibliothek's reading rooms on Zähringerplatz for newspaper research.
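For readers curious how perceptual-hash matching works in principle, the following is a minimal sketch and not the city's software, whose details were not disclosed. It assumes an "average hash" variant, assumes each image has already been reduced to a small greyscale thumbnail of identical dimensions, and uses a hypothetical distance threshold of 2 bits; all function names and sample values are illustrative.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 luminance values (assumed pre-downscaled)


def average_hash(pixels: Gray) -> int:
    """Pack one bit per pixel into an int: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 2) -> bool:
    """Treat two thumbnails as near-duplicates if their hashes differ in few bits.

    max_distance is a hypothetical threshold chosen for this example.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Illustrative 4x4 thumbnails: the same scene scanned twice at slightly
# different brightness, and an unrelated image.
scan_a = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
scan_b = [[value + 5 for value in row] for row in scan_a]
unrelated = [list(reversed(row)) for row in scan_a]

print(is_near_duplicate(scan_a, scan_b))     # brightness shift only
print(is_near_duplicate(scan_a, unrelated))  # different layout
```

Because the hash depends on the pattern of light and dark areas and not on file bytes, names, or metadata, two scans of the same page made by different institutions can still match. A production system would typically also downscale the full image first and keep a log of which copy was designated canonical.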
Storage is not a trivial expense. High-resolution archival images, typically scanned at 400 dpi or above to meet preservation standards, run to several megabytes each. At 340,000 duplicates, preliminary estimates suggest the city is carrying between 1.5 and 2 terabytes of directly redundant data, at a recurring infrastructure cost that one internal budget document cited in Tuesday's briefing put at roughly CHF 80,000 per year across affected systems. That figure is expected to fall substantially once the replacement cycle is complete, which is targeted for the first quarter of 2027.
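As a rough consistency check on those figures, the short calculation below works out the average file size implied by the reported totals. It assumes decimal units (1 terabyte = 10^12 bytes, 1 megabyte = 10^6 bytes), which the briefing did not specify; the variable and function names are illustrative.

```python
DUPLICATE_FILES = 340_000   # duplicates reported in the first audit cycle
BYTES_PER_TB = 10**12       # assumption: decimal terabytes
BYTES_PER_MB = 10**6        # assumption: decimal megabytes


def implied_average_mb(total_tb: float) -> float:
    """Average size per duplicate file, in MB, for a given redundant total in TB."""
    return total_tb * BYTES_PER_TB / DUPLICATE_FILES / BYTES_PER_MB


low_mb = implied_average_mb(1.5)
high_mb = implied_average_mb(2.0)
print(f"Implied average file size: {low_mb:.2f} to {high_mb:.2f} MB")
```

Under those assumptions the estimate implies an average of roughly 4.4 to 5.9 megabytes per duplicate, in line with the "several megabytes each" cited for high-resolution scans.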
What Researchers and the Public Can Expect
For ordinary users of Zurich's digital portals — including the e-periodica platform and the city's own online photograph collection — the most immediate practical change will be improved search accuracy. Duplicate entries have long cluttered search results, particularly for historical images of Zürich-West and the old Industriequartier, where multiple institutions hold overlapping photographic records from the late nineteenth and early twentieth centuries.
ETH Zürich's Chair of Information Science has been involved in advising on the deduplication methodology, and the project is being tracked as a potential model for other Swiss cantons. Researchers working on urban history projects at the ETH main building on Rämistrasse have noted that cleaner metadata will directly improve the reliability of machine-learning tools being trained on archival image sets.
The programme is also relevant to the broader Swiss data-governance debate. Under the revised Federal Act on Data Protection, which came into force in September 2023, public bodies face stricter obligations around data accuracy and proportionality. Storing multiple redundant copies of the same public-record image sits awkwardly with those principles, even if it is not a direct violation.
Residents and researchers who notice missing or altered catalogue entries in the city's online collections over the coming months should check the Stadtarchiv's published change log, updated fortnightly, before assuming material has been removed. The programme's coordinators say no publicly accessible image will be taken offline without a verified replacement entry in the canonical database — a process they expect to take until at least March 2027 to complete in full.