Zurich's municipal digitisation effort hit a concrete milestone this week when the Stadtarchiv Zürich and ETH Bibliothek jointly announced a coordinated push to identify and replace thousands of duplicate images across their shared public databases. The problem — years in the making — has cluttered search results, inflated storage costs, and in some cases led to the same photograph being catalogued under two or three contradictory descriptions.
The timing matters. Both institutions are deep into a broader infrastructure modernisation driven partly by the Swiss federal government's 2024 Open Government Data strategy, which set a deadline of 2027 for all publicly funded archives to meet interoperability standards. Duplicate records are a direct obstacle to that goal. When the same image carries two different metadata tags, automated systems used by researchers, journalists and city planners cannot reliably tell which version is canonical.
What Happened This Week
On Tuesday, July 1, the ETH Bibliothek's digital collections team began running a new deduplication algorithm across its e-pics platform, which holds more than 1.2 million images spanning Swiss scientific, architectural and urban history. According to publicly available documentation on the e-pics portal, the system flags near-identical image hashes and routes them to a human reviewer rather than deleting automatically — a safeguard insisted upon after an earlier automated purge in 2021 accidentally removed contextually distinct versions of the same negative.
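The flag-and-review step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ETH Bibliothek's actual algorithm: it assumes each image already has a 64-bit perceptual hash stored as an integer, and the image IDs, the `max_distance` threshold and the function names are all hypothetical. Pairs whose hashes differ in only a few bits are added to a review queue; nothing is deleted.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def flag_near_duplicates(hashes: dict[str, int], max_distance: int = 4) -> list[tuple[str, str, int]]:
    """Return (id_a, id_b, distance) for every pair within max_distance bits.

    The result is a queue for human review; no record is removed here.
    max_distance is a hypothetical threshold chosen for illustration.
    """
    review_queue = []
    # Compare every pair once; fine for a sketch, too slow for millions of images.
    for (id_a, hash_a), (id_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming_distance(hash_a, hash_b)
        if distance <= max_distance:
            review_queue.append((id_a, id_b, distance))
    return review_queue


# Hypothetical sample data: img-002 differs from img-001 by a single bit.
sample = {
    "img-001": 0xF0F0F0F0F0F0F0F0,
    "img-002": 0xF0F0F0F0F0F0F0F1,
    "img-003": 0x0F0F0F0F0F0F0F0F,
}

for id_a, id_b, distance in flag_near_duplicates(sample):
    print(f"review: {id_a} vs {id_b} ({distance} bit(s) differ)")
```

Routing pairs to a queue rather than deleting them mirrors the safeguard described above: a reviewer can still keep two contextually distinct versions of the same negative even when their hashes nearly match.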
The Stadtarchiv, located at Neumarkt 4 in the Altstadt, is running a parallel process covering its collection of historical Zurich street photography, including extensive holdings from the Limmatquai and Niederdorf districts. Staff there aim to work through an estimated 14,000 flagged pairs by the end of August, cross-referencing originals against the city's existing metadata in the GEVER document management system.
The two institutions are not merging their collections, but they are, for the first time, using a shared taxonomy developed at ETH Zürich's Institute for Information Security and Privacy. That alignment was piloted in February 2026 and covers roughly 200 standardised descriptive fields — enough to let both archives speak the same language when a photo of, say, the Grossmünster tower appears in both databases with different crop dimensions and date annotations.
Why Duplicates Accumulate — and What They Cost
The duplication problem is not unique to Zurich, but the city's specific history accelerates it. Repeated digitisation drives since the early 2000s, each using different scanning resolutions and file-naming conventions, layered copies on top of copies. The 2015 migration to cloud storage at the Rechenzentrum Zürich Nord added another generation of imports without full deduplication. Cloud storage costs for the municipal archive run to a publicly budgeted CHF 380,000 annually, a figure the city controller's office flagged in its 2025 annual report as a candidate for reduction through data hygiene.
Researchers at the University of Zurich's Institute for Computational Linguistics have separately estimated — in a 2025 working paper available through the university's ZORA repository — that between 8 and 12 percent of images in Swiss public digital archives are functional duplicates. Applied to the ETH Bibliothek's 1.2 million-image collection, that implies somewhere between 96,000 and 144,000 redundant files. Storage aside, the real cost is human: every duplicate that surfaces in a search result is one more item a researcher has to manually evaluate.
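The arithmetic behind that range is easy to reproduce. The short Python sketch below applies the working paper's 8 and 12 percent bounds to a collection of 1.2 million images; the function name is hypothetical, and because the e-pics platform holds "more than" 1.2 million images, the output is best read as a rough floor rather than a precise count.

```python
def duplicate_range(total_images: int, low_pct: int, high_pct: int) -> tuple[int, int]:
    """Estimate the low and high number of duplicates from percentage bounds."""
    # Integer arithmetic avoids floating-point rounding on whole-percent inputs.
    return total_images * low_pct // 100, total_images * high_pct // 100


low, high = duplicate_range(1_200_000, 8, 12)
print(f"Estimated redundant files: {low:,} to {high:,}")
```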
The practical upshot for anyone using Zurich's digital collections in the near term: expect search results on the e-pics platform and the Stadtarchiv's online portal to look different by September. Some familiar image IDs will be retired and redirected to canonical records. The ETH Bibliothek has published a transition guide on its website advising users who have cited specific image permalinks in academic work to check for forwarding redirects before submitting final manuscripts. The Stadtarchiv is offering two drop-in consultation sessions at Neumarkt 4 — scheduled for July 16 and July 30 — for institutions and individuals with existing licensing agreements who need to verify their holdings are unaffected.