Zurich's city administration confirmed this spring that its digitisation office has flagged more than 340,000 duplicate image files across the municipal photo archive, a sprawling collection that grew rapidly after a 2019 scanning drive brought tens of thousands of analogue prints online. The deduplication, carried out partly with ETH Zurich's Visual Computing Lab on the Hönggerberg campus, is not expected to be complete before late 2027.
The issue matters now because Zurich is midway through a broader open-data push. The city publishes municipal datasets for public use on opendata.swiss, the national open-data portal, and has been expanding the image holdings it shares there since 2022. Duplicate records inflate storage costs and degrade search results. More pointedly, in a city whose Stadtrat approved a digital governance charter in March 2025, they undermine the transparency goals that charter was meant to guarantee. With housing data, infrastructure maps, and construction permits increasingly visualised through image-linked records, a cluttered archive introduces downstream errors into the planning tools used by offices from Altstetten to Witikon.
What Zurich Is Actually Doing
The Stadtarchiv Zürich on Neumarkt is the operational centre of the cleanup. Staff there are working through a three-phase review: automated hash-matching to catch exact duplicates, perceptual hashing to identify near-identical scans with slight colour or resolution differences, and a manual curatorial layer for images of historical significance. The ETH Visual Computing Lab partnership, formalised in a memorandum signed in January 2026, provides the perceptual-matching algorithm at no direct licensing cost — a meaningful saving given the Stadtarchiv's annual digitisation budget, which city documents put at roughly CHF 1.2 million for 2026.
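The first two phases can be illustrated with a short Python sketch. It is not the ETH Visual Computing Lab's matching algorithm, whose details the city has not described; it assumes SHA-256 digests for the exact-match pass and substitutes a simple average hash (aHash) over 8×8 grayscale thumbnails for the perceptual pass. All function names and the five-bit distance threshold are hypothetical.

```python
# Hypothetical sketch of phases 1 and 2; not the ETH Visual Computing Lab's algorithm.
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files: dict[str, bytes]) -> list[list[str]]:
    """Phase 1: group files whose raw bytes share a SHA-256 digest."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # A digest shared by two or more files marks a set of exact duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Phase 2: 64-bit average hash of an 8x8 grayscale thumbnail (values 0-255).

    Assumes the scan has already been shrunk to 8x8; each bit is 1 where
    the pixel is brighter than the thumbnail's mean, read row by row.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def near_duplicate_pairs(
    thumbs: dict[str, list[list[int]]], max_distance: int = 5
) -> list[tuple[str, str]]:
    """Report pairs whose hashes differ in at most max_distance bits (hypothetical threshold)."""
    hashes = {name: average_hash(px) for name, px in thumbs.items()}
    names = sorted(hashes)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if hamming(hashes[a], hashes[b]) <= max_distance
    ]


if __name__ == "__main__":
    files = {"a.tif": b"scan-001", "b.tif": b"scan-001", "c.tif": b"scan-002"}
    print(exact_duplicate_groups(files))  # [['a.tif', 'b.tif']]

    # A brightness gradient, a uniformly brighter rescan of it, and a transposed image.
    base = [[r * 32 + c * 4 for c in range(8)] for r in range(8)]
    brighter = [[p + 3 for p in row] for row in base]
    turned = [[c * 32 + r * 4 for c in range(8)] for r in range(8)]
    print(near_duplicate_pairs({"base": base, "brighter": brighter, "turned": turned}))
    # [('base', 'brighter')]
```

A cryptographic digest only matches byte-identical files, so a rescan with a uniformly brighter exposure slips past phase 1 but produces the same average hash. The sketch only reports candidate pairs; choosing which copy to keep is left to a person, in line with the curatorial review described above. Comparing every pair is quadratic, which is acceptable for an illustration but would call for an index, such as a BK-tree over Hamming distance, at the scale of hundreds of thousands of files.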
The programme also supports Zurich's obligations under the Swiss Federal Act on Archiving, which sets retention and quality standards for public records. Cantons and municipalities that allow large-scale data redundancy risk compliance complications during federal audits, an incentive that the city's IT department, based at the Stadthaus on Stadthausquai, has cited internally as a driver of the timeline.
How Zurich Compares to Peer Cities
Amsterdam's Stadsarchief began a comparable deduplication exercise in 2023, initially focused on its pre-1950 photographic holdings. The Dutch institution has publicly reported removing approximately 180,000 duplicate files from a collection roughly half the size of Zurich's. Proportionally, that is on a par with the 340,000 duplicates Zurich has so far only flagged, which suggests Amsterdam is further along, though its archive is more uniform in format, which simplifies automated matching. Vienna's Wiener Stadt- und Landesarchiv launched a deduplication project in late 2024 tied to that city's Smart City Wien 2025 framework, but the programme is still in its first phase and has not published comparable throughput figures.
Berlin's Landesarchiv has taken a different approach: it has deprioritised deduplication in favour of raw digitisation volume, reasoning that storage costs are falling fast enough to make cleanup less urgent than access. Archival professionals elsewhere in Europe have criticised that reasoning, arguing that it defers the problem rather than solving it. Zurich's method, which runs digitisation and deduplication in parallel, costs more upfront but is designed to avoid the kind of compounding backlog Berlin now faces.
Storage costs are not negligible either. Enterprise-grade archival storage in Switzerland runs at roughly CHF 0.03 to CHF 0.05 per gigabyte per month under municipal contracts, according to published procurement frameworks. At high resolution, 340,000 redundant image files can occupy several terabytes, an avoidable monthly cost that accumulates over the multi-year lifecycle of a public archive.
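As a rough illustration of the arithmetic, the Python snippet below multiplies the flagged file count by an assumed average file size and the quoted per-gigabyte rates. The 20 MB average is a hypothetical figure chosen to land in the "several terabytes" range (about 6.8 TB in total); the city has not published the archive's actual file sizes.

```python
# Hypothetical worked example: the 20 MB average file size is an assumption,
# not a figure published by the city.


def monthly_storage_cost_chf(file_count: int, avg_file_mb: float, chf_per_gb_month: float) -> float:
    """Monthly storage cost in CHF, using decimal units (1 GB = 1,000 MB)."""
    total_gb = file_count * avg_file_mb / 1000
    return total_gb * chf_per_gb_month


if __name__ == "__main__":
    duplicates = 340_000   # redundant files flagged by the city
    avg_file_mb = 20.0     # assumed average size of a high-resolution scan
    for rate in (0.03, 0.05):  # CHF per GB per month, the range cited above
        monthly = monthly_storage_cost_chf(duplicates, avg_file_mb, rate)
        print(f"CHF {rate:.2f}/GB: {monthly:,.0f} per month, {monthly * 12:,.0f} per year")
    # CHF 0.03/GB: 204 per month, 2,448 per year
    # CHF 0.05/GB: 340 per month, 4,080 per year
```

The result scales linearly with the assumed file size and with time: uncompressed master scans several times larger would raise the monthly figure proportionally, and every year the duplicates remain adds another twelve months of charges.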
For residents or researchers who use the Stadtarchiv's public reading room on Neumarkt, or who pull images through the opendata.swiss API, the practical improvement should become visible by mid-2027: faster search returns, fewer duplicate hits when querying construction or neighbourhood history records, and cleaner links when the archive feeds into planning visualisations. The city's IT office has said it will publish a progress report each January until the project closes. The next one is due in January 2027.