Zurich's city administration confirmed this spring that its ongoing audit of the Stadtarchiv's digitised photograph holdings had identified more than 14,000 duplicate image records since the programme launched in January 2025. The archive, housed in the former Steinfels soap factory complex on Hardturmstrasse, has been working through a backlog of scanned materials stretching back to the 1980s. The duplicates — redundant scans, near-identical frames and wrongly catalogued copies — were quietly consuming server capacity and distorting search results for researchers, journalists and members of the public.
The timing matters. Switzerland's revised federal data protection law, in force since September 2023, introduced tighter obligations around accuracy and minimisation of stored personal data. Any image that depicts an identifiable individual and exists in multiple redundant copies now carries compounded legal exposure for public bodies. Municipal archivists across German-speaking Switzerland have been racing to comply, and Zurich, with its comparatively generous IT budget and a direct line to ETH Zurich's data science unit for technical support, moved earlier than most.
What Zurich Is Actually Doing
The core tool is a perceptual hashing system procured through Stadt Zürich's Informatik department and integrated with the existing Scope archive management software used by the Stadtarchiv. Each image is assigned a compact numerical fingerprint; near-matches above a set similarity threshold are flagged for human review rather than deleted automatically. That human-in-the-loop requirement was a deliberate policy choice, according to internal documentation published on the city's open-government portal in March 2026. The archive has allocated two full-time positions to the review queue through the end of 2026, at a combined annual cost the city has publicly stated is just under CHF 180,000.
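For readers curious how fingerprint matching of this kind works in principle, here is a minimal sketch. It is not the city's procured system, whose internals the published documentation does not describe. It assumes a simple "average hash" over a small grayscale grid, and the function names, record ids and the five-bit distance threshold are all hypothetical. Similar pairs are only flagged for a human reviewer; nothing is deleted.

```python
from itertools import combinations


def average_hash(pixels):
    """Fingerprint a small grayscale grid: one bit per pixel,
    set when the pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def flag_for_review(fingerprints, max_distance=5):
    """Return (id_a, id_b, distance) for every pair of records whose
    fingerprints differ in at most max_distance bits (assumed threshold).
    Flagged pairs go to a human reviewer; this function deletes nothing."""
    flagged = []
    for (id_a, hash_a), (id_b, hash_b) in combinations(sorted(fingerprints.items()), 2):
        distance = hamming_distance(hash_a, hash_b)
        if distance <= max_distance:
            flagged.append((id_a, id_b, distance))
    return flagged


# Hypothetical data: an 8x8 gradient, a rescan with one altered pixel,
# and an unrelated (inverted) image.
scan = [[row * 8 + col for col in range(8)] for row in range(8)]
rescan = [list(row) for row in scan]
rescan[0][0] = 63
unrelated = [[63 - value for value in row] for row in scan]

fingerprints = {
    "A-001": average_hash(scan),
    "A-002": average_hash(rescan),
    "B-001": average_hash(unrelated),
}
print(flag_for_review(fingerprints))
```

A small distance threshold keeps false positives down at the cost of missing heavily cropped or recoloured copies, which is one reason a review queue staffed by people is a reasonable complement to the automated match.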
The Stadtbibliothek Zürich, which manages a separate but overlapping digitised image collection covering postcards, maps and press photography, joined the deduplication programme in October 2025 under a shared-services agreement. Together the two institutions hold roughly 1.3 million digitised items, a figure cited in the city's 2025 annual digitalisation report. The integration meant that cross-institutional duplicates — the same historical press photograph held in both collections under different catalogue numbers — could be caught for the first time.
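The cross-institutional check can be illustrated with a short sketch. It assumes each record already carries a fingerprint and, for simplicity, treats only identical fingerprints as matches; the catalogue numbers, fingerprint values and function name are hypothetical and not drawn from either institution's catalogue.

```python
from collections import defaultdict


def cross_collection_matches(records):
    """records: iterable of (institution, catalogue_number, fingerprint).
    Returns groups of records that share a fingerprint AND come from
    more than one institution. Exact-match only (a simplifying assumption)."""
    by_fingerprint = defaultdict(list)
    for institution, catalogue_number, fingerprint in records:
        by_fingerprint[fingerprint].append((institution, catalogue_number))

    matches = []
    for holders in by_fingerprint.values():
        institutions = {institution for institution, _ in holders}
        if len(institutions) > 1:
            matches.append(sorted(holders))
    return sorted(matches)


# Hypothetical records: one photograph held by both institutions,
# plus a duplicate pair that sits inside a single collection.
records = [
    ("Stadtarchiv", "SA-0001", 0xA1),
    ("Stadtbibliothek", "SB-0077", 0xA1),
    ("Stadtarchiv", "SA-0002", 0xB2),
    ("Stadtarchiv", "SA-0003", 0xB2),
]
print(cross_collection_matches(records))
```

Only the first pair is reported here, because the second pair lives within one collection and would be caught by that institution's own pass.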
How Zurich Compares to Amsterdam, Vienna and Beyond
Amsterdam's Stadsarchief began a comparable deduplication exercise in mid-2024, targeting its roughly 750,000 digitised images. The Dutch institution opted for a fully automated deletion pipeline with a post-hoc audit trail rather than per-item human review, a faster but legally riskier approach under Dutch privacy rules. Vienna's Wiener Stadt- und Landesarchiv, managing a collection of around 900,000 digitised photographs, launched its programme in February 2026 and has already processed about a third of its holdings, according to a progress note the archive posted on its public website in May.
Where Zurich holds an edge is interoperability. The city's system feeds deduplicated metadata back into the Linked Open Data Service, the Swiss national cultural heritage portal maintained by the Swiss National Library in Bern, so clean Zurich records propagate outward. Amsterdam and Vienna both feed into Europeana, the pan-European aggregator, but neither has yet built a live feedback loop that updates the aggregator when a duplicate is resolved locally. That gap matters in practice: a researcher pulling results from Europeana today may still encounter duplicates that Amsterdam or Vienna removed from their own systems months ago.
Singapore's National Archives, often cited in digital preservation circles for its early investment in AI-assisted cataloguing, began image deduplication in 2022 and has processed its entire digitised collection — approximately 1.1 million images — according to figures published in its 2024 annual report. That benchmark suggests Zurich's two-year window to complete its audit is achievable but not leisurely.
For researchers using the Stadtarchiv's online portal, the practical advice is straightforward: search results have already improved for materials processed since January 2025, particularly for pre-1950 photography. Collections catalogued under the old Staatsarchiv reference system before digitisation are still being worked through and may continue to return redundant hits. The archive's reading room on Neumarkt remains open for in-person consultation Tuesday through Saturday, and staff can flag suspected duplicates directly to the deduplication queue through a form on the archive's website, a small feedback mechanism that, so far, no comparable European city has thought to build.