Zurich's Duplicate Image Problem: The Numbers Driving a City-Wide Digital Overhaul
Municipal databases and cultural institutions are quietly drowning in redundant visual data — and the cost of doing nothing is climbing fast.

Zurich's public digital archives contain an estimated 340,000 duplicate image files spread across municipal servers, cultural institutions, and cantonal administrative databases — a figure that emerged from an internal audit completed by the city's Digitale Verwaltung unit in the first quarter of 2026. The sheer scale has forced a reckoning about what redundant data actually costs, and who pays for it.
The timing matters. Switzerland's federal government pushed through revised data governance guidelines under the Bundesgesetz über den Einsatz elektronischer Mittel framework in late 2025, requiring public bodies to demonstrate measurable data efficiency gains by December 2027. For Zurich, that deadline is no longer abstract. It has a number attached to it: CHF 2.3 million in projected annual savings if storage redundancy across city systems drops by 60 percent.
The problem is concentrated in a handful of institutions. The Stadtarchiv Zürich, housed on Neumarkt in the Altstadt district, holds digitised photograph collections going back to the 1880s. Archivists there have long known that donor submissions, scanning projects from different eras, and inter-institutional transfers have seeded the same images across multiple folder hierarchies. A single tram photograph from Bellevue, taken sometime in the 1930s, was found in eleven separate directories during a 2025 spot-check.
The Zentralbibliothek Zürich on Zähringerplatz faces a structurally similar problem in its digital periodicals and map collections. Internal estimates suggest roughly 18 percent of its digitised visual holdings are exact or near-exact duplicates, based on hash-matching tests run against a sample of 50,000 files last autumn. At current Tier-2 cloud storage pricing — approximately CHF 0.023 per gigabyte per month under the institution's SWITCHengines contract — the redundant storage alone runs to tens of thousands of francs annually before staff time is factored in.
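For readers who want to see what a hash-matching test of this kind involves, here is a minimal sketch. It is not the Zentralbibliothek's actual tooling, which the audit does not describe. It assumes SHA-256 as the hash, decimal gigabytes for billing, and the per-gigabyte price quoted above; the function names are illustrative.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Price quoted in the article; billing in decimal gigabytes is an assumption.
PRICE_CHF_PER_GB_MONTH = 0.023


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group every file under root by content hash; keep groups of two or more."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def redundant_bytes(groups: Dict[str, List[Path]]) -> int:
    """Bytes that would be freed if one copy per group were retained."""
    return sum(p.stat().st_size for paths in groups.values() for p in paths[1:])


def annual_cost_chf(n_bytes: int, price: float = PRICE_CHF_PER_GB_MONTH) -> float:
    """Yearly storage cost of n_bytes at a flat per-gigabyte monthly price."""
    return n_bytes / 1e9 * price * 12
```

As a purely illustrative figure, 100 terabytes of redundant files at that price would come to roughly CHF 27,600 a year. The actual redundant volume is not stated in the estimates.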
ETH Zurich's main library on Rämistrasse is further along in addressing the issue. Its research data management team adopted a perceptual hashing protocol in 2024 that flags visually similar images even when file names and metadata differ — a distinction that matters enormously for scientific datasets where an image may have been re-exported at different resolutions. The ETH system reportedly flagged and resolved more than 22,000 duplicate or near-duplicate files in its first operational year.
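The details of ETH's protocol have not been published in this reporting, so the following is only a sketch of one common perceptual hashing technique, the average hash. It assumes the image has already been decoded into a grid of grayscale values from 0 to 255; the function names and the 8-by-8 grid size are illustrative choices, not ETH's.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[Sequence[int]], hash_size: int = 8) -> int:
    """Shrink the image to a hash_size x hash_size grid of block averages,
    then set one bit per cell: 1 if the cell is brighter than the grid mean."""
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    cells: List[float] = []
    for i in range(hash_size):
        top, bottom = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            left, right = j * width // hash_size, (j + 1) * width // hash_size
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")
```

Because the hash depends on coarse brightness patterns rather than exact bytes, the same picture re-exported at double the resolution yields the same hash in this sketch, while a cryptographic hash of the two files would differ completely.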
Identifying duplicates is technically straightforward. Deciding which copy to keep — and proving that provenance metadata is correctly preserved on the retained file — is where institutions consistently get stuck. The Stadtarchiv's catalogue uses a Dublin Core metadata standard that predates several of its largest digitisation projects, meaning the original acquisition record and the digital file record are sometimes stored separately, with no automated link between them.
In practice, that separation creates a legal complication under cantonal archiving law, which requires documented chains of custody for public records. Deleting a file that turns out to be the only copy carrying complete provenance data would constitute a records management breach. That risk has made administrators cautious, sometimes to the point of paralysis.
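One way to picture the safeguard administrators are looking for is a check that refuses deletion unless the retained copy carries every piece of provenance found on the copy being removed. The sketch below is hypothetical: the choice of required fields is an assumption (the names are Dublin Core elements, but which ones cantonal law would demand is not specified here), and the records are plain dictionaries rather than any real catalogue API.

```python
from typing import Dict

# Hypothetical selection of Dublin Core elements treated as provenance.
REQUIRED_FIELDS = ("identifier", "source", "date", "rights")

Record = Dict[str, str]


def completeness(record: Record) -> int:
    """Count how many required provenance fields are filled in."""
    return sum(1 for field in REQUIRED_FIELDS if record.get(field))


def choose_keeper(copies: Dict[str, Record]) -> str:
    """Pick the path whose record is most complete; ties go to the
    alphabetically first path so the choice is reproducible."""
    return min(copies, key=lambda path: (-completeness(copies[path]), path))


def safe_to_delete(keeper: Record, other: Record) -> bool:
    """True only if every non-empty field on the other copy is present,
    with the same value, on the copy being kept."""
    return all(keeper.get(key) == value for key, value in other.items() if value)
```

A check like this does not settle the legal question, but it turns "is provenance preserved?" into something a reviewer can verify per file rather than assume.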
The city's digital governance team is now piloting a three-stage workflow: automated hash-based detection, human review of flagged clusters above a set similarity threshold, and a 90-day quarantine period before any deletion is confirmed. The pilot, running across two municipal departments through September 2026, is the first step toward a city-wide rollout that would eventually encompass the roughly 40 bodies subject to the new federal efficiency requirements.
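The three stages can be sketched as a small state machine. This is an illustration under stated assumptions, not the city's pilot software: the 90-day quarantine comes from the pilot description, while the similarity threshold of 0.9, the stage names, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import List, Optional

QUARANTINE_DAYS = 90      # from the pilot description
REVIEW_THRESHOLD = 0.9    # hypothetical; the pilot's threshold is not public


class Stage(Enum):
    DETECTED = "detected"
    IN_REVIEW = "in_review"
    QUARANTINED = "quarantined"


@dataclass
class Cluster:
    """A group of files flagged as duplicates by automated detection."""
    paths: List[str]
    similarity: float
    stage: Stage = Stage.DETECTED
    quarantined_on: Optional[date] = None


def route_for_review(cluster: Cluster, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Stage 2: send clusters at or above the threshold to a human reviewer."""
    if cluster.stage is Stage.DETECTED and cluster.similarity >= threshold:
        cluster.stage = Stage.IN_REVIEW
        return True
    return False


def approve(cluster: Cluster, today: date) -> None:
    """Reviewer confirms the duplicates; the quarantine clock starts."""
    if cluster.stage is not Stage.IN_REVIEW:
        raise ValueError("only clusters under review can be approved")
    cluster.stage = Stage.QUARANTINED
    cluster.quarantined_on = today


def deletion_allowed(cluster: Cluster, today: date) -> bool:
    """Stage 3: deletion is permitted only after the full quarantine period."""
    if cluster.stage is not Stage.QUARANTINED or cluster.quarantined_on is None:
        return False
    return today >= cluster.quarantined_on + timedelta(days=QUARANTINE_DAYS)
```

In this sketch a cluster approved on 1 June 2026 could not be deleted before 30 August 2026, which leaves time for an objection or a provenance check to surface.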
For institutions holding public image collections, the immediate practical step is an audit against their own storage contracts — most SWITCHengines agreements allow burst storage reviews on request. The broader lesson from the numbers is blunt: 340,000 duplicate files did not accumulate overnight, and they will not disappear through good intentions. Structured detection tools, allocated staff time, and a legal framework for deletion are all required, and Zurich's December 2027 deadline is now close enough to count in months, not years.
Published by The Daily Zurich