Zurich's municipal digital infrastructure is sitting on a growing problem. Across databases maintained by the city's Stadtarchiv on Neumarkt and the cantonal planning directorate on Walchestrasse, duplicate image files have accumulated over years of overlapping digitisation projects, and specialists say the redundancy is no longer trivial. Conservative internal estimates circulated among city IT staff put the proportion of duplicate or near-duplicate image assets in some departmental repositories at roughly 30 percent, consuming server capacity and slowing retrieval for planners, historians and the public alike.
The timing matters because Zurich is in the middle of a CHF 40 million overhaul of its civic data infrastructure, approved by the Gemeinderat in late 2024 and scheduled for phased completion by the end of 2027. With that money now being spent and contracts with tech vendors already signed, archivists and data engineers say the window to embed systematic duplicate-detection into the new architecture is closing fast. Retrofitting the same capability after the system goes live would, according to presentations made to the city's Digitalisierungsrat earlier this year, cost significantly more.
ETH Zurich's computer vision group, based on the Hönggerberg campus, has been consulted informally on the technical standards appropriate for large-scale image deduplication. Researchers there have published work on perceptual hashing, which reduces each image to a compact fingerprint, and approximate nearest-neighbour search, which finds similar fingerprints quickly across large collections. Together the two techniques make it possible to find images that are identical or near-identical without manually inspecting each file. The university's involvement remains advisory rather than contractual, but people familiar with the discussions say the Stadtarchiv has been studying the group's methodology papers since early 2025.
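For readers curious what perceptual hashing looks like in practice, the following is a minimal sketch of one common variant, the difference hash. It is not the ETH group's method or any system the city has chosen. It assumes each image has already been shrunk to a small grayscale grid, and the function names and the distance threshold are illustrative assumptions.

```python
from typing import List

# Assumption: each image is already reduced to a small grayscale grid
# (rows of brightness values from 0 to 255). Real pipelines would do this
# with an image library before hashing.
Grid = List[List[int]]


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            # The bit records only whether brightness falls from left to right,
            # so a uniform brightness shift leaves the hash unchanged.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 1) -> bool:
    """Hypothetical threshold check; max_distance would need tuning on real data."""
    return hamming(dhash(a), dhash(b)) <= max_distance


original = [[10, 20, 30], [90, 60, 30]]
brighter = [[v + 10 for v in row] for row in original]  # same picture, lighter scan
different = [[30, 20, 10], [30, 60, 90]]                # unrelated picture

print(is_near_duplicate(original, brighter))   # True
print(is_near_duplicate(original, different))  # False
```

Comparing every pair of hashes this way becomes slow for hundreds of thousands of files, which is where approximate nearest-neighbour indexes come in: they replace the pairwise comparison with a faster lookup over the same fingerprints.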
What the Specialists Are Saying
The debate inside city administration is not really about whether to act, but about which standard to adopt and who should pay for it. The Stadtarchiv wants a centralised deduplication layer applied before images enter any departmental system. The cantonal building and planning office, whose aerial photography archive alone runs to several hundred thousand files dating back to the 1950s, has historically managed its own storage independently and is reluctant to cede control of its image pipeline to a central municipal authority.
Data specialists at the Zurich University of Applied Sciences (ZHAW) in Winterthur, which works closely with Zurich city departments on digital-governance projects, have argued in published position papers that the city should adopt an open-standard metadata framework as a first step, rather than committing to a single vendor's deduplication engine. Their position is that without common metadata, even sophisticated algorithms will flag legitimate variant images, such as different resolutions or colour profiles of the same underlying photograph, as duplicates, and an automated clean-up would then delete assets that have archival value.
That distinction matters practically. The Stadtarchiv holds visual documentation of Zurich's Langstrasse neighbourhood going back to the postwar period, including images that exist in multiple scanned versions because different digitisation projects used different equipment. A blunt deduplication sweep could delete what looks like a duplicate but is in fact the only high-resolution copy of a particular scan.
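One way to avoid that outcome is to have the sweep propose candidates for human review rather than delete anything. The sketch below illustrates the idea under stated assumptions: the `Scan` record, its fields, the file names and the rule of treating the largest scan as the master copy are all hypothetical, and grouping by exact hash match is a simplification of the near-match logic a real system would need.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Scan:
    """Hypothetical metadata record for one scanned image."""
    path: str
    phash: int   # perceptual hash computed elsewhere
    width: int
    height: int


def plan_review(scans: List[Scan]) -> Dict[str, List[str]]:
    """Map each proposed master copy to the variants flagged for review.

    Nothing is deleted: the result is only a list for an archivist to check.
    """
    groups: Dict[int, List[Scan]] = defaultdict(list)
    for scan in scans:
        groups[scan.phash].append(scan)

    plan: Dict[str, List[str]] = {}
    for members in groups.values():
        if len(members) < 2:
            continue  # no look-alikes, nothing to review
        # Assumption: the scan with the most pixels is the one to protect.
        master = max(members, key=lambda s: s.width * s.height)
        plan[master.path] = sorted(m.path for m in members if m is not master)
    return plan


# Illustrative file names, not real archive holdings.
scans = [
    Scan("a_low.jpg", phash=3, width=800, height=600),
    Scan("a_high.tif", phash=3, width=4000, height=3000),
    Scan("b.tif", phash=12, width=1000, height=1000),
]
print(plan_review(scans))  # {'a_high.tif': ['a_low.jpg']}
```

The design choice here is that resolution metadata, not the hash alone, decides which file is protected, which is the kind of information a common metadata framework would make available across departments.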
What Happens Next
The Digitalisierungsrat is expected to hear a formal proposal on image deduplication standards before the summer recess ends in mid-August 2026. If the council endorses a pilot framework, the Stadtarchiv has indicated it could run a test deduplication process on a defined subset of its holdings — likely the post-1990 urban planning photography collection — before the end of the year. That pilot would generate performance data to inform a city-wide policy decision expected in the first quarter of 2027.
For Zurich residents, the practical stakes are straightforward. The city's open-data portal at open.zh.ch, which serves journalists, researchers and citizens exercising their rights under cantonal transparency rules, has been criticised for slow image downloads caused in part by bloated, redundant storage. A successful deduplication programme would reduce that friction directly. Given that the CHF 40 million infrastructure budget is already committed and being spent, officials and experts broadly agree that the decision can no longer be deferred to the next budget cycle.