Zurich's public institutions are sitting on a growing problem buried in their servers: thousands of duplicate digital images clogging archival systems, driving up infrastructure costs, and making records searches slower and less reliable. The issue has surfaced in discussions at the Stadtarchiv Zürich on Alfred-Escher-Strasse and at ETH Zurich's scientific data management units, where staff report that redundant image files (scanned documents, research photographs, urban planning maps) now account for a significant share of total storage consumption.
The timing is not accidental. Switzerland's revised federal data management guidelines, which took effect in January 2026, require cantonal and municipal institutions to demonstrate active data hygiene practices as a condition of federal co-financing for digital infrastructure. For Zurich, a city that has invested heavily in smart-city platforms and open government data portals over the past five years, the pressure to clean up duplicated content is now administrative as well as technical.
What the Institutions Are Saying
Officials at the Stadtarchiv have signalled that the archive's current deduplication protocols, last updated in 2021, are no longer adequate for the volume of material being ingested. The archive receives digitised records from across the city administration, including planning submissions routed through the Amt für Städtebau — the office responsible for building permits in districts from Wiedikon to Schwamendingen. When departments scan and re-scan the same planning documents at different stages of review, duplicate image files accumulate without any automatic flag being raised.
At ETH Zurich, where the library system manages research image data across faculties on the Zentrum and Hönggerberg campuses, data librarians have been piloting hash-based deduplication software since March 2026. Hash matching computes a digital fingerprint from the contents of each image file; if two files produce an identical fingerprint, they are treated as exact copies and one is flagged for removal or consolidation. Early results from the pilot have not been made public, but the programme is due to report to ETH's IT steering committee before the end of the third quarter.
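For readers who want to see the idea in concrete terms, the following is a minimal sketch of hash-based duplicate detection. It is not the software used in the ETH pilot, whose details have not been published. The choice of SHA-256 and the function names `file_digest` and `find_duplicates` are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()  # assumption: SHA-256 as the fingerprint
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by digest; keep groups with more than one file."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

A sketch like this only flags byte-for-byte copies. Two separate scans of the same paper document will almost always differ at the byte level and so produce different fingerprints, which is why exact hashing alone may not catch every re-scan.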
The broader context matters here. Zurich's housing shortage, the Wohnungsnot that has pushed average monthly rents for a three-room apartment in Kreis 3 above CHF 2,800 in recent surveys, has forced the Amt für Städtebau to process a record volume of variance applications and rezoning submissions since 2024. Each application generates its own document trail. When those documents are scanned multiple times, or uploaded by different departments in parallel, the duplication problem compounds. Storage is not free. Independent IT procurement analysts have pegged the enterprise-grade cloud storage contracted by the canton of Zurich at between CHF 0.02 and CHF 0.05 per gigabyte per month, depending on redundancy tier, so even modest reductions in duplicate data translate into measurable budget savings over a multi-year horizon.
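The arithmetic behind that claim can be sketched in a few lines. The per-gigabyte rates below are the analysts' range quoted above. The duplicate volume of 10,000 GB and the five-year horizon are hypothetical figures chosen only to illustrate the calculation, not numbers reported by any institution.

```python
from decimal import Decimal

# Rate range quoted by procurement analysts, in CHF per GB per month.
RATE_LOW = Decimal("0.02")
RATE_HIGH = Decimal("0.05")


def storage_savings(duplicate_gb: int, rate_per_gb_month: Decimal, years: int) -> Decimal:
    """Cost avoided by deleting duplicate_gb of data for the given number of years."""
    return duplicate_gb * rate_per_gb_month * 12 * years


duplicate_gb = 10_000  # hypothetical: roughly 10 TB of duplicate images
years = 5              # hypothetical planning horizon

low = storage_savings(duplicate_gb, RATE_LOW, years)
high = storage_savings(duplicate_gb, RATE_HIGH, years)
print(f"CHF {low:,.0f} to CHF {high:,.0f} over {years} years")
```

Under those assumed inputs the avoided cost falls between CHF 12,000 and CHF 30,000. The figure scales linearly with the volume of duplicates removed.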
What Comes Next
The Swiss Federal Archives in Bern have been in contact with cantonal archivists across German-speaking Switzerland about standardising deduplication criteria, though no binding national standard has yet been published. For Zurich, the practical next step is expected to come through the city's Informatik-Dienste, the municipal IT department headquartered near Stadelhofen, which is reportedly preparing a tender for deduplication tooling to be issued before the end of 2026.
For ordinary residents and researchers, the most visible effect of a successful cleanup would be faster search results on the Stadt Zürich open data portal, which currently hosts tens of thousands of image assets tied to planning, culture, and urban history. Duplicate entries in that portal's image layer have been a recurring complaint among users of the GIS-Zürich mapping platform, where redundant aerial photographs from different survey years sometimes appear as separate, unlinked records.
The technical fixes are well understood. The harder question is governance: who decides which copy is the authoritative one, and who takes responsibility when a deletion turns out to have removed something that mattered. That question is now firmly on the table in offices along Alfred-Escher-Strasse and on the Hönggerberg. Getting an answer, sources familiar with Swiss federal data policy suggest, will require coordination that no single algorithm can substitute for.