Zurich's municipal digital infrastructure is sitting on a problem that has been building since at least 2018: tens of thousands of duplicate image files scattered across public archive systems, departmental servers and the city's open-data portal at data.stadt-zuerich.ch. The Stadt Zürich Informatik directorate confirmed in a published procurement notice earlier this year that it was seeking specialised deduplication tooling for its document management environment — a quiet administrative step that signals how far the problem has grown.
The issue matters now because the city is mid-way through a digitisation push tied to its Smart City Zürich strategy, which targets full interoperability of municipal data systems by 2028. Bloated image repositories slow indexing, inflate cloud storage contracts and, in some cases, return contradictory results when journalists, researchers or residents query public records. The timing is awkward: Zurich has positioned itself as a model for transparent urban governance, and messy back-end data undercuts that reputation.
A Problem Decades in the Making
The roots go back to the early 2000s, when individual departments at the Stadthaus on Stadthausquai began digitising paper records independently, with no common file-naming convention. Planning documents from the Amt für Städtebau, press photographs from Stadt Zürich Kommunikation, and infrastructure surveys from the Tiefbauamt were all scanned by separate teams using different software. When the city consolidated platforms around 2014, those siloed archives were merged without a systematic deduplication pass. Files migrated, and duplicates migrated with them.
The problem was compounded by two later waves of digitisation. The first came with the 2017 rollout of the city's unified content management system, when older departmental intranets were retired and their contents bulk-uploaded. The second came during the 2020-2021 remote-working period, when staff working from home routinely downloaded, edited and re-uploaded image assets, creating multiple versions of the same image that the system logged as distinct files. By conservative internal estimates referenced in the procurement documents, which are publicly accessible through the Simap.ch platform, the duplication rate in certain image libraries exceeded 30 percent.
ETH Zürich's Data Management unit, based at Rämistrasse 101, has been tracking similar patterns in research data repositories and has noted in published methodology papers that image deduplication in large institutional archives typically requires a combination of perceptual hashing and metadata reconciliation. Neither technique is new, but deploying them at municipal scale, across legacy systems with inconsistent metadata schemas, is substantially harder than commercial off-the-shelf tools can manage without customisation.
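Perceptual hashing, the first of those techniques, reduces each image to a short fingerprint that changes little when a file is resized, recompressed or slightly brightened, so near-identical copies can be found by comparing fingerprints rather than raw bytes. The Python sketch below implements one simple variant, the average hash, on a grayscale image held as a list of pixel rows. The function names and the five-bit matching threshold are illustrative assumptions, not details from the city's tender or from ETH's methodology, and the metadata-reconciliation step is not shown.

```python
from typing import List

# A grayscale image as rows of pixel values in the range 0-255.
Image = List[List[int]]


def average_hash(img: Image, size: int = 8) -> int:
    """Compute a simple average hash (aHash) of a grayscale image.

    The image is shrunk to a size x size grid by averaging blocks of
    pixels; each grid cell then becomes one bit, set to 1 if the cell is
    brighter than the mean of all cells. Bits are packed row by row.
    """
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Image, b: Image, threshold: int = 5) -> bool:
    """Flag two images as probable duplicates if their hashes are close.

    The default threshold of 5 differing bits is a hypothetical value
    chosen for illustration only.
    """
    return hamming(average_hash(a), average_hash(b)) <= threshold


if __name__ == "__main__":
    # A horizontal gradient and a slightly brightened copy of it.
    grad = [[x * 4 for x in range(64)] for _ in range(64)]
    brighter = [[min(255, v + 10) for v in row] for row in grad]
    # A vertical gradient: a genuinely different image.
    vert = [[y * 4 for _ in range(64)] for y in range(64)]
    print(likely_duplicates(grad, brighter))
    print(likely_duplicates(grad, vert))
```

A real deployment would more likely rely on an established imaging library and a more robust variant such as the DCT-based pHash; this pure-Python version only illustrates the principle that visually similar files produce fingerprints a few bits apart.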
What Comes Next for the City's Repositories
The Stadt Zürich Informatik tender, published in the first quarter of 2026, specifies a phased approach. Phase one, scheduled to run through the end of 2026, targets the open-data image portal and the press archive held at Stadtarchiv Zürich on Neumarkt. Phase two, planned for 2027, will address internal planning and infrastructure image stores. A third phase covering historical photographic collections — some dating to the late 19th century — has no confirmed start date yet.
For residents and researchers who regularly query the city's public data, the practical upshot is that search results from the Stadtarchiv portal should become more reliable once the first phase completes. Academics at the University of Zurich's Institute for Computational Linguistics, at Andreasstrasse 15, have flagged duplicate images as a recurring nuisance when building training datasets from public municipal sources — a complaint that has circulated in data-science circles for at least three years.
The cost of inaction is not purely administrative. Cloud storage contracts for municipal image data cost the city a material sum annually, and internal audits seen by this newspaper suggest that eliminating confirmed duplicates in just the press archive and open-data portal could reduce that specific storage footprint by roughly a quarter. The city has not published a firm figure for total storage spend on image assets. What is clear is that the procurement process is now underway, the first deadline is the end of this calendar year, and after two decades of accumulated digital clutter, Zurich is finally running the cleanup pass it probably should have run in 2014.