Zurich's public institutions are sitting on a quietly expensive problem. Across municipal servers, university repositories and cantonal databases, duplicate image files now account for a measurable share of total digital storage consumption — and the bill is growing faster than most IT departments will publicly admit.
The issue is not unique to Switzerland, but local conditions make it particularly acute here. ETH Zurich, ranked among the world's top ten universities in the QS 2025 standings, generates enormous volumes of imaging data through research in materials science, medical diagnostics and satellite remote sensing. The university's IT Services division manages petabyte-scale storage infrastructure on the Hönggerberg campus. When a single research group uploads the same high-resolution microscopy scan in multiple formats — standard practice in collaborative workflows — the duplication compounds fast.
What the Storage Figures Actually Show
Industry benchmarks from enterprise data management studies consistently put duplicate file rates at between 25 and 40 percent of total unstructured data in large research and public-sector environments. Applied to an institution the scale of ETH Zurich, which reported over 20 petabytes of research data under management in recent infrastructure planning documents, even the lower end of that range (25 percent of 20 petabytes) represents at least five petabytes of redundant content: storage that must be purchased, cooled, backed up and maintained.
The City of Zurich's own digital transformation agenda, outlined under the Smart City Zurich programme coordinated from the Stadthaus on Stadthausquai, identifies data efficiency as a priority for the 2024–2030 planning cycle. The housing registry alone processes tens of thousands of property images annually through the Amtshaus IV building on Lindenhofstrasse, a workload that matters given Zurich's Wohnungsnot (housing shortage), where vacancy rates have hovered below one percent for several consecutive years. Staff upload facade photographs, floor plan scans and inspection images, often multiple times across different administrative workflows, with no automated deduplication layer in place.
Stadtarchiv Zürich, located on Neumarkt, faces the same structural challenge. The archive has been digitising historical photograph collections for over a decade, and archivists working with analogue originals routinely produce multiple scans at different resolutions for different end uses. Without systematic hash-matching to flag identical files, and perceptual hashing to flag near-identical ones, the same image can live in three or four folders simultaneously. Storage costs for municipal archives in Swiss cities typically run at CHF 80 to CHF 120 per terabyte per year once power, hardware refresh and staff time are factored in. At those rates, around 100 terabytes of redundant files is enough to produce a five-figure annual overhead that could be redirected elsewhere.
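The hash-matching described above can be illustrated with a minimal Python sketch. It assumes the files sit on an ordinary file system and uses SHA-256 as the content hash. The function names are hypothetical and chosen for illustration; nothing here reflects how any Zurich institution actually stores its scans.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large scans never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group every file under `root` by content hash; keep groups with 2+ members."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of_file(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

A content hash of this kind only catches byte-identical copies. Two scans of the same photograph at different resolutions produce different hashes, so they would need a perceptual method instead.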
Deduplication Tools Exist — So Why the Lag?
The technology to fix this is not new. Hash-based deduplication, perceptual hashing for near-duplicate detection, and AI-assisted image clustering tools have been commercially available since at least 2015. Several of these systems are already embedded in enterprise storage platforms used by Swiss banks, including UBS, which absorbed Credit Suisse in the emergency takeover of March 2023. Financial institutions adopted aggressive data deduplication partly because regulatory requirements under FINMA oversight make bloated, disorganised data stores a compliance liability, not merely a budget nuisance.
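To make the idea of perceptual hashing concrete, here is a rough sketch of one common variant, the difference hash (dHash). It is an illustration under stated assumptions, not the method used by any product mentioned above: the image is assumed to be already decoded into a grid of grayscale values (a list of rows), and the helper names are hypothetical.

```python
def downsample(pixels, out_w, out_h):
    """Shrink a grayscale grid by averaging the source block behind each output cell."""
    src_h, src_w = len(pixels), len(pixels[0])
    result = []
    for oy in range(out_h):
        y0 = oy * src_h // out_h
        y1 = max(y0 + 1, (oy + 1) * src_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * src_w // out_w
            x1 = max(x0 + 1, (ox + 1) * src_w // out_w)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pair in a small thumbnail."""
    small = downsample(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits; small distances suggest near-duplicate images."""
    return bin(a ^ b).count("1")
```

Two scans of the same original at different resolutions should yield hashes only a few bits apart, while unrelated images tend to differ in many more bits. The cut-off between "near-duplicate" and "different" is a tuning decision that would have to be calibrated on real data.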
Public institutions have moved more slowly, constrained by procurement rules, fragmented IT governance and, in the case of Zurich's municipal and cantonal bodies, the direct-democracy budget process that requires major IT expenditures to survive public scrutiny. A deduplication project that saves CHF 200,000 in storage annually but costs CHF 350,000 to implement takes 21 months to pay for itself, which makes it a harder sell at a Gemeinderat session than it looks on a spreadsheet.
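The budget arithmetic behind that example can be written out in a few lines of Python. This is a deliberately simple sketch: it assumes the saving is constant from the first year and ignores discounting, licence renewals and staff costs, none of which the figures above specify. The function names are illustrative.

```python
def payback_years(implementation_cost: float, annual_saving: float) -> float:
    """Years until cumulative savings cover the one-off implementation cost."""
    if annual_saving <= 0:
        raise ValueError("annual_saving must be positive")
    return implementation_cost / annual_saving


def cumulative_net_saving(implementation_cost, annual_saving, years):
    """Net position at the end of each year: savings so far minus the upfront cost."""
    return [annual_saving * year - implementation_cost for year in range(1, years + 1)]


# Figures from the example: CHF 350,000 to implement, CHF 200,000 saved per year.
print(payback_years(350_000, 200_000))             # 1.75
print(cumulative_net_saving(350_000, 200_000, 3))  # [-150000, 50000, 250000]
```

On these assumptions the project is CHF 150,000 in the red after its first year and only turns positive in the second, which is the part a single-year budget debate tends to see.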
The practical path forward for Zurich's institutions involves three distinct steps. The first is an initial audit to establish a baseline duplication rate across each major repository. The second is a pilot deployment of open-source perceptual hashing tools on a bounded dataset; ETH's materials science image library would be a logical test case given its volume and internal accessibility. The third is a phased rollout tied to existing hardware refresh cycles to avoid stand-alone capital expenditure. Institutions that have run similar audits in comparable European research cities have reported recovering between 20 and 35 percent of active storage capacity within eighteen months. For Zurich, where server space and the energy to run it both carry a premium, that recovery rate is worth taking seriously before the next budget cycle opens in autumn 2026.
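For the audit step, a baseline duplication rate could be computed from a simple inventory of files. The sketch below assumes each repository can export a list of (content hash, size in bytes) records; that export format and the function name are assumptions made for illustration, not a description of any existing system.

```python
def duplication_rate(records):
    """Share of stored bytes held by repeat copies of content already seen.

    `records` is an iterable of (content_hash, size_bytes) pairs, one per file.
    The first file with a given hash counts as the original; later ones are redundant.
    """
    total_bytes = 0
    unique_bytes = 0
    seen = set()
    for content_hash, size_bytes in records:
        total_bytes += size_bytes
        if content_hash not in seen:
            seen.add(content_hash)
            unique_bytes += size_bytes
    if total_bytes == 0:
        return 0.0
    return (total_bytes - unique_bytes) / total_bytes
```

Measuring the rate by bytes instead of by file count keeps the figure comparable with storage budgets, since a handful of duplicated high-resolution scans can outweigh thousands of duplicated thumbnails.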