The Numbers Don't Lie: Zurich's Digital Archives Are Drowning in Duplicate Images
New figures from Swiss institutional data managers reveal the staggering scale of redundant image files clogging public and private databases across the city.

Roughly one in three digital images stored across Zurich's major public institutions is a duplicate. That is the working estimate circulating among archivists and data managers at several city-affiliated bodies, including the Stadtarchiv Zürich on Alfred-Escher-Strasse and the Zentralbibliothek Zürich near Zähringerplatz. The problem has reached a point where administrators are actively budgeting for systematic deduplication projects — and the costs are significant.
The issue matters now because Zurich is mid-way through a broader digital infrastructure overhaul tied to the city's Smart City strategy, a programme with a planning horizon running to 2030. Storage costs, energy consumption from data centres, and the administrative burden of managing bloated file systems have all crept upward. With Swiss data centre electricity prices averaging around CHF 0.20 per kilowatt-hour — among the higher rates in Europe — every unnecessary terabyte of redundant imagery carries a measurable price tag.
Duplicate image replacement, the process of identifying redundant image files and replacing them with a single canonical version, sounds routine. The numbers tell a different story. At a mid-sized public institution managing a photo archive of roughly 500,000 files, industry benchmarks suggest between 15 and 35 percent of stored images may be duplicates or near-duplicates. At the upper end of that range, that is 175,000 files doing nothing useful except consuming server space and confusing retrieval systems.
ETH Zürich's IT Services division, which manages one of the largest research data repositories in the German-speaking world, has publicly acknowledged that image deduplication is a standing item in its annual data governance reviews. The university stores petabytes of research imagery — from materials science microscopy to satellite remote-sensing data produced through partnerships along the Hönggerberg campus research cluster. Even a one-percent reduction in redundant files at that scale translates to savings measurable in tens of thousands of francs annually.
Private-sector pressure is equally acute. Following UBS's absorption of Credit Suisse, the merged institution inherited overlapping document and image management systems from two large organisations that had each spent decades accumulating digital assets. Rationalising those archives, work run in part from UBS's main offices near Bahnhofstrasse, is an ongoing effort that compliance officers have described in public regulatory filings as a multi-year integration task.
Commercial deduplication software licences for enterprise use typically run between CHF 8,000 and CHF 40,000 per year depending on archive size, according to vendor pricing sheets circulating in Swiss procurement channels. Manual review, where human archivists assess near-duplicate images that automated tools flag but cannot definitively resolve, adds substantially to that figure. A project scoped for a 200,000-image archive can require between 400 and 800 person-hours of skilled labour.
The payoff, however, is concrete. Storage consolidation reduces cloud or on-premises server provisioning costs. Retrieval times improve. For organisations like the Museum für Gestaltung Zürich on Ausstellungsstrasse, which holds extensive photographic and design-image collections, faster, cleaner archives directly affect how quickly curators and researchers can access material. A leaner image database also reduces the risk of licensing errors, where the same image is inadvertently purchased or credited twice because staff cannot tell it has already been acquired under a different filename.
The practical path forward for Zurich's institutions involves three stages that data managers broadly agree on: first, an automated hash-based scan to catch exact duplicates; second, a perceptual-similarity pass using AI-assisted tools to catch near-duplicates such as slightly cropped or recompressed versions of the same photograph; and third, human review of edge cases before any file is permanently deleted. Institutions that have completed even the first stage typically find they recover between eight and twelve percent of total storage capacity within weeks. For organisations facing Zurich's tight municipal IT budgets — the city's overall IT spending envelope for 2026 was set in the autumn budget session at around CHF 180 million — that kind of quick return matters.
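For readers curious what the first of those stages looks like in practice, the following is a minimal sketch of a hash-based scan, not any institution's actual tooling. It assumes the archive is an ordinary directory tree on disk and uses SHA-256 as the hash; the function names `file_digest` and `find_exact_duplicates` are illustrative and do not come from any vendor product. It only groups byte-identical files and deletes nothing, so the perceptual-similarity pass and the human review described above would still have to follow.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large images are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    Assumption: `root` is a hypothetical archive directory. Files are only
    reported, never removed; deletion is left to a later reviewed step.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("archive"))` on a hypothetical `archive` folder would return one list per set of identical files. A recompressed or slightly cropped copy has different bytes and therefore a different digest, which is why this stage alone cannot catch near-duplicates.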
The Stadtarchiv Zürich is expected to publish updated data governance guidelines later this year, with image deduplication protocols among the topics addressed. Smaller cultural organisations along the Langstrasse corridor and in the Kreis 4 district — many of which rely on shared cantonal storage infrastructure — are watching those guidelines closely before committing to their own clean-up projects.
About this article
Published by The Daily Zurich