The Daily Zurich

Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

From city hall document servers to ETH Zurich's research repositories, redundant image files are consuming storage budgets and slowing down the institutions that can least afford it.

By Zurich News Desk · Published 4 July 2026, 8:36 pm

Photo: Mâide Arslan on Pexels

Zurich's public institutions collectively store well over a million digital images across fragmented server systems, and a growing share of that data is simply the same file saved twice, or ten times, under different names. IT staff call the problem duplicate image accumulation. Its cost, measured in server capacity, staff hours, and energy consumption, is now drawing attention from procurement offices at the city level.

The issue matters now for a specific reason. Switzerland's federal data protection law revision, which came into force in September 2023, placed new obligations on public bodies to audit and rationalise the personal data they hold, including images that show identifiable people. The law has been in force for almost three years, but the internal reviews it triggered are still under way across Zurich's cantonal administration, and what those reviews are finding is making budget managers uncomfortable.

The Scale of the Problem in Zurich's Institutions

ETH Zurich, ranked among the top ten universities globally in the QS World University Rankings, operates research data repositories that collectively run into the petabyte range. A significant fraction of that storage is occupied by image datasets used in machine-learning and materials science research — fields where version control is notoriously inconsistent. Researchers routinely save multiple exports of the same source image at different resolutions or with minor colour corrections applied, each copy registered as a distinct file. Internal estimates from comparable European technical universities suggest duplicate and near-duplicate images can account for between 20 and 35 percent of total image storage in active research environments, though ETH Zurich has not published its own figure.

At the Stadtarchiv Zürich on Neumarkt, archivists have been digitising physical photograph collections since the late 1990s. The archive holds more than 1.5 million digitised images, according to figures available through the city's open data portal. Deduplication software has been applied to parts of the collection, but the archive's annual report has previously noted that legacy batch imports introduced systematic duplication, particularly those from the 2005-to-2012 period, when scanning was contracted out to third parties. Contractors submitted both raw and post-processed versions of the same negative without flagging the relationship between the files.

The financial dimension is not trivial. Enterprise cold storage in Switzerland runs at roughly CHF 0.02 to CHF 0.05 per gigabyte per month depending on the service tier and provider, according to published rate cards from Swiss data centre operators including NTT Ltd's Zurich facility in Glattpark, Opfikon. For an institution holding 100 terabytes of image data, a 25 percent duplicate rate represents 25 terabytes, or 25,000 gigabytes, of redundant storage. At those rates that comes to CHF 500 to CHF 1,250 per month, or between CHF 6,000 and CHF 15,000 per year in direct costs. That is before factoring in backup cycles, which typically mirror the primary storage footprint and so roughly double the expense.
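The arithmetic behind those figures can be reproduced with a short calculation. This is only a sketch under the article's stated assumptions (decimal terabytes, a flat per-gigabyte monthly rate, and one backup copy that mirrors the primary footprint); the function name and parameters are illustrative and do not come from any provider's rate card.

```python
def annual_duplicate_cost(total_tb, duplicate_rate, chf_per_gb_month, backup_copies=0):
    """Estimate the yearly cost in CHF of storing duplicate data.

    Assumes 1 TB = 1,000 GB and a flat monthly price per gigabyte.
    backup_copies is the number of extra full copies kept for backup.
    """
    redundant_gb = total_tb * 1000 * duplicate_rate
    copies = 1 + backup_copies
    return redundant_gb * chf_per_gb_month * 12 * copies


# The article's example: 100 TB of images with a 25 percent duplicate rate.
low = annual_duplicate_cost(100, 0.25, 0.02)
high = annual_duplicate_cost(100, 0.25, 0.05)

# Same example with one mirrored backup copy, which doubles the bill.
low_with_backup = annual_duplicate_cost(100, 0.25, 0.02, backup_copies=1)
high_with_backup = annual_duplicate_cost(100, 0.25, 0.05, backup_copies=1)

print(f"Primary storage only: CHF {low:,.0f} to CHF {high:,.0f} per year")
print(f"With one backup copy: CHF {low_with_backup:,.0f} to CHF {high_with_backup:,.0f} per year")
```

Changing the duplicate rate to 0.20 or 0.35 gives the range implied by the estimates from comparable universities quoted earlier.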

Deduplication Tools and What Zurich Is Doing About It

The Zürcher Kantonalbank overhauled its document management infrastructure during the compliance push that followed the UBS-Credit Suisse merger across the Swiss banking sector. It has publicly described investments in automated data hygiene tooling, though it has not disclosed specific deduplication metrics. The bank operates its main data processing centre in the city's Altstetten district.

Several Zurich-based firms working in the digital asset management space, including agencies concentrated around the Escher-Wyss-Platz tech corridor in Zürich-West, now offer perceptual hashing services specifically designed for image libraries. Unlike simple checksum matching, which catches only byte-for-byte identical files, perceptual hashing detects visually similar images even when file size or metadata differs. For mid-sized institutional clients, pricing for such services typically starts at CHF 800 per month for a managed software-as-a-service arrangement.
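The difference between the two approaches can be illustrated with a toy example. The sketch below uses a simple "average hash", one common form of perceptual hashing, and is not the method of any vendor mentioned here. It assumes images have already been decoded and downscaled to a small grayscale grid, represented as plain lists of 0 to 255 values; a real pipeline would use an imaging library for that step. The function names and sample grids are hypothetical.

```python
import hashlib


def checksum(pixels):
    """SHA-256 over the raw pixel values: equal only for exact copies."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming(bits_a, bits_b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


# A tiny 4x4 grayscale "image": dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# The same picture after a slight brightness correction.
brightened = [[min(value + 10, 255) for value in row] for row in original]
# A genuinely different picture: the tonal inverse of the original.
different = [[255 - value for value in row] for row in original]

# Checksums treat the corrected copy as a completely different file.
print(checksum(original) == checksum(brightened))

# Perceptual hashes: a small distance means "visually similar".
print(hamming(average_hash(original), average_hash(brightened)))
print(hamming(average_hash(original), average_hash(different)))
```

In practice a tool would flag two images as near-duplicates when the Hamming distance falls below a chosen threshold, and that threshold has to be tuned against the collection, since too loose a setting will merge images that an archivist would consider distinct.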

For any Zurich institution now working through a data audit, practitioners recommend establishing a baseline with open-source tools before committing to a commercial platform. Two commonly named options are digiKam, which can search a collection for visually similar images, and rdfind, which finds byte-identical files. The Swiss Federal Archives in Bern published a practical guidance note on image deduplication methodology in February 2025, freely available through its website, that Zurich cantonal archivists have been circulating internally. The first step, according to that guidance, is always the same: count what you actually have before deciding how to cut it.
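As a rough illustration of what such a baseline count might look like, the sketch below walks a folder, groups files by SHA-256 checksum, and reports how many files and bytes are exact duplicates. It is not taken from the Federal Archives guidance and does not reproduce how digiKam or rdfind work internally; it catches only byte-identical copies, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def baseline(root):
    """Count files under root and measure exact-duplicate redundancy."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)

    total_files = sum(len(paths) for paths in groups.values())
    # Every copy beyond the first in a group is redundant.
    redundant_files = sum(len(paths) - 1 for paths in groups.values())
    redundant_bytes = sum(
        os.path.getsize(paths[0]) * (len(paths) - 1) for paths in groups.values()
    )
    return {
        "total_files": total_files,
        "redundant_files": redundant_files,
        "redundant_bytes": redundant_bytes,
        "duplicate_groups": [paths for paths in groups.values() if len(paths) > 1],
    }
```

Calling `baseline()` on an image folder returns the counts along with the groups of identical files, which can then be reviewed by hand. The function only reads files and never deletes anything, in keeping with the advice to count before cutting.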

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
