The Daily Zurich


Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Sobering Story

From city hall servers to ETH Zurich's research databases, redundant image files are consuming terabytes of public storage and costing institutions millions of francs they can ill afford.

By Zurich News Desk · Published 4 July 2026, 9:45 pm


Photo: Adrien Olichon on Pexels

Zurich's public institutions are sitting on a storage crisis hiding in plain sight. Across cantonal servers, university repositories and municipal digital archives, duplicate image files now account for an estimated 30 to 40 percent of total stored visual data — a proportion that IT governance specialists across Europe have flagged as both commonplace and catastrophically expensive to ignore.

The timing matters. Switzerland's federal government extended its new data governance framework, the Bundesgesetz über den Einsatz elektronischer Mittel zur Erfüllung von Behördenaufgaben (EMBAG), across cantonal administrations starting January 2026, putting renewed pressure on public bodies to audit and rationalise their digital holdings. For a city like Zurich, where the Stadtarchiv on Neumarkt manages millions of digitised records and ETH Zurich's library system alone indexes hundreds of thousands of research images, the mandate is no longer theoretical.

What the Data Actually Shows

Storage costs money. Enterprise-grade archival storage in Switzerland runs at roughly CHF 80 to CHF 120 per terabyte per month for institutions using managed cloud or hybrid infrastructure — figures consistent with procurement benchmarks cited by Swiss public sector IT consortiums. ETH Zurich's central IT services division, ID ETH, manages data volumes that run into the petabyte range across research groups. If even a conservative 30 percent of image storage is duplicated, the annual financial waste across a single large institution can reach six figures in francs, without counting staff time spent cataloguing redundant files.
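To make the six-figure claim concrete, the arithmetic can be sketched in a few lines of Python. The per-terabyte rates and the 30 percent share are the figures quoted above; the 1,000-terabyte archive size is an assumption chosen purely for illustration, not a reported number for any Zurich institution, and the function name is hypothetical.

```python
def annual_duplicate_cost(image_storage_tb, duplicate_percent, chf_per_tb_month):
    """Yearly cost in CHF of storing the duplicated share of an image archive."""
    duplicated_tb = image_storage_tb * duplicate_percent / 100
    return duplicated_tb * chf_per_tb_month * 12


# Assumption for illustration only: an archive holding 1,000 TB (1 PB) of images.
IMAGE_STORAGE_TB = 1000

# 30 percent duplication at the low and high ends of the quoted price range.
low = annual_duplicate_cost(IMAGE_STORAGE_TB, 30, 80)
high = annual_duplicate_cost(IMAGE_STORAGE_TB, 30, 120)

print(f"CHF {low:,.0f} to CHF {high:,.0f} per year")
# prints: CHF 288,000 to CHF 432,000 per year
```

Under these assumptions the redundant copies alone cost between CHF 288,000 and CHF 432,000 a year, before any staff time is counted.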

The University of Zurich's Zentrale Informatik department conducted an internal storage audit in the first quarter of 2026. While detailed results have not been made public, the exercise was part of a broader push tied to UZH's 2025–2028 digital strategy, which explicitly prioritises data deduplication and lifecycle management. The Stadtarchiv, for its part, uses the digital preservation platform Rosetta — developed by the Israeli firm Ex Libris — and has been integrating automated file-fingerprinting tools since 2023 to identify bitwise-identical copies before they enter long-term storage.

The problem is not confined to public bodies. The Kunsthaus Zürich, which completed its major extension on Heimplatz in 2021, has since been digitising significant portions of its collection. Museum digitisation projects routinely generate multiple derivative image files — full resolution, web-optimised, thumbnail — for each artwork. Without disciplined deduplication protocols, a single scan of a Hodler canvas can proliferate into a dozen near-identical variants scattered across different folders, drives and backup systems.

Why Deduplication Is Harder Than It Sounds

Identifying true duplicates is straightforward when files are bitwise identical: a standard hash comparison handles that at little more than the cost of reading each file once. The harder problem is perceptual duplication: images that look the same but differ in resolution, compression, colour profile or metadata. A photograph of Zurich's Lindenhof taken in 2019 might exist as a TIFF, a JPEG at 300 dpi and a JPEG at 72 dpi: three files, one image, three storage slots. Perceptual hashing tools such as pHash, or tools built on convolutional neural networks, can identify these near-matches, but deploying them at institutional scale requires both compute resources and human review workflows that most Zurich institutions are only now building out.
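The idea behind perceptual matching can be illustrated with a difference hash (dHash), a simpler relative of pHash rather than pHash itself. The sketch below is a minimal illustration, not any institution's tooling: it works on synthetic grayscale pixel grids instead of decoded image files, and the helper names and the threshold of 10 bits are assumptions. A production system would decode real files with an imaging library and tune the threshold against reviewed samples.

```python
def resize_nearest(pixels, width, height):
    """Downsample a 2D grayscale grid by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pair of pixels."""
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic "master scan": a dark band in the middle, brighter towards the edges.
original = [[min(255, abs(x - 32) * 8) for x in range(64)] for _ in range(64)]

# Simulated web derivative: half the resolution, coarser tonal values.
derivative = [[(v // 16) * 16 for v in row[::2]] for row in original[::2]]

# An unrelated image: a plain left-to-right brightness ramp.
unrelated = [[x * 4 for x in range(64)] for _ in range(64)]

THRESHOLD = 10  # assumed cut-off in bits; real deployments tune this value

for name, image in [("derivative", derivative), ("unrelated", unrelated)]:
    distance = hamming(dhash(original), dhash(image))
    verdict = "near-duplicate" if distance <= THRESHOLD else "different"
    print(f"{name}: distance {distance} -> {verdict}")
```

A byte-level hash would treat the master scan and its derivative as unrelated files. The difference hash compares only the pattern of brightness changes, so the two land within the threshold while the unrelated image does not.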

The canton of Zurich's statistics office, Statistik Zürich, which publishes regular data quality reports from its offices on Schöntalstrasse, noted in its 2025 annual report that data redundancy is one of three primary cost drivers in public digital infrastructure, alongside security overhead and legacy system maintenance. No single franc figure was attached to duplicated image storage specifically, but the category was ranked as addressable within a 24-month horizon if institutions adopt shared deduplication tooling.

For institutions beginning this work, the practical path is sequential: run bitwise hash comparisons first, quarantine matches for human sign-off, then deploy perceptual tools on the remaining corpus. The Stadtarchiv's phased rollout, which began with its photograph collection of roughly 1.2 million images, offers a replicable model. Institutions waiting for a perfect automated solution will keep paying for storage they do not need. At roughly CHF 100 per terabyte per month, the midpoint of the range cited above, every month of delay has a price.
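The first two steps of that sequence can be sketched as follows. This is an illustrative outline under stated assumptions, not the Stadtarchiv's actual workflow: the file paths and the function name are hypothetical, SHA-256 stands in for whichever fingerprint an institution uses, and file contents are held in memory, where a real run would read each file from disk in chunks.

```python
import hashlib
from collections import defaultdict


def split_exact_duplicates(files):
    """Group files by SHA-256 digest of their bytes.

    files: mapping of path -> file content as bytes.
    Returns (quarantine, remaining): quarantine maps each kept file to its
    byte-identical copies awaiting human sign-off; remaining lists one
    representative per distinct content for the later perceptual stage.
    """
    by_digest = defaultdict(list)
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        by_digest[digest].append(path)

    quarantine = {}
    remaining = []
    for paths in by_digest.values():
        keeper, *copies = paths
        remaining.append(keeper)
        if copies:
            quarantine[keeper] = copies  # nothing is deleted automatically
    return quarantine, remaining


# Hypothetical example: one scan, one backup copy, one web derivative.
files = {
    "archive/scan_0001.tif": b"full-resolution image bytes",
    "backup/scan_0001.tif": b"full-resolution image bytes",
    "web/scan_0001.jpg": b"web-optimised image bytes",
}

quarantine, remaining = split_exact_duplicates(files)
print("For sign-off:", quarantine)
print("Passed to perceptual stage:", remaining)
```

Keeping the quarantine separate from deletion reflects the human sign-off step: the script only proposes which copies are redundant, and the web derivative moves on to perceptual comparison because its bytes differ from the master scan.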


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
