The Daily Zurich

Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

New data from Swiss cultural institutions reveals the hidden scale of duplicate image files clogging public databases, costing storage budgets and slowing researcher access.

By Zurich News Desk · Published 4 July 2026, 8:40 pm

At least one in five images stored across Zurich's major public digital archives is a duplicate. That figure, drawn from internal audits conducted by institutions including the Zentralbibliothek Zürich and the Stadt Zürich's digital preservation office, points to a problem that has been building quietly for more than a decade — and that is now forcing administrators to spend real money on storage they do not need.

The issue matters now because Swiss cultural and research institutions have entered an aggressive digitisation phase. The Zentralbibliothek alone added roughly 2.3 million scanned items to its online catalogue between 2020 and 2025, according to figures published on its institutional reporting pages. When legacy scanning workflows lack automated deduplication checks, identical or near-identical image files accumulate across servers, mirrored backups and shared repositories. The result is wasted capacity and, for researchers at places like ETH Zürich's main library on Rämistrasse, slower query responses when databases index redundant content.

What the Data Actually Shows

Storage is not cheap. Enterprise-grade archival storage in Switzerland runs between CHF 0.08 and CHF 0.15 per gigabyte per month for managed on-premises solutions, based on publicly available pricing from Swiss data-centre operators. A mid-sized cantonal archive holding 40 terabytes of image data and carrying a 20 percent duplication rate is effectively paying for 8 terabytes of files that serve no purpose. Over a year, that translates to somewhere between CHF 7,680 and CHF 14,400 in pure waste — before accounting for backup cycles, which typically triple the effective footprint.
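
The waste estimate above can be reproduced in a few lines. This is a minimal sketch with some assumptions:

- The function name `annual_duplicate_cost` is hypothetical.
- It uses decimal units (1 TB = 1,000 GB), which is how the CHF 7,680 to CHF 14,400 range works out.
- It models backups as a simple multiplier on the duplicated volume, which simplifies real retention schedules.

```python
def annual_duplicate_cost(total_tb, dup_rate, chf_per_gb_month, backup_multiplier=1):
    """Yearly CHF spent storing duplicate files (hypothetical helper).

    Assumes decimal units (1 TB = 1,000 GB). backup_multiplier=1 counts
    primary storage only; 3 models backups that triple the footprint.
    """
    duplicate_gb = total_tb * dup_rate * 1000  # 40 TB at 20% -> 8,000 GB
    return duplicate_gb * chf_per_gb_month * 12 * backup_multiplier


low = annual_duplicate_cost(40, 0.20, 0.08)   # cheapest managed storage
high = annual_duplicate_cost(40, 0.20, 0.15)  # most expensive managed storage
print(f"CHF {low:,.0f} to CHF {high:,.0f} per year")  # CHF 7,680 to CHF 14,400 per year
```

Passing `backup_multiplier=3` models the tripled footprint. Under that simplification the range becomes CHF 23,040 to CHF 43,200 per year.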

Duplicates also arise from a structural quirk in how Swiss institutions share cultural heritage material. The Swiss Open Cultural Data platform, which aggregates content from cantonal and municipal collections, pulls images from multiple source databases. When the same photograph of, say, the Lindenhof or the Landesmuseum's courtyard exists in both the Stadt Zürich's own system and a cantonal feed, it can enter the national aggregator twice or more. Researchers querying a topic then sift through visually identical results, a friction that digital humanities scholars at the University of Zurich's Walter Benjamin Kolleg have flagged in published conference proceedings as a meaningful barrier to computational image analysis.
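
Byte-identical copies arriving through different feeds are the easiest case to catch. Hashing each file's bytes gives the same fingerprint regardless of which feed supplied it or what metadata travels with it.

The sketch below rests on a few assumptions:

- It uses a hypothetical record shape of (feed name, record ID, image bytes), and the feed names are invented.
- It does not describe how the Swiss Open Cultural Data platform actually processes its harvests.
- It catches only exact copies. A resized or re-encoded version of the same photograph produces a different SHA-256 digest and needs perceptual hashing instead.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(records):
    """Group aggregator records whose image bytes are identical.

    `records` holds (source_feed, record_id, image_bytes) tuples; this
    shape is a hypothetical stand-in for a real harvest format.
    Returns only digests shared by more than one record.
    """
    groups = defaultdict(list)
    for source, record_id, data in records:
        digest = hashlib.sha256(data).hexdigest()  # depends on bytes only, not metadata
        groups[digest].append((source, record_id))
    return {d: members for d, members in groups.items() if len(members) > 1}


# The same scan harvested from a city feed and a cantonal feed (hypothetical IDs).
photo = b"identical scan bytes"
records = [
    ("stadt_zuerich", "SZ-001", photo),
    ("kanton_feed", "KT-417", photo),
    ("kanton_feed", "KT-418", b"a different image"),
]
for digest, members in group_exact_duplicates(records).items():
    print(digest[:12], members)
```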

Detection algorithms have improved sharply. Perceptual hashing tools, which generate a compact fingerprint for each image based on its visual content rather than its file metadata, can now identify near-duplicates even when resolution, colour profile or file format differ. Mature open-source implementations of these tools have been available since at least 2013, yet adoption across Swiss public-sector archives has been uneven. A 2024 survey by the Bibliothek Information Schweiz professional association found that fewer than 40 percent of Swiss public libraries with digital collections had deployed any automated duplicate detection as a standard part of their ingest workflow.
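
To show how a perceptual fingerprint survives a change in resolution, here is a sketch of average hashing (aHash), one of the simplest perceptual hashes. Libraries such as imagededup also offer more robust variants, for example DCT-based pHash, and this code is not any particular library's implementation.

Some assumptions apply:

- The image has already been decoded to a grayscale grid of 0 to 255 values, for example with an imaging library.
- Two images are compared by counting differing bits (the Hamming distance).
- The distance cut-off for declaring a near-duplicate is a tuning choice left open here.

```python
def average_hash(pixels, hash_size=8):
    """Average hash (aHash) of a grayscale image given as a list of rows (0-255).

    Shrinks the image to hash_size x hash_size cells by block-averaging,
    then emits one bit per cell: 1 if the cell is brighter than the mean.
    """
    h, w = len(pixels), len(pixels[0])
    if h < hash_size or w < hash_size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for i in range(hash_size):
        r0, r1 = i * h // hash_size, (i + 1) * h // hash_size
        for j in range(hash_size):
            c0, c1 = j * w // hash_size, (j + 1) * w // hash_size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic test images: a diagonal gradient and three variants of it.
original = [[2 * (r + c) for c in range(64)] for r in range(64)]
half_size = [[4 * (r + c) for c in range(32)] for r in range(32)]  # same scene, half resolution
brighter = [[p + 3 for p in row] for row in original]              # uniform brightness shift
inverted = [[255 - p for p in row] for row in original]            # visually different image

h0 = average_hash(original)
print(hamming(h0, average_hash(half_size)))  # 0
print(hamming(h0, average_hash(brighter)))   # 0
print(hamming(h0, average_hash(inverted)))   # 56
```

The half-resolution and brightened copies hash identically to the original, while the inverted image differs in 56 of 64 bits. Real scans are noisier, so genuine near-duplicates usually land at a small non-zero distance rather than exactly zero.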

What Institutions Are Doing Next

The Zentralbibliothek Zürich, headquartered on Zähringerplatz, has reportedly included deduplication tooling in its current three-year infrastructure renewal programme, though the institution has not published a specific completion date. ETH Zürich's library services team has separately piloted a perceptual-hash audit on a subset of its engineering image archives, covering roughly 180,000 files, as part of a broader research-data management project running through the end of 2026.

For smaller institutions, such as the Stadtarchiv Zürich on Neumarkt or the neighbourhood-level collections maintained by district cultural offices, the practical path is less clear. Open-source options carry no licensing cost. These include free alternatives to proprietary matching tools such as Microsoft's PhotoDNA, and the Python-based imagededup library. But they require technical staff time to implement and maintain. That is a real constraint for archives operating on cantonal cultural budgets that have not grown in real terms since 2019.

The practical upshot for anyone managing a Zurich-based digital image collection: run a perceptual-hash audit before the next storage contract renewal. Identify and quarantine duplicates in a staging environment before deletion. And build deduplication into ingest pipelines now, before the next digitisation wave adds another layer of redundancy to a problem that is already measurably expensive.
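
The quarantine-before-delete step can be built directly into ingest. This is a hypothetical sketch, not any institution's pipeline:

- The directory names and the `ingest` function are placeholders.
- It detects only exact duplicates via SHA-256. Catching near-duplicates would mean replacing `hashlib.sha256` with a perceptual fingerprint and the exact dictionary lookup with a distance threshold.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def ingest(incoming: Path, archive: Path, quarantine: Path, seen: dict) -> list:
    """Move files from `incoming` into `archive`, diverting exact duplicates.

    `seen` maps a SHA-256 hex digest to the archived path that owns it and
    should persist across batches. Files whose digest is already known are
    moved to `quarantine` (never deleted) so a person can review them.
    Returns the list of quarantined paths.
    """
    archive.mkdir(parents=True, exist_ok=True)
    quarantine.mkdir(parents=True, exist_ok=True)
    quarantined = []
    for path in sorted(incoming.iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            # Prefix with part of the digest so same-named but different files stay apart.
            target = quarantine / f"{digest[:12]}_{path.name}"
            shutil.move(str(path), str(target))
            quarantined.append(target)
        else:
            target = archive / path.name
            if target.exists():
                raise FileExistsError(f"name clash in archive: {target}")
            shutil.move(str(path), str(target))
            seen[digest] = target
    return quarantined


# Usage: two byte-identical scans and one distinct scan in a temporary staging area.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    incoming = root / "incoming"
    incoming.mkdir()
    (incoming / "scan_a.tif").write_bytes(b"scan 0001")
    (incoming / "scan_b.tif").write_bytes(b"scan 0001")  # identical copy
    (incoming / "scan_c.tif").write_bytes(b"scan 0002")
    seen = {}
    held = ingest(incoming, root / "archive", root / "quarantine", seen)
    held_names = [p.name.split("_", 1)[1] for p in held]
    archived = sorted(p.name for p in (root / "archive").iterdir())
    print(held_names)  # ['scan_b.tif']
```

Moving rather than deleting keeps every decision reversible until someone has reviewed the quarantine folder, which is the purpose of the staging step.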

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
