The Daily Zurich

Zurich's Digital Archives Are Drowning in Duplicate Images, and the Numbers Tell a Damning Story

From the Stadtarchiv to ETH Zurich's image libraries, the hidden cost of duplicate digital files is piling up in server space, staff hours, and taxpayer money.

By Zurich News Desk · Published 4 July 2026, 8:58 pm

Photo: Fran Zaina on Pexels

Zurich's public institutions are sitting on millions of redundant digital image files, and the scale of the problem is only now becoming clear as archivists and IT departments start running systematic audits. Preliminary figures from a working group at ETH Zurich's IT Services division, presented to faculty heads in June 2026, put image duplication rates across university research repositories at 18 to 34 percent, depending on the department. Taken at the middle of that range, roughly one in four stored files is a copy of something already catalogued elsewhere on the same network.

The issue is not unique to academia. Municipal administrators at the Stadtarchiv Zürich on Neumarkt have been grappling with the same structural problem since undertaking a digitisation push that accelerated sharply between 2020 and 2024. That campaign converted hundreds of thousands of physical documents and photographs into digital formats, often with multiple scan operators working from overlapping source material. The result: ballooning storage demands at a moment when the city's IT budget is already stretched by housing-data systems tied to the ongoing Wohnungsnot response.

What Duplication Actually Costs

Storage is cheap until it isn't. Enterprise-grade archival storage in Switzerland currently runs at roughly CHF 0.04 to CHF 0.07 per gigabyte per month for cold-tier solutions, according to pricing published by Swiss data centre operators in early 2026. That sounds trivial. But a mid-sized institutional archive holding 500 terabytes of image data, a realistic figure for a city the size of Zurich, costs CHF 20,000 a month at the lowest of those rates, and the bill climbs well past that once redundancy, backups, and compliance retention layers are factored in. Strip out the duplicate 25 percent, and the savings come to roughly CHF 5,000 to CHF 6,000 monthly, or more than CHF 60,000 a year.
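The arithmetic behind those figures can be checked with a short back-of-envelope script. This is a sketch under stated assumptions rather than a costing model: it uses decimal units (1 TB = 1,000 GB) and the quoted per-gigabyte rates only, with no allowance for redundancy, backups, or retention layers. The function and variable names are illustrative.

```python
# Back-of-envelope check of the storage figures quoted in the article.
# Assumptions (not from the article): 1 TB = 1,000 GB, and list-price
# rates only, before redundancy, backup, or retention overhead.

ARCHIVE_TB = 500                 # archive size used in the article
RATE_LOW_CHF = 0.04              # CHF per GB per month, low end
RATE_HIGH_CHF = 0.07             # CHF per GB per month, high end
DUPLICATE_SHARE = 0.25           # share of files assumed to be duplicates


def monthly_cost(terabytes, rate_per_gb):
    """Monthly storage cost in CHF for a given size and per-GB rate."""
    return terabytes * 1000 * rate_per_gb


low = monthly_cost(ARCHIVE_TB, RATE_LOW_CHF)
high = monthly_cost(ARCHIVE_TB, RATE_HIGH_CHF)
saving_low = low * DUPLICATE_SHARE
saving_high = high * DUPLICATE_SHARE

print(f"Monthly bill:   CHF {low:,.0f} to {high:,.0f}")
print(f"Monthly saving: CHF {saving_low:,.0f} to {saving_high:,.0f}")
print(f"Annual saving:  CHF {saving_low * 12:,.0f} to {saving_high * 12:,.0f}")
```

On these assumptions the raw bill is CHF 20,000 to CHF 35,000 a month and removing a quarter of the data saves CHF 5,000 to CHF 8,750, so the savings quoted in the article sit at the lower end of the price range.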

ETH Zurich's computer science department has been developing perceptual hashing tools (algorithms that identify near-identical images even when filenames, metadata, or compression levels differ) as part of a broader research initiative under its Distributed Systems Group on Rämistrasse. A pilot run of one such tool across three faculty image libraries in spring 2026 flagged more than 41,000 duplicate pairs within a dataset of approximately 190,000 files. Removing confirmed duplicates from that pilot reduced the total dataset size by 22 percent. The exercise took two staff members roughly six weeks of part-time processing work.
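For readers unfamiliar with the technique, the idea behind perceptual hashing can be illustrated with one of its simplest variants, the average hash. The sketch below is not the ETH tool, whose design the article does not describe. It assumes grayscale images supplied as lists of pixel rows with values from 0 to 255, and every function name in it is hypothetical. Each image is shrunk to an 8 by 8 grid, each cell is marked as brighter or darker than the grid's mean, and the resulting 64 bits are compared by counting the positions where they differ.

```python
# Average-hash sketch in pure Python (no imaging library required).
# Images are lists of rows of 0-255 grayscale values; all names here
# are illustrative and not taken from any tool named in the article.


def shrink(pixels, size=8):
    """Downsample to size x size by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    grid = []
    for by in range(size):
        y0 = by * height // size
        y1 = max((by + 1) * height // size, y0 + 1)  # at least one row
        row = []
        for bx in range(size):
            x0 = bx * width // size
            x1 = max((bx + 1) * width // size, x0 + 1)  # at least one column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is above the mean."""
    cells = [value for row in shrink(pixels, size) for value in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def make_gradient(width, height):
    """Synthetic test image: a left-to-right brightness ramp."""
    return [[(x * 255) // (width - 1) for x in range(width)] for _ in range(height)]


original = make_gradient(64, 64)
rescaled = make_gradient(32, 32)           # same picture at half resolution
inverted = [[255 - v for v in row] for row in original]

print(hamming(average_hash(original), average_hash(rescaled)))  # 0
print(hamming(average_hash(original), average_hash(inverted)))  # 64
```

A distance of zero or a few bits suggests the same picture at a different resolution or compression level, while a large distance indicates different content. In practice a small cut-off (a value that would need tuning per collection) is used to flag candidate pairs, which staff then confirm before anything is deleted.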

The Zentralbibliothek Zürich on Zähringerplatz faces a related but distinct challenge. Its digitised newspaper and periodical archive, built up incrementally since 2008, contains overlapping contributions from at least four separate digitisation contractors over the years. A 2025 internal review — details of which were presented at a library conference in Basel in November of that year — identified entire edition runs that had been scanned twice, sometimes at different resolutions, without either version being formally flagged as a duplicate in the catalogue system.

Pressure to Act Is Growing

The incentive to clean up these repositories is sharpening for several reasons. Switzerland's revised Data Protection Act, which came into full force in September 2023, placed new obligations on institutions holding personal data — including historical photographs that may depict identifiable individuals. Keeping duplicate files multiplies the compliance surface area: each redundant copy is technically a separate instance of potentially regulated material that must be accounted for in data inventories.

Cantons are also watching cloud migration costs closely. Zurich's cantonal government has a stated target of migrating 60 percent of public-sector IT workloads to hybrid cloud infrastructure by the end of 2027, a goal outlined in its 2024 digitalisation strategy. Bloated archives loaded with duplicate images make that transition measurably more expensive and slower.

For institutions planning their next fiscal year, the practical guidance from IT governance specialists has become consistent: run a duplication audit before committing to expanded storage contracts. Tools capable of processing 100,000 image files in under 48 hours are now widely available, several of them open-source. The cost of the audit — in staff time and licensing — almost always pays back within the first year of storage savings. In Zurich, where precision and fiscal discipline are institutional habits rather than aspirational goals, the data is already making the argument for itself.
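As an illustration of what the first pass of such an audit might look like, the sketch below finds byte-for-byte identical files by grouping them on size and then on SHA-256 digest, using only Python's standard library. It is a minimal example under stated assumptions, not a description of any tool the institutions above use: the function names and the demo files are hypothetical, and it catches exact copies only, not rescans at a different resolution.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return groups of files under root with identical contents."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        for path in paths:
            by_digest[sha256_of(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def reclaimable_bytes(groups):
    """Bytes freed if all but one file in each group were removed."""
    return sum(group[0].stat().st_size * (len(group) - 1) for group in groups)


# Demo on throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scan_a.tif").write_bytes(b"same image bytes")
    (root / "scan_b.tif").write_bytes(b"same image bytes")
    (root / "other.tif").write_bytes(b"different bytes!")  # same size, new content
    groups = find_exact_duplicates(root)
    duplicate_names = [[p.name for p in group] for group in groups]
    wasted = reclaimable_bytes(groups)

print(duplicate_names, wasted)
```

Grouping by size first means most files are never hashed at all, which keeps a first pass over a large repository quick. The figure from `reclaimable_bytes` can then be set against the cost of the audit itself.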

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
