The Daily Zurich

Zurich's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

A new audit of the city's public image databases reveals tens of thousands of redundant files are inflating storage costs and slowing access for researchers and journalists alike.

By Zurich News Desk · Published 4 July 2026, 8:58 pm

Photo: Fran Zaina on Pexels

Zurich's municipal digital archive contains more than 340,000 image files, and estimates from database administrators suggest that between 18 and 22 percent of them are exact or near-exact duplicates. That works out to roughly 61,000 to 75,000 redundant files (about 68,000 at the midpoint), occupying server space, distorting search results, and costing the city money it does not need to spend.

The figure emerged during a routine infrastructure review commissioned by Stadtarchiv Zürich, the city's official records office on Neumarkt, ahead of a planned migration to a new content management platform scheduled for the first quarter of 2027. The problem is not unique to Zurich, but the scale here has caught administrators off guard.

Why does it matter now? Switzerland's federal archiving law, revised in 2023, places new obligations on cantonal and municipal bodies to maintain retrievable, non-duplicated records. Failure to comply by the end of 2026 can trigger mandatory audits by the Bundesarchiv in Bern. Zurich, which prides itself on administrative precision, has a reputational reason to get ahead of this — and a financial one.

What the Numbers Actually Show

Storage costs in enterprise-grade Swiss data infrastructure run roughly CHF 0.04 to CHF 0.08 per gigabyte per month in hosted environments, according to published tariff sheets from several Swiss IT providers. The Stadtarchiv's image repository currently occupies approximately 14 terabytes. If duplicate images account for one-fifth of that, the city is paying to store and back up roughly 2.8 terabytes of files it already holds elsewhere in the same system. Database specialists put that recurring overhead at several thousand francs a year: small in isolation, but symptomatic of a broader inefficiency.
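For readers who want to check the arithmetic, here is a minimal back-of-envelope sketch that multiplies the duplicate share by the quoted tariff range. It assumes decimal units (1 TB = 1,000 GB) and the one-fifth duplicate share; the number of stored copies is a hypothetical parameter, since the city has not said how many backup replicas it keeps. At the quoted rates, a single copy of 2.8 TB comes to roughly CHF 1,344 to CHF 2,688 a year, so the specialists' "several thousand francs" is consistent with counting backup replicas or other overheads on top of the primary copy.

```python
# Back-of-envelope estimate of the yearly cost of storing duplicate images.
# Assumptions (not from the Stadtarchiv): decimal units (1 TB = 1,000 GB),
# a 20 percent duplicate share, and a hypothetical number of stored copies.

REPOSITORY_TB = 14.0                    # approximate size of the image repository
DUPLICATE_SHARE = 0.20                  # "one-fifth" scenario
TARIFF_CHF_PER_GB_MONTH = (0.04, 0.08)  # low and high published tariffs


def annual_duplicate_cost(copies: int = 1) -> tuple[float, float]:
    """Return (low, high) yearly cost in CHF of storing the duplicate share.

    `copies` is a hypothetical multiplier for primary plus backup replicas.
    """
    duplicate_gb = REPOSITORY_TB * 1_000 * DUPLICATE_SHARE  # about 2,800 GB
    low, high = TARIFF_CHF_PER_GB_MONTH
    # Monthly tariff times 12 months, times the number of stored copies.
    return (duplicate_gb * low * 12 * copies, duplicate_gb * high * 12 * copies)


if __name__ == "__main__":
    for copies in (1, 2, 3):
        low, high = annual_duplicate_cost(copies)
        print(f"copies={copies}: CHF {low:,.0f} to CHF {high:,.0f} per year")
```

Keeping the copy count as a parameter makes the hidden assumption explicit: the storage tariff alone does not reach "several thousand francs" unless more than one copy is counted.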

ETH Zurich's Data Science Lab, based on Universitätstrasse, has published research on perceptual hashing algorithms — the technology most commonly used to detect near-duplicate images at scale. Their work, cited in a 2024 European Commission report on public-sector data quality, found that automated deduplication tools typically achieve 94 to 97 percent accuracy on photographic archives when images have consistent metadata. The Stadtarchiv's files, however, have been ingested from at least six different legacy systems since 1998, meaning metadata consistency is patchy at best.
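The lab's specific algorithms are not described here, so as a generic illustration this is a minimal difference hash (dHash), one common perceptual-hashing scheme. It assumes the image has already been decoded into a grid of grayscale values (in practice an imaging library such as Pillow would handle that step). The 10-bit threshold in `is_near_duplicate` is a hypothetical tuning choice, not a figure from the cited research, and the accuracy numbers above come from that research, not from this sketch.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixel values (0-255), row-major


def resize_box(pixels: Grid, width: int, height: int) -> Grid:
    """Downsample a grayscale grid to width x height by averaging pixel blocks."""
    src_h, src_w = len(pixels), len(pixels[0])
    out: Grid = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair (64 bits by default).

    A bit is 1 when the right-hand pixel is brighter than its left neighbour,
    so the hash captures image structure rather than absolute brightness.
    """
    small = resize_box(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, max_distance: int = 10) -> bool:
    """Hypothetical rule: hashes within max_distance bits count as near-duplicates."""
    return hamming(hash_a, hash_b) <= max_distance


if __name__ == "__main__":
    # Synthetic 64x64 images: a left-to-right gradient, a brightened copy, a mirror.
    gradient = [[x * 4 for x in range(64)] for _ in range(64)]
    brighter = [[min(255, v + 20) for v in row] for row in gradient]
    mirrored = [row[::-1] for row in gradient]
    h = dhash(gradient)
    print(hamming(h, dhash(brighter)))  # 0: same structure, different brightness
    print(hamming(h, dhash(mirrored)))  # 64: every comparison flips
```

The brightened copy hashes identically because only the relative order of neighbouring pixels matters, while the mirrored image differs in every bit. Real archives fall in between, which is why the distance threshold has to be tuned and borderline pairs reviewed.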

The Zentralbibliothek Zürich on Zähringerplatz faced a comparable problem in 2021 when it digitised roughly 120,000 historical photographs as part of a retrospective scanning project. Librarians there spent approximately 14 months on a manual and semi-automated deduplication pass before the collection went live on the e-rara.ch platform. That timeline is now being used as a rough benchmark by Stadtarchiv planners, though the scale of the current problem is considerably larger.

What Happens Next — and What It Will Cost

The Stadtarchiv has issued a request for proposal to three Swiss IT firms to run a pilot deduplication sweep on a 500-gigabyte test corpus. Results are expected by September 2026. If the pilot confirms the 20 percent duplication estimate, the full remediation project — covering identification, human review of ambiguous cases, and deletion or consolidation — is expected to carry a price tag of between CHF 180,000 and CHF 240,000, according to internal budget projections described in publicly accessible meeting minutes from the Stadtrat session of 12 June 2026.
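The technical approach in the request for proposal is not public, so the following is only a hypothetical sketch of what a first pass over a test corpus might look like: it finds byte-identical files by grouping them on file size and then on SHA-256 digest. It catches exact duplicates only; re-encoded, cropped, or resized copies need a perceptual comparison, and ambiguous matches would still go to the human review the remediation plan describes.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups (two or more files each) of byte-identical files under root."""
    # Cheap first pass: only files of equal size can be identical.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an exact twin
        # Expensive second pass: hash only the size collisions.
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def redundant_count(groups: list[list[Path]]) -> int:
    """Files that could be removed if exactly one copy per group is kept."""
    return sum(len(g) - 1 for g in groups)
```

For example, `redundant_count(find_exact_duplicates(Path("/path/to/test-corpus")))` would report how many files could be dropped while keeping one copy of each; the path is a placeholder. Grouping by size first means most files are never hashed at all, which matters on a corpus of hundreds of gigabytes.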

For journalists, researchers, and historians who use the digital archive regularly — including university students at the Universität Zürich on Rämistrasse — the practical upside would be faster search returns and fewer misleading results when querying by subject or date. Duplicate images currently appear as separate entries, inflating apparent search hits and forcing users to cross-check manually.

The broader lesson for Swiss municipalities is that unchecked duplication in public image databases compounds over time. Every new system migration, every scanning campaign, every bulk upload from a departmental hard drive adds to the pile. Zurich's audit has at least put a number on the problem. The harder work is what comes after.

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
