The Daily Zurich

The Numbers Don't Lie: Zurich's Digital Archives Are Drowning in Duplicate Images

A new audit of municipal and institutional image databases finds that duplicate files are costing roughly CHF 1.4 million a year in storage and slowing down the city's digital infrastructure.

By Zurich News Desk · Published 4 July 2026, 9:22 pm

More than 34 percent of all images stored across Zurich's public-sector digital repositories are exact duplicates, and near-identical copies push the redundant share to about 45 percent, according to a technical audit completed in June 2026 by the city's Amt für Informatik. Together these files consume an estimated 2.1 petabytes of redundant storage. The finding is prompting an urgent rethink of how municipal bodies, from the Stadtarchiv on Alfred-Escher-Strasse to the media servers at Zürich Tourismus in Oberstrass, manage visual data.

The timing matters. Zurich's public institutions have been on an aggressive digitisation drive since 2022, scanning everything from 19th-century cadastral maps to contemporary planning documents ahead of the 2027 Stadtentwicklung master review. That push generated enormous volumes of image data fast — and without consistent deduplication protocols in place. The result is a sprawl of mirrored files that IT administrators are now being asked to clean up before the city signs new cloud-storage contracts later this year.

What the Audit Actually Found

The Amt für Informatik examined 14 separate institutional repositories in the audit, covering everything from the Bauarchiv holdings to photographic collections at the Stadtspital Triemli. Of roughly 18.7 million image files audited, 6.4 million were flagged as exact duplicates and a further 2.1 million as near-duplicates — files differing only in compression level, metadata timestamp or file format. Storage costs for redundant files alone are running at approximately CHF 1.4 million per year based on current contracts with Swiss data-centre operator Green, which operates facilities in the canton.
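The shares implied by these counts can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the audit; the variable names are illustrative.

```python
# Counts as reported in the audit; nothing here is measured independently.
total_files = 18_700_000
exact_duplicates = 6_400_000
near_duplicates = 2_100_000

# Share of the audited files that are byte-for-byte copies.
exact_share = exact_duplicates / total_files

# Share once near-duplicates (different compression, timestamp or format) are included.
combined_share = (exact_duplicates + near_duplicates) / total_files

print(f"exact duplicates: {exact_share:.1%}")
print(f"exact + near:     {combined_share:.1%}")
```

On these numbers, exact duplicates alone account for just over a third of the audited files, and the combined figure is closer to 45 percent.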

ETH Zurich's Data Science Lab, based on Rämistrasse, has been developing detection algorithms capable of identifying near-duplicate images even when they have been resized or colour-corrected — a problem that simple hash-matching tools miss entirely. The lab published a working paper in March 2026 estimating that institutions relying solely on hash-based deduplication catch only about 61 percent of true duplicates in large photographic archives. The remaining 39 percent require perceptual hashing or convolutional neural-network classifiers to surface.
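The gap between the two approaches can be illustrated with a small sketch. It is not the ETH lab's algorithm, which the working paper summary does not describe in detail; it uses a generic difference hash (dHash) as a stand-in for perceptual hashing. Images are assumed to be plain 2D lists of grayscale values from 0 to 255, and all function names are hypothetical.

```python
import hashlib

def sha256_of_pixels(pixels):
    """Exact-match fingerprint: any change to any pixel changes the digest."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()

def shrink(pixels, width, height):
    """Downsample by averaging equal-sized blocks (source must be at least width x height)."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for ty in range(height):
        y0, y1 = ty * src_h // height, (ty + 1) * src_h // height
        row = []
        for tx in range(width):
            x0, x1 = tx * src_w // width, (tx + 1) * src_w // width
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def dhash(pixels, hash_size=8):
    """Difference hash: one bit per pair of horizontally adjacent cells."""
    small = shrink(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Synthetic 36 x 32 test image (values stay below 200 so brightening cannot overflow).
original = [[(x * x * 7 + y * 13) % 200 for x in range(36)] for y in range(32)]
brightened = [[v + 20 for v in row] for row in original]                     # uniform colour shift
upscaled = [[v for v in row for _ in range(2)] for row in original for _ in range(2)]  # 2x resize
inverted = [[255 - v for v in row] for row in original]                      # a genuinely different image

for name, img in [("brightened", brightened), ("upscaled", upscaled), ("inverted", inverted)]:
    same_sha = sha256_of_pixels(img) == sha256_of_pixels(original)
    print(name, "sha256 match:", same_sha, "dhash distance:", hamming(dhash(img), dhash(original)))
```

The brightened and upscaled copies no longer match on SHA-256 but keep the same difference hash, while the inverted image does not. Real-world edits such as lossy recompression or interpolated resizing usually shift a few bits, so production tools compare hashes against a small distance threshold rather than requiring equality.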

For the UBS Heritage Collection, the duplicate problem is particularly acute: the bank absorbed enormous volumes of digitised documentation following the Credit Suisse merger in 2023. Internal figures reported to the Swiss Financial Market Supervisory Authority FINMA as part of data-governance disclosures show that the merged entity was managing overlapping image repositories from at least seven legacy systems as recently as January 2026. Rationalising those holdings is now part of a broader CHF 480 million post-merger IT consolidation programme running through 2028.

The Cost of Doing Nothing

Beyond pure storage expense, duplicate images create downstream problems that compound quickly. Search latency in the Stadtarchiv's public-facing catalogue increased by 22 percent between 2023 and 2025 as the index ballooned with redundant entries, according to internal benchmarking data. Researchers using the Staatsarchiv des Kantons Zürich in Elgersburg reported similar frustrations: catalogue queries that once returned in under three seconds now routinely take eight to twelve seconds during peak hours.

The practical fix involves three stages. First, institutions must run a full perceptual-hash scan to establish a baseline — a process the Amt für Informatik estimates will take six to eight weeks per large repository. Second, a retention policy must define which version of a duplicate is authoritative; for the Stadtarchiv, that means preferring highest-resolution originals, typically TIFF files at 400 DPI or above. Third, automated deduplication must be embedded into ingest workflows so new files are checked on arrival rather than retrospectively.
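The three stages can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the city's planned implementation: the `ImageRecord` fields, the distance threshold of 4 bits, and the tie-break in favour of TIFF are all hypothetical, and the perceptual hash is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageRecord:
    path: str
    phash: int        # 64-bit perceptual hash, assumed to be computed upstream
    file_format: str  # e.g. "TIFF", "JPEG"
    dpi: int

MAX_DISTANCE = 4  # assumed threshold: hashes this close count as the same image

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def find_match(record, index):
    """Return the first indexed record within MAX_DISTANCE, or None."""
    for known in index:
        if hamming(record.phash, known.phash) <= MAX_DISTANCE:
            return known
    return None

def authoritative(a, b):
    """Stage 2, retention policy: prefer higher resolution, then TIFF as a tie-break."""
    return max(a, b, key=lambda r: (r.dpi, r.file_format == "TIFF"))

def ingest(record, index):
    """Stage 3: check a file on arrival and return the version that is kept."""
    match = find_match(record, index)
    if match is None:
        index.append(record)
        return record
    keeper = authoritative(match, record)
    if keeper is not match:
        index[index.index(match)] = keeper
    return keeper

def baseline_scan(records):
    """Stage 1: run every existing file through the same check to build the index."""
    index = []
    for record in records:
        ingest(record, index)
    return index

# Hypothetical sample data.
records = [
    ImageRecord("map_0001.jpg", 0b1011_0000, "JPEG", 150),
    ImageRecord("map_0001.tif", 0b1011_0001, "TIFF", 400),
    ImageRecord("plan_0042.tif", 0xFFFF_0000_FFFF_0000, "TIFF", 600),
]
index = baseline_scan(records)
print([r.path for r in index])
```

Here the 400 DPI TIFF displaces the lower-resolution JPEG of the same map, and the unrelated plan is kept as a separate entry. The linear search in `find_match` is only workable for small collections; at the scale of the audited repositories an indexed nearest-neighbour structure would be needed.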

The city's digital-governance unit plans to publish binding guidelines for all municipal image repositories by September 2026 and is in talks with ETH Zurich's Data Science Lab about licensing the perceptual-hashing tools developed on Rämistrasse. Institutions that fail to meet deduplication benchmarks by the end of Q1 2027 face having their storage allocations frozen — a pressure that, for archive managers working to expand public access ahead of the 2027 Stadtentwicklung review, will be hard to ignore.

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
