More than 34 percent of all images stored across Zurich's public-sector digital repositories are exact duplicates, and near-identical copies lift the redundant share to roughly 45 percent. Together these files consume an estimated 2.1 petabytes of redundant storage, according to a technical audit completed in June 2026 by the city's Amt für Informatik. The finding is prompting an urgent rethink of how municipal bodies, from the Stadtarchiv on Alfred-Escher-Strasse to the media servers at Zürich Tourismus in Oberstrass, manage visual data.
The timing matters. Zurich's public institutions have been on an aggressive digitisation drive since 2022, scanning everything from 19th-century cadastral maps to contemporary planning documents ahead of the 2027 Stadtentwicklung master review. That push generated enormous volumes of image data fast — and without consistent deduplication protocols in place. The result is a sprawl of mirrored files that IT administrators are now being asked to clean up before the city signs new cloud-storage contracts later this year.
What the Audit Actually Found
The Amt für Informatik examined 14 separate institutional repositories, covering everything from the Bauarchiv holdings to photographic collections at the Stadtspital Triemli. Of roughly 18.7 million image files audited, 6.4 million were flagged as exact duplicates and a further 2.1 million as near-duplicates, meaning files that differ only in compression level, metadata timestamp or file format. Storage costs for redundant files alone are running at approximately CHF 1.4 million per year under current contracts with Swiss data-centre operator Green, which operates facilities in the canton.
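For readers who want to check the proportions, the short calculation below derives the duplicate shares from the file counts quoted in the audit. It is only an arithmetic sketch: the variable names are illustrative, and the inputs are the rounded figures reported above, not the audit's raw data.

```python
# Rounded file counts as quoted from the audit summary.
total_files = 18_700_000
exact_duplicates = 6_400_000
near_duplicates = 2_100_000

# Share of files that are byte-for-byte copies of another file.
exact_share = exact_duplicates / total_files

# Share of files that are redundant once near-duplicates are included.
combined_share = (exact_duplicates + near_duplicates) / total_files

print(f"exact duplicates:        {exact_share:.1%}")
print(f"exact + near duplicates: {combined_share:.1%}")
```

On these rounded inputs, exact duplicates account for about 34 percent of the audited files, and the combined figure comes to about 45 percent.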
ETH Zurich's Data Science Lab, based on Rämistrasse, has been developing detection algorithms capable of identifying near-duplicate images even when they have been resized or colour-corrected — a problem that simple hash-matching tools miss entirely. The lab published a working paper in March 2026 estimating that institutions relying solely on hash-based deduplication catch only about 61 percent of true duplicates in large photographic archives. The remaining 39 percent require perceptual hashing or convolutional neural-network classifiers to surface.
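The gap between the two approaches can be illustrated with a toy example. The sketch below is not the ETH lab's algorithm. It is a simplified "average hash" written in plain Python, applied to a hypothetical 4x4 grayscale grid. Real perceptual-hashing tools first decode, downscale and grey-scale the image, and all function names here are assumptions made for illustration.

```python
import hashlib


def flatten(pixels):
    """Turn a 2D grid of grayscale values (0-255) into a flat list."""
    return [p for row in pixels for p in row]


def sha256_digest(pixels):
    """Exact-match fingerprint: changes completely if any byte changes."""
    return hashlib.sha256(bytes(flatten(pixels))).hexdigest()


def average_hash(pixels):
    """Simplified perceptual hash: one bit per pixel, set when the pixel
    is brighter than the mean brightness of the whole grid."""
    flat = flatten(pixels)
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical image: dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same picture after a uniform brightness correction.
brightened = [[min(255, p + 20) for p in row] for row in original]
# A different picture: bright on top, dark below.
unrelated = [[220] * 4, [220] * 4, [30] * 4, [30] * 4]

exact_match = sha256_digest(original) == sha256_digest(brightened)
near_distance = hamming_distance(average_hash(original), average_hash(brightened))
far_distance = hamming_distance(average_hash(original), average_hash(unrelated))

print("SHA-256 digests match:", exact_match)
print("perceptual distance to brightened copy:", near_distance)
print("perceptual distance to unrelated image:", far_distance)
```

The cryptographic digests of the original and the brightened copy do not match, so a hash-only tool would treat them as unrelated files. The perceptual hashes are identical because the bright-versus-dark layout is unchanged, while the genuinely different image differs in half of its bits. A production system would flag pairs whose distance falls below a tuned threshold.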
The duplicate problem is particularly acute for the UBS Heritage Collection, because the bank absorbed enormous volumes of digitised documentation following the Credit Suisse merger in 2023. Internal figures reported to the Swiss Financial Market Supervisory Authority (FINMA) as part of data-governance disclosures show that the merged entity was managing overlapping image repositories from at least seven legacy systems as recently as January 2026. Rationalising those holdings is now part of a broader CHF 480 million post-merger IT consolidation programme running through 2028.
The Cost of Doing Nothing
Beyond pure storage expense, duplicate images create downstream problems that compound quickly. Search latency in the public-facing catalogue of the Stadtarchiv on Alfred-Escher-Strasse increased by 22 percent between 2023 and 2025 as the index ballooned with redundant entries, according to internal benchmarking data. Researchers using the Staatsarchiv des Kantons Zürich in Elgersburg reported similar frustrations: catalogue queries that once returned in under three seconds now routinely take eight to twelve seconds during peak hours.
The practical fix involves three stages. First, institutions must run a full perceptual-hash scan to establish a baseline — a process the Amt für Informatik estimates will take six to eight weeks per large repository. Second, a retention policy must define which version of a duplicate is authoritative; for the Stadtarchiv, that means preferring highest-resolution originals, typically TIFF files at 400 DPI or above. Third, automated deduplication must be embedded into ingest workflows so new files are checked on arrival rather than retrospectively.
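The second and third stages can be sketched in a few lines of Python. What follows is an illustration under stated assumptions, not the city's actual tooling. The `ImageRecord` fields, the ranking rule in `choose_authoritative` and the in-memory `IngestIndex` are all hypothetical. The ingest check shown here catches only exact duplicates, and a real workflow would add a perceptual-hash lookup alongside it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ImageRecord:
    path: str
    file_format: str  # e.g. "TIFF" or "JPEG"
    dpi: int
    width: int
    height: int


def choose_authoritative(copies: List[ImageRecord]) -> ImageRecord:
    """Stage two: pick the copy to keep from a group of duplicates.

    Assumed rule: a TIFF at 400 DPI or above wins first; ties are broken
    by pixel count, then by DPI.
    """
    def rank(rec: ImageRecord):
        is_master = rec.file_format.upper() == "TIFF" and rec.dpi >= 400
        return (is_master, rec.width * rec.height, rec.dpi)

    return max(copies, key=rank)


class IngestIndex:
    """Stage three: check each new file against digests already seen."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # SHA-256 digest -> first path stored

    def check_and_register(self, path: str, data: bytes) -> Optional[str]:
        """Return the path of an identical stored file, or None after
        registering the new file as the first of its kind."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is None:
            self._seen[digest] = path
        return existing


# Usage with made-up records and file contents.
copies = [
    ImageRecord("maps/plan_web.jpg", "JPEG", 72, 1200, 900),
    ImageRecord("maps/plan_master.tif", "TIFF", 400, 8000, 6000),
    ImageRecord("maps/plan_copy.tif", "TIFF", 300, 8000, 6000),
]
keeper = choose_authoritative(copies)
print("authoritative copy:", keeper.path)

index = IngestIndex()
first = index.check_and_register("ingest/a.tif", b"example image bytes")
second = index.check_and_register("ingest/b.tif", b"example image bytes")
print("first upload duplicate of:", first)
print("second upload duplicate of:", second)
```

Ranking on a tuple keeps the policy explicit and easy to audit: changing which version counts as authoritative means editing one function rather than the ingest pipeline. In production the digest index would live in a database rather than in memory.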
The city's digital-governance unit plans to publish binding guidelines for all municipal image repositories by September 2026 and is in talks with ETH Zurich's Data Science Lab about licensing the perceptual-hashing tools developed on Rämistrasse. Institutions that fail to meet deduplication benchmarks by the end of Q1 2027 face having their storage allocations frozen — a pressure that, for archive managers working to expand public access ahead of the 2027 Stadtentwicklung review, will be hard to ignore.