Zurich's public digital archives contain an estimated 30 to 40 percent redundant image data, according to internal assessments circulating among city IT departments this spring. That figure — quietly alarming to archivists and budget officers alike — has pushed the question of duplicate image management from a back-office nuisance to a genuine fiscal concern ahead of the city's autumn budget deliberations.
The timing matters because Zurich's digital infrastructure spending is under scrutiny. The city council approved a CHF 180 million IT modernisation package in late 2024, and department heads are now being pressed to demonstrate where savings can be found inside that envelope. Duplicate image data — photographs, scanned documents, and graphic assets stored in multiple locations simultaneously — represents one of the most tractable targets.
Where the Redundancy Lives
The problem clusters in a handful of institutions. Stadtarchiv Zürich, housed on Neumarkt in the Altstadt, manages millions of historical image records. Archivists there piloted deduplication software from February through May 2026, working through a backlog of scanned photographs and municipal planning documents that accumulated across two separate server migrations between 2018 and 2022. The migrations, intended to modernise storage, inadvertently produced parallel copies of entire folder structures that were never systematically reconciled.
ETH Zürich, ranked among the world's top ten universities, with its main building on Rämistrasse and a second campus on Hönggerberg, faces a comparable issue at far greater scale. The university's research data repositories store petabytes of scientific imaging data, including microscopy, satellite imagery, and materials analysis, and a 2025 internal audit flagged that roughly 22 percent of image assets in the humanities and social sciences data centre were exact or near-exact duplicates. At CHF 0.023 per gigabyte per month for institutional cold storage, the cost of holding those duplicates compounds quickly, even across terabyte-range datasets.
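To make the storage arithmetic concrete, the sketch below applies the quoted rate of CHF 0.023 per gigabyte per month and the audit's 22 percent duplicate share to a dataset of a given size. The 500-terabyte dataset is purely hypothetical, as are the function name and the assumption of decimal units (1 TB = 1,000 GB); none of these come from the audit itself.

```python
# Rough cost of carrying duplicate image data in cold storage.
CHF_PER_GB_MONTH = 0.023   # cold-storage rate quoted in the article
DUPLICATE_SHARE = 0.22     # duplicate share flagged by the 2025 audit
GB_PER_TB = 1000           # assumption: decimal units (1 TB = 1,000 GB)


def duplicate_storage_cost(dataset_tb,
                           duplicate_share=DUPLICATE_SHARE,
                           rate=CHF_PER_GB_MONTH):
    """Return (monthly, yearly) cost in CHF of storing the duplicate share."""
    duplicate_gb = dataset_tb * GB_PER_TB * duplicate_share
    monthly = duplicate_gb * rate
    return round(monthly, 2), round(monthly * 12, 2)


if __name__ == "__main__":
    # Hypothetical 500 TB dataset, chosen only for illustration.
    monthly, yearly = duplicate_storage_cost(500)
    print(f"Duplicates cost CHF {monthly:,.2f} per month, "
          f"CHF {yearly:,.2f} per year")
```

On these assumptions, the duplicates in a 500-terabyte dataset would cost about CHF 2,530 a month to keep, which is modest for one dataset but recurs every month across every repository that holds redundant copies.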
City-linked health institutions are also in the frame. UniversitätsSpital Zürich on Rämistrasse maintains medical imaging archives in which the clinical workflow routinely produces duplicate DICOM files, DICOM being the standard format for radiology images. Radiologists often export copies to multiple internal systems for review, and without automated deduplication running at ingestion, those files persist indefinitely. Hospital IT administrators have described the problem as structural rather than exceptional, though they have not put a specific cost figure to it publicly.
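Deduplication at ingestion can be illustrated with a content hash: each incoming file is fingerprinted, and a file whose fingerprint has already been seen is flagged rather than stored again. The following is a minimal sketch using SHA-256, not a description of the hospital's actual systems; the class name `IngestionIndex` and its in-memory dictionary are assumptions for illustration. It catches only byte-identical copies, so a DICOM file re-exported with altered metadata would pass as new.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class IngestionIndex:
    """Hypothetical in-memory index of content hashes seen so far."""

    def __init__(self):
        self._seen = {}  # hex digest -> path of the first file with that content

    def ingest(self, path):
        """Register a file; return the earlier path if it is an exact duplicate.

        Returns None when the content has not been seen before.
        """
        key = sha256_of_file(path)
        original = self._seen.get(key)
        if original is None:
            self._seen[key] = Path(path)
            return None
        return original
```

A production system would keep the index in a database and would need a policy for what happens to a flagged file; this sketch only shows the detection step.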
What Deduplication Actually Costs — and Saves
The commercial deduplication software market offers tools ranging from CHF 8,000 to upwards of CHF 90,000 for enterprise licences. The price depends on the volume of data being processed and on whether the tool includes near-duplicate detection: the harder problem of finding images that look identical but are technically distinct files, perhaps saved at different resolutions or compression levels. Open-source alternatives exist but typically require significant staff time to configure and maintain.
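One common family of techniques for near-duplicate detection is perceptual hashing. The sketch below implements a simple average hash in pure Python: the image is reduced to an 8 by 8 grid of brightness values, each cell becomes one bit depending on whether it is brighter than the grid's mean, and two images are compared by counting differing bits. It is offered as an illustration only, not as the method used by any tool mentioned here; the function names and the threshold of 5 bits are assumptions, and the input is taken to be an already decoded grayscale pixel grid at least 8 pixels on each side.

```python
def average_hash(pixels, hash_size=8):
    """Compute a 64-bit average hash from a 2D list of grayscale values.

    The image is shrunk to hash_size x hash_size by averaging blocks of
    pixels, so copies saved at different resolutions map to similar grids.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(pixels_a, pixels_b, threshold=5):
    """Treat images as near-duplicates when few hash bits differ.

    The threshold is an illustrative assumption and needs tuning per collection.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= threshold
```

Unlike a cryptographic hash, which changes completely when a single byte changes, this fingerprint is designed to stay stable under resizing or recompression. That tolerance is also why flagged pairs usually go to a human reviewer rather than straight to deletion.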
Stadtarchiv Zürich's pilot, which ran from February through May 2026, processed approximately 1.2 million image files. Archivists identified and flagged around 340,000 duplicates for review, roughly 28 percent of the pilot dataset. Not all flagged files were deleted; some redundant copies serve as backup or provenance records. But the exercise freed an estimated 4.7 terabytes of primary storage and measurably reduced catalogue query times.
For context, Swiss Federal Statistical Office data from 2024 showed that public sector digital storage costs in Switzerland grew by 14 percent year-on-year, driven partly by legacy data accumulation rather than new data generation. Deduplication is increasingly cited in federal IT guidance as a first-line efficiency measure before new storage infrastructure is procured.
The practical upshot for Zurich's institutions is straightforward: any department facing budget pressure in the autumn 2026 cycle has an incentive to complete a deduplication audit before October, when preliminary spending figures go to committee. Institutions that can demonstrate storage savings will be better placed to defend their broader IT requests. For Stadtarchiv and ETH alike, the numbers are making the case that cleaning up the past is cheaper than endlessly expanding capacity to hold it.