Zurich's public institutions collectively hold tens of millions of digitised images across archives, university repositories and municipal databases — and a significant share of that storage is occupied by the same photograph saved multiple times under different file names. The scale of the duplication problem, long treated as a minor administrative nuisance, is now attracting serious attention from data managers and budget controllers across the city.
Digital asset management specialists working with Swiss public institutions estimate that duplicate or near-duplicate image files can account for between 20 and 35 percent of total image storage in large, multi-department archives. For an institution maintaining several petabytes of data, that proportion translates directly into wasted infrastructure spend and slower retrieval times for researchers.
Why the Problem Has Grown Worse Since 2020
The acceleration of remote work from 2020 onward pushed many Zurich institutions to digitise faster than their data governance frameworks could handle. Staff uploading images from home devices, often through improvised file-sharing workflows, created parallel naming conventions and redundant folder structures that proved difficult to reconcile later. The consolidation pressures that followed the UBS takeover of Credit Suisse in 2023 added another layer: two large financial institutions, each with its own marketing and communications image libraries running to hundreds of thousands of files, had to merge those assets under unified governance for the first time.
At ETH Zurich, which consistently ranks among the world's top ten universities for research output, the challenge is particularly acute. The institution's communications and research departments maintain separate image databases that were not originally designed to talk to each other. ETH's IT Services division has been piloting automated deduplication tools since early 2025, working through server infrastructure on the university's Hönggerberg campus. The pilot covers image assets linked to research publications dating back to 2010.
The city's own Stadtarchiv, housed near the Neumarkt in the Altstadt district, has been grappling with a related issue as it migrates historical photographic collections from legacy storage formats. Archivists there are working through approximately 1.2 million scanned images from the pre-digital municipal photography collection, a process that has exposed extensive overlap between departmental donations made at different points over two decades.
The Cost in Francs and Hours
Cloud storage pricing in Switzerland runs significantly higher than in neighbouring Germany or France, partly because of domestic data-residency requirements under Swiss law. Enterprise-tier storage on Swiss-hosted infrastructure can cost between CHF 0.04 and CHF 0.08 per gigabyte per month depending on contract terms, according to publicly available pricing from providers operating under Swiss data sovereignty frameworks. For an archive sitting on even a modest 500 terabytes of image data, eliminating a 25 percent duplication rate would free roughly 125 terabytes, a saving that recurs with every billing cycle for as long as the contract runs.
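The arithmetic behind that example can be sketched in a few lines. This is a rough illustration, not a pricing model: it uses only the figures quoted above, assumes 1 terabyte equals 1,000 gigabytes, and the function name `monthly_saving_chf` is made up for the example.

```python
# Assumption: decimal units (1 TB = 1,000 GB); real contracts may bill differently.
GB_PER_TB = 1_000


def monthly_saving_chf(total_tb, duplicate_share, price_per_gb_month):
    """CHF saved per month if the duplicated share of storage is removed."""
    freed_gb = total_tb * duplicate_share * GB_PER_TB
    return freed_gb * price_per_gb_month


# Figures from the article: 500 TB archive, 25 percent duplication,
# CHF 0.04 to CHF 0.08 per gigabyte per month.
low = monthly_saving_chf(500, 0.25, 0.04)
high = monthly_saving_chf(500, 0.25, 0.08)

print(f"Freed storage:  {500 * 0.25:.0f} TB")
print(f"Monthly saving: CHF {low:,.0f} to CHF {high:,.0f}")
print(f"Annual saving:  CHF {low * 12:,.0f} to CHF {high * 12:,.0f}")
```

On those inputs the saving works out to roughly CHF 5,000 to CHF 10,000 a month, before any one-off cost of running the deduplication itself.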
The human cost matters too. Researchers at institutions like the Zentralbibliothek Zürich on Zähringerplatz report losing time to cross-referencing when catalogue searches return multiple visually identical files with conflicting metadata. A 2024 internal review at the Bibliothèque nationale de France, a comparable European library, found that reference staff spent an average of 11 minutes per complex image query resolving duplicate results. Swiss librarians cite that figure as a reasonable benchmark when making the case internally for deduplication investment.
Automated perceptual hashing tools, which generate a short fingerprint from an image's visual content rather than its file name or metadata, have become the standard technical solution. Two images count as likely duplicates when their fingerprints differ in only a few bits, so these tools can flag near-duplicates even when one copy has been resized, lightly cropped, recoloured or saved in a different file format. Several Zurich institutions are evaluating open-source implementations alongside commercial platforms, with procurement decisions expected before the end of the 2026 budget year.
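To make the idea concrete, here is a minimal sketch of an average hash, one of the simplest perceptual hashes. The article does not say which algorithm the institutions are evaluating, so this is an illustration only. To stay dependency-free it assumes the image has already been decoded into a grid of grayscale brightness values. Production tools would decode and resize real files first. The function names and the synthetic test images are invented for the example.

```python
def average_hash(pixels, size=8):
    """Return a (size * size)-bit fingerprint of a grayscale image.

    `pixels` is a list of rows, each a list of brightness values (0-255).
    The image must be at least `size` pixels wide and high.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to size x size by averaging rectangular blocks.
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    # One bit per cell: is the cell brighter than the overall mean?
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Synthetic 16 x 16 test images (hypothetical data, not real scans).
original = [[x * 8 + y for x in range(16)] for y in range(16)]
brighter = [[value + 20 for value in row] for row in original]  # same picture, brightened
unrelated = [[y * 8 + x for x in range(16)] for y in range(16)]  # different picture

near = hamming_distance(average_hash(original), average_hash(brighter))
far = hamming_distance(average_hash(original), average_hash(unrelated))
print(f"original vs brighter copy: {near} bits differ")
print(f"original vs unrelated:     {far} bits differ")
```

A small distance marks a pair as probable duplicates and a large one as distinct images. The cut-off between the two is a tuning decision that each institution would have to settle against its own holdings.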
For institutions yet to start, data managers recommend a staged approach: audit total image volume first, run a sample deduplication pass on a single department's holdings, and use that sample to model storage savings before committing to a full rollout. The arithmetic, most find, makes the case on its own.
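A sample pass of that kind could look something like the sketch below. It is a simplified illustration under stated assumptions: it catches only byte-identical copies by comparing SHA-256 digests, so near-duplicates would need a perceptual hash on top, and it projects the sample rate onto the whole archive as if the sample were representative. The file names, the in-memory sample and the `duplicate_share` function are hypothetical.

```python
import hashlib


def duplicate_share(files):
    """Fraction of sample bytes taken up by byte-identical extra copies.

    `files` is an iterable of (name, content_bytes) pairs.
    """
    seen = set()
    total = duplicated = 0
    for _name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        total += len(content)
        if digest in seen:
            duplicated += len(content)  # every copy after the first is waste
        else:
            seen.add(digest)
    return duplicated / total if total else 0.0


# Hypothetical sample from one department; contents stand in for image bytes.
sample = [
    ("rathaus_1998.tif", b"A" * 400),
    ("IMG_0042.tif", b"A" * 400),  # same bytes under a different file name
    ("limmatquai.tif", b"B" * 400),
    ("bahnhofstrasse.tif", b"C" * 400),
]

share = duplicate_share(sample)
archive_tb = 500  # archive size used in the article's earlier example

print(f"Sample duplication rate: {share:.0%}")
print(f"Projected space freed:   {archive_tb * share:.0f} TB")
```

The projection is only as good as the sample. A department with unusually tidy or unusually messy holdings would skew the estimate, which is one reason to audit total volume first.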