Zurich's public institutions are sitting on tens of thousands of duplicate digital images — redundant files clogging servers, inflating storage costs, and undermining the reliability of civic and scientific archives. The issue, long treated as a housekeeping afterthought, is now forcing its way onto the agenda of institutions from the Stadtarchiv Zürich on Neumarkt to the image repositories maintained by ETH Zürich in Hönggerberg.
The timing is not accidental. After years of rapid digitisation drives — accelerated by the pandemic-era push to move collections online between 2020 and 2023 — administrators across Swiss cultural and research institutions are confronting the downstream consequences of bulk scanning and automated uploads. Files get ingested twice. Cataloguing errors duplicate entire batches. Metadata conflicts mean the same photograph appears under different accession numbers, sometimes in different collections.
What the Experts Are Saying
Technology specialists at ETH Zürich's Scientific IT Services division have been examining the problem as part of broader research data management initiatives. The institution, which consistently ranks among the world's top ten universities in engineering and technology, manages image repositories spanning decades of research photography, satellite data, and architectural documentation. Practitioners in the field describe the core challenge as a tripartite problem: detection, deletion, and governance — knowing which files are duplicates, deciding which version to keep, and establishing rules to prevent the problem recurring.
At the Zentralbibliothek Zürich on Zähringerplatz, which holds digitised collections covering everything from medieval manuscripts to 20th-century press photography, staff have been piloting hash-based deduplication tools since late 2024. The method computes a hash of each image file's contents, a digital fingerprint that is effectively unique in practice; identical fingerprints flag byte-for-byte duplicates for review. Near-duplicate detection, which catches images that differ only by compression artefacts or a minor crop, requires more computationally intensive perceptual hashing algorithms. The distinction matters enormously when the archive in question holds original photographic negatives of historical significance.
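The two detection approaches described above can be sketched in a few lines of Python. This is a minimal illustration, not the Zentralbibliothek's tooling: the choice of SHA-256, the function names, and the toy "average hash" are all assumptions made for the example, and the perceptual hash works on small grayscale pixel grids that an imaging library is assumed to have already decoded and downscaled.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` by digest; return groups with 2+ members."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if brighter than the mean.

    `pixels` is a small grayscale grid (list of rows of ints). Decoding and
    downscaling the image is assumed to happen elsewhere.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count differing bits; a small distance suggests a near-duplicate."""
    return sum(a != b for a, b in zip(hash_a, hash_b))
```

Exact matching only groups files whose bytes are identical, so a recompressed copy slips through. The perceptual hash tolerates small pixel changes, and a reviewer would typically set a distance threshold below which two images are queued for manual comparison rather than deleted automatically.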
City administrators at Stadtliegenschaften Zürich, the municipal property and facilities office, face a more prosaic version of the same headache. Internal communications examined as part of a broader digital infrastructure review indicate that shared drives used by multiple departments for building documentation photography had accumulated significant redundancy by 2025, with some project folders containing three or more versions of the same site photograph at different compression levels. The cost of cloud storage for municipal operations is not trivial: enterprise cloud storage in Switzerland typically runs between CHF 0.02 and CHF 0.04 per gigabyte per month at scale, and large institutions can hold petabytes of image data.
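To put those rates in perspective, the arithmetic can be worked through in a short sketch. Only the CHF 0.02 to 0.04 per gigabyte range comes from the figures above; the decimal petabyte (one million gigabytes), the function names, and the 10 percent duplication rate in the usage note are illustrative assumptions, not reported numbers.

```python
GB_PER_PB = 1_000_000  # assumption: decimal petabyte, as storage is commonly billed


def monthly_cost_chf(size_gb, rate_chf_per_gb):
    """Monthly storage cost in CHF for a given volume and per-GB rate."""
    return size_gb * rate_chf_per_gb


def duplicate_cost_range(size_gb, duplicate_fraction,
                         low_rate=0.02, high_rate=0.04):
    """Monthly cost (low, high) in CHF of storing the duplicated share.

    `duplicate_fraction` is a hypothetical input; no institution's actual
    duplication rate is given in the article.
    """
    wasted_gb = size_gb * duplicate_fraction
    return (monthly_cost_chf(wasted_gb, low_rate),
            monthly_cost_chf(wasted_gb, high_rate))
```

At these rates, one petabyte costs roughly CHF 20,000 to 40,000 per month to store. If a tenth of that volume were redundant, a purely hypothetical figure, calling `duplicate_cost_range(GB_PER_PB, 0.10)` would put the avoidable share at roughly CHF 2,000 to 4,000 per month.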
Governance, Not Just Technology
Specialists in digital preservation are consistent on one point: the technical fix is the easier half of the problem. The harder work is institutional. Without clear upload protocols, defined single points of ingest, and staff training, automated deduplication tools clean the archive once — and the duplicates grow back within months.
The Swiss Federal Archives in Bern published updated digital preservation guidelines in March 2026, recommending that all federally funded institutions implement deduplication audits on a rolling 24-month cycle. Cantonal institutions in Zurich are not legally bound by that recommendation but are expected to align their practices with it, particularly those receiving federal digitisation grants under the Memoriav umbrella programme, which supports audiovisual and photographic heritage preservation across Switzerland.
For Zurich's major institutions, the practical next steps look similar regardless of scale: an initial audit to establish the current duplication rate, selection of an appropriate detection methodology, and a governance review to tighten ingest procedures. The Zentralbibliothek's pilot is expected to produce a public findings report by the end of the third quarter of 2026, which archivists at smaller cantonal institutions — including the Staatsarchiv des Kantons Zürich on Winterthurerstrasse — are watching closely before committing to their own tool procurement. The broader message from every specialist involved is straightforward: clean archives cost less to run, search faster, and are more trustworthy. That argument, at least, is not a hard sell in a city that takes its administrative reputation seriously.