Zurich's public digital archives contain an estimated 40 percent redundant image files — duplicate scans, re-uploaded photographs, and mirror copies spread across incompatible systems — according to internal assessments circulated among city cultural institutions this spring. The figure represents not just a storage headache but a growing financial drain at a moment when municipal IT budgets are already under pressure.
The problem has come into sharp focus because several of Zurich's largest cultural bodies are currently mid-migration. The Stadtarchiv Zürich on Alfred-Escher-Strasse and the Zentralbibliothek at Zähringerplatz are both consolidating legacy digitisation projects started between 2015 and 2019 into unified cloud-based platforms. When cataloguers began the consolidation work in earnest in early 2026, they encountered a duplication rate far higher than anticipated.
What the Numbers Actually Show
The scale matters. A single high-resolution archival scan of a nineteenth-century city map can run to 800 megabytes. Multiply that by tens of thousands of duplicate files and the storage costs compound fast. Cloud storage for institutional archives in Switzerland currently runs at roughly CHF 0.022 per gigabyte per month through standard Swiss-hosted providers — a figure that sounds negligible until a collection holds two or three copies of every item across three separate legacy systems.
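The arithmetic behind that claim can be sketched in a few lines of Python. The storage rate and the 800-megabyte scan size come from the figures above; the duplicate counts are purely hypothetical inputs chosen for illustration, not numbers reported by any institution, and the function name is likewise an assumption.

```python
# Figures quoted in the article
CHF_PER_GB_MONTH = 0.022  # Swiss-hosted cloud storage rate
SCAN_SIZE_GB = 0.8        # one 800 MB scan, taking 1 GB = 1000 MB


def annual_storage_cost_chf(duplicate_files: int,
                            size_gb: float = SCAN_SIZE_GB,
                            rate: float = CHF_PER_GB_MONTH) -> float:
    """Yearly cost in CHF of keeping the given number of redundant scans."""
    return duplicate_files * size_gb * rate * 12


# Hypothetical duplicate counts, for illustration only
for n in (10_000, 50_000, 100_000):
    print(f"{n:>7,} duplicates: CHF {annual_storage_cost_chf(n):,.0f} per year")
```

Under these assumptions, 10,000 redundant scans of that size cost about CHF 2,100 a year and 100,000 cost about CHF 21,000. Smaller files would lower the totals proportionally.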
The Zentralbibliothek alone holds more than 3.5 million digitised items across its various collections, a number drawn from its own published collection statistics. If even a fraction of those items carry duplicates at the resolution archivists use, the redundant storage cost adds up to tens of thousands of francs annually, money that could fund additional cataloguing positions or public access terminals.
ETH Zurich's Data Archive group, based on the Hönggerberg campus, has been researching automated deduplication methods as part of a broader Swiss National Science Foundation-supported project examining digital preservation standards. Researchers there have noted that the problem is not unique to Zurich — comparable institutions in Basel and Bern face similar structural issues — but Zurich's density of overlapping cultural mandates, with municipal, cantonal and federal collections sometimes covering identical source material, makes the duplication rate particularly acute here.
Why Fixing It Is Harder Than It Sounds
Deduplication is not simply a matter of deleting identical files. Archivists must first confirm that two apparently identical image files are genuinely the same scan and not two separate photographs of the same object taken at different times or under different lighting conditions. That distinction matters legally and intellectually. Swiss federal archiving law, updated under the Archivierungsgesetz provisions that came into force in 2023, requires institutions to demonstrate provenance chains for every retained and every deleted item.
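One common first step in that confirmation, shown here only as a minimal sketch and not as a description of any Zurich institution's workflow, is to compare cryptographic checksums. Two files with the same SHA-256 digest are byte-for-byte the same scan, while a second photograph of the same object will produce a different digest. The file names and contents below are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents are byte-identical (same SHA-256)."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by more than one file
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical in-memory stand-ins for image files
files = {
    "map_1893_scan.tif": b"\x00\x01\x02\x03",
    "map_1893_copy.tif": b"\x00\x01\x02\x03",    # re-upload of the same scan
    "map_1893_rephoto.tif": b"\x00\x01\x02\x04",  # separate capture, differs
}
print(group_exact_duplicates(files))
```

A checksum match settles the identical-file case only. Files that look alike but differ at the byte level still need the kind of human judgement described above.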
The Stadtarchiv has been piloting a perceptual hashing tool since January 2026 to flag probable duplicates for human review. The software generates a compact numerical fingerprint from image content rather than file metadata. Early results from the pilot, covering roughly 120,000 image files, flagged approximately 28,000 candidate duplicates. Staff then manually confirmed around 19,500 of those, or about 70 percent of the flagged files, as genuine redundancies. That is a 16 percent confirmed duplication rate within the pilot sample alone (19,500 of 120,000 files).
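For readers curious how a content-based fingerprint works, the following is a minimal sketch of one well-known variant, the average hash. Which algorithm the Stadtarchiv's tool actually uses is not stated, so this is an illustration of the general idea only. It assumes a grayscale image supplied as rows of pixel values at least 8 pixels wide and high; the test images are synthetic.

```python
from typing import List

Gray = List[List[float]]  # grayscale image as rows of pixel intensities


def shrink(img: Gray, size: int = 8) -> Gray:
    """Downscale to size x size by averaging blocks of pixels."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        row = []
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """One bit per shrunken pixel: 1 if brighter than the mean, else 0."""
    small = [p for row in shrink(img, size) for p in row]
    mean = sum(small) / len(small)
    bits = 0
    for p in small:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 test images
scan = [[x * 8 for x in range(32)] for _ in range(32)]       # left-to-right gradient
brighter = [[p + 10 for p in row] for row in scan]           # same picture, lighter
other = [[y * 8 for _ in range(32)] for y in range(32)]      # top-to-bottom gradient

print(hamming(average_hash(scan), average_hash(brighter)))   # 0: flagged as probable duplicate
print(hamming(average_hash(scan), average_hash(other)))      # 32: clearly different content
```

The brighter copy differs from the original in every byte, so a checksum comparison would miss it, yet its fingerprint is identical. A small Hamming distance only marks a candidate, which is why the flagged files still go to staff for review.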
The financial logic for action is straightforward. Reducing confirmed duplicates across the Stadtarchiv's full digital holdings could cut storage costs by an estimated CHF 35,000 to CHF 60,000 over a five-year horizon, based on the pilot's extrapolated figures and current Swiss cloud-hosting rates. More importantly, leaner catalogues improve search performance for researchers using the archive's public portal — a practical benefit for the historians, journalists and genealogists who log thousands of searches against the system each month.
The Zentralbibliothek is expected to publish its own deduplication pilot results by the end of the third quarter of 2026. City archivists and library officials will then face a decision on whether to procure a shared deduplication infrastructure across both institutions — a procurement process that, under Swiss public tendering rules, would need to be advertised through the federal Simap platform if the contract value exceeds CHF 230,000. That threshold, and the timeline for any joint tender, will likely define how quickly Zurich gets its digital house in order.