The City of Zurich's central media archive, managed under the Stadtarchiv on Alfred-Escher-Strasse, holds tens of thousands of photographs documenting everything from the 1990s tram expansion along Langstrasse to the 2005 flooding of the Sihl river. Archivists there have spent much of the past eighteen months doing something unglamorous but necessary: hunting down and removing duplicate image files that have quietly multiplied across shared servers for more than a decade.
The problem did not appear overnight. It is the accumulated consequence of how public institutions, universities, and private firms across the Zurich metropolitan area built their digital infrastructure through the 1990s and 2000s — in silos, without shared standards, and with almost no coordination on file naming or metadata. When systems were eventually merged or migrated, duplicates came along for the ride.
How the Duplication Crisis Took Root
The pattern is familiar to anyone who has worked in digital asset management. ETH Zurich, ranked among the world's top ten research universities, began digitising its photographic and scientific image collections in earnest after 2000. Multiple departments ran parallel digitisation projects with overlapping source material. By the time the university's IT services attempted to build a consolidated repository around 2015, internal audits identified thousands of near-identical files occupying redundant storage: different resolutions, slightly different crops, and the same underlying image logged under separate accession numbers.
The same dynamic played out at Zentralbibliothek Zürich on Zähringerplatz, which holds one of the largest collections of historical Zurich photographs in the country. A migration project begun in 2018 to align its catalogue with ISAD(G), the International Council on Archives' General International Standard Archival Description, revealed that an estimated 12 to 15 percent of digitised image entries had at least one exact or near-exact duplicate already present in the system. That figure, cited in the library's 2021 annual report, translated to thousands of wasted gigabytes and, more practically, to search results that returned the same photograph four or five times with no clear indication of which version was canonical.
The problem extends well beyond heritage institutions. The Universitätsspital Zürich on Rämistrasse, one of the largest hospital complexes in Switzerland, manages enormous volumes of medical imaging data. While clinical images are governed by the DICOM standard, whose unique instance identifiers make duplicate copies easier to detect, the administrative and communications photography stored on general-purpose servers has been subject to far less rigour. IT procurement teams at several city departments confirmed in written responses to budget inquiries in 2024 that redundant image storage was a contributing factor in unexpectedly high cloud storage costs, though precise figures were not disclosed publicly.
What Changed, and What Comes Next
The Swiss federal government's push toward a unified e-government strategy, formalised in the E-Government Strategy Switzerland 2020–2023 and carried forward since 2022 by the Digitale Verwaltung Schweiz programme, created new pressure on cantonal and municipal bodies to audit and rationalise their digital holdings. For Zurich, that pressure arrived at the same moment as a broader reckoning with the city's housing and infrastructure budgets. The housing shortage (Wohnungsnot) has consumed political attention and fiscal space, leaving less tolerance for administrative inefficiency elsewhere.
Several institutions have turned to automated deduplication tools. In 2024 the Stadtarchiv piloted software capable of identifying perceptual duplicates, meaning images that look the same, or nearly so, but differ in file format, resolution, or compression. This goes beyond cryptographic hash-matching, which only catches byte-for-byte copies. The distinction matters because most real-world duplicates are not perfect clones; they are the product of repeated re-scanning, re-exporting, or reformatting over years of system migrations.
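The Stadtarchiv's software is not named here, so the following is only an illustration of the general technique. This minimal Python sketch contrasts a cryptographic hash (SHA-256 over raw pixel bytes) with average hashing (aHash), one of the simplest perceptual-hashing schemes. It assumes images have already been decoded into grayscale pixel grids (lists of 0–255 integers). The synthetic checkerboard images, the helper names and the five-bit duplicate threshold are hypothetical choices made for the example.

```python
import hashlib


def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes: matches only byte-for-byte copies."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


def average_hash(pixels, size=8):
    """64-bit average hash (aHash) of a grayscale image given as rows of 0-255 ints.

    The image is shrunk to size x size cells by box-averaging, and each bit
    records whether a cell is brighter than the mean of all cells.
    Assumes the image is at least size x size pixels.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for cy in range(size):
        for cx in range(size):
            rows = range(cy * height // size, (cy + 1) * height // size)
            cols = range(cx * width // size, (cx + 1) * width // size)
            block = [pixels[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_perceptual_duplicate(a, b, max_distance=5):
    """Hypothetical threshold: hashes within max_distance bits count as duplicates."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


def checkerboard(n, square, invert=False):
    """Synthetic test image: alternating dark (60) and bright (180) squares."""
    offset = 1 if invert else 0
    return [[60 + 120 * ((x // square + y // square + offset) % 2) for x in range(n)]
            for y in range(n)]


original = checkerboard(64, 16)
# Stand-in for a re-scan at higher resolution: nearest-neighbour 2x upscale.
upscaled = [[original[y // 2][x // 2] for x in range(128)] for y in range(128)]
# Stand-in for a lossy re-export: small deterministic noise in [-2, 2].
noisy = [[v + (x * 7 + y * 3) % 5 - 2 for x, v in enumerate(row)]
         for y, row in enumerate(original)]
# A genuinely different picture: the inverted checkerboard.
different = checkerboard(64, 16, invert=True)

for name, img in [("upscaled", upscaled), ("noisy", noisy), ("different", different)]:
    print(name,
          "sha256 match:", sha256_of(img) == sha256_of(original),
          "aHash distance:", hamming(average_hash(img), average_hash(original)))
# upscaled sha256 match: False aHash distance: 0
# noisy sha256 match: False aHash distance: 0
# different sha256 match: False aHash distance: 64
```

Neither altered copy matches the original under SHA-256, yet both have identical average hashes, while the genuinely different image differs in all 64 bits. A production tool would likely decode real image files, use a more robust scheme such as a DCT-based perceptual hash, and index the hashes so that large collections need not be compared pair by pair.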
The practical advice for smaller organisations in the canton — municipal libraries, school archives, NGOs operating out of Zürich West or Aussersihl — is straightforward: implement a controlled vocabulary for file naming before the next system migration, not after. The cost of retroactive deduplication at scale runs into tens of thousands of francs in staff time alone. Getting the metadata right at the point of ingest is far cheaper. The Stadtarchiv has indicated it will publish updated technical guidance for partner institutions later this year.
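One way to enforce a naming convention at the point of ingest is to validate every incoming filename against fixed lists of allowed codes before the file is accepted. The Python sketch below is purely hypothetical: the `INST_COLL_YYYYMMDD_NNNN.ext` pattern, the institution and collection codes and the allowed extensions are invented for illustration and do not reflect any published Stadtarchiv guidance.

```python
import re
from datetime import datetime

# Hypothetical controlled vocabularies; a real institution would maintain its own lists.
INSTITUTIONS = {"ARC", "LIB"}
COLLECTIONS = {"TRAM", "FLOOD", "PORTRAIT"}
EXTENSIONS = {"tif", "jpg"}

# Hypothetical convention: INST_COLL_YYYYMMDD_NNNN.ext
PATTERN = re.compile(
    r"^(?P<inst>[A-Z]+)_(?P<coll>[A-Z]+)_(?P<date>\d{8})_(?P<seq>\d{4})\.(?P<ext>[a-z]+)$"
)


def validate_filename(name):
    """Return a list of problems with name; an empty list means it conforms."""
    match = PATTERN.match(name)
    if not match:
        return ["does not match INST_COLL_YYYYMMDD_NNNN.ext"]
    problems = []
    if match["inst"] not in INSTITUTIONS:
        problems.append(f"unknown institution code {match['inst']!r}")
    if match["coll"] not in COLLECTIONS:
        problems.append(f"unknown collection code {match['coll']!r}")
    try:
        # Reject impossible dates such as month 13.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        problems.append(f"invalid date {match['date']!r}")
    if match["ext"] not in EXTENSIONS:
        problems.append(f"extension {match['ext']!r} not allowed")
    return problems


for candidate in ["ARC_TRAM_19960412_0001.tif",
                  "ARC_TRAM_19961332_0001.tif",
                  "IMG_0001 (copy).jpg"]:
    print(candidate, validate_filename(candidate))
```

The first name passes, the second is rejected for an impossible date, and the third, a typical camera default, fails the pattern outright. A naming check alone will not catch the same photograph filed under two valid names, so it complements rather than replaces the duplicate detection described above.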