Zürich's public institutions are sitting on a quiet but costly problem. Duplicate images — identical or near-identical photographs stored multiple times across government servers, university archives, and cultural databases — are consuming terabytes of storage and complicating search and retrieval workflows. The issue has moved from an IT backroom complaint to a policy conversation, with archivists, data officers, and digital preservation specialists now pressing for coordinated solutions.
The timing matters. Switzerland's federal data strategy, updated in early 2025, places renewed pressure on cantons and cities to demonstrate efficient use of digital infrastructure. For Zürich, which manages one of the most densely networked urban digital ecosystems in the German-speaking world, redundant data is not merely untidy — it carries a measurable cost in energy, licensing, and staff hours.
Where the Problem Shows Up
ETH Zürich, ranked among the top ten universities globally, operates vast research image repositories spanning fields from materials science to satellite imagery used in urban planning. In Swiss academic discussions of digital infrastructure, the challenge of managing duplicate visual data is described as having grown sharply since research teams began collaborating across institutions and uploading assets to shared platforms that lack centralised deduplication protocols.
At the Stadtarchiv Zürich on Alfred-Escher-Strasse, archivists have long contended with digitisation backlogs. When physical documents and photographs are scanned across different projects — sometimes years apart — the same item can enter the digital system more than once, tagged with different metadata, making it harder to locate the authoritative version. The archive holds records stretching back centuries, and its digitisation effort, ongoing since the early 2000s, has not always been paired with systematic duplicate detection.
The Zentralbibliothek Zürich, which sits near the Predigerkirche in the Hochschulen district, faces a comparable situation in its photographic and map collections. Staff there have described the deduplication challenge as one that compounds annually: each new digitisation sprint, each partnership with an external institution, adds fresh risk of redundancy.
What the Experts Are Recommending
Digital preservation professionals and data scientists broadly agree on the technical remedies: perceptual hashing algorithms, which can identify visually identical or near-identical images even when file formats or resolution differ, combined with centralised asset management platforms. The question in Zürich, as elsewhere, is governance — who owns the deduplication mandate, who pays for implementation, and how institutions coordinate without surrendering autonomy over their collections.
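To make the idea of perceptual hashing concrete, the following is a minimal sketch of one of the simplest variants, the average hash. It is an illustration under stated assumptions, not the method used by any institution named here: images are assumed to be already decoded into rows of 0-255 grayscale values, the downscaling is a crude nearest-neighbour sample, and the function names are hypothetical. A production system would typically rely on an imaging library for decoding and resampling and would likely use a more robust hash.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values (assumed pre-decoded)


def resize_nearest(img: Gray, width: int, height: int) -> Gray:
    """Downscale by nearest-neighbour sampling (crude, but dependency-free)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def average_hash(img: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a pixel is brighter than the mean."""
    small = resize_nearest(img, size, size)
    pixels = [p for row in small for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-identical images."""
    return bin(a ^ b).count("1")


# Illustrative synthetic images, not real archive data.
scan_large = [[x * 4 for x in range(64)] for _ in range(64)]   # horizontal gradient
scan_small = [[x * 8 for x in range(32)] for _ in range(32)]   # same picture, half size
other = [[y * 4 for _ in range(64)] for y in range(64)]        # vertical gradient

print(hamming(average_hash(scan_large), average_hash(scan_small)))  # 0
print(hamming(average_hash(scan_large), average_hash(other)))       # 32
```

The two scans of the same picture at different resolutions produce the same fingerprint, while the unrelated image differs in half of its 64 bits. In practice a distance threshold, chosen per collection, decides what counts as a near-duplicate.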
The Swiss Federal Archives in Bern has piloted perceptual hash-based deduplication within its own holdings since 2023. Observers in the digital heritage sector regard that pilot as a reference point for cantonal and municipal institutions, including those in Zürich. The technology is not experimental; what remains unresolved is the institutional framework for rolling it out at scale across a city where dozens of separate bodies manage their own image assets.
Storage is not cheap. Enterprise-grade archival storage in Switzerland costs significantly more per terabyte than the EU average, partly because of energy and real estate prices. For a city institution that must comply with Swiss data sovereignty requirements, meaning storage must remain on Swiss soil, every duplicated gigabyte carries a real budget line. Industry benchmarks suggest that unmanaged duplication in large institutional image archives can account for between 15 and 30 percent of total storage volume, though figures vary widely depending on digitisation history and intake controls.
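The 15 to 30 percent range translates into a budget figure with simple arithmetic. The sketch below shows the calculation only; the archive size and the per-terabyte price in the usage lines are hypothetical placeholders, not figures reported for any Zürich institution.

```python
from typing import Tuple


def duplicate_cost_range(
    total_tb: float,
    cost_per_tb_year: float,
    low_share: float = 0.15,   # lower bound of the benchmark range cited above
    high_share: float = 0.30,  # upper bound of the benchmark range cited above
) -> Tuple[float, float]:
    """Estimate yearly spend on duplicated data as (low, high)."""
    low = total_tb * low_share * cost_per_tb_year
    high = total_tb * high_share * cost_per_tb_year
    return low, high


# Hypothetical inputs: a 200 TB archive at 100 currency units per TB per year.
low, high = duplicate_cost_range(200, 100)
print(f"Estimated yearly cost of duplicates: {low:.0f} to {high:.0f}")
```

An institution would substitute its own measured volume and contract price; the point of the exercise is that the spread between the low and high estimate is itself large enough to justify measuring the real duplication rate.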
What happens next depends in part on Zürich's ongoing Smart City strategy, which the city administration has developed through its Amt für Stadtentwicklung. That strategy includes digital infrastructure efficiency as a stated priority for the period through 2028. Whether image deduplication becomes a formal sub-programme — with dedicated budget and inter-institutional coordination — or remains a patchwork of individual departmental efforts is the central open question heading into the second half of 2026.
For institutions weighing next steps, specialists point to a practical starting point: a shared audit. Mapping which bodies hold what image assets, and where overlaps already exist, costs far less than storage remediation after the fact. The Stadtarchiv and Zentralbibliothek have both initiated internal reviews; the harder work of aligning those reviews across Zürich's broader public digital infrastructure is still ahead.
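A shared audit of this kind can be sketched as a grouping exercise. The example below assumes each participating body can export an inventory of asset identifiers paired with a fingerprint (a content hash or a perceptual hash); the institution labels, asset identifiers, and the function name are hypothetical. It reports only exact fingerprint matches, so near-duplicates would need an additional distance comparison.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Inventory = Iterable[Tuple[str, str]]  # (asset_id, fingerprint) pairs


def find_overlaps(inventories: Dict[str, Inventory]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each fingerprint held by more than one institution to its holders."""
    holders: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for institution, items in inventories.items():
        for asset_id, fingerprint in items:
            holders[fingerprint].append((institution, asset_id))
    return {
        fp: held
        for fp, held in holders.items()
        if len({institution for institution, _ in held}) > 1
    }


# Hypothetical inventories for illustration only.
inventories = {
    "archive_a": [("A-001", "f3a9"), ("A-002", "77c1")],
    "library_b": [("B-104", "f3a9"), ("B-105", "0be2")],
}

for fingerprint, held in find_overlaps(inventories).items():
    print(fingerprint, held)
```

Because only fingerprints and identifiers are exchanged, an audit along these lines would let institutions compare holdings without handing over the images themselves, which speaks to the autonomy concern raised earlier.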