Zurich's municipal archives, libraries and cultural institutions are sitting on tens of thousands of duplicate digital image files — identical or near-identical photographs, scans and graphics that have accumulated across their servers since the early 2010s, inflating storage costs, confusing researchers and quietly undermining the city's ambitions to build a coherent public digital heritage platform.
The issue is not a single dramatic failure. It is the compound result of roughly 15 years of digitisation work carried out independently by institutions that rarely talked to one another. The Präsidialdepartement of the Stadt Zürich, which oversees cultural affairs, has been attempting since late 2024 to coordinate a remediation effort across the Stadtarchiv on Neumarkt, the Zentralbibliothek Zürich on Zähringerplatz and several smaller municipal collections. Getting them onto a shared technical standard has proved harder than anticipated.
How the Problem Grew
The root cause is straightforward: each institution built its own digitisation pipeline. When a photograph of, say, the Lindenhügel or a nineteenth-century view of the Limmatquai was scanned by two different departments — sometimes years apart, sometimes with different resolution settings — both files entered separate content-management systems with different metadata tags. Neither system flagged the redundancy. Over successive server migrations, duplicates multiplied.
ETH Zürich's Chair of Information Science has been studying this pattern in Swiss cultural collections since at least 2022, examining how metadata fragmentation during the digitisation boom of the 2010s created structural inefficiencies that now cost institutions measurable sums in storage, staff time and retrieval errors. The broader European context matters here: the EU's Europeana project, which aggregates digital cultural heritage from member-state institutions, began rejecting or flagging submissions from collections with high duplicate rates, creating reputational as well as technical pressure on Swiss contributors.
Zurich's housing crisis has dominated local political headlines for years, and a comparison may seem strange, but the two dynamics are similar. Both problems share a root: the city grew fast, institutions made locally rational decisions under immediate pressure, and nobody was coordinating the long view. The Wohnungsnot emerged from decades of insufficient planning integration across Kreise 3, 4 and 5. The duplicate-image problem emerged from insufficient data integration across the Stadtarchiv, the Zentralbibliothek and affiliated collections. In both cases, fixing the problem later has turned out to cost substantially more than preventing it would have.
What Remediation Actually Involves
Removing duplicate images from a public archive is not as simple as running a delete command. Staff must verify that apparent duplicates are truly identical in content — not merely similar — and that neither version carries unique provenance metadata the other lacks. The Zentralbibliothek alone holds a digital collection that has grown to several million files. Even with automated perceptual hashing tools, which compare image fingerprints rather than raw pixel data, human review remains necessary for a significant proportion of flagged pairs.
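For readers curious how perceptual hashing works in principle, the following is a minimal sketch of one common variant, the average hash, and not the tooling any of the Zurich institutions is known to use. It assumes images have already been decoded into rows of grayscale values; the function names and the toy "scans" are illustrative.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]  # rows of grayscale pixel values, 0-255


def shrink(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging blocks of pixels.

    Assumes the image is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a fingerprint with one bit per cell: 1 if brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


# Toy data: the same horizontal gradient "scanned" at two resolutions,
# plus an inverted image standing in for an unrelated photograph.
low_res = [[x * 16 for x in range(16)] for _ in range(16)]
high_res = [[(x // 2) * 16 for x in range(32)] for _ in range(32)]
unrelated = [[255 - x * 16 for x in range(16)] for _ in range(16)]

same_scene = hamming_distance(average_hash(low_res), average_hash(high_res))
different_scene = hamming_distance(average_hash(low_res), average_hash(unrelated))
```

A small distance flags a pair as a likely duplicate even when the files differ in resolution, which is why a byte-for-byte comparison alone would miss re-scans. It does not prove the two images are the same object, so flagged pairs still need a human decision.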
The Stadtarchiv on Neumarkt is understood to be piloting a deduplication workflow that combines automated detection with a structured staff review queue, though the institution has not published a completion timeline. The Präsidialdepartement allocated funding in the 2025 budget cycle for digital infrastructure consolidation across city cultural institutions — a line item that covers both the deduplication effort and the longer-term goal of migrating collections onto a unified asset-management platform.
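The Stadtarchiv has not published the rules its pilot uses, so the following is only a hypothetical sketch of how automated detection might feed a staff review queue. The record fields, the decision labels and the distance threshold of 5 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Record:
    """Hypothetical catalogue entry; field names are illustrative."""
    record_id: str
    sha256: str                 # checksum of the file bytes
    phash: int                  # perceptual fingerprint of the image
    provenance: Dict[str, str] = field(default_factory=dict)


def triage(a: Record, b: Record, max_distance: int = 5) -> str:
    """Decide what happens to a candidate pair.

    Returns "auto_merge", "review" or "distinct".
    """
    if a.sha256 == b.sha256:
        # Byte-identical files: safe to merge only if no metadata would be lost.
        return "auto_merge" if a.provenance == b.provenance else "review"
    distance = bin(a.phash ^ b.phash).count("1")
    if distance <= max_distance:
        # Visually near-identical but not the same file: a person decides.
        return "review"
    return "distinct"


scan_2012 = Record("A-1", "abc", 0b11110000, {"donor": "Estate X"})
scan_2019 = Record("B-7", "def", 0b11110001, {"photographer": "Unknown"})
decision = triage(scan_2012, scan_2019)
```

In this sketch, only byte-identical files with matching provenance are merged without review. Everything else that looks alike goes to staff, which reflects the article's point that neither version should be discarded if it carries metadata the other lacks.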
For researchers, students at ETH Zürich or the Universität Zürich on Rämistrasse, and members of the public using the city's online portals, the practical advice for now is straightforward: if you retrieve an image from one of the city's digital archives and cannot confirm the same record in a second source, treat the metadata with some caution. Cross-reference against the Bild+Ton collection where possible, and flag apparent duplicates through the Stadtarchiv's public feedback form. The institutions say they want that input. The cleanup will be faster if users help catch what the algorithms miss.