The Daily Zurich

How Zurich's Digital Archives Got Buried Under Millions of Duplicate Images — and What Happened Next

From city hall scan projects to ETH research databases, a decade of unchecked digital duplication has created a storage and access crisis that administrators are only now beginning to untangle.

By Zurich News Desk · Published 4 July 2026, 9:26 pm

Photo: Mâide Arslan on Pexels

Zurich's public institutions are sitting on a problem that has been quietly compounding since roughly 2014: digital image archives bloated by duplicate files, redundant scans, and poorly catalogued visual assets that now consume significant server capacity and make reliable document retrieval increasingly difficult. The issue spans municipal offices on Stadthausquai, the cantonal archive (Staatsarchiv) on Winterthurerstrasse, and major research repositories at ETH Zurich, Switzerland's flagship technical university on Rämistrasse.

The timing matters. Switzerland's federal government has been pushing cantonal and municipal bodies toward unified digital-record standards under the eCH framework since 2020, with compliance deadlines tightening through 2026. Zurich, as the country's largest city by population, is a test case. If its archival infrastructure cannot pass interoperability audits, funding for further digitalisation projects — some of them tied to the broader Swiss eGovernment strategy — is at risk.

How the Duplication Problem Grew

The roots lie in a series of well-intentioned but poorly coordinated scan drives. Between 2015 and 2022, at least three separate digitisation programmes ran concurrently across Zurich's public institutions. Stadtarchiv Zürich, the city's official archive on Neumarkt, ran its own scan-and-upload pipeline. ETH-Bibliothek, the university library that holds one of Central Europe's largest scientific image collections, operated separately. So did the Zentralbibliothek Zürich on Zähringerplatz, which digitised tens of thousands of historical photographs and maps.

Each institution used different metadata schemas, different file-naming conventions, and — critically — different deduplication protocols. In practice, this meant the same image could be ingested by two or three systems under different identifiers, appearing to database administrators as distinct assets. Interviews conducted for previous reporting on Swiss eGovernment spending suggested the problem was known internally as early as 2018 but was never formally escalated.
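
How mismatched identifiers hide duplicates can be illustrated with a short Python sketch. Hashing each file's raw bytes with SHA-256 groups byte-identical copies no matter which identifier each catalogue assigned them. The catalogue names and identifiers below are hypothetical, and the approach only catches exact copies: a re-scan or a re-encoded file has different bytes and needs the perceptual hashing discussed later in this article.

```python
import hashlib
from collections import defaultdict


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(records):
    """Group catalogue entries whose file contents are byte-identical.

    `records` is an iterable of (catalogue, identifier, file_bytes) tuples.
    Only groups holding more than one entry are returned.
    """
    groups = defaultdict(list)
    for catalogue, identifier, data in records:
        groups[sha256_of(data)].append((catalogue, identifier))
    return {digest: entries for digest, entries in groups.items() if len(entries) > 1}


if __name__ == "__main__":
    scan = b"placeholder for the bytes of one scanned photograph"
    records = [
        ("city-archive", "CA-0001", scan),  # hypothetical catalogues and IDs
        ("central-library", "CL-foto-17", scan),
        ("university", "UNI-IMG-42", b"a different image"),
    ]
    for digest, entries in group_exact_duplicates(records).items():
        print(digest[:12], entries)
```

The same photograph appears under two identifiers but collapses into one group, which is exactly the relationship a catalogue keyed only on identifiers cannot see.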

ETH Zurich's IT services division has publicly acknowledged in institutional reports that large-scale research datasets frequently contain duplication rates of 15 to 30 percent when aggregated across collaborative projects. For image-heavy collections — medical imaging data, remote-sensing satellite files, and historical cartography — those rates can run higher. Applied to Zurich's combined public-sector image holdings, which one 2023 cantonal technology audit estimated at more than 4.8 petabytes across all institutions, the redundancy is not a marginal inefficiency. It is a structural one.
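
As a rough back-of-envelope estimate, assume the 15 to 30 percent range describes the share of stored bytes that are redundant copies and that it carries over to the combined 4.8-petabyte holdings, which is the analogy the paragraph above draws. The redundant volume then works out as follows ($R_{\min}$ and $R_{\max}$ are labels introduced here for illustration):

```latex
\begin{aligned}
R_{\min} &= 0.15 \times 4.8\ \text{PB} = 0.72\ \text{PB} \\
R_{\max} &= 0.30 \times 4.8\ \text{PB} = 1.44\ \text{PB}
\end{aligned}
```

In decimal units (1 PB = 1,000 TB), that is roughly 720 to 1,440 terabytes of storage holding copies of material already on file.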

What Deduplication Actually Requires

Fixing the problem is not simply a matter of running a script. Images that look identical to the human eye may carry different embedded metadata, such as scan dates, operator IDs and copyright flags, that make automatic deletion legally risky. The cantonal archive, for instance, is bound by retention obligations under archiving law: for cantonal records that is the canton's own Archivgesetz, while the Bundesgesetz über die Archivierung, the federal archiving law, governs records of the federal administration. Deleting a file that turns out to be the sole surviving copy of a document would breach those obligations, regardless of how many apparent duplicates existed alongside it.
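
To make the legal point concrete, here is a conservative triage rule sketched in Python. It is a hypothetical illustration, not any Zurich institution's actual procedure: the `Asset` fields and the action labels are assumptions. The rule never deletes anything. It keeps a lone copy, escalates a group whose copyright flags disagree, and otherwise proposes the earliest scan as the keeper while routing the rest to human review. Scan dates and operator IDs are assumed to differ naturally between re-scans, so only a copyright conflict triggers escalation here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """One catalogue entry; the fields are illustrative, not a real schema."""
    identifier: str
    scan_date: str  # ISO 8601 (YYYY-MM-DD), so string order equals date order
    operator_id: str
    copyright_flag: str


def triage(group):
    """Propose actions for a group of visually identical assets.

    Nothing is deleted here; the result only labels each identifier.
    """
    if not group:
        return {}
    if len(group) == 1:
        # A lone file may be the sole surviving copy: always keep it.
        return {group[0].identifier: "retain (sole copy)"}
    if len({a.copyright_flag for a in group}) > 1:
        # Conflicting rights information is a legal question for an archivist.
        return {a.identifier: "escalate (rights conflict)" for a in group}
    keeper = min(group, key=lambda a: a.scan_date)  # earliest scan
    return {
        a.identifier: "retain (keeper)" if a is keeper else "review before any removal"
        for a in group
    }


if __name__ == "__main__":
    older = Asset("CA-0001", "2016-03-01", "op-7", "public domain")
    newer = Asset("CL-foto-17", "2019-05-12", "op-2", "public domain")
    print(triage([newer, older]))
    # {'CL-foto-17': 'review before any removal', 'CA-0001': 'retain (keeper)'}
```

Labelling rather than deleting keeps the final, legally accountable decision with a person, which is the constraint the retention obligations impose.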

The practical approach that archivists and IT administrators have been moving toward is perceptual hashing combined with human review at the final stage. Perceptual hashing derives a fingerprint for each image from its visual content rather than from its exact bytes or metadata, so re-scans and re-encoded copies of the same picture still produce near-identical fingerprints. Several Swiss cantonal bodies piloted this method in 2024. Zurich's own Stadtarchiv began a structured deduplication review in the first quarter of 2026, with a target completion window running through the end of the year.
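
A perceptual hash can be sketched in a few lines of Python. The article does not say which variant the pilots used; this example assumes the difference hash (dHash), one common choice. To stay self-contained it takes an image already decoded into a grayscale grid (a list of rows of 0 to 255 values); a real pipeline would decode and resize files with an imaging library. The `max_distance` threshold is an arbitrary illustrative value that would need tuning against human-reviewed samples.

```python
def downsample(pixels, width, height):
    """Box-average a grayscale image (a list of equal-length rows) to width x height."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pair of cells.

    The image is shrunk to (hash_size + 1) x hash_size cells; each bit records
    whether a cell is brighter than its right-hand neighbour, giving a
    hash_size * hash_size bit fingerprint (64 bits by default).
    """
    small = downsample(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance=5):
    """Flag a pair for human review; max_distance is a hypothetical threshold."""
    return hamming(a, b) <= max_distance


if __name__ == "__main__":
    # Synthetic 72 x 64 images: a left-to-right gradient, a uniformly
    # brightened copy (a crude stand-in for a re-scan with different
    # exposure), and a mirror image.
    gradient = [[2 * x for x in range(72)] for _ in range(64)]
    brighter = [[v + 10 for v in row] for row in gradient]
    mirrored = [row[::-1] for row in gradient]
    h = dhash(gradient)
    print(hamming(h, dhash(brighter)))  # 0: still a match
    print(hamming(h, dhash(mirrored)))  # 64: every bit differs
```

The brightened copy has entirely different bytes, so the SHA-256 approach would miss it, yet its fingerprint is unchanged. Pairs under the threshold are only candidates; the final call still rests with a reviewer.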

For institutions like the Zentralbibliothek on Zähringerplatz, which serves both researchers and the public through its online portal, the practical stakes are immediate. Search results that return the same image four times under different catalogue numbers erode trust in the system and waste staff time on manual triage. The library's digitisation team has been working with a phased cleanup schedule since January 2026.

The broader lesson from Zurich's experience is straightforward: rapid digitalisation without enforced metadata standards generates technical debt that accumulates faster than any single institution can address alone. Cantonal coordination, a common eCH-compliant metadata profile, and shared deduplication software are the pieces administrators are now putting in place. Whether the timeline holds will become clear before the year is out.

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
