Zurich's public institutions are sitting on a growing problem: thousands of duplicate photographs, scans, and digital images clogging municipal and academic archives — and the people responsible for managing them are increasingly vocal about the cost.
The issue has sharpened in 2026 partly because of a city-wide digitisation push that accelerated after the 2020 pandemic and has not slowed since. The Stadtarchiv Zürich, located on Neumarkt in the Altstadt, has been a focal point. Archivists there have publicly discussed the challenge of ingesting large batches of donated photographic material — often scanned multiple times by different departments — without a unified deduplication protocol in place. The result is ballooning storage overhead and, more seriously, inconsistent cataloguing that can mislead researchers.
ETH Zürich's library services have raised similar concerns. The institution's image collections, used by researchers on the Rämistrasse campus and beyond, reportedly contain a substantial share of near-identical files generated during bulk scanning projects. Librarians and digital preservation specialists have flagged this at recent professional conferences as a systemic, not incidental, problem.
What the Experts Are Telling Administrators
Digital preservation professionals in Switzerland are converging on a few points. The first is that the problem is not unique to Zurich: institutions in Basel and Bern face comparable situations, but Zurich's scale makes it more acute. The second concerns method. The city's Informatik-Dienste, the municipal IT directorate that oversees infrastructure for dozens of city departments, has been consulting archivists about deploying hash-based deduplication tools that compare image files at the binary level rather than relying on filenames or metadata alone.
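To illustrate what comparing files "at the binary level" means, here is a minimal Python sketch. It is not the tool the Informatik-Dienste is evaluating; the function names and the choice of SHA-256 are assumptions. The idea is to hash every file's raw bytes and group files whose digests match, so that a renamed copy is still caught. One caveat: this only finds byte-identical copies. Two separate scans of the same print differ byte for byte and will not match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes.

    The file is read in 1 MiB chunks so large TIFF scans never have to
    fit in memory at once.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    Filenames and embedded metadata play no separate part in the match:
    only the bytes inside each file are hashed.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("/path/to/collection"))` returns one list per set of identical files, which a cataloguer could then review. A production tool would likely group files by size first and hash only files of equal size, to avoid reading every byte of every file.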
The stakes extend beyond storage costs. Swiss archival law, including the Archivgesetz, requires that public records be preserved in a form that is authentic and traceable. Duplicate images with conflicting metadata tags can raise legal questions about which version constitutes the authoritative record, a concern that legal archivists at the Universität Zürich's Faculty of Law (Rechtswissenschaftliche Fakultät) have discussed in published guidance this year.
Costs are not trivial. By industry estimates for 2025–2026, primary-tier cloud and on-premise storage for large institutional archives in Switzerland costs roughly CHF 0.02 to CHF 0.05 per gigabyte per month. For an institution holding several hundred terabytes of image data, a plausible figure for a combined academic and municipal archive, even a duplicate share of 15 to 20 percent represents a meaningful recurring expense.
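The arithmetic behind that estimate can be sketched in a few lines of Python. The price range (CHF 0.02 to 0.05 per GB per month) and the 15 to 20 percent duplicate share come from the figures above. The 300 TB archive size is a hypothetical stand-in for "several hundred terabytes", and treating 1 TB as 1,000 GB is an assumption.

```python
def duplicate_storage_cost_chf(
    archive_tb: float, duplicate_share: float, chf_per_gb_month: float
) -> tuple[float, float]:
    """Return (monthly, annual) CHF spent storing duplicate content.

    Assumes 1 TB = 1,000 GB and a flat per-GB monthly price with no
    replication or backup copies.
    """
    duplicate_gb = archive_tb * 1_000 * duplicate_share
    monthly = duplicate_gb * chf_per_gb_month
    return monthly, monthly * 12


ARCHIVE_TB = 300  # hypothetical stand-in for "several hundred terabytes"

# Low end: 15 % duplicates at CHF 0.02; high end: 20 % at CHF 0.05.
low = duplicate_storage_cost_chf(ARCHIVE_TB, 0.15, 0.02)
high = duplicate_storage_cost_chf(ARCHIVE_TB, 0.20, 0.05)
print(f"monthly: CHF {low[0]:,.0f} to CHF {high[0]:,.0f}")
print(f"annual:  CHF {low[1]:,.0f} to CHF {high[1]:,.0f}")
```

Under these assumptions the script prints a monthly range of CHF 900 to CHF 3,000 and an annual range of CHF 10,800 to CHF 36,000. Any backup or replica copies of the duplicated data would scale those numbers up.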
Calls for a Shared City Standard
The loudest calls right now are for coordination. Specialists working with both the Zentralbibliothek Zürich, on Zähringerplatz, and private donors who contribute historical photographs to the city's collections have argued that Zurich needs a binding technical standard for image ingest: one that runs automated duplicate checks at the point of upload rather than attempting a retrospective cleanup of archives that already contain millions of files.
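A duplicate check at the point of upload amounts to keeping an index of digests for everything already accepted and consulting it before a new file is stored. The Python sketch below is hypothetical throughout. The `IngestIndex` class, its in-memory dictionary (a real archive would need a persistent, shared store), and the use of SHA-256 are illustrative assumptions, not part of any Zurich standard. Like any byte-level check, it catches exact copies only.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IngestIndex:
    """Hypothetical ingest gate that remembers the SHA-256 of every accepted file."""

    # Maps hex digest -> identifier of the record that first supplied those bytes.
    seen: Dict[str, str] = field(default_factory=dict)

    def check_and_add(self, record_id: str, data: bytes) -> Optional[str]:
        """Register `data` under `record_id`, or report the record it duplicates.

        Returns None when the bytes are new (and records them). Otherwise
        returns the id of the earlier record with identical bytes and
        stores nothing.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self.seen.get(digest)
        if existing is not None:
            return existing
        self.seen[digest] = record_id
        return None


# Example: a department re-uploads a donor scan that is already in the archive.
index = IngestIndex()
first = index.check_and_add("donor-2026-001", b"<tiff bytes>")
again = index.check_and_add("dept-scan-17", b"<tiff bytes>")
print(first, again)  # None donor-2026-001
```

Returning the earlier record's identifier, rather than silently rejecting the upload, would let a cataloguer link the two submissions and decide which one carries the authoritative metadata.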
The Swiss Memory Institutions working group, which includes representatives from cantonal and municipal archives across German-speaking Switzerland, has a working document on deduplication standards circulating among members for comment this summer. A final version is expected before the end of 2026. Whether Zurich's Stadtarchiv will formally adopt whatever emerges as a city-wide mandate is a question that sits with the Stadtrat's Departement der Industriellen Betriebe and the Präsidialdepartement, which oversees cultural institutions.
For researchers, the practical advice from archivists is straightforward: when using image collections at any Zurich institution, verify the file's provenance metadata before citing it, and flag apparent duplicates to the catalogue team. The Stadtarchiv has an online contact form for exactly this purpose. Small as that step sounds, archivists say crowd-sourced error reports from users have historically been among the most reliable ways to identify problematic entries in large collections — a reminder that the solution to a technical problem often still runs through human attention.