Zurich's public digital infrastructure has a clutter problem. Thousands of duplicate image files — sometimes four or five copies of the same photograph — have accumulated across the city's networked archives, costing storage budget and slowing public access to records held by institutions from the Stadtarchiv Zürich on Alfred-Escher-Strasse to the Zentralbibliothek on Zähringerplatz. A structured replacement and deduplication programme, coordinated through the city's digital services office, is now in its second operational phase.
The problem did not emerge overnight. It is the product of roughly fifteen years of institutional digitisation drives that were well-intentioned but poorly synchronised. When individual departments (health, transport, planning) launched their own scanning projects in the late 2000s and early 2010s, each built a separate repository with its own metadata standard. Files migrated between systems without centralised logging. A photograph of the Lindenhügel taken in 1987 might exist on three departments' servers under three different filenames, with no automated flag to catch the overlap.
The Institutional Paper Trail
The Stadtarchiv Zürich began formally cataloguing the scope of the duplication in 2022, when a routine capacity audit found that an estimated 18 percent of image files across linked municipal repositories were redundant copies. That figure, drawn from an internal review cited in the city's 2023 annual digital governance report, translated into several terabytes of unnecessary storage load on servers managed by the city's IT-Dienste department. At commercial cloud storage rates, which the city has publicly reported benchmarking at roughly CHF 0.023 per gigabyte per month, the cumulative overhead runs into tens of thousands of francs annually.
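The storage arithmetic can be sketched in a few lines. This is a rough illustration only: it uses the benchmark rate quoted above, treats one terabyte as 1,000 gigabytes, and covers storage alone. The function name is hypothetical, and the city's actual redundant volume is not stated precisely enough to plug in here.

```python
RATE_CHF_PER_GB_MONTH = 0.023  # benchmark rate quoted above


def annual_storage_cost_chf(redundant_gb, rate=RATE_CHF_PER_GB_MONTH):
    """Yearly cost of keeping `redundant_gb` gigabytes of duplicates."""
    return redundant_gb * rate * 12


# Assumption: 1 TB is taken as 1,000 GB for this estimate.
cost_per_tb = annual_storage_cost_chf(1000)
print(f"CHF {cost_per_tb:.2f} per redundant terabyte per year")
```

Under these assumptions each terabyte of redundant files costs about CHF 276 a year, and the total scales linearly with the redundant volume.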
The Zentralbibliothek's digital collection presents a related but distinct version of the same issue. The library's digitisation partnership with ETH Zürich, formalised under a cooperation agreement signed in 2019, created shared access to image databases but initially lacked a unified deduplication protocol. Staff at both institutions were uploading high-resolution scans of historical Zurich street photography — much of it sourced from the same donor collections — without a real-time cross-check against what the other partner had already ingested. The result was parallel catalogues with significant overlap, particularly for material covering Zürich's Altstadt and the Langstrasse quarter.
Switzerland's broader shift toward the Linked Open Data framework, championed nationally by institutions including the Swiss Federal Archives in Bern, gave Zurich's effort a structural backbone. The Federal Archives' LINDAS data service, which went into broader public use from 2021, established interoperability standards that Zurich's municipal archivists have since been retrofitting their own systems to meet. That process — slow, granular, requiring human review of machine-flagged duplicates — is what the current clean-up programme formalises.
What the Clean-Up Involves
The replacement phase, which the city's digital services office began rolling out in January 2026, works in two steps. Automated scripts first identify files with matching hash values or near-identical pixel signatures across repositories. Human archivists then review flagged pairs to confirm which version carries the most complete metadata before the inferior copy is retired and replaced with a canonical reference link. The Stadtarchiv estimates the manual review stage alone will require approximately 2,400 staff hours spread across the calendar year.
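The automated first step can be illustrated with a minimal sketch. It is not the city's actual tooling: the function names, file names and the choice of SHA-256 and a simple average-hash signature are assumptions made for illustration. Exact copies are grouped by a cryptographic hash of their bytes, and near-identical images are compared through a coarse brightness signature computed on a small grayscale grid that is assumed to have been decoded and downscaled beforehand.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files):
    """Group file names by the SHA-256 digest of their bytes."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more files indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def average_hash(pixels):
    """Bit signature: 1 where a pixel is brighter than the grid's mean.

    `pixels` is a small grayscale grid (rows of 0-255 integers).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(sig_a, sig_b):
    """Number of positions at which two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Flag a pair for human review if the signatures nearly match."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Hypothetical repositories holding the same scan under different names.
files = {
    "health/IMG_0042.tif": b"scan-bytes-A",
    "planning/hill_1987.tif": b"scan-bytes-A",
    "transport/tram_depot.tif": b"scan-bytes-B",
}
exact_groups = exact_duplicate_groups(files)

# Two slightly different scans of one photograph, and an unrelated image.
scan_a = [[10, 200], [220, 30]]
scan_b = [[12, 198], [225, 28]]
other = [[200, 10], [30, 220]]
flag_same = is_near_duplicate(scan_a, scan_b)
flag_other = is_near_duplicate(scan_a, other)
```

Pairs flagged this way are only candidates. The threshold `max_distance` is an arbitrary illustrative value, and the decision about which copy carries the most complete metadata stays with the archivists.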
For residents and researchers who use the city's public-facing image portals — including the online search interface at e-periodica.ch and the Bilddatenbank Zürich — the practical benefit will be faster search results and fewer broken or conflicting entries. Institutions in other Swiss cities, including Geneva's Archives d'État, have faced similar deduplication challenges and are watching Zurich's phased approach as a potential model.
The programme is scheduled for completion by December 2026. Archivists working on the project have indicated that a post-clean-up audit is already being planned for the first quarter of 2027. It is designed to test whether the new intake protocols, which make hash-checking mandatory before any image is ingested into a shared repository, are preventing the problem from recurring.
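A hash check at intake can be sketched as follows. This is a hypothetical in-memory model, not the programme's real system: the class name, record IDs and the use of SHA-256 are assumptions. The idea is simply that a file's digest is looked up before storage, and an incoming duplicate is answered with a reference to the record already held.

```python
import hashlib


class SharedRepository:
    """Hypothetical in-memory stand-in for a shared image repository."""

    def __init__(self):
        self._by_digest = {}  # SHA-256 hex digest -> canonical record ID

    def ingest(self, record_id, data):
        """Store `data` unless identical bytes are already held.

        Returns (accepted, canonical_id). For a duplicate, `accepted`
        is False and `canonical_id` names the record ingested earlier.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return False, existing
        self._by_digest[digest] = record_id
        return True, record_id


repo = SharedRepository()
first = repo.ingest("zb-0001", b"street-scan")    # new file, accepted
second = repo.ingest("eth-0917", b"street-scan")  # same bytes, rejected
third = repo.ingest("eth-0918", b"another-scan")  # different file, accepted
```

A check of this kind catches only byte-identical files. A rescanned or re-encoded copy of the same photograph produces a different digest, so it would still need a similarity check or human review.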