Zurich's public digital archives and research institutions are moving this week to implement stricter duplicate image replacement protocols, a shift driven by mounting pressure on storage infrastructure and a formal review completed at the end of June 2026. The change affects how thousands of digitised photographs, scanned documents and visual records are catalogued and maintained across the city's major repositories.
The timing matters. Switzerland's broader push to modernise public records, anchored in the revised Federal Act on Archiving and its updated provisions for digital asset management, has put every cantonal institution on notice to demonstrate efficient data stewardship. For Zurich, which runs multiple overlapping public image collections across city, cantonal and university networks, the duplicate problem has been an open secret for years. This week, that changes, on paper at least.
What Happened This Week
On Tuesday, ETH Zurich's Scientific IT Services division confirmed it had completed a pilot deduplication sweep across one of its internal research image repositories, identifying redundant files that had accumulated through successive research project handovers since at least 2019. The university has not released specific figures publicly, but the pilot covered several terabytes of material, according to documentation circulated internally and referenced in a Wednesday briefing at the ETH main building on Rämistrasse 101.
The city's own Stadtarchiv Zürich, located on Neumarkt in the Altstadt, separately announced it had begun a technical audit of its digitised photographic collection. The project runs in parallel with a similar audit at the cantonal archive on Winterthurerstrasse 170 in Oerlikon. Both institutions are applying perceptual hashing, a technique that reduces each image to a compact fingerprint of its visual content, so that near-identical images can be matched even when file names or metadata differ. The audit is expected to conclude before the end of the third quarter of 2026.
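Neither archive has said which perceptual hash it uses. Purely as an illustration, the sketch below implements a difference hash (dHash), one common perceptual hashing scheme. It assumes the image has already been converted to grayscale and downscaled to a grid of 8 rows by 9 columns, a step normally handled by an imaging library. The function names and sample grids are hypothetical.

```python
def dhash(pixels):
    """Return a 64-bit difference hash for an 8x9 grid of grayscale values.

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so the hash depends on relative brightness, not on
    file names, metadata or absolute exposure.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical sample grid: even rows brighten left to right, odd rows darken.
original = [
    [20 * c + r if r % 2 == 0 else 160 - 20 * c + r for c in range(9)]
    for r in range(8)
]
# The same picture with exposure raised uniformly by 10 grey levels.
brighter = [[value + 10 for value in row] for row in original]
# A visually different picture: the original mirrored left to right.
mirrored = [list(reversed(row)) for row in original]

print(hamming(dhash(original), dhash(brighter)))  # 0: treated as a duplicate
print(hamming(dhash(original), dhash(mirrored)))  # 64: every bit differs
```

In practice a small Hamming-distance threshold, rather than exact equality, decides whether two images count as near-duplicates. The right threshold depends on the collection and is not stated in the source.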
Why does this matter to the average Zurich resident? Public archive storage costs are funded through the cantonal budget, and redundant files inflate those costs directly. Industry benchmarks suggest that large institutional image collections can carry duplicate rates of between 15 and 30 percent, meaning that somewhere between one image in seven and nearly one in three may be an unnecessary copy consuming server space and, by extension, public money. The Stadtarchiv has not yet confirmed its own rate, but the audit was triggered in part by a 2025 internal report noting that storage expenditure had risen faster than the volume of genuinely new acquisitions would justify.
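To make the 15 to 30 percent benchmark concrete, the short calculation below converts a duplicate rate into redundant storage and into a "one image in N" figure. The 100-terabyte collection size is a hypothetical round number chosen for illustration, since neither archive has published its holdings or its duplicate rate.

```python
def redundant_terabytes(total_tb, duplicate_percent):
    """Storage occupied by duplicates at a given duplicate rate."""
    return total_tb * duplicate_percent / 100


def one_in_n(duplicate_percent):
    """Express a duplicate rate as 'one image in N'."""
    return 100 / duplicate_percent


HYPOTHETICAL_COLLECTION_TB = 100  # assumption, not a figure from the archives

for percent in (15, 30):
    wasted = redundant_terabytes(HYPOTHETICAL_COLLECTION_TB, percent)
    print(f"{percent}% duplicates: {wasted:.0f} TB redundant, "
          f"about one image in {one_in_n(percent):.1f}")
```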
The Broader Pressure on Zurich's Institutions
The deduplication push sits inside a wider conversation about digital infrastructure in a city still processing the financial and reputational weight of the UBS-Credit Suisse merger and its aftermath. Public institutions have faced consistent pressure to demonstrate lean operations. The Zurich cantonal government's 2026 efficiency review, launched in March, explicitly included digital asset management as a line item — a detail that gave archive administrators additional political cover to request the technical resources needed for a proper clean-up.
For researchers at ETH and at the Universität Zürich's main campus on Rämistrasse, the practical stakes are different but real. Duplicate images embedded in datasets used for machine learning and computer vision research can skew model training results. A dataset with 20 percent duplicate content does not simply waste storage — it can statistically distort outputs, a problem that has drawn attention in international computer science literature over the past three years.
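A toy example shows how duplicates distort a dataset and not just its size. The records below are hypothetical: eight distinct images split evenly between two labels, plus two extra copies of images from one class, giving the 20 percent duplicate share mentioned above. Image identifiers stand in for whatever hash a real pipeline would compare.

```python
from collections import Counter


def class_shares(records):
    """Fraction of records carrying each label."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def drop_duplicates(records):
    """Keep the first record seen for each image identifier."""
    seen = set()
    unique = []
    for image_id, label in records:
        if image_id not in seen:
            seen.add(image_id)
            unique.append((image_id, label))
    return unique


# Hypothetical dataset: 8 distinct images, 2 of them stored twice.
records = [
    ("img1", "bridge"), ("img2", "bridge"), ("img3", "bridge"), ("img4", "bridge"),
    ("img5", "tram"), ("img6", "tram"), ("img7", "tram"), ("img8", "tram"),
    ("img1", "bridge"), ("img2", "bridge"),  # redundant copies
]

print(class_shares(records))                   # {'bridge': 0.6, 'tram': 0.4}
print(class_shares(drop_duplicates(records)))  # {'bridge': 0.5, 'tram': 0.5}
```

With the copies included, a model sees the "bridge" class 60 percent of the time although the underlying collection is balanced. The same copies can also land on both sides of a train/test split and inflate measured accuracy.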
The protocols being rolled out this week set a new baseline: any image ingested into a managed repository must be checked against a hash index before storage is allocated. Existing collections will be reviewed on a rolling schedule, with priority given to the largest and oldest holdings. Staff at both the Stadtarchiv and the cantonal archive on Winterthurerstrasse are receiving updated guidance this month.
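The check-before-store rule can be sketched as follows. This is a minimal illustration and not either archive's system. It uses a SHA-256 digest from Python's standard `hashlib`, which only catches byte-identical files; a production index would presumably hold perceptual hashes so that near-duplicates are caught as well. The `Repository` class, its method names and the record identifier format are all hypothetical.

```python
import hashlib


class Repository:
    """Toy repository that consults a hash index before storing a file."""

    def __init__(self):
        self._index = {}  # digest -> record id
        self._store = {}  # record id -> file contents

    def ingest(self, data: bytes):
        """Return (record_id, stored); stored is False for a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._index.get(digest)
        if existing is not None:
            # Already held: point at the existing record, allocate nothing.
            return existing, False
        record_id = f"rec-{len(self._store) + 1:06d}"
        self._store[record_id] = data
        self._index[digest] = record_id
        return record_id, True


repo = Repository()
print(repo.ingest(b"scan-a"))  # ('rec-000001', True)
print(repo.ingest(b"scan-a"))  # ('rec-000001', False)
print(repo.ingest(b"scan-b"))  # ('rec-000002', True)
```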
Residents or researchers who work regularly with the city's digital collections should expect some access interruptions to specific subcollections during the audit windows. The Stadtarchiv has posted updated access schedules on its website, and the ETH library system is directing queries to its central helpdesk at the Hauptbibliothek on Rämistrasse. The practical advice is straightforward: check before planning a research visit, and flag any suspected duplicate records through the standard submission form — both institutions have confirmed that user-reported duplicates will be fast-tracked into the current review cycle.