Zurich Digital Archives Push Forward on Duplicate Image Replacement This Week
ETH Zurich and city institutions are accelerating efforts to clean up redundant digital image libraries, with new automated tools now entering live testing.

A coordinated effort to identify and replace duplicate images across Zurich's major public digital archives moved into a new phase this week, with ETH Zurich's Scientific IT Services division confirming the rollout of automated deduplication software across three institutional repositories. The work, years in the making, now has a concrete operational timeline after testing began on Monday, June 30.
The push matters because digital storage costs across Swiss public institutions have climbed sharply. Redundant image files — identical or near-identical photographs stored multiple times under different filenames — account for a disproportionate share of that overhead. At ETH Zurich's image archive alone, preliminary internal audits conducted earlier this year identified tens of thousands of duplicate or near-duplicate entries across collections dating back to the 1990s digitisation projects. Replacing low-resolution duplicates with single high-quality masters frees storage, reduces retrieval confusion and cuts the licensing complications that arise when multiple copies of an image carry different metadata tags.
The new software, built on perceptual hashing algorithms, compares image fingerprints rather than file names or sizes, meaning it can catch duplicates even when images have been cropped, recompressed or renamed. ETH Zurich's Hönggerberg campus infrastructure team began live testing on a batch of roughly 80,000 images drawn from the university's publicly accessible research image database. Results from the first four days showed a provisional duplicate rate of around 14 percent in that sample, or roughly 11,200 images. If that figure holds across the full archive, it would translate into a significant reduction in stored data volume once replacements are completed.
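For readers curious how fingerprint comparison works in principle, the sketch below implements one of the simplest perceptual hashes, the "average hash", in plain Python. It is an illustration only: the article does not say which algorithm or library the ETH Zurich software uses, and the function names, the 8x8 fingerprint size and the `max_distance` threshold of 5 are all assumptions chosen for the example. Images are represented as grids of grayscale values rather than loaded from files.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging equal blocks of pixels.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a block is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two images as likely duplicates (hypothetical threshold)."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Because the fingerprint depends on coarse brightness patterns rather than on bytes or filenames, a resized or slightly brightened copy of an image produces the same or a nearly identical hash, while an unrelated image differs in many bits. This simple variant tolerates resizing and recompression better than heavy cropping; production tools generally use more robust hashes and, as the Zurich pilots do, treat a match as a flag for review, not as proof.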
The Stadt Zürich Stadtarchiv on Alfred-Escher-Strasse is running a parallel but smaller-scale pilot. Staff there have been manually verifying automated flags before any files are deleted or replaced, a precaution driven by the archive's mandate under the Stadtrats-approved records retention policy from 2021. The Stadtarchiv's digitisation unit is cross-referencing flagged duplicates against provenance records to ensure that no historically significant variant — an image taken from a slightly different angle on the same day, for instance — is lost in the cleanup.
Zentralbibliothek Zürich, on Zähringerplatz, is watching both pilots before committing its own collections to the same process. The library holds one of Switzerland's largest photographic heritage collections, and its digital librarians have expressed internal caution about fully automated replacement without human review checkpoints. No public-facing service changes at ZB are expected before late 2026.
Swiss data centre costs are not trivial. Commercial colocation in the Zurich region currently runs between CHF 800 and CHF 1,200 per rack unit per month depending on contract length and redundancy specifications, according to publicly available pricing from providers operating in the Glatttal corridor. For public institutions running archives on budget cycles approved by cantonal authorities, the calculation is direct: fewer duplicate files mean smaller storage footprints and lower annual renewal costs.
There is also a standards dimension. The Swiss federal government's eCH-0160 standard for archival file formats, updated in 2024, recommends single-master storage principles for image assets. Institutions that want to align with federal interoperability requirements — relevant for grant eligibility and cross-institutional data sharing — have an administrative incentive to complete deduplication work before the next compliance review cycle, which runs in the first quarter of 2027.
For researchers and members of the public who regularly access Zurich's digital image collections, the practical upshot should eventually be faster search results and cleaner metadata — fewer instances of the same photograph appearing multiple times in a results page under slightly different titles. ETH Zurich's Scientific IT Services team has indicated that user-facing changes to the public search interface on the university's image portal will not go live until internal deduplication is at least 80 percent complete, a threshold they project reaching by December 2026 at the current processing rate. The Stadtarchiv pilot, being manually supervised, will take longer; staff there have not set a public completion date. Anyone with questions about specific collections can contact the Stadtarchiv's reading room at its Alfred-Escher-Strasse location during regular opening hours, Tuesday through Saturday.
Published by The Daily Zurich