The Daily Zurich


Zurich's Digital Archives Push Forward on Duplicate Image Cleanup — Here's What Changed This Week

A coordinated effort across Zurich's public institutions to eliminate redundant digital images is reshaping how the city manages its visual heritage and municipal data infrastructure.

By Zurich News Desk · Published 4 July 2026, 8:57 pm


Photo: Natalia Sevruk on Pexels

Switzerland's largest city moved decisively this week on a long-running problem in its digital archives: tens of thousands of duplicate images clogging databases maintained by institutions from the Zentralbibliothek Zürich on Zähringerplatz to the Stadtarchiv on Neumarkt. Technicians confirmed on Thursday that a new automated deduplication pipeline had completed its first full pass across the municipal image repository, flagging more than 34,000 redundant files for review.

The timing matters. Zurich has been building toward a unified open-data platform, part of the broader Smart City Zurich programme, and duplicate image records have been one of the most persistent obstacles to clean, searchable public datasets. Every redundant photograph or scanned document occupies server space, slows retrieval, and can confuse automated cataloguing tools that assign metadata and copyright tags. With the city's IT directorate, Stadt Zürich Informatik, under pressure to consolidate legacy systems before a scheduled infrastructure migration in late 2026, the window to clean house is narrow.

What the Deduplication Drive Actually Involves

The process is less glamorous than it sounds. Staff at the Zentralbibliothek have been working since early June to cross-reference image hashes — essentially digital fingerprints — against holdings at the Stadtarchiv and the photographic collections managed by Zurich Tourism on Stampfenbachstrasse. Where identical hashes appear across two or more databases, the system flags the duplicate rather than deleting it automatically; a human archivist still reviews each case before any file is removed or merged.
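For readers curious how such a cross-reference might look in practice, the following is a minimal sketch, not the city's actual pipeline. It assumes a SHA-256 digest of the raw file bytes as the fingerprint; the collection names, file names, and the helper `flag_cross_collection_duplicates` are hypothetical stand-ins. Matching files are only queued for review, never deleted.

```python
import hashlib
from collections import defaultdict


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw file bytes (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


def flag_cross_collection_duplicates(collections):
    """Map each digest found in two or more collections to its entries.

    Nothing is deleted here: the result is a review queue for a human archivist.
    """
    seen = defaultdict(list)
    for collection, files in collections.items():
        for name, data in files.items():
            seen[fingerprint(data)].append((collection, name))
    return {
        digest: entries
        for digest, entries in seen.items()
        if len({collection for collection, _ in entries}) >= 2
    }


# Hypothetical in-memory stand-ins for three institutional databases.
holdings = {
    "zentralbibliothek": {"zb_0001.tif": b"limmat-bridge", "zb_0002.tif": b"grossmuenster"},
    "stadtarchiv": {"sa_17.tif": b"limmat-bridge", "sa_18.tif": b"rathaus"},
    "zurich_tourism": {"zt_a.jpg": b"grossmuenster-cropped"},
}

review_queue = flag_cross_collection_duplicates(holdings)
for digest, entries in review_queue.items():
    print(digest[:12], entries)
```

In this toy data only the two byte-identical "limmat-bridge" files are queued; the cropped tourism image has different bytes and so a different digest, which is why exact hashing catches only exact copies.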

That cautious approach reflects hard lessons. A 2023 pilot at the Stadtarchiv involving scanned maps of the Limmatquai district resulted in roughly 400 files being incorrectly marked as duplicates because separate scans of the same map, made decades apart and showing different states of conservation, shared near-identical pixel data. The error was caught before deletion, but it prompted a rule change: hashing alone is no longer sufficient for sign-off. Archivists now require a second-stage visual comparison for any file whose source material dates from before 1950.
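The article does not say which hashing scheme the pilot used. One plausible explanation, offered here purely as an assumption, is a perceptual hash, which deliberately maps similar-looking images to the same fingerprint. The sketch below uses a simple average hash on tiny hypothetical grayscale grids; the function names, the distance threshold, and the stage labels are illustrative, with only the 1950 cutoff taken from the rule described above.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def review_stage(hash_a, hash_b, source_year, max_distance=2, cutoff_year=1950):
    """Classify a pair of images (labels and threshold are hypothetical)."""
    if hamming_distance(hash_a, hash_b) > max_distance:
        return "distinct"
    if source_year < cutoff_year:
        return "visual-review"  # second-stage visual comparison required
    return "archivist-review"   # standard human review of the flagged pair


# Two hypothetical 4x4 grayscale scans of the same map, made decades apart.
earlier_scan = [
    [200, 190, 40, 30],
    [210, 180, 50, 20],
    [60, 40, 220, 230],
    [50, 30, 210, 240],
]
later_scan = [
    [190, 185, 45, 35],
    [205, 170, 60, 25],
    [70, 45, 215, 220],
    [55, 35, 200, 235],
]

hash_earlier = average_hash(earlier_scan)
hash_later = average_hash(later_scan)
print(hamming_distance(hash_earlier, hash_later))
print(review_stage(hash_earlier, hash_later, source_year=1890))
```

Every pixel value differs between the two grids, yet both reduce to the same bit pattern, so a pipeline relying on this kind of hash alone would treat them as one image. Routing older material to a visual comparison is one way to catch such cases.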

ETH Zürich's computer science department has been an informal technical partner in the project. Researchers from the university's Data Analytics Lab on Rämistrasse have contributed open-source image-comparison tools adapted from academic work on visual similarity detection. No formal contract between ETH and the city has been announced, but the collaboration has been acknowledged publicly in Stadt Zürich Informatik documentation circulated this spring.

Storage Costs and the Practical Stakes

Municipal cloud storage in Switzerland is not cheap. Enterprise-grade data hosting under Swiss-jurisdiction contracts — a requirement for sensitive public records — runs at a substantial premium compared with international alternatives. Industry benchmarks suggest Swiss public-sector organisations pay between CHF 0.04 and CHF 0.07 per gigabyte per month for compliant archival storage, meaning a repository of several hundred terabytes carries an annual bill comfortably into six figures.
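The six-figure estimate can be checked with simple arithmetic. The sketch below assumes a 300-terabyte repository (one reading of "several hundred terabytes", not a figure from the city) and decimal units of 1,000 gigabytes per terabyte; the per-gigabyte rates are the benchmark range quoted above.

```python
def annual_cost_chf(terabytes, chf_per_gb_month, gb_per_tb=1000):
    """Annual storage cost in CHF for a repository of the given size."""
    return terabytes * gb_per_tb * chf_per_gb_month * 12


# Assumed repository size; the article gives only "several hundred terabytes".
REPOSITORY_TB = 300

low = annual_cost_chf(REPOSITORY_TB, 0.04)
high = annual_cost_chf(REPOSITORY_TB, 0.07)
print(f"CHF {low:,.0f} to CHF {high:,.0f} per year")
```

Under those assumptions the annual bill falls between roughly CHF 144,000 and CHF 252,000, consistent with the six-figure range.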

The 34,000 files flagged this week represent only the first pass. Officials expect the review process to continue through August, with a second automated pass targeting image collections ingested before 2015 — the period before the city standardised its file-naming conventions. At current review rates, archivists estimate the full cleanup will take until October at the earliest.

For Zurich residents and researchers, the practical benefit is a faster, more reliable public image search through the city's online portal, which logged more than 280,000 search queries in 2025. Institutions outside the municipal umbrella — private galleries in the Kreis 5 arts district, for instance, or the design schools clustered around Limmatstrasse — have been watching the project closely. Several have expressed informal interest in applying the same deduplication pipeline to their own digital collections, according to notes from a June workshop hosted by the Zentralbibliothek.

The next formal progress update is scheduled for September, when Stadt Zürich Informatik is expected to publish a technical report on the first phase. Archivists say that anyone with queries about specific collections — particularly photographs of the Altstadt or Zürich's industrial waterfront — should submit requests directly to the Stadtarchiv before the second automated pass begins, to flag any records they believe may have been mislabelled.


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
