The Daily Zurich


Zurich's Digital Archives Push Forward on Duplicate Image Problem—Here's Where It Stands This Week

City institutions and ETH Zurich researchers are grappling with a surge of duplicated visual data clogging cultural and scientific repositories, and a new coordination effort is trying to fix it.

By Zurich News Desk · Published 4 July 2026, 8:28 pm


Thousands of duplicate images are sitting inside Zurich's public digital archives, and the institutions responsible for managing them are finally moving to do something about it. This week, a working group drawn from ETH Zurich's library division, the Zentralbibliothek Zürich on Zähringerplatz, and Stadt Zürich's own digitisation programme held its third coordination session since May, focusing on automated detection tools that can flag redundant files before they get ingested into long-term storage.

The issue is not trivial. When photograph collections, scientific data sets and institutional records are digitised at scale, duplicate files accumulate fast. Sometimes the same image is scanned twice at different resolutions; sometimes metadata errors register a single scan as multiple distinct objects. For archives already under pressure from budget constraints, every redundant file wastes server capacity, slows search interfaces and complicates the legal clearance process. For a city with Zurich's ambitions in digital public infrastructure, it matters that the problem is still getting worse.

What Changed This Week

The trigger for the latest push was a technical audit completed at the end of June by ETH Zurich's Scientific IT Services unit, based at the Hönggerberg campus. The audit examined roughly 1.2 million image files held across three connected institutional repositories and found that an estimated 8 to 12 percent of those files were either exact duplicates or near-duplicates differing only in compression or colour-profile settings. The figure is consistent with findings from comparable European digitisation projects, including one reported by the German National Library in 2024, which identified duplication rates above 9 percent in newly ingested photographic collections.
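For a sense of scale, 8 to 12 percent of roughly 1.2 million files works out to between about 96,000 and 144,000 redundant images. The audit's methods have not been published, so the following is only a minimal sketch of how the simpler half of the problem, exact duplicates, is commonly detected: by comparing a cryptographic digest of each file's bytes. The file names and byte strings are hypothetical stand-ins, and this approach would not catch near-duplicates that differ in compression or colour profile.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose bytes share the same SHA-256 digest."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests seen more than once indicate exact duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical in-memory stand-ins for scanned files (not real archive data).
sample_files = {
    "map_001.tif": b"scan-bytes-A",
    "map_001_copy.tif": b"scan-bytes-A",
    "map_002.tif": b"scan-bytes-B",
}
duplicate_groups = group_exact_duplicates(sample_files)

# Scale implied by the audit: 8 to 12 percent of roughly 1.2 million files.
TOTAL_FILES = 1_200_000
low_estimate = TOTAL_FILES * 8 // 100
high_estimate = TOTAL_FILES * 12 // 100
```

Byte-level hashing is cheap and produces no false matches in practice, which is why it is often run first, with fuzzier matching reserved for whatever remains.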

That audit result pushed the Zentralbibliothek, which holds city newspaper archives dating back to the eighteenth century, to accelerate its evaluation of perceptual hashing software—tools that generate a compact fingerprint for each image so that near-identical files can be matched even when pixel values differ slightly. A procurement decision is expected before the end of August 2026, with a budget envelope understood to be in the range of standard cantonal IT contracts, though no figure has been formally approved or announced publicly.
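The Zentralbibliothek has not said which software it is evaluating. As an illustration only, the sketch below shows average hashing, one of the simplest perceptual hashing schemes: each pixel contributes one bit depending on whether it is brighter than the image's mean, and two fingerprints are compared by counting the bits that differ. The tiny 4x4 grayscale grids are invented for the example; a real tool would first decode each image and shrink it to a small fixed size.

```python
def average_hash(pixels):
    """Return an integer fingerprint with one bit per pixel.

    A bit is 1 when the pixel is brighter than the mean of the grid.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


# Invented grayscale grids: dark left half, bright right half.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# Same picture with slightly different pixel values, as after recompression.
recompressed = [
    [12, 19, 198, 211],
    [14, 27, 204, 213],
    [13, 21, 203, 214],
    [17, 29, 207, 216],
]
# A different picture: dark top half, bright bottom half.
different = [
    [10, 15, 12, 18],
    [20, 25, 22, 28],
    [200, 205, 202, 208],
    [210, 215, 212, 218],
]

dist_near = hamming_distance(average_hash(original), average_hash(recompressed))
dist_far = hamming_distance(average_hash(original), average_hash(different))
```

Here the recompressed copy yields the same fingerprint as the original (distance 0), while the different picture differs in 8 of 16 bits. A detection tool would treat pairs below some distance cutoff as likely duplicates.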

ETH Zurich's library has been running a parallel pilot since April using open-source detection tools on its historical map collection, which is stored and accessible at the main library building on Rämistrasse. Preliminary results from that pilot were shared with the working group this week, and staff there have been cataloguing which categories of material generate the highest duplication rates. Hand-coloured maps scanned under different light conditions came up repeatedly as a problem category.

Why the Housing Crisis and Climate Agenda Add Urgency

There is a practical dimension that goes beyond library management. Zurich's acute housing shortage—Wohnungsnot—has forced municipal planners to rely more heavily on digitised architectural records and historical land-use imagery when evaluating development applications in built-up neighbourhoods like Albisrieden and Schwamendingen. Duplicated or misidentified image files in planning databases can slow those processes at exactly the moment when the city needs faster decision-making. The same pressure applies to climate-monitoring data collected through ETH Zurich's environmental sensing networks, where duplicate image records from drone surveys can distort analysis.

Swiss federal archiving standards, governed by the Bundesgesetz über die Archivierung, require institutions receiving federal funding to meet specified data-integrity benchmarks. A revision of associated technical guidelines is expected from Bern later this year, and Zurich institutions want their deduplication workflows in place before those updated standards take effect.

For researchers and civil servants who use these archives daily, whether pulling building permits, checking historical flood-plain images or verifying pharmaceutical trial documentation stored by university spin-offs near the Technopark on Technoparkstrasse, the practical advice for now is straightforward: flag suspected duplicates through each institution's existing catalogue feedback function rather than simply ignoring them. The working group has confirmed it is actively using those flags to calibrate detection thresholds. A public update on the deduplication project is scheduled for September 2026 at the Zentralbibliothek.
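The working group has not described how it calibrates those thresholds. One plausible approach, sketched here purely as an assumption, is to take pairs of images that users have confirmed as duplicates or non-duplicates, score each pair with a numeric distance from some detection tool, and pick the cutoff that classifies the most pairs correctly. The function name, the distance values and the labels below are all hypothetical.

```python
def pick_threshold(labelled_pairs, max_distance=64):
    """Return the distance cutoff that classifies the most pairs correctly.

    A pair is predicted to be a duplicate when its distance is at or
    below the cutoff. Ties go to the smallest cutoff.
    """
    best_cutoff, best_correct = 0, -1
    for cutoff in range(max_distance + 1):
        correct = sum(
            (distance <= cutoff) == is_duplicate
            for distance, is_duplicate in labelled_pairs
        )
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


# Hypothetical flagged pairs: (distance score, confirmed duplicate?)
flagged = [
    (0, True),
    (2, True),
    (5, True),
    (9, False),
    (14, False),
    (30, False),
]
cutoff = pick_threshold(flagged)
```

With these invented values the chosen cutoff is 5, the smallest value that separates every confirmed duplicate from every confirmed non-duplicate. More user flags give a steadier estimate, which is one reason reporting suspected duplicates is useful.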

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
