The Daily Zurich


Zurich's Digital Archives Are Drowning in Duplicate Images — and Officials Say the Fix Is Overdue

From city hall to ETH Zurich, administrators and archivists are calling for coordinated action on a sprawling problem that wastes storage, distorts search results, and risks burying genuine historical records.

By Zurich News Desk · Published 4 July 2026, 8:44 pm


Photo: Mâide Arslan on Pexels

Zurich's public institutions are sitting on millions of redundant digital image files — duplicates stacked across servers at city departments, cantonal archives, and university libraries — and the people responsible for managing those collections say the situation has reached a breaking point. The problem, long treated as a low-priority IT headache, is now drawing attention from archivists, data managers, and civic technology specialists who argue that the unchecked accumulation is costing real money and undermining the reliability of public record systems.

The timing matters. Across Switzerland, the 2023 Federal Archives Act revision set new benchmarks for long-term digital preservation, pushing cantonal institutions to demonstrate that their holdings are clean, deduplicated, and retrievable. Zurich, as the country's most populous canton and home to two of its flagship research institutions, faces particular pressure to show it can meet those standards. Several institutions are now midway through digitisation drives that, without a deduplication strategy, are projected to compound the underlying problem rather than solve it.

What the Experts Are Saying

At ETH Zurich, whose library on Rämistrasse holds one of the largest scientific image repositories in the German-speaking world, digital preservation specialists have been publicly arguing since at least early 2025 that automated hash-based deduplication tools need to be embedded into intake workflows rather than applied retroactively. The distinction is significant: retrospective cleaning of a collection that already runs to tens of terabytes is orders of magnitude more expensive than preventing duplicates at the point of ingestion.
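To illustrate what deduplication at the point of ingestion can look like, here is a minimal Python sketch. It is not any institution's actual workflow: the `IntakeIndex` class, its in-memory dictionary, and the file names are hypothetical, and SHA-256 is assumed as the content hash. The idea is simply that each incoming file is hashed before it is stored, and a byte-identical copy is flagged instead of being written a second time.

```python
import hashlib


class IntakeIndex:
    """Hypothetical in-memory index of content hashes already in the archive."""

    def __init__(self):
        # Maps SHA-256 digest -> identifier of the first file seen with that content.
        self._seen = {}

    def ingest(self, identifier, data):
        """Register a file at intake.

        Returns None if the content is new, or the identifier of the
        existing file if the bytes are an exact duplicate.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing  # exact duplicate: do not store again
        self._seen[digest] = identifier
        return None


# Usage with made-up file names and contents.
index = IntakeIndex()
first = index.ingest("scan_0001.tif", b"image bytes")
second = index.ingest("scan_0001_copy.tif", b"image bytes")
print(first, second)
```

A check like this only catches byte-for-byte copies. A file that has been resized or re-saved in another format produces a different digest, which is the gap perceptual hashing is meant to fill.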

The University of Zurich's main library on Winterthurerstrasse has been piloting a content-aware scanning system since the spring semester of 2026, applying perceptual hashing — a technique that detects near-identical images even when file names or metadata differ — to its digitised photograph collections. Librarians involved in the project have described the early results in internal presentations as striking: preliminary scans of one digitised newspaper archive returned a duplication rate that, by their own accounting, exceeded 30 percent of total stored image files. That figure has not been published officially, and The Daily Zurich could not independently verify it, but it aligns with ranges cited in published European digital library research.
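For readers curious how perceptual hashing differs from an ordinary file checksum, the sketch below shows one simple variant, often called an average hash. It is an illustration only and is not the system the library is piloting, whose details have not been published. The function names, the 8x8 grid, and the distance threshold of 5 bits are assumptions, and the image is represented as a plain grid of grayscale values rather than a real image file.

```python
def average_hash(pixels, size=8):
    """Compute a 64-bit average hash from a 2D list of grayscale values (0-255).

    Assumption: image height and width are multiples of `size`.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // size, width // size

    # Shrink the image to a size x size grid by averaging each block.
    cells = []
    for r in range(size):
        for c in range(size):
            block = [
                pixels[y][x]
                for y in range(r * block_h, (r + 1) * block_h)
                for x in range(c * block_w, (c + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))

    # Each bit records whether a cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(pixels_a, pixels_b, threshold=5):
    """Treat two images as near-duplicates if their hashes differ in few bits.

    The threshold of 5 is an illustrative assumption, not a standard value.
    """
    return hamming_distance(average_hash(pixels_a), average_hash(pixels_b)) <= threshold


# A synthetic 16x16 gradient and a slightly brightened copy of it.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brightened = [[value + 5 for value in row] for row in original]
print(is_near_duplicate(original, brightened))
```

Because the hash depends on relative brightness patterns rather than exact bytes, a uniformly brightened copy still matches the original, while a file name or metadata change has no effect at all.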

The city's own Stadtarchiv Zürich, housed near Neumarkt in the old town, is subject to the same pressures. The archive's digitisation budget for 2025-2026 was set at CHF 1.4 million, according to the city's published budget documents — a figure that specialists note would stretch significantly further if duplicates currently occupying server capacity were systematically removed before new material is ingested.

Coordination Remains the Hard Part

The core difficulty is not technical. Deduplication software is mature, widely available, and relatively affordable. The harder problem, as data governance professionals in Zurich and elsewhere have consistently noted, is coordination between institutions that operate on different legal mandates, different procurement cycles, and different definitions of what constitutes a true duplicate versus a legitimately distinct version of an image.

Stadtentwicklung Zürich, the city's urban development office in Amtshaus IV near Stadthaus, is among the departments that have begun internal conversations about shared metadata standards, which specialists say are a prerequisite for any cross-institutional deduplication effort to work. Without a common standard, one organisation's original file is another's unrecognised copy.

The Swiss Federal Archives in Bern has published guidance encouraging cantonal institutions to adopt the PREMIS metadata standard for preservation records, though uptake across Zurich's fragmented public sector has been uneven. Cantonal IT representatives are expected to present a coordination proposal to the Zurich cantonal government before the end of the third quarter of 2026, according to the legislative calendar published by the Kantonsrat.

For institutions still building their strategies, specialists recommend three immediate steps: conduct a baseline audit using open-source perceptual hashing tools before the next ingestion cycle begins; establish a shared vocabulary distinguishing archival originals from working copies; and designate a named data steward — not just an IT department — with authority to enforce deduplication protocols. The institutions that skip the audit stage, archivists warn, tend to discover the true scale of the problem only after spending money they could have saved.


About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
