The Daily Zurich

The Numbers Behind Zurich's Duplicate Image Problem: What the Data Reveals

City archives, property listings, and public databases are drowning in duplicate images — and the cost of cleaning them up is higher than most institutions want to admit.

By Zurich News Desk · Published 4 July 2026, 8:43 pm

Zurich's public and private digital archives contain millions of redundant image files, and the institutions managing them are only beginning to quantify the scale of the problem. A growing body of technical audits across Swiss data infrastructure points to duplication rates between 18 and 34 percent in unmanaged image repositories. In other words, roughly one in four stored images is a copy that serves no functional purpose yet consumes server space, energy, and staff time.

The issue has moved from a niche IT concern to a financial one. Storage costs for Swiss institutional databases have risen sharply alongside energy prices, and Zurich-based organisations running on-premises server infrastructure now pay an estimated 0.12 to 0.18 Swiss francs per gigabyte per month for managed archival storage, according to pricing benchmarks published by Swiss hosting associations in early 2026. For a mid-size archive holding 40 terabytes of image data, that translates to between roughly CHF 57,600 and CHF 86,400 per year, a significant fraction of which funds the storage of files nobody needs.
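The arithmetic behind that estimate can be sketched in a few lines of Python. This is an illustration only: it assumes decimal units (1 TB = 1,000 GB) and a flat per-gigabyte rate, and the function names are hypothetical rather than taken from any institution's tooling.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1 TB = 1,000 GB


def annual_storage_cost_chf(terabytes: float, chf_per_gb_month: float) -> float:
    """Yearly cost of holding `terabytes` of data at a flat monthly rate per GB."""
    return terabytes * GB_PER_TB * chf_per_gb_month * 12


def duplicate_cost_chf(annual_cost: float, duplicate_share: float) -> float:
    """Part of the yearly bill that pays for redundant copies."""
    return annual_cost * duplicate_share


# Figures quoted in the article: 40 TB, CHF 0.12 to 0.18 per GB per month,
# duplication rates between 18 and 34 percent.
low_cost = annual_storage_cost_chf(40, 0.12)
high_cost = annual_storage_cost_chf(40, 0.18)
low_waste = duplicate_cost_chf(low_cost, 0.18)
high_waste = duplicate_cost_chf(high_cost, 0.34)

print(f"Annual storage bill: CHF {low_cost:,.0f} to CHF {high_cost:,.0f}")
print(f"Of which duplicates: CHF {low_waste:,.0f} to CHF {high_waste:,.0f}")
```

Combining the lowest rate with the lowest duplication share, and the highest with the highest, gives the widest plausible range; a real archive would sit somewhere in between.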

Where the Problem Concentrates

The problem is particularly visible in three sectors that define Zurich's economy: real estate, pharma documentation, and public records. On the housing side, the housing shortage (Wohnungsnot) has pushed property turnover to record levels in districts like Kreis 4 and Kreis 5, where landlords and brokers refresh listings constantly. Each refresh cycle on platforms that do not auto-detect duplicates generates new copies of floor plans and interior photographs. Homegate, the Swiss property listings platform headquartered in Schlieren, west of Zurich, processes tens of thousands of listings monthly, and industry estimates suggest that property image duplication accounts for a measurable share of its database overhead, though the company has not published a specific figure.

ETH Zurich, consistently ranked among the world's top ten technical universities, has made digital data governance a research focus. The institute's Data Archive, based on Rämistrasse, runs periodic deduplication sweeps across its scientific image repositories. In a 2024 internal review made available to partner institutions, ETH found that research image folders accumulated duplicates at a rate of roughly 11 percent annually without active management, a figure that compounds quickly over multi-year projects involving high-resolution microscopy or satellite imaging data.

The Stadtarchiv Zürich, located on Neumarkt, faces a comparable challenge in the humanities domain. Historical digitisation projects carried out between 2018 and 2023 produced overlapping image sets when multiple scanning runs were commissioned without a unified metadata standard. Staff there have been working through a retrospective deduplication process since late 2024, using perceptual hashing tools that compare images by visual content rather than file name alone.
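To illustrate what comparing images by visual content means, here is a minimal sketch of one simple perceptual hash, the average hash. The article does not say which algorithm or tool the Stadtarchiv uses, so this is an assumption chosen for clarity. Images are represented as plain grids of grayscale values so the example runs without any imaging library; all names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def shrink(img: Gray, size: int = 8) -> Gray:
    """Downscale to size x size by averaging the pixels in each grid cell."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            cell = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cell) // len(cell))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """One bit per cell of the shrunken image: 1 if brighter than the mean."""
    pixels = [p for row in shrink(img, size) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for scans: a left-to-right gradient, the same picture
# rescanned at double resolution and slightly brighter, and a different
# picture (a top-to-bottom gradient).
original = [[x * 8 for x in range(32)] for _ in range(32)]
rescan = [[(x // 2) * 8 + 3 for x in range(64)] for _ in range(64)]
different = [[y * 8 for _ in range(32)] for y in range(32)]

same_distance = hamming(average_hash(original), average_hash(rescan))
other_distance = hamming(average_hash(original), average_hash(different))
print(same_distance, other_distance)
```

The rescan has a different size and different pixel values, so a file name or byte-level checksum would treat it as a new image, while the hash distance stays small. Production tools typically use more robust variants and a tuned distance threshold.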

What Deduplication Actually Costs — and Saves

The correction process is not cheap. Automated deduplication software licences for enterprise use run between CHF 4,000 and CHF 22,000 per year depending on repository size, with open-source alternatives requiring significant internal developer time to configure and maintain. A remediation project at a Zurich cantonal agency completed in March 2026 recovered approximately 6.8 terabytes of storage across its image holdings. At the benchmark rates of CHF 0.12 to 0.18 per gigabyte per month, that is worth roughly CHF 800 to CHF 1,200 in storage billing every month.
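Whether a licence pays for itself depends on which end of each range applies. The sketch below sets the storage billing avoided by a 6.8-terabyte recovery against the quoted licence prices. It assumes decimal units, the benchmark rates of CHF 0.12 to 0.18 per gigabyte per month, and no other costs or savings; the agency's actual rates and licence are not stated in the article, and the function names are hypothetical.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1 TB = 1,000 GB


def annual_saving_chf(recovered_tb: float, chf_per_gb_month: float) -> float:
    """Yearly storage billing avoided by freeing `recovered_tb` terabytes."""
    return recovered_tb * GB_PER_TB * chf_per_gb_month * 12


def net_benefit_chf(recovered_tb: float, chf_per_gb_month: float,
                    licence_chf_per_year: float) -> float:
    """Yearly saving minus the yearly licence fee (negative means a net cost)."""
    return annual_saving_chf(recovered_tb, chf_per_gb_month) - licence_chf_per_year


# Best case: highest storage rate, cheapest licence.
best_case = net_benefit_chf(6.8, 0.18, 4_000)
# Worst case: lowest storage rate, most expensive licence.
worst_case = net_benefit_chf(6.8, 0.12, 22_000)

print(f"Best case:  CHF {best_case:,.0f} per year")
print(f"Worst case: CHF {worst_case:,.0f} per year")
```

Under these assumptions the result ranges from a clear net saving to a net cost, and the calculation leaves out staff time on both sides of the ledger.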

The broader Swiss digital economy context matters here. The country's Federal Strategy for Digital Switzerland, updated in 2024, includes data quality and efficiency targets for public-sector digital infrastructure. Storing and managing redundant data sits awkwardly against those targets, and cantonal IT departments in Zurich have been asked to report storage efficiency metrics as part of annual ICT audits beginning this year.

For organisations looking to get ahead of the problem, the practical steps are well-established: implement perceptual hashing at the point of ingest rather than retrospectively, standardise metadata schemas before launching digitisation campaigns, and audit existing repositories in segments of no more than 500,000 files at a time to keep processing manageable. Zurich's Digitaltag, held annually in October, has featured sessions on exactly these tools in recent years — a sign that the conversation has moved from specialists to decision-makers. The data is now clear enough that ignoring it is a choice, not an oversight.
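Two of those steps, checking at the point of ingest and auditing in bounded segments, can be sketched as follows. This is a minimal illustration under stated assumptions: the `IngestIndex` class, the distance threshold of 5 bits, and the file names are hypothetical, and the perceptual hashes are taken as precomputed integers. A linear scan over an in-memory dictionary would be too slow for millions of images; a real system would use a purpose-built index.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional

SEGMENT_SIZE = 500_000  # upper bound per audit segment suggested in the text


def segments(items: Iterable[str], size: int = SEGMENT_SIZE) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class IngestIndex:
    """Hypothetical in-memory record of perceptual hashes already stored."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance  # assumption: tolerated differing bits
        self._seen: Dict[int, str] = {}

    def check_and_add(self, path: str, phash: int) -> Optional[str]:
        """Return the path of an earlier near-identical image, else store and return None."""
        for known_hash, known_path in self._seen.items():
            if bin(known_hash ^ phash).count("1") <= self.max_distance:
                return known_path
        self._seen[phash] = path
        return None


# Ingest-time check with made-up 16-bit hashes for three uploads.
index = IngestIndex()
first = index.check_and_add("plan_a.jpg", 0b1111000011110000)
copy = index.check_and_add("plan_a_copy.jpg", 0b1111000011110001)
other = index.check_and_add("kitchen.jpg", 0b0000111100001111)
print(first, copy, other)

# Segmenting: five files in batches of two, standing in for 500,000 at a time.
batch_sizes = [len(b) for b in segments((f"img_{i}.tif" for i in range(5)), size=2)]
print(batch_sizes)
```

Rejecting or flagging the copy at upload time means it never reaches storage, which is the difference between ingest-time checks and a retrospective sweep.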

About this article

Published by The Daily Zurich

This article was produced by The Daily Zurich editorial desk and covers news in Zurich. See our editorial standards for how we use AI.
