Why Bad Data Quality Is a GDPR Risk

Most GDPR compliance programmes focus on consent and documentation. Fewer address the data quality problems that quietly undermine both. Here is why they should.


The standard narrative around GDPR compliance runs something like this: define your lawful bases, document your processing activities in the record of processing activities (ROPA), deploy consent management, appoint a data protection officer (DPO) if required, and run a data protection impact assessment (DPIA) for high-risk processing. That checklist is not wrong. But it leaves out a class of problem that sits underneath every item on it, and that problem is data quality.

Poor data quality is not just an operational inconvenience. Under the GDPR, it is a compliance failure with a specific statutory basis. Art. 5(1)(d) requires that personal data be accurate and, where necessary, kept up to date — and that every reasonable step is taken to erase or rectify inaccurate data without delay. That is a data quality obligation, stated plainly in the Regulation itself.

The Accuracy Principle Is Not a Checkbox

Many organisations treat accuracy as a point-in-time concern: they clean their data before a GDPR audit, or when a data subject makes a correction request under Art. 16. But Art. 5(1)(d) establishes an ongoing duty. An email address field populated three years ago from a form submission that has since been superseded is an accuracy problem that exists right now, regardless of when it was last reviewed.

The practical consequence is that data quality monitoring is not optional infrastructure. It is the mechanism by which an organisation fulfils a named obligation under the Regulation. A CRM that holds stale customer records, a data warehouse column where email format validation was turned off during a pipeline migration, a behavioural analytics table that conflates records because a join key was changed upstream without downstream notification — all of these are potential Art. 5(1)(d) failures, and none of them are visible to a legal team relying on documentation alone.
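As a rough illustration of what an ongoing accuracy check might look like, here is a minimal Python sketch. The `CustomerRecord` shape, the `last_verified` field, the one-year threshold, and the deliberately loose email pattern are all assumptions made for the example, not requirements taken from the Regulation.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Deliberately loose pattern: it only catches obviously malformed values.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class CustomerRecord:
    customer_id: str
    email: str
    last_verified: date  # hypothetical field: when the value was last confirmed


def accuracy_findings(records, today, max_age=timedelta(days=365)):
    """Return (customer_id, reason) pairs for records that need review."""
    findings = []
    for rec in records:
        if not EMAIL_PATTERN.match(rec.email):
            findings.append((rec.customer_id, "malformed email"))
        if today - rec.last_verified > max_age:
            findings.append((rec.customer_id, "not verified within max_age"))
    return findings


records = [
    CustomerRecord("c1", "ana@example.com", date(2024, 11, 1)),
    CustomerRecord("c2", "bob-at-example.com", date(2024, 10, 1)),
    CustomerRecord("c3", "eve@example.com", date(2021, 3, 15)),
]
findings = accuracy_findings(records, today=date(2025, 1, 1))
```

Run on a schedule rather than before an audit, a check of this kind turns the accuracy principle into something that produces a reviewable list of findings. What counts as "stale" depends on the processing purpose and is a decision for the DPO, not for the script.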

Where Quality Gaps Undermine Your Other GDPR Obligations

The accuracy principle is only the most direct connection between data quality and compliance. There are at least three others that DPOs frequently underestimate.

Storage limitation and retention enforcement

Art. 5(1)(e) requires that personal data not be kept longer than necessary for its processing purpose. Most organisations have a documented retention policy. Far fewer have a data pipeline that reliably identifies which records have passed their retention window — particularly in data warehouses where historical data is kept indefinitely for analytical purposes without a corresponding exception justification. A data quality monitoring layer that tracks record age against purpose classification is what bridges the policy and the reality.
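A minimal sketch of tracking record age against purpose classification is shown here. The purposes, the retention periods in `RETENTION`, and the record fields are hypothetical placeholders; the real values come from your own retention policy.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: purpose classification -> maximum age.
RETENTION = {
    "billing": timedelta(days=365 * 10),
    "marketing": timedelta(days=365 * 2),
    "support": timedelta(days=365 * 3),
}


def retention_status(records, today):
    """Split record ids into those past retention and those with no known purpose."""
    overdue, unclassified = [], []
    for rec in records:
        limit = RETENTION.get(rec["purpose"])
        if limit is None:
            # No purpose means no retention period can be applied at all.
            unclassified.append(rec["id"])
        elif today - rec["collected_on"] > limit:
            overdue.append(rec["id"])
    return overdue, unclassified


rows = [
    {"id": 1, "purpose": "marketing", "collected_on": date(2021, 5, 1)},
    {"id": 2, "purpose": "billing", "collected_on": date(2021, 5, 1)},
    {"id": 3, "purpose": None, "collected_on": date(2019, 1, 1)},
]
overdue, unclassified = retention_status(rows, today=date(2025, 1, 1))
```

Unclassified records are reported separately on purpose: a record with no purpose tag cannot be matched to any retention period, which is itself a finding worth surfacing.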

Data subject access requests

Under Art. 15, a data subject is entitled to a copy of all personal data an organisation holds about them. If that data is siloed across systems that were never mapped, or if a database column changed type and records were corrupted in the conversion, the response to the data subject request (DSR) is incomplete, and potentially unlawfully so. We see this pattern regularly when working with growing companies whose data estate has expanded faster than their documentation.

Consider a mid-size SaaS business operating out of Zurich in late 2024. Their engineering team had migrated their CRM integration to a new message queue, and in doing so had split a user identifier field into two: a legacy numeric ID and a new UUID. The system that processed DSR queries was never updated to search on both fields. The result was that access requests looked up by the old identifier returned only a subset of the data held: a partial disclosure that was technically a violation of Art. 15(1).
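One way to guard against this failure mode is to resolve every identifier linked to a data subject before querying each system. The sketch below assumes a hypothetical `IDENTITY_LINKS` table maintained as part of the migration and a hypothetical `user_ref` column in each system; it is not a description of what the company in the example actually built.

```python
EXAMPLE_UUID = "7f1c2d9e-5b1a-4c3e-9d2f-0a1b2c3d4e5f"

# Hypothetical link table: which legacy ID and UUID belong to the same person.
IDENTITY_LINKS = [
    {"legacy_id": "1042", "uuid": EXAMPLE_UUID},
]


def linked_identifiers(identifier):
    """Return every identifier known to refer to the same data subject."""
    for link in IDENTITY_LINKS:
        if identifier in link.values():
            return set(link.values())
    return {identifier}


def collect_for_dsr(identifier, systems):
    """Gather rows from every system that match any linked identifier."""
    ids = linked_identifiers(identifier)
    found = {}
    for system_name, rows in systems.items():
        matches = [row for row in rows if row["user_ref"] in ids]
        if matches:
            found[system_name] = matches
    return found


systems = {
    "crm": [{"user_ref": "1042", "email": "ana@example.com"}],
    "event_queue": [{"user_ref": EXAMPLE_UUID, "event": "login"}],
}
response = collect_for_dsr("1042", systems)
```

Here a request keyed on the legacy ID still reaches the rows stored under the UUID. The harder part in practice is keeping the link table complete, which is a data quality task in its own right.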

DPIA accuracy assumptions

A Data Protection Impact Assessment under Art. 35 requires, among other things, an assessment of the necessity and proportionality of the processing and the risks to data subjects. That risk assessment is only as good as the data landscape it describes. If your DPIA was written against a data model that has since drifted — new fields added, old fields repurposed, classification tags removed by an automated schema tool — then the residual risk assessment is no longer accurate. A DPIA prepared in good faith a year ago may be materially incorrect today.
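Drift of this kind can be detected by comparing the schema a DPIA was written against with the schema in production. The following sketch assumes both are available as simple snapshots mapping each column to a (data type, classification) pair; the column names and the snapshot format are illustrative assumptions.

```python
def schema_drift(baseline, current):
    """Compare two {column: (data_type, classification)} snapshots."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(c for c in shared if baseline[c] != current[c]),
    }


# Snapshot taken when the DPIA was written (hypothetical columns).
dpia_baseline = {
    "user_id": ("integer", "identifier"),
    "email": ("varchar", "personal"),
    "notes": ("text", "personal"),
}
# Snapshot of the same table today.
current_schema = {
    "user_id": ("integer", "identifier"),
    "user_uuid": ("uuid", None),   # new field, never classified
    "email": ("varchar", None),    # classification tag dropped
}
drift = schema_drift(dpia_baseline, current_schema)
```

Any non-empty entry in the result is a prompt to revisit the DPIA, not proof that it is wrong. The value lies in the DPO learning about the change at all.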

Why Compliance Teams Often Cannot See This

The honest reason data quality problems escape compliance scrutiny is structural: DPOs are not typically embedded in the data engineering workflow. They document processing activities based on interviews with product and engineering teams, supplemented by system architecture diagrams that may or may not reflect the current state of the infrastructure. Schema changes happen in pull requests. Retention logic lives in pipeline code. Data classification changes are made in a metadata catalogue that no one told the legal team existed.

We are not saying that DPOs should become data engineers. That is not a realistic or necessary expectation. What we are saying is that a compliance programme that has no feedback loop from the data engineering layer is operating on a model of the data estate that will increasingly diverge from reality over time.

The connection between data observability tooling (pipeline monitors, schema change alerts, data freshness checks, column-level lineage) and compliance obligations is not merely an interesting engineering concept. It is the practical mechanism by which Art. 5 obligations are met in a production data environment.

A Note on Proportionality

None of this requires that every organisation deploy a full-scale observability platform. A 40-person company with three data sources and a weekly reporting pipeline faces a different engineering reality than a 300-person company with 15 systems, a streaming ETL layer, and a data science team producing derived personal data from multiple sources. The principle is the same; the implementation should match the scale.

What does not scale well is the manual alternative: periodic data audits, spreadsheet-based ROPA updates that rely on engineers remembering to tell the DPO about schema changes, and reactive data quality remediation triggered only by visible failures. Growth makes manual processes unsustainable, and in the context of GDPR, unsustainable eventually means non-compliant.

Starting Points for DPOs

If your organisation does not yet have data quality monitoring in place, the compliance case for starting is straightforward. The practical starting point is not a large infrastructure project — it is a conversation between the DPO and the data engineering team about where personal data flows and what currently alerts when those flows break or change.

Three questions worth asking in that conversation: Does anyone get notified when a column that holds personal data changes its data type? Is there a record of when a data subject's record was last validated against a primary source? If a DSR were received today, could you enumerate every system holding data about that individual and return a complete, accurate dataset within the one-month window Art. 12(3) specifies?

If the honest answer to any of those is no, or "probably not," you have identified a compliance gap that sits upstream of documentation. That is where to start.