Overview
In the wake of the public discussion around generative AI systems such as ChatGPT, the European Data Protection Board (EDPB) initiated a coordinated review by national supervisory authorities. The aim was to clarify key data protection questions relating to Large Language Models (LLMs).
The resulting opinion is particularly relevant for organisations, as it sets concrete benchmarks for:
- Web scraping
- Training data processing
- Data subject rights
- Accuracy of outputs
- Age verification
This article summarises the key points and derives practical implications for AI projects.
1. Web Scraping as a Training Source
A central topic was the use of publicly accessible data from the internet.
Key Statement
Publicly accessible data remains personal data within the meaning of the GDPR.
This means:
- A legal basis is required
- Transparency obligations apply
- Data subject rights must be ensured
Public Does Not Mean Freely Usable
Mere public accessibility does not justify unrestricted further processing of personal data.
2. Legal Basis for Model Training
The EDPB emphasises:
- Each training phase constitutes a processing operation
- Legitimate interest requires careful balancing
- Sensitive data increases the requirements
Particularly for large-scale web scraping, the following must be assessed:
- Reasonable expectations of data subjects
- Intensity of interference
- Safeguards in place
3. Data Subject Rights with LLMs
A central audit point was how access, erasure and rectification rights can be implemented in practice.
Challenge
LLMs do not store data in a traditional database format but as statistical representations.
The EDPB nevertheless requires:
- Processes for reviewing individual requests
- Mechanisms for data erasure or suppression
- Transparent communication about technical limitations
Technical Limitations
Technical complexity does not automatically exempt controllers from their legal obligations.
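Because personal data cannot simply be deleted from trained model weights, one safeguard discussed in practice is suppression at the output layer. The following is a minimal, illustrative sketch of such a mechanism; the class and method names are hypothetical, not part of any EDPB requirement or specific product.

```python
import re

class OutputSuppressionFilter:
    """Illustrative output-layer safeguard: names of data subjects who have
    exercised erasure rights are redacted from model outputs before delivery.
    Note: this suppresses outputs only; it does not remove anything from the
    model weights, and that limitation should be communicated transparently."""

    def __init__(self):
        self._suppressed: set[str] = set()

    def add_erasure_request(self, name: str) -> None:
        # Record the request so the handling can be documented and reviewed.
        self._suppressed.add(name)

    def apply(self, model_output: str) -> str:
        # Redact each suppressed name, ignoring letter case.
        for name in self._suppressed:
            pattern = re.compile(re.escape(name), re.IGNORECASE)
            model_output = pattern.sub("[redacted]", model_output)
        return model_output

f = OutputSuppressionFilter()
f.add_erasure_request("Jane Doe")
print(f.apply("According to Jane Doe, ..."))  # → "According to [redacted], ..."
```

A real deployment would need far more robust matching (name variants, context, pseudonyms), which is exactly why the EDPB stresses transparent communication about technical limitations.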
4. Accuracy of AI Outputs
Generative AI can:
- Generate false statements
- Falsely represent individuals
- Reproduce incorrect facts
The EDPB points out that controllers must:
- Take appropriate measures to ensure accuracy
- Provide clear notices about possible errors
- Offer correction mechanisms
5. Age Verification
Particular attention was given to:
- Protection of minors
- Access restrictions
- Age-appropriate usage
For AI systems with broad accessibility:
- Appropriate age verification must be implemented
- Risks to children must be minimised
6. Delineation of Responsibilities
The EDPB emphasises precise role clarification:
| Role | Description |
|---|---|
| Model provider | Training and model provision |
| Integrator | Integration into own product |
| Deployer | Concrete deployment |
Each role carries its own responsibility.
7. Transparency Requirements
For generative AI, the following are particularly required:
- Clear information about AI usage
- Description of data categories
- Notice of potential error-proneness
- Explanation of logic at an understandable level
Practical Implications for Organisations
1. Review Training Data
- Document provenance
- Identify sensitive data
- Verify legal basis
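The three review steps above can be captured in a simple provenance record per data source. This is a hypothetical sketch of what such documentation might look like in code; the field names and example values are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TrainingDataRecord:
    """Hypothetical provenance record for one training data source."""
    source_url: str
    collected_on: date
    legal_basis: str                           # e.g. "legitimate interest (Art. 6(1)(f) GDPR)"
    contains_special_categories: bool = False  # Art. 9 data raises the requirements
    safeguards: list[str] = field(default_factory=list)

# Example entry (all values illustrative):
record = TrainingDataRecord(
    source_url="https://example.org/forum",
    collected_on=date(2024, 3, 1),
    legal_basis="legitimate interest (Art. 6(1)(f) GDPR)",
    contains_special_categories=False,
    safeguards=["robots.txt respected", "pseudonymisation before training"],
)
```

Keeping such records per source makes the legitimate-interest balancing and any supervisory-authority inquiry far easier to document.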
2. Operationalise Data Subject Rights
- Processes for handling access requests
- Review erasure mechanisms
- Define escalation paths
3. Enhance Transparency
- Update privacy notices
- Explicitly identify AI usage
- Make error-proneness transparent
4. Consider Protection of Minors
- Review age verification
- Assess usage scenarios
Connection to the EU AI Act
The EDPB opinion complements the AI Act's:
- Transparency obligations
- Risk management requirements
- Documentation obligations
Particularly for general-purpose AI (GPAI) models, parallel requirements under both frameworks exist.
Common Misconceptions
| Assumption | Reality |
|---|---|
| "Web scraping is permitted" | Only with a legal basis |
| "LLMs do not store personal data" | Inferences can be personal data |
| "Inaccurate outputs are technically unavoidable" | Organisational obligations exist |
Strategic Recommendation
Organisations should not view generative AI in isolation but rather:
- As a data protection-relevant overall system
- With a clear governance structure
- With documented risk analysis
Proactive transparency significantly reduces regulatory risk.
Need help implementing?
Work with Creativate AI Studio to design, validate and implement AI systems — technically sound, compliant and production-ready.
Need legal clarity?
For specific legal questions on the AI Act and GDPR, specialized legal advice focusing on AI regulation, data protection and compliance structures is available.
Independent legal advice. No automated legal information. The platform ai-playbook.eu does not provide legal advice.
Next Steps
- Review training data sources and legal basis.
- Implement data subject rights processes.
- Revise transparency information.
- Consider protection of minors.
- Integrate EDPB requirements into your AI governance.