AI Data Privacy: What You Need to Know
A lawyer used ChatGPT to help write a brief. It invented fake case citations. He submitted them to court without checking. He was sanctioned.
That story made headlines because it was embarrassing. But there's a scarier story that didn't get as much attention: law firms feeding confidential client information into AI tools and having no idea where that data went.
When you use AI, you're giving it data. Sometimes that data is sensitive. And the privacy implications are real, even if they're not obvious.
Where Does Your Data Go?
When you type something into ChatGPT, Claude, or any other AI tool, that text goes to a server somewhere. What happens next depends entirely on the provider's policies.
Some things that might happen:
- Training - Your inputs might be used to train future versions of the model. That means your confidential information becomes part of a system that others use.
- Storage - Your conversations might be stored; how long and for what purpose varies by provider.
- Human review - Some providers have humans review conversations for quality control or safety. Your private inputs might be read by employees.
- Third parties - Data might be shared with partners, subprocessors, or other entities.
None of this is necessarily bad. But you need to know it's happening.
What the Major Providers Actually Do
Here's the current landscape (policies change, so always check the current terms):
OpenAI (ChatGPT)
Free and Plus tiers: Your conversations may be used for training unless you opt out in settings. ChatGPT Enterprise and API usage have stronger privacy by default.
Anthropic (Claude)
Similar structure. Free tier has fewer guarantees than paid business tiers. Enterprise agreements can include specific data handling commitments.
Microsoft (Copilot)
Depends heavily on which version. Consumer Copilot has different terms than enterprise Microsoft 365 Copilot.
Google (Gemini)
Consumer versions may use data for training. Workspace versions have enterprise data protection.
The pattern: consumer versions are looser with data; enterprise versions are tighter. You often get what you pay for.
The Risks You Need to Manage
Let's be specific about what can go wrong:
Confidential Information Exposure
If your employee pastes client contracts into a consumer AI tool, that information might end up in training data. In theory, pieces could surface in responses to other users.
Real example: Samsung employees pasted proprietary source code into ChatGPT. Samsung then banned the tool entirely. The damage was already done.
Regulatory Violations
Healthcare data has HIPAA requirements. Financial data falls under rules like GLBA and PCI DSS. European personal data has GDPR requirements. Putting this data into AI tools that don't meet compliance standards can trigger violations.
This isn't theoretical. Companies have received inquiries from regulators about AI tool usage.
Discovery in Litigation
If you're sued, your AI conversations might be discoverable. That prompt where you explored whether something was legal before doing it anyway? Potentially relevant evidence.
Third-Party Obligations
Your contracts with clients, partners, and vendors often have confidentiality clauses. Putting their information into third-party AI tools might violate those agreements.
Creating a Sensible AI Data Policy
You need written guidelines for your organization. Here's what to include:
Classification of Data
Define what's okay to put into AI tools and what isn't (one way to encode such a policy is sketched after the list):
- Green (OK) - Public information, general questions, non-sensitive internal data
- Yellow (Caution) - Internal processes and general business information; requires judgment
- Red (Never) - Customer personal data, financial details, health information, trade secrets, confidential communications
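Here's a minimal sketch of how a policy like this might be encoded for automated pre-checks. The category names and regex patterns are illustrative assumptions; a real deployment would use a proper DLP or PII-detection tool, not a handful of regexes.

```python
import re

# Illustrative policy: map classifications to handling rules.
POLICY = {
    "green": "OK to use with approved AI tools",
    "yellow": "Use judgment; anonymize where possible",
    "red": "Never submit to external AI tools",
}

# Simple patterns that suggest red-category data (examples only).
RED_FLAGS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify(text: str) -> str:
    """Return the most restrictive classification the text triggers."""
    for label, pattern in RED_FLAGS.items():
        if pattern.search(text):
            print(f"Red flag: possible {label} detected")
            return "red"
    return "yellow"  # default to caution, never to green

sample = "Contact jane.doe@example.com about account 4111 1111 1111 1111"
level = classify(sample)
print(level, "->", POLICY[level])
```

Defaulting unknown text to yellow rather than green is deliberate: the check should fail closed, and a human decides what gets promoted to green.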
Approved Tools
Specify which AI tools are sanctioned for business use. Using enterprise tiers with proper contracts is different from employees using free consumer accounts.
Review Requirements
For sensitive use cases, require review before deploying AI. Legal should look at customer-facing AI implementations. Security should review integrations that touch sensitive systems.
Training Requirements
Everyone using AI tools should understand the basics of data privacy. Make this part of onboarding and regular training.
Technical Protections
Policy is important, but technical controls help too:
Data Sanitization
Before putting text into AI, strip out personally identifiable information. Replace names with placeholders. Remove account numbers. This limits exposure even if other protections fail.
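A minimal sanitization pass might look like the sketch below, assuming a regex-based approach. The patterns are illustrative; regexes alone miss names, addresses, and context-dependent identifiers, so treat this as a first line of defense, not a complete solution.

```python
import re

# Illustrative redaction patterns; a real pipeline should add a
# dedicated PII-detection library on top of simple rules like these.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
    (re.compile(r"\b\d{3}[ .-]?\d{3}[ .-]?\d{4}\b"), "[PHONE]"),
]

def sanitize(text: str) -> str:
    """Replace known PII patterns with placeholders before the text goes to an AI tool."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

print(sanitize("Call 555-867-5309 or email pat@example.com re: SSN 123-45-6789"))
# -> Call [PHONE] or email [EMAIL] re: SSN [SSN]
```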
Enterprise Contracts
If AI is critical to your business, get proper enterprise agreements with data processing addendums, not terms-of-service accounts. These give you actual legal protections and opt-outs from training data usage.
On-Premises or Private Cloud
For highly sensitive applications, consider self-hosted AI models. They're more expensive to run and typically less capable than frontier hosted models, but your data never leaves your infrastructure.
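As a rough sketch of what self-hosting looks like in practice: prompts go to a model server on your own network instead of a vendor's cloud. The snippet below assumes an Ollama-style local server; the URL, model name, and response schema are assumptions to adapt to whatever serving stack you actually run (vLLM, TGI, etc.).

```python
import requests  # pip install requests

# Assumed: an Ollama-style server running locally on this port.
LOCAL_ENDPOINT = "http://localhost:11434/api/generate"

def local_completion(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to a self-hosted model; nothing leaves your network."""
    resp = requests.post(
        LOCAL_ENDPOINT,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(local_completion("Summarize our data retention policy in two sentences."))
```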
Access Logging
Track who's using AI tools and for what. If something goes wrong, you need to know the scope.
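A minimal sketch of what that logging might look like, assuming AI calls are routed through shared helper code. Note it records metadata only, not the prompt text, so the audit trail doesn't become a second copy of the sensitive data.

```python
import getpass
import json
import logging
from datetime import datetime, timezone

# Log AI usage to a file the security team can review; in production
# you'd ship this to your SIEM instead of a local file.
logging.basicConfig(filename="ai_usage.log", level=logging.INFO)

def log_ai_call(tool: str, purpose: str, prompt_chars: int) -> None:
    """Record who used which AI tool, when, and for what."""
    logging.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": getpass.getuser(),
        "tool": tool,
        "purpose": purpose,
        "prompt_chars": prompt_chars,
    }))

log_ai_call(tool="claude", purpose="summarize internal memo", prompt_chars=1842)
```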
What About Your Own AI Systems?
If you're building AI into your products, you have additional responsibilities:
Training Data Rights
Do you have the right to use data for AI training? Customer data collected for one purpose might not be usable for training. Read your privacy policies. They might need updating.
Disclosure
In many jurisdictions, you need to tell users when they're interacting with AI. Some require disclosure when AI is making decisions about them.
Bias and Fairness
AI decisions that affect people (credit, employment, housing) face increasing regulatory scrutiny. You may need to demonstrate fairness and provide explanations.
Data Retention
How long do you keep AI training data? How long do you keep conversation logs? This should align with your broader data retention policies and regulatory requirements.
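A retention policy is only real if something enforces it. Here's a minimal sketch of a scheduled cleanup job, assuming conversation logs are stored as one file per exchange; the directory layout and 90-day window are illustrative assumptions, not recommendations.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumed layout: conversation logs under ./ai_logs, one file per exchange.
# The retention window should come from your written policy, not this constant.
LOG_DIR = Path("ai_logs")
RETENTION = timedelta(days=90)

def purge_expired_logs() -> int:
    """Delete log files older than the retention window; return count removed."""
    cutoff = datetime.now(timezone.utc) - RETENTION
    removed = 0
    for path in LOG_DIR.glob("*.jsonl"):
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if modified < cutoff:
            path.unlink()
            removed += 1
    return removed

print(f"Purged {purge_expired_logs()} expired log files")
```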
The Vendor Question
When evaluating AI vendors or tools, ask:
- Is our data used for training? Can we opt out?
- Where is data processed and stored? Which jurisdictions?
- Who has access to our data? Under what circumstances?
- What happens to our data if we cancel?
- What security certifications do they have? SOC 2? ISO 27001?
- Will they sign our data processing agreement?
If vendors can't answer these questions clearly, that's a red flag.
Practical Steps for Tomorrow
You don't need to solve everything at once. Start here:
- Audit current usage - Find out what AI tools your employees are already using. You might be surprised.
- Define red lines - Identify data that should never go into AI tools. Communicate this clearly.
- Upgrade to enterprise - If AI is important to your business, get proper business accounts with real contracts.
- Train your team - Make sure everyone understands the basics of what they can and can't put into AI.
- Review regularly - AI capabilities and regulations are both evolving. Check your policies every six months.
AI privacy isn't about avoiding AI. It's about using it responsibly. Get the policies and protections right, and you can capture the benefits without creating unnecessary risk.