LoopSmart

RESEARCH

Privacy-Preserving RAG: PII-Safe Retrieval Without Redaction Debt

18 Sept 2025

Executive Summary

We explored how to run retrieval-augmented generation (RAG) over operational documents that may contain sensitive data (names, emails, addresses, identifiers) without creating a brittle “redaction pipeline” that becomes operational debt.

The work proposes a privacy-preserving retrieval design that:

  • reduces the likelihood of sensitive data entering model context,
  • provides clear provenance and auditability, and
  • remains compatible with real-world indexing and ingestion workflows.

Abstract

Organizations adopt RAG to make internal knowledge usable, but operational corpora often contain PII. Naive approaches either:

  • ship raw text to models (privacy risk), or
  • attempt universal redaction (fragile and expensive, especially with mixed formats).

We propose a layered approach: minimize exposure by controlling what is retrieved, how it is represented, and how answers are generated. The system uses metadata segmentation, selective quoting, token-level safety checks, and “sensitive span” handling that preserves utility without relying on one perfect redactor.

Threat Model (Practical)

We focus on reducing:

  • accidental disclosure in generated responses,
  • unnecessary sensitive spans in model context, and
  • over-broad retrieval that drags in irrelevant private data.

We assume standard cloud controls (IAM, encrypted storage) exist, and target the remaining “last mile” failure modes where the model sees or outputs sensitive fragments.

Approach

1) Content Segmentation + Metadata Boundaries

Documents are segmented into retrievable units and labeled with sensitivity metadata (e.g., policy, operational, customer, finance). Retrieval is constrained so high-risk segments require stronger justification (or human approval).
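A minimal sketch of this segmentation layer, assuming a simple ordered tier scheme; the `Segment` dataclass, the tier names, and the `allowed_segments` helper are illustrative, not the production schema:

```python
from dataclasses import dataclass

# Illustrative sensitivity tiers, ordered least- to most-restricted.
SENSITIVITY_ORDER = ["policy", "operational", "finance", "customer"]

@dataclass
class Segment:
    doc_id: str
    text: str
    sensitivity: str  # one of SENSITIVITY_ORDER

def allowed_segments(segments, max_sensitivity):
    """Keep only segments at or below the caller's clearance tier."""
    ceiling = SENSITIVITY_ORDER.index(max_sensitivity)
    return [s for s in segments
            if SENSITIVITY_ORDER.index(s.sensitivity) <= ceiling]

corpus = [
    Segment("doc-1", "Refund policy: 30 days.", "policy"),
    Segment("doc-2", "Customer jane@example.com reported an issue.", "customer"),
]
# An operational-tier query never sees the customer-tier segment.
visible = allowed_segments(corpus, "operational")
```

High-risk tiers can then be gated behind an explicit justification or approval step rather than filtered silently.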

2) Retrieval Guardrails (Before the Model)

  • Strict filters on document class and source
  • Minimum-evidence rules for sensitive categories
  • Query rewriting to reduce “identifier hunting” patterns
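One way to sketch the query-side guardrail; the patterns and the `guard_query` helper are assumptions for illustration (a real deployment would likely use a tuned classifier alongside such rules):

```python
import re

# Illustrative patterns that suggest "identifier hunting".
IDENTIFIER_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),                   # literal email address
    re.compile(r"\b(email|phone|address|ssn)\b.*\bof\b", re.I),   # "phone number of ..."
]

def guard_query(query: str):
    """Return (allowed, query). Block queries that hunt for identifiers."""
    for pat in IDENTIFIER_PATTERNS:
        if pat.search(query):
            return False, None
    return True, query.strip()

ok, q = guard_query("What is the refund policy?")
blocked, _ = guard_query("What is the phone number of Jane Doe?")
```

Blocked queries can be refused outright or rewritten into a non-identifying form before retrieval runs.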

3) Sensitive Span Handling

Rather than attempting perfect redaction everywhere, we:

  • identify high-confidence sensitive spans,
  • replace them with stable placeholders for reasoning,
  • optionally re-insert only when explicitly required and allowed.
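The placeholder strategy above can be sketched as follows, using emails as the single high-confidence span type; the regex, placeholder format, and `rehydrate` helper are illustrative assumptions:

```python
import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def mask_spans(text, mapping=None):
    """Replace high-confidence sensitive spans with stable placeholders.

    The same value always maps to the same placeholder, so the model can
    still reason about co-reference across a document.
    """
    mapping = {} if mapping is None else mapping
    def repl(match):
        value = match.group(0)
        if value not in mapping:
            mapping[value] = f"[EMAIL_{len(mapping) + 1}]"
        return mapping[value]
    return EMAIL_RE.sub(repl, text), mapping

def rehydrate(text, mapping):
    """Re-insert originals only when explicitly required and allowed."""
    for value, placeholder in mapping.items():
        text = text.replace(placeholder, value)
    return text

masked, mapping = mask_spans("Contact a@x.com or b@y.com; cc a@x.com.")
```

Because placeholders are stable, a repeated identifier keeps a single placeholder, which preserves utility for reasoning without exposing the value.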

4) Answer Policy

The answer system prefers:

  • paraphrase over verbatim quotes,
  • citations/provenance where supported,
  • refusal or escalation when the user request is privacy-unsafe.
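These preferences can be expressed as a small decision function; the inputs and action names below are an illustrative policy, not the authors' exact rules:

```python
def answer_policy(query_is_privacy_unsafe: bool,
                  evidence_has_sensitive_spans: bool,
                  citations_available: bool) -> str:
    """Pick the safest viable answer mode for a request."""
    if query_is_privacy_unsafe:
        return "refuse_or_escalate"
    if evidence_has_sensitive_spans:
        return "paraphrase_only"          # never quote sensitive text verbatim
    if citations_available:
        return "answer_with_citations"
    return "paraphrase_only"
```

The ordering matters: privacy-unsafe requests short-circuit before any evidence is consulted.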

Methodology

We treat privacy as a systems property rather than a single model feature:

  • Policy-first design: define what the assistant must never do (e.g., disclose identifiers) and implement enforcement in retrieval and answer layers.
  • Progressive hardening: start with coarse segmentation and tighten only where leakage risk is highest.
  • Defense-in-depth: multiple small checks that each reduce exposure, rather than one brittle “perfect redactor.”
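The defense-in-depth principle can be sketched as a pipeline of small, independently vetoing checks; the three checks and the context dictionary shape are assumptions for illustration:

```python
import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def check_query(ctx):
    """Veto queries already flagged as identifier hunting."""
    return not ctx.get("identifier_hunting", False)

def check_context(ctx):
    """Veto if a raw sensitive span slipped into the model context."""
    return not EMAIL_RE.search(ctx.get("model_context", ""))

def check_output(ctx):
    """Veto if the draft answer contains a raw sensitive span."""
    return not EMAIL_RE.search(ctx.get("draft_answer", ""))

CHECKS = [check_query, check_context, check_output]

def run_pipeline(ctx):
    """Each small check can veto independently; no single 'perfect redactor'."""
    failed = [c.__name__ for c in CHECKS if not c(ctx)]
    return {"allowed": not failed, "failed_checks": failed}
```

Any one check failing is enough to block, so each layer only needs to catch the leaks the others miss.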

Evaluation Design

Metrics:

  • Privacy leakage rate (sensitive spans in output)
  • Utility (answer relevance judged by SMEs)
  • Retrieval precision (fraction of retrieved chunks relevant)
  • Operational cost (indexing time, false positives in sensitive span detection)
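The first and third metrics reduce to simple ratios; a minimal sketch, assuming binary relevance judgments and a pluggable sensitive-span detector:

```python
def leakage_rate(outputs, detect_sensitive):
    """Fraction of generated outputs containing at least one sensitive span."""
    if not outputs:
        return 0.0
    return sum(1 for o in outputs if detect_sensitive(o)) / len(outputs)

def retrieval_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved chunks judged relevant by SMEs."""
    if not retrieved_ids:
        return 0.0
    return len(set(retrieved_ids) & set(relevant_ids)) / len(retrieved_ids)
```

Utility and operational cost need human judgment and pipeline telemetry respectively, so they do not reduce to one-liners in the same way.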

Results

In a pilot evaluation on representative internal corpora:

  • Sensitive-span leakage in outputs decreased materially under conservative defaults.
  • Retrieval precision improved when enforcing metadata boundaries (less “accidental private context”).
  • Operational overhead stayed bounded because the approach avoids a mandatory full-corpus redaction rewrite.

Key qualitative finding: the biggest safety gains came from retrieval narrowing (what the model sees), not only from output filtering.

Governance and Controls

  • Maintain an explicit policy matrix (allowed / not allowed) for sensitive categories.
  • Log provenance for each answer (retrieved document IDs and policy outcomes).
  • Implement an escalation path for identity-targeting queries.
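The provenance-logging control can be sketched as one structured record per answer; the field names below are illustrative, not a prescribed schema:

```python
import datetime
import json

def provenance_record(query, retrieved_ids, policy_outcome, escalated=False):
    """One auditable record per answer: what was retrieved, what policy decided."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "query": query,
        "retrieved_doc_ids": retrieved_ids,
        "policy_outcome": policy_outcome,  # e.g. "answered", "refused", "escalated"
        "escalated": escalated,
    }

# Records serialize to JSON lines for an append-only audit log.
line = json.dumps(provenance_record("refund policy?", ["doc-1"], "answered"))
```

Keeping document IDs rather than document text in the log avoids copying sensitive content into the audit trail itself.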

Deployment Notes

  • Start with coarse metadata segmentation; refine only the high-risk categories.
  • Use “safe defaults”: if a query targets identity details, escalate or refuse.
  • Log retrieval provenance and policy outcomes for audits.

Commercial Application

This work is applicable to:

  • support knowledge bases with customer communications
  • finance and operations assistants
  • regulated environments where content must remain provably controlled

Licensable Outcomes

  1. Privacy-aware retrieval policy: metadata constraints + query rewriting patterns
  2. Sensitive-span handling module: placeholder strategy + optional rehydration
  3. Operational playbook: governance, audits, incident response, and rollout gates

Deliverables

  • Privacy-aware retrieval policy and reference implementation
  • Sensitive-span handling module with test corpus
  • Monitoring dashboard spec (leakage indicators, drift, escalations)

Limitations and Next Work

  • Sensitivity detection is imperfect; design must tolerate false positives and false negatives.
  • Future work: incorporate formal privacy testing (attack prompts) into CI for regression prevention.

Evaluation Date: September 2025
Status: Design validated; typically deployed as an extension to an existing RAG stack