Personal Data in Documents

When you upload a document to Ask.School, it is automatically scanned for personal data before being added to your chatbot’s knowledge base. This helps prevent sensitive information — such as student names or NHS numbers — from being included in chatbot responses.

How It Works

Every document goes through a personal data check during processing:

  1. You upload a document (PDF, DOCX, TXT, etc.)
  2. Ask.School extracts the text content
  3. An AI-powered scanner checks the text for personal data entities
  4. If personal data is found, the document is flagged instead of being added to the knowledge base
  5. If no personal data is found, the document is marked as Ready and added normally

Flagged documents are not used by chatbots until you review and approve them.
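
The flow above can be pictured as a small sketch. This is an illustration only, not Ask.School's actual implementation: the names Status, ProcessedDocument, process_document, approve and usable_by_chatbot are hypothetical, and a simple pattern match stands in for the AI-powered scanner.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    READY = "ready"
    FLAGGED = "flagged"


@dataclass
class ProcessedDocument:
    name: str
    status: Status
    entities: list = field(default_factory=list)


# Hypothetical stand-in for the AI-powered scanner: it only spots
# 10-digit numbers written in the common 3-3-4 NHS number layout.
NHS_PATTERN = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")


def scan_for_personal_data(text):
    """Return a list of (entity_type, matched_text) pairs."""
    return [("NHS_NUMBER", m.group()) for m in NHS_PATTERN.finditer(text)]


def process_document(name, text):
    """Steps 3 to 5: scan the extracted text, then flag or mark as Ready."""
    entities = scan_for_personal_data(text)
    status = Status.FLAGGED if entities else Status.READY
    return ProcessedDocument(name, status, entities)


def usable_by_chatbot(doc):
    """Only Ready documents are available to chatbots."""
    return doc.status is Status.READY


def approve(doc):
    """Manual approval after review moves a flagged document to Ready."""
    doc.status = Status.READY
    return doc
```

In this sketch a flagged document stays out of reach of the chatbot until approve is called on it, which mirrors the review step described in the rest of this article.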

What Gets Detected

The personal data scanner looks for:

Entity Type  | Examples
Person names | Student names, staff names, parent names
NHS numbers  | UK National Health Service numbers

The scanner is designed to catch common personal data in school documents. It uses AI-powered entity recognition, so it may occasionally flag general references (like historical figures or place names) as personal data. Review flagged documents to confirm whether the detection is accurate.
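
When reviewing a flagged 10-digit number, it can help to check whether it is even a plausible NHS number. NHS numbers carry a Modulus 11 check digit, and the sketch below applies that public rule. It is a reviewer's aid under that assumption, not a description of how the Ask.School scanner itself works; the function name is_valid_nhs_number is hypothetical.

```python
import re


def is_valid_nhs_number(candidate: str) -> bool:
    """Check a candidate against the NHS number Modulus 11 rule."""
    # Accept the usual 3-3-4 layout with spaces or hyphens.
    digits = re.sub(r"[ -]", "", candidate)
    if not re.fullmatch(r"\d{10}", digits):
        return False
    # Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is never issued.
    return check != 10 and check == int(digits[9])
```

A number that fails this check (for example a phone extension or an order reference) is more likely a false positive, although the document should still be reviewed in full before approving it.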

What a Flagged Document Looks Like

When a document is flagged, it appears in your Documents list with:

  • An orange “Flagged” badge instead of the usual green “Ready” badge
  • A warning message: “Contains personal data — review before adding to knowledge base”
  • An orange-highlighted background row
  • An Approve button to manually approve the document after review

Flagged documents also appear under the Flagged tab in the document status filters, making them easy to find.

Reviewing a Flagged Document

When you see a flagged document:

  1. Download the document using the download button to review its contents.
  2. Check what was detected — consider whether the personal data is genuinely sensitive or a false positive (e.g. a headteacher’s name in a welcome letter vs. a list of student names).
  3. Decide what to do:
    • Approve the document — Click Approve if the personal data is acceptable (e.g. the headteacher’s published name, or publicly available staff contact details). The document will be added to the knowledge base and chatbots can use it.
    • Remove and re-upload — If the document contains genuinely sensitive data (e.g. student names, medical information), remove the personal data from the original file and upload a cleaned version.
    • Delete the document — If the document shouldn’t be in the system at all, delete it.

Important: Approving a flagged document means the chatbot may include information from that document in its responses. Make sure you are comfortable with everything in the document being potentially surfaced to users.

Common Scenarios

Documents that are usually safe to approve

  • Welcome letters signed by the headteacher — the head’s name is public information
  • Staff directories with published contact details — already on the school website
  • Prospectus mentioning named department heads — intentionally public
  • Policy documents referencing the DSL by name — this is a statutory requirement

Documents that should be redacted first

  • Behaviour reports mentioning students by name
  • SEN/EHCP documents with student details
  • Medical or allergy lists with student names
  • Safeguarding case notes or referrals
  • Staff HR documents with personal details
  • Parent contact lists with phone numbers or addresses

Documents that work well without personal data

  • Generic policies (uniform, behaviour, admissions) — rarely contain names
  • Term dates and timetables — no personal data
  • Curriculum guides — subject content only
  • FAQs — question-and-answer format, no names needed

PII Whitelist

Ask.School maintains a PII (personally identifiable information) whitelist that can be configured to automatically allow certain terms. This is useful if your school regularly uploads documents that contain the same names (e.g. the headteacher in every policy). Your school’s own profile information (school name, contact details) is automatically whitelisted so it won’t trigger false flags.

Contact support if you need help configuring your whitelist.
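
Conceptually, a whitelist removes allowed terms from the scanner's findings before the flagging decision is made. The sketch below shows that idea only. The function name filter_whitelisted, the case-insensitive matching, and the example names are all assumptions made for illustration; the real whitelist format and matching rules are not described here.

```python
def filter_whitelisted(entities, whitelist):
    """Return the detected entities that are NOT covered by the whitelist."""
    # Assumption: matching ignores case and surrounding whitespace.
    allowed = {term.casefold().strip() for term in whitelist}
    return [e for e in entities if e.casefold().strip() not in allowed]


# Hypothetical example data.
whitelist = ["Mrs J. Smith", "Example Primary School"]
detected = ["mrs j. smith", "Alex Carter"]

remaining = filter_whitelisted(detected, whitelist)
should_flag = bool(remaining)  # still flagged: one name is not whitelisted
```

Under these assumptions a document is flagged only if something remains after filtering, so a policy that mentions just the headteacher would pass, while one that also names a student would still be flagged.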

Chatbot Conversation Protection

Personal data protection isn’t limited to documents. Ask.School also monitors chatbot conversations in real time:

  • Input scanning — When a user sends a message containing sensitive data (credit card numbers, NHS numbers, etc.), the system can detect and handle it appropriately
  • Output scanning — Before the chatbot sends a response, it is checked to ensure it doesn’t expose sensitive data like credit card numbers, NHS numbers, or other protected information

This works alongside Safeguarding Alerts to provide comprehensive protection across your chatbot platform.
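
As a rough illustration of output scanning, the sketch below redacts card-like numbers from a response before it is sent. It assumes a pattern match followed by the standard Luhn checksum; the names CARD_PATTERN, luhn_valid and redact_cards are hypothetical, and Ask.School's actual detection and handling may differ.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Replace Luhn-valid card-like numbers with a placeholder."""
    def replace(match):
        return "[REDACTED]" if luhn_valid(match.group()) else match.group()
    return CARD_PATTERN.sub(replace, text)
```

The checksum step keeps the sketch from redacting every long number: a digit string that fails the Luhn check is left untouched. The same function could be applied to incoming user messages for input scanning.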

Good to Know

  • Personal data scanning happens automatically — you don’t need to enable it
  • The scanner runs on every new document upload, including documents attached to School Knowledge topics
  • Flagged documents can still be downloaded and viewed by administrators — they are just prevented from being used by chatbots until approved
  • The scanning uses AI-powered entity recognition, which is highly accurate but not infallible — always review flagged documents manually
  • All detected personal data entities are logged, and the logs are encrypted at rest

Next Steps