Chatbot Guardrails
Ask.School includes a comprehensive set of guardrails that run automatically on every chatbot conversation. These protect your school community by filtering harmful content, preventing misuse, and keeping conversations appropriate for a school environment.
All guardrails are enabled by default — you don’t need to configure anything for them to work.
How Guardrails Work
Guardrails check messages at two stages:
- Input — When a user sends a message, it is scanned before the chatbot processes it
- Output — When the chatbot generates a response, it is scanned before being shown to the user
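If a guardrail detects an issue, it usually takes one of two actions (Personal Data Protection and URL Filtering can instead mask or remove just the offending part of a message):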
- Block — The message is stopped entirely and the user sees a safety message instead
- Flag — The message is logged for review but allowed through (used for lower-risk detections)
Every guardrail violation is recorded with a timestamp, the detected content, and the action taken.
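The two-stage flow described above can be pictured with a minimal Python sketch. This is an illustration only, not Ask.School's actual implementation: the names `Guardrail`, `Pipeline`, `Violation`, and the `detect` callables are all hypothetical, and real detectors would be moderation models, not simple functions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    BLOCK = "block"  # stop the message and show a safety message instead
    FLAG = "flag"    # log for review but let the message through


class Stage(Enum):
    INPUT = "input"    # user message, before the chatbot processes it
    OUTPUT = "output"  # chatbot response, before the user sees it


@dataclass
class Violation:
    """One audit record: timestamp, detected content, and action taken."""
    guardrail: str
    stage: Stage
    action: Action
    content: str
    timestamp: datetime


@dataclass
class Guardrail:
    name: str
    action: Action
    safety_message: str
    detect: Callable[[str], bool]  # hypothetical detector function


@dataclass
class Pipeline:
    guardrails: list
    log: list = field(default_factory=list)

    def check(self, text: str, stage: Stage) -> Optional[str]:
        """Return a safety message if the text is blocked, otherwise None."""
        for g in self.guardrails:
            if not g.detect(text):
                continue
            # Every violation is recorded, whether it blocks or only flags.
            self.log.append(
                Violation(g.name, stage, g.action, text, datetime.now(timezone.utc))
            )
            if g.action is Action.BLOCK:
                return g.safety_message
        return None

    def handle(self, user_message: str, chatbot: Callable[[str], str]) -> str:
        blocked = self.check(user_message, Stage.INPUT)
        if blocked is not None:
            return blocked  # the chatbot never sees a blocked message
        reply = chatbot(user_message)
        blocked = self.check(reply, Stage.OUTPUT)
        return blocked if blocked is not None else reply
```

In this sketch a flagged message still reaches the chatbot, while a blocked one is replaced by the guardrail's safety message before any further processing.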
Active Guardrails
Content Moderation
Every message is checked against a comprehensive moderation system that detects:
- Hate speech and threatening language
- Harassment and bullying
- Self-harm content and instructions
- Violence and graphic content
- Sexual content (with heightened sensitivity for content involving minors)
- Illicit activity
When detected, the user sees a message like: “I’m sorry, but I’m unable to respond to that message as it may contain inappropriate content.”
NSFW Filter
A separate filter catches workplace-inappropriate content including profanity, explicit material, and language unsuitable for a school environment. This runs on both user messages and chatbot responses.
When detected: “I’m unable to respond to that type of message. Please keep our conversation appropriate for a school setting.”
Off-Topic Detection
The chatbot is kept focused on school-related topics. If a user tries to use the chatbot for unrelated purposes (e.g. asking it to write an essay, play a game, or discuss topics unrelated to the school), the chatbot gently redirects them.
When detected: “That’s a bit outside what I can help with! I’m here to answer questions about the school. Try asking about term dates, uniform, events, admissions, or homework.”
Jailbreak Detection
Detects attempts to bypass the chatbot’s safety instructions through techniques like role-play exploits, instruction overrides, or social engineering. This is particularly important for school chatbots where students may test the system’s boundaries.
When detected: “I’m designed to help with school-related questions and I need to stay within my guidelines. Could you rephrase your question?”
Prompt Injection Protection
Detects attempts where user input tries to override the chatbot’s system instructions or extract internal configuration. This prevents technical attacks that could make the chatbot behave unexpectedly.
When detected: “I’m here to help with school-related questions. Could you rephrase your question so I can assist you?”
Personal Data Protection
Sensitive personal data is detected and handled in both directions:
User messages (input): If a user shares sensitive data such as credit card numbers, NHS numbers, or financial account details, the data is masked and the user sees: “For your safety, please don’t share personal information such as phone numbers, email addresses, or ID numbers in the chat.”
Chatbot responses (output): Before a response is sent, it is scanned for sensitive data such as credit card numbers, NHS numbers, and financial information. If any is found, it is masked and the response carries the note: “Some personal information has been masked in this response to protect privacy.”
Your school’s own profile information (school name, published contact details) is automatically whitelisted so the chatbot can share it normally.
You can add further school-specific terms in the PII Whitelist, described later on this page.
See Personal Data in Documents for how documents are scanned during upload.
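As a rough illustration of the masking described in this section, the Python sketch below finds candidate credit card and NHS numbers, confirms them with their public checksum rules (the Luhn algorithm for cards, the modulus 11 check digit for NHS numbers), and masks only confirmed matches. The regular expressions, the `[masked]` placeholder, and the function names are assumptions made for this example; the platform's real detection is not documented here and is likely more sophisticated.

```python
import re

# Candidate patterns (assumed for this sketch): 13 to 19 digits for cards,
# exactly 10 digits for NHS numbers, with optional single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
NHS_RE = re.compile(r"\b\d(?:[ -]?\d){9}\b")
MASK = "[masked]"  # hypothetical placeholder text


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def nhs_valid(digits: str) -> bool:
    """Modulus 11 check digit used by 10-digit NHS numbers."""
    if len(digits) != 10:
        return False
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check value of 10 means the number is invalid.
    return check != 10 and check == int(digits[9])


def mask_sensitive(text: str, whitelist: frozenset = frozenset()):
    """Return (masked_text, masked_any). Whitelisted strings pass through."""
    masked_any = False

    def replace(match: re.Match, validator) -> str:
        nonlocal masked_any
        raw = match.group(0)
        if raw in whitelist:
            return raw
        digits = re.sub(r"\D", "", raw)
        if not validator(digits):
            return raw  # looks like a number but fails the checksum
        masked_any = True
        return MASK

    text = CARD_RE.sub(lambda m: replace(m, luhn_valid), text)
    text = NHS_RE.sub(lambda m: replace(m, nhs_valid), text)
    return text, masked_any
```

Validating the checksum before masking is one way to reduce false positives: an 11-digit switchboard number, for example, matches neither pattern and is left alone.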
Hallucination Detection
After the chatbot generates a response, it is checked against the school’s actual knowledge base to catch factual inaccuracies. If the system detects that a response may contain invented or unsupported claims, it is flagged.
When detected: “I’m not confident that my answer is fully accurate based on the information available to me. Please check with the school office for the most up-to-date information.”
URL Filtering
For security, URLs in messages are checked against safety rules:
- User messages — Links in user input are blocked to prevent phishing or spam: “For security reasons, I’m unable to process messages containing links.”
- Chatbot responses — Any URLs the chatbot generates are verified. Unrecognised links are removed: “A link in my response was removed for security reasons. Please visit the school’s official website for verified links and resources.”
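The two URL rules above could be sketched as follows. This is a simplified illustration under stated assumptions: the host allowlist, the `[link removed]` placeholder, and both function names are hypothetical, and the document does not say how the platform actually decides which links are "recognised".

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)

# Hypothetical allowlist of recognised hosts, e.g. the school's own website.
ALLOWED_HOSTS = frozenset({"www.example-school.org.uk"})


def input_contains_link(message: str) -> bool:
    """User messages: any link at all is grounds for blocking."""
    return URL_RE.search(message) is not None


def strip_unrecognised_links(response: str, allowed_hosts=ALLOWED_HOSTS):
    """Chatbot responses: keep links to recognised hosts, remove the rest.

    Returns (cleaned_response, removed_any).
    """
    removed = False

    def replace(match: re.Match) -> str:
        nonlocal removed
        url = match.group(0)
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            host = ""  # malformed URL: treat as unrecognised
        if host in allowed_hosts:
            return url
        removed = True
        return "[link removed]"

    return URL_RE.sub(replace, response), removed
```

Comparing the parsed hostname, not a substring of the URL, avoids accepting look-alike links that merely contain the school's domain somewhere in their path.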
PII Whitelist
Some terms that look like personal data are actually fine for your chatbot to share — staff names that already appear on your website, the school’s switchboard number, names of school houses or buildings. The PII Whitelist lets you tell the personal-data filter that those specific terms are safe.
How to Get There
From the School Dashboard, click Whitelist in the left sidebar under Settings.
What you can whitelist
The editor accepts two types of entries:
- Terms — exact words or phrases (e.g. “Mrs Smith”, “Westbrook House”, “01234 567890”).
- Patterns — regular-expression patterns for things like internal staff codes or formatted reference numbers. Most schools won’t need patterns; they’re there for IT teams that have a standard format they want to allow through.
You can also add notes explaining why something is on the list — useful when someone else reviews the list later.
When to add a term
Add a term whenever the filter is wrongly hiding something the chatbot ought to be able to share. Common examples:
- Staff names already published on the school website (e.g. the headteacher, designated safeguarding lead, named teachers)
- The school’s main phone number, address, and email — already covered by your school profile, but additional regional or department numbers can be added here
- Names of houses, buildings, blocks, sites
- Trip names, club names, scheme names that include a person’s name
Don’t add:
- Pupil names or pupil-identifiable data
- Personal phone numbers, personal email addresses, or home addresses of staff
- Anything that wasn’t already public on your school’s own website
Activating the whitelist
The Active toggle at the top of the editor turns the whole whitelist on or off. While it’s active, every entry in the list is allowed through the personal-data filter on both user messages and chatbot responses. Toggle it off to revert to the default filter behaviour.
Click Save to apply changes.
If the chatbot keeps masking a name or number you’ve published yourself, the PII Whitelist is almost always the right fix. Add the term, check that the whitelist is Active, and click Save; the next reply will include it.
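For IT teams, the two entry types and the Active toggle can be pictured with the short Python sketch below. It is a mental model only: `WhitelistEntry`, `is_whitelisted`, and the sample staff-code pattern are hypothetical names chosen for this example, and the sketch assumes that terms match exactly and that patterns must match the whole candidate string, which the editor may or may not do.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Literal


@dataclass(frozen=True)
class WhitelistEntry:
    kind: Literal["term", "pattern"]
    value: str
    note: str = ""  # why the entry is on the list, for later reviewers

    def matches(self, candidate: str) -> bool:
        if self.kind == "term":
            # Terms are exact words or phrases.
            return candidate == self.value
        # Patterns are regular expressions; this sketch requires a full match.
        return re.fullmatch(self.value, candidate) is not None


def is_whitelisted(
    candidate: str, entries: Iterable[WhitelistEntry], active: bool = True
) -> bool:
    """With the toggle off, nothing is exempt and the default filter applies."""
    return active and any(entry.matches(candidate) for entry in entries)


# Example entries (illustrative values only).
EXAMPLE_ENTRIES = [
    WhitelistEntry("term", "Mrs Smith", note="Headteacher, named on website"),
    WhitelistEntry("term", "Westbrook House"),
    WhitelistEntry("pattern", r"STAFF-\d{4}", note="Internal staff code format"),
]
```

Requiring a full match keeps a pattern from exempting a longer string that merely contains an allowed code, which is the more cautious choice for a personal-data filter.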
How This Relates to Safeguarding
Guardrails and Safeguarding Alerts are separate but complementary systems:
| | Guardrails | Safeguarding Alerts |
|---|---|---|
| Purpose | Prevent harmful content from being sent or received | Flag conversations that suggest a child may be at risk |
| Action | Automatic — blocks or masks content in real time | Notification — creates an alert for staff to review |
| Scope | Content quality and safety | Welfare and child protection |
| Who sees it | The user gets a safe response message | Staff with safeguarding permissions get an email alert |
Both systems run simultaneously on every conversation. A single message could trigger both a guardrail (e.g. blocking explicit content) and a safeguarding alert (e.g. flagging a child in distress).
Good to Know
- All guardrails are enabled by default and run on every chatbot, including public ones
- Guardrails work on both authenticated and anonymous conversations
- The system uses multiple AI models for detection, each tuned for accuracy in a school context
- Records of guardrail violations are encrypted and stored for audit purposes; they are not visible to end users
- Response messages are designed to be age-appropriate and non-alarming
- Guardrails cannot be disabled by school administrators — they are a platform-wide safety feature
- The off-topic filter uses each chatbot’s system prompt to understand what is considered “on topic”, so it adapts to different chatbot purposes (e.g. a reception chatbot vs. an IT helpdesk chatbot)
Next Steps
- Personal Data in Documents — How document uploads are scanned for personal data
- Safeguarding Alerts — How conversation monitoring creates welfare alerts
- Conversation Monitoring — Browse and review all chatbot conversations
- Creating Chatbots — Set up chatbot system prompts and instructions