Chatbot Guardrails

Ask.School includes a comprehensive set of guardrails that run automatically on every chatbot conversation. These protect your school community by filtering harmful content, preventing misuse, and keeping conversations appropriate for a school environment.

All guardrails are enabled by default — you don’t need to configure anything for them to work.

How Guardrails Work

Guardrails check messages at two stages:

Input — When a user sends a message, it is scanned before the chatbot processes it
Output — When the chatbot generates a response, it is scanned before being shown to the user

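If a guardrail detects an issue, one of two things usually happens (some guardrails, such as Personal Data Protection and URL Filtering, mask or remove only the offending content instead):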

Block — The message is stopped entirely and the user sees a safety message instead
Flag — The message is logged for review but allowed through (used for lower-risk detections)

Every guardrail violation is recorded with a timestamp, the detected content, and the action taken.
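The two-stage flow described above can be pictured with a short sketch. This is an illustration only, not Ask.School's implementation: the names `Action`, `Violation`, `run_stage`, `toy_moderation` and `toy_low_risk` are hypothetical, the `stage` and `guardrail` fields are assumptions added for readability, and the keyword checks stand in for the real detection models.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    BLOCK = "block"  # message is stopped; the user sees a safety message
    FLAG = "flag"    # message is logged for review but allowed through


@dataclass
class Violation:
    # timestamp, detected_content and action mirror what the text says is recorded;
    # stage and guardrail are hypothetical extras for this sketch.
    timestamp: datetime
    stage: str
    guardrail: str
    detected_content: str
    action: Action


# A check returns (guardrail name, action) when it detects an issue, else None.
Check = Callable[[str], Optional[tuple[str, Action]]]

SAFETY_MESSAGE = ("I'm sorry, but I'm unable to respond to that message "
                  "as it may contain inappropriate content.")


def run_stage(stage: str, text: str, checks: list[Check],
              log: list[Violation]) -> str:
    """Run every check on `text`; return it unchanged, or a safety message if blocked."""
    blocked = False
    for check in checks:
        result = check(text)
        if result is None:
            continue
        name, action = result
        log.append(Violation(datetime.now(timezone.utc), stage, name, text, action))
        if action is Action.BLOCK:
            blocked = True
    return SAFETY_MESSAGE if blocked else text


def toy_moderation(text: str) -> Optional[tuple[str, Action]]:
    # Stand-in for a real moderation model: one hard-coded keyword.
    return ("content_moderation", Action.BLOCK) if "badword" in text.lower() else None


def toy_low_risk(text: str) -> Optional[tuple[str, Action]]:
    # Stand-in for a lower-risk detector that only flags (here: all-caps text).
    return ("low_risk", Action.FLAG) if text.isupper() else None


log: list[Violation] = []
checks = [toy_moderation, toy_low_risk]
print(run_stage("input", "When does term start?", checks, log))
print(run_stage("input", "this has a BADWORD in it", checks, log))
```

The same `run_stage` function would be called a second time with `stage="output"` on the chatbot's draft response, which is how the sketch models the input and output stages sharing one set of rules.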

Active Guardrails

Content Moderation

Every message is checked against a comprehensive moderation system that detects:

Hate speech and threatening language
Harassment and bullying
Self-harm content and instructions
Violence and graphic content
Sexual content (with heightened sensitivity for content involving minors)
Illicit activity

When detected, the user sees a message like: “I’m sorry, but I’m unable to respond to that message as it may contain inappropriate content.”

NSFW Filter

A separate filter catches workplace-inappropriate content including profanity, explicit material, and language unsuitable for a school environment. This runs on both user messages and chatbot responses.

When detected: “I’m unable to respond to that type of message. Please keep our conversation appropriate for a school setting.”

Off-Topic Detection

The chatbot is kept focused on school-related topics. If a user tries to use the chatbot for unrelated purposes (e.g. asking it to write an essay, play a game, or discuss topics unrelated to the school), the user is gently redirected.

When detected: “That’s a bit outside what I can help with! I’m here to answer questions about the school. Try asking about term dates, uniform, events, admissions, or homework.”

Jailbreak Detection

Detects attempts to bypass the chatbot’s safety instructions through techniques like role-play exploits, instruction overrides, or social engineering. This is particularly important for school chatbots where students may test the system’s boundaries.

When detected: “I’m designed to help with school-related questions and I need to stay within my guidelines. Could you rephrase your question?”

Prompt Injection Protection

Detects user input that tries to override the chatbot’s system instructions or extract its internal configuration. This prevents technical attacks that could make the chatbot behave unexpectedly.

When detected: “I’m here to help with school-related questions. Could you rephrase your question so I can assist you?”

Personal Data Protection

Sensitive personal data is detected and handled in both directions:

User messages (input) — If a user shares sensitive data such as credit card numbers, NHS numbers, or financial account details, the data is masked and they see: “For your safety, please don’t share personal information such as phone numbers, email addresses, or ID numbers in the chat.”

Chatbot responses (output) — Before a response is sent, it is scanned for sensitive data like credit card numbers, NHS numbers, and financial information. If any is found, it is masked and the user sees: “Some personal information has been masked in this response to protect privacy.”

Your school’s own profile information (school name, published contact details) is automatically whitelisted so the chatbot can share it normally.

You can add further school-specific terms to the PII Whitelist, described below.

See Personal Data in Documents for how documents are scanned during upload.
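As a rough illustration of how masking of this kind can work, the sketch below finds runs of digits and masks those that pass a checksum. It is not Ask.School's detector: the function names, the `MASK` text and the candidate regular expression are assumptions made for this example. The two checksums themselves are standard, namely the Luhn check used by payment card numbers and the modulus-11 check digit used by NHS numbers.

```python
import re

MASK = "[REDACTED]"  # hypothetical placeholder; the product's actual mask text may differ


def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def nhs_ok(digits: str) -> bool:
    """NHS number check: 10 digits, the last being a modulus-11 check digit."""
    if len(digits) != 10:
        return False
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    return check != 10 and check == int(digits[9])


# Candidate runs of 10 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){9,18}\b")


def mask_sensitive(text: str) -> tuple[str, bool]:
    """Return (masked text, whether anything was masked)."""
    found = False

    def replace(match: re.Match) -> str:
        nonlocal found
        digits = re.sub(r"\D", "", match.group())
        is_card = 13 <= len(digits) <= 19 and luhn_ok(digits)
        if is_card or nhs_ok(digits):
            found = True
            return MASK
        return match.group()  # e.g. an ordinary phone number is left alone

    return CANDIDATE.sub(replace, text), found


print(mask_sensitive("My card is 4111 1111 1111 1111"))
print(mask_sensitive("Call the office on 01234 567890"))
```

Using a checksum rather than digit count alone is one way to reduce false positives on ordinary numbers; anything that still gets masked wrongly is what the whitelist is for.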

Hallucination Detection

After the chatbot generates a response, it is checked against the school’s actual knowledge base to catch factual inaccuracies. If the system detects that a response may contain invented or unsupported claims, it is flagged.

When detected: “I’m not confident that my answer is fully accurate based on the information available to me. Please check with the school office for the most up-to-date information.”

URL Filtering

For security, URLs in messages are checked against safety rules:

User messages — Links in user input are blocked to prevent phishing or spam: “For security reasons, I’m unable to process messages containing links.”
Chatbot responses — Any URLs the chatbot generates are verified. Unrecognised links are removed: “A link in my response was removed for security reasons. Please visit the school’s official website for verified links and resources.”
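The two URL rules can be sketched as follows. How the platform decides that a link is "recognised" is not documented here, so the sketch assumes a simple allowlist of host names; `filter_output_urls`, `contains_url`, the `[link removed]` placeholder and the `example-school.test` host are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Deliberately simple URL matcher for the sketch: http(s) links up to the next space.
URL_RE = re.compile(r"https?://[^\s<>()]+")


def contains_url(text: str) -> bool:
    """Input-side rule: any link in a user message triggers a block."""
    return URL_RE.search(text) is not None


def filter_output_urls(text: str, allowed_hosts: set[str]) -> tuple[str, bool]:
    """Remove URLs whose host is not in `allowed_hosts`; report whether any were removed."""
    removed = False

    def replace(match: re.Match) -> str:
        nonlocal removed
        url = match.group().rstrip(".,;:!?")   # keep sentence punctuation out of the URL
        trailing = match.group()[len(url):]
        try:
            host = (urlparse(url).hostname or "").rstrip(".").lower()
        except ValueError:                     # malformed URL: treat as unrecognised
            host = ""
        if host in allowed_hosts:
            return url + trailing
        removed = True
        return "[link removed]" + trailing

    return URL_RE.sub(replace, text), removed


allowed = {"www.example-school.test"}
reply = ("Term dates: https://www.example-school.test/term-dates. "
         "Also see http://evil.test/login now")
print(filter_output_urls(reply, allowed))
print(contains_url("is this safe? http://evil.test/login"))
```

Comparing the parsed host name, rather than searching the URL text for the school's domain, avoids accepting look-alike links such as a school domain embedded in another site's path.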

PII Whitelist

Some terms that look like personal data are actually fine for your chatbot to share — staff names that already appear on your website, the school’s switchboard number, names of school houses or buildings. The PII Whitelist lets you tell the personal-data filter that those specific terms are safe.

How to Get There

From the School Dashboard, click Whitelist in the left sidebar under Settings.

Screenshot: the PII Whitelist editor with a list of whitelisted terms

What you can whitelist

The editor accepts two types of entries:

Terms — exact words or phrases (e.g. “Mrs Smith”, “Westbrook House”, “01234 567890”).
Patterns — regular-expression patterns for things like internal staff codes or formatted reference numbers. Most schools won’t need patterns; they’re there for IT teams that have a standard format they want to allow through.

You can also add notes explaining why something is on the list — useful when someone else reviews the list later.
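To make the difference between the two entry types concrete, here is a minimal sketch of how a term and a pattern might each be matched. The `WhitelistEntry` structure, the `is_whitelisted` function and the `STF-1234` staff-code format are hypothetical; the sketch also assumes that terms are compared exactly and that a pattern must match the whole candidate, which may not be how the product behaves.

```python
import re
from dataclasses import dataclass


@dataclass
class WhitelistEntry:
    value: str                # an exact term, or a regular expression
    is_pattern: bool = False  # False = term, True = pattern
    note: str = ""            # why the entry is on the list


def is_whitelisted(candidate: str, entries: list[WhitelistEntry]) -> bool:
    """True if `candidate` equals a term or is fully matched by a pattern."""
    for entry in entries:
        if entry.is_pattern:
            if re.fullmatch(entry.value, candidate):
                return True
        elif candidate == entry.value:
            return True
    return False


entries = [
    WhitelistEntry("Mrs Smith", note="Headteacher, named on the website"),
    WhitelistEntry("Westbrook House", note="School house name"),
    WhitelistEntry("01234 567890", note="Published switchboard number"),
    # Hypothetical internal staff-code format: STF- followed by four digits.
    WhitelistEntry(r"STF-\d{4}", is_pattern=True, note="Internal staff codes"),
]

print(is_whitelisted("Mrs Smith", entries))
print(is_whitelisted("STF-0421", entries))
print(is_whitelisted("Mr Jones", entries))
```

Requiring a full match keeps a pattern from accidentally allowing longer strings that merely contain a staff code.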

When to add a term

Add a term whenever the filter is wrongly hiding something the chatbot ought to be able to share. Common examples:

Staff names already published on the school website (e.g. the headteacher, designated safeguarding lead, named teachers)
Additional regional or department phone numbers (the school’s main phone number, address, and email are already covered by your school profile)
Names of houses, buildings, blocks, sites
Trip names, club names, scheme names that include a person’s name

Don’t add:

Pupil names or pupil-identifiable data
Personal phone numbers, personal email addresses, or home addresses of staff
Anything that isn’t already public on your school’s own website

Activating the whitelist

The Active toggle at the top of the editor turns the whole whitelist on or off. While it’s active, every entry in the list is allowed through the personal-data filter on both user messages and chatbot responses. Toggle it off to revert to the default filter behaviour.

Click Save to apply changes.

If a chatbot keeps masking a name or number you’ve published yourself, the PII Whitelist is almost always the right fix. Add the term, check that the Active toggle is on, and click Save; the next reply will include it.

How This Relates to Safeguarding

Guardrails and Safeguarding Alerts are separate but complementary systems:

	Guardrails	Safeguarding Alerts
Purpose	Prevent harmful content from being sent or received	Flag conversations that suggest a child may be at risk
Action	Automatic — blocks or masks content in real time	Notification — creates an alert for staff to review
Scope	Content quality and safety	Welfare and child protection
Who sees it	The user gets a safe response message	Staff with safeguarding permissions get an email alert

Both systems run simultaneously on every conversation. A single message could trigger both a guardrail (e.g. blocking explicit content) and a safeguarding alert (e.g. flagging a child in distress).

Good to Know

All guardrails are enabled by default and run on every chatbot, including public ones
Guardrails work on both authenticated and anonymous conversations
The system uses multiple AI models for detection, each tuned for accuracy in a school context
Guardrail violations are logged in encrypted form for audit purposes and are not visible to end users
Response messages are designed to be age-appropriate and non-alarming
Guardrails cannot be disabled by school administrators — they are a platform-wide safety feature
The off-topic filter uses each chatbot’s system prompt to understand what is considered “on topic”, so it adapts to different chatbot purposes (e.g. a reception chatbot vs. an IT helpdesk chatbot)

Next Steps

Personal Data in Documents — How document uploads are scanned for personal data
Safeguarding Alerts — How conversation monitoring creates welfare alerts
Conversation Monitoring — Browse and review all chatbot conversations
Creating Chatbots — Set up chatbot system prompts and instructions