Detection Scoring
LobsterHoney uses a multi-signal scoring engine to assign every session a threat score, a classification, a confidence level, and a severity rating.
How Scoring Works
Every time a visitor interacts with a trap, the scoring engine evaluates the accumulated signals for that session. Signals are divided into two categories: tripwire signals (high-confidence indicators that only AI agents would trigger) and behavioral signals (patterns consistent with automated activity).
Each signal contributes a point value to the session's total score. The score determines the classification.
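As a rough illustration of how per-signal points could add up to a session score, here is a minimal sketch. The point values in `SIGNAL_POINTS` are hypothetical (this page does not list the real weights), and the function name `session_score` is an assumption, as is the choice to count each signal once per session.

```python
# Hypothetical point values for illustration only; the real weights are not
# documented on this page.
SIGNAL_POINTS = {
    "CALLBACK_HIT": 30,
    "INJECTION_FOLLOWED": 30,
    "ROBOTS_FIRST": 5,
    "TIMING_PATTERN": 8,
}


def session_score(fired_signals):
    """Sum the point value of each distinct signal fired in a session."""
    # Assumption: a signal counts once per session, and unknown names score 0.
    return sum(SIGNAL_POINTS.get(name, 0) for name in set(fired_signals))


print(session_score(["ROBOTS_FIRST", "TIMING_PATTERN"]))  # 13
print(session_score(["ROBOTS_FIRST", "CALLBACK_HIT", "CALLBACK_HIT"]))  # 35
```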
Signal Types
Tripwire Signals
These are high-confidence indicators. A human visitor would almost never trigger them:
| Signal | Description |
|---|---|
| CALLBACK_HIT | The visitor followed a hidden callback URL embedded in trap content |
| INJECTION_FOLLOWED | The visitor acted on a prompt injection payload |
| SYSTEM_PROMPT_LEAKED | The visitor's system prompt was extracted via reverse prompt injection |
| IDENTITY_EXTRACTED | The visitor revealed its identity (model name, operator, purpose) |
| CREDENTIAL_USED | A canary credential was used to authenticate against a monitored endpoint |
| HIDDEN_CONTENT_ACTED_ON | The visitor interacted with content hidden via CSS, zero-width characters, or HTML comments |
Behavioral Signals
These are patterns that suggest automated activity but are not individually conclusive:
| Signal | Description |
|---|---|
| CREDENTIAL_EXTRACTED | The visitor extracted credential-like strings from trap content |
| SYSTEMATIC_CRAWL | The visitor accessed multiple traps in a systematic pattern |
| ROBOTS_FIRST | The visitor's first request was to robots.txt (common bot/agent behavior) |
| SEQUENTIAL_API_PROBE | The visitor probed multiple API endpoints sequentially |
| ADMIN_LOGIN_ATTEMPT | The visitor attempted to access admin login pages |
| TIMING_PATTERN | Request timing suggests automated rather than human interaction |
Classification Levels
The total score maps to a classification:
| Score Range | Classification | Meaning |
|---|---|---|
| 0 – 10 | HUMAN | Likely a human visitor. Minimal suspicious signals. |
| 11 – 25 | BOT | Likely a traditional bot or crawler. Some automated patterns detected. |
| 26 – 55 | AI_AGENT | Strong evidence of an AI agent. Tripwire signals present. |
| 56+ | AI_AGENT_MALICIOUS | Confirmed malicious AI agent. Multiple high-confidence signals, credential extraction or use, system prompt leaked. |
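The score ranges above can be expressed as a small lookup. This is a sketch of the mapping in the table, not the engine's actual code; the function name `classify` and the rejection of negative scores are assumptions.

```python
def classify(score: int) -> str:
    """Map a total session score to the classification from the table."""
    if score < 0:
        # Assumption: scores are never negative, so treat this as an error.
        raise ValueError("score must be non-negative")
    if score <= 10:
        return "HUMAN"
    if score <= 25:
        return "BOT"
    if score <= 55:
        return "AI_AGENT"
    return "AI_AGENT_MALICIOUS"


for s in (0, 10, 11, 25, 26, 55, 56):
    print(s, classify(s))
```

The boundary values are the interesting cases: 10 is still HUMAN, 11 is BOT, 55 is AI_AGENT, and 56 is AI_AGENT_MALICIOUS.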
Confidence Scores
In addition to the classification, each session receives a confidence score from 0% to 100%. Confidence is calculated from three components:
- Signal weight ratio (up to 60%): the share of the maximum possible signal points earned by the signals that fired
- Signal diversity bonus (up to 25%): awarded when both tripwire and behavioral signals are present, which indicates a more robust detection
- High-value signal bonus (up to 15%): awarded when three or more tripwire signals fire, which indicates very strong evidence
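One way the three components might combine is sketched below. Several details are assumptions rather than documented behavior: the bonuses are treated as all-or-nothing (the text only says "up to"), the per-signal `points` mapping is supplied by the caller, the example uses a hypothetical uniform weight of 10, and the name `confidence` is illustrative.

```python
TRIPWIRE = frozenset({
    "CALLBACK_HIT", "INJECTION_FOLLOWED", "SYSTEM_PROMPT_LEAKED",
    "IDENTITY_EXTRACTED", "CREDENTIAL_USED", "HIDDEN_CONTENT_ACTED_ON",
})
BEHAVIORAL = frozenset({
    "CREDENTIAL_EXTRACTED", "SYSTEMATIC_CRAWL", "ROBOTS_FIRST",
    "SEQUENTIAL_API_PROBE", "ADMIN_LOGIN_ATTEMPT", "TIMING_PATTERN",
})


def confidence(fired, points):
    """Return a confidence percentage in the range 0 to 100."""
    fired = set(fired)
    earned = sum(points.get(name, 0) for name in fired)
    maximum = sum(points.values())
    # Signal weight ratio: up to 60 percentage points.
    weight_part = 60 * earned / maximum if maximum else 0.0
    # Diversity bonus: assumed to be the full 25 when both categories fired.
    diversity = 25 if (fired & TRIPWIRE and fired & BEHAVIORAL) else 0
    # High-value bonus: assumed to be the full 15 at three or more tripwires.
    high_value = 15 if len(fired & TRIPWIRE) >= 3 else 0
    return min(100.0, weight_part + diversity + high_value)


# Hypothetical uniform weights: every signal is worth 10 points.
points = {name: 10 for name in TRIPWIRE | BEHAVIORAL}
fired = {"CALLBACK_HIT", "INJECTION_FOLLOWED", "CREDENTIAL_USED", "ROBOTS_FIRST"}
print(confidence(fired, points))  # 60.0
```

In this example the four fired signals earn 40 of 120 possible points (20 percentage points), and both bonuses apply, giving 60%.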
Severity Mapping
Severity is determined by the combination of signals fired, not just the raw score:
| Severity | Criteria |
|---|---|
| Critical | Both CREDENTIAL_USED and SYSTEM_PROMPT_LEAKED fired, plus 3+ tripwire signals total |
| High | Credential extraction or use detected, plus at least one tripwire signal |
| Medium | At least one tripwire signal fired |
| Low | Only behavioral signals detected; no tripwires fired |
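The severity criteria can be read as a top-down rule check, sketched here. This is one interpretation of the table: "credential extraction or use" is taken to mean that `CREDENTIAL_EXTRACTED` or `CREDENTIAL_USED` fired, and a session with no signals at all is given no severity. The function name `severity` is hypothetical.

```python
from typing import Iterable, Optional

TRIPWIRE = frozenset({
    "CALLBACK_HIT", "INJECTION_FOLLOWED", "SYSTEM_PROMPT_LEAKED",
    "IDENTITY_EXTRACTED", "CREDENTIAL_USED", "HIDDEN_CONTENT_ACTED_ON",
})


def severity(fired: Iterable[str]) -> Optional[str]:
    """Return the severity for a set of fired signals, most severe rule first."""
    fired = set(fired)
    if not fired:
        return None  # Assumption: nothing fired, so nothing to rate.
    tripwires = fired & TRIPWIRE
    if {"CREDENTIAL_USED", "SYSTEM_PROMPT_LEAKED"} <= fired and len(tripwires) >= 3:
        return "Critical"
    if fired & {"CREDENTIAL_EXTRACTED", "CREDENTIAL_USED"} and tripwires:
        return "High"
    if tripwires:
        return "Medium"
    return "Low"


print(severity({"ROBOTS_FIRST", "TIMING_PATTERN"}))          # Low
print(severity({"CALLBACK_HIT"}))                            # Medium
print(severity({"CREDENTIAL_EXTRACTED", "CALLBACK_HIT"}))    # High
print(severity({"CREDENTIAL_USED", "SYSTEM_PROMPT_LEAKED",
                "CALLBACK_HIT"}))                            # Critical
```

Checking the rules from most to least severe means a session that satisfies several rows is assigned the highest matching severity.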
Scoring in Practice
The scoring engine runs automatically after every trap hit. As more signals accumulate for a session, the score and classification may escalate. A session that starts as BOT can be reclassified to AI_AGENT_MALICIOUS as the agent continues to interact with traps.
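To make the escalation concrete, the sketch below replays a sequence of trap hits and reclassifies the session after each one. The per-signal point values in `hits` are hypothetical, and `label` and `trace` are illustrative names; only the score thresholds come from the classification table.

```python
from bisect import bisect_left

# Inclusive upper bounds of the HUMAN, BOT, and AI_AGENT score ranges.
UPPER_BOUNDS = [10, 25, 55]
LABELS = ["HUMAN", "BOT", "AI_AGENT", "AI_AGENT_MALICIOUS"]


def label(score):
    """Look up the classification for a running total."""
    return LABELS[bisect_left(UPPER_BOUNDS, score)]


def trace(hits):
    """Return (signal, running_total, classification) after each trap hit."""
    total = 0
    steps = []
    for signal, points in hits:
        total += points
        steps.append((signal, total, label(total)))
    return steps


# Hypothetical point values, chosen only to show the escalation.
hits = [
    ("ROBOTS_FIRST", 5),
    ("SYSTEMATIC_CRAWL", 10),
    ("CALLBACK_HIT", 30),
    ("CREDENTIAL_USED", 40),
]
for step in trace(hits):
    print(step)
```

With these values the running total goes 5, 15, 45, 85, so the session moves from HUMAN to BOT to AI_AGENT and finally to AI_AGENT_MALICIOUS.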
You can view the full scoring breakdown for any session in the Incidents view of the dashboard, including which signals fired, their point values, and the resulting classification.