Detection Scoring

LobsterHoney uses a multi-signal scoring engine to assign every session a threat score, a classification, a confidence level, and a severity rating.

How Scoring Works

Every time a visitor interacts with a trap, the scoring engine evaluates the accumulated signals for that session. Signals are divided into two categories: tripwire signals (high-confidence indicators that a human visitor would almost never trigger) and behavioral signals (patterns consistent with automated activity).

Each signal contributes a point value to the session's total score. The score determines the classification.
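
As a rough illustration of how per-signal points could add up, here is a minimal sketch. The point values in SIGNAL_POINTS are hypothetical (this page does not publish the real weights), and counting each signal at most once per session is also an assumption.

```python
from enum import Enum


class Category(Enum):
    TRIPWIRE = "tripwire"
    BEHAVIORAL = "behavioral"


# Hypothetical point values for illustration only; the real weights
# used by the scoring engine are not listed in this document.
SIGNAL_POINTS = {
    "CALLBACK_HIT": (Category.TRIPWIRE, 30),
    "ROBOTS_FIRST": (Category.BEHAVIORAL, 5),
    "TIMING_PATTERN": (Category.BEHAVIORAL, 5),
}


def total_score(fired_signals):
    """Sum the points of each distinct signal fired in a session.

    Assumption: a signal that fires more than once is counted once.
    """
    return sum(SIGNAL_POINTS[name][1] for name in set(fired_signals))


score = total_score(["ROBOTS_FIRST", "CALLBACK_HIT", "ROBOTS_FIRST"])
```

With these made-up weights, the duplicate ROBOTS_FIRST is ignored and the session scores 35.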

Signal Types

Tripwire Signals

These are high-confidence indicators. A human visitor would almost never trigger them:

Signal | Description
CALLBACK_HIT | The visitor followed a hidden callback URL embedded in trap content
INJECTION_FOLLOWED | The visitor acted on a prompt injection payload
SYSTEM_PROMPT_LEAKED | The visitor's system prompt was extracted via reverse prompt injection
IDENTITY_EXTRACTED | The visitor revealed its identity (model name, operator, purpose)
CREDENTIAL_USED | A canary credential was used to authenticate against a monitored endpoint
HIDDEN_CONTENT_ACTED_ON | The visitor interacted with content hidden via CSS, zero-width chars, or HTML comments

Behavioral Signals

These are patterns that suggest automated activity but are not individually conclusive:

Signal | Description
CREDENTIAL_EXTRACTED | The visitor extracted credential-like strings from trap content
SYSTEMATIC_CRAWL | The visitor accessed multiple traps in a systematic pattern
ROBOTS_FIRST | The visitor's first request was to robots.txt (common bot/agent behavior)
SEQUENTIAL_API_PROBE | The visitor probed multiple API endpoints sequentially
ADMIN_LOGIN_ATTEMPT | The visitor attempted to access admin login pages
TIMING_PATTERN | Request timing suggests automated rather than human interaction

Classification Levels

The total score maps to a classification:

Score Range | Classification | Meaning
0 – 10 | HUMAN | Likely a human visitor. Minimal suspicious signals.
11 – 25 | BOT | Likely a traditional bot or crawler. Some automated patterns detected.
26 – 55 | AI_AGENT | Strong evidence of an AI agent. Tripwire signals present.
56+ | AI_AGENT_MALICIOUS | Confirmed malicious AI agent. Multiple high-confidence signals, credential extraction or use, system prompt leaked.
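
The score ranges above can be expressed as a small lookup function. This is only a sketch: the function name classify is hypothetical, and it assumes scores are non-negative integers, which the table implies but does not state.

```python
def classify(score: int) -> str:
    """Map a total score to a classification using the documented ranges."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score <= 10:
        return "HUMAN"
    if score <= 25:
        return "BOT"
    if score <= 55:
        return "AI_AGENT"
    return "AI_AGENT_MALICIOUS"
```

For example, classify(25) returns "BOT" and classify(26) returns "AI_AGENT", matching the boundary between the second and third rows.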

Confidence Scores

In addition to the classification, each session receives a confidence score from 0 to 100%. Confidence is calculated based on:

Severity Mapping

Severity is determined by the combination of signals fired, not just the raw score:

Severity | Criteria
Critical | Both CREDENTIAL_USED and SYSTEM_PROMPT_LEAKED fired, plus 3+ tripwire signals total
High | Credential extraction or use detected, plus at least one tripwire signal
Medium | At least one tripwire signal fired
Low | Only behavioral signals detected; no tripwires fired
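
One way to read the severity criteria is as a series of checks evaluated from most to least severe. The sketch below makes several assumptions that are not confirmed by this page: the function name severity is hypothetical, "credential extraction or use" is taken to mean CREDENTIAL_EXTRACTED or CREDENTIAL_USED, CREDENTIAL_USED is allowed to count as the required tripwire for High, and a session with no signals at all falls through to Low.

```python
# Tripwire signal names as listed under Signal Types.
TRIPWIRES = frozenset({
    "CALLBACK_HIT",
    "INJECTION_FOLLOWED",
    "SYSTEM_PROMPT_LEAKED",
    "IDENTITY_EXTRACTED",
    "CREDENTIAL_USED",
    "HIDDEN_CONTENT_ACTED_ON",
})

CREDENTIAL_SIGNALS = frozenset({"CREDENTIAL_EXTRACTED", "CREDENTIAL_USED"})


def severity(fired_signals) -> str:
    """Return a severity label, checking the most severe criteria first."""
    fired = set(fired_signals)
    tripwires = fired & TRIPWIRES
    has_credential = bool(fired & CREDENTIAL_SIGNALS)

    if {"CREDENTIAL_USED", "SYSTEM_PROMPT_LEAKED"} <= fired and len(tripwires) >= 3:
        return "Critical"
    if has_credential and tripwires:
        return "High"
    if tripwires:
        return "Medium"
    # Assumption: sessions with only behavioral signals (or none) are Low.
    return "Low"
```

Under this reading, a session with CREDENTIAL_EXTRACTED and ROBOTS_FIRST only is Low, because both are behavioral signals and no tripwire fired.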

Scoring in Practice

The scoring engine runs automatically after every trap hit. As more signals accumulate for a session, the score and classification may escalate. A session that starts as BOT can be reclassified to AI_AGENT_MALICIOUS as the agent continues to interact with traps.
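
The escalation described above can be sketched as a session object that is re-scored after each trap hit. The Session class, the record_hit method, and the point values passed in are all hypothetical; only the classification thresholds come from this page.

```python
from dataclasses import dataclass, field

# Lower bound of each documented score range, highest first.
THRESHOLDS = [
    (56, "AI_AGENT_MALICIOUS"),
    (26, "AI_AGENT"),
    (11, "BOT"),
    (0, "HUMAN"),
]


def classify(score: int) -> str:
    for floor, label in THRESHOLDS:
        if score >= floor:
            return label
    raise ValueError("score must be non-negative")


@dataclass
class Session:
    score: int = 0
    fired: set = field(default_factory=set)

    def record_hit(self, signal: str, points: int) -> str:
        """Add a newly fired signal and return the updated classification."""
        if signal not in self.fired:  # assumption: each signal scores once
            self.fired.add(signal)
            self.score += points
        return classify(self.score)


session = Session()
hits = [  # hypothetical point values
    ("ROBOTS_FIRST", 5),
    ("SYSTEMATIC_CRAWL", 10),
    ("CALLBACK_HIT", 30),
    ("CREDENTIAL_USED", 40),
]
history = [session.record_hit(name, points) for name, points in hits]
```

With these illustrative values the running score is 5, 15, 45, then 85, so history moves through HUMAN, BOT, AI_AGENT, and AI_AGENT_MALICIOUS as hits accumulate.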

You can view the full scoring breakdown for any session in the Incidents view of the dashboard, including which signals fired, their point values, and the resulting classification.