Technical deep-dive
A 7-step walkthrough of the pipeline that catches real CVEs before they ship — and nothing posts to your PR without your sign-off.
TL;DR
Seven steps from GitHub webhook to a finding in your /reviews queue.
PR opened / updated
│
▼
├─ GitHub webhook fires
│
▼
1 Fetch diff + surrounding lines per hunk
│
▼
2 Fetch top-level config files (automatic, no setup required)
│ package.json, requirements.txt, go.mod, Cargo.toml,
│ Gemfile, pom.xml, build.gradle, Dockerfile, .eslintrc, tsconfig.json
│
▼
3 Prompt construction: diff + context + 9-bucket CWE taxonomy prior
│
▼
4 Model inference (claude-sonnet-4-5)
│
▼
5 Post-processing: dedup → severity → line anchors → CVSS estimate
│
▼
6 Finding lands in /reviews, waiting for your approval
│
▼
7 You approve ──► GitHub API posts review comment to the PR
Step 01
PullLight receives a GitHub pull_request webhook event whose action is opened or synchronize (new commits pushed to the PR). No polling. No background jobs scanning branches.
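A minimal sketch of that event filter, assuming the event name comes from GitHub's X-GitHub-Event header and the action from the JSON payload. The function and constant names are hypothetical, not PullLight's actual code.

```python
# Hypothetical helper: decide whether a webhook delivery should start a review.
# Only the two actions named in this step are treated as reviewable.
REVIEWABLE_ACTIONS = {"opened", "synchronize"}


def should_review(event_name: str, payload: dict) -> bool:
    """Return True only for pull_request deliveries that open or update a PR."""
    if event_name != "pull_request":
        return False
    return payload.get("action") in REVIEWABLE_ACTIONS
```

Everything else (pushes, closed PRs, label changes) falls through without triggering any work.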
Step 02
PullLight calls GET /repos/{owner}/{repo}/pulls/{pull_number}/files. For each changed file, we read the full diff. For each hunk, we fetch N lines of surrounding context (default: 20 above/below). This gives the model enough context to trace variable origins without loading the entire file.
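As an illustration of the per-hunk context step, the sketch below parses unified-diff hunk headers from a file's patch text and widens each hunk by N lines on both sides, clamped to the file. The function name and the file_length parameter are assumptions made for this example.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,3 +10,4 @@".
# Group 1 is the start line in the new file, group 2 the optional line count.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def context_ranges(patch: str, file_length: int, n: int = 20) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) line ranges to fetch, one per hunk."""
    ranges = []
    for match in HUNK_HEADER.finditer(patch):
        start = int(match.group(1))
        # A missing count means a single line; a count of 0 is a pure deletion.
        count = int(match.group(2)) if match.group(2) is not None else 1
        end = start + max(count, 1) - 1
        ranges.append((max(1, start - n), min(file_length, end + n)))
    return ranges
```

Overlapping ranges are left unmerged here; a real implementation would probably merge them before fetching.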
Step 03
We fetch whichever of these repo-root files exist: package.json, requirements.txt, go.mod, Cargo.toml, Gemfile, pom.xml, build.gradle, Dockerfile, .eslintrc, tsconfig.json. They surface naming conventions, dependency versions, and lint rules so the model can flag version-specific vulnerability patterns (e.g., "this requires express >= 4.19.2 because CVE-2024-29041 was fixed in that version").
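To show why the declared version range matters, here is a deliberately simplified sketch that checks whether a range admits a release older than the fixed one. It treats the first x.y.z in the range as its lower bound, which holds for specs like ^3.0.4 or >=4.18.0 but not for full semver syntax (unions, upper bounds, wildcards). Both function names are hypothetical.

```python
import re


def lowest_version(range_spec: str) -> tuple[int, ...]:
    """Extract the first x.y.z in a simple range spec such as '^3.0.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", range_spec)
    if match is None:
        raise ValueError(f"unsupported version range: {range_spec!r}")
    return tuple(int(part) for part in match.groups())


def may_resolve_below(range_spec: str, fixed_in: str) -> bool:
    """True if the range's lower bound is older than the release carrying the fix."""
    return lowest_version(range_spec) < lowest_version(fixed_in)


# Using the 4.19.2 threshold from the example above:
print(may_resolve_below("^4.18.0", "4.19.2"))
print(may_resolve_below("^4.19.2", "4.19.2"))
```

A production check would resolve the lockfile or use a real semver library instead of this lower-bound shortcut.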
Step 04
The prompt includes: (a) the diff with context, (b) the bug-class taxonomy (9 CWE buckets with descriptions and example sinks), (c) instructions to output structured JSON with severity, category, file:line, description, and suggested fix. The taxonomy prior acts like a security-engineering cheat sheet — it biases the model toward known vulnerability patterns rather than generic code style feedback.
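One way to consume the structured JSON described here is to validate it before anything downstream trusts it. The sketch below assumes the model returns a JSON array of objects; the key names (severity, category, file, line, description, suggested_fix) are guesses based on the fields listed above, not a documented schema.

```python
import json
from dataclasses import dataclass

ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}


@dataclass(frozen=True)
class Finding:
    severity: str
    category: str  # e.g. "CWE-22"
    file: str
    line: int
    description: str
    suggested_fix: str


def parse_findings(raw: str) -> list[Finding]:
    """Parse model output, silently dropping malformed entries."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    findings = []
    for item in items:
        try:
            finding = Finding(
                severity=str(item["severity"]).lower(),
                category=str(item["category"]),
                file=str(item["file"]),
                line=int(item["line"]),
                description=str(item["description"]),
                suggested_fix=str(item["suggested_fix"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # missing key, wrong container type, or non-numeric line
        if finding.severity in ALLOWED_SEVERITIES and finding.line > 0:
            findings.append(finding)
    return findings
```

Dropping malformed entries instead of raising keeps one bad object from discarding an otherwise usable response.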
Step 05
claude-sonnet-4-5 via Polsia AI proxy. Payload is the diff + context + config files + taxonomy prompt. No full repo clone. No git history. No .env files or credentials.
Step 06
Raw model output goes through four passes: dedup (remove findings already commented on by other bots), severity classification (critical/high/medium/low based on CVSS-equivalent scoring), line-anchor resolution (map model file references to actual line numbers in the PR diff), and CVSS estimation.
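For the severity pass, one plausible mapping is the qualitative rating scale from the CVSS v3.x specification. Whether PullLight uses exactly these cut-offs is an assumption; the sketch only shows the shape of a score-to-bucket classifier.

```python
def severity_from_cvss(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    if score > 0.0:
        return "low"
    return "none"  # a score of exactly 0.0 carries no severity
```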
Step 07
All findings land in /reviews. Nothing auto-posts. You approve individual comments, dismiss false positives, or skip the review entirely. Only approved comments hit the PR via GitHub's POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews API.
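The approval gate can be sketched as a payload builder that only ever includes approved findings. The request-body fields (body, event, comments with path, line, side) follow GitHub's create-review endpoint; the finding dictionary keys (approved, file, line, body) are hypothetical.

```python
from typing import Optional


def build_review_payload(findings: list[dict]) -> Optional[dict]:
    """Build the POST body for the reviews endpoint, or None if nothing was approved."""
    approved = [f for f in findings if f.get("approved")]
    if not approved:
        return None  # nothing approved means no request is made at all
    return {
        "body": f"{len(approved)} security finding(s) approved for posting.",
        "event": "COMMENT",  # comment only; never APPROVE or REQUEST_CHANGES
        "comments": [
            {"path": f["file"], "line": f["line"], "side": "RIGHT", "body": f["body"]}
            for f in approved
        ],
    }
```

Returning None for an empty approval set makes "nothing auto-posts" the default path instead of a special case.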
PullLight is prompted with a 9-bucket CWE taxonomy. Every finding is tagged to one of these buckets. The model doesn't just look for "bad code"; it looks for specific attack primitives.
| CWE | Bug Class | Example Sinks |
|---|---|---|
| CWE-94 | Remote Code Execution | eval(), new Function(), exec(), spawn(), deserialization |
| CWE-502 | Insecure Deserialization | pickle.load(), yaml.load() (without Loader=), fastjson |
| CWE-22 | Path Traversal | fs.writeFile(filename), sendFile(path), URL path join |
| CWE-1321 | Prototype Pollution | Object.assign({}, userInput), deep-merge without key validation |
| CWE-89 | SQL Injection | Template SQL with string interpolation, ORM raw queries |
| CWE-918 | Server-Side Request Forgery | fetch(url), axios.get(url) on user-supplied input |
| CWE-287 | Authentication Bypass | Middleware logic gaps, token expiry not checked, route ordering |
| CWE-1336 | Server-Side Template Injection | render(template, ctx) with unsanitized user input |
| CWE-78 | OS Command Injection | child_process.exec(), shell=True subprocess calls |
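A taxonomy like this can live as plain data and be rendered into the prompt. The sketch below uses three rows from the table; the data layout and the rendered wording are assumptions, not PullLight's actual prompt text.

```python
# Three rows from the table above, expressed as data (layout is hypothetical).
TAXONOMY = [
    {"cwe": "CWE-22", "bug_class": "Path Traversal",
     "sinks": ["fs.writeFile(filename)", "sendFile(path)", "URL path join"]},
    {"cwe": "CWE-89", "bug_class": "SQL Injection",
     "sinks": ["template SQL with string interpolation", "ORM raw queries"]},
    {"cwe": "CWE-918", "bug_class": "Server-Side Request Forgery",
     "sinks": ["fetch(url)", "axios.get(url)"]},
]


def render_taxonomy(taxonomy: list[dict]) -> str:
    """Render taxonomy entries as a bullet list suitable for a prompt section."""
    lines = ["Known vulnerability classes to check for:"]
    for entry in taxonomy:
        sinks = ", ".join(entry["sinks"])
        lines.append(f"- {entry['cwe']} ({entry['bug_class']}): sinks include {sinks}")
    return "\n".join(lines)
```

Keeping the taxonomy as data means adding a bucket is a one-entry change, and the same list can be reused to validate the category field on each finding.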
Four-stage filtering pipeline. Each finding must pass all four stages before it reaches /reviews.
Stage 1 — Severity floor
Findings below a minimum severity threshold (based on CVSS-equivalent impact scoring) are dropped before human presentation. Style preferences and variable naming suggestions don't make the cut.
Stage 2 — Confidence threshold
The model outputs a confidence score for each finding. Findings below a configurable floor are held but not surfaced. This keeps most hallucinated vulnerability claims out of your queue.
Stage 3 — Dedup
Before a finding is surfaced, PullLight checks the existing PR comment thread (via GitHub API) for findings already commented on by CodeRabbit, Copilot, Sonar, Semgrep, etc. If a comment with matching file:line + similar description already exists, the finding is suppressed.
Stage 4 — Suppression rules
Findings are dropped if the file path matches patterns like *.min.js, dist/, node_modules/, vendor/, test/, *.lock, __generated__/. Test fixtures, vendored deps, and compiled output never surface as findings.
Result: findings that reach /reviews are actionable, novel, and non-duplicative. You get fewer comments. They mean more.
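The four stages can be sketched as one predicate. Several details are assumptions made for illustration: the threshold defaults, confidence as a 0-to-1 number, and dedup reduced to an exact file:line match (the description-similarity check is omitted).

```python
from fnmatch import fnmatchcase

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Glob forms of the suppression patterns listed in Stage 4.
SUPPRESSED_GLOBS = [
    "*.min.js", "dist/*", "node_modules/*", "vendor/*",
    "test/*", "*.lock", "__generated__/*",
]


def is_suppressed(path: str) -> bool:
    """True if the path matches a suppression pattern at the root or in any subdirectory."""
    return any(
        fnmatchcase(path, glob) or fnmatchcase(path, "*/" + glob)
        for glob in SUPPRESSED_GLOBS
    )


def passes_filters(
    finding: dict,
    existing_comments: set[tuple[str, int]],
    min_severity: str = "high",   # hypothetical default
    min_confidence: float = 0.7,  # hypothetical default
) -> bool:
    """Apply severity floor, confidence threshold, dedup, and suppression in order."""
    if SEVERITY_RANK[finding["severity"]] < SEVERITY_RANK[min_severity]:
        return False
    if finding["confidence"] < min_confidence:
        return False
    if (finding["file"], finding["line"]) in existing_comments:
        return False
    return not is_suppressed(finding["file"])
```

Ordering the cheap checks first means the comment-thread lookup only matters for findings that already cleared the severity and confidence floors.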
Here's exactly what PullLight saw and produced for a real CVE — from PR diff to /reviews comment.
The PR diff PullLight analyzed
Context fetched alongside the diff: the full file contents; the package.json showing "jspdf": "^3.0.4" (vulnerable version range); and the surrounding writeFile() function definition, which shows that doc.save() calls writeFile directly without path resolution.
Severity: Critical → passes severity floor.
Confidence: High (strong model flag on unvalidated user input → file write) → passes confidence threshold.
Dedup: no existing comment at this file:line → not suppressed.
Suppression rules: not in node_modules/, test/, or dist/ → passes.
After you approve in /reviews, this posts as a GitHub PR review comment.
Explicitly. Senior engineers always ask.
| What we DON'T do | Why it matters |
|---|---|
| We don't train on your code | API access — Anthropic's commercial terms contractually prohibit training on customer data. Your diffs are never stored beyond the current session and are never used to fine-tune any model. |
| We don't auto-merge | PullLight posts comments only. The Check Run reports a risk score (0–100) but conclusion stays neutral. You decide what to do with findings. |
| We don't approve PRs | Same as above — comments only, no approval state changes on GitHub. |
| We don't post style nits | No formatting, no lint complaints, no variable naming feedback unless it's a security risk (e.g., eval(userVariable)). |
| We don't read your full repo | Diff + on-demand file reads for referenced functions only. No clone, no branch scan, no git history traversal. |
| We don't post without your approval | Every finding sits in /reviews until you explicitly approve. No auto-publish. |
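The neutral Check Run mentioned in the table can be illustrated with a payload builder for GitHub's check-runs endpoint. The fields name, head_sha, status, conclusion, and output are part of that API; the check name and summary wording below are placeholders.

```python
def build_check_run(head_sha: str, risk_score: int) -> dict:
    """Build a completed Check Run body that reports a score but never blocks."""
    if not 0 <= risk_score <= 100:
        raise ValueError(f"risk score must be within 0-100, got {risk_score}")
    return {
        "name": "PullLight",  # placeholder check name
        "head_sha": head_sha,
        "status": "completed",
        "conclusion": "neutral",  # fixed: never "failure" or "success"
        "output": {
            "title": f"Risk score: {risk_score}/100",
            "summary": "Findings are waiting in /reviews for approval.",
        },
    }
```

Hard-coding the conclusion means no score, however high, can turn the check into a merge blocker.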
The benchmark: 8 real CVEs, all publicly confirmed and fixed before PullLight's testing. PullLight caught all 8. Every competitor (CodeRabbit, Greptile, Copilot PR Review, Qodo) caught 0.
CodeRabbit, Copilot PR Review, and Qodo are built for general code quality — readability, style, correctness. They are not security-purpose-built. They lack the bug-class taxonomy prior that makes PullLight effective on known vulnerability patterns.
When CodeRabbit reviews a diff like doc.save(req.body.filename), it sees a normal API call. The vulnerability lives in the origin of filename (user-supplied HTTP body) — not in the save call itself. A generic reviewer has no model of "user input → file write = path traversal." PullLight does, because the taxonomy prior encodes exactly this pattern.
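To make the "user input → file write" pattern concrete, here is a generic containment check of the kind a reviewer would expect between the request body and the write. It is a sketch in Python, not the fix PullLight suggests and not jsPDF's API; the function name is hypothetical.

```python
import os


def resolve_inside(base_dir: str, user_filename: str) -> str:
    """Resolve user_filename under base_dir, rejecting any path that escapes it."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_filename))
    # commonpath equals base only when candidate is base itself or nested under it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the output directory")
    return candidate
```

A filename such as ../../app/config/evil.js resolves outside the base directory and is rejected before any write happens; absolute paths are rejected the same way.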
1. Bug-class-aware prompting
The taxonomy prior biases the model toward known vulnerability patterns. It's the difference between "here's a PDF library call" and "here's a file write from user input, check for path traversal."
2. Wider context
PullLight fetches surrounding code and config files, not just the diff hunk. Knowing the version range was jspdf: "^3.0.4" (vulnerable) is part of the context. General-purpose reviewers miss this.
3. Post-filter dedup
Many competitors post confident but wrong findings, cluttering the PR thread. PullLight's four-stage filter ensures only novel, high-confidence, high-severity findings surface, so you don't learn to ignore the bot.
The filename parameter from req.body.filename is passed directly to doc.save() without any path sanitization. An attacker can supply ../../app/config/evil.js to write files outside the intended output directory. In server-side Node.js deployments, this enables arbitrary file write → RCE.
Suggested fix: