Skip to content

Harden inline creative rendering#459

Open
prk-Jr wants to merge 6 commits intomainfrom
fix/401-harden-inline-creative-rendering
Open

Harden inline creative rendering#459
prk-Jr wants to merge 6 commits intomainfrom
fix/401-harden-inline-creative-rendering

Conversation

@prk-Jr
Copy link
Collaborator

@prk-Jr prk-Jr commented Mar 7, 2026

Summary

  • Add server-side HTML sanitization for inline ad creatives using lol_html: strip executable elements (script, iframe, object, embed, base, meta, form, link, style, noscript), on* event-handler attributes, and dangerous URI schemes (javascript:, vbscript:, unsafe data: URIs including data:image/svg+xml).
  • Sanitization runs server-side before URL rewriting so the client only ever receives clean markup.
  • Tighten the client-side iframe sandbox: remove allow-scripts and allow-same-origin, keeping only allow-popups, allow-popups-to-escape-sandbox, and allow-top-navigation-by-user-activation.
  • Fail closed on oversized input and parse errors (return empty string rather than raw markup).
  • Validate bid.adm before rendering; reject and log malformed or empty creatives with structured metadata, never logging raw creative HTML.
  • Add regression coverage for sandbox permissions, dangerous URI/style payloads, malformed creatives, and accepted safe markup.

Changes

File Change
crates/common/src/creative.rs Add sanitize_creative_html: lol_html-based server-side sanitizer that strips dangerous elements and attributes. Fail closed on oversized input and parse errors.
crates/common/src/auction/formats.rs Run sanitize_creative_html before rewrite_creative_html so URL rewriting only ever sees clean markup.
crates/js/lib/src/core/render.ts Simplify client-side sanitizeCreativeHtml to type/emptiness validation only (server handles sanitization). Tighten iframe sandbox — remove allow-scripts and allow-same-origin.
crates/js/lib/src/core/request.ts Guard against missing bid.adm. Clear slot only after sanitization succeeds so rejected creatives never blank existing content. Add structured rejection logging.
crates/js/lib/test/core/render.test.ts Cover sandbox tokens, accepted safe markup, rejected type/emptiness payloads, and buildCreativeDocument template injection.
crates/js/lib/test/core/request.test.ts Cover safe request-path rendering plus fail-closed behavior for dangerous, malformed, and empty creatives without logging raw HTML.

Closes

Closes #401

Test plan

  • cargo test --workspace
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --bin trusted-server-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve
  • Other: cd crates/js/lib && npm run build

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses log macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@prk-Jr prk-Jr self-assigned this Mar 7, 2026
Copy link
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid security hardening — removing allow-scripts/allow-same-origin from the sandbox, rejecting dangerous creatives rather than silently sanitizing, and fixing String.replace $-sequence injection. However, slot clearing before validation creates a regression where rejected bids blank the slot, and the URI detection has gaps that could cause both false negatives (data:image/svg+xml) and false positives (benign URI attribute removals).

Blocking

🔧 wrench

  • Slot blanking on rejection: container.innerHTML = '' runs before sanitization — rejected creatives destroy existing slot content. In multi-bid scenarios, a later rejected bid erases an earlier successful render. (request.ts:96)
  • Missing bid.adm guard: The bid.adm && check from main was removed, so bids with missing/empty/malformed adm enter the render path, clear the slot, then get rejected. (request.ts:51)
  • Narrow data: URI pattern: Only blocks data:text/html, missing data:text/xml, data:application/xhtml+xml, and data:image/svg+xml (SVG can embed <script>). (render.ts:35)
  • Over-aggressive URI attribute flagging: isDangerousRemoval flags any removed URI attribute as dangerous regardless of value, causing false rejections for benign creatives. Inconsistent with hasDangerousMarkup which correctly checks the value. (render.ts:108)

Non-blocking

🤔 thinking

  • 3.8x bundle size increase: DOMPurify is statically imported into tsjs-core.js (8,964 B → 34,160 B raw, 3,788 B → 12,940 B gzip). The build uses inlineDynamicImports: true so lazy import() won't help. Since the policy is reject-only, hasDangerousMarkup (native <template> parser) already does the full detection. Consider removing DOMPurify entirely or moving sanitization server-side.
  • Static-only creative contract without rollout guard: Removing allow-scripts + allow-same-origin and rejecting script-bearing markup is a major behavioral shift. Most DSP creatives use JavaScript for tracking, viewability, and click handling. Consider a strict-render feature flag (default off) with rejection metrics, rolled out by seat/publisher.

♻️ refactor

  • Inconsistent sandbox policy: <form> is in DANGEROUS_TAG_NAMES (rejected) but allow-forms is in CREATIVE_SANDBOX_TOKENS (permitted). Remove allow-forms or stop rejecting <form>. (render.ts:38)
  • hasDangerousMarkup lacks intent documentation: The post-sanitization re-scan is a valid safety net for sanitizer bugs, but the comment doesn't explain why DOMPurify output is being re-scanned. (render.ts:119)

⛏ nitpick

  • srcdoc in URI_ATTRIBUTE_NAMES: srcdoc is HTML content, not a URI. DOMPurify already strips it. (render.ts:33)

🌱 seedling

  • Missing test coverage: (1) multi-bid same slot where one bid is rejected, (2) sanitizer-unavailable path, (3) data:image/svg+xml with embedded script, (4) explicit test documenting script-based creatives are intentionally rejected.

👍 praise

  • buildCreativeDocument $-sequence fix: Function callbacks in String.replace prevent replacement pattern injection. Well-tested. (render.ts:337)
  • Structured rejection logging: Rejection logs include metadata without leaking raw creative HTML. Tests verify no raw HTML in log output. (request.ts:100)

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • CodeQL: PASS

Copy link
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new findings in this review pass.

Copy link
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Adds server-side HTML sanitization for inline ad creatives via lol_html, tightens the client-side iframe sandbox, and improves render-path logging. The sanitizer architecture (server strips dangerous markup, client validates type/emptiness, sandbox enforces no-script) is sound, but the fail-open fallback paths and unsanitized <style> content need to be addressed before merge.

Blocking

🔧 wrench

  • Fail-open on oversized input: sanitize_creative_html returns raw markup when input exceeds MAX_CREATIVE_SIZE — should return empty string (creative.rs:355)
  • Fail-open on parse errors: Raw markup returned when lol_html fails to parse — should return empty string (creative.rs:464)
  • <style> element content not sanitized: Inline style attributes are checked but <style> blocks pass through with expression(), @import, etc. (creative.rs:402)

❓ question

  • Is preserving <style> elements intentional?: <link> is stripped but <style> is allowed — inconsistent treatment (creative.rs:393)

Non-blocking

🤔 thinking

  • Proxy path skips sanitization: CreativeHtmlProcessor (in proxy.rs) only runs through rewrite_creative_html, not sanitize_creative_html. Probably intentional since proxied pages may legitimately need scripts/iframes, but worth documenting the trust boundary difference.
  • removedCount always 0 on client: Client-side sanitization fields are always identity/zero, could mislead operators (render.ts:71)

♻️ refactor

  • data-src and srcset not checked: Missing from the URI attribute check list for defense-in-depth (creative.rs:413)

🌱 seedling

  • Missing sanitizer + rewriter integration test: No test runs both in sequence as formats.rs does

📝 note

  • Sandbox removes allow-scripts: Deliberate defense-in-depth but will break creatives relying on inline JS for click tracking, viewability, or animation — worth validating with real-world ad creatives

⛏ nitpick

  • unwrap_or("") on infallible split: .split().next() can never be None (creative.rs:323)

CI Status

All 10 checks pass: cargo fmt, cargo test, vitest, format-typescript, format-docs, CodeQL, Analyze (actions, javascript-typescript x2, rust).

@prk-Jr prk-Jr requested a review from aram356 March 16, 2026 03:47
Copy link
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Well-executed security hardening PR. The defense-in-depth architecture is sound — server-side lol_html sanitization, client-side type/emptiness validation, tightened iframe sandbox. All CI passes. Previous review feedback fully addressed. Two minor findings below.


PR description is stale: The summary table still references crates/js/lib/package.json and crates/js/lib/package-lock.json with DOMPurify changes, but these files aren't in the diff. The current architecture (server-side lol_html, no client-side DOMPurify) is better — please update the description to match.

// discards the tag — but the handler still fires. This is intentional and
// harmless.
element!("*", |el| {
let on_attrs: Vec<String> = el
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 Consider stripping <noscript>: Consider adding <noscript> to the stripped elements list. In the no-script sandbox (allow-scripts removed), browsers render <noscript> content. While not a direct bypass — URL rewriting catches tracking pixels downstream — stripping it is low-effort defense-in-depth against parser differential attacks.

|| url.starts_with("vbscript:")
|| (url.starts_with("data:") && !is_safe_data_uri(&url))
});
if is_dangerous {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Document fail-closed srcset stripping: Worth a brief comment here documenting the intentional choice to strip the entire srcset attribute (losing safe entries) rather than selectively filtering individual entries. The current fail-closed behavior is correct — a mixed safe/dangerous srcset likely indicates a malicious creative — but the reasoning isn't obvious to future readers.

Copy link
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Excellent security hardening PR with textbook fail-closed design. A few observations worth considering — none are blockers.

Highlights:

  • Textbook fail-closed design on every error path — parse failures, oversized inputs, and non-UTF-8 output all return empty string rather than passing through unsanitized markup
  • Three-layer defense: server-side sanitization → client-side type/emptiness validation → sandboxed iframe with minimal permissions
  • The $-pattern injection fix in buildCreativeDocument (using () => replacement callbacks) is subtle and well-tested
  • Comprehensive test coverage: 30+ sanitizer tests in Rust covering element removal, attribute stripping, URI scheme filtering, srcset parsing, and size limits; full JS coverage for the client-side validation and rendering pipeline
  • Structured logging without raw HTML prevents log injection — originalLength/sanitizedLength/removedCount fields give observability without leaking creative content

type RenderCreativeInlineOptions = {
slotId: string;
// Accept unknown input here because bidder JSON is untrusted at runtime.
creativeHtml: unknown;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 P1 — parseAuctionResponse coerces adm to empty string, making unknown typing unreachable via normal flow

parseAuctionResponse does adm: b.adm ?? '', which means if the server sends a bid with no adm field (or adm: null), the parsed bid will have adm: '' (a string). The guard in request.ts (if (!bid.adm)) tests for falsiness — empty string is falsy, so this particular case works.

However, creativeHtml is typed as unknown here explicitly because "bidder JSON is untrusted at runtime", yet the AuctionBid interface types adm as string. This means sanitizeCreativeHtml(creativeHtml) will never actually receive a non-string value through the normal auction flow — the typeof !== 'string' check in sanitizeCreativeHtml is unreachable for the standard code path.

The defense-in-depth is valuable, but consider either:

  1. Removing the ?? '' coercion in the auction parser so client-side validation can catch null/undefined/non-string adm values in real traffic, or
  2. Adding a clarifying comment in the parser acknowledging the type coercion and that the client-side unknown guard is a second line of defense for non-standard callers

// srcset attribute is removed rather than filtering individual entries.
// A mixed safe/dangerous srcset is a strong indicator of a malicious
// creative, so dropping the whole attribute is the correct response.
if let Some(val) = el.get_attribute("srcset") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 P2 — Missing imagesrcset handling in sanitizer

The sanitizer checks srcset for dangerous URLs but does not check imagesrcset (used on <link> and <source> elements). The URL rewriter already handles imagesrcset, which means a creative could contain an element with imagesrcset="javascript:alert(1) 1x" that passes through the sanitizer unchecked.

While <link> elements are stripped by the sanitizer (so that vector is covered), imagesrcset can appear on <source> and potentially other elements. Since <source> is not in the removal list, this is a gap.

Practical risk is low given the sandbox restrictions, but for completeness and defense-in-depth, imagesrcset should get the same treatment as srcset.

Suggested fix: duplicate the srcset check block for imagesrcset:

if let Some(val) = el.get_attribute("imagesrcset") {
    let is_dangerous = val.split(',').any(|entry| {
        let url = entry
            .trim()
            .split_ascii_whitespace()
            .next()
            .unwrap_or("")
            .to_ascii_lowercase();
        url.starts_with("javascript:")
            || url.starts_with("vbscript:")
            || (url.starts_with("data:") && !is_safe_data_uri(&url))
    });
    if is_dangerous {
        el.remove_attribute("imagesrcset");
    }
}

HtmlSettings {
element_content_handlers: vec![
// Remove executable/dangerous elements along with their inner content.
element!("script", |el| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 P2 — 10 identical element-removal handlers could be consolidated

Ten nearly identical element!("tag", |el| { el.remove(); Ok(()) }) handlers are defined here (lines 364–409). lol_html supports comma-separated selectors in a single element! macro call.

This could be consolidated into a single handler:

element!("script, iframe, object, embed, base, meta, form, link, style, noscript", |el| {
    el.remove();
    Ok(())
})

This would reduce ~50 lines to ~5 and make it easier to maintain the blocklist (adding or removing a tag is a single edit instead of a copy-paste block).

// subtypes can carry HTML/JS payloads inside CSS url() values).
if let Some(style) = el.get_attribute("style") {
let lower = style.to_ascii_lowercase();
if lower.contains("expression(")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 P2 — Style attribute sanitization uses simple substring matching — CSS escapes could bypass

The inline style attribute check uses simple substring matching (lower.contains("expression("), lower.contains("javascript:"), etc.). CSS allows various forms of obfuscation that could bypass these checks:

  • CSS escape sequences: \65xpression(expression(
  • CSS comments: expr/**/ession(
  • Null bytes between characters

Since the iframe sandbox removes allow-scripts and allow-same-origin, CSS expression attacks won't execute in practice (they only work in IE6–8 anyway). The current approach is reasonable for the threat model.

Consider either:

  1. Stripping all style attributes entirely for maximum strictness, or
  2. Adding a // NOTE: comment documenting that CSS-escape bypass is mitigated by the sandbox, so a future reviewer is aware of this trade-off

The current defense is adequate given the sandbox — this is just about making the design rationale explicit.


export type CreativeSanitizationRejectionReason = 'empty-after-sanitize' | 'invalid-creative-html';

export type AcceptedCreativeHtml = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 P2 — sanitizeCreativeHtml result types have misleading fields for client-side usage

AcceptedCreativeHtml and RejectedCreativeHtml include sanitizedLength and removedCount fields, but the client-side sanitizeCreativeHtml function never actually sanitizes content — it only validates type and emptiness. For accepted creatives, sanitizedLength === originalLength and removedCount === 0 always.

The comment at line 40–44 does a good job explaining this, but the fields still add noise to logging output (every render log line includes sanitizedLength and removedCount that carry no signal).

Consider either:

  1. Simplifying the client-side types to only include fields that carry signal (e.g., originalLength and rejectionReason), or
  2. Renaming the function/type to validateCreativeHtml / ValidateCreativeHtmlResult to make the distinction from server-side sanitization clearer

import NORMALIZE_CSS from './styles/normalize.css?inline';
import IFRAME_TEMPLATE from './templates/iframe.html?raw';

const CREATIVE_SANDBOX_TOKENS = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 P2 — allow-forms silently removed from sandbox

The previous sandbox included allow-forms. The PR description mentions removing allow-scripts and allow-same-origin, but allow-forms was also removed without being called out.

Removing allow-forms could break ad creatives that use forms for lead generation (e.g., email signup forms within ad units). The server-side sanitizer already strips <form> elements, so this is consistent — but it's a notable behavioral change that should be documented.

Either:

  1. Add allow-forms back if form submission from creatives is a valid use case for your publishers, or
  2. Explicitly document the allow-forms removal in the PR description and confirm it's intentional (since the server strips <form> anyway, this is likely fine — just worth calling out)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unsanitized creative HTML injected into iframe with weakened sandbox

3 participants