Replace regex with DOMParser in GPT document.write rewriting#503
Open
ChristianPavilonis wants to merge 3 commits intomainfrom
Open
Replace regex with DOMParser in GPT document.write rewriting#503ChristianPavilonis wants to merge 3 commits intomainfrom
ChristianPavilonis wants to merge 3 commits intomainfrom
Conversation
e94fa50 to
303975f
Compare
303975f to
248afe3
Compare
Collaborator
Author
|
dismissed github bot findings that were introduced from previous commits and already determined to be false positives. |
aram356
requested changes
Mar 16, 2026
Collaborator
aram356
left a comment
There was a problem hiding this comment.
Summary
Replaces a fragile regex with DOMParser-based HTML parsing for GPT document.write interception — a good direction with solid test coverage. One blocking issue around a missing fast-path guard.
Blocking
🔧 wrench
- Missing fast-path guard:
rewriteHtmlStringenters the DOMParser / fail-closed path for all HTML, not just GPT-related strings. Non-GPTdocument.writecalls will be silently dropped when DOMParser is unavailable, and unnecessarily parsed when it is. (script_guard.ts:208)
Non-blocking
🤔 thinking
replaceAllis a blunt instrument: Replaces the URL everywhere in the string, not just insidesrcattributes. Unlikely to cause issues with GPT's simple output, but less precise than the old regex. (script_guard.ts:229)
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test: PASS
- vitest: PASS
- format-docs: PASS
- format-typescript: PASS
- CodeQL: PASS
prk-Jr
requested changes
Mar 16, 2026
Collaborator
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
This change replaces the GPT document.write regex rewrite path with a DOMParser-based implementation and adds broader coverage for edge-case markup. The parser-based approach is directionally better, but the new replacement strategy introduces a behavior regression for non-GPT writes under parser failure and a proxy-bypass case for HTML-escaped URLs.
Blocking
🔧 wrench
- Non-GPT
document.writeoutput gets dropped whenDOMParseris unavailable or broken:rewriteHtmlString()returns an empty string before it knows whether the HTML actually contains a GPT URL, so benign writes are erased in the very failure modes this PR adds support for. - HTML-escaped query strings can bypass the rewrite and leak back to the Google host: the new flow reads
srcthroughgetAttribute()and then runsreplaceAll()on the original HTML string, which misses common serialized forms such as&and can emit the originalsecurepubads.g.doubleclick.netURL unchanged.
Non-blocking
🤔 thinking
- The new test section still misses the DOMParser-specific regression cases above: adding coverage for escaped entities like
&and for non-GPT pass-through whenDOMParseris unavailable would catch both regressions.
CI Status
- fmt: PASS
- rust tests: PASS
- vitest: PASS
- CodeQL: PASS
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SCRIPT_SRC_REregex withDOMParser-based HTML parsing in the GPTdocument.write/document.writelninterception layer, fixing edge cases where unquoted attributes, unusual spacing, or mixed quote styles caused the regex to fail-open and bypass the first-party proxy.DOMParseris unavailable or throws, return an empty string instead of passing through unproxied GPT URLs.writelnparity, non-GPT passthrough, protocol-relative URLs, and both fail-closed scenarios.Changes
crates/js/lib/src/integrations/gpt/script_guard.tsSCRIPT_SRC_REregex and regex-basedrewriteHtmlStringwithDOMParser.parseFromStringimplementation; usesgetAttribute("src")+replaceAll()for robust URL swapping; adds fast-path skip when GPT domain not present; fails closed on errorcrates/js/lib/test/integrations/gpt/script_guard.test.tsCloses
Closes #407
Test plan
cargo test --workspacecargo clippy --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatChecklist
unwrap()in production code — useexpect("should ...")tracingmacros (notprintln!)