
LangChain JS Arbitrary File Read Vulnerability: The Playwright Design Problem

Summary

Vendor: LangChain / Microsoft Playwright

Vendor URLs: LangChain JS | Playwright

Versions Affected:

Advisory URL: https://huntr.com/bounties/23f45984-7336-48d8-a373-75b39bcd6367

CVE Identifier: N/A (LangChain) | Related: CVE-2026-44439 (PlaywrightCapture), CVE-2025-9611 (Playwright MCP)

Risk: High (vendor classified as Informative)

LangChain is an open-source framework designed to assist the development of applications powered by large language models (LLMs). The JavaScript library has more than 11,000 stars and 380,000+ weekly downloads.

I discovered an Arbitrary File Read (AFR) vulnerability in LangChain JS's PlaywrightWebBaseLoader component. This vulnerability allows an attacker to read arbitrary files on the server using file:// protocol URLs. The root cause is Playwright's design philosophy: Playwright does not act as a security boundary and allows all protocols including file:// by default.

The Playwright Problem

This vulnerability is not merely a LangChain implementation flaw. It stems from Playwright's architectural decisions.

Playwright's "Not a Security Boundary" Philosophy

The Microsoft Playwright team explicitly states that users fully control where requests are sent when using Playwright. Playwright is designed as a browser automation tool, not a security filter. Consequently, validating navigation targets is left to the application that calls it.

Pattern of Similar Vulnerabilities

This issue keeps reappearing across the Playwright ecosystem:

CVE-2026-44439 (PlaywrightCapture): The same file:// exploitation allowed arbitrary file reads via window.location.href manipulation. Fixed by adding an only_global_lookup parameter to restrict local file access.

GHSA-665w-mwrr-77q3 (url-to-png): A Playwright-based screenshot service was exploited via the file:// wrapper to read /etc/passwd. Mitigation: restrict URLs to http/https only.

CVE-2025-9611 (Playwright MCP): Microsoft later acknowledged the risk in Playwright MCP Server, adding the PLAYWRIGHT_MCP_ALLOW_UNRESTRICTED_FILE_ACCESS environment variable to control file access.

The Trust Boundary Violation

┌─────────────────────────────────────────┐
│           APPLICATION (Express.js)      │
│                  ↓                      │
│    LangChain PlaywrightWebBaseLoader    │
│    (NO URL VALIDATION)                  │
│                  ↓                      │
│         Playwright Browser              │
│    (ALLOWS file:// BY DESIGN)           │
│                  ↓                      │
│       OS File System (/etc/passwd)      │
└─────────────────────────────────────────┘

Achieving AFR Through SSRF

The vulnerability is exploited by manipulating the URL to access local files. Playwright processes file:// URIs exactly like HTTP URLs, rendering them in the browser context.

Example

Imagine a web application that provides a URL preview feature using PlaywrightWebBaseLoader. Without proper validation, an attacker can supply a file:// URL instead of a web address:

Vulnerable Code Snippet

import express from 'express';
import { PlaywrightWebBaseLoader } from "@langchain/community/document_loaders/web/playwright";

const app = express();
const PORT = 9000;

app.get('/', async (req, res) => {
    const url = req.query.url;

    if (!url) {
        return res.status(400).send('URL query parameter is required');
    }

    try {
        const loader = new PlaywrightWebBaseLoader(url);
        const docs = await loader.load();  // Executes Playwright with user-controlled URL

        console.log(docs);
        res.send(docs);

    } catch (error) {
        console.error(error);
        res.status(500).send('An error occurred');
    }
});

app.listen(PORT, () => {
    console.log(`Server is running on port ${PORT}`);
});

Exploitation

Request:

GET /?url=file:%2f%2f%2fetc%2fpasswd HTTP/1.1
Host: localhost:9000

The response contains the full contents of /etc/passwd, rendered by Playwright as HTML, which shows that Playwright processed the file:// URL without restriction.
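To make the request above easier to follow, here is a minimal sketch of how the percent-encoded query value turns into a file:// URL before it reaches the loader. It uses only the URL class built into Node.js. The variable names are illustrative and are not part of the PoC.

```javascript
// The request line from the exploitation example, as a full URL.
const requestUrl = new URL('http://localhost:9000/?url=file:%2f%2f%2fetc%2fpasswd');

// searchParams.get() percent-decodes the value, much as Express does
// when it populates req.query.url.
const target = requestUrl.searchParams.get('url');
console.log(target); // file:///etc/passwd

// Parsing the decoded value shows the scheme that a validator would need to reject.
const parsed = new URL(target);
console.log(parsed.protocol); // file:
console.log(parsed.pathname); // /etc/passwd
```

Encoding the slashes is not what makes the attack work. It only keeps the value intact inside the query string. By the time the loader sees it, the value is an ordinary file:// URL.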

You can access the PoC code at: Langchain AFR PoC

Mitigation

Since Playwright does not provide security boundaries, applications MUST implement their own:

1. Protocol Validation

function isValidUrl(input) {
    try {
        // The WHATWG URL parser lowercases the scheme, so "FILE:" is rejected too.
        const { protocol } = new URL(input);
        return protocol === 'http:' || protocol === 'https:';
    } catch (error) {
        // new URL() throws a TypeError on input it cannot parse.
        return false;
    }
}
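As a sketch of how a protocol check might be wired in front of the loader, the example below validates the URL before any loader object is constructed. The names parseHttpUrl, loadPreview, and createLoader are hypothetical and are not part of LangChain. The loader is passed in as a factory so the check can be exercised without launching a browser.

```javascript
// Hypothetical helper: returns a parsed URL for http(s) input, otherwise null.
function parseHttpUrl(raw) {
    // Express yields an array for repeated parameters (?url=a&url=b); reject those.
    if (typeof raw !== 'string') return null;
    let parsed;
    try {
        parsed = new URL(raw);
    } catch {
        return null; // not parseable as an absolute URL
    }
    return parsed.protocol === 'http:' || parsed.protocol === 'https:' ? parsed : null;
}

// Hypothetical wrapper: createLoader is injected by the caller, so no loader
// (and therefore no browser) is created for a rejected URL.
async function loadPreview(raw, createLoader) {
    const parsed = parseHttpUrl(raw);
    if (parsed === null) {
        return { status: 400, body: 'Only http and https URLs are supported' };
    }
    const docs = await createLoader(parsed.href).load();
    return { status: 200, body: docs };
}
```

In the vulnerable handler this could be called as loadPreview(req.query.url, (u) => new PlaywrightWebBaseLoader(u)). Note that it only checks the scheme of the initial URL. It does not restrict which hosts are reachable over http or https, so it is a starting point and not a complete SSRF defense.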

2. Playwright Route Interception (Defense in Depth)

import { chromium } from 'playwright';

// ES module syntax, as in the vulnerable snippet above, so that
// top-level await is available.
const browser = await chromium.launch();
const page = await browser.newPage();

// Block file:// and ftp:// requests at the Playwright level
await page.route('**/*', (route) => {
    const url = route.request().url();
    if (url.startsWith('file://') || url.startsWith('ftp://')) {
        return route.abort();
    }
    return route.continue();
});
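The blocklist above could also be written as an allowlist, which fails closed for schemes that were not anticipated. The following is a minimal sketch under some assumptions. The names isAllowedRequestUrl and installProtocolGuard are hypothetical, and page is expected to be a Playwright Page, of which only route() is used. Whether route interception fires for file:// navigations may depend on the browser and Playwright version, so this should complement the protocol check, not replace it.

```javascript
// Schemes the application is willing to let the browser request.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Hypothetical predicate: true only for parseable URLs with an allowed scheme.
function isAllowedRequestUrl(url) {
    try {
        return ALLOWED_PROTOCOLS.has(new URL(url).protocol);
    } catch {
        return false; // unparseable URLs are rejected
    }
}

// Hypothetical helper: registers a catch-all route handler on a Playwright Page
// that aborts every intercepted request whose scheme is not allowlisted.
async function installProtocolGuard(page) {
    await page.route('**/*', (route) => {
        const url = route.request().url();
        return isAllowedRequestUrl(url) ? route.continue() : route.abort();
    });
}
```

An allowlist is stricter than the blocklist, so pages that rely on other schemes may need additional entries in ALLOWED_PROTOCOLS.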

3. Recommended Security Measures

Disclosure

LangChain Response: Report marked as "Informative" with the statement that Playwright is responsible for URL handling.

Microsoft/Playwright Response: "The user fully controls where the requests are sent when using Playwright." Playwright is not classified as a security boundary.

My Position: While Playwright's design places validation responsibility on developers, LangChain's PlaywrightWebBaseLoader exposes this unsafe default to end users without documentation or warnings. It is worth noting that similar patterns have since been addressed in other Playwright-based products: Microsoft introduced PLAYWRIGHT_MCP_ALLOW_UNRESTRICTED_FILE_ACCESS in Playwright MCP Server (CVE-2025-9611), and PlaywrightCapture restricted local file access (CVE-2026-44439). This suggests that the ecosystem increasingly recognizes the need for built-in safeguards against this attack vector.

The LangChain documentation (link) contains no security warnings regarding URL input validation when receiving user-controlled URLs.

Timeline

2024-05-20 - Initial discovery and report to LangChain

2024-05-25 - v1.0 published

2024-05-30 - v1.1 updated with Microsoft response

2024-10-08 - v1.2 updated

2025-01-XX - CVE-2025-9611 disclosed (Playwright MCP)

2026-XX-XX - CVE-2026-44439 disclosed (PlaywrightCapture) - Same pattern confirmed