Forked from npacker/web-tools
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
LM Studio plugin that exposes four web-oriented tools to local LLMs (Web Search, Image Search, Visit Website, and Fetch Images), built on @lmstudio/sdk. Descended from Daniel Sig's original lms-plugin-duckduckgo and lms-plugin-visit-website plugins, merged and extended by Nigel Packer.
- npm run dev: run plugin in LM Studio dev mode (lms dev)
- npm run push: publish to LM Studio Hub (lms push)
- npm run lint / npm run lint:fix: ESLint on src/**/*.ts
- npm run format / npm run format:check: Prettier
- npm run knip: dead-code / unused-export check

A local pre-commit hook at .git/hooks/pre-commit runs lint, format:check, and knip sequentially and aborts the commit on any failure. The hook is not committed to the repo, so fresh clones need to reinstall it. Bypass with git commit --no-verify when necessary.
No test suite is configured. TypeScript targets ES2023 / CommonJS. Requires Node ^22.17.0 || >=24 (fetch, AbortSignal.any, and stable util.MIMEType; the EOL 23.0-23.10 window predates MIMEType's stabilisation).
Entry point src/index.ts registers a config schematic and a tools provider with the LM Studio SDK.
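src/parsers/page-text.ts feeds raw HTML to @mozilla/readability to strip boilerplate (nav, sidebars, comments), then routes Readability's .content HTML through one of two converters, chosen by contentFormat: turndown for Markdown (the default) or html-to-text for plain text.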
Both paths share src/text/normalize-blank-lines.ts for trailing-whitespace and blank-line collapsing. The parser returns the full extracted content untruncated; bounding it to a budget and biasing it toward search terms is the retrieval layer's job (see below). Both contentFormat (plugin select field, default "markdown") and contentLimit are plugin-only: neither is exposed as a tool parameter, so the model cannot override the user-set or default values. The tool still returns contentLength, the pre-truncation character count, so the model can detect truncation and refine with findInPage.
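The trailing-whitespace and blank-line collapsing described above might look roughly like the sketch below. The function name and the exact rules (normalising CRLF, collapsing to a single blank line, trimming the ends) are assumptions for illustration, not the contents of src/text/normalize-blank-lines.ts.

```typescript
// Hypothetical sketch: the real module may apply different rules.
function normalizeBlankLines(text: string): string {
  return text
    .replace(/\r\n/g, "\n") // assume CRLF input is normalised first
    .replace(/[ \t]+$/gm, "") // strip trailing spaces/tabs on every line
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines to one
    .trim(); // assumption: leading/trailing blank lines are dropped
}

const cleaned = normalizeBlankLines("Title  \n\n\n\nBody\t\n");
console.log(JSON.stringify(cleaned));
```

Sharing one normaliser between the Markdown and plain-text paths keeps the character counts comparable regardless of which converter produced the text.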
src/retrieval/ bounds extracted text to the contentLimit budget and, when findInPage terms are supplied, biases the returned excerpt toward the most relevant chunks rather than head-slicing. It is a chunk → rank → select → assemble pipeline, split one module per stage so that a future embedding-based retriever (RAG) can swap the lexical ranker without touching the rest.
The barrel exports only buildExcerpt/Excerpt; the pipeline stages stay internal to the module (re-exporting them would trip knip's unused-export check). renderPageResult calls buildExcerpt for both HTML (on the parser's full content) and non-HTML kinds (on the pre-extracted PDF/text/JSON body), unifying the budget policy across page kinds.
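To make the chunk → rank → select → assemble flow concrete, here is a minimal sketch. It deliberately swaps the Fuse.js ranker for a plain case-insensitive term counter, and the signatures, the neighbour radius, and the score convention (one score per chunk, higher is better) are assumptions rather than the module's actual API.

```typescript
interface Excerpt {
  text: string;
  totalLength: number; // pre-truncation length
}

// Assumed contract: one score per chunk, higher is better, 0 means no match.
type Ranker = (chunks: string[], terms: string[]) => number[];

const SEPARATOR = "\n\n";

// Chunk: split on blank-line boundaries.
function chunkText(text: string): string[] {
  return text
    .split(/\n{2,}/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Rank: stand-in lexical ranker (NOT Fuse.js) counting term occurrences.
const rankByTermCount: Ranker = (chunks, terms) => {
  const needles = terms.map((t) => t.toLowerCase()).filter((t) => t.length > 0);
  return chunks.map((chunk) => {
    const haystack = chunk.toLowerCase();
    return needles.reduce((score, needle) => score + haystack.split(needle).length - 1, 0);
  });
};

// Select: take the matches first, then widen to neighbours while the budget allows.
function selectChunks(chunks: string[], scores: number[], budget: number, neighbours = 1): number[] {
  const ranked = scores
    .map((score, index) => ({ score, index }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score);
  const chosen = new Set<number>();
  let used = 0;
  const tryAdd = (index: number): void => {
    if (index < 0 || index >= chunks.length || chosen.has(index)) return;
    const cost = (chunks[index] ?? "").length + (chosen.size > 0 ? SEPARATOR.length : 0);
    if (used + cost > budget) return;
    chosen.add(index);
    used += cost;
  };
  for (let distance = 0; distance <= neighbours; distance++) {
    for (const { index } of ranked) {
      tryAdd(index - distance);
      tryAdd(index + distance);
    }
  }
  return Array.from(chosen).sort((a, b) => a - b); // source order
}

// Assemble: join the selection, or fall back to a leading slice.
function buildExcerpt(text: string, budget: number, terms: string[] = [], ranker: Ranker = rankByTermCount): Excerpt {
  const chunks = chunkText(text);
  const indices = terms.length > 0 ? selectChunks(chunks, ranker(chunks, terms), budget) : [];
  const body = indices.length > 0 ? indices.map((i) => chunks[i] ?? "").join(SEPARATOR) : text;
  return { text: body.slice(0, budget), totalLength: text.length };
}

const page = "Intro paragraph about nothing.\n\nInstall with npm install foo.\n\nLicense MIT.";
console.log(buildExcerpt(page, 30, ["install"]).text);
```

Because the ranker is passed in as a function, an embedding-based scorer could be substituted without changing the chunker, the selector, or the assembler.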
src/parsers/page-images.ts scrapes <img> tags in document order, resolves relative src against the page URL, deduplicates, and returns { src, alt, title } tuples (up to maxImages). Fetch Images then downloads each via downloadImages and the result records are assembled by the renderers layer (see below).
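The resolve, deduplicate, and cap steps can be sketched without a DOM by starting from already-extracted attribute values. The RawImage input shape and the function name are hypothetical; the real scraper reads these values from parsed img elements.

```typescript
// Hypothetical input: attribute values already read from <img> tags in document order.
interface RawImage {
  src: string;
  alt?: string;
  title?: string;
}

interface PageImage {
  src: string;
  alt: string;
  title: string;
}

function resolveImages(raw: RawImage[], pageUrl: string, maxImages: number): PageImage[] {
  const seen = new Set<string>();
  const images: PageImage[] = [];
  for (const { src, alt = "", title = "" } of raw) {
    if (images.length >= maxImages) break;
    let absolute: string;
    try {
      absolute = new URL(src, pageUrl).href; // resolve relative src against the page URL
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    if (seen.has(absolute)) continue; // deduplicate on the resolved URL
    seen.add(absolute);
    images.push({ src: absolute, alt, title });
  }
  return images;
}

const found = resolveImages(
  [{ src: "/a.png", alt: "A" }, { src: "https://example.com/a.png" }, { src: "b.png" }],
  "https://example.com/blog/post",
  5,
);
console.log(found.map((image) => image.src));
```

Deduplicating after resolution, rather than on the raw attribute, catches the case where the same image is referenced once relatively and once absolutely.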
src/renderers/ is the response-assembly layer for the two page-consuming tools, the counterpart to parsers/, which extracts structured fragments from raw content. Where parsers/ turns HTML/PDF/bytes into intermediate fragments, renderers/ composes those fragments (and other inputs) into the LLM-facing payload, sitting one layer above fetchPage. The two renderers share a parallel render<Subject>Result(s) → <Subject>Result shape, with singular/plural reflecting cardinality (one page result; many image results). src/renderers/page-result.ts (renderPageResult → PageResult) builds the Visit Website result by narrowing on the fetched page's kind, orchestrating the parsers/ HTML/text extractors, and bounding the result through the retrieval/ excerpt builder. src/renderers/image-results.ts (renderImageResults → ImageResult[]) pairs each ImageSubject with its positional downloadImages outcome (DownloadedImage), emitting a markdown reference on success (with a derived filename and escaped markdown) or a message on failure. The two search tools have no renderer: their parse/enrich output is already response-ready, so they return it directly.
Three disk-backed TTL caches (via cacache) are constructed once in toolsProvider and shared across tools: one for web search, one for image search, and one for fetched pages. They persist across plugin reloads; clearing them requires removing the cacache directory at ~/.lmstudio/plugin-data/lms-plugin-duckduckgo-cache.
Cache sizes and subdirs are defined in src/tools-provider.ts; the TTLCache implementation is in src/cache/ttl-cache.ts. TTL defaults live in src/config/config-defaults.ts. Fetch Images downloads are not cached because they land in chat-scoped working directories.
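The TTL-plus-capacity behaviour can be illustrated with an in-memory stand-in. This is not the cacache-backed TTLCache from src/cache/ttl-cache.ts: the class name, the oldest-first eviction policy, and the injectable clock are all assumptions made so the sketch stays self-contained.

```typescript
// In-memory stand-in for a size-capped TTL cache (the real one is disk-backed).
class MemoryTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // re-insert so the key counts as newest
    if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value); // assumed policy: evict oldest
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical key builder: every input that changes the stored payload is part of the key.
function searchCacheKey(query: string, safeSearch: string, page: number, cap: number, enrich: boolean): string {
  return JSON.stringify([query, safeSearch, page, cap, enrich]);
}

let clock = 0;
const cache = new MemoryTtlCache<string>(100, 15 * 60 * 1000, () => clock);
cache.set(searchCacheKey("cats", "moderate", 1, 10, true), "payload");
clock = 16 * 60 * 1000; // sixteen minutes later the entry has expired
console.log(cache.get(searchCacheKey("cats", "moderate", 1, 10, true)));
```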
Bing renders each image tile as <a class="iusc" m="<JSON>">. The JSON blob carries murl (full image URL), purl (source page URL), and t (title), among others; only those fields are typed in src/bing/parse-results.ts. jsdom returns the m attribute already entity-decoded, so the parser feeds it straight to JSON.parse; malformed tiles are swallowed individually rather than aborting the result list, and tiles whose murl doesn't end in a supported image extension are filtered out. Bing returns ~35 tiles per HTML page; pagination advances by 35 via the first query parameter. imageMaxResults slices the parsed list; its slider tops out at 35 because that is Bing's natural page size.
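A minimal sketch of the per-tile handling described above, starting from the already-decoded m attribute strings so no DOM library is needed. The function names and the extension list are assumptions; the actual predicate lives in src/parsers/image-extensions.ts and may accept a different set.

```typescript
// Only the three fields the text names are typed here.
interface BingTileMeta {
  murl?: unknown; // full image URL
  purl?: unknown; // source page URL
  t?: unknown; // title
}

interface ImageSearchResult {
  image: string;
  title?: string;
  sourcePage?: string;
}

// Assumed list for illustration only.
const SUPPORTED_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".webp"];

function hasSupportedExtension(url: string): boolean {
  const path = (url.split(/[?#]/)[0] ?? "").toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => path.endsWith(ext));
}

function parseTiles(mAttributes: string[], maxResults: number): ImageSearchResult[] {
  const results: ImageSearchResult[] = [];
  for (const raw of mAttributes) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // swallow one malformed tile instead of aborting the list
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const meta = parsed as BingTileMeta;
    const murl = meta.murl;
    if (typeof murl !== "string" || !hasSupportedExtension(murl)) continue;
    const result: ImageSearchResult = { image: murl };
    if (typeof meta.t === "string") result.title = meta.t;
    if (typeof meta.purl === "string") result.sourcePage = meta.purl;
    results.push(result);
  }
  return results.slice(0, maxResults); // the imageMaxResults-style cap
}

const tiles = [
  '{"murl":"https://example.com/a.jpg","purl":"https://example.com/p","t":"A"}',
  "{not json",
  '{"murl":"https://example.com/page.html"}',
];
console.log(parseTiles(tiles, 35));
```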
DuckDuckGo uses non-obvious p param values: strict → "1", moderate → "", off → "-1". Encoding lives in src/duckduckgo/build-urls.ts. Bing accepts the literal mode strings (strict/moderate/off) on its adlt param, so src/bing/build-urls.ts just passes the SafeSearch value through. The SafeSearch type itself is provider-neutral and lives in src/search/safe-search.ts.
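The two encodings could be expressed as a lookup table and a pass-through, as in this sketch. The helper names are hypothetical; only the mode names and the parameter values come from the description above.

```typescript
// Provider-neutral modes, as named in the text.
type SafeSearch = "strict" | "moderate" | "off";

// DuckDuckGo `p` values as listed above.
const DDG_SAFE_SEARCH: Record<SafeSearch, string> = {
  strict: "1",
  moderate: "",
  off: "-1",
};

// Hypothetical helper: value for DuckDuckGo's `p` parameter.
function ddgSafeSearchParam(mode: SafeSearch): string {
  return DDG_SAFE_SEARCH[mode];
}

// Hypothetical helper: Bing's `adlt` parameter takes the mode string unchanged.
function bingSafeSearchParam(mode: SafeSearch): string {
  return mode;
}

console.log(ddgSafeSearchParam("off"), bingSafeSearchParam("off"));
```

Typing the table as Record<SafeSearch, string> means adding a new mode to the union fails to compile until the DuckDuckGo mapping is extended as well.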
Two error hierarchies are load-bearing:
- FetchError (src/http/fetch-error.ts): HTTP/network failures; carries url and optional cause.
- NoResultsError base with NoWebResultsError / NoImageResultsError (src/errors/no-results-error.ts).

formatToolError in src/errors/tool-error.ts converts these into user-facing strings per tool kind (web-search, image-search, website, image-download), including abort detection via DOMException.name === "AbortError".
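A rough sketch of how the two hierarchies and the formatter could fit together. The class and function names mirror the text, but the constructor signatures, the label table, and every message string are assumptions; the abort check is written structurally (any object whose name is "AbortError") so it also covers a DOMException.

```typescript
type ToolKind = "web-search" | "image-search" | "website" | "image-download";

// HTTP/network failure carrying the URL and an optional underlying cause.
class FetchError extends Error {
  readonly url: string;
  readonly cause?: unknown;
  constructor(message: string, url: string, cause?: unknown) {
    super(message);
    this.name = "FetchError";
    this.url = url;
    this.cause = cause;
  }
}

class NoResultsError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NoResultsError";
  }
}

class NoWebResultsError extends NoResultsError {
  constructor() {
    super("No web results found."); // assumed wording
    this.name = "NoWebResultsError";
  }
}

class NoImageResultsError extends NoResultsError {
  constructor() {
    super("No image results found."); // assumed wording
    this.name = "NoImageResultsError";
  }
}

// Structural check: true for a DOMException or Error named "AbortError".
function isAbortError(error: unknown): boolean {
  return typeof error === "object" && error !== null && (error as { name?: unknown }).name === "AbortError";
}

// Assumed labels for illustration.
const LABELS: Record<ToolKind, string> = {
  "web-search": "Web search",
  "image-search": "Image search",
  website: "Website visit",
  "image-download": "Image download",
};

function formatToolError(kind: ToolKind, error: unknown): string {
  if (isAbortError(error)) return `${LABELS[kind]} was cancelled.`;
  if (error instanceof NoResultsError) return error.message;
  if (error instanceof FetchError) return `${LABELS[kind]} failed for ${error.url}: ${error.message}`;
  return `${LABELS[kind]} failed: ${error instanceof Error ? error.message : String(error)}`;
}

console.log(formatToolError("website", new FetchError("HTTP 404", "https://example.com/missing")));
```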
ESLint enforces two rules on src/tools/*-tool.ts: the file must contain exactly one exported create<Name>Tool factory returning Tool, and module-level function declarations other than that factory are banned. Per-tool helpers either live in a sibling module (e.g. src/fs/url-filename.ts, src/parsers/, src/renderers/) or are inlined inside the implementation arrow. Interfaces at module scope are allowed.
A file is named for its domain. Prefer a descriptive noun for the concept the module owns (src/timing/rate-limiter.ts → RateLimiter, src/http/ssrf.ts → assertPublicUrl, src/fs/markdown-path.ts → toMarkdownPath, src/enrichment/metascraper.ts → createMetascraper); a verb-phrase filename is acceptable only for a module that mirrors a single verb export (src/images/download-image.ts → downloadImage, src/page/fetch-page.ts → fetchPage). Never use vague or filler names: no retry, helpers, utils, or bare *-guard; name the actual concept (fetch-retry, error-message, ssrf). A module grouping several cohesive exports takes the domain-concept noun. Export identifiers stay independent of the filename: functions/predicates are verbs (followRedirects, isAbortError), classes/types/values are nouns (RateLimiter). Reserve the to<X> prefix for a function that converts a value to a defined type X (toMarkdownPath returns a markdown path); do not use it for a function that returns unknown or an ad-hoc shape. When a helper exists only to dispatch to another function, echo that function's verb rather than coining a new one (a wrapper delegating to wrapImpitError is wrapHopFailure, not toHopError). Never invent a name when the convention already fixes one; describing what a function does always beats describing where it is called.
src/enrichment/ wires metascraper into the web-search flow. metascraper.ts builds a single in-tree rule plugin that resolves date, type, and description against OpenGraph, microdata, JSON-LD, and standard meta tags. Rules use @metascraper/helpers for the heavy lifting: helpers.date (chrono-node-backed) for ISO normalization across many input formats, helpers.$jsonld for memoized JSON-LD lookups so multiple property accesses on the same page reuse one parse pass, and helpers.description for the 500-char-clamped description sanitizer. The local og:type rule keeps a thin trimmed() helper since helpers does not export a generic string sanitizer. The date rule chain prefers article:modified_time over article:published_time so the model sees the most recent change date; helpers' date() collapses both into a single ISO 8601 value rather than splitting them. Types for @metascraper/helpers (which ships pure JS) are declared inline in src/enrichment/metascraper-helpers.d.ts. The wrapper only emits keys whose extraction succeeded, so the per-result merge in enrich-search-results.ts cannot pollute records with undefined properties. The fan-out runs concurrently via Promise.all and gates each fetch on the PerHostRateLimiter, so distinct domains run in parallel while same-host calls still observe requestIntervalSeconds; the website cache is consulted first per result so warm enrichment pays no rate-limit cost. Non-HTML pages (PDF, plain text, JSON) are returned without metadata since the rules only match parsed HTML.
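The "only emit keys that were actually extracted" rule, together with a concurrent fan-out that demotes per-result failures, might look like the sketch below. The function names and the extract callback are hypothetical stand-ins; no metascraper call is made here.

```typescript
interface Metadata {
  date?: string;
  type?: string;
  description?: string;
}

interface WebSearchResult extends Metadata {
  title: string;
  url: string;
  snippet?: string;
}

const METADATA_KEYS = ["date", "type", "description"] as const;

// Keep only keys whose extraction yielded a non-empty string.
function compactMetadata(raw: Partial<Record<keyof Metadata, string | null | undefined>>): Metadata {
  const metadata: Metadata = {};
  for (const key of METADATA_KEYS) {
    const value = raw[key];
    if (typeof value === "string" && value.length > 0) metadata[key] = value;
  }
  return metadata;
}

// Hypothetical fan-out: `extract` stands in for fetch + metadata extraction.
async function enrichResults(
  results: WebSearchResult[],
  extract: (url: string) => Promise<Partial<Record<keyof Metadata, string | null | undefined>>>,
): Promise<WebSearchResult[]> {
  return Promise.all(
    results.map(async (result) => {
      try {
        return { ...result, ...compactMetadata(await extract(result.url)) };
      } catch {
        return result; // a failed enrichment falls back to the unenriched record
      }
    }),
  );
}

const merged = {
  title: "Example",
  url: "https://example.com/",
  ...compactMetadata({ date: "2024-01-01T00:00:00.000Z", type: undefined, description: "" }),
};
console.log(Object.keys(merged));
```

Spreading a compacted object matters because spreading `{ type: undefined }` would still create a `type` key on the merged record.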
- Tools: four factories (createWebSearchTool, createImageSearchTool, createVisitWebsiteTool, createFetchImagesTool). Config resolves via resolveConfig in src/config/resolve-config.ts, reading plugin UI settings from src/config/config-schematics.ts and falling back to the built-in defaults in src/config/config-defaults.ts, which the schematics and the resolver share so the two layers cannot drift (plugin config > default). All settings are plugin-driven; the tools expose no per-call config overrides.
- Rate limiting: RateLimiter (src/timing/rate-limiter.ts, backed by bottleneck) enforces a requestIntervalSeconds gap (default 5s) between successive outbound-request initiations for the global / single-target flows (DuckDuckGo web search, Bing image search, Visit Website, Fetch Images downloads). Those flows hand the limiter to fetchOrThrow via RequestOptions.limiter, which awaits it before fetching, so the interval spaces request starts rather than request completions. Web-search enrichment instead drives its fan-out through a PerHostRateLimiter (src/timing/per-host-rate-limiter.ts, backed by Bottleneck.Group) keyed by URL host: requests targeting the same host still observe requestIntervalSeconds, but results spanning distinct domains run in parallel, so a 10-result enrichment pass costs roughly one fetch's worth of wall time rather than ten.
- HTTP: an impit client (src/http/impit-client.ts) wrapped by withFetchRetry in src/http/fetch-retry.ts (maxRetries, retryInitialBackoffSeconds, retryMaxBackoffSeconds). The single shared GET entry point is fetchOrThrow in src/http/fetch.ts, which throws FetchError on transport failure or non-2xx; it composes the retry wrapper over src/http/redirects.ts (manual redirect following with a per-hop SSRF re-check via assertPublicUrl in src/http/ssrf.ts). Image downloads go through the same entry point with a shorter 10s per-attempt timeout via RequestOptions.timeoutMs.
  Do not replace impit with fetch: it applies browser TLS fingerprints and headers that DuckDuckGo's and Bing's anti-bot layers require (a bare fetch against Bing image search returns a degraded mobile variant; DDG's web search blocks bare fetch outright; see commit 9e97d38).
- Search providers: the DuckDuckGo side comprises search-web.ts, build-urls.ts, and the WebSearchResult domain type (web-search-result.ts) for the DDG /html/ web-search endpoint. Image search lives separately in src/bing/ (search-images.ts, parse-results.ts, build-urls.ts) and hits Bing's /images/search HTML page. The provider-neutral SafeSearch type lives in src/search/safe-search.ts and is consumed by both providers; DDG-specific encoding lives in src/duckduckgo/build-urls.ts, while Bing passes the mode strings through unchanged.
- Parsing: DuckDuckGo web-search HTML is parsed via CSS selectors (e.g. .result__a); src/bing/parse-results.ts parses Bing image-search HTML by reading the JSON-encoded m attribute on each <a class="iusc"> tile. Image-format reasoning (URL/extension sniffing, supported-format predicates) lives in src/parsers/image-extensions.ts and is shared by the Bing parser, the image downloader, and the page-image scraper.
- Enrichment: web-search results are fetched via fetchPage (reusing the page cache + per-host rate limiter) and run through a shared metascraper instance (src/enrichment/) that pulls date, type, and description from OpenGraph, microdata, JSON-LD, and standard HTML meta tags. The wrapper omits keys whose extraction yielded no value, so absent fields are missing from the returned object rather than present as undefined. Cache hits skip the rate limiter; per-result failures are silently demoted to an unenriched shape rather than aborting the search. The full enriched payload is what gets cached under the search-enriched subdir, so warm queries skip both the DuckDuckGo fetch and the per-result fan-out.
- Results: Web Search returns { title, url, snippet?, date?, type?, description? } records (snippet omitted when includeSnippets is disabled; the three metadata keys are omitted when extraction yielded nothing for that result).
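The difference between the shared limiter and the per-host limiter in the list above can be shown with a small scheduling sketch. It is a stand-in that only computes wait times, not the bottleneck-backed implementation; the class names and the reserve method are hypothetical.

```typescript
// Spaces request STARTS by a fixed interval (stand-in for the shared limiter).
class StartSpacer {
  private nextStart = 0;
  private readonly intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Returns how long a request arriving at `now` must wait before starting.
  reserve(now: number): number {
    const start = Math.max(now, this.nextStart);
    this.nextStart = start + this.intervalMs;
    return start - now;
  }
}

// One spacer per URL host (stand-in for the per-host limiter).
class PerHostSpacer {
  private readonly spacers = new Map<string, StartSpacer>();
  private readonly intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  reserve(url: string, now: number): number {
    const host = new URL(url).host;
    let spacer = this.spacers.get(host);
    if (!spacer) {
      spacer = new StartSpacer(this.intervalMs);
      this.spacers.set(host, spacer);
    }
    return spacer.reserve(now);
  }
}

const perHost = new PerHostSpacer(5000); // 5s, matching the default interval
console.log(perHost.reserve("https://a.example/1", 0)); // first call to host a
console.log(perHost.reserve("https://a.example/2", 0)); // same host: must wait
console.log(perHost.reserve("https://b.example/1", 0)); // different host: no wait
```

Reserving the next start time when a request is admitted, rather than when it finishes, is what makes the interval apply to request starts.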
Image Search is discovery-only: it returns { image, title?, sourcePage? } records where image is the remote full-resolution URL (Bing's murl), with title and sourcePage populated from Bing's tile metadata when present; no files are written to disk. Fetch Images is the only path to disk for images; it downloads via src/images/download-images.ts into the per-chat working directory obtained from ctl.getWorkingDirectory(). That call is made inside the tool's implementation (not at toolsProvider setup), since the SDK only attaches a working directory when a tool is actually invoked from a chat. Visit Website (src/page/fetch-page.ts) returns only the page title, first-level headings, and a readable-content excerpt, with no image download or link extraction. Fetch Images accepts explicit HTTP(S) URLs and/or scrapes images from a given page, returning per-image records with filename, alt, title, and a markdown reference to the downloaded file. The intended workflow is Image Search → Fetch Images → embed the returned markdown in the reply.

Content formats for Visit Website (src/parsers/page-text.ts):
- Markdown (default): a turndown service (ATX headings, - bullets, fenced code, inline links, inline images). script/style/noscript/template are stripped before conversion.
- Plain text: html-to-text with token-conservative options: word wrapping disabled, anchor URLs dropped (only inner text kept), <img>/<noscript>/<template> skipped, headings and table headers left in source case rather than uppercased, and list items prefixed with - .

Retrieval pipeline stages (src/retrieval/):
- Chunk (chunkText): splits normalized text into paragraph chunks on blank-line boundaries.
- Rank (rankChunksByTerms): ranks chunks by fuzzy relevance to the terms via Fuse.js (score threshold 0.3).
  This is the RAG swap point: any (chunks, terms) => number[] ranker can replace it without touching the chunker, the selector, or the orchestrator.
- Select (selectChunks): grows the selection outward from the best matches through their ±n neighbours until the budget is filled (factoring in the join-separator width), and returns the chosen chunk indices in source order; assembling them into text is the orchestrator's job.
- Assemble (buildExcerpt → Excerpt): orchestrates the pipeline. It assembles the selected chunks into the excerpt body (joining and truncating to the budget), or falls back to a leading slice when no terms are supplied or no chunk matches, and reports totalLength (the pre-truncation length), which the renderer surfaces as contentLength.

The three TTL caches:
- Web search (subdir search-enriched): up to 100 entries, gated by searchCacheTtlSeconds (default 15 min). Keyed by query, safe-search, page, result cap, and enrichment flag (the cap is part of the key because the stored payload is sliced to it, so a changed cap re-fetches instead of serving a stale slice). Stores the post-enrichment payload, so warm queries skip both the DuckDuckGo fetch and the per-result fan-out. The legacy search subdir from before enrichment landed is orphaned; it can be deleted by hand alongside the rest of the cacache directory.
- Image search (subdir image-search): up to 100 entries, also gated by searchCacheTtlSeconds. Stores { results, count } keyed by query, safe-search, page, and result cap so repeat image searches skip the Bing fetch and the shared rate limiter. Kept on a separate cache instance from web search so the two flows do not compete for eviction slots and the payload shapes stay typed independently.
- Website: gated by websiteCacheTtlSeconds (default 10 min; the plugin config key is kept as websiteCacheTtlSeconds to match the user-facing setting). Internally exposed as pageCache and stored under the on-disk subdir "website" (also preserved).
  Shared by Visit Website, Fetch Images, and the web-search enrichment pass, so a result that lands here once is cheap to revisit by any of the three flows.

Dependencies:
- @lmstudio/sdk: plugin/tool registration
- impit: HTTP client with TLS + header fingerprinting (required for anti-bot)
- jsdom: HTML parsing
- @mozilla/readability: readable article extraction for Visit Website (boilerplate removal, not text extraction)
- turndown: HTML → Markdown conversion for Visit Website's content field
- html-to-text: HTML → plain-text conversion for Visit Website's content field when markdown is opted out
- metascraper + @metascraper/helpers: meta tag / OpenGraph / JSON-LD extraction backing the web-search enrichment pass. The helpers package is consumed directly (no per-field metascraper-* plugin packages) for date, description, and $jsonld; types are declared locally in src/enrichment/metascraper-helpers.d.ts
- zod: tool parameter schemas
- bottleneck: backs the shared RateLimiter
- cacache: disk-backed cache store for all three TTL caches
- file-type: MIME sniffing for downloaded images