Forked from vadimfedenko/visit-website-reworked
An LM Studio plugin for fetching web pages, extracting structured page data, and downloading images for use in chat.
Version 1.3.0 focuses on more reliable HTML parsing, safer image downloads, better blocked-page handling, and clearer output for LLM workflows.
Visit Website returns structured data such as:
{
  "url": "https://example.com",
  "title": "Example Domain",
  "headings": {
    "h1": ["Example Domain"],
    "h2": [],
    "h3": []
  },
  "fetch": {
    "source": "direct",
    "finalUrl": "https://example.com/",
    "statusCode": 200,
    "server": "..."
  },
  "links": [],
  "images": [],
  "content": "..."
}
Headings are returned as headings: { h1[], h2[], h3[] }, with up to 5 entries per level.
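The per-level cap can be pictured with a small sketch. The field names come from the example output above; the type name, constant, and function name are hypothetical and are not taken from the plugin source.

```typescript
// Hypothetical type mirroring the "headings" object in the example output.
interface Headings {
  h1: string[];
  h2: string[];
  h3: string[];
}

// The README states up to 5 entries per level.
const MAX_HEADINGS_PER_LEVEL = 5;

// Keep only the first `limit` headings of each level, in document order.
function capHeadings(all: Headings, limit: number = MAX_HEADINGS_PER_LEVEL): Headings {
  return {
    h1: all.h1.slice(0, limit),
    h2: all.h2.slice(0, limit),
    h3: all.h3.slice(0, limit),
  };
}
```

Slicing rather than sorting keeps the headings in page order, which is an assumption about how the plugin picks which entries survive.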
- HTML parsing with node-html-parser.
- h1 to h3 headings, up to 5 per level.
- Optional fallback through r.jina.ai.

The plugin first tries to fetch the target URL directly. If the page appears blocked or returns a common anti-bot status such as 401, 403, 429, or 503, it can retry through r.jina.ai.
This fallback is enabled by default to preserve existing behavior. Users who do not want URLs forwarded to a third-party service can disable Jina Fallback in plugin settings.
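A minimal sketch of the direct-then-fallback decision, assuming the fallback is reached by prefixing the target URL with `https://r.jina.ai/`. All names here (`looksBlocked`, `toJinaUrl`, `fetchWithFallback`, `getPage`) are hypothetical, and the sketch checks status codes only; whatever heuristics the plugin uses to detect a page that "appears blocked" are not covered.

```typescript
// Hypothetical names throughout; this is not the plugin's actual source.
type FetchSource = "direct" | "jina";

interface PageResponse {
  statusCode: number;
  body: string;
}

interface FetchResult extends PageResponse {
  source: FetchSource;
}

// Status codes the README lists as common anti-bot responses.
const BLOCKED_STATUSES: ReadonlySet<number> = new Set([401, 403, 429, 503]);

function looksBlocked(statusCode: number): boolean {
  return BLOCKED_STATUSES.has(statusCode);
}

// Assumption: the fallback service takes the target URL as a path suffix.
function toJinaUrl(targetUrl: string): string {
  return `https://r.jina.ai/${targetUrl}`;
}

// `getPage` is injected so the sketch stays independent of any HTTP client.
async function fetchWithFallback(
  targetUrl: string,
  getPage: (url: string) => Promise<PageResponse>,
  jinaFallbackEnabled: boolean = true,
): Promise<FetchResult> {
  const direct = await getPage(targetUrl);
  if (!looksBlocked(direct.statusCode) || !jinaFallbackEnabled) {
    return { ...direct, source: "direct" };
  }
  // Only reached when the direct fetch looks blocked and the fallback is on.
  const viaJina = await getPage(toJinaUrl(targetUrl));
  return { ...viaJina, source: "jina" };
}
```

With the fallback disabled, the blocked response is returned as-is with `source: "direct"`, so the URL never leaves for the third-party service.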
The fetch metadata in Visit Website shows whether content came from:
- direct: fetched from the original site.
- jina: fetched through the fallback service.

Image extraction supports:
- src, data-src, data-original, and srcset.

Downloads are batched in groups of 3 with short pauses between batches. Each remote response must have an image/* content type before it is saved.
When possible, the plugin creates compact WebP thumbnails. sharp remains an optional dependency; if thumbnail creation fails, the original downloaded image is returned.
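The batching and content-type rules above might look roughly like the following. This is a sketch under stated assumptions: the helper names are hypothetical, downloads inside a batch are assumed to run concurrently, the pause is injected by the caller, and the `srcset` parser is deliberately naive (it splits on commas, so URLs containing commas are not handled).

```typescript
// Group size stated in the README.
const BATCH_SIZE = 3;

// Split items into consecutive groups of `size`.
function toBatches<T>(items: readonly T[], size: number = BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Accept only responses whose Content-Type is image/*.
function isImageContentType(contentType: string | null | undefined): boolean {
  if (!contentType) return false;
  return contentType.trim().toLowerCase().startsWith("image/");
}

// Pull the URL part out of each srcset candidate, e.g. "a.jpg 1x, b.jpg 2x".
function srcsetUrls(srcset: string): string[] {
  return srcset
    .split(",")
    .map((candidate) => candidate.trim().split(/\s+/)[0])
    .filter((url): url is string => Boolean(url));
}

// Run `download` batch by batch, calling `pause` between batches only.
async function downloadInBatches<T, R>(
  items: readonly T[],
  download: (item: T) => Promise<R>,
  pause: () => Promise<void>,
): Promise<R[]> {
  const results: R[] = [];
  let isFirstBatch = true;
  for (const batch of toBatches(items)) {
    if (!isFirstBatch) await pause();
    isFirstBatch = false;
    const batchResults = await Promise.all(batch.map((item) => download(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Injecting `pause` keeps the delay length out of the sketch, since the README only says the pauses are short.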
Available settings:
- Limits for the sections returned by Visit Website.
- Jina Fallback: retry through r.jina.ai for blocked pages.

Use 0 to exclude a section. Use -1 for automatic defaults where supported.
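One way to read the 0 and -1 conventions is as a small resolution step applied to each limit before extraction. The function names and the `autoDefault` parameter below are hypothetical; the actual automatic defaults are not documented here.

```typescript
// Resolve a user-facing limit setting into an effective limit.
// `autoDefault` is an illustrative value supplied by the caller,
// not a documented plugin default.
function resolveLimit(setting: number, autoDefault: number): number {
  if (setting === -1) return autoDefault; // automatic default
  if (setting === 0) return 0; // section excluded
  return setting; // explicit limit chosen by the user
}

// A section is included only when its effective limit is positive.
function includeSection(effectiveLimit: number): boolean {
  return effectiveLimit > 0;
}
```

For example, `resolveLimit(-1, 50)` yields the caller's default of 50, while `resolveLimit(0, 50)` yields 0 and the section is skipped.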
Install from LM Studio Hub:
Click Run in LM Studio.
From the project directory:
npm install
npx.cmd lms dev
If your shell allows direct npx execution, this also works:
npx lms dev
Ask the assistant:
Open https://example.com and return the title, headings, links, images, and content.
Or:
Download images from https://example.com.
For targeted extraction:
Open this page and prioritize links and text related to pricing and documentation: https://example.com.
Works well with:
Typical workflow: use DuckDuckGo Reworked to find relevant URLs, Visit Website Reworked to extract page content and images, then Analyze Images to inspect downloaded images.