Forked from altra/web-search
A research-grade web search plugin that goes beyond snippets — it reads pages, verifies claims, detects contradictions, finds primary sources, and enforces cross-source fact verification before asserting anything as true.
Standard search gives you ten links and short snippets. This plugin goes further: it reads the pages, cross-checks claims across independent publishers, and tells you when something cannot be verified.
```bash
cd web-search-plugin
npm install
npx tsc
```
Load the built plugin in LM Studio.
| Field | Default | Description |
|---|---|---|
| Max Search Results | 8 | Results retrieved per query |
| Max Pages to Read | 3 | Pages actually fetched and read per search |
| Page Fetch Timeout | 8000ms | Per-page timeout before giving up |
| Search Language | en-us | Language/region for results |
| SearXNG URL | (blank) | Recommended. Self-hosted SearXNG instance. Falls back to DDG → Bing if blank. DDG and Bing may block headless requests — SearXNG is the reliable path. |
| Search Recency Window | year | Limit results to: day, week, month, year, or any |
| LM Studio URL | http://localhost:1234 | Used for embedding-based result reranking via nomic-embed-text |
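For orientation, these settings might map to a typed config object along the following lines; the interface and field names are illustrative assumptions, not the plugin's actual schema.

```typescript
// Hypothetical config shape mirroring the settings table above.
// Field names are illustrative; the real plugin may use different keys.
interface PluginConfig {
  maxSearchResults: number;   // results retrieved per query
  maxPagesToRead: number;     // pages actually fetched and read per search
  pageFetchTimeoutMs: number; // per-page timeout before giving up
  searchLanguage: string;     // language/region for results
  searxngUrl?: string;        // recommended; falls back to DDG then Bing if unset
  recencyWindow: "day" | "week" | "month" | "year" | "any";
  lmStudioUrl: string;        // used for embedding-based reranking
}

const defaults: PluginConfig = {
  maxSearchResults: 8,
  maxPagesToRead: 3,
  pageFetchTimeoutMs: 8000,
  searchLanguage: "en-us",
  recencyWindow: "year",
  lmStudioUrl: "http://localhost:1234",
};
```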
Before running any search, the plugin calls a clarify step that detects ambiguity signals in the user's question.
If ambiguity is detected, the LLM asks the user focused questions before searching. This avoids wasted searches and produces a much more targeted answer.
After retrieving search results, the plugin reranks them using embeddings before fetching any pages. This means the pages that actually get read are the most semantically relevant to the query — not just the top SEO results.
How it works:
- Embeds the query and each result using nomic-embed-text (requires the model to be loaded in LM Studio)
- If the embedding call fails (model not loaded, wrong URL), it falls back silently to the original search engine ranking
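A rough sketch of what that reranking step can look like, assuming LM Studio's OpenAI-compatible embeddings endpoint at the configured URL; the function names and payload shapes below are illustrative, not the plugin's actual code.

```typescript
// Illustrative reranking sketch; not the plugin's actual implementation.
interface SearchResult { title: string; url: string; snippet: string; }

async function embed(texts: string[], lmStudioUrl: string): Promise<number[][]> {
  // LM Studio exposes an OpenAI-compatible embeddings endpoint (path assumed here).
  const res = await fetch(`${lmStudioUrl}/v1/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", input: texts }),
  });
  const json = await res.json();
  return json.data.map((d: { embedding: number[] }) => d.embedding);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function rerank(query: string, results: SearchResult[], lmStudioUrl: string) {
  try {
    const [queryVec, ...resultVecs] = await embed(
      [query, ...results.map(r => `${r.title} ${r.snippet}`)],
      lmStudioUrl,
    );
    return results
      .map((r, i) => ({ r, score: cosine(queryVec, resultVecs[i]) }))
      .sort((a, b) => b.score - a.score)
      .map(x => x.r);
  } catch {
    return results; // fall back silently to the engine's original ranking
  }
}
```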
search, search_recent, and search_news each report independent_publishers_read — the number of distinct root domains among the pages successfully fetched. This feeds a dynamic instruction injected into every result:
| Publishers read | Instruction to the LLM |
|---|---|
| 0 | Do not assert any facts — re-search or inform the user |
| 1 | Hard UNVERIFIED warning — call fact_check or label every claim as unverified before presenting |
| 2+ | Report publisher count — flag any claim supported by only one of them as UNVERIFIED |
This prevents the LLM from repeating a claim as fact just because one website said it.
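Conceptually, the signal reduces to stripping each fetched URL down to its root domain and counting the distinct set, as in this illustrative sketch (the messages mirror the table above; the function names are assumptions).

```typescript
// Illustrative sketch of the publisher diversity signal.
function rootDomain(url: string): string {
  const host = new URL(url).hostname.replace(/^www\./, "");
  // Naive root-domain extraction; a real implementation would need a
  // public-suffix list to handle hosts like example.co.uk correctly.
  return host.split(".").slice(-2).join(".");
}

function diversityInstruction(fetchedUrls: string[]): string {
  const publishers = new Set(fetchedUrls.map(rootDomain));
  if (publishers.size === 0) {
    return "Do not assert any facts; re-search or inform the user.";
  }
  if (publishers.size === 1) {
    return "UNVERIFIED: only one publisher was read. Call fact_check or label every claim as unverified.";
  }
  return `${publishers.size} independent publishers read. Flag any claim supported by only one of them as UNVERIFIED.`;
}
```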
The system prompt enforces five non-negotiable rules on top of the publisher diversity signal. Among them: disputed claims must be cross-checked with fact_check or compare_sources, and verify_statistic must be called before asserting the number behind any statistic.

Results come from SearXNG (if configured) → DuckDuckGo HTML scraper → Bing HTML scraper. No API keys required.
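A minimal sketch of what that fallback chain can look like, with hypothetical provider functions standing in for the real scrapers.

```typescript
// Illustrative fallback chain; the provider functions are hypothetical stubs.
type SearchResultItem = { title: string; url: string; snippet: string };
type Provider = (query: string) => Promise<SearchResultItem[]>;

async function runSearch(
  query: string,
  providers: { searxng?: Provider; duckduckgo: Provider; bing: Provider },
): Promise<SearchResultItem[]> {
  // Prefer SearXNG when configured; DDG and Bing HTML scraping may block
  // headless requests, so they are best-effort fallbacks.
  if (providers.searxng) {
    const results = await providers.searxng(query);
    if (results.length > 0) return results;
  }
  const ddg = await providers.duckduckgo(query);
  if (ddg.length > 0) return ddg;
  return providers.bing(query);
}
```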
Always called before any search. Returns either:
- STATUS: READY — question is specific enough, search proceeds
- STATUS: CLARIFY — question is ambiguous, the LLM asks the user before searching

You do not need to call this manually. The system prompt enforces it as a mandatory first step.
The main tool. Unlike basic search, it fetches and reads the actual page content — not just snippets.
search(query, max_pages_to_read?)
Returns:
- independent_publishers_read — count of distinct root domains among successful page reads

Fetch any URL and return the full readable text content.
fetch_and_read(url, max_chars?)
Use it when the user shares a specific URL or you need the full text of a page rather than a search snippet.
Runs 3–5 separate searches from different perspectives on the same topic, reads pages for each, and returns everything together. Defeats single-search bias.
deep_search(topic, angles?, pages_per_angle?)
Default angles: overview facts, latest research, criticism/limitations, expert consensus.
You can specify your own angles, e.g.:
angles: ["economic impact", "environmental cost", "industry response", "regulatory landscape"]
Cross-checks a claim across four search angles: direct confirmation, debunking searches, evidence searches, and expert opinion. Returns raw evidence from all angles for the LLM to assess.
fact_check(claim)
Verdict categories: supported, disputed, unsupported, nuanced, uncertain.
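As a sketch, the four angles might translate into queries roughly like this; the exact phrasings are assumptions.

```typescript
// Illustrative sketch of how the four fact_check angles could be turned
// into search queries; the wording here is an assumption.
function factCheckQueries(claim: string): Record<string, string> {
  return {
    confirmation: claim,                       // direct confirmation
    debunking: `${claim} debunked myth false`, // debunking searches
    evidence: `${claim} evidence study data`,  // evidence searches
    expertOpinion: `${claim} expert analysis`, // expert opinion
  };
}

// Each query is searched and pages are read; the raw evidence from all four
// angles is returned for the LLM to weigh into a verdict.
```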
Statistics are frequently outdated, misquoted, out of context, or fabricated. This tool searches for the stat, its primary source, fact-check results, and updated data.
verify_statistic(statistic, context?)
Example: verify_statistic("90% of startups fail in year one", "venture-backed US tech startups")
Secondary sources often distort original findings. This tool searches for the original study, report, official document, or statement where a claim first appeared.
find_primary_source(claim, domain?)
Prioritises peer-reviewed journals, government reports, and official organisation publications over secondary citations.
Only returns results from the specified time window. Prevents stale results from dominating on fast-moving topics.
search_recent(query, window?, read_pages?)
Windows: day (last 24h), week, month, year.
Returns independent_publishers_read and a dynamic publisher diversity instruction alongside the results.
Fetches multiple sources on the same topic and returns them side by side for the LLM to compare framing, spot conflicts, and identify unique claims.
compare_sources(topic, urls?, num_sources?)
Provide specific URLs to compare, or let it search and pick sources with varied domains automatically.
Returns a structured analysis of framing differences, conflicting claims, and claims unique to individual sources.
Searches specifically for academic research, official positions, expert interviews, and scientific consensus — not what random blogs claim experts say.
find_expert_views(topic, field?)
Covers four angles: expert consensus, peer-reviewed research, official institutional positions, and active scientific debate.
Searches arXiv, PubMed, and Semantic Scholar for peer-reviewed papers and research publications.
search_academic(topic, source?, year_from?)
Sources: arxiv, pubmed, semantic_scholar, all.
Fetches paper pages to extract abstracts, methodology, and findings. The LLM is instructed to distinguish preprints from peer-reviewed work, note sample sizes, and not overstate findings.
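One plausible way to scope queries to these sources is with site: filters, as sketched below; the real plugin may use the services' own APIs instead.

```typescript
// Hypothetical site-restricted query construction for academic sources.
type AcademicSource = "arxiv" | "pubmed" | "semantic_scholar" | "all";

const siteFilters: Record<Exclude<AcademicSource, "all">, string> = {
  arxiv: "site:arxiv.org",
  pubmed: "site:pubmed.ncbi.nlm.nih.gov",
  semantic_scholar: "site:semanticscholar.org",
};

function academicQueries(topic: string, source: AcademicSource): string[] {
  // year_from filtering is omitted here for brevity.
  const filters = source === "all" ? Object.values(siteFilters) : [siteFilters[source]];
  return filters.map(filter => `${filter} ${topic}`);
}
```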
News-filtered search that actively ranks established journalism above blogs, product pages, and content farms. Runs two queries — one general, one targeting major news outlets — then ranks high-credibility results first.
search_news(query, window?, read_pages?)
Windows: day, week, month, any.
Unlike search_recent (which filters by date), this filters by source type — it's about journalistic sourcing, not just recency. Best for: breaking news, corporate announcements, policy changes, anything where "who is reporting it" matters.
Returns independent_publishers_read and a dynamic publisher diversity instruction alongside the results.
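The dual-query strategy might look roughly like this; the outlet list is an example rather than the plugin's actual list.

```typescript
// Illustrative sketch of search_news's two-query approach.
const majorOutlets = ["reuters.com", "apnews.com", "bbc.com", "nytimes.com"];

function newsQueries(query: string): string[] {
  return [
    `${query} news`,                                                 // general query
    `${query} (${majorOutlets.map(d => `site:${d}`).join(" OR ")})`, // outlet-targeted query
  ];
}

// After both queries run, results from established outlets are ranked ahead
// of blogs, product pages, and content farms before any pages are fetched.
```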
Runs multiple searches from different angles, reads key pages, and instructs the LLM to produce a structured research brief: overview, established facts, contested areas, expert consensus, open questions, key sources, and confidence assessment.
research_topic(topic, depth?, focus?)
Depths:
- overview — 3 angles, 2 pages each
- detailed — 5 angles, 2 pages each (default)
- comprehensive — 7 angles, 3 pages each

Assesses a URL or domain and returns its credibility type, known signals, reputation search results, and red flags to watch for.
check_source(url)
Domain types: government, academic institution, academic/research platform, established news outlet, encyclopedia, user-generated content, unknown.
Credibility levels: high, medium, low, unknown.
It also checks for common credibility red flags.
Every search result and fetched page gets a credibility assessment based on domain signals:
| Domain Type | Credibility | Examples |
|---|---|---|
| Government | HIGH | .gov, .mil, WHO, CDC |
| Academic institution | HIGH | .edu, .ac.uk, universities |
| Academic platforms | HIGH | arXiv, PubMed, Semantic Scholar |
| Established news | HIGH | Reuters, AP, BBC, Nature, NYT |
| Wikipedia | MEDIUM | Good overview, verify citations |
| User-generated / blogs | LOW | Blogspot, WordPress, Reddit, Quora |
| Unknown | UNKNOWN | Check About page and author credentials |
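The table's heuristics could be implemented along these lines; the domain lists and ordering are illustrative assumptions.

```typescript
// Rough sketch of domain-based credibility classification; lists are illustrative.
type Credibility = "HIGH" | "MEDIUM" | "LOW" | "UNKNOWN";

function assessCredibility(url: string): Credibility {
  const host = new URL(url).hostname.toLowerCase();
  const endsWithAny = (suffixes: string[]) => suffixes.some(s => host.endsWith(s));

  if (endsWithAny([".gov", ".mil", "who.int", "cdc.gov"])) return "HIGH";   // government
  if (endsWithAny([".edu", ".ac.uk"])) return "HIGH";                       // academic institutions
  if (endsWithAny(["arxiv.org", "pubmed.ncbi.nlm.nih.gov", "semanticscholar.org"]))
    return "HIGH";                                                          // academic platforms
  if (endsWithAny(["reuters.com", "apnews.com", "bbc.com", "nature.com", "nytimes.com"]))
    return "HIGH";                                                          // established news
  if (endsWithAny(["wikipedia.org"])) return "MEDIUM";                      // good overview, verify citations
  if (endsWithAny(["blogspot.com", "wordpress.com", "reddit.com", "quora.com"]))
    return "LOW";                                                           // user-generated / blogs
  return "UNKNOWN";                                                         // check About page and author credentials
}
```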
The plugin's system prompt instructs the LLM to:
- clarify first — ask focused questions before searching if the query is ambiguous

Simple fact:
"What is the Dunning-Kruger effect?" →
clarify(READY) → search
Ambiguous query:
"Tell me about python" →
clarify(CLARIFY) → asks: "Do you mean the programming language or the snake?" → search
Verify a claim:
"Is it true that we only use 10% of our brains?" →
clarify(READY) → fact_check
Verify a statistic:
"Someone told me 50,000 species go extinct every year. Is that right?" →
clarify(READY) → verify_statistic
Recent developments:
"What's happened with GPT-5 in the last week?" →
clarify(READY) → search_recent(window: "week")
Compare perspectives:
"What do different sources say about seed oils and health?" →
clarify(READY) → compare_sources or deep_search
Scientific consensus:
"What does the research actually say about intermittent fasting?" →
clarify(READY) → find_expert_views + search_academic
Deep research:
"Give me a thorough research brief on quantum error correction" →
clarify(READY) → research_topic(depth: "comprehensive")
Read a specific article:
"Can you read this paper and summarise the key findings? [url]" →
clarify(READY) → fetch_and_read
Check if a source is reliable:
"Is naturalhealth365.com a reliable source?" →
clarify(READY) → check_source
AI capability claim:
"I read that [model X] achieves 95% accuracy on [benchmark]. Is that right?" →
clarify(READY) → fact_check — vendor blogs and press releases are not accepted as evidence; requires independent academic or journalistic confirmation