This document tracks completed work, active work-in-progress, and planned improvements.
Each phase is independently shippable and PR-sized. All PRs target haggyroth/LM_Studio_Toolbox.
| Symbol | Meaning |
|---|---|
| ✅ | Merged to main |
| 🔄 | PR open / in progress |
| 📋 | Planned, not yet started |
- npm audit fix resolving simple-git RCE (GHSA-jcxm-m3jx-f287) and 7 other CVEs
- uuid/node-notifier advisory in SECURITY.md
- subAgentTimeLimit wall-clock deadline enforced with AbortSignal on in-flight fetches
- search_file_content → search_in_file (was mapping to a nonexistent tool)
- finish_task now handled as a clean termination signal in the executor
- fileTools integration tests: 30 cases against a real temp workspace
- memoryTools integration tests: 13 cases with SQLite
- security.test.js: validatePath, parseProtectedPaths, SSRF rejection

~2 hours · Target: 1 PR
Three silent bugs in the embedding cache introduced in Phase E:
| ID | Issue | Fix |
|---|---|---|
| BUG-R1 | Cache keyed by path+mtime only, so switching embedding models reuses stale vectors → NaN cosine similarity → empty results with no error | Key by model::path::mtime |
| BUG-R2 | cosineSimilarity has no vector-length guard; mismatched dimensions produce NaN silently | Assert vecA.length === vecB.length; return 0 on mismatch with a console warning |
| PERF-R1 | Cache is unbounded and keeps growing throughout a long session that embeds many files | Add a simple LRU eviction cap (~200 entries) |
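
The three fixes in the table can be sketched together. This is a minimal illustration, not the plugin's actual code: the function and class names are hypothetical, and only the `model::path::mtime` key format, the length guard, and the ~200-entry cap come from the table.

```typescript
// Hypothetical names; only the key format, the length guard, and the
// ~200-entry cap are taken from the table above.

/** Build a cache key that changes whenever the embedding model changes. */
function cacheKey(model: string, path: string, mtimeMs: number): string {
  return `${model}::${path}::${mtimeMs}`;
}

/** Cosine similarity that returns 0 (with a warning) on mismatched dimensions. */
function cosineSimilarity(vecA: number[], vecB: number[]): number {
  if (vecA.length !== vecB.length) {
    console.warn(`cosineSimilarity: dimension mismatch (${vecA.length} vs ${vecB.length})`);
    return 0;
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < vecA.length; i++) {
    dot += vecA[i] * vecB[i];
    normA += vecA[i] * vecA[i];
    normB += vecB[i] * vecB[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

/** Minimal LRU built on Map insertion order: the oldest key is evicted first. */
class LruCache<V> {
  private entries = new Map<string, V>();
  constructor(private readonly maxEntries = 200) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Including the model name in the key means a model switch simply misses the cache instead of comparing vectors of different dimensions.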
~3 hours · Target: 1 PR
The highest-risk modules still have no direct tests:
~2 hours · Target: 1 PR
Remaining minor items from the code review:
| Item | Detail |
|---|---|
| DuckDuckGo raw fetch timeout | webTools.ts: one remaining raw fetch() call with no timeout (PERF-R3) |
| query_database connection leak | miscTools.ts opens a better-sqlite3 connection per call and never calls .close() |
| save_file rejects spaces in filenames | Regex `/[ *?<> |
| Linux clipboard fallbacks | No xsel or Wayland (wl-copy/wl-paste) fallback; xclip-only fails silently on many setups |
| multi_replace_text overlap handling | Overlapping old_string ranges corrupt output silently; document or detect and reject |
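
For the multi_replace_text item, the "detect and reject" option could look roughly like the sketch below. The `Replacement` shape and the `findOverlaps` helper are assumptions made for illustration; the tool's real input format is not shown in this document, and the sketch only considers the first occurrence of each old string.

```typescript
// Hypothetical helper: the shape of `Replacement` is an assumption.
interface Replacement {
  oldString: string;
  newString: string;
}

/** Returns index pairs of replacements whose first match ranges overlap in `text`. */
function findOverlaps(text: string, replacements: Replacement[]): Array<[number, number]> {
  // Locate the first occurrence of each old string as a half-open range [start, end).
  const ranges = replacements
    .map((r, index) => {
      const start = text.indexOf(r.oldString);
      return { index, start, end: start + r.oldString.length };
    })
    .filter((r) => r.start !== -1)
    .sort((a, b) => a.start - b.start);

  const overlaps: Array<[number, number]> = [];
  for (let i = 0; i < ranges.length; i++) {
    // Ranges are sorted by start, so the scan can stop at the first non-overlap.
    for (let j = i + 1; j < ranges.length && ranges[j].start < ranges[i].end; j++) {
      overlaps.push([ranges[i].index, ranges[j].index]);
    }
  }
  return overlaps;
}
```

A caller could reject the whole request when the returned array is non-empty, which keeps the failure visible instead of silently corrupting output.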
~1 day · Target: 1–2 PRs
The sub-agent system works well under ideal conditions but is brittle when either side stalls, errors, or produces unexpected output. This phase hardens the handshake and expands the role system.
Main → sub-agent failures:
Sub-agent → main failures:

- cancelSubAgent() mechanism (perhaps via AbortController passed through ToolContext) that the preprocessor can signal on a fresh turn.
- [Sub-agent: turn N/8, ~Xk tokens] as a streaming status note visible in the chat so users can see the agent is alive.

The current config ships with summarizer and coder examples. Extend the default subAgentProfiles with well-tuned built-in presets (user-overridable via the config JSON):
| Role key | Purpose | Specializations |
|---|---|---|
| coder | Write, edit, and refactor code | Follows project conventions; runs tests after changes; uses replace_text_in_file for surgical edits |
| reviewer | Audit code for bugs and security issues | Uses search_in_file + read_file; outputs structured findings; calls finish_task with a findings summary |
| researcher | Gather, verify, and summarize information from the web | Chains web_search → fetch_web_content → rag_web_content; produces a cited summary |
| debugger | Diagnose failing tests or runtime errors | Reads error output, traces stack frames, proposes and applies targeted fixes; re-runs tests to verify |
| tester | Write and run tests for existing code | Infers coverage gaps from source; writes test files; executes with run_test_command; reports pass/fail |
| documenter | Write or update inline docs, READMEs, and changelogs | Reads source; produces JSDoc/docstrings and markdown; uses replace_text_in_file |
| planner | Decompose a complex task into an ordered checklist | Produces a structured plan as a markdown file; does not write code itself |
Allow the main agent to compose roles in sequence without manual re-invocation:
Each link in the chain receives the previous link's output plus any files it saved as its starting context. The final result aggregates all handoff messages. This enables a full write → test → review pipeline in a single tool call.
Currently the main agent and sub-agent each maintain their own view of the workspace. Improvements:
- Pass ctx.cwd to the sub-agent as its default starting directory
- filesModified; surface these more prominently in the response)
- readonly mode option to consult_secondary_agent that disables save_file/delete_path for research-only roles

~2 hours · Target: 1 PR
The git toolchain covers the common workflow but is missing operations that frequently force the model to fall back to execute_command:
| Tool | Purpose |
|---|---|
| git_stash / git_stash_pop | Save and restore uncommitted changes when switching context |
| git_reset | Unstage files or roll back commits (--soft, --mixed only; --hard requires a confirmation prompt) |
| git_branch | List, create, and delete branches |
| git_merge | Merge a branch (fast-forward only by default; --no-ff optional) |
| git_fetch | Fetch remote refs without merging |
Each tool follows the existing pattern: simple-git wrapper, model-controlled inputs sanitized by the library, timeout on the spawn.
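
As an illustration of the git_reset rule in the table, the argument-building step might be sketched as below. `buildResetArgs` and its `confirmed` flag are hypothetical names, and the extra check on the target is an assumption added for the sketch rather than something the existing tools are known to do; the resulting array is the kind of argument list a git wrapper would receive.

```typescript
type ResetMode = "soft" | "mixed" | "hard";

/**
 * Build the argument list for `git reset`.
 * `--hard` is refused unless the caller passes an explicit confirmation flag.
 */
function buildResetArgs(mode: ResetMode, target = "HEAD", confirmed = false): string[] {
  if (mode === "hard" && !confirmed) {
    throw new Error("git reset --hard discards work; confirmation required");
  }
  // Reject targets that git could parse as an option (assumption for this sketch).
  if (target.startsWith("-")) {
    throw new Error(`Invalid reset target: ${target}`);
  }
  return ["reset", `--${mode}`, target];
}
```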
~3 hours · Target: 1–2 PRs
Improvements that make working on and with the plugin easier:
Diff/patch tool
An apply_patch tool that accepts a unified diff string and applies it to the workspace. Models frequently express changes as diffs; this avoids full file rewrites and reduces the risk of silent overwrites.
Parallel search_directory
The tool currently reads files fully and sequentially. A bounded concurrency pool (8 concurrent reads via Promise.allSettled) could make large-workspace searches roughly 5–10× faster with no API surface change.
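
A bounded pool of this kind can be sketched as follows. The helper name `mapWithLimit` is hypothetical, and the worker is left abstract; in search_directory it would presumably be the per-file read and match step.

```typescript
/**
 * Run `worker` over `items` with at most `limit` calls in flight.
 * Results keep input order and use the same shape as Promise.allSettled.
 */
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: "fulfilled", value: await worker(items[index]) };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.allSettled(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Because a failed read is recorded as a rejected entry rather than thrown, one unreadable file does not abort the rest of the search.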
Configurable tool allowlist
A disabledTools config field (comma-separated tool names) that lets users expose read_file but not delete_path, for read-only assistant configurations. The factory pattern already supports returning []; this just wires in a config-level filter.
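
A config-level filter along these lines would probably be only a few lines. The `ToolLike` shape and both function names are assumptions; the plugin's real tool type is not shown in this document.

```typescript
// Assumed minimal tool shape for illustration only.
interface ToolLike {
  name: string;
}

/** Parse the comma-separated `disabledTools` config value into a set of names. */
function parseDisabledTools(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  );
}

/** Drop every tool whose name appears in the disabled set. */
function filterTools<T extends ToolLike>(tools: T[], disabledTools: string): T[] {
  const disabled = parseDisabledTools(disabledTools);
  return tools.filter((tool) => !disabled.has(tool.name));
}
```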
read_document as general opener
Extend read_document (currently PDF + DOCX only) to handle .txt, .csv, .json, .md, and .xml with encoding detection and size-aware chunking. Reduces the model's need to decide which "open" tool to call.
Longer horizon: design required before scheduling
These are higher-effort ideas worth tracking but not yet fully specified:
Session persistence
Export the current session state (CWD, memory count, active browser URL, background command log) to a JSON snapshot and restore it on next startup. Lets users resume exactly where they left off after a context reset.
MCP server mode
Expose the toolbox as a Model Context Protocol server so non-LM-Studio clients (Claude Desktop, Cursor, VS Code Copilot Chat) can use the same tool surface. The ToolContext + factory architecture maps cleanly to MCP's tool/resource model.
Workspace profiles
Named presets that bundle CWD, protectedPaths, enabled tools, and a default sub-agent role. The model can switch profiles (use_workspace_profile("frontend")) to instantly reconfigure for a different project context.
Streaming tool output
Long-running tools (fetch_web_content on a slow host, rag_local_files on a large tree) currently block until complete. The LM Studio SDK supports streaming; a progress-callback pattern could surface partial results as intermediate chat messages.
Token/cost tracking
Track approximate token usage and elapsed time across sub-agent turns and surface a [Session: N turns, ~Xk tokens, Ys elapsed] footer. Helps users manage context budget and understand where time is spent.
Grouped by effort and impact. All items are independent and can be taken in any order.
N.1: Atomic file writes
save_file currently uses a direct writeFile(), which can leave a half-written file if interrupted. Fix: write to <path>.tmp, then rename() it to the final target. On POSIX, rename() replaces the destination atomically when both paths are on the same filesystem. Prevents file corruption during long code-generation sessions. One-line change per write site.
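
The write-then-rename pattern can be sketched as below. The helper name is hypothetical, and the sketch uses Node's synchronous fs calls for brevity where the tool itself would likely use the async variants; it also does not call fsync, so it guards against partial writes rather than power loss.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

/**
 * Write `data` to `<target>.tmp`, then rename it over `target`.
 * Readers see either the old file or the complete new one, never a partial write.
 */
function writeFileAtomicSync(target: string, data: string): void {
  const tmpPath = `${target}.tmp`;
  writeFileSync(tmpPath, data, "utf8");
  renameSync(tmpPath, target);
}

// Usage: write into a throwaway directory and read the result back.
const demoDir = mkdtempSync(join(tmpdir(), "atomic-write-"));
const demoFile = join(demoDir, "example.txt");
writeFileAtomicSync(demoFile, "hello");
console.log(readFileSync(demoFile, "utf8")); // prints "hello"
```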
N.2: read_file line-count and token estimate
Append a footer [File: 847 lines, ~12k tokens] when a file is read. Helps the model decide whether to use read_file_range instead of loading the whole file, and prevents accidental context blowout on large generated files.
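
One way to produce such a footer is sketched below. The function name is hypothetical, and the 4-characters-per-token ratio is an assumed rough heuristic, not a real tokenizer.

```typescript
/**
 * Build a footer such as "[File: 847 lines, ~12k tokens]".
 * The characters-per-token ratio is a rough heuristic, not a tokenizer.
 */
function fileFooter(content: string, charsPerToken = 4): string {
  const newlines = (content.match(/\n/g) ?? []).length;
  // A final line without a trailing newline still counts as a line.
  const lines = newlines + (content.length > 0 && !content.endsWith("\n") ? 1 : 0);
  const tokens = Math.ceil(content.length / charsPerToken);
  const shown = tokens >= 1000 ? `~${Math.round(tokens / 1000)}k` : `~${tokens}`;
  return `[File: ${lines} lines, ${shown} tokens]`;
}
```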
N.3: run_test_command streaming output
The tool currently blocks silently. Stream stdout line-by-line through ctx.status() as test results arrive, so the user sees "PASS src/auth.test.ts" ticking by instead of a frozen spinner for 30 seconds. Pairs well with M.1's existing status infrastructure.
N.4: git_diff word-level mode
Add a word_diff: boolean parameter that passes --word-diff to git. LLMs parse word-level diffs significantly better than line-level for prose/documentation changes. Five-line addition to gitTools.ts.
N.5: search_directory exclusion patterns
Add an exclude: string[] parameter accepting glob patterns (e.g. ["dist", "*.min.js", "coverage"]). Currently hardcodes only node_modules, .git, and dotfiles. A heavily requested change for large monorepos.
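
A minimal matcher for the example patterns might look like the sketch below. It assumes forward-slash relative paths, treats only `*` as special, and matches each pattern against individual path segments; the real parameter may well support richer glob syntax. Both function names are hypothetical.

```typescript
/** Convert a simple glob (only `*` is special) into an anchored RegExp. */
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\?]/g, "\\$&") // escape regex metacharacters except *
    .replace(/\*/g, "[^/]*"); // * matches any run of characters within one segment
  return new RegExp(`^${escaped}$`);
}

/** True when any path segment of `relativePath` matches any exclude pattern. */
function isExcluded(relativePath: string, exclude: string[]): boolean {
  const segments = relativePath.split("/").filter((s) => s.length > 0);
  const matchers = exclude.map(globToRegExp);
  return segments.some((segment) => matchers.some((re) => re.test(segment)));
}
```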
N.6: analyze_project tool
A single tool that orients the model at the start of a session: 2-level directory tree, package.json/pyproject.toml/Cargo.toml summary, recent git commits, active branch, test command, and file count. Currently the model needs 6–8 separate tool calls to gather this context. One call should do it.
N.7: query_csv and transform_json tools
Lightweight structured-data tools that work without enabling Python:
- query_csv(file, filter?, columns?, limit?): filter rows, select columns, return as JSON array
- transform_json(file, path_expression): traverse/filter a JSON document with a simple path expression

Covers the 90% case for data inspection workflows.
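
The path-expression syntax for transform_json is not specified here. As one possible reading, the sketch below resolves dotted paths with numeric indices such as `users[1].name`; the `getByPath` name and the syntax itself are assumptions.

```typescript
/**
 * Resolve a dotted path such as "users[1].name" against parsed JSON.
 * Returns undefined when any step is missing.
 */
function getByPath(root: unknown, pathExpression: string): unknown {
  // Split "users[1].name" into ["users", "1", "name"].
  const steps = pathExpression
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter((step) => step.length > 0);

  let current: unknown = root;
  for (const step of steps) {
    if (current === null || typeof current !== "object") return undefined;
    // Only follow own properties so inherited members never leak into results.
    if (!Object.prototype.hasOwnProperty.call(current, step)) return undefined;
    current = (current as Record<string, unknown>)[step];
  }
  return current;
}
```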
N.8: watch_file / watch_directory (background watcher)
Starts an fs.watch() listener registered in backgroundCommands. When the watched path changes, calls send_notification and logs the event. Enables reactive workflows: start a dev server in the background, watch dist/ for the build output, automatically re-read when it changes.
N.9: find_symbol and find_usages (AST-aware code search)
Uses ts-morph (already a devDependency) to add workspace-aware symbol navigation:
- find_symbol(name): locate where a TypeScript function/class/variable is defined
- find_usages(name, file?): find all call sites across the workspace

Eliminates the false positives of search_directory's text grep (e.g., finding the string "render" in a comment when you want the render() function). Works on TypeScript and JavaScript files.
N.10: capture_screenshot tool
When allowBrowserControl is enabled, open a URL, take a screenshot, save it to the workspace, and return the file path, without requiring a persistent browser session. Enables visual regression checks and "what does this page look like?" queries in a single tool call.
N.11: Audit log
Write every tool call (name, args summary, result status, elapsed ms, timestamp) to ~/.lm-studio-toolbox/audit.log in NDJSON format. Off by default, enabled via an enableAuditLog config field. Lets users review what the model did during a session, which is especially useful for debugging unexpected file changes.
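
An NDJSON entry is one JSON object per line. The sketch below shows how a single line could be formatted; the field names, the `formatAuditLine` helper, and the 200-character cap on the args summary are assumptions, and appending the line to the log file is left to the caller.

```typescript
interface AuditEntry {
  timestamp: string; // ISO 8601
  tool: string;
  args: string; // truncated summary, not the full arguments
  status: "ok" | "error";
  elapsedMs: number;
}

/** Serialize one tool call as a single NDJSON line (one JSON object plus "\n"). */
function formatAuditLine(
  tool: string,
  args: unknown,
  status: "ok" | "error",
  elapsedMs: number,
  now: Date = new Date(),
  maxArgChars = 200,
): string {
  // JSON.stringify returns undefined at runtime for inputs like `undefined`.
  const serialized = JSON.stringify(args) ?? "";
  const entry: AuditEntry = {
    timestamp: now.toISOString(),
    tool,
    args: serialized.length > maxArgChars ? `${serialized.slice(0, maxArgChars)}…` : serialized,
    status,
    elapsedMs,
  };
  return `${JSON.stringify(entry)}\n`;
}
```

Because JSON.stringify escapes embedded newlines, each entry stays on one line even when an argument contains multi-line text.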
N.12: Custom tool plugins
Users drop a JavaScript file into ~/.lm-studio-toolbox/plugins/ that exports a tool definition using the same Zod schema pattern as built-in tools. The plugin loader scans the directory at startup and registers each export. Gives power users the ability to add domain-specific tools (deploy scripts, company-internal APIs) without forking the source.
N.13: Auto-capture memory
The current memory system is entirely manual: the model must call save_memory explicitly. Add an autoCapture mode that distills key facts from each conversation using the secondary LM Studio endpoint and saves them automatically. No user action required. Controlled by a memoryAutoCapture config field.
N.14: rename_symbol (workspace-wide atomic rename)
Uses ts-morph to rename a TypeScript identifier across the entire workspace: updates the definition, all import statements, and all call sites in a single transaction. Currently the model needs search_directory + multiple replace_text_in_file calls and risks missing occurrences. One tool replaces 20+ calls for common refactors.
N.15: Diff-based editing workflow
Add edit_file_with_diff(file, unified_diff) that validates a diff against the current file content before applying it (via apply_patch). Dramatically reduces token usage for large files: sending a 10-line diff instead of a 500-line rewrite. Pairs with a "generate diff → apply diff → verify" workflow that the model can adopt for large codebases.
N.16: Sub-agent mid-task steering
Add an interrupt_sub_agent(message) tool that injects a correction into the sub-agent's message list on the next turn. Currently once consult_secondary_agent is invoked, the main agent is locked out until it finishes. This enables the user to course-correct a running sub-agent ("stop and focus on the auth module instead") without cancelling the entire run.
| Phase | Description | Effort | Status |
|---|---|---|---|
| A | Security hotfix (CVEs) | 0.5 day | ✅ Done |
| B | Core security boundaries | 1 day | ✅ Done |
| C | Sub-agent correctness | 0.5 day | ✅ Done |
| D | Test suite hardening | 1–1.5 days | ✅ Done |
| E | Consistency, performance, polish | 1 day | ✅ Done |
| F | SSRF redirect hardening | 0.5 day | ✅ Done |
| G | RAG cache correctness | 2 hours | ✅ Done |
| H | Test coverage gaps | 3 hours | ✅ Done |
| I | Polish (timeouts, leaks, minor bugs) | 2 hours | ✅ Done |
| J | Sub-agent robustness & new roles | 1 day | ✅ Done |
| K | Git toolset completion | 2 hours | ✅ Done |
| L | Developer experience & tooling | 3 hours | ✅ Done |
| M.1 | Streaming tool status | 3 hours | ✅ Done |
| M.2 | Token tracking in sub-agent | 1 hour | ✅ Done |
| M.3 | Session persistence enrichment | 3 hours | ✅ Done |
| M.4 | Workspace profiles | 3 hours | ✅ Done |
| M.5 | MCP server mode | 2 days | ✅ Done |
| N.1 | Atomic file writes | ~30m | ✅ Done |
| N.2 | read_file token estimate | ~30m | ✅ Done |
| N.3 | run_test_command streaming output | ~30m | ✅ Done |
| N.4 | git_diff word-level mode | ~30m | ✅ Done |
| N.5 | search_directory exclusion patterns | ~30m | ✅ Done |
| N.6 | analyze_project tool | ~2h | ✅ Done |
| N.7 | query_csv and transform_json tools | ~3h | ✅ Done |
| N.8 | watch_file / watch_directory | ~3h | ✅ Done |
| N.9 | find_symbol and find_usages (AST) | ~3h | ✅ Done |
| N.10 | capture_screenshot tool | ~2h | ✅ Done |
| N.11 | Audit log | ~1h | ✅ Done |
| N.12 | Custom tool plugins | ~1 day | ✅ Done |
| N.13 | Auto-capture memory | ~1 day | ✅ Done |
| N.14 | rename_symbol: workspace-wide atomic rename | ~1 day | ✅ Done |
| N.15 | edit_file_with_diff: diff-based editing | ~1 day | ✅ Done |
| N.16 | Sub-agent mid-task steering | ~1 day | ✅ Done |
| O.1 | Integration tests + full validator coverage | ~3h | ✅ Done |
| O.2 | 429 rate-limit retry with Retry-After | ~2h | ✅ Done |
| O.3 | Sub-agent execution log + turn_log field | ~3h | ✅ Done |
| O.4 | Memory deduplication (exact-match, case-insensitive) | ~1h | ✅ Done |
| O.5 | search_directory max_matches param | ~30m | ✅ Done |
| O.6 | subAgentShowExecutionLog config field | ~30m | ✅ Done |
| O.7 | SQLite JSON fallback backend for memory tools | ~3h | ✅ Done |
| O.8 | query_database actionable error on macOS sandbox | ~30m | ✅ Done |

- safeFetch() central helper: scheme allowlist, RFC-1918/loopback/link-local denylist, configurable timeout
- browser_session_open / browser_open_page (blocked file://, javascript:)
- protectedPaths enforcement wired into validatePath() and change_directory
- query_database hardened: rejects ATTACH, PRAGMA, constrains db_path to workspace
- c8 coverage reporting added (npm run coverage)
- path + mtime with batched embed() calls
- npm run ci (no-new-func: error, no-unused-vars: error)
- mtimes array, stale auto-summary references)
- [] when disabled" pattern
- run_python/run_javascript moved to os.tmpdir()
- gh_push consolidated into git_push
- safeFetch rewritten with redirect: "manual"; every Location header re-validated before following
- dns.promises.lookup) rejects hostnames resolving to private IPs
- startsWith("fc") || startsWith("fd") (was startsWith("fc00"), missed fc01–fcff)
- ::ffff:a.b.c.d) detected and recursively validated
- 198.0.0.0/8 over-block fixed: now targets only 198.18–19/15 (benchmarking) and 198.51.100/24 (TEST-NET-2)
- isBlockedIp() and validateSsrfUrl() exported as pure functions; 42 new unit tests
- SECURITY.md updated with residual DNS-rebinding TOCTOU note
- webTools: fetch_web_content and rag_web_content through the real safeFetch path (using mocked fetch); wikipedia_search timeout behavior
- promptPreprocessor: memory injection, message-count increment, startup-file loading, legacy memory.md migration trigger
- miscTools: query_database ATTACH rejection, rag_local_files end-to-end scoring with a stub embedding model
- finish_task clean termination, time-limit enforcement, TASK_FAILED propagation
- noToolCallCount and eventually exits. The main agent never learns why. Add a structured { status: "stalled", summary: "..." } return so the main agent can retry or escalate.
- { status: "timeout", progress: "..." } field. The main agent can then re-invoke with a narrower scope rather than starting over.
- If consult_secondary_agent fails due to a connection error (LM Studio not running, model not loaded), retry up to 2 times with a 5 s backoff before surfacing the error. Include a clear message telling the user which model/server to check.
- echo OK) and bail early with a clear error if the secondary endpoint is unreachable.

| Role key | Purpose | Specializations |
|---|---|---|
| data_analyst | Query, transform, and summarize structured data | Uses query_database, run_python, and rag_local_files; produces tables and charts |
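
The SSRF hardening notes in this document (the fc/fd prefix check, IPv4-mapped addresses, and the narrowed 198.18.0.0/15 and 198.51.100.0/24 ranges) can be illustrated with a small pure function. This is a sketch under stated assumptions and not the project's isBlockedIp(): the names are hypothetical, the input is assumed to be an IP literal rather than a hostname, and the range list is not exhaustive.

```typescript
/** Parse dotted-quad IPv4 into four octets, or null if malformed. */
function parseIpv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

/** Sketch of a private/reserved address check for IP literals. */
function isBlockedIpSketch(ip: string): boolean {
  const lower = ip.toLowerCase();

  // IPv4-mapped IPv6 (::ffff:a.b.c.d): validate the embedded IPv4 address.
  if (lower.startsWith("::ffff:") && lower.includes(".")) {
    return isBlockedIpSketch(lower.slice("::ffff:".length));
  }

  const v4 = parseIpv4(lower);
  if (v4) {
    const [a, b, c] = v4;
    if (a === 0 || a === 10 || a === 127) return true; // unspecified, RFC 1918, loopback
    if (a === 172 && b >= 16 && b <= 31) return true; // RFC 1918
    if (a === 192 && b === 168) return true; // RFC 1918
    if (a === 169 && b === 254) return true; // link-local
    if (a === 198 && (b === 18 || b === 19)) return true; // 198.18.0.0/15 benchmarking
    if (a === 198 && b === 51 && c === 100) return true; // 198.51.100.0/24 TEST-NET-2
    return false;
  }

  // IPv6: loopback, unique local (fc00::/7), link-local (fe80::/10).
  if (lower === "::1") return true;
  if (lower.startsWith("fc") || lower.startsWith("fd")) return true;
  if (/^fe[89ab]/.test(lower)) return true;
  return false;
}
```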
```js
consult_secondary_agent({
  task: "Implement and test the new auth module",
  chain: ["coder", "tester", "reviewer"]
})
```
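```js
// ~/.lm-studio-toolbox/plugins/deploy.js
const { z } = require("zod"); // Zod provides the schema helpers used below

module.exports = {
  name: "deploy_to_staging",
  description: "Deploy the app to the staging environment",
  // Parameter schema: `env` must be one of the listed targets
  parameters: { env: z.enum(["staging", "canary"]) },
  implementation: async ({ env }) => { /* ... */ }
};
```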