Add support for reasoning_effort and reasoning_tokens in OpenAI-compatible v1/chat/completions
Adds a reasoning field to the /api/v1/models API response, indicating each model's supported reasoning capabilities/REST configuration options learn more
Fixed a bug where Insert in chat input would sometimes not work after toggling assistant and user mode
Fixed a bug where surrounding spaces in tool call parameters would be stripped for models that uses XML/XML-like tool call formats
[CUDA] Fixed issue where some VRAM would not be deallocated under certain conditions
Fixes a bug where setting reasoning to low when using Nemotron 3 Super via the /api/v1/chat or OpenAI-compatible /v1/responses API would error out