Official issues about the agent loop, tool calls, and model behavior. Sourced from the official GitHub repo; status derived from official labels; every entry links back to its source with a last-checked date.
#58345 Workaround P2 comp/agent [Bug]: xAI grok-4.3 drops optional multiline string args from MCP tool calls — AgentMail sends blank emails (docs recommend the affected combo)
On xai-oauth with grok-4.3, optional multiline string args are dropped from MCP tool calls, causing AgentMail to send blank emails while reporting success.
#58340 Investigating P3 comp/agent [Feature Request] Auto-load relevant skills before task execution
Feature request: auto-suggest or auto-load relevant skills before task execution, since agents often forget to check installed skills.
#58327 Fixed P1 comp/agent [Bug]: Context compression breaks tool message chain, causing "role 'tool' must be a response to a preceding message with 'tool_calls'" on strict providers
Context compression could orphan tool messages by dropping their assistant tool_calls message, causing HTTP 400 on strict providers like DeepSeek. Closed.
#58325 Investigating P3 comp/agent Integration proposal: Hermes as a personal agent in a governed Gotong workflow/life-flow hub
Integration proposal from the Gotong project: Hermes as the personal agent layer inside a governed workflow hub.
#58317 Workaround P2 comp/agent Compression crash: AttributeError 'dict' object has no attribute 'count' in _summarize_tool_result
Context compression crashes with AttributeError: 'dict' object has no attribute 'count' when a write_file tool result's content argument is parsed as a dict instead of a string, blocking compression entirely until the context window overflows.
#58298 Workaround P2 comp/agent Subagent delegation ignores config, always uses credential-pool model (glm-4-flash)
Subagent delegation ignores the configured provider/model and always falls back to glm-4-flash from the credential pool, confirmed across 5 test scenarios including one where delegation config was empty and should have inherited the parent's model.
#58231 Workaround P2 comp/agent [Bug] minimax-oauth provider requires API key env var despite being OAuth
When minimax-oauth is configured for auxiliary.title_generation, Hermes still tries to load a MINIMAX-OAUTH_API_KEY env var instead of using the existing OAuth token auth, causing title generation to fail.
#58226 Workaround P2 comp/agent [Bug]: Anthropic OAuth usage renders low-usage windows as 100% used (utilization ≤1 wrongly scaled ×100)
The /usage dashboard misreports a barely-used Anthropic OAuth account as 100% used because the code incorrectly multiplies an already-percentage utilization value by 100 when it is <= 1.
#58217 Investigating P3 comp/agent [Feature]: Give pre_llm_call a last_turn record — let plugins perceive what the previous turn did
The pre_llm_call plugin hook can only see what a turn is about to do, not what the previous turn actually did, since that state is wiped at the start of the next turn; the issue proposes a last_turn record passed into the hook.
#58200 Workaround P2 comp/agent [Bug]: Trajectory files are not created in CLI / TUI / Gateway
Setting agent.save_trajectories: true in config.yml per the documentation does not produce any *.jsonl trajectory files, because the code never checks this config option.
#58197 Workaround P2 comp/agent [Feature]: Retry malformed tool calls with forced tool_choice
Local/quantized models intermittently emit malformed tool calls that get silently replaced with empty args or re-fail on retry; the proposal pins tool_choice to the failed tool on retry, already implemented per PR #44587.
#58196 Investigating P3 comp/agent [Feature]: Verification enforcement -- verify-before-claim guidance + unsupported-completion-claim detector
Agents routinely claim completion without verifying; the proposal adds verify-before-claim system-prompt guidance plus a log-only detector for unsupported completion claims, already implemented per PR #54576.
#58195 Investigating P3 comp/agent [Feature]: Per-slot reasoning_effort for MoA reference advisors
MoA reference models spend most of their latency on private reasoning the aggregator never sees, with no way to reduce it independently; the proposal adds a per-slot reasoning_effort option, already implemented per PR #57043.
#58193 Investigating P3 comp/agent [Feature]: Attach an open desktop window so the agent can see and control it
Operating a desktop app currently requires describing the window and controls in text; the proposal adds a way to attach a live window so the agent can see and control it via a UI-automation sidecar, already implemented per PR #53852.
#58192 Investigating P3 comp/agent [Feature]: Keep local-backend prompt prefixes warm to eliminate cold-session prefill
Local llama.cpp/vLLM-style servers pay a full prefill cost for every new session because the shared prompt prefix is not kept warm; the proposal adds an opt-in gateway watcher that periodically replays a minimal request, and the author notes it is already implemented in PR #57019.
#58185 Workaround P2 comp/agent [Bug]: /model picker for Bedrock offers bare foundation-model IDs that 400 on on-demand accounts (and persists them)
The Bedrock /model picker lists non-invokable bare foundation-model IDs alongside inference-profile IDs, and selecting one persists it to config.yaml, causing every subsequent call to 400; the setup wizard already dedupes this but the picker path does not.
#58168 Fixed P1 comp/agent Bug: Context compaction produces invalid message sequences (orphaned tool messages), breaking sessions permanently
The trajectory compressor produces invalid message sequences with orphaned tool messages after context compaction, permanently breaking long-running sessions with a DeepSeek API 400 error.
#58167 Workaround P2 comp/agent redact: SendGrid prefix pattern masks only the key-id segment — the key-secret segment stays in cleartext
The SendGrid redaction regex in agent/redact.py stops matching at the second dot, so the key-secret segment of a 3-part SendGrid API key remains unmasked in cleartext in logs and transcripts.
#58135 Workaround P2 comp/agent [Bug]: `is_container()` false-positives on hosts running Docker containers (containerd markers in mountinfo), making `home_mode: auto` subprocess HOME non-deterministic — breaks browser auto-launch with "Chrome not found"
is_container() misidentifies a host as running inside a container when Docker containers using the containerd snapshotter are present, because it scans mountinfo for markers that also appear in overlay mounts on the host; the cached false result destabilizes home_mode: auto and can break browser auto-launch.
#58117 Workaround P2 comp/agent Thinking blocks cause blank text responses in CLI mode + infinite heartbeat loops
When a model uses thinking blocks (e.g. DeepSeek), the agent's text response fails to reach the user in CLI mode, causing wasted API cost and infinite loops in cron/heartbeat jobs; disabling thinking blocks works around it but isn't configurable.
#58105 Workaround P2 comp/agent [Bug]: Message routing bug - User input sent to wrong session
In Hermes TUI, user input typed in one session is sometimes routed to and answered by a different session instead of the currently active one.
#58087 Workaround P2 comp/agent Bug: finish_reason='length' triggers spurious continuation retries on short complete responses (Ollama/GLM provider)
When the agent produces a short but complete final response, the finish_reason == 'length' truncation handler spuriously fires a continuation prompt, causing the same short response to be resent 2-3 times on Telegram.
#57903 Fixed P2 comp/agent async LLM calls block the desktop WebSocket loop via busy-poll in interruptible_*_api_call
After extensive investigation, the actual cause of desktop WebSocket loop stalls was found to be GIL contention from the Anthropic SDK's streaming consumer parsing thousands of SSE chunks, not the originally suspected busy-poll. This issue is now closed.
[Bug]: Envelope-layout cache breakpoints silently no-op during tool loops (tool messages skipped, empty-assistant markers ignored) — ~2x input cost on OpenRouter + Claude
On the OpenRouter/envelope cache layout, cache breakpoints get placed on tool and empty-assistant messages that cannot carry an effective cache marker, silently disabling caching for most of an agentic conversation and roughly doubling input cost. This issue is now closed.
#57740 Investigating P3 comp/agent PII (email/SSN) persisted unredacted in the session JSON transcript export
The redaction logic covers secrets like API keys and tokens but has no coverage for generic PII such as email addresses or SSNs, and the opt-in JSON transcript export writes such values unredacted.
#57228 Workaround P2 comp/agent MCP stdio subprocesses leak on reconnect in long-lived workers (orphans accumulate until DB contention)
Long-lived Hermes worker processes accumulate orphaned MCP stdio subprocesses over time (53 in the reported case), causing SQLite DB handle contention and intermittent memory tool failures even though health checks report the MCP server as healthy.
#56655 Investigating P3 comp/agent Feature: task-aware per-turn model routing via a pre_llm_call model override
The reporter requests a way for a plugin to choose the model per turn based on the incoming message, noting that existing hooks like pre_llm_call cannot currently change the model used for a turn.
#55677 Fixed P2 comp/agent [Bug] Context compaction fails with 'No user query found in messages' Jinja template error, corrupts session
Context compaction crashed with a Jinja 'No user query found in messages' error on the 2nd/3rd attempt with LMStudio-hosted models, corrupting the session. Closed.
#54220 Fixed P2 comp/agent [Tracking] Windows Desktop GUI: console windows (cmd/conhost/git/gh/powershell) flash on subprocess spawns
This tracking issue consolidates roughly 25 reports of console windows (cmd, conhost, git, gh, powershell) flashing on the Windows desktop GUI when its windowless pythonw.exe backend spawns console-subsystem child processes without the no-window flag, and documents which spawn sites are still leaking based on source and git-history verification.
#50663 Fixed P2 comp/agent [Bug]: z.ai limits hermes agent during "peak hours"
z.ai rate limits (429) the Hermes agent during peak hours when using a Max coding plan with glm-5.2, while opencode and Claude on the same account are unaffected; suspected to be based on request signature detection.
#48534 Fixed P1 comp/agent Anthropic Max OAuth fails: token exchange 404s because Anthropic now blocks the claude-cli/ User-Agent
The built-in Anthropic OAuth token exchange fails with HTTP 404 because Anthropic now blocks any request carrying a claude-cli/ User-Agent prefix regardless of version, confirmed by testing multiple User-Agent strings.
#47349 Workaround P2 comp/agent Feature: Configurable Memory Backends — disable memory.md, use honcho/fact_store only
This feature request proposes renaming memory.md to rules.md and adding configurable memory backends, since injecting all memory entries into every turn currently mixes always-needed rules with queryable facts and wastes tokens.
#43747 Workaround P2 comp/agent [Bug]: openai-codex credential pool marks healthy later account as usage_limit_reached; auth reset restores operation
Hermes incorrectly marks all openai-codex pooled credentials as rate-limited even when one account still has quota, and running hermes auth reset openai-codex immediately restores normal operation.
#39691 Investigating P3 comp/agent feat(compression): integrate headroom-ai for tool output compression
This issue proposes integrating the open-source headroom-ai library to compress individual tool outputs before they enter context, addressing known issues with Hermes's existing conversation-level compression system.
#35876 Workaround P2 comp/agent fix(vision): _resolve_single_provider kwargs regression — fallback_chain silently fails on Gemini quota errors
When Gemini returns a 429 quota error, the vision fallback chain fails silently because _resolve_single_provider does not correctly forward explicit_base_url/explicit_api_key kwargs to resolve_provider_client, so no fallback provider is used.
#34352 Investigating P3 comp/agent Solving the Multi-Tenant Hermes Problem
This issue reports that memory operations bypass Hermes's hook system, making tenant isolation impossible without forking core, and proposes upstreaming a fix along with an open-source project called Hermes Swarm Map.
#33932 Fixed P1 comp/agent OpenAI Codex provider crashes with "'NoneType' object is not iterable" (HTTP None)
Switching to gpt-5.5 via the OpenAI Codex provider (subscription OAuth) crashes on the first user message with "Non-retryable error (HTTP None): 'NoneType' object is not iterable," though fallback then takes over.
#33237 Fixed P3 comp/agent openai-codex provider crashes with TypeError: 'NoneType' object is not iterable on every request (chatgpt.com sends output: null in response.completed event)
The openai-codex provider crashes on every request after a gateway restart with a TypeError, because the chatgpt.com Codex endpoint returns output: null in response.completed events, which OpenAI SDK 2.24.0's parser cannot handle; this uncaught error drains the credential pool. Issue is closed as fixed.
#33223 Investigating P3 comp/agent [Discussion] Why was smart_model_routing removed? Request to restore
This is a discussion asking why the smart_model_routing feature, which auto-routed simple short turns to a cheaper model, was removed entirely rather than kept as opt-in, and requesting it be restored.
#33075 Fixed P3 comp/agent openai-codex/gpt-5.5 still unstable in Hermes v0.14.0: subagents almost always hit APIConnectionError/TTFB timeout while Codex CLI works
Users reported that openai-codex/gpt-5.5 remained highly unstable in Hermes v0.14.0, especially with concurrent subagents hitting connection timeouts, while the official Codex CLI worked fine on the same machine; the issue is now closed.
#32956 Fixed P3 comp/agent [Bug]: Codex Responses streaming crashes with TypeError: 'NoneType' object is not iterable (openai-codex / chatgpt.com backend)
Reporter filed a crash in Codex Responses streaming (TypeError: 'NoneType' object is not iterable) with details provided in an attached file and screenshots rather than in the issue body; this issue is marked as a duplicate.
#32903 Fixed P3 comp/agent openai-codex provider crashes: SDK parse_response fails on null output from Codex backend
This issue reports that Hermes v0.14.0 crashes with a 'NoneType' object is not iterable error under the openai-codex provider with gpt-5.5, root-caused to the OpenAI SDK's parse_response() lacking a null guard when the Codex backend returns output: null. This issue is closed.
#32892 Fixed P3 comp/agent [Bug]: Error: 'NoneType' object is not iterable
This issue reports that Hermes crashes with a 'NoneType' object is not iterable error when using the openai-codex provider with gpt-5.5, aborting as a non-retryable client error. This issue is closed.
#32883 Fixed P2 comp/agent Fix Codex stream None output recovery
This issue documents a reproducible Hermes crash when the OpenAI Codex Responses backend returns response.output = None mid-stream, and proposes patches (backfilling output, fallback to responses.create, guarding output_text access) to make Hermes resilient to it. This issue is closed.
#32373 Fixed P2 comp/agent openai-codex / gpt-5.5 repeatedly produces no first byte after #31967/#32016
Even after the merged Codex timeout fixes (#31967, #32016), openai-codex / gpt-5.5 still frequently stalls with "No first byte from provider in 45s" errors and repeated reconnects, distinct from the earlier stale-timeout issue.
#30649 Investigating P3 comp/agent [Feature]: Proton Pass AI Access Tokens support (secret source backend)
Hermes currently supports Bitwarden Secrets Manager as an external secret source; the issue proposes adding Proton Pass's new AI Access Tokens as an additional secret source backend, given its read-only vault access, expiration controls, and audit logs.
#26879 Investigating P3 comp/agent [Bug] auxiliary task provider identity lost when base_url + api_key are both set
When an auxiliary task config sets provider, base_url, and api_key together, the provider name is silently overwritten with 'custom', bypassing provider-specific handling and causing subtle failures.
#26425 Fixed P2 comp/agent [Bug]: Response truncated due to output length limit — still occurring after #7237 fix (re-opening closed issue)
This issue reports that the 'Response truncated due to output length limit' error persists across different large-context models even after fixes in #7242 and #9525, despite the earlier #7237 being closed as 'not a bug'. This issue is closed.
#24443 Workaround P2 comp/agent MiMo reasoning models may fail in Hermes because reasoning_content is not preserved in chat history
MiMo reasoning models can fail in multi-turn conversations because Hermes does not preserve and echo back the reasoning_content field, which MiMo's API requires in thinking mode, resulting in 400 errors.
#24140 Fixed P1 comp/agent All models rejected with "context window below minimum 64,000 tokens" — Telegram completely down
All calls to MiniMax-M2.7 and kimi-k2.6 (32,768 token context) were rejected by Hermes Agent's 64K minimum context check, taking down the Telegram bot and cron jobs even though the models previously worked without any config changes.
#24039 Workaround P2 comp/agent Auxiliary fallback chain should reuse fallback_providers, not maintain a separate hardcoded list
Hermes maintains two separate fallback chains that are unaware of each other: the user-configured fallback_providers used by the main agent, and a separate hardcoded fallback list (including paid models) used by auxiliary tasks like compression, vision, and title generation.
#23717 Investigating P3 comp/agent RFC: Pluggable SessionDB Provider — PostgreSQL, MySQL, and Beyond
This RFC proposes a pluggable SessionDB backend (e.g. PostgreSQL, MySQL) to replace Hermes's current shared SQLite state.db, which is prone to lock contention and corruption when multiple processes run concurrently with hot updates.
#21444 Fixed P2 comp/agent [Bug]: All openai-codex / gpt-5.5 primary calls hang silently for full stale timeout
With openai-codex / gpt-5.5 as the primary model, every turn hangs silently for the full ~300s non-streaming stale timeout before falling back, while the same setup works immediately with gpt-5.4-codex.
#20249 Investigating P3 comp/agent Model Presets: per-turn expert-on-demand model escalation
This feature proposes a model_presets config section that lets users temporarily escalate to a named expert model for a single turn and then automatically snap back to the default model, instead of manually switching models back and forth.
#18733 Investigating P3 comp/agent Per-model or per-provider compression threshold overrides
Feature request to allow per-model or per-provider overrides of compression.threshold in config.yaml, since a single global threshold is ineffective for large-context models and too aggressive for small-context ones.
#18715 Investigating P3 comp/agent Support remote Hermes agent with local tool execution
This issue requests a split-runtime mode where a remote Hermes Agent provides skills, memory, and model reasoning, while tool execution (terminal, files, browser, local MCP servers) happens on the local client machine instead of on the remote host.
#15895 Workaround P2 comp/agent [Bug]: google-gemini-cli causing 429 but gquota ok
Despite gquotas showing ample remaining quota (~98% used), using the google-gemini-cli provider with Gemini 3.1 Pro via OAuth returns HTTP 429 errors. Issue is open.
#15717 Fixed P2 comp/agent [Bug]: DeepSeek API 400 error: "reasoning_content" in thinking mode must be passed back to the API
Using a DeepSeek thinking-mode model (e.g. deepseek-v4-flash) causes an HTTP 400 error because Hermes does not pass the model's reasoning_content back on subsequent requests as DeepSeek's API requires. Issue is closed as fixed.
#14420 Fixed P2 comp/agent My agent was unable to give me an accurate answer based on the previous context.
Using hermes chat with a custom Ollama endpoint and the local gemma4:e4b model, the agent fails to recall and use prior conversation context (such as a name the user gave it), producing inconsistent answers. Issue is closed as fixed.
#13834 Workaround P2 comp/agent Hermes openai-codex fails on same machine/network where official Codex CLI still works
On the same macOS machine and network where the official Codex CLI works, Hermes configured with the openai-codex provider repeatedly fails with APIConnectionError/APITimeoutError, suggesting its compatibility layer is not equivalent to the official Codex CLI transport.
#13484 Investigating P3 comp/agent Feature: native Google Cloud Vertex AI provider support
Hermes has no working auth path for the Google Cloud Vertex AI provider, causing silent failures; the reporter built a standalone proxy handling service-account auth and proposes upstreaming it via the existing custom_providers mechanism.
#13181 Investigating P3 comp/agent [Feature]: Easy support for adding OpenCode Go models into Hermes Agent
Integrating new model backends like OpenCode Go into Hermes Agent currently requires digging into internal code; the issue proposes a config-based model registration system with a standardized adapter interface.
#13065 Fixed P3 comp/agent Feature: First-class native vision support for vision-capable main models (with reference implementation + bug findings)
This feature request notes that Hermes routes all image analysis through an auxiliary vision model even when the main model is natively vision-capable, and provides a reference implementation (patching 4 files) for native vision bypass along with several related pipeline bugs discovered in the process.
#11692 Investigating P3 comp/agent Receipts for self-improving agents: proving which skill version produced which output
A discussion-style feature request raises governance concerns about Hermes' self-modifying skills: since skills are created and improved automatically, there is no built-in way to prove which version of a skill produced a given output or under what policy it ran.
#11420 Fixed P3 comp/agent Add MiniMax as vision backend in auxiliary_client.py (_try_minimax)
Configuring AUXILIARY_VISION_PROVIDER=minimax silently fails the vision_analyze tool because _resolve_strict_vision_backend() has no MiniMax branch, even though MiniMax offers multimodal vision endpoints.
#11179 Fixed P2 comp/agent [Bug]: Responses stream crashes when terminal response.output is null
When an OpenAI-compatible provider returns a terminal response with output as null rather than an empty list, Hermes's existing recovery logic for the empty-array case did not cover it, causing get_final_response() to raise before streamed output could be backfilled; the issue is now closed.
#9514 Investigating P3 comp/agent Feature: Single-Daemon Multi-Agent with Per-Topic Workspace & Memory Isolation
This feature proposes a single-daemon architecture, inspired by OpenClaw, letting multiple Hermes agents share one gateway process with isolated per-agent workspaces and memory, routed by session key including per-topic isolation in group chats.
#9459 Investigating P3 comp/agent feat(delegation): agent profiles for delegate_task — custom orchestration harness support
This issue proposes allowing delegate_task to spawn subagents from named agent profiles defined in config.yaml, so users can customize system prompts, models, and tool scoping per subagent role without modifying Hermes core.
#8457 Investigating P3 comp/agent Feature: Persistent Session Memory with Cross-Session Search & Auto-Compression
This feature request proposes a persistent, searchable, auto-compressing "Vault" style session memory system to replace Hermes's current transient memory, which is lost whenever a session ends or the gateway restarts.
#7517 Investigating P3 comp/agent Feature Request: Native Multi-Agent Support
This issue requests native multi-agent support in Hermes, allowing a single gateway process to serve multiple named agents with isolated sessions, personas, memory, and tool configurations, instead of today's single global agent identity.
#6839 Investigating P3 comp/agent Feature: Lazy Tool Schema Loading — Two-Pass Tool Injection to Reduce Token Overhead
This issue proposes a two-pass lazy tool schema loading scheme (abbreviated tool list first, full schema only when a tool is actually selected) to cut the per-call token overhead of injecting all enabled tool schemas.
#5533 Investigating P3 comp/agent feat(dreaming): introduce stable Dreaming reflection mode across CLI and gateway
This issue documents making Dreaming a first-class, reliable reflection feature across both the CLI and gateway (including Telegram), with tool usage disabled, correct runtime/provider config propagation, and error surfacing instead of silent failure.
#5528 Investigating P3 comp/agent [Feature]: configurable approval-locked command patterns for dangerous/disruptive local actions
The set of commands requiring dangerous-action approval is currently hard-coded in tools/approval.py, so users cannot mark installation-specific or operationally-disruptive commands as approval-required without patching the source; the issue requests a configurable mechanism.
#5257 Investigating P3 comp/agent feat: Generalized ACP client for multi-agent CLI orchestration
This issue proposes generalizing Hermes's ACP client beyond the current Copilot-specific implementation to orchestrate 14 ACP-compatible coding agents (Claude Code, Codex CLI, Gemini CLI, etc.) using official vendor-maintained ACP adapters.
#5200 Workaround P2 comp/agent [docs] Context Files (AGENTS.md/SOUL.md): documented behavior doesn't match code
This issue reports three mismatches between documented and actual behavior for AGENTS.md and SOUL.md context file loading, including that AGENTS.md is not recursively loaded and cwd is overridden in gateway/messaging mode.
#4656 Investigating P3 comp/agent [Feature]: credential proxy daemon — zero-knowledge HTTP/HTTPS broker for agent credentials
This feature proposes a credential proxy daemon that intercepts HTTP/HTTPS at the transport layer so the real credential value never exists anywhere the agent process can read it, addressing a gap left by existing env-scoping and PID-isolation mitigations.
#4505 Investigating P3 comp/agent Optimize Ollama Integration: Native /api/chat vs OpenAI-Compatible Endpoint
Feature proposal to replace the OpenAI-compatible endpoint with Ollama's native /api/chat endpoint, claiming true delta streaming, longer timeouts, full native parameter support, and ~15-20% lower latency, with a proposed 581-line adapter implementation.
#4379 Workaround P2 comp/agent Token overhead analysis: 73% of each API call is fixed overhead (~13.9K tokens) — data + suggestions
Using a custom monitoring dashboard, the reporter analyzed 6 request dumps from a Hermes v0.6.0 deployment and found that 73% of every API call (~13,935 tokens) is fixed overhead from tool definitions (46.1%) and the system prompt (27.2%), independent of conversation content.
#514 Investigating P3 comp/agent Feature: A2A (Agent-to-Agent) Protocol Support — Remote Agent Discovery, Communication & Interoperability
This issue proposes adding support for Google's A2A (Agent-to-Agent) protocol to Hermes, allowing it to discover and call remote agents built on other frameworks and expose itself as an A2A-discoverable agent.
Feature Request: User-Configurable Multi-Model Routing with Capability Categories and Evaluation Feedback
The issue proposed letting users assign multiple LLMs to capability categories (speed, cost, reasoning depth, etc.) so tools could dynamically request a model based on declared needs rather than a single hard-coded model; the issue is now closed.