MidiPilot - Your AI Copilot

MidiPilot is the AI brain embedded directly in MidiEditor AI. Open the sidebar panel, type what you want in plain English, and watch it compose, edit, transform, and analyze your MIDI data automatically.

MidiPilot Agent mode composing a full arrangement from a single prompt - live streaming with GPT-5.5 (Responses API)

Key Features

🎯 Agent Mode

Multi-step agentic loop - the AI calls tools iteratively, inspecting results between steps to build complex compositions from a single prompt.

Learn more →

💬 Simple Mode

Single request/response for quick edits, small transformations, and focused tasks without the overhead of multi-step planning.

Learn more →

🎮 FFXIV Bard Mode

Enforces Final Fantasy XIV Performance constraints - 8 tracks, monophonic, C3-C6 range, tonal drum conversion for MidiBard2 octets.

Learn more →

🔌 Multi-Provider

OpenAI, OpenRouter, Google Gemini, or any OpenAI-compatible endpoint. Bring your own API key.

Learn more →

🧠 Reasoning Support

Toggle thinking/reasoning for o-series and GPT-5.x models. Configurable effort from None to Extra High.

Learn more →

✏️ Custom System Prompts

Edit AI behavior per mode via the built-in editor. Export/import as JSON - no recompiling needed.

Learn more →

📜 Conversation History

Conversations auto-saved as JSON. Browse, search, and resume past sessions from the history menu.

Learn more →

⚡ Response Streaming

Simple and Agent mode stream live via SSE, including reasoning summaries, assistant text, and tool-call progress where providers support it.

Learn more →

🔄 Model Management

Refresh provider model lists, filter non-chat models, pin favourites, and retry temporarily blocked streaming models without restarting.

Learn more →

💾 Per-File AI Presets

Save model, provider, mode, and custom instructions per MIDI file. Auto-loaded when you open the file.

Learn more →

Chat Panel

The MidiPilot panel features a clean chat interface with context-aware track info, mode selection (Agent / Simple), and model switching - all accessible without leaving the editor. Assistant bubbles and the live reasoning stream render Markdown (bold, italic, lists, fenced code, links); user and system bubbles stay plain text.

MidiPilot chat panel in light theme MidiPilot chat panel in dark theme
Docked MidiPilot panel side by side - Light theme (left) and Dark theme (right)

The input bar at the bottom provides quick access to everything you need:

Agent/Simple mode toggle · FFXIV checkbox · Model selector · Reasoning level

Simple Mode (One-Shot)

Simple mode sends a single API call to the AI model containing your instruction, the current editor state, selected events, and surrounding musical context. The model responds with one complete answer - no follow-up calls, no iterative loop.

How It Works

  1. Your prompt is bundled with a JSON snapshot of the editor (cursor position, active track, tempo, time signature, selection, and ±5 measures of surrounding events)
  2. Everything is sent as one request to the model
  3. The model returns a single JSON response with one or more actions (edit, delete, create_track, set_tempo, etc.)
  4. MidiEditor executes all actions from that response at once

When to Use Simple Mode

Limitations

If MidiPilot detects a truncated response (finish_reason: "length"), it will display a warning suggesting you switch to Agent mode for the task.

Simple Mode - one-shot response from GPT-5.5 streaming text and the resulting JSON action into the editor

Agent Mode (Multi-Step)

Agent mode is the powerhouse for complex compositions and large-scale edits. Instead of squeezing everything into one response, the AI works iteratively - planning, executing, inspecting, and adjusting across multiple API calls until the task is complete.

How It Works

  1. Your prompt is sent along with the system prompt and a set of 16 tools the model can call
  2. The model responds with one or more tool calls (e.g., create_track, insert_events, get_editor_state)
  3. MidiEditor executes each tool call and sends the results back to the model
  4. The model reviews the results, decides the next step, and issues more tool calls
  5. This loop continues until the model decides the task is complete (or the max steps limit is reached)

Why This Matters

Agent Steps Panel

During an Agent run, a collapsible Agent Steps panel appears below the chat showing real-time progress: each tool call, its parameters, and results. The agent loop can also stream live reasoning summaries, assistant text, and tool-call argument progress while the step is still being generated. Step indicators use theme-aware colors that adapt to dark and light mode (⏳ pending, 🔄 active, ✅ done, ⚠ retrying, ❌ failed).

Gemini 3.1 native streaming with live thought summaries and Agent tool progress

When to Use Agent Mode

Configuration

SettingDescription
Agent Max StepsMaximum tool calls per request (5-100, default 50). Increase for very large compositions.
Token LimitOptional output cap. Agent mode is less sensitive to this since each call is smaller, but very low limits can still truncate individual steps.

Agent Conductor & Working State

The agent loop is wrapped by a lightweight conductor that owns a compact, program-managed working state: the user goal, the inferred task type, a list of confirmed facts (what the model has already accomplished), the last tool result, the active constraints (e.g. FFXIV rules), and a counter of consecutive failed write attempts.

GPT-5.5 model-isolation policy

For OpenAI gpt-5.5*, MidiPilot applies a small set of model-specific mitigations through a central policy table - nothing else is touched. Currently:

On OpenRouter, only the schema-light tools and prompt sanitisation apply (no Responses-API knobs). On every other model, including all current OpenAI gpt-5*/gpt-4o/o-series, Pitch Bend and parallel tool calls remain unchanged.


FFXIV Bard Mode

When the FFXIV checkbox is enabled, MidiPilot appends additional constraints to the system prompt that enforce Final Fantasy XIV Bard Performance rules. This works with both Simple and Agent mode.

Enforced Rules Overview

See FFXIV Prompt Examples for tested prompts.


Fix X|V Channels

The Fix X|V Channels tool provides a one-click, deterministic channel fixer that sets up the complete MidiBard2 channel mapping - no AI calls needed. Find it in the toolbar or via Tools → Fix X|V Channels.

👉 Full documentation: Fix X|V Channels - the 5-step algorithm, Rebuild vs Preserve modes, supported instruments, guitar variant switching, before/after screenshots, and tips.


Mode Comparison

FeatureSimple ModeAgent Mode
API Calls1 (one-shot)Multiple (iterative loop)
Tool AccessNone16 tools
Self-CorrectionNoYes - can inspect and fix
Token Limit RiskHigh for complex tasksLow - work is split
Truncation HandlingWarns, suggests Agent modePer-step, can continue
SpeedFast (single round-trip)Slower (multiple round-trips)
UI FeedbackStreaming text or action previewLive thoughts, assistant text, and Agent Steps panel
UndoPer actionGranular - one Ctrl+Z per tool call
Ideal ForQuick edits, small changesComplex compositions, multi-track

Token Tracking & Context Window

MidiPilot tracks token usage per API call and per session, with automatic normalization across providers (OpenAI, Anthropic, Gemini). The token counter is displayed at the bottom of the chat panel:

<last call> | <session total>🔥 / <context window> [<limit>✂]

Context Window Management

When conversations grow long, MidiPilot automatically manages context to prevent exceeding the model’s limit:

Multi-Provider Token Normalization

Different providers report token usage in different formats. MidiPilot normalizes all of them:


AI Settings

Configure your AI connection from Settings → MidiPilot AI. Select a provider, enter your API key, choose a model, and customize behavior.

MidiPilot AI settings Connection test successful
Provider configuration - Connection test: ✅ Model: gemini-2.5-flash
SettingDescription
ProviderOpenAI, OpenRouter, Google Gemini, or Custom
Base URLAuto-filled per provider, or enter your own endpoint
API KeyYour provider API key - get one from OpenAI, OpenRouter, or Google Gemini
ModelEditable dropdown populated from the active provider's cached model list. If no cache exists, MidiPilot falls back to a small built-in starter list.
Refresh ModelsFetches the live provider model list from /models, normalizes provider-specific fields, filters obvious non-chat models, and stores the result in <userdata>/midipilot_models.json.
Manage favourites…Pick the models that should stay visible per provider. If no favourites are selected, all cached chat-capable models are shown.
Force Streaming for This ModelAppears when the selected provider/model failed streaming during the current app session. Clears the temporary warning so the next request tries live streaming again.
Token LimitOptional cap on output tokens to control costs
ThinkingEnable reasoning for o-series and GPT-5.x models
Reasoning EffortNone / Low / Medium / High / Extra High
Live StreamingStreams both Simple and Agent responses live (text, reasoning summaries, and tool-call arguments) when the provider supports it. Failed paths automatically fall back to non-streaming for the rest of the session.
Prompt Profiles…Open the Prompt Profiles dialog to attach custom system prompts to specific provider/model combinations.
Context RangeMeasures before/after cursor sent as musical context (0-50)
FFXIV ModeEnable Bard Performance rule enforcement
Agent Max StepsMaximum tool calls per Agent request (5-100)
Test ConnectionVerify your API key and model work correctly

Model Refresh & Favorites

MidiEditor AI can fetch provider model lists directly instead of relying on a fixed dropdown. The refresh button is available both in Settings → MidiPilot AI and in the MidiPilot footer, so you can update models without leaving the chat panel. Refreshed models are cached for seven days and used for context-window estimates.

Refreshing provider model list Model favorites dialog
Refresh provider models, then keep the dropdown focused with per-provider favourites

When the footer refresh completes, MidiPilot reports the result in the chat/status area so you know whether the cache was updated or the provider rejected the request.

Model update status in MidiPilot chat
Model update feedback after a successful refresh

If a model fails live streaming but still works without streaming, MidiPilot retries automatically and marks that model with a warning icon for the current app session. The mark is scoped per mode - Simple Mode (no tools) and Agent Mode (with tools) are tracked independently, because some models support live streaming in only one of the two paths. The dropdown shows which mode is blocked: ⚠ <model> (Simple), (Agent), or (Simple+Agent). Use Force Streaming for This Model to clear the temporary mark after switching providers, updating the model, or testing a fixed endpoint.

Capability-aware error handling - if the active provider returns HTTP 404 with “No endpoints found that support tool use” (or any equivalent “tools not supported” error), MidiPilot stops retrying immediately, posts a clear “Model does not support tool calling - pick a different model in Settings → AI, or switch to Simple mode for this request” bubble, and remembers the flag per provider:model for the rest of the session. Picking a different model re-enables Agent Mode automatically.


Custom System Prompts

Click Edit System Prompts… in settings to open the built-in editor. Each mode (Simple, Agent, FFXIV, FFXIV Compact) has its own tab with fully customizable instructions.

System Prompt Editor
Built-in System Prompt Editor with tab-based mode selection

Prompts are saved as system_prompts.json in the application directory. If no custom file exists, MidiPilot uses the hardcoded defaults.

Looking for per-model overrides? The newer Prompt Profiles dialog lets you attach a dedicated system prompt to one or more provider:model combinations - great for guiding a single “quirky” model without touching the global mode prompts.


Prompt Profiles (Per-Model System Prompts)

Open Settings → AI → Prompt Profiles… to manage profiles. A profile binds a custom system prompt to one or more model patterns and decides whether the profile replaces or appends to the default mode prompt.

FieldDescription
NameFree-text label shown in the profile list (e.g. “GPT-5.5 Decisive”, “Claude Strict JSON”).
Model patternsOne or more provider:model entries with optional glob suffix, e.g. openai:gpt-5.5*, openrouter:openai/gpt-5.5*, gemini:gemini-2.5-pro. The first matching profile wins.
System promptThe prompt text that should be sent for matching models. Markdown is allowed.
Append to defaultWhen checked, the profile is appended to the active mode prompt (Simple/Agent/FFXIV) instead of replacing it - useful for adding small, model-specific guardrails.

MidiPilot ships with one built-in profile: GPT-5.5 Decisive, bound to openai:gpt-5.5* and openrouter:openai/gpt-5.5*. It nudges the model to commit to a tool call instead of asking clarifying questions, which is a known weak spot of that family. You can edit, duplicate, or disable it like any other profile.

Prompt Profiles dialog showing the built-in GPT-5.5 Decisive profile, model patterns, append-to-default toggle, and system prompt editor
Prompt Profiles dialog with the built-in GPT-5.5 Decisive profile bound to openai:gpt-5.5* and openrouter:openai/gpt-5.5*

Auto-Save

MidiEditor AI automatically saves a backup copy of your work at regular intervals, so you never lose progress to a crash or accidental close. Your original file is never overwritten - the backup is stored as a separate .autosave sidecar file alongside your MIDI file.

How It Works

Crash Recovery

Settings

Auto-save options are in Settings → System & Performance:

Auto-Save settings in System & Performance
Auto-Save settings with enable toggle and interval configuration
SettingDescription
Enable auto-saveToggle automatic backups on or off (default: on)
Save after idle (seconds)Seconds of inactivity before a backup is written (30-600, default: 120)

AI Tools Reference

In Agent mode, the AI has access to 16 tools (12 base + 4 FFXIV-specific) for inspecting and modifying MIDI files:

ToolDescription
get_editor_stateRead file info, tracks, tempo, time signature, cursor position
get_track_infoGet detailed info for a specific track (channel, event count, note range)
create_trackCreate a new MIDI track
rename_trackRename an existing track
set_channelSet the MIDI channel for a track
insert_eventsAdd new MIDI events (notes, control changes, etc.)
replace_eventsModify existing events in a range
delete_eventsRemove events by index
query_eventsRead events in a tick range on a track
move_events_to_trackMove events between tracks
set_tempoChange the tempo (BPM)
set_time_signatureChange the time signature
setup_channel_patternAuto-configure MidiBard2 channel mapping (FFXIV)
convert_drums_ffxivConvert GM drum kit to FFXIV-compatible tone-mapped notes
validate_ffxivCheck FFXIV Bard Performance rule compliance
analyze_voice_loadRead-only audit of the FFXIV 16-voice ceiling and 14 notes/sec/channel rate cap. Returns globalPeak, overflowRanges and rateHotspots - see FFXIV Voice Limiter

Supported Providers

ProviderBase URLAPI KeyFree Tier
OpenAIapi.openai.com/v1Get API Key →Limited
OpenRouteropenrouter.ai/api/v1Get API Key →Free models available
Google Geminigenerativelanguage.googleapis.comGet API Key →15 RPM, 1M TPM
CustomUser-specifiedUser-specifiedVaries

Getting Started

  1. Open Settings (gear icon or Edit → Settings) and click the MidiPilot AI tab
  2. Select your Provider (Google Gemini is a great free option)
  3. Enter your API Key (get one from OpenAI, OpenRouter, or Google Gemini)
  4. Choose a Model (e.g., gemini-2.5-flash)
  5. Click Test Connection to verify everything works
  6. Close settings and open the MidiPilot panel from the sidebar
  7. Type a prompt and press Enter:
"Create an 8-bar jazz waltz in Bb major with piano, bass, and drums"

The AI will compose the requested music directly into the editor using its built-in tools. In Agent mode, it works iteratively - creating tracks, setting tempo, inserting notes, and validating the result step by step.

See Prompt Examples for more real-world prompts and a full demo.


Conversation History

MidiPilot automatically saves every conversation as a JSON file. You can browse, search, and resume past sessions at any time.

How It Works

Conversation File Format

Each conversation is stored as a single JSON file containing the full message history, model/provider info, token usage, per-turn metadata, and the associated MIDI file path. Files are human-readable and can be exported or shared.

Conversation history dropdown menu

Response Streaming

MidiPilot uses Server-Sent Events (SSE) to stream responses in real time. Instead of waiting for the entire response to complete, Simple mode can show text or action composition immediately, and Agent mode can show live reasoning, assistant text, and tool-call progress while each step is still being generated.

How It Works

When Streaming Is Used

ModeStreamingReason
Simple - text response✅ YesReduces perceived latency
Simple - JSON actions✅ PreviewShows action composition, then executes after complete JSON arrives
Agent - Chat Completions✅ YesStreams assistant text and tool-call argument deltas when the provider supports them
Agent - OpenAI Responses API✅ YesStreams text, reasoning summaries, and function-call arguments for GPT-5-family tool use
Agent - Gemini native✅ YesStreams thought summaries and whole function calls via :streamGenerateContent
Broken provider/model stream⚠ FallbackRetries non-streaming and marks the model for the current app session
OpenAI GPT-5.4 (Chat Completions, Material Dark) and Gemini 3.1 native streaming side by side - see the top of this page for the GPT-5.5 Responses-API run
Model dropdown showing a model marked with the warning icon and a (Simple) suffix after a streaming failure
Mode-scoped streaming fallback - failed paths are marked (Simple), (Agent), or (Simple+Agent) in the model dropdown until you re-enable streaming for that model

Non-Streaming Reference Run

For comparison, here is the same Agent loop running without live streaming - either because the provider does not support it, the user disabled Live Streaming in settings, or a previous request hit the per-session streaming-fallback marker. Tool calls and assistant text still arrive correctly, just in one chunk per round-trip instead of progressively.

Non-streaming Agent run - same multi-step composition flow, batched tool results per round-trip

Per-File AI Presets

Different MIDI files may need different AI settings. A 16-track orchestral arrangement needs different guidance than a 3-track FFXIV bard song. Per-file presets let you save and auto-load settings for each file.

What’s Saved

How to Use

  1. Click the ⚙ gear button in the MidiPilot footer
  2. Select “Save AI preset for this file”
  3. The current settings are saved as a .midipilot.json sidecar file next to your MIDI file
  4. Next time you open that MIDI file, the preset is auto-loaded
Gear menu with Save AI preset option
Preset saved confirmation Preset auto-loaded on file reopen

Sidecar File

Presets are stored as <filename>.midipilot.json next to the MIDI file. For example:

Sweet Child O Mine.mid
Sweet Child O Mine.mid.midipilot.json   ← preset

The preset file is a simple JSON object. All fields are optional - any field not present falls back to the global default.


MCP Server - External AI Clients

MidiEditor AI includes a built-in MCP (Model Context Protocol) server that exposes all 15 MidiPilot tools to external AI clients. Instead of using the built-in chat panel, you can connect Claude Desktop, VS Code Copilot, Cursor, Windsurf, or any other MCP-compatible client and let it edit your MIDI files directly.

Enable the MCP server in Settings → AI → MCP Server, copy the config JSON, paste it into your AI client, and you’re ready. All tool calls appear in the Protocol panel with the client name (e.g. “MidiPilotMCP (VS Code Copilot Claude Opus 4.6)”) and support full undo.

📖 Full MCP Server Documentation →


API Log

MidiPilot writes every API request and response to a log file for debugging and transparency. The log is saved as midipilot_api.log in the same directory as the MidiEditor AI executable.

DetailDescription
Locationmidipilot_api.log next to the .exe
FormatISO-8601 timestamp + direction ([REQUEST] / [RESPONSE]) + JSON body
Cleared onStarting a new chat or loading a different MIDI file - the previous log is overwritten
Manual clearDelete the file - it will be recreated on the next API call

If the AI produces unexpected results, open the log to inspect the raw JSON sent to and received from the provider. This is especially useful for debugging tool-call sequences in Agent mode.