Dictaro Power User Guide: 10 Features Most Users Never Discover

Most Dictaro users hold Ctrl+Win, speak, and stop there. This guide covers the two-hotkey system, custom cleanup recipes, real-time translation, elevated-app support, BYOK providers, and CLI transcription — everything beyond basic dictation.

TLDR

Most Dictaro users hold Ctrl+Win, speak, and stop there. That workflow is fast — but it leaves most of the tool's capability unused. Dictaro has a two-hotkey system: one to transcribe, one to clean up. Beyond those two keys, there are additional capabilities — saved cleanup recipes, real-time translation, elevated-app support, CLI transcription, custom BYOK providers, and more — that change how the tool integrates into a professional workflow. This guide covers all of them.

The Two-Hotkey System

Dictaro is designed around two distinct operations, each with its own hotkey. Most users learn the first and miss the second entirely.

Hotkey 1: Transcription

The default shortcut is Ctrl+Win. Hold it, speak, release. The transcribed text appears at your cursor in whatever application is active. This is the core workflow.

The transcription step runs on Dictaro's own private servers — not Microsoft Azure Speech or Google Cloud Speech. Audio is processed in server RAM and deleted immediately after transcription. It is never written to disk. The result you see in your application is clean raw transcription — filler words removed, reasonable punctuation added — but not yet AI-polished prose.

Hotkey 2: AI Cleanup (Pro)

After transcription, a separate second hotkey runs the selected cleanup mode on your raw transcript. This is where Dictaro's AI pipeline applies: the text routes through your chosen AI provider (or Dictaro's default processing if you have not configured BYOK) and returns as polished, formatted output.

The cleanup hotkey is configured separately in Dictaro's settings. If you are a Pro user and you have never set it up, this is the single highest-leverage configuration change available to you. Transcription gives you speed. Cleanup gives you usable text.

Customizing Your Hotkeys

The default Ctrl+Win works for most setups, but it may conflict with other system shortcuts or feel awkward for your hand position. Dictaro lets you remap both hotkeys to any combination your fingers already know.

Common alternatives used by power users:

  • Right Alt + Space (ergonomic for one-handed activation while the other hand is on the mouse)
  • Ctrl+Shift+D (familiar to users who previously used Dragon's activation key)
  • F13 or other function keys (if you have an extended keyboard — completely avoids all conflicts)

The remapping is in Dictaro's tray icon settings. There are no restrictions on the combination you choose. If a shortcut creates a conflict with another application, the conflict is visible in the settings panel and you can change it immediately.

Cleanup Modes: What Each One Does

Dictaro ships with five built-in cleanup modes, each tuned for a different output register. The second hotkey runs whichever mode is currently selected. Switching between them takes one click in the cleanup panel.

Professional tone

Removes filler words, normalises sentence structure, and produces formal prose. The tone is measured and direct — appropriate for email, documentation, reports, and professional correspondence. This is the mode that converts a verbally dictated rough thought into something you can send without editing.

Casual tone

Cleans up transcription errors and filler words but preserves the informal register of the original. The output reads like a thoughtful message rather than a corporate memo. Good for Slack, Teams, Discord, or any context where formal prose would feel out of place.

Concise

Compresses the transcript to its core meaning. Redundant phrases, hedging language, and run-on qualifications are removed. If you tend to speak in long, discursive sentences and want tight output, this mode does the compression automatically.

Bullet points

Converts a spoken passage into a bulleted list. Dictate a meeting recap, a set of action items, or a requirements list in conversational speech. The cleanup step extracts the discrete items and formats them as a list. For note-taking workflows where you need structured output from unstructured speech, this is the fastest path from spoken to scannable.

Translate + target language

Transcribes your speech and translates the output to a specified target language in a single cleanup pass. This is distinct from real-time speech translation (covered below) — this mode transcribes first, then translates during the cleanup step. Useful when you want to dictate in your native language and produce output in a different one with AI-quality translation applied to your cleaned prose.

Custom Prompt Recipes: The Most Underused Feature

Beyond the five built-in modes, Dictaro lets you save unlimited custom cleanup prompts. A custom prompt is a specific instruction to the AI cleanup model — defining exactly what you want done to your raw transcript. You name it, save it, and access it the same way you access the built-in modes.

This is the feature with the most leverage for users who dictate a specific content type repeatedly.

Examples worth setting up

"Email draft"
Instruction: "Format as a professional email. Add a greeting and closing. Keep the tone direct and polite. Do not add information I did not provide."
Use case: Dictate the substance of an email verbally — the ask, the context, the next step. Cleanup adds the structure and register of a proper email draft.

"Meeting summary"
Instruction: "Convert this to a structured meeting summary. Format as: Key decisions made, Action items with owners, Open questions, Next steps. Use bullet points."
Use case: Dictate your recollection of a meeting immediately after it ends. Cleanup structures it into a shareable summary format.

"Slack message"
Instruction: "Rewrite this as a clear, direct Slack message. No filler. No formal opener. One paragraph or short bullets. Under 100 words if possible."
Use case: Dictate the key point quickly; cleanup trims it to channel-appropriate length and tone.

"Code comment"
Instruction: "Rewrite this as a concise code comment. Use present tense. Explain what the code does, not how it does it. Under 2 lines."
Use case: Dictate a verbal explanation of a function or block; cleanup converts it to a clean inline comment.

"Summarise"
Instruction: "Summarise the following in 3-5 bullet points. Each bullet should be a self-contained fact or decision."
Use case: Dictate a long explanation or set of notes; cleanup extracts the key points as a reference list.

Custom prompts are stored locally in Dictaro's settings. They are not transmitted anywhere until you actually run a cleanup session. Building a library of 4-6 task-specific prompts takes about 10 minutes and eliminates the need to switch modes manually for different content types throughout the day.

Real-Time Speech Translation

A separate capability from the Translate cleanup mode — real-time speech translation runs the translation during the transcription step rather than after. You speak in one language and text appears in another at your cursor, in near real-time.

Dictaro supports 25x25 language pairs — any of the 25 supported languages as source, any other supported language as output. The most common professional use cases are:

  • Dictating notes in your native language and having them appear in the target language of the document (useful for multilingual teams or international correspondence)
  • Speaking responses to emails in a foreign language through dictation rather than manually translating after writing
  • Creating bilingual documentation where you draft one version in your native language and translate directly to a second language via a separate dictation pass

Real-time translation is a separate hotkey in settings — not the same as the main transcription hotkey or the cleanup hotkey. You configure which source and target language pair you want, then activate it when needed.

Elevated Apps, RDP, and Citrix

This capability matters significantly for developers, IT professionals, and enterprise users — and it is invisible to most users because it just works without configuration.

Electron-based dictation apps (those built on the Chrome/Node.js framework) cannot inject text into elevated Windows applications — programs running with administrator privileges, or apps inside Remote Desktop Protocol (RDP) sessions and Citrix virtual desktops. The Electron security sandbox prevents it.

Dictaro is built in native Rust. It installs at 18 MB, runs at 30 MB RAM at rest, and types into elevated apps, RDP sessions, and Citrix virtual desktops just as it types into any other application. If your workflow involves elevated terminal windows, remote desktops, or Citrix-hosted enterprise applications, Dictaro operates where Electron-based competitors silently fail.

This is relevant for: system administrators dictating into elevated PowerShell or CMD windows; users on corporate VDI environments who could not get Electron-based dictation apps to work; enterprise teams whose core tools run inside Citrix.

CLI Transcription (Pro)

Pro users have access to a command-line interface for transcription. This opens Dictaro to scripting and automation use cases that go beyond the interactive hotkey workflow.

Some uses that Pro users have built around CLI transcription:

  • Piping transcription output directly into a text processing script (log files, structured data capture, automated note filing)
  • Integrating dictation into custom workflows that require text input without manual hotkey activation
  • Building automated transcription pipelines for batch audio files
  • Triggering transcription from other tools via scripted invocation

CLI transcription documentation is available in the Dictaro docs for Pro users.

BYOK Configuration: Providers and What Each Offers

Dictaro's BYOK (bring your own API key) routes AI cleanup through your own provider account. The key is stored in Windows Credential Manager on your device and goes directly to your provider; Dictaro's servers are not in the path of the cleanup step. Full BYOK explanation.

Supported providers and their practical differences:

OpenAI (GPT-4o, GPT-4o Mini)
The default BYOK choice for most users. GPT-4o produces high-quality cleanup output across all modes. GPT-4o Mini is faster and cheaper for shorter content. Use the OpenAI API dashboard to set spending limits so cleanup costs remain predictable.

Anthropic (Claude Sonnet, Haiku)
Claude Sonnet produces notably clean, natural-sounding prose output — particularly useful for Professional tone mode where the goal is polished written English. Claude Haiku is the fastest and cheapest option for high-frequency short-form cleanup (Slack messages, brief emails).

Groq (Llama 3.3, Mixtral)
Groq's inference infrastructure is exceptionally fast — lower latency for the cleanup step than OpenAI or Anthropic for most content lengths. The Llama 3.3 model is strong for standard professional cleanup tasks. Groq offers a free tier with generous rate limits, making it useful for low-cost BYOK evaluation.

Ollama (local models)
Fully local inference. The cleanup step runs on your machine with no network request at Stage 2. For users who handle confidential content where even the cleanup step should not leave the device, Ollama eliminates outbound transmission entirely after the transcription call. Requires a capable GPU for reasonable latency with 7B+ parameter models.

Google Gemini (Gemini 2.5 Flash)
Fast inference at competitive pricing. Good for high-volume users who run cleanup on nearly everything they dictate and want to optimise per-token cost.

OpenRouter (100+ models)
A single API key that routes to 100+ models across providers. Useful for users who want to experiment with different models for different cleanup tasks, or who want access to models not natively supported by Dictaro's built-in provider list.

Custom (any OpenAI-compatible endpoint)
Configure any OpenAI-compatible API endpoint. Relevant for enterprise self-hosted LLMs, Azure OpenAI Service (which offers zero-retention enterprise agreements), or any custom model deployment that exposes an OpenAI-compatible API.

Getting More from the Free Tier

The free tier includes a daily dictation allowance with no account required. For users evaluating Dictaro before committing to Pro, a few practices extend the daily allowance further:

  • Dictate in batches, not continuously. The daily allowance resets every 24 hours. Planning your highest-volume dictation tasks for a single session makes better use of the available quota than spreading small dictations throughout the day.
  • Use it for highest-value content first. If you have 10 emails to write and a Slack thread to respond to, use the daily allowance for the emails. Leave short Slack responses for keyboard input.
  • BYOK is available on the free tier. You do not need Pro to configure your own API key for cleanup. If you connect an OpenAI or Anthropic API key, the AI cleanup step runs on your key on the free tier. The daily allowance applies to transcription volume only.

Microphone Tips That Affect Accuracy

Transcription accuracy is largely a function of audio quality. The Whisper-based engine is highly robust to accents and natural speech variation, but it responds to input signal quality like any ASR engine.

  • USB desk microphone over Bluetooth. Bluetooth audio uses compressed codecs that introduce artefacts the ASR engine has to work around. A USB desk microphone (Blue Yeti, Samson Q2U, HyperX SoloCast) or a wired headset delivers cleaner signal. The difference in accuracy, especially for technical vocabulary and proper nouns, is measurable.
  • Speak toward the microphone. Position matters more than most users expect. Speaking 20-30 cm from a cardioid desk mic at roughly speaking-mouth height produces better signal than speaking across the room or with the mic positioned behind you.
  • Reduce background noise during dictation sessions. A headset with cardioid pickup pattern isolates your voice from ambient noise better than a desk mic in a noisy room.
  • Speak at natural pace, not slowly. Over-articulation at slower-than-natural pace can reduce accuracy for Whisper-based engines, which are trained on normal conversational speech.

The Full Dictaro Setup for a Professional Workflow

A configuration that takes advantage of all the above in one setup:

  1. Remap the transcription hotkey to a key combination that does not conflict with your primary applications.
  2. Set up a BYOK provider (Anthropic Claude Sonnet for prose quality, or Groq Llama for speed and cost).
  3. Configure the cleanup hotkey as a second distinct key combination.
  4. Save 3-4 custom prompt recipes for the content types you produce most.
  5. If you handle sensitive content: configure Ollama for fully local cleanup at Stage 2.
  6. If your workflow involves RDP or elevated apps: verify the hotkey works in those contexts (it will for most setups out of the box).

The setup takes 15-20 minutes once. From that point, the full capability of the tool is available from any active text field on Windows, with the exact output format you need for each context.

For the complete Windows dictation setup guide: How to Set Up Voice Dictation on Windows.

For the BYOK privacy architecture in detail: What Is BYOK in Dictation Apps?

For how the AI cleanup pipeline works end to end: How AI Text Cleanup Works.

Download Dictaro. Free tier, no account required, BYOK available from day one. Windows 10 and 11.


Dictaro is a Windows-only AI dictation app. System-wide operation on Windows 10 and 11. Native Rust. AI text cleanup with BYOK for OpenAI, Anthropic, Groq, Ollama, Mistral, Gemini, OpenRouter, and any OpenAI-compatible endpoint. No account required. Download and start dictating in under two minutes.