On-device vs. cloud, and privacy

InkSpoke is offline-first: your words can be transcribed and polished entirely on your own computer, and nothing leaves it unless you choose a cloud model or turn on sync. This page explains what runs where, how you control it, and exactly which data stays private.

Two decisions, made per model

A single dictation goes through two AI steps, and each one is independent:

Speech recognition (ASR) — turning your audio into text. Runs on-device with Whisper.net or Parakeet, or via a cloud provider.
AI refinement (LLM) — cleaning up and reshaping that text. Runs on-device with a local model, or via the built-in InkSpoke Platform cloud model or your own BYOK provider.

You pick a model for each step (in AI Models settings), and the model you pick decides whether that step is local or cloud. So you can, for example, transcribe locally to keep your audio private, and still refine with a fast cloud model.

                       Your speech
                            │
              ┌─────────────┴──────────────┐
              │  1. Speech recognition      │
              │  Local  Whisper / Parakeet  │  → audio stays on your device
              │  Cloud  provider API        │  → audio uploaded to provider
              └─────────────┬──────────────┘
                            │   transcribed text
              ┌─────────────┴──────────────┐
              │  2. AI refinement           │
              │  Local  on-device LLM       │  → text stays on your device
              │  Cloud  Platform / BYOK     │  → text sent to provider
              └─────────────┬──────────────┘
                            │
                   Injected at your cursor
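
The two independent choices in the diagram can be sketched in a few lines of Python. This is only an illustration of the rule, not InkSpoke's implementation: the `Location` enum and the `leaves_device` function are hypothetical names introduced here.

```python
from enum import Enum


class Location(Enum):
    """Where one step of a dictation runs (hypothetical type for this sketch)."""
    LOCAL = "local"
    CLOUD = "cloud"


def leaves_device(speech: Location, refinement: Location) -> set[str]:
    """Return which kinds of data are sent off the device for one dictation."""
    sent: set[str] = set()
    if speech is Location.CLOUD:
        sent.add("audio")  # cloud speech recognition uploads the recording
    if refinement is Location.CLOUD:
        sent.add("text")   # cloud refinement receives the transcribed text
    return sent


print(sorted(leaves_device(Location.LOCAL, Location.LOCAL)))  # []
print(sorted(leaves_device(Location.LOCAL, Location.CLOUD)))  # ['text']
print(sorted(leaves_device(Location.CLOUD, Location.CLOUD)))  # ['audio', 'text']
```

Each step is checked on its own, which is why a local speech model combined with a cloud refinement model sends text but never audio.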

What "default" actually means

Out of the box, speech runs on-device (Whisper Small — free and offline) but refinement uses the InkSpoke Platform cloud model during your Pro trial. So by default your audio never leaves your machine, but your transcribed text is sent to be polished. To keep the entire loop local, switch refinement to an on-device model too (see below).

On-device models

Local models download once and then run with no network. The speech side has two engines:

Engine	What it is	Notes
Whisper.net	The default local speech recognizer. Default model is Whisper Small (244M).	Small is included free; the other sizes (Tiny, Base, Medium, Large, Large-v3 Turbo variants) are Pro.
Parakeet	An alternative ONNX speech engine.	Selectable as a speech model when downloaded.

On-device refinement uses a local GGUF language model. Local speech models other than Whisper Small, and all local LLMs, are managed on the AI Models → On-Device tab, which is a Pro feature.

GPU acceleration

On-device speech can use your GPU to run faster. This is controlled by UseGpuForDictation (on by default), and what it does depends on your OS:

Platform	On-device speech acceleration
macOS	Metal (GPU) + Apple Neural Engine for Whisper
Windows	CUDA (NVIDIA GPUs)
Linux	CPU only — no GPU acceleration

Power users — Parakeet is CPU-bound today

GPU acceleration currently applies to Whisper. The Parakeet engine runs on CPU on all platforms for now; CUDA (Windows) and CoreML (macOS) acceleration for Parakeet are planned but not yet enabled. If you rely on GPU speed, stay on a Whisper model.
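
As a rough sketch, the platform table and the Parakeet caveat combine into a single lookup. The function name, the lowercase platform and engine strings, and the returned labels are assumptions made for this example; the sketch also ignores hardware detection (for instance, a Windows machine without an NVIDIA GPU).

```python
def speech_acceleration(platform: str, engine: str, use_gpu: bool = True) -> str:
    """Return the acceleration described above for a platform/engine pair.

    `use_gpu` stands in for the UseGpuForDictation setting (on by default).
    """
    if not use_gpu or engine == "parakeet":
        return "CPU"  # Parakeet runs on CPU on every platform for now
    backends = {
        "macos": "Metal + Apple Neural Engine",
        "windows": "CUDA",
    }
    return backends.get(platform, "CPU")  # Linux: no GPU acceleration


print(speech_acceleration("macos", "whisper"))     # Metal + Apple Neural Engine
print(speech_acceleration("windows", "parakeet"))  # CPU
print(speech_acceleration("linux", "whisper"))     # CPU
```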

Cloud models

Choosing a cloud speech or text model routes that step to a provider over the network:

InkSpoke Platform — the built-in cloud provider. Refinement through it uses InkSpoke's Responses API; it's the default text model during your Pro trial.
BYOK (bring your own key) — add any OpenAI-compatible provider with your own API key on the AI Models → Providers tab (Pro). Your key is stored in your operating system's keychain, never in a settings file. Requests go directly to your provider under your account.

Cloud speech falls back to local

Cloud speech recognition is designed to fail safe. If a cloud upload doesn't succeed, InkSpoke falls back to your local model so you still get a transcript, and the failed upload is queued for retry rather than lost.
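
The fail-safe behaviour described above might look roughly like this. It is a minimal sketch under stated assumptions: `transcribe`, `retry_queue`, and the two callables are hypothetical, and treating any `OSError` as a failed upload is a simplification rather than InkSpoke's actual error handling.

```python
from collections import deque
from typing import Callable

# Failed uploads waiting for a later retry (hypothetical in-memory queue).
retry_queue: deque[bytes] = deque()


def transcribe(audio: bytes,
               cloud_asr: Callable[[bytes], str],
               local_asr: Callable[[bytes], str]) -> str:
    """Try the cloud recognizer first; on a network error, fall back to local."""
    try:
        return cloud_asr(audio)
    except OSError:                # includes ConnectionError and TimeoutError
        retry_queue.append(audio)  # keep the failed upload for a retry
        return local_asr(audio)    # the user still gets a transcript now


def offline_cloud(audio: bytes) -> str:
    """Stand-in for a cloud provider that cannot be reached."""
    raise ConnectionError("network unreachable")


def fake_local(audio: bytes) -> str:
    """Stand-in for the on-device model."""
    return "local transcript"


print(transcribe(b"\x00\x01", offline_cloud, fake_local))  # local transcript
print(len(retry_queue))                                    # 1
```

The order matters: the audio is queued before the local model runs, so the upload is not lost even if local transcription also fails.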

Meetings are local for now

Meeting recording transcribes on-device only — cloud transcription for live meetings is coming soon. Cloud transcription is available when you import an audio or video file.

What stays private

InkSpoke keeps your data on your machine by default:

Your audio never leaves your device when you use a local speech model. Audio is uploaded only when you choose a cloud speech model.
Your history, recordings, and workspaces are stored locally (the History screen is even labelled "Local only") unless you turn on cloud sync.
Cloud sync is opt-in and end-to-end encrypted. It's off by default (CloudSyncEnabled = false) and requires you to be signed in. When on, your workspaces and settings are encrypted on your device with a key held in your OS keychain — the servers store ciphertext they can't read.
API keys and sync keys live in the OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service) — never in the plain-text settings.json.

You also have a Privacy Tier setting (Settings → Configuration → General) that sets your overall posture. It defaults to LocalShield, with HybridIntelligence and PrivacyCloud as the other levels.

Cloud refinement sends text, not just audio

Even with local speech, a cloud-based refinement model receives your transcribed text. Custom vocabulary is sent to a cloud speech model only if you turn on a separate opt-in (CustomVocabularyCloud). For a fully private loop, keep both the speech model and the refinement model on-device.
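
The custom-vocabulary gate can be pictured as a small guard in front of each speech request. The function and parameter names below are hypothetical, and the sketch assumes a local model can always use your vocabulary because nothing leaves the device in that case.

```python
def vocabulary_for_request(speech_is_cloud: bool,
                           custom_vocabulary_cloud: bool,
                           vocabulary: list[str]) -> list[str]:
    """Return the vocabulary to attach to one speech request.

    `custom_vocabulary_cloud` stands in for the CustomVocabularyCloud opt-in.
    """
    if speech_is_cloud and not custom_vocabulary_cloud:
        return []            # opt-in is off: nothing is sent to the provider
    return list(vocabulary)  # local model (assumed), or cloud with the opt-in on


words = ["InkSpoke", "Parakeet"]
print(vocabulary_for_request(True, False, words))  # []
print(vocabulary_for_request(True, True, words))   # ['InkSpoke', 'Parakeet']
```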

Choosing your setup: privacy vs. accuracy and speed

Mix and match the two steps to land where you want on the privacy/performance trade-off:

Setup	Speech	Refinement	What leaves your device	Best when
Fully on-device	Local (Whisper / Parakeet)	Local LLM	Nothing	Privacy is paramount, or you're offline. Quality and speed depend on your hardware and model size. Local LLM needs Pro.
Hybrid (audio stays home)	Local	Platform or BYOK cloud	Transcribed text only	You want strong refinement quality but never want to upload audio. This is closest to the default.
Fully cloud	Cloud provider	Cloud provider	Audio + text	You're on modest hardware and want the fastest, most accurate results, and you're comfortable sending both audio and text to a provider.

Start local, upgrade selectively

A good rule of thumb: keep speech on-device (it's free and private), and only reach for the cloud on the refinement step where a larger model helps most. You can change either model at any time — nothing is locked in.

Settings that affect this

Setting	Default	What it does
Active speech model	Whisper Small (local)	Picking a cloud speech model switches ASR to cloud (`AsrProvider.Mode`).
`UseGpuForDictation`	On	GPU acceleration for on-device speech (Metal / CUDA; no effect on Linux).
`CloudSyncEnabled`	Off	Opt-in, end-to-end-encrypted sync of workspaces and settings.
`PrivacyTier`	LocalShield	Your overall privacy posture (LocalShield / HybridIntelligence / PrivacyCloud).
`CustomVocabularyCloud`	—	Gate for sending your custom vocabulary to a cloud speech model.

Next steps

Models and providers — the full catalog: on-device, Platform, and BYOK.
Audio and models settings — download models, pick your defaults, tune the GPU toggle.
Account, sync, and updates — turn on encrypted cloud sync and manage devices.
Synced data and privacy — view or delete your end-to-end-encrypted data from the web.
