SayToType Documentation

Complete guide to using SayToType for voice-to-text conversion

Getting Started

Download and Installation

Download the SayToType desktop application for Windows or macOS from our download page.

The app supports:

Windows: Windows 10 or newer
macOS: macOS 11 or newer (both Intel and Apple Silicon)

First Launch and Onboarding

When you first launch SayToType, you'll go through a guided 5-step onboarding process:

Step 1: Language Selection

Choose your preferred interface language
This sets the language for the app's menus and interface
Your initial transcription language will also be set based on this choice

Step 2: Account Sign-In

Sign in with your SayToType account using OAuth authentication
This connects your license and enables cloud transcription features

Step 3: Permissions Setup (macOS only)

Microphone Permission: Required for audio recording
Accessibility Permission: Required for global hotkeys and text insertion
Windows users can skip this step as permissions are handled automatically

Step 4: Setup Configuration

Audio Device: Select your preferred microphone from the dropdown
Hotkey Selection: Choose from platform-specific presets (see Hotkeys section)
Language Preferences: Confirm or adjust your transcription languages

Step 5: Try It Out

Test your setup with a live recording
Press your selected hotkey to start recording
Speak a few words to test the transcription
Verify that audio is being captured and transcribed correctly

Platform-Specific Setup

macOS Setup

Grant microphone and accessibility permissions when prompted
System Preferences (called System Settings on macOS 13 and later) may open automatically; follow the prompts
Default hotkey: Fn+Control
Apple Silicon Macs can use local transcription for offline/private processing

Windows Setup

No special permissions required—the app works out of the box
Default hotkey: Ctrl+` (backtick key)
Local transcription is not currently available on Windows

Note: After onboarding, the app will run in your system tray. Look for the microphone icon to access settings and start recording.

Basic Usage

Recording Your Voice

There are two ways to start recording:

Keyboard Hotkey: Press your configured hotkey (default: Fn+Control on macOS, Ctrl+` on Windows)
Tray Menu: Right-click the tray icon → "Start Recording"

During Recording

The recording window will appear with real-time audio visualization
Speak clearly into your microphone
Watch the audio waveform to confirm your voice is being captured
Press ESC key to cancel the recording at any time
Press your hotkey again to stop recording and process the transcription
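The hotkey and ESC behavior described above can be pictured as a small state machine. The sketch below is illustrative only: the `State` names and the `RecordingSession` class are hypothetical and are not part of SayToType.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


class RecordingSession:
    """Hypothetical model of the hotkey/ESC behavior described above."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def press_hotkey(self) -> State:
        # First press starts recording; the next press stops it and
        # hands the audio over for transcription.
        if self.state is State.IDLE:
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            self.state = State.PROCESSING
        return self.state

    def press_escape(self) -> State:
        # ESC cancels a recording that is still in progress.
        if self.state is State.RECORDING:
            self.state = State.IDLE
        return self.state

    def finish_processing(self) -> State:
        # Called once the transcription has been produced.
        if self.state is State.PROCESSING:
            self.state = State.IDLE
        return self.state
```

In this model, pressing the hotkey twice moves from idle to recording to processing, while ESC during recording returns straight to idle.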

After Recording

The transcribed and processed text is automatically pasted at your cursor position
The recording window behavior depends on your settings:
- Auto-detect (default): Closes if text was pasted, stays open if not
- Always close: Window closes immediately after processing
- Always open: Window stays open for review
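The three window settings above amount to a simple decision rule. The following sketch restates it in code; the function name and the setting identifiers (`auto_detect`, `always_close`, `always_open`) are assumptions made for illustration, not names taken from the app.

```python
def window_stays_open(setting: str, text_was_pasted: bool) -> bool:
    """Return True if the recording window should remain open after processing."""
    if setting == "always_close":
        return False
    if setting == "always_open":
        return True
    if setting == "auto_detect":
        # Close only when the text reached the target application.
        return not text_was_pasted
    raise ValueError(f"unknown setting: {setting!r}")
```

With auto-detect, the window stays open only when the paste did not happen, which leaves the transcription on screen for you to copy manually.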

Understanding Transcription Modes

SayToType supports three types of transcription:

1. Cloud Transcription

Default transcription method using SayToType's cloud service
Supports all languages and features
Requires internet connection
Available on all platforms (Windows, macOS)
Included in free tier

2. Local Transcription (macOS, Apple Silicon only)

On-device processing using Whisper speech-recognition models
Complete privacy—no data sent to cloud
Works offline without internet connection (after models are downloaded)
6 model options with different speed/accuracy trade-offs
Free models: Standard, Standard English
Premium models: Pro, Pro Light, Ultra, Ultra Light
Models must be downloaded before use
Requires macOS with Apple Silicon (M1 or newer)

3. Custom Provider (Premium)

Use your own API keys with supported providers
Supported providers:
- OpenAI: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe
- Mistral: voxtral-small-latest, voxtral-mini-latest
- Deepgram: nova-3, nova-2
- Custom: Any OpenAI-compatible API endpoint
You control costs and data privacy
Requires premium subscription

Mode Selection and Switching

Tray Menu: Right-click tray icon → "Select Mode" to choose from your saved modes
Recording Window: Click the mode selector in the left corner to switch modes
The currently active mode is highlighted
You can switch modes between recordings, but must finish or cancel the current recording first

Hotkeys and Shortcuts

SayToType uses customizable hotkeys to trigger recordings. The available hotkeys differ between macOS and Windows.

macOS Hotkeys

Choose from one of the following preset hotkey combinations:

Hotkey	Description
`Fn+Control`	Default hotkey (most common choice)
`Control+Option`	Alternative modifier combination
`Control+Escape`	Alternative using ESC key
`Fn+Shift`	Uses function key with Shift
`Fn+Escape`	Uses function key with ESC

Windows Hotkeys

Choose from one of the following preset hotkey combinations:

Hotkey	Description
Ctrl+`	Default hotkey (backtick key, left of 1)
Shift+`	Alternative using Shift
`Alt+Z`	Alternative using Alt key
`Alt+X`	Alternative using Alt key
`Win+Alt+Space`	Three-key combination

Universal Shortcuts

Shortcut	Action	Platform
`ESC`	Cancel current recording	All platforms
`Your Hotkey`	Start/stop recording	All platforms

Configuring Your Hotkey

Open Settings from the tray menu
Navigate to the "General" or "Configuration" tab
Find the "Hotkey" section with visual keyboard configurator
Select your preferred hotkey from the available presets
Test the hotkey to ensure it works correctly

Tip: Choose a hotkey that doesn't conflict with other applications you use frequently. The default options are designed to minimize conflicts.

Creating and Managing Modes

Understanding Modes

Modes are customizable presets that define how your voice recordings are transcribed and processed. Each mode can specify:

Transcription method (cloud, local, or custom provider)
Input and output languages
Custom instructions and formatting
Provider and model selection (for custom modes)
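One way to picture a mode is as a small record with the fields listed above. The sketch below is a hypothetical data model, not SayToType's internal format: the `Mode` class, its field names, and the `problems` check are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mode:
    name: str
    transcription_type: str                 # "cloud", "local", or "custom"
    input_language: str = "auto"
    output_language: Optional[str] = None   # None means no translation
    custom_prompt: str = ""
    provider: Optional[str] = None          # custom provider modes only
    model: Optional[str] = None             # local and custom provider modes

    def problems(self) -> List[str]:
        """List the settings that are missing for this transcription type."""
        issues = []
        if self.transcription_type not in ("cloud", "local", "custom"):
            issues.append("unknown transcription type")
        if self.transcription_type == "local" and not self.model:
            issues.append("local modes need a model")
        if self.transcription_type == "custom" and not (self.provider and self.model):
            issues.append("custom provider modes need a provider and a model")
        return issues
```

For example, `Mode(name="English to Spanish", transcription_type="cloud", input_language="English", output_language="Spanish")` describes a cloud translation mode and reports no problems.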

Accessing Mode Settings

Right-click the tray icon → "Settings"
Navigate to the "Modes" tab
Click "Add Mode" to create a new mode, or click an existing mode to edit it

Cloud Modes

Cloud modes use SayToType's transcription service and are available on all platforms.

Configuration Options

Name: Give your mode a descriptive name (e.g., "Email Draft", "Quick Notes")
Transcription Type: Select "Cloud"
Input Language: The language you'll be speaking
Tip: Setting the correct input language significantly improves accuracy
Output Language: Choose a target language for automatic translation (optional)
Tip: Leave as "Auto" or same as input for no translation
Custom Prompt: Add specific instructions for formatting or processing
Examples: "Format as bullet points", "Write in formal tone", "Summarize key points"

Local Modes (macOS, Apple Silicon only)

Local modes use on-device Whisper models for completely private, offline transcription.

Available Models

Six model options with different speed and accuracy trade-offs:

Standard (Free): Multilingual, balanced performance
Standard English (Free): English-only, optimized for English
Pro (Premium): Highest accuracy, best quality
Pro Light (Premium): High accuracy, faster than Pro
Ultra (Premium): Maximum speed, good for quick notes
Ultra Light (Premium): Fastest processing, smallest model

Configuration Options

Name: Descriptive name for the mode
Transcription Type: Select "Local"
Model: Choose from the 6 available models
Important: You must download the model before using it (see Model Management below)
Language: Select input language (if supported by model)
Custom Prompt: Add formatting instructions (processed after transcription)

Model Management

Download: Models must be downloaded before use
- Select a model in mode settings
- Click the download button and wait for the model to finish downloading
- Download times vary depending on model size and internet speed
Unload Timeout: Configure how long models stay in memory after use (Settings → General)
Storage: Models are stored locally and require disk space (varies by model size, ~100MB to ~1.5GB)

Privacy Note: Local modes never send audio to the cloud. All processing happens on your device.

Custom Provider Modes (Premium)

Use your own API keys with third-party transcription providers.

Supported Providers

OpenAI

Models: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe
Setup: Enter your OpenAI API key in Settings → Providers
Use Case: High-quality transcription with OpenAI's latest models

Mistral

Models: voxtral-small-latest, voxtral-mini-latest
Setup: Enter your Mistral API key in Settings → Providers
Use Case: European alternative with competitive pricing

Deepgram

Models: nova-3, nova-2
Setup: Enter your Deepgram API key in Settings → Providers
Use Case: Fast, accurate transcription with real-time capabilities

Custom OpenAI-Compatible

Setup: Enter custom API endpoint URL and API key
Use Case: Self-hosted or other OpenAI-compatible services
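If you are setting up a custom endpoint, it can help to know what an OpenAI-style transcription request looks like. The sketch below only describes the request and does not send it. It assumes the endpoint follows OpenAI's convention of a `POST` to `/audio/transcriptions` under a base URL that already ends in `/v1`, with a bearer token and a multipart form. The function name is hypothetical, and whether SayToType expects the base URL with or without `/v1` is not stated in this guide.

```python
def build_transcription_request(base_url: str, api_key: str, model: str) -> dict:
    """Describe the HTTP request an OpenAI-style transcription endpoint expects."""
    return {
        "method": "POST",
        "url": base_url.rstrip("/") + "/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Sent as multipart/form-data, together with the audio under "file".
        "form_fields": {"model": model},
    }


# Example with a made-up self-hosted server address and placeholder key.
request = build_transcription_request("http://localhost:8000/v1", "YOUR_API_KEY", "whisper-1")
print(request["url"])
```

If a self-hosted service does not answer on that path, it is probably not OpenAI-compatible in the sense assumed here.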

Configuration Options

Name: Descriptive name for the mode
Transcription Type: Select "Custom Provider"
Provider: Choose from configured providers (OpenAI, Mistral, Deepgram, Custom)
Model: Select the model for the chosen provider
Language: Input language (if supported by provider)
Custom Prompt: Formatting instructions

Mode Management

Creating Modes

Open Settings → Modes tab
Click "Add Mode"
Choose transcription type (Cloud, Local, or Custom Provider)
Configure mode settings
Save the mode

Editing Modes

Click on any existing mode in the Modes tab to edit its settings
Changes are saved automatically or when you click "Save"

Reordering Modes

Drag and drop modes in the Modes tab to reorder them
The order affects how modes appear in the mode selector

Deleting Modes

Click the delete/trash icon next to a mode
Confirm deletion when prompted
Note: You cannot delete the last remaining mode

Example Mode Configurations

Email Draft (Cloud)

Type: Cloud
Name: "Email Draft"
Input Language: English
Custom Prompt: "Format as professional email with proper greeting, body, and closing"

Quick Notes (Cloud)

Type: Cloud
Name: "Quick Notes"
Input Language: English
Output Language: English
Custom Prompt: "Format as bullet points with key ideas"

Translation (Cloud)

Type: Cloud
Name: "English to Spanish"
Input Language: English
Output Language: Spanish
Custom Prompt: "Translate accurately while maintaining natural flow"

Private Notes (Local, macOS)

Type: Local
Name: "Private Notes"
Model: Standard English (Free)
Custom Prompt: "Format as concise notes"
Use Case: Offline, private note-taking

Custom API Transcription (Premium)

Type: Custom Provider
Name: "OpenAI Whisper"
Provider: OpenAI
Model: whisper-1
Custom Prompt: "Clean transcription only"

Configuration and Settings

Access all settings by right-clicking the tray icon → "Settings"

General Configuration

App Language

Change the interface language for menus and settings
Located in Settings → General tab
Requires app restart to take effect

Hotkey Configuration

Select from platform-specific preset hotkeys
Visual keyboard configurator shows your selection
See Hotkeys section for all available options
Test your hotkey immediately after selection

Recording Window Behavior

Choose how the recording window behaves after processing:

Auto-detect (recommended): Closes if text was successfully pasted to your application, stays open if not
Always close: Window closes immediately after processing completes
Always open: Window remains open for you to review the transcription

Model Unload Timeout (macOS only)

Configure how long local models stay loaded in memory
Options: 5 minutes, 15 minutes, 30 minutes, 1 hour, Never unload
Shorter timeouts save memory, longer timeouts improve performance for frequent use
Only applies to local transcription models
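The trade-off between memory and responsiveness can be illustrated with a short sketch of an idle-timeout rule. This is an assumed model of the behavior, not the app's actual implementation; `TIMEOUT_OPTIONS` and `should_unload` are hypothetical names.

```python
# Timeout options from the list above, in seconds; None stands for "Never unload".
TIMEOUT_OPTIONS = {
    "5 minutes": 5 * 60,
    "15 minutes": 15 * 60,
    "30 minutes": 30 * 60,
    "1 hour": 60 * 60,
    "Never unload": None,
}


def should_unload(option: str, seconds_since_last_use: float) -> bool:
    """Return True once a model has been idle for at least the chosen timeout."""
    timeout = TIMEOUT_OPTIONS[option]
    if timeout is None:
        return False  # the model stays in memory indefinitely
    return seconds_since_last_use >= timeout
```

Under this model, a recording made after the timeout has passed would have to wait for the model to load again, which is why longer timeouts suit frequent use.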

Audio Settings

Input Device Selection

Choose your preferred microphone from the dropdown menu
Click "Refresh" to detect newly connected devices
Test with the real-time audio visualization to confirm selection

Auto-Maximize Input Volume

Toggle to automatically maximize microphone input level
Helps ensure consistent recording volume
Platform-specific implementation:
- macOS: Uses native volume control
- Windows: Uses Windows audio API
May not work with all audio devices

Account Settings

Subscription Status

View your current subscription tier (Free or Premium)
Check premium trial status and remaining time
See feature access (premium local models, custom providers)

Premium Trial

New users get a 25-minute trial of premium features
Trial countdown visible in account settings
Access to premium local models and custom providers during trial

Sign Out

Click "Sign Out" to disconnect your account
Useful for switching accounts or troubleshooting
Will require re-authentication on next launch

Platform-Specific Features

macOS Features

Permissions

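Microphone Permission: Required for audio recording
Grant in: System Preferences → Security & Privacy → Privacy → Microphone (on macOS 13 and later: System Settings → Privacy & Security → Microphone)
Accessibility Permission: Required for global hotkeys and text insertion
Grant in: System Preferences → Security & Privacy → Privacy → Accessibility (on macOS 13 and later: System Settings → Privacy & Security → Accessibility)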
The app will prompt you to grant these permissions during onboarding

Local Transcription (Apple Silicon Only)

Available on Macs with M1, M2, M3, or newer Apple Silicon chips
Uses on-device Whisper models for complete privacy
6 model options: 2 free, 4 premium
Works completely offline—no internet required (after downloading models)
Models must be downloaded before first use and are stored locally
Configurable memory management with unload timeout

Function Key Support

Hotkeys can use the Fn (Function) key
Three Fn-based presets available: Fn+Control, Fn+Shift, Fn+Escape
Useful if other modifier combinations conflict with system shortcuts

Windows Features

No Permission Prompts

Windows normally grants microphone access without a prompt; if recording fails, check Settings → Privacy → Microphone
No accessibility permissions needed
App works out of the box after installation
No system preferences configuration required

Platform Comparison

Feature	macOS	Windows
Cloud Transcription	✓ Available	✓ Available
Local Transcription	✓ Apple Silicon only	✗ Not available
Custom Providers	✓ Premium	✓ Premium
Permissions Required	✓ Microphone + Accessibility	✗ None (automatic)
Default Hotkey	Fn+Control	Ctrl+`
Function Key Support	✓ Fn key available	✗ Not available
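The availability rules in the table can also be written as a short check. The sketch below is a hypothetical helper that restates the table; the function name, the platform strings, and the type labels are assumptions made for illustration.

```python
from typing import List


def available_transcription_types(platform: str, apple_silicon: bool, premium: bool) -> List[str]:
    """Return the transcription types usable on a given setup."""
    types = ["cloud"]  # cloud works on both platforms, free tier included
    if platform == "macos" and apple_silicon:
        types.append("local")  # two free models; four more with Premium
    if premium:
        types.append("custom")  # custom providers require Premium
    return types
```

For instance, a Windows user on the free tier gets only cloud transcription, while a Premium user on an Apple Silicon Mac gets all three types.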

Subscription and Premium Features

Free Tier

Included at no cost:

Cloud Transcription: Unlimited use of SayToType's cloud transcription service
Cloud Modes: Create unlimited cloud-based modes with custom languages and prompts
Free Local Models (macOS, Apple Silicon): Access to 2 free on-device models
- Standard (multilingual)
- Standard English
All Core Features: Hotkey customization, audio settings, mode management

Premium Features

Unlock with a premium subscription:

Premium Local Models (macOS, Apple Silicon): Access to 4 premium on-device models
- Pro (highest accuracy, best quality)
- Pro Light (high accuracy, faster)
- Ultra (maximum speed)
- Ultra Light (fastest processing)
Custom API Providers: Use your own API keys with:
- OpenAI (whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe)
- Mistral (voxtral-small-latest, voxtral-mini-latest)
- Deepgram (nova-3, nova-2)
- Custom OpenAI-compatible endpoints
Provider Management: Full control over API keys and custom endpoints

Premium Trial

Duration: 25 minutes of premium feature access
Included: Try premium local models and custom providers before subscribing
One-time offer: Trial is available once per account

Managing Your Subscription

View subscription status in Settings → Account
Upgrade to Premium from Settings → Account
Cancel anytime—premium features remain active until end of billing period

Note: Premium subscription unlocks features on all your devices where you're signed in.

Tips for Best Results

Audio Quality

Use a quality microphone: External or headset mics generally perform better than built-in laptop microphones
Speak clearly: Enunciate at a normal, conversational pace—neither too fast nor too slow
Minimize background noise: Record in quiet environments when possible
Check audio levels: Watch the real-time visualization to ensure your voice is being captured
Enable auto-maximize volume: This feature helps maintain consistent recording levels
Test different devices: Try different microphones to find which works best for your setup

Language Settings

Set correct input language: This is critical for accuracy—always match your spoken language
Use translation feature wisely: Set different input/output languages for automatic translation
Create language-specific modes: Make separate modes for each language you use regularly
English-only models (macOS): Use the "Standard English" local model for better English-only accuracy

Custom Prompts

Be specific: "Format as bullet points with action items" is better than "make it nice"
Include formatting details: Specify structure, tone, style (e.g., "professional email format", "casual tone")
Test and iterate: Try different prompts to see what produces the best results for your use case
Keep it concise: Long prompts can dilute effectiveness—aim for 1-2 sentences
Provide examples: If needed, include a brief example of desired output format in your prompt

Mode Organization

Create specialized modes: Different modes for emails, notes, documentation, etc.
Use descriptive names: "Project Notes" is more helpful than "Mode 3"
Reorder frequently used modes: Drag your most-used modes to the top of the list
Test before important use: Try new modes with test recordings before relying on them
Use local modes for privacy (macOS): Use local transcription for sensitive or confidential content

Choosing Transcription Type

Cloud (default): Best for general use, all languages, requires internet
Local (macOS, Apple Silicon): Best for privacy, offline use, or slow internet connections
Tip: Download models in advance if you plan to use them offline
Custom Provider (Premium): Best when you need specific provider features or want to control costs

Troubleshooting

Common Issues

No Audio Detected

Verify microphone is selected in Settings → Audio Settings
Check microphone permissions (macOS: System Preferences → Security & Privacy → Privacy → Microphone)
Test microphone in another application to confirm it's working
Click "Refresh" in audio device selection to detect newly connected devices
Try enabling "Auto-maximize input volume" in audio settings
Watch the audio visualization—if no waveform appears, your mic isn't capturing audio

Poor Transcription Quality

Check language setting: Ensure input language matches the language you're speaking
Improve audio quality: Reduce background noise, speak more clearly, check microphone positioning
Try different models (macOS): Standard English may work better for English speech than multilingual Standard
Adjust custom prompts: Overly complex prompts can sometimes reduce accuracy
Consider cloud vs local: Cloud transcription may be more accurate for some languages/accents

Hotkey Not Working

macOS: Verify Accessibility permission is granted in System Preferences → Security & Privacy → Privacy → Accessibility
Check for conflicts: Another app might be using the same hotkey—try a different preset
Restart the app: Sometimes hotkey registration requires an app restart
Test alternative hotkeys: Try different presets from the Hotkeys section

Authentication Problems

Try signing out and signing back in via Settings → Account
Check your internet connection
Clear browser cookies if using OAuth sign-in
Verify your account is active and in good standing

Permission Issues (macOS)

Microphone Permission: System Preferences → Security & Privacy → Privacy → Microphone → Enable SayToType
Accessibility Permission: System Preferences → Security & Privacy → Privacy → Accessibility → Enable SayToType
On macOS 13 and later, both options are under System Settings → Privacy & Security
Restart the app after granting permissions
Some macOS versions require unlocking the padlock to make changes

Model Download Failures (macOS)

Check your internet connection before starting download
Ensure sufficient disk space for model files (models vary from ~100MB to ~1.5GB)
If download fails, try again—temporary network issues can interrupt downloads
Check firewall settings aren't blocking the download
Remember: Models must be fully downloaded before you can use them for transcription
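To rule out disk space as the cause, you can check the free space on the relevant volume yourself. The sketch below uses Python's standard `shutil.disk_usage` and takes the upper end of the size range quoted above (~1.5GB) as the threshold. The guide does not say where models are stored, so the `path` argument is an assumption you need to adjust, and the function name is hypothetical.

```python
import shutil

# Upper end of the model size range quoted above (~1.5GB), in bytes.
LARGEST_MODEL_BYTES = int(1.5 * 1024**3)


def has_room_for_model(path: str = ".", needed: int = LARGEST_MODEL_BYTES) -> bool:
    """Return True if the volume containing `path` has at least `needed` bytes free."""
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    # "." checks the volume of the current directory; change it as needed.
    print("Enough space for the largest model:", has_room_for_model("."))
```

A result of `False` suggests freeing disk space before retrying the download.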

Platform-Specific Issues

macOS Troubleshooting

Local models not available: Requires Apple Silicon (M1 or newer); not available on Intel Macs
App not in accessibility list: Manually add SayToType by clicking the "+" button in Accessibility preferences
Hotkey conflicts with system shortcuts: Try Fn-based presets (Fn+Control, Fn+Shift, Fn+Escape)

Windows Troubleshooting

Text not pasting: Ensure cursor is in a text-editable field when recording completes
Hotkey not registering: Check if another app is using the same global hotkey
Microphone access: Windows may require app permissions in Settings → Privacy → Microphone (Windows 10) or Settings → Privacy & security → Microphone (Windows 11)

Getting Help

Verify all required permissions are granted for your platform
Restart the application after changing permissions or settings
Check Settings → Account to ensure you're signed in and subscription is active
Visit our Help & Support page for additional assistance
Contact support with specific error messages or behavior descriptions

Advanced Features

Auto-Translation Workflows

Set different input and output languages for instant translation
Perfect for multilingual communication, language learning, or international collaboration
Create dedicated translation modes for common language pairs
Example: Speak in English, get output in Spanish/French/Japanese
Works with cloud transcription (all platforms) and some custom providers

Custom Formatting with Prompts

Use detailed prompts to format output for specific purposes
Examples:
- Documentation: "Create markdown documentation with headings and sections"
- Structured notes: "Organize as bullet points with main ideas and key takeaways"
- Social media: "Write as engaging tweet under 280 characters"
Combine language translation with custom formatting for powerful workflows

Provider Management (Premium)

Configure multiple custom providers simultaneously
Store API keys securely in the app
Create different modes using different providers for different use cases
Switch between providers based on cost, accuracy, or feature requirements
Set up custom OpenAI-compatible endpoints for self-hosted solutions

Offline Operation (macOS, Apple Silicon)

Download local models in advance for completely offline transcription
Models must be pre-downloaded while internet is available
Once downloaded, no internet required for transcription
Perfect for travel, secure environments, or unreliable connectivity
All processing happens on-device—complete data privacy

Workflow Integration Tips

Email drafting: Create mode with "professional email" prompt, speak naturally, get formatted email
Note taking: Use "structured notes" prompt to auto-organize your thoughts into bullet points
Multilingual content: Record in native language, auto-translate to target audience language

Note: Transcribed and processed text is pasted at your cursor position in any application. The app pastes text only when transcription succeeds.