The Speech Annotation Tool is a production-ready web application designed for data annotation teams to review and correct ASR (Automatic Speech Recognition) transcriptions efficiently and safely. Built for teams working with speech-to-text accuracy improvement, the platform combines intelligent background job processing, real-time progress tracking, browser-based session persistence, and smart row locking to deliver a reliable, responsive workflow that keeps annotators productive without blocking the UI.
The tool removes the transcription-review bottleneck by supporting two distinct workflows: Review & Correct, for teams with pre-chunked audio and existing Excel transcripts, and Auto-Transcribe, for starting from scratch with raw audio files. Everything runs on a lightweight Flask stack with no external job queue required: threading-based background jobs, CSV-based persistence, and localStorage-based session tracking keep the entire system simple to deploy and maintain while remaining powerful enough for production use.
Data annotation teams working with audio transcriptions face several critical problems:
```
Audio Input → Format Detection → FFmpeg Conversion → Segmentation → Whisper Transcription → CSV Storage → Web UI → Export
```
The platform supports two distinct workflows optimized for different data scenarios:
```python
from threading import Thread

class JobManager:
    """
    Singleton managing all background jobs with threading.
    Type-based locking prevents concurrent job conflicts.
    """

    def run_job_async(self, job_id, job_type, task_func, total_items):
        # Check if another job of this type is running
        can_start, active_id = self.can_start_job(job_type)
        if not can_start:
            return {'error': 'Job already running',
                    'active_job_id': active_id}

        # Create job info
        job_info = self.create_job(job_id, job_type, total_items)

        # Spawn daemon thread so the Flask process stays responsive
        def wrapper():
            try:
                self.start_job(job_id)
                result = task_func(job_id, self)
                self.complete_job(job_id, result)
            except Exception as e:
                self.fail_job(job_id, str(e))

        thread = Thread(target=wrapper, daemon=True)
        thread.start()
        return job_info

    def update_progress(self, job_id, processed, total):
        """Called by the running task to report progress"""
        self._jobs[job_id]['processed_items'] = processed
        self._jobs[job_id]['progress'] = int((processed / total) * 100)
        self._persist_jobs()  # Save to JSON so status survives restarts
```
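The type-based lock that `can_start_job` relies on can be sketched as a small standalone helper. This is a minimal sketch; `TypeLock` and its method names are illustrative, not the app's actual API:

```python
from threading import Lock

class TypeLock:
    """At most one active job per job_type (illustrative sketch)."""

    def __init__(self):
        self._lock = Lock()
        self._active = {}  # job_type -> active job_id

    def can_start_job(self, job_type):
        # Returns (True, None) if free, else (False, active_job_id)
        with self._lock:
            active_id = self._active.get(job_type)
            return active_id is None, active_id

    def start(self, job_type, job_id):
        with self._lock:
            if job_type in self._active:
                raise RuntimeError('Job already running')
            self._active[job_type] = job_id

    def finish(self, job_type):
        with self._lock:
            self._active.pop(job_type, None)
```

Holding a single `Lock` around the registry keeps the check-then-set race-free even when several requests try to start jobs simultaneously.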
```python
def auto_transcribe_workflow(job_id, job_manager):
    """
    Complete auto-transcribe pipeline:
    Folder → Audio Files → FFmpeg Convert → Segment → Whisper → CSV
    (folder_path and model_name are supplied via closure when the job
    is scheduled)
    """
    audio_files = list(iter_audio_files(folder_path))
    job_manager.update_progress(job_id, 0, len(audio_files))

    for idx, audio_file in enumerate(audio_files):
        # Step 1: Convert to standardized WAV
        wav_path = convert_to_wav(audio_file, job_id)

        # Step 2: Segment into 30-second chunks
        segments = segment_audio(wav_path, job_id, segment_seconds=30)

        # Step 3: Transcribe each segment
        model = load_model(model_name)  # LRU cached
        for segment in segments:
            transcript = transcribe_file(model, segment)

            # Step 4: Store in CSV
            append_record({
                'filename': segment,
                'transcription': transcript,
                'job_id': job_id,
            })

        # Step 5: Report progress once per source file
        job_manager.update_progress(job_id, idx + 1, len(audio_files))
```
```javascript
class CorrectionTracker {
  /**
   * Manages localStorage-based correction history.
   * Tracks which records were edited and what changes were made.
   */
  constructor() {
    this.storageKey = 'asr_corrections_tracker';
    this.tracker = this.loadFromStorage();
  }

  loadFromStorage() {
    // Empty tracker on first visit or cleared storage
    return JSON.parse(localStorage.getItem(this.storageKey) || '{}');
  }

  saveToStorage() {
    localStorage.setItem(this.storageKey, JSON.stringify(this.tracker));
  }

  markCorrected(recordId, originalText, correctedText) {
    this.tracker[recordId] = {
      corrected: true,
      originalText,
      correctedText,
      timestamp: new Date().toISOString()
    };
    this.saveToStorage();
  }

  getStats() {
    const total = Object.keys(this.tracker).length;
    const corrected = Object.values(this.tracker)
      .filter(r => r.corrected).length;
    return { total, corrected };
  }
}
```
| Workflow | Use Case | Input | Processing |
|---|---|---|---|
| Review & Correct | Teams with pre-chunked audio | Audio folder + Excel file | Direct table load, inline edit, save |
| Auto-Transcribe ⭐ | Starting from raw audio | Audio folder (any format) | Convert → Segment → Transcribe → Show results |
Watch how the tool handles background transcription jobs, live progress updates, and responsive UI interactions without blocking the interface!
The system is deployment-ready with simple configurations for multiple platforms. No Celery, Redis, or message brokers required—just Python, Flask, and FFmpeg.
```bash
# Clone and setup
git clone https://github.com/inboxpraveen/Speech-Annotation-Tool.git
cd Speech-Annotation-Tool

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run application
python app.py
# Open http://localhost:5000
```
Challenge 1: Non-Blocking Background Jobs Without External Infrastructure
Transcribing hundreds of files would block the web interface and exhaust memory.
Solution: Implemented Python threading with daemon threads and a singleton JobManager class.
Jobs run independently while the Flask server remains responsive. Type-based locking prevents concurrent conflicts.
Progress is persisted to JSON, so job status survives server restarts.
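The persistence step can be sketched with the standard library alone. This is a minimal sketch assuming a dict-per-job layout; `persist_jobs` and `load_jobs` are illustrative names, not the app's exact `_persist_jobs` implementation:

```python
import json
import os
import tempfile

def persist_jobs(jobs, path):
    """Atomically write the in-memory job table to JSON.

    Writing to a temp file and renaming avoids a half-written file
    if the process dies mid-write.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    with os.fdopen(fd, 'w') as f:
        json.dump(jobs, f)
    os.replace(tmp, path)  # atomic rename

def load_jobs(path):
    """Reload persisted jobs on startup; empty table if no file yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Calling `load_jobs` once at startup is what lets a restarted server report the last known status of every job.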
Challenge 2: Audio Format Chaos
Audio files arrive in dozens of formats (MP3, M4A, OPUS, FLAC, etc.) with varying sample rates and channels.
Solution: Integrated FFmpeg for universal format conversion with intelligent preprocessing.
All audio is automatically converted to standardized 16 kHz mono WAV before Whisper processing.
FFmpeg streaming prevents memory overflow on large files.
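The conversion step boils down to a single FFmpeg invocation. A minimal sketch of normalizing any input to 16 kHz mono WAV (function names here are illustrative, not the app's exact helpers):

```python
import subprocess

def ffmpeg_to_wav_cmd(src, dst):
    """Build the FFmpeg command that normalizes any input format
    to 16 kHz mono WAV, the layout Whisper expects."""
    return [
        'ffmpeg', '-y',   # overwrite output if it already exists
        '-i', src,        # any input format FFmpeg understands
        '-ar', '16000',   # resample to 16 kHz
        '-ac', '1',       # downmix to mono
        '-f', 'wav', dst,
    ]

def convert_to_wav(src, dst):
    """Run the conversion; raises CalledProcessError on failure."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst),
                   check=True, capture_output=True)
```

Because FFmpeg streams the input, memory use stays flat even for multi-hour source files.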
Challenge 3: Session State Without Server Overhead
Tracking which of hundreds of records were corrected would require server sessions.
Solution: Leveraged browser localStorage to track corrections client-side.
CorrectionTracker class stores correction metadata with timestamps. Survives browser refresh,
restart, and network issues without server-side complexity.
Challenge 4: UI Responsiveness During Long Operations
Users need real-time feedback on job progress without page reloads blocking interaction.
Solution: Smart polling strategy with 2-second refresh cadence. Frontend polls job status
via API while UI remains interactive. Progress banner auto-appears and dismisses intelligently.
Silent table refreshes don't interrupt user workflow.
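The polling loop itself is simple; here is a sketch in Python of the same 2-second cadence the frontend uses (`poll_job` and the status-dict shape are assumptions for illustration, not the app's exact API):

```python
import time

def poll_job(get_status, on_update, interval=2.0, max_polls=10_000):
    """Poll a job-status source every `interval` seconds until done.

    `get_status` is any callable returning a dict like
    {'status': 'running', 'progress': 42}; in the real app the
    browser polls a job-status endpoint instead.
    """
    for _ in range(max_polls):
        status = get_status()
        on_update(status)  # e.g. refresh the progress banner
        if status['status'] in ('completed', 'failed'):
            return status
        time.sleep(interval)
    raise TimeoutError('job did not finish within the polling budget')
```

The fixed cadence keeps server load predictable: one cheap status read per client every two seconds, regardless of job size.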
Challenge 5: Data Safety and Accident Prevention
Corrected transcripts must be protected from accidental overwrites by team members.
Solution: Implemented row-level locking with timestamp tracking. Once locked, rows become
read-only and visually highlighted. Lock state persisted in CSV with optional unlock capability.
Audit trail tracking enables debugging of accidental changes.
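The lock itself reduces to a flag plus a timestamp on each record. A minimal sketch, where field names like `locked` and `locked_at` are assumptions about the CSV schema:

```python
from datetime import datetime, timezone

def lock_row(rows, record_id):
    """Mark a record read-only with a UTC timestamp for the audit trail."""
    for row in rows:
        if row['filename'] == record_id:
            row['locked'] = 'true'
            row['locked_at'] = datetime.now(timezone.utc).isoformat()
    return rows

def is_locked(row):
    """A locked row must be explicitly unlocked before it can be edited."""
    return row.get('locked') == 'true'
```

Checking `is_locked` before every save is what turns an accidental overwrite into a visible, recoverable rejection.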
This project deliberately avoids external job queues like Celery: threading provides non-blocking background processing with zero extra infrastructure to install, deploy, or monitor.
For distributed processing across multiple servers, a Celery migration path is documented in PROJECT_DOCUMENTATION.md.