Transcriptor
AI Meeting Documentation Platform
~15K-line vanilla JS meeting documentation engine. 3 STT providers with speaker diarization, 7 analysis presets, psychological profiling (communication styles, group dynamics, emotional analysis), WhatsApp-style retrospective chat, 4 TTS providers with smart chunking, image generation (Imagen 4 + FLUX), context documents/images, and 4 export formats.
Year
2025
Role
Full-Stack Developer
Tech Stack
7 technologies
The Challenge
Meeting documentation is tedious and inaccurate. Teams need more than a transcript — they need structured minutes, psychological analysis of group dynamics, actionable conclusions, and the ability to interrogate the meeting content afterward.
- Raw transcripts are useless — without structure, decisions and action items get buried in text
- Group dynamics invisible — communication patterns, latent conflicts and emotional undercurrents go unnoticed
- No post-meeting reflection — once the meeting ends, there's no way to ask follow-up questions about what was discussed
- Context gaps — meeting audio alone misses supporting documents, whiteboard photos and shared diagrams
The Approach
Build a zero-dependency meeting engine (~15K lines across 5 files) that combines 3 STT providers with LLM-powered analysis (Gemini 2.5) through 7 configurable presets, each one shaping the structure of the minutes, the psychological lens and the image style.
- 3 STT providers — Groq Whisper (fast, $0.04/hr, 99 languages), ElevenLabs Scribe (speaker diarization for 1-32 speakers), OpenAI Whisper (accurate fallback)
- Rolling Summary: long transcriptions (>30K chars) are split into segments that are processed iteratively while carrying context forward, then refined into the final minutes
- Context enrichment — up to 5 documents (PDF/TXT/MD, pdf.js extraction) + 10 images (Gemini Vision analysis) injected into all AI generation pipelines
- Persistence — IndexedDB for binary data (audio, docs) + localStorage auto-save every 5 seconds with full session restore
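The Rolling Summary step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 30K-character threshold comes from the description, while the chunk size, the prompt wording and the `summarize` callback (a stand-in for the LLM call) are assumptions.

```javascript
// Threshold taken from the description; CHUNK_SIZE is an assumed value.
const ROLLING_THRESHOLD = 30000;
const CHUNK_SIZE = 10000;

// Split text into chunks of at most `size` characters, preferring to cut
// at the last space inside the window so words stay whole.
function splitIntoChunks(text, size = CHUNK_SIZE) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const lastSpace = text.lastIndexOf(" ", end);
      if (lastSpace > start) end = lastSpace;
    }
    chunks.push(text.slice(start, end).trim());
    start = end;
  }
  return chunks.filter((chunk) => chunk.length > 0);
}

// `summarize` is a hypothetical async callback (prompt -> string).
async function rollingSummary(transcript, summarize) {
  // Short transcripts go to the model in a single pass.
  if (transcript.length <= ROLLING_THRESHOLD) {
    return summarize(`Write structured minutes for:\n${transcript}`);
  }
  // Long transcripts: fold each segment into a running summary.
  let running = "";
  for (const chunk of splitIntoChunks(transcript)) {
    running = await summarize(
      `Summary so far:\n${running}\n\nNext segment:\n${chunk}\n\nUpdate the summary.`
    );
  }
  // Final pass turns the running summary into the finished minutes.
  return summarize(`Refine this summary into final minutes:\n${running}`);
}
```

Carrying the running summary into every prompt is what lets later segments be interpreted in the context of earlier ones without resending the whole transcript.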
The Solution
Transcriptor processes meeting audio through a multi-stage pipeline — transcription → structured minutes → psychological analysis → conclusions → retrospective — with image generation and TTS at each stage:
- 7 presets: Corporativo (minutes + dynamics + action plan), Narrativo (story + characters + moral), Educativo (concepts + examples + takeaways), Motivacional, Creativo, Documental, Autoanálisis; each with 3 custom content blocks and an image style
- Psychological analysis: written from the perspective of a professional psychologist-psychiatrist, covering communication styles (assertive/passive/aggressive), group roles (leader/facilitator/critic), emotional tone mapping, latent conflicts, coaching recommendations, and burnout/demotivation alerts
- Conclusions module — structured JSON: strengths, areas for improvement, recommendations, action plan with timeframes (short/medium/long) and priorities (high/medium/low), final motivational reflection
- Retrospective chat — WhatsApp-style AI chat with full session context (3K chars transcription + 2.5K summary + 2.5K analysis), voice input via Whisper, auto-TTS toggle, 10-message history
- Image generation: Gemini Imagen 4 + FLUX.1, auto-mode extracts 3-8 key points, preset-styled prompts, minutes + psychology categories, "Analyze Content" for optimized prompts
- 4 TTS providers — Browser (Web Speech API), gTTS (7 languages), MeloTTS (6 languages), ElevenLabs (custom cloned voices), smart chunking at paragraph/sentence/comma boundaries
- 4 export formats — Markdown, PDF (print-styled HTML), Interactive HTML Book (chapter-based with dark/light toggle, ToC), .perspectiva (full JSON with base64 audio/images)
- Live recording — in-browser MediaRecorder with pause/resume, timer, format negotiation (webm/mp4/ogg), multi-file sequential processing
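The smart chunking mentioned for TTS (paragraph, then sentence, then comma boundaries) could look something like the sketch below. The 200-character limit, the function names and the greedy re-merging of small pieces are illustrative assumptions; real providers impose their own request limits.

```javascript
// Assumed per-request limit; actual TTS providers differ.
const MAX_CHARS = 200;

// Boundaries tried in order, from coarsest to finest.
const BOUNDARIES = [
  /\n\s*\n/,        // paragraph breaks
  /(?<=[.!?])\s+/,  // whitespace after sentence-ending punctuation
  /(?<=,)\s+/,      // whitespace after a comma
];

// Recursively split a piece until it fits within `max` characters.
function splitRecursive(text, max, level) {
  const piece = text.trim();
  if (piece.length === 0) return [];
  if (piece.length <= max) return [piece];
  if (level >= BOUNDARIES.length) {
    // No boundary left: hard-cut as a last resort.
    const out = [];
    for (let i = 0; i < piece.length; i += max) out.push(piece.slice(i, i + max));
    return out;
  }
  return piece
    .split(BOUNDARIES[level])
    .flatMap((part) => splitRecursive(part, max, level + 1));
}

// Split, then greedily merge neighbours so chunks stay close to `max`.
function chunkForTts(text, max = MAX_CHARS) {
  const merged = [];
  for (const piece of splitRecursive(text, max, 0)) {
    const last = merged[merged.length - 1];
    if (last !== undefined && last.length + 1 + piece.length <= max) {
      merged[merged.length - 1] = `${last} ${piece}`;
    } else {
      merged.push(piece);
    }
  }
  return merged;
}
```

Each resulting chunk can then be sent to the selected TTS provider in order, so that pauses fall on natural boundaries instead of mid-word.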
Key Results
- ~15K lines of vanilla JS (3.2x larger than originally claimed)
- 3 STT providers: Groq Whisper, ElevenLabs Scribe (diarization), OpenAI Whisper
- 7 configurable presets with 3 content blocks + image style each
- Psychological profiling: communication styles, group dynamics, emotional analysis, coaching
- Retrospective chat: WhatsApp-style with full session context and voice input
- 4 TTS providers with smart chunking and custom cloned voices
- Image generation: Imagen 4 + FLUX with auto-prompt extraction
- 4 export formats: Markdown, PDF, Interactive HTML Book, .perspectiva
- Context enrichment: 5 documents (PDF/TXT/MD) + 10 images (Gemini Vision)
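As an illustration of the retrospective chat's context budget (3K chars of transcription, 2.5K of summary, 2.5K of analysis, 10-message history), a prompt builder might be sketched as below. The budgets come from the description above; the `session` field names, the section labels and the message shape are hypothetical.

```javascript
// Character budgets and history length come from the description;
// field names and prompt layout are illustrative assumptions.
const BUDGET = { transcription: 3000, summary: 2500, analysis: 2500 };
const MAX_HISTORY = 10;

// Truncate a possibly missing text to `limit` characters.
function clip(text, limit) {
  const value = text ?? "";
  return value.length > limit ? value.slice(0, limit) : value;
}

// Build the context string and the message list for one chat turn.
function buildChatContext(session, history, question) {
  const context = [
    `TRANSCRIPTION:\n${clip(session.transcription, BUDGET.transcription)}`,
    `SUMMARY:\n${clip(session.summary, BUDGET.summary)}`,
    `ANALYSIS:\n${clip(session.analysis, BUDGET.analysis)}`,
  ].join("\n\n");
  // Keep only the most recent messages, then append the new question.
  const recent = history.slice(-MAX_HISTORY);
  return {
    context,
    messages: [...recent, { role: "user", content: question }],
  };
}
```

Capping each section separately keeps the prompt size predictable regardless of how long the meeting was.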