Transcriptor

AI Meeting Documentation Platform

~15K-line vanilla JS meeting documentation engine. 3 STT providers with speaker diarization, 7 analysis presets, psychological profiling (communication styles, group dynamics, emotional analysis), WhatsApp-style retrospective chat, 4 TTS providers with smart chunking, image generation (Imagen 4 + FLUX), context documents/images, and 4 export formats.

Year

2025

Role

Full-Stack Developer

Tech Stack

7 technologies

The Challenge

Meeting documentation is tedious and inaccurate. Teams need more than a transcript — they need structured minutes, psychological analysis of group dynamics, actionable conclusions, and the ability to interrogate the meeting content afterward.

Raw transcripts are useless — without structure, decisions and action items get buried in text
Group dynamics invisible — communication patterns, latent conflicts and emotional undercurrents go unnoticed
No post-meeting reflection — once the meeting ends, there's no way to ask follow-up questions about what was discussed
Context gaps — meeting audio alone misses supporting documents, whiteboard photos and shared diagrams

The Approach

Build a zero-dependency meeting engine (~15K lines across 5 files) that combines 3 STT providers with LLM-powered analysis (Gemini 2.5) through 7 configurable presets, each of which shapes the structure of the minutes, the psychological lens and the image style.

3 STT providers — Groq Whisper (fast, $0.04/hr, 99 languages), ElevenLabs Scribe (speaker diarization for 1-32 speakers), OpenAI Whisper (accurate fallback)
Rolling Summary: long transcriptions (>30K chars) are split into segments and processed iteratively, carrying the running context forward, then refined into the final minutes
Context enrichment — up to 5 documents (PDF/TXT/MD, pdf.js extraction) + 10 images (Gemini Vision analysis) injected into all AI generation pipelines
Persistence — IndexedDB for binary data (audio, docs) + localStorage auto-save every 5 seconds with full session restore
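
The Rolling Summary step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 30K-character threshold comes from the description above, while the segment size, the prompt wording, and the names `splitIntoSegments`, `rollingSummary` and `summarize` are assumptions made for the example. `summarize` stands in for whatever async function wraps the LLM call.

```javascript
// Threshold taken from the description; segment size is an assumed value.
const ROLLING_THRESHOLD = 30000; // chars
const SEGMENT_SIZE = 10000;      // chars (hypothetical)

// Split text into segments of at most `size` chars,
// preferring to cut right after a line break or a space.
function splitIntoSegments(text, size = SEGMENT_SIZE) {
  const segments = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const slice = text.slice(start, end);
      const cut = Math.max(slice.lastIndexOf("\n"), slice.lastIndexOf(" "));
      if (cut > 0) end = start + cut + 1; // keep the separator with this segment
    }
    segments.push(text.slice(start, end));
    start = end;
  }
  return segments;
}

// `summarize(prompt)` is a caller-supplied async function (hypothetical)
// that sends the prompt to the LLM and resolves with its text reply.
async function rollingSummary(transcript, summarize, size = SEGMENT_SIZE) {
  if (transcript.length <= ROLLING_THRESHOLD) {
    // Short transcripts go through in a single pass.
    return summarize(`Write structured minutes for:\n${transcript}`);
  }
  let running = "";
  for (const segment of splitIntoSegments(transcript, size)) {
    // Each pass sees the summary so far plus one new segment.
    running = await summarize(
      `Summary so far:\n${running}\n\nNew segment:\n${segment}\n\nUpdate the summary.`
    );
  }
  // Final refinement pass over the accumulated summary.
  return summarize(`Refine this summary into final minutes:\n${running}`);
}
```

Passing `summarize` in as a parameter keeps the chunking logic independent of any one provider, so the loop can be exercised with a stub before wiring in a real model call.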

The Solution

Transcriptor processes meeting audio through a multi-stage pipeline — transcription → structured minutes → psychological analysis → conclusions → retrospective — with image generation and TTS at each stage:

7 presets: Corporativo (minutes + dynamics + action plan), Narrativo (story + characters + moral), Educativo (concepts + examples + takeaways), Motivacional, Creativo, Documental, Autoanálisis, each with 3 custom content blocks and an image style
Psychological analysis — professional psychologist-psychiatrist profile: communication styles (assertive/passive/aggressive), group roles (leader/facilitator/critic), emotional tone mapping, latent conflicts, coaching recommendations, burnout/demotivation alerts
Conclusions module — structured JSON: strengths, areas for improvement, recommendations, action plan with timeframes (short/medium/long) and priorities (high/medium/low), final motivational reflection
Retrospective chat — WhatsApp-style AI chat with full session context (3K chars transcription + 2.5K summary + 2.5K analysis), voice input via Whisper, auto-TTS toggle, 10-message history
Image generation: Gemini Imagen 4 + FLUX.1, auto-mode extracts 3-8 key points, preset-styled prompts, minutes + psychology categories, "Analyze Content" for optimized prompts
4 TTS providers — Browser (Web Speech API), gTTS (7 languages), MeloTTS (6 languages), ElevenLabs (custom cloned voices), smart chunking at paragraph/sentence/comma boundaries
4 export formats — Markdown, PDF (print-styled HTML), Interactive HTML Book (chapter-based with dark/light toggle, ToC), .perspectiva (full JSON with base64 audio/images)
Live recording — in-browser MediaRecorder with pause/resume, timer, format negotiation (webm/mp4/ogg), multi-file sequential processing
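
The smart chunking used for TTS could look something like the sketch below. It is one possible reading of "paragraph/sentence/comma boundaries", not the project's actual implementation: the 500-character limit, the boundary regexes, and the name `chunkText` are assumptions chosen for illustration. The idea is to split at the coarsest boundary first and fall back to finer ones only for pieces that are still too long.

```javascript
// Boundaries from coarse to fine: paragraph, sentence, comma.
// Lookbehind keeps each delimiter attached to the text before it.
const BOUNDARIES = [/(?<=\n\n)/, /(?<=[.!?]\s)/, /(?<=,\s)/];

// Split `text` into chunks of at most `maxChars` characters.
// `maxChars` = 500 is an assumed default; real provider limits differ.
function chunkText(text, maxChars = 500, level = 0) {
  if (text.length <= maxChars) return text.trim() ? [text] : [];

  if (level >= BOUNDARIES.length) {
    // No natural boundary left: hard split as a last resort.
    const out = [];
    for (let i = 0; i < text.length; i += maxChars) {
      out.push(text.slice(i, i + maxChars));
    }
    return out;
  }

  const chunks = [];
  let buffer = "";
  for (const piece of text.split(BOUNDARIES[level])) {
    if (piece.length > maxChars) {
      // Piece is too long on its own: flush, then retry at a finer boundary.
      if (buffer) { chunks.push(buffer); buffer = ""; }
      chunks.push(...chunkText(piece, maxChars, level + 1));
    } else if (buffer.length + piece.length > maxChars) {
      // Adding this piece would overflow: start a new chunk.
      chunks.push(buffer);
      buffer = piece;
    } else {
      // Greedily pack pieces together to minimize the number of TTS calls.
      buffer += piece;
    }
  }
  if (buffer) chunks.push(buffer);
  return chunks;
}
```

Each chunk can then be sent to the selected TTS provider in order. Because delimiters stay attached to the preceding piece, concatenating the chunks reproduces the original text, which keeps pauses and punctuation intact across chunk borders.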

Key Results

~15K lines of vanilla JS (3.2x larger than originally claimed)
3 STT providers: Groq Whisper, ElevenLabs Scribe (diarization), OpenAI Whisper
7 configurable presets with 3 content blocks + image style each
Psychological profiling: communication styles, group dynamics, emotional analysis, coaching
Retrospective chat: WhatsApp-style with full session context and voice input
4 TTS providers with smart chunking and custom cloned voices
Image generation: Imagen 4 + FLUX with auto-prompt extraction
4 export formats: Markdown, PDF, Interactive HTML Book, .perspectiva
Context enrichment: 5 documents (PDF/TXT/MD) + 10 images (Gemini Vision)

Tech Stack

Vanilla JS (~15K lines), Gemini 2.5, ElevenLabs Scribe, Groq Whisper, OpenAI Whisper, FLUX, Imagen 4

$ cat project.json
{
  "name": "Transcriptor",
  "status": "production",
  "stack": [7],
  "results": [9]
}
