SaaS Products
2025

Transcriptor

AI Meeting Documentation Platform

~15K-line vanilla JS meeting documentation engine. 3 STT providers with speaker diarization, 7 analysis presets, psychological profiling (communication styles, group dynamics, emotional analysis), WhatsApp-style retrospective chat, 4 TTS providers with smart chunking, image generation (Imagen 4 + FLUX), context documents/images, and 4 export formats.

Year

2025

Role

Full-Stack Developer

Tech Stack

7 technologies

The Challenge

Meeting documentation is tedious and inaccurate. Teams need more than a transcript — they need structured minutes, psychological analysis of group dynamics, actionable conclusions, and the ability to interrogate the meeting content afterward.

  • Raw transcripts are useless — without structure, decisions and action items get buried in text
  • Group dynamics invisible — communication patterns, latent conflicts and emotional undercurrents go unnoticed
  • No post-meeting reflection — once the meeting ends, there's no way to ask follow-up questions about what was discussed
  • Context gaps — meeting audio alone misses supporting documents, whiteboard photos and shared diagrams

The Approach

Build a zero-dependency meeting engine (~15K lines across 5 files) that combines 3 STT providers with LLM-powered analysis (Gemini 2.5) through 7 configurable presets, each shaping the structure of the minutes, the psychological lens and the image style.

  • 3 STT providers: Groq Whisper (fast, $0.04/hr, 99 languages), ElevenLabs Scribe (speaker diarization for 1-32 speakers), OpenAI Whisper (accurate fallback)
  • Rolling Summary: for long transcriptions (>30K chars), splits the text into segments that are processed iteratively while carrying context forward, then refines the result into the final minutes
  • Context enrichment: up to 5 documents (PDF/TXT/MD, pdf.js extraction) + 10 images (Gemini Vision analysis) injected into all AI generation pipelines
  • Persistence: IndexedDB for binary data (audio, docs) + localStorage auto-save every 5 seconds with full session restore
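
The Rolling Summary step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names splitIntoSegments, rollingSummary and callModel are hypothetical, the 10K-character segment size is an assumption, and callModel stands in for whatever function sends a prompt to the LLM. Only the 30K-character threshold comes from the description above.

```javascript
// Hypothetical names throughout; only the 30K threshold is from the write-up.
const ROLLING_THRESHOLD = 30_000; // chars above which rolling mode kicks in
const SEGMENT_SIZE = 10_000;      // assumed segment size

// Split text into segments of at most `size` chars, preferring to break
// at the last space inside each window so words are not cut in half.
function splitIntoSegments(text, size = SEGMENT_SIZE) {
  const segments = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const lastSpace = text.lastIndexOf(" ", end);
      if (lastSpace > start) end = lastSpace;
    }
    segments.push(text.slice(start, end).trim());
    start = end;
  }
  return segments.filter((s) => s.length > 0);
}

// callModel is an injected async function: ({ stage, context, text }) => string.
async function rollingSummary(transcript, callModel) {
  if (transcript.length <= ROLLING_THRESHOLD) {
    // Short transcript: a single pass is enough.
    return callModel({ stage: "final", context: "", text: transcript });
  }
  let running = "";
  for (const segment of splitIntoSegments(transcript)) {
    // Each call sees the summary so far plus the next segment.
    running = await callModel({ stage: "segment", context: running, text: segment });
  }
  // Final pass turns the accumulated summary into the finished minutes.
  return callModel({ stage: "refine", context: "", text: running });
}
```

Passing the model call in as a parameter keeps the chunking logic independent of any particular provider and makes it easy to exercise with a stub.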

The Solution

Transcriptor processes meeting audio through a multi-stage pipeline — transcription → structured minutes → psychological analysis → conclusions → retrospective — with image generation and TTS at each stage:

  • 7 presets: Corporativo (minutes + dynamics + action plan), Narrativo (story + characters + moral), Educativo (concepts + examples + takeaways), Motivacional, Creativo, Documental, Autoanálisis; each has 3 custom content blocks and its own image style
  • Psychological analysis: written from the perspective of a professional psychologist-psychiatrist; covers communication styles (assertive/passive/aggressive), group roles (leader/facilitator/critic), emotional tone mapping, latent conflicts, coaching recommendations, burnout/demotivation alerts
  • Conclusions module: structured JSON with strengths, areas for improvement, recommendations, an action plan with timeframes (short/medium/long) and priorities (high/medium/low), and a final motivational reflection
  • Retrospective chat: WhatsApp-style AI chat with full session context (3K chars transcription + 2.5K summary + 2.5K analysis), voice input via Whisper, auto-TTS toggle, 10-message history
  • Image generation: Gemini Imagen 4 + FLUX.1, auto-mode extracts 3-8 key points, preset-styled prompts, minutes + psychology categories, "Analyze Content" for optimized prompts
  • 4 TTS providers: Browser (Web Speech API), gTTS (7 languages), MeloTTS (6 languages), ElevenLabs (custom cloned voices), with smart chunking at paragraph/sentence/comma boundaries
  • 4 export formats: Markdown, PDF (print-styled HTML), Interactive HTML Book (chapter-based with dark/light toggle, ToC), .perspectiva (full JSON with base64 audio/images)
  • Live recording: in-browser MediaRecorder with pause/resume, timer, format negotiation (webm/mp4/ogg), multi-file sequential processing
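
The smart chunking used for TTS (splitting at paragraph, sentence, then comma boundaries) might look something like the sketch below. The function names findCut and chunkForTts, the 200-character default limit, and the extra fallbacks to any whitespace and then a hard cut are assumptions for illustration; the source only states the three boundary types.

```javascript
// Hypothetical helpers; maxLen is an assumed per-request character limit.

// Find where to cut within the first maxLen characters, trying
// boundaries in priority order and taking the last match of the
// first pattern that matches at all.
function findCut(text, maxLen) {
  const windowText = text.slice(0, maxLen);
  const patterns = [
    /\n\s*\n/g,  // paragraph break
    /[.!?]\s/g,  // sentence end
    /,\s/g,      // comma
    /\s/g,       // any whitespace (assumed fallback)
  ];
  for (const pattern of patterns) {
    let cut = -1;
    for (const match of windowText.matchAll(pattern)) {
      cut = match.index + match[0].length; // cut just after the delimiter
    }
    if (cut > 0) return cut;
  }
  return maxLen; // no boundary found: hard cut (assumed fallback)
}

function chunkForTts(text, maxLen = 200) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const cut = findCut(rest, maxLen);
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

// Example: a 30-character limit forces one sentence cut and one comma cut.
console.log(
  chunkForTts("First sentence here. Second one, with a comma, runs longer. Third.", 30)
);
```

One trade-off in this version: a paragraph break always wins, even when it sits early in the window and yields a short chunk. A production implementation might weigh boundary type against chunk length instead.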

Key Results

  • ~15K lines of vanilla JS (3.2x larger than originally claimed)
  • 3 STT providers: Groq Whisper, ElevenLabs Scribe (diarization), OpenAI Whisper
  • 7 configurable presets with 3 content blocks + image style each
  • Psychological profiling: communication styles, group dynamics, emotional analysis, coaching
  • Retrospective chat: WhatsApp-style with full session context and voice input
  • 4 TTS providers with smart chunking and custom cloned voices
  • Image generation: Imagen 4 + FLUX with auto-prompt extraction
  • 4 export formats: Markdown, PDF, Interactive HTML Book, .perspectiva
  • Context enrichment: 5 documents (PDF/TXT/MD) + 10 images (Gemini Vision)
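
As an illustration of the retrospective chat's "full session context", the budget described earlier (3K characters of transcription, 2.5K of summary, 2.5K of analysis, and a 10-message history) could be assembled along these lines. This is a sketch under assumptions: buildChatContext, the session field names, the prompt wording, and the choice to keep the head of each text rather than the tail are all hypothetical, as is treating the 10-message limit as excluding the new question.

```javascript
// Character budgets and history size from the description; names are hypothetical.
const BUDGET = { transcription: 3000, summary: 2500, analysis: 2500 };
const MAX_HISTORY = 10;

// Keep at most `limit` characters from the start of the text (assumed policy).
function clip(text, limit) {
  return text.length > limit ? text.slice(0, limit) : text;
}

function buildChatContext(session, history, question) {
  // One labelled section per budgeted field; missing fields become empty.
  const sections = Object.entries(BUDGET)
    .map(([key, limit]) => `## ${key}\n${clip(session[key] ?? "", limit)}`)
    .join("\n\n");

  return {
    system: `Answer questions about this meeting.\n\n${sections}`,
    // Only the most recent messages are sent, followed by the new question.
    messages: [...history.slice(-MAX_HISTORY), { role: "user", content: question }],
  };
}
```

Fixed per-section budgets keep the prompt size predictable no matter how long the meeting was, at the cost of dropping whatever falls outside each clipped window.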

Tech Stack

  • Vanilla JS (~15K lines)
  • Gemini 2.5
  • ElevenLabs Scribe
  • Groq Whisper
  • OpenAI Whisper
  • FLUX
  • Imagen 4

$ cat project.json
{
"name": "Transcriptor",
"status": "production",
"stack": [7],
"results": [9]
}