AI & LLMs · November 17, 2025 · 8 min read

Transcriptor: From a Meeting Recording to Structured Minutes, Psychological Analysis, and an AI Retrospective Chat


Gonzalo Monzón

Founder & Lead Architect

What happens after a meeting ends? Usually, nothing. Maybe someone sends a partial summary. Action items get forgotten. That one crucial decision gets attributed to the wrong person a week later. We built Transcriptor to fix that — a tool that takes a raw audio recording and outputs structured minutes, per-participant psychological analysis, a retrospective AI chat, summary images, and narrated conclusions. All powered by ElevenLabs Scribe, Gemini 2.5, and FLUX, running in 4,500 lines of zero-dependency vanilla JavaScript.

The Pipeline: Audio In, Documentation Out

Transcriptor runs a 7-stage pipeline on every meeting recording:

| Stage | What It Does | Powered By |
| --- | --- | --- |
| 1. Transcription | Speech-to-text with speaker diarization — who said what, when | ElevenLabs Scribe |
| 2. Structured Minutes | Formal meeting minutes: attendees, topics, decisions, action items with owners and deadlines | Gemini 2.5 |
| 3. Psychological Analysis | Communication style, participation level, emotional tone, and influence analysis per participant | Gemini 2.5 |
| 4. Conclusions Module | Key takeaways, identified risks, detected opportunities | Gemini 2.5 |
| 5. Retrospective Chat | WhatsApp-style interface to ask anything about the meeting — AI responds with full context | Gemini 2.5 |
| 6. Summary Image | AI-generated visual representation of meeting key points | FLUX |
| 7. TTS Narration | Audio narration of the executive summary with smart chunking for long texts | ElevenLabs |

Upload an audio file. Get a complete meeting documentation package. Every stage feeds into the next — the transcription feeds the minutes, the minutes feed the analysis, everything feeds the retrospective chat.
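That chaining can be pictured as a simple sequential driver. This is an illustrative sketch, not Transcriptor's actual code — the function names and the shape of the `stages` object are assumptions:

```javascript
// Illustrative sketch of the 7-stage chaining. Each stage consumes the output
// of earlier stages; the retrospective chat context receives everything.
async function runPipeline(audioFile, stages) {
  const transcription = await stages.transcribe(audioFile);           // 1. STT + diarization
  const minutes       = await stages.minutes(transcription);          // 2. structured minutes
  const analysis      = await stages.analyze(transcription, minutes); // 3. psychological analysis
  const conclusions   = await stages.conclude(transcription, minutes);// 4. conclusions module
  const chatContext   = { transcription, minutes, analysis, conclusions }; // 5. chat context
  const image         = await stages.image(conclusions);              // 6. FLUX summary image
  const narration     = await stages.narrate(conclusions);            // 7. TTS narration
  return { chatContext, image, narration };
}
```

The key property is that nothing downstream ever re-reads the audio: once Stage 1 produces the diarized transcript, every later stage works from text.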

Stage 1: Transcription with Speaker Diarization

Getting words from audio is the easy part. Knowing who said them — that's the hard part. Transcriptor uses ElevenLabs Scribe as the primary STT engine because it handles diarization natively:

  • Speaker identification — each segment is tagged with a speaker ID (Speaker 1, Speaker 2, etc.)
  • Timestamps per segment — precise timing for every utterance
  • Multi-language — Spanish, English, and more
  • Whisper fallback — OpenAI Whisper handles languages or edge cases Scribe doesn't support

The diarization quality matters because everything downstream depends on it. When the psychological analysis says "Speaker 2 dominated the conversation," it needs to actually be Speaker 2.
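Internally, a diarized transcript is just a list of timestamped, speaker-tagged segments. The sketch below shows one common cleanup step on such a list — merging consecutive segments from the same speaker into readable turns. The field names are illustrative and may differ from Scribe's actual response format:

```javascript
// Merge consecutive segments spoken by the same speaker into one turn.
// Segment shape (assumed): { speaker, text, start, end } with times in seconds.
function mergeBySpeaker(segments) {
  const merged = [];
  for (const seg of segments) {
    const last = merged[merged.length - 1];
    if (last && last.speaker === seg.speaker) {
      last.text += ' ' + seg.text; // same speaker kept talking: extend the turn
      last.end = seg.end;
    } else {
      merged.push({ ...seg });     // speaker changed: start a new turn
    }
  }
  return merged;
}
```

Downstream stages then see "Speaker 2: full sentence" rather than a dozen half-second fragments, which keeps the prompts to Gemini compact.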

Stage 2: Structured Minutes via AI

Gemini 2.5 takes the full diarized transcription and outputs formal meeting minutes:

  • Attendees — identified from speech patterns and mentions, with participation level
  • Topics discussed — automatically grouped and numbered
  • Decisions made — extracted from conversational context ("So we agreed to...")
  • Action items — task, responsible person, deadline — pulled from natural conversation
  • Next steps — follow-up meeting dates, pending reviews

The minutes are editable. If the AI attributes a decision to the wrong person, you fix it inline. But in practice, with good diarization, the accuracy is surprisingly high.
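The "structured output" here means the model is constrained to a JSON shape rather than free text. A plausible schema for the minutes stage might look like the following — the exact field names are assumptions, not Transcriptor's actual schema:

```javascript
// A plausible response schema for the minutes stage, in the JSON-Schema style
// that Gemini's structured-output mode accepts. Field names are illustrative.
const minutesSchema = {
  type: 'object',
  properties: {
    attendees: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          participationLevel: { type: 'string' },
        },
      },
    },
    topics: { type: 'array', items: { type: 'string' } },
    decisions: { type: 'array', items: { type: 'string' } },
    actionItems: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          task: { type: 'string' },
          owner: { type: 'string' },
          deadline: { type: 'string' },
        },
      },
    },
    nextSteps: { type: 'array', items: { type: 'string' } },
  },
  required: ['attendees', 'topics', 'decisions', 'actionItems'],
};
```

Constraining the output this way is what makes the minutes renderable as an editable document: every action item is guaranteed to arrive as task/owner/deadline, never as a loose paragraph.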

Stage 3: Psychological Analysis

This is the feature that makes team leads lean forward. For each participant, Gemini analyzes:

| Dimension | What's Measured |
| --- | --- |
| Communication Style | Direct, collaborative, passive, dominant — how does this person express ideas? |
| Participation Level | % of speaking time, frequency of interventions, initiative vs. reactive |
| Emotional Tone | Positive, neutral, negative, anxious, enthusiastic — per topic and overall |
| Interactions | Who responds to whom, alliances, tensions, who gets interrupted |
| Influence | Who generates most agreement/disagreement, who shifts the conversation direction |

The psychological module doesn't diagnose — it surfaces patterns. A manager might discover that a quieter team member actually introduces the ideas that get adopted but doesn't fight for credit. Or that two people consistently talk past each other on project timeline topics. These patterns are invisible in real time but become obvious when the AI maps them out.
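The quantitative half of this table falls straight out of the diarized segments. A sketch of how participation metrics could be computed before handing them to the model (illustrative, not the tool's exact computation):

```javascript
// Per-speaker speaking time, intervention count, and share of total talk time,
// derived from diarized segments shaped { speaker, start, end } (seconds).
function participationStats(segments) {
  const totals = {};
  let totalTime = 0;
  for (const { speaker, start, end } of segments) {
    const dur = end - start;
    totalTime += dur;
    totals[speaker] = totals[speaker] || { speakingTime: 0, interventions: 0 };
    totals[speaker].speakingTime += dur;
    totals[speaker].interventions += 1;
  }
  for (const stats of Object.values(totals)) {
    stats.share = totalTime ? stats.speakingTime / totalTime : 0;
  }
  return totals;
}
```

The qualitative dimensions (tone, alliances, influence) are the model's job; these hard numbers just keep the model's claims about "who dominated" anchored to measurable speaking time.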

Stage 4: The Retrospective Chat

A WhatsApp-style chat interface where you can ask anything about the meeting after it's over:

  • "What did María say about the budget?"
  • "Was any decision made about the launch?"
  • "Who proposed the partnership idea?"
  • "Summarize the 3 most important points"
  • "What was the emotional tone during the timeline discussion?"

The AI has full context: the raw transcription, the structured minutes, and the psychological analysis. So it can answer both factual questions ("What was decided?") and analytical ones ("Was there tension between Juan and María?"). It's like having a perfect memory of every meeting you've ever had.
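Giving the chat "full context" can be as simple as concatenating the three artifacts into one labeled system context. A minimal sketch — the section labels are illustrative, not Transcriptor's actual prompt format:

```javascript
// Assemble the retrospective chat's grounding context from the three artifacts.
// All inputs are plain strings; section headers help the model cite the right source.
function buildChatContext({ transcription, minutes, analysis }) {
  return [
    '## TRANSCRIPT', transcription,
    '## MINUTES', minutes,
    '## PARTICIPANT ANALYSIS', analysis,
  ].join('\n\n');
}
```

Because the transcript carries timestamps and speaker tags, factual questions resolve against the transcript section, while analytical questions lean on the analysis section.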

Stages 5-6: Summary Image & TTS Narration

Two output formats for different consumption styles:

  • Summary image — FLUX generates a visual infographic-style representation of the meeting's key points. Useful for sharing in Slack or embedding in documentation
  • TTS narration — ElevenLabs narrates the executive summary with a professional voice. Smart chunking splits long summaries into manageable audio segments. Downloadable as MP3 for commute listening
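The "smart chunking" idea is to split at sentence boundaries so no audio segment cuts a sentence in half. A sketch under assumed constraints (the character limit here is illustrative, not ElevenLabs' actual limit):

```javascript
// Split long text into TTS-sized chunks without breaking mid-sentence.
// Sentences are detected naively by terminal punctuation; a sentence longer
// than maxChars is kept whole rather than cut.
function chunkForTTS(text, maxChars = 2500) {
  const sentences = text.match(/[^.!?]+[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxChars) {
      chunks.push(current.trim()); // flush before the limit is crossed
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Each chunk is then narrated separately and the resulting segments are concatenated (or played back-to-back), which is what makes arbitrarily long executive summaries narratable.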

The Interface: 7 Panels

| Panel | Function |
| --- | --- |
| Upload | Drop or select audio recording |
| Transcription | Timeline view with speakers color-coded |
| Minutes | Structured document — editable |
| Analysis | Cards per participant with metrics and insights |
| Chat | WhatsApp-style retrospective interface |
| Image | AI-generated summary visual |
| Audio | TTS narration player with download |

Technical Details

| Metric | Value |
| --- | --- |
| Codebase | 4,500+ lines of vanilla JavaScript — zero framework dependencies |
| STT Providers | 2 — ElevenLabs Scribe (primary), OpenAI Whisper (fallback) |
| AI for Minutes | Gemini 2.5 — structured output via function calling |
| Image Generation | FLUX via Workers AI |
| TTS | ElevenLabs — professional voice, MP3 output |
| Dependencies | Zero — vanilla JS + CSS only |

Key Takeaways

1. Diarization is the foundation everything else depends on. Speaker identification quality determines the accuracy of minutes, psychological analysis, and retrospective answers. ElevenLabs Scribe's native diarization was the breakthrough — previous attempts with Whisper-only pipelines required a separate diarization step that introduced errors.

2. Psychological analysis from meeting transcripts surfaces invisible team dynamics. Managers are often surprised by what the analysis reveals: who actually introduces the ideas that get adopted, who dominates unproductively, which topics create tension. These patterns are invisible in real-time but obvious when mapped by AI.

3. Retrospective chat turns meetings into searchable knowledge. The ability to ask "What did we decide about X three meetings ago?" and get an accurate answer transforms how teams track decisions. No more scrolling through Slack or searching email threads.

4. Multi-format output matches different consumption styles. Some people read minutes. Some prefer a visual summary to share. Some listen to the audio narration during their commute. Generating all formats automatically means the meeting documentation actually gets consumed.

5. 4,500 lines of vanilla JS proves frameworks aren't always the answer. No React, no Vue, no build step. The entire tool is vanilla JavaScript and CSS. For an internal tool with well-defined scope, framework overhead would add complexity without proportional benefit. Fast to build, fast to iterate, zero dependency maintenance.

Tags

Meeting AI Transcription Diarization Team Analytics Vanilla JS ElevenLabs

About the Author

Gonzalo Monzón

Founder & Lead Architect

Gonzalo Monzón is a Senior Solutions Architect & AI Engineer with over 26 years building mission-critical systems in Healthcare, Industrial Automation, and enterprise AI. Founder of Cadences Lab, he specializes in bridging legacy infrastructure with cutting-edge technology.
