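feat: voiceover narration pipeline · L2 long-form narrated video capability

Adds an end-to-end pipeline that writes the narration first and then drives the visuals from the measured audio duration.
It targets 5-20 minute concept explainers, tutorials, and long-form educational videos.

Core iron rule: the whole film is one continuous motion narrative, not a set of independent scenes.
Failure mode #1: every scene has its own layout, cues use fade-up, and scene changes toggle
whole-page opacity. That is a PowerPoint with a voiceover, and the polish drops to zero.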

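New scripts (4):
  - scripts/tts-doubao.mjs          Doubao openspeech TTS (single text segment → mp3 + measured duration)
  - scripts/narrate-pipeline.mjs    L2 orchestrator (narration script .md → voiceover.mp3 + timeline.json)
  - scripts/mix-voiceover.sh        ffmpeg mixing (video + voice + optional BGM with ducking)
  - scripts/render-narration.sh     one-shot run (HTML recording + mixing → final publishable MP4)

New asset:
  - assets/narration_stage.jsx      NarrationStage + Scene + Cue + useNarration
                                    + useSceneFade + Subtitles + splitChunkToLines
                                    iron-rule warning banner in an ASCII box at the top

New reference:
  - references/voiceover-pipeline.md  4 iron rules (continuous narrative / no hard cuts / motion in every frame / easing+stagger+hold)
                                       + narration script format + timeline schema (including chunks)
                                       + NarrationStage API + subtitle rules (Bilibili style, ≤12 characters per line, never split across a full stop)
                                       + standard 10-step workflow + error handling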
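
For orientation, the timeline shape can be sketched from the inline TIMELINE in demos/md-html-narration/md-html-demo.html. The field names below are copied from that demo; the values are shortened, and the `checkTimeline` helper is hypothetical (it is not part of this commit and is not claimed to match what narrate-pipeline.mjs does).

```javascript
// Minimal timeline in the shape used by the md-html demo (values shortened).
const timeline = {
  title: 'demo',
  totalDuration: 5.0,
  voiceover: 'voiceover.mp3',
  scenes: [
    {
      id: 'opening', start: 0, end: 2.0, duration: 2.0,
      chunks: [{ text: 'Hello.', absoluteStart: 0, absoluteEnd: 2.0 }],
      cues: [{ id: 'hello', absoluteTime: 0.5 }],
    },
    {
      id: 'closing', start: 2.5, end: 5.0, duration: 2.5,
      chunks: [{ text: 'Bye.', absoluteStart: 2.5, absoluteEnd: 5.0 }],
      cues: [],
    },
  ],
};

// Hypothetical sanity check: returns a list of human-readable problems.
function checkTimeline(tl) {
  const problems = [];
  const seenCues = new Set();
  let prevEnd = 0;
  for (const s of tl.scenes) {
    if (s.start < prevEnd) problems.push(`scene ${s.id} overlaps the previous scene`);
    if (s.end > tl.totalDuration) problems.push(`scene ${s.id} ends after totalDuration`);
    for (const c of s.chunks || []) {
      if (c.absoluteStart < s.start || c.absoluteEnd > s.end) {
        problems.push(`chunk "${c.text}" falls outside scene ${s.id}`);
      }
    }
    for (const q of s.cues || []) {
      // NarrationStage keys cues by id across all scenes, so ids must be unique.
      if (seenCues.has(q.id)) problems.push(`duplicate cue id ${q.id}`);
      seenCues.add(q.id);
    }
    prevEnd = s.end;
  }
  return problems;
}

console.log(checkTimeline(timeline)); // []
```

The duplicate-id check is the one most tied to this commit: NarrationStage flattens all cues into a single map keyed by `id`, so a repeated id in a later scene would silently replace the earlier cue.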

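SKILL.md: iron rules reinforced in 6 places:
  - description gains the trigger phrases voiceover / narration / 带解说的动画 ("narrated animation") / 5分钟讲清楚什么是 XX ("explain what XX is in 5 minutes")
  - workflow gains Step 9.5 (when narration is involved, follow voiceover-pipeline; 4 sub-steps including a 🛑 checkpoint)
  - the anti-AI-slop table gains a row "animation that switches like PowerPoint"
  - the References routing table gains "long narrated animation"
  - the end of the core reminders stresses that "this rule cannot be repeated too often"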

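Subtitle system (Bilibili style):
  - dark ink text #1a1a1a on a light paper-white background, with a multi-layer white halo and no background box
  - 32px sans-serif, bottom 90px (not flush with the edge)
  - splitChunkToLines algorithm: split into sentences at 。!? first, then merge at ,、;: into lines of ≤13 (12 characters plus one punctuation mark);
    in mixed Chinese/English text an English character counts as 0.5; a line never crosses a full stop
  - timeline.json gains a chunks field (each sub-segment carries absoluteStart/absoluteEnd), so each subtitle chunk is shown at its
    measured TTS time rather than a character-count estimate (only the lines inside one chunk are apportioned by visual length)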
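
The per-line timing inside one chunk can be sketched as a pure function. This mirrors the proportional allocation done inside the Subtitles component in assets/narration_stage.jsx; `visualLen` is copied from that file, while the name `allocateLineWindows` and the sample input are illustrative assumptions.

```javascript
// Visual width: ASCII letters, digits and common punctuation count as half a character.
function visualLen(s) {
  let n = 0;
  for (const ch of s) n += /[a-zA-Z0-9 .,'":;\-]/.test(ch) ? 0.5 : 1;
  return n;
}

// Split a chunk's measured [start, end) window across its subtitle lines,
// in proportion to each line's visual width.
function allocateLineWindows(lines, start, end) {
  const total = lines.reduce((sum, l) => sum + visualLen(l), 0);
  let acc = start;
  return lines.map((text) => {
    const dur = (visualLen(text) / total) * (end - start);
    const win = { text, start: acc, end: acc + dur };
    acc += dur;
    return win;
  });
}

// A 6-second chunk split into a short and a long line: the long line gets twice the time.
const windows = allocateLineWindows(['abcd', 'abcdefgh'], 10, 16);
console.log(windows);
```

The chunk boundaries come from measured audio, so any drift from this estimate is confined to one chunk and resets at the next `absoluteStart`.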

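Demo evidence:
  - demos/md-html-narration/  3 min 21 s "md vs html" narration, 7 scenes, 21 cues;
    the two strings "md" and "html" act as heroes across scenes, showing that the iron rules are workable
  - demos/voiceover-demo/     30-second "what is a token" minimal check

darwin-skill evaluation: 80.8 → 86.3 (+5.5); P0 optimizations A1+A2 have landed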

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alchain 1 month ago
parent
commit
934fee0ab4

+ 6 - 0
.env.example

@@ -0,0 +1,6 @@
+# Doubao speech TTS (Volcano Engine openspeech)
+# Apply for credentials at: https://console.volcengine.com/speech
+DOUBAO_TTS_API_KEY=your_api_key_here
+DOUBAO_TTS_VOICE_ID=your_clone_voice_id_here
+DOUBAO_TTS_CLUSTER=volcano_icl
+DOUBAO_TTS_ENDPOINT=https://openspeech.bytedance.com/api/v1/tts
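
Review note: a script that consumes these variables could fail early when a value is missing or still a placeholder. The sketch below is an assumption about how that might look; `loadTtsConfig` is a hypothetical helper, and nothing here is claimed to match what scripts/tts-doubao.mjs actually does.

```javascript
// Hypothetical helper: collect the four variables from .env.example and
// fail early when one is missing or still holds its placeholder value.
const REQUIRED = [
  'DOUBAO_TTS_API_KEY',
  'DOUBAO_TTS_VOICE_ID',
  'DOUBAO_TTS_CLUSTER',
  'DOUBAO_TTS_ENDPOINT',
];

function loadTtsConfig(env) {
  const missing = REQUIRED.filter((k) => !env[k] || /^your_.*_here$/.test(env[k]));
  if (missing.length > 0) {
    throw new Error(`Missing TTS settings: ${missing.join(', ')}`);
  }
  return Object.fromEntries(REQUIRED.map((k) => [k, env[k]]));
}

// Usage with an explicit object (dummy credentials); a real script would pass process.env.
const cfg = loadTtsConfig({
  DOUBAO_TTS_API_KEY: 'example-key',
  DOUBAO_TTS_VOICE_ID: 'example-voice',
  DOUBAO_TTS_CLUSTER: 'volcano_icl',
  DOUBAO_TTS_ENDPOINT: 'https://openspeech.bytedance.com/api/v1/tts',
});
console.log(cfg.DOUBAO_TTS_CLUSTER); // volcano_icl
```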

+ 8 - 0
.gitignore

@@ -9,6 +9,14 @@
 # Personal asset index (real personal data; only the .example.json template is kept)
 assets/personal-asset-index.json
 
+# Environment variables (API keys and other secrets; only the .env.example template is kept)
+.env
+.env.local
+
+# Voiceover working directories (TTS mp3 and timeline.json intermediates; can be regenerated)
+**/_narration/
+**/_narration_*/
+
 # Node / editor / OS
 node_modules/
 *.swp

File diff suppressed because it is too large
+ 0 - 1
SKILL.md


+ 470 - 0
assets/narration_stage.jsx

@@ -0,0 +1,470 @@
+/**
+ * narration_stage.jsx · narration-driven Stage
+ *
+ * ╔══════════════════════════════════════════════════════════════════╗
+ * ║  🛑 Read before use: references/voiceover-pipeline.md            ║
+ * ║                                                                  ║
+ * ║  Rule #1: one continuous motion narrative, not independent scenes║
+ * ║           You are not making 7 slides. You are directing 1 movie.║
+ * ║                                                                  ║
+ * ║  Rule #2: pick a hero that spans all scenes; no per-scene layout ║
+ * ║                                                                  ║
+ * ║  Rule #3: no hard cuts between scenes (opacity 1→0/0→1)          ║
+ * ║           Morph, don't cut.                                      ║
+ * ║                                                                  ║
+ * ║  Failure mode #1 (hit in practice by v1 of this skill):          ║
+ * ║           per-Scene layouts + fade-up cues + whole-page opacity  ║
+ * ║           scene switches = PowerPoint with a voiceover = no polish║
+ * ║                                                                  ║
+ * ║  Do this: hero = direct child of <NarrationStage>, not in a Scene║
+ * ║           read time/scene/cue state in hero via useNarration()   ║
+ * ║           hero shape follows time → continuous cross-scene motion║
+ * ╚══════════════════════════════════════════════════════════════════╝
+ *
+ * Usage (inlined into the HTML's <script type="text/babel">):
+ *   const { NarrationStage, Scene, Cue, useNarration } = NarrationStageLib;
+ *
+ *   const App = () => (
+ *     <NarrationStage timeline={TIMELINE} audioSrc="voiceover.mp3"
+ *                     width={1920} height={1080}>
+ *       <Scene id="intro">
+ *         <h1>What is a token</h1>
+ *         <Cue id="question">
+ *           {(triggered) => triggered && <p>↑ This is the question</p>}
+ *         </Cue>
+ *       </Scene>
+ *       <Scene id="token-2">
+ *         <Cue id="split">
+ *           {(triggered, progress) => (
+ *             <div style={{opacity: triggered ? 1 : 0.3}}>...</div>
+ *           )}
+ *         </Cue>
+ *       </Scene>
+ *     </NarrationStage>
+ *   );
+ *
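+ * Time source (chosen automatically):
+ *   - Recording mode (window.__recording === true): a rAF wall clock that starts at 0; the driver can reposition it with window.__seek(t)
+ *   - Live mode: the <audio> element's currentTime (visuals stay locked to the audio while it plays)
+ *
+ * Compatibility with render-video.js:
+ *   - sets window.__ready = true once the stage has mounted
+ *   - when window.__recording is set, no <audio> is rendered and time comes from the rAF clock
+ *   - exposes window.__totalDuration so the driver can compute the total frame count
+ *
+ * Dependencies: React 18 + ReactDOM 18 + Babel standalone (same as animations.jsx)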
+ */
+
+const NarrationStageLib = (() => {
+  const NarrationContext = React.createContext({
+    time: 0,
+    scene: null,
+    sceneTime: 0,
+    isCueTriggered: () => false,
+    cueProgress: () => 0,
+  });
+
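+  /**
+   * Main component: takes the timeline and the audio, and provides the context
+   *
+   * Props:
+   *   timeline       timeline.json object (required)
+   *   audioSrc       path to voiceover.mp3 (required)
+   *   width/height   Stage size, default 1920x1080
+   *   background     default '#0e0e0e'
+   *   controls       whether to show the bottom playback bar, default true
+   *   children       animation content (organized with <Scene>/<Cue>)
+   */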
+  function NarrationStage({
+    timeline,
+    audioSrc,
+    width = 1920,
+    height = 1080,
+    background = '#0e0e0e',
+    controls = true,
+    children,
+  }) {
+    const audioRef = React.useRef(null);
+    const [time, setTime] = React.useState(0);
+    const [playing, setPlaying] = React.useState(false);
+    const recording = typeof window !== 'undefined' && window.__recording === true;
+
+    // Expose duration and readiness to render-video.js
+    React.useEffect(() => {
+      if (typeof window === 'undefined') return;
+      window.__totalDuration = timeline.totalDuration;
+      window.__ready = true;
+    }, [timeline.totalDuration]);
+
+    // Time tick
+    React.useEffect(() => {
+      let raf;
+      if (recording) {
+        // Recording mode: a rAF wall clock drives time, starting from 0
+        // Compatible with render-video.js (it relies on the animation advancing on its own + window.__seek to reset)
+        let startedAt = null;
+        const tick = (now) => {
+          if (startedAt === null) startedAt = now;
+          setTime(Math.min((now - startedAt) / 1000, timeline.totalDuration));
+          raf = requestAnimationFrame(tick);
+        };
+        raf = requestAnimationFrame(tick);
+        // Expose __seek so render-video.js can call __seek(0) to reset once ready
+        if (typeof window !== 'undefined') {
+          window.__seek = (t) => {
+            startedAt = performance.now() - t * 1000;
+            setTime(t);
+          };
+        }
+      } else {
+        // Live mode: follow audio.currentTime
+        const tick = () => {
+          if (audioRef.current && !audioRef.current.paused) {
+            setTime(audioRef.current.currentTime);
+          }
+          raf = requestAnimationFrame(tick);
+        };
+        tick();
+      }
+      return () => cancelAnimationFrame(raf);
+    }, [recording, timeline.totalDuration]);
+
+    // Current scene
+    const currentScene = React.useMemo(() => {
+      if (!timeline.scenes) return null;
+      // Pick the scene with start <= time < next.start (a gap between scenes stays with the earlier one); the last scene holds to the end
+      for (let i = 0; i < timeline.scenes.length; i++) {
+        const s = timeline.scenes[i];
+        const next = timeline.scenes[i + 1];
+        if (time >= s.start && (!next || time < next.start)) return s;
+      }
+      return timeline.scenes[0];
+    }, [time, timeline.scenes]);
+
+    const sceneTime = currentScene ? Math.max(0, time - currentScene.start) : 0;
+
+    // Cue lookup (compared by absoluteTime, so it also works across scenes)
+    const allCues = React.useMemo(() => {
+      const map = {};
+      for (const s of timeline.scenes || []) {
+        for (const c of s.cues || []) {
+          map[c.id] = c;
+        }
+      }
+      return map;
+    }, [timeline.scenes]);
+
+    const isCueTriggered = React.useCallback(
+      (cueId) => {
+        const c = allCues[cueId];
+        if (!c) return false;
+        return time >= c.absoluteTime;
+      },
+      [allCues, time],
+    );
+
+    /** Ramps 0→1 over `ramp` seconds after the cue fires, then stays at 1. Use it for ease-in animation after a cue. */
+    const cueProgress = React.useCallback(
+      (cueId, ramp = 0.5) => {
+        const c = allCues[cueId];
+        if (!c) return 0;
+        const dt = time - c.absoluteTime;
+        if (dt <= 0) return 0;
+        if (dt >= ramp) return 1;
+        return dt / ramp;
+      },
+      [allCues, time],
+    );
+
+    const ctx = { time, scene: currentScene, sceneTime, isCueTriggered, cueProgress, timeline };
+
+    // Play / pause / seek controls
+    const handlePlayPause = () => {
+      if (!audioRef.current) return;
+      if (audioRef.current.paused) {
+        audioRef.current.play();
+        setPlaying(true);
+      } else {
+        audioRef.current.pause();
+        setPlaying(false);
+      }
+    };
+
+    const handleSeek = (e) => {
+      if (!audioRef.current) return;
+      const t = parseFloat(e.target.value);
+      audioRef.current.currentTime = t;
+      setTime(t);
+    };
+
+    const handleAudioEnded = () => setPlaying(false);
+
+    return (
+      <NarrationContext.Provider value={ctx}>
+        <div
+          style={{
+            position: 'relative',
+            width,
+            height,
+            background,
+            overflow: 'hidden',
+            color: '#fff',
+            fontFamily: '-apple-system, BlinkMacSystemFont, "PingFang SC", sans-serif',
+          }}
+        >
+          {children}
+        </div>
+        {!recording && (
+          <audio
+            ref={audioRef}
+            src={audioSrc}
+            preload="auto"
+            onEnded={handleAudioEnded}
+          />
+        )}
+        {!recording && controls && (
+          <div
+            style={{
+              display: 'flex',
+              alignItems: 'center',
+              gap: 12,
+              padding: '12px 16px',
+              background: '#1a1a1a',
+              color: '#ddd',
+              fontFamily: 'monospace',
+              fontSize: 13,
+              width,
+              boxSizing: 'border-box',
+            }}
+          >
+            <button
+              onClick={handlePlayPause}
+              style={{
+                padding: '6px 14px',
+                background: '#fff',
+                color: '#000',
+                border: 0,
+                borderRadius: 4,
+                cursor: 'pointer',
+                fontWeight: 600,
+              }}
+            >
+              {playing ? '❚❚ Pause' : '▶ Play'}
+            </button>
+            <input
+              type="range"
+              min={0}
+              max={timeline.totalDuration}
+              step={0.01}
+              value={time}
+              onChange={handleSeek}
+              style={{ flex: 1 }}
+            />
+            <span style={{ minWidth: 110, textAlign: 'right' }}>
+              {time.toFixed(2)} / {timeline.totalDuration.toFixed(2)}s
+            </span>
+            <span
+              style={{
+                padding: '4px 10px',
+                background: '#2a2a2a',
+                borderRadius: 4,
+                minWidth: 100,
+                textAlign: 'center',
+              }}
+            >
+              {currentScene ? currentScene.id : '—'}
+            </span>
+          </div>
+        )}
+      </NarrationContext.Provider>
+    );
+  }
+
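+  /**
+   * Scene wrapper: renders children only while the given scene id is active
+   *
+   * Props:
+   *   id        scene id (matches timeline.scenes[].id)
+   *   children  content to render; a ReactNode or (sceneTime, sceneInfo) => ReactNode
+   *   keepMounted default false. When true, stays mounted and only toggles opacity with a 0.2s transition (use when the animation needs continuity)
+   */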
+  function Scene({ id, children, keepMounted = false }) {
+    const { scene, sceneTime } = React.useContext(NarrationContext);
+    const isActive = scene && scene.id === id;
+    if (!isActive && !keepMounted) return null;
+    const content = typeof children === 'function' ? children(sceneTime, scene) : children;
+    return (
+      <div
+        style={{
+          position: 'absolute',
+          inset: 0,
+          opacity: isActive ? 1 : 0,
+          pointerEvents: isActive ? 'auto' : 'none',
+          transition: keepMounted ? 'opacity 0.2s' : undefined,
+        }}
+      >
+        {content}
+      </div>
+    );
+  }
+
+  /**
+   * Cue wrapper: tracks whether a cue has fired
+   *
+   * Props:
+   *   id        cue id (matches timeline.scenes[].cues[].id)
+   *   ramp      duration in seconds of the progress 0→1 ramp after the cue fires, default 0.5
+   *   children  must be a function: (triggered: bool, progress: 0-1) => ReactNode
+   */
+  function Cue({ id, ramp = 0.5, children }) {
+    const { isCueTriggered, cueProgress } = React.useContext(NarrationContext);
+    const triggered = isCueTriggered(id);
+    const progress = cueProgress(id, ramp);
+    return children(triggered, progress);
+  }
+
+  /** Hook: read narration state directly inside a custom component */
+  function useNarration() {
+    return React.useContext(NarrationContext);
+  }
+
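+  function visualLen(s) {
+    let n = 0; // ASCII letters, digits and common punctuation count as 0.5, everything else as 1
+    for (const ch of s) n += /[a-zA-Z0-9 .,'":;\-]/.test(ch) ? 0.5 : 1;
+    return n;
+  }
+  /**
+   * splitChunkToLines · split text at punctuation into short lines of ≤ maxLen characters
+   *
+   * Used for subtitles; the Bilibili convention is ≤12 characters per line for readability. This function:
+   * 1. Splits into sentences at strong punctuation (。!?\n) first; a line never crosses a sentence end
+   * 2. Uses a sentence as-is when it is ≤ maxLen; otherwise splits at weak punctuation (,、;:) and merges the pieces
+   * 3. Mixed Chinese/English: English letters and digits count as 0.5 character of visual width
+   * 4. Falls back to a hard cut (rare: a single punctuation-delimited piece exceeds maxLen)
+   *
+   * @param text   source text
+   * @param maxLen maximum visual length of one line, default 13 (≈12 characters + one punctuation mark)
+   * @returns array of subtitle lines
+   */
+  function splitChunkToLines(text, maxLen = 13) {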
+    const lines = [];
+    const sentences = [];
+    let buf = '';
+    for (const ch of text) {
+      buf += ch;
+      if ('。!?\n'.includes(ch)) { if (buf.trim()) sentences.push(buf.trim()); buf = ''; }
+    }
+    if (buf.trim()) sentences.push(buf.trim());
+    for (const sent of sentences) {
+      if (visualLen(sent) <= maxLen) { lines.push(sent); continue; }
+      const parts = [];
+      let pbuf = '';
+      for (const ch of sent) {
+        pbuf += ch;
+        if (',、;:'.includes(ch)) { parts.push(pbuf); pbuf = ''; }
+      }
+      if (pbuf) parts.push(pbuf);
+      // Push a piece as one line, hard-cutting it when it alone exceeds maxLen.
+      // Applied to every piece, so an oversized piece in the middle of a sentence is cut too.
+      const pushPiece = (piece) => {
+        if (visualLen(piece) <= maxLen) { lines.push(piece); return; }
+        let hbuf = '';
+        for (const ch of piece) { hbuf += ch; if (visualLen(hbuf) >= maxLen) { lines.push(hbuf); hbuf = ''; } }
+        if (hbuf) lines.push(hbuf);
+      };
+      let merged = '';
+      for (const p of parts) {
+        if (visualLen(merged) + visualLen(p) <= maxLen) merged += p;
+        else { if (merged) pushPiece(merged); merged = p; }
+      }
+    }
+    return lines.filter(l => l.trim());
+  }
+
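+  /**
+   * Subtitles · Bilibili-style subtitle component (dark ink text with a white halo, no background, timed by chunks)
+   *
+   * Takes the active chunk from the current scene.chunks, splits it into short lines with splitChunkToLines,
+   * and divides the chunk's time window among the lines in proportion to their visual length.
+   *
+   * Requires: timeline.scenes[].chunks[] (emitted by narrate-pipeline.mjs by default)
+   *
+   * Props (override the default styling):
+   *   bottom    distance from the bottom in pixels, default 90 (not flush with the edge)
+   *   fontSize  font size, default 32
+   *   color     text color, default dark ink #1a1a1a (suits a light paper-white background)
+   *   haloColor halo color, default rgba(245,241,232,0.9) (suits a #f5f1e8 background)
+   *   maxLen    maximum visual length of one line, default 13
+   *
+   * Dark backgrounds: set color to '#fff' and haloColor to 'rgba(0,0,0,0.85)'.
+   */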
+  function Subtitles({ bottom = 90, fontSize = 32, color = '#1a1a1a', haloColor = 'rgba(245,241,232,0.9)', maxLen = 13 } = {}) {
+    const { time, scene } = React.useContext(NarrationContext);
+    if (!scene || !scene.chunks) return null;
+    const active = scene.chunks.find(c => time >= c.absoluteStart && time < c.absoluteEnd);
+    if (!active) return null;
+    const lines = splitChunkToLines(active.text, maxLen);
+    if (lines.length === 0) return null;
+    const totalLen = lines.reduce((s, l) => s + visualLen(l), 0);
+    const chunkDur = active.absoluteEnd - active.absoluteStart;
+    let acc = active.absoluteStart;
+    let activeLine = lines[lines.length - 1];
+    let lineStart = active.absoluteStart;
+    for (const line of lines) {
+      const dur = (visualLen(line) / totalLen) * chunkDur;
+      if (time < acc + dur) { activeLine = line; lineStart = acc; break; }
+      acc += dur;
+    }
+    const lineProg = Math.min(1, (time - lineStart) / 0.15);
+    return React.createElement('div', {
+      style: { position: 'absolute', left: 0, right: 0, bottom, display: 'flex', justifyContent: 'center', pointerEvents: 'none', zIndex: 50 },
+    }, React.createElement('div', {
+      key: lineStart,
+      style: {
+        fontFamily: '"PingFang SC", "Noto Sans SC", -apple-system, sans-serif',
+        fontSize, fontWeight: 600, color,
+        letterSpacing: '0.04em', lineHeight: 1.2, textAlign: 'center',
+        textShadow: `0 0 6px ${haloColor}, 0 0 12px ${haloColor}, 0 1px 2px rgba(255,255,255,0.5)`,
+        opacity: lineProg, transform: `translateY(${(1 - lineProg) * 4}px)`,
+      },
+    }, activeLine));
+  }
+
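+  /**
+   * useSceneFade · soft fade-in/fade-out helper for supporting elements inside a scene
+   *
+   * Iron rule #3 forbids hard cuts between scenes. But a supporting element inside a scene (data card, quote block)
+   * stays lit from its cue until the scene ends by default. Without a fade-out, it would linger awkwardly
+   * or vanish abruptly when the next scene begins. This hook provides a uniform [fade in → hold → fade out] envelope.
+   *
+   * Usage (multiply op into the supporting element's opacity):
+   *   const op = useSceneFade('md-side', 0.6, 0.8);  // 0.6s in, 0.8s out
+   *   <Cue id="agents-md">{(t, p) => (
+   *     <div style={{ opacity: op * p }}>...</div>
+   *   )}</Cue>
+   *
+   * The data card then fades in over the first 0.6s of md-side and starts fading out 0.8s before the scene ends,
+   * so it is already gone when the next scene's supporting elements fade in, and no hard cut appears on screen.
+   *
+   * @param sceneId  scene id
+   * @param fadeIn   fade-in duration in seconds (default 0.5)
+   * @param fadeOut  fade-out duration in seconds (default 0.5)
+   * @returns opacity multiplier between 0 and 1
+   */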
+  function useSceneFade(sceneId, fadeIn = 0.5, fadeOut = 0.5) {
+    const { time, timeline } = React.useContext(NarrationContext);
+    if (!timeline) return 0;
+    const s = timeline.scenes.find(x => x.id === sceneId);
+    if (!s) return 0;
+    const inT = (time - s.start) / fadeIn;
+    const outT = (s.end - time) / fadeOut;
+    const v = Math.min(1, Math.min(inT, outT));
+    return Math.max(0, v);
+  }
+
+  return { NarrationStage, Scene, Cue, useNarration, useSceneFade, Subtitles, splitChunkToLines };
+})();
+
+if (typeof window !== 'undefined') {
+  Object.assign(window, { NarrationStageLib });
+}
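
Review note: the fade envelope and the scene lookup in this file are easy to check by hand once they are restated as pure functions. The sketch below does that; `sceneFade` and `findScene` are illustrative names (the file itself exposes the `useSceneFade` hook and an internal `currentScene` memo), and the scene boundaries are copied from the md-html demo timeline.

```javascript
// Pure-function restatement of the useSceneFade envelope.
function sceneFade(time, scene, fadeIn = 0.5, fadeOut = 0.5) {
  const inT = (time - scene.start) / fadeIn;  // reaches 1 after fadeIn seconds
  const outT = (scene.end - time) / fadeOut;  // drops below 1 in the last fadeOut seconds
  return Math.max(0, Math.min(1, inT, outT));
}

// Same lookup rule as currentScene: a scene stays current until the next one starts.
function findScene(scenes, time) {
  for (let i = 0; i < scenes.length; i++) {
    const next = scenes[i + 1];
    if (time >= scenes[i].start && (!next || time < next.start)) return scenes[i];
  }
  return scenes[0];
}

// Scene boundaries copied from the md-html demo timeline.
const scenes = [
  { id: 'opening', start: 0, end: 22.32 },
  { id: 'md-side', start: 22.82, end: 56.516 },
];

console.log(sceneFade(10, scenes[0]));    // 1 (hold)
console.log(sceneFade(22.32, scenes[0])); // 0 (fully faded at scene end)
console.log(findScene(scenes, 22.5).id);  // opening (the gap stays with the earlier scene)
console.log(sceneFade(22.5, scenes[0]));  // 0 (clamped inside the gap)
```

In the 0.5 s gap between the two scenes, `opening` is still the current scene while its fade value is already 0, so elements driven by the envelope are invisible there and only the persistent hero carries the motion.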

+ 615 - 0
demos/md-html-narration/md-html-demo.html

@@ -0,0 +1,615 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8">
+<title>md还是html,这是个蠢问题 · narration demo (v3 · subtitles + continuous motion + overflow fix)</title>
+<script crossorigin src="https://unpkg.com/react@18/umd/react.production.min.js"></script>
+<script crossorigin src="https://unpkg.com/react-dom@18/umd/react-dom.production.min.js"></script>
+<script src="https://unpkg.com/@babel/standalone/babel.min.js"></script>
+<link rel="preconnect" href="https://fonts.googleapis.com">
+<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+<link href="https://fonts.googleapis.com/css2?family=Source+Serif+4:wght@300;400;600;700;800&family=Noto+Serif+SC:wght@400;600;700;900&family=JetBrains+Mono:wght@400;500;700&family=Noto+Sans+SC:wght@400;500;600;700&display=swap" rel="stylesheet">
+<style>
+  body { margin: 0; background: #0a0a0a; min-height: 100vh; display: flex; align-items: center; justify-content: center; flex-direction: column; padding: 20px; box-sizing: border-box; font-family: -apple-system, sans-serif; }
+  #root { box-shadow: 0 30px 80px rgba(0,0,0,0.6); border-radius: 4px; overflow: hidden; }
+  * { box-sizing: border-box; }
+</style>
+</head>
+<body>
+<div id="root"></div>
+
+<script type="text/babel">
+// ── timeline.json (inline · trimmed; every scene carries chunks for subtitles) ───
+const TIMELINE = {"title":"md还是html,这是个蠢问题","totalDuration":198.168,"voiceover":"voiceover.mp3","scenes":[
+  {"id":"opening","start":0,"end":22.32,"duration":22.32,"chunks":[
+    {"text":"前两天,","absoluteStart":0,"absoluteEnd":0.984},
+    {"text":"Claude Code 团队的 Thariq 发了篇爆文。标题就一句话,HTML 是新的 markdown。","absoluteStart":0.984,"absoluteEnd":8.5},
+    {"text":"他说他几乎不再写 md 文件了,全让 AI 给他生成 HTML。500 万阅读,X 上立马吵翻了。","absoluteStart":8.5,"absoluteEnd":14.952},
+    {"text":"一派是 md 党,觉得 md 才是 AI 时代的源代码。另一派觉得 HTML 才是终极答案。","absoluteStart":14.952,"absoluteEnd":22.32}
+  ],"cues":[{"id":"thariq","absoluteTime":0.984},{"id":"two-camps","absoluteTime":14.952}]},
+  {"id":"md-side","start":22.82,"end":56.516,"duration":33.696,"chunks":[
+    {"text":"md 党的证据其实挺硬的。","absoluteStart":22.82,"absoluteEnd":26.5},
+    {"text":"OpenAI 去年发的 AGENTS.md,60000 多个项目用,","absoluteStart":26.5,"absoluteEnd":31.5},
+    {"text":"AWS、Anthropic、Google、微软、OpenAI,AI 半壁江山一起捐进 Linux Foundation。","absoluteStart":31.5,"absoluteEnd":38.5},
+    {"text":"Karpathy 的 llm-wiki,单一个 CLAUDE.md 文件,5 万 star。","absoluteStart":38.5,"absoluteEnd":45.14},
+    {"text":"Cloudflare 实测,同一篇博客 HTML 一万六千 token,转成 md 只要三千。省 80%。","absoluteStart":45.14,"absoluteEnd":54.764},
+    {"text":"GitHub 官方说:文档不再是描述代码,文档就是代码。","absoluteStart":54.764,"absoluteEnd":56.516}
+  ],"cues":[{"id":"agents-md","absoluteTime":27.5},{"id":"token-saving","absoluteTime":45.14},{"id":"doc-is-code","absoluteTime":54.764}]},
+  {"id":"html-side","start":57.016,"end":100.168,"duration":43.152,"chunks":[
+    {"text":"但 html 党也没说错。Thariq 的论据我都同意。","absoluteStart":57.016,"absoluteEnd":62.92},
+    {"text":"第一是空间信息。diff、调用图、架构图本来就有空间维度,html 能左右对照。","absoluteStart":62.92,"absoluteEnd":74.632},
+    {"text":"第二是动态体验。按钮颜色、easing 曲线,文字描述再多没用,html 能让你直接看见。","absoluteStart":74.632,"absoluteEnd":85.864},
+    {"text":"第三是结构化阅读。可折叠章节、tab 代码块、边栏术语表。","absoluteStart":85.864,"absoluteEnd":93},
+    {"text":"Anthropic 的 Live Artifacts,HTML 已升级为可交互、能拉实时数据的 dashboard。","absoluteStart":93,"absoluteEnd":100.168}
+  ],"cues":[{"id":"spatial","absoluteTime":62.92},{"id":"dynamic","absoluteTime":74.632},{"id":"structured","absoluteTime":85.864}]},
+  {"id":"the-real-question","start":100.668,"end":117.588,"duration":16.92,"chunks":[
+    {"text":"我看完想说,","absoluteStart":100.668,"absoluteEnd":101.748},
+    {"text":"这俩根本是在争一个蠢问题。","absoluteStart":101.748,"absoluteEnd":106},
+    {"text":"两边都赢了。但赢的是不同的问题。","absoluteStart":106,"absoluteEnd":109.044},
+    {"text":"md 党回答:我们用什么写。","absoluteStart":109.044,"absoluteEnd":112.62},
+    {"text":"html 党回答:我们给人什么看。","absoluteStart":112.62,"absoluteEnd":115.5},
+    {"text":"两个不同问题,怎么会有谁取代谁。","absoluteStart":115.5,"absoluteEnd":117.588}
+  ],"cues":[{"id":"reveal","absoluteTime":101.748},{"id":"question-md","absoluteTime":109.044},{"id":"question-html","absoluteTime":112.62}]},
+  {"id":"the-split","start":118.088,"end":158.744,"duration":40.656,"chunks":[
+    {"text":"我觉得真问题是这个。","absoluteStart":118.088,"absoluteEnd":121},
+    {"text":"md 和 html 不是替代,是分工关系。","absoluteStart":121,"absoluteEnd":126.5},
+    {"text":"以前你写 md 自己也看 md,要折中,所以 md 胜出。","absoluteStart":126.5,"absoluteEnd":131},
+    {"text":"AI 出现后,生产成本被 AI 吸收。","absoluteStart":131,"absoluteEnd":135},
+    {"text":"原来要折中的需求,被拆成了两端的极端最优。","absoluteStart":135,"absoluteEnd":140},
+    {"text":"生产端要轻、要快、要 token efficient——那就是 md。","absoluteStart":140,"absoluteEnd":148.28},
+    {"text":"消费端要丰富、要可视化、要好分享——那就是 html。","absoluteStart":148.28,"absoluteEnd":153.464},
+    {"text":"两端各自登顶,中间那个折中位置,没人需要了。","absoluteStart":153.464,"absoluteEnd":158.744}
+  ],"cues":[{"id":"split","absoluteTime":122.84},{"id":"ai-changes","absoluteTime":131},{"id":"md-side-win","absoluteTime":148.28},{"id":"html-side-win","absoluteTime":153.464}]},
+  {"id":"activity-proof","start":159.244,"end":184.084,"duration":24.84,"chunks":[
+    {"text":"最干净的活样本是 Thariq 自己。","absoluteStart":159.244,"absoluteEnd":162.5},
+    {"text":"3 月份他发《Skills 指南》,强调核心还是 markdown。","absoluteStart":162.5,"absoluteEnd":167},
+    {"text":"5 月份他发《HTML is the new markdown》。","absoluteStart":167,"absoluteEnd":169.372},
+    {"text":"同一个人,两端各自登顶,互不打架。","absoluteStart":169.372,"absoluteEnd":174},
+    {"text":"Karpathy 和 Lex Fridman 那对组合也一样。","absoluteStart":174,"absoluteEnd":177},
+    {"text":"内核是 markdown wiki,外壳是动态 HTML——是加了一层消费层。","absoluteStart":177,"absoluteEnd":184.084}
+  ],"cues":[{"id":"thariq-march","absoluteTime":164.236},{"id":"same-person","absoluteTime":169.372},{"id":"karpathy-lex","absoluteTime":176.764}]},
+  {"id":"closing","start":184.584,"end":197.88,"duration":13.296,"chunks":[
+    {"text":"所以下次你想吵这个的时候,","absoluteStart":184.584,"absoluteEnd":186.672},
+    {"text":"先问自己一句——你面对的是「写」,还是「看」?","absoluteStart":186.672,"absoluteEnd":192},
+    {"text":"写,用 md。","absoluteStart":192,"absoluteEnd":193.704},
+    {"text":"看,用 html。","absoluteStart":193.704,"absoluteEnd":195.5},
+    {"text":"工具替你处理切换,立场可以放下了。","absoluteStart":195.5,"absoluteEnd":197.88}
+  ],"cues":[{"id":"final","absoluteTime":186.672},{"id":"md-final","absoluteTime":192},{"id":"html-final","absoluteTime":193.704}]}
+]};
+
+// ── narration_stage.jsx (inline) ─────────────────────────────
+const NarrationStageLib = (() => {
+  const NarrationContext = React.createContext({});
+  function NarrationStage({ timeline, audioSrc, width = 1920, height = 1080, background = '#0e0e0e', controls = true, children }) {
+    const audioRef = React.useRef(null);
+    const [time, setTime] = React.useState(0);
+    const [playing, setPlaying] = React.useState(false);
+    const recording = typeof window !== 'undefined' && window.__recording === true;
+    React.useEffect(() => { if (typeof window !== 'undefined') { window.__totalDuration = timeline.totalDuration; window.__ready = true; } }, [timeline.totalDuration]);
+    React.useEffect(() => {
+      let raf;
+      if (recording) {
+        let startedAt = null;
+        const tick = (now) => {
+          if (startedAt === null) startedAt = now;
+          setTime(Math.min((now - startedAt) / 1000, timeline.totalDuration));
+          raf = requestAnimationFrame(tick);
+        };
+        raf = requestAnimationFrame(tick);
+        if (typeof window !== 'undefined') window.__seek = (t) => { startedAt = performance.now() - t * 1000; setTime(t); };
+      } else {
+        const tick = () => {
+          if (audioRef.current && !audioRef.current.paused) setTime(audioRef.current.currentTime);
+          raf = requestAnimationFrame(tick);
+        };
+        tick();
+      }
+      return () => cancelAnimationFrame(raf);
+    }, [recording, timeline.totalDuration]);
+    const currentScene = React.useMemo(() => {
+      if (!timeline.scenes) return null;
+      for (let i = 0; i < timeline.scenes.length; i++) {
+        const s = timeline.scenes[i]; const next = timeline.scenes[i + 1];
+        if (time >= s.start && (!next || time < next.start)) return s;
+      }
+      return timeline.scenes[0];
+    }, [time, timeline.scenes]);
+    const sceneTime = currentScene ? Math.max(0, time - currentScene.start) : 0;
+    const allCues = React.useMemo(() => { const m = {}; for (const s of timeline.scenes || []) for (const c of s.cues || []) m[c.id] = c; return m; }, [timeline.scenes]);
+    const isCueTriggered = React.useCallback(id => { const c = allCues[id]; return c ? time >= c.absoluteTime : false; }, [allCues, time]);
+    const cueProgress = React.useCallback((id, ramp = 0.6) => { const c = allCues[id]; if (!c) return 0; const dt = time - c.absoluteTime; if (dt <= 0) return 0; if (dt >= ramp) return 1; return dt / ramp; }, [allCues, time]);
+    const ctx = { time, scene: currentScene, sceneTime, isCueTriggered, cueProgress, timeline };
+    return (
+      <NarrationContext.Provider value={ctx}>
+        <div style={{ position: 'relative', width, height, background, overflow: 'hidden', color: '#1a1a1a' }}>{children}</div>
+        {!recording && <audio ref={audioRef} src={audioSrc} preload="auto" onEnded={() => setPlaying(false)} />}
+        {!recording && controls && (
+          <div style={{ display: 'flex', alignItems: 'center', gap: 12, padding: '12px 16px', background: '#1a1a1a', color: '#ddd', fontFamily: 'monospace', fontSize: 13, width, boxSizing: 'border-box' }}>
+            <button onClick={() => { if (audioRef.current.paused) { audioRef.current.play(); setPlaying(true); } else { audioRef.current.pause(); setPlaying(false); } }} style={{ padding: '6px 14px', background: '#fff', color: '#000', border: 0, borderRadius: 4, cursor: 'pointer', fontWeight: 600 }}>{playing ? '❚❚ Pause' : '▶ Play'}</button>
+            <input type="range" min={0} max={timeline.totalDuration} step={0.01} value={time} onChange={e => { const t = parseFloat(e.target.value); audioRef.current.currentTime = t; setTime(t); }} style={{ flex: 1 }} />
+            <span style={{ minWidth: 130, textAlign: 'right' }}>{time.toFixed(2)} / {timeline.totalDuration.toFixed(2)}s</span>
+            <span style={{ padding: '4px 10px', background: '#2a2a2a', borderRadius: 4, minWidth: 130, textAlign: 'center' }}>{currentScene ? currentScene.id : '—'}</span>
+          </div>
+        )}
+      </NarrationContext.Provider>
+    );
+  }
+  function useNarration() { return React.useContext(NarrationContext); }
+  function useSceneFade(sceneId, fadeIn = 0.5, fadeOut = 0.5) {
+    const { time, timeline } = React.useContext(NarrationContext);
+    if (!timeline) return 0;
+    const s = timeline.scenes.find(x => x.id === sceneId);
+    if (!s) return 0;
+    const inT = (time - s.start) / fadeIn;
+    const outT = (s.end - time) / fadeOut;
+    return Math.max(0, Math.min(1, Math.min(inT, outT)));
+  }
+  function Cue({ id, ramp = 0.6, children }) {
+    const { isCueTriggered, cueProgress } = React.useContext(NarrationContext);
+    return children(isCueTriggered(id), cueProgress(id, ramp));
+  }
+  return { NarrationStage, Cue, useNarration, useSceneFade };
+})();
+const { NarrationStage, Cue, useNarration, useSceneFade } = NarrationStageLib;
+
+// ── Design tokens ─────────────────────────────────────────
+const C = {
+  paper: '#f5f1e8', paperDeep: '#ebe5d4',
+  ink: '#1a1a1a', inkSoft: '#3a3a3a', inkMute: '#888',
+  md: '#1B4965', html: '#C04A1A', green: '#7BC47F',
+};
+const F = {
+  display: '"Source Serif 4", "Noto Serif SC", Georgia, serif',
+  body: '"Noto Sans SC", "Noto Serif SC", "Source Serif 4", sans-serif',
+  mono: '"JetBrains Mono", Menlo, monospace',
+};
+
+// ── easing & interpolate ──────────────────────────────────
+const expoOut = t => t === 1 ? 1 : 1 - Math.pow(2, -10 * t);
+const lerp = (a, b, t) => a + (b - a) * t;
+const lerpC = (from, to, t) => ({
+  x: lerp(from.x, to.x, t), y: lerp(from.y, to.y, t),
+  scale: lerp(from.scale, to.scale, t),
+  opacity: lerp(from.opacity ?? 1, to.opacity ?? 1, t),
+});
+
+// ── HERO state table (v3: smaller scale to avoid overflow; y leaves room for the subtitle area) ──
+// The subtitle bar occupies y = 88-100%, so hero y stays ≤ 70%
+const HERO_KEYS = {
+  opening:             { md: { x: 50, y: 28, scale: 1.0, opacity: 1 }, html: { x: 50, y: 55, scale: 1.0, opacity: 1 } },
+  'md-side':           { md: { x: 72, y: 48, scale: 1.4, opacity: 1 }, html: { x: 92, y: 12, scale: 0.3, opacity: 0.5 } },
+  'html-side':         { md: { x: 8,  y: 12, scale: 0.3, opacity: 0.5 }, html: { x: 28, y: 48, scale: 1.4, opacity: 1 } },
+  'the-real-question': { md: { x: 30, y: 30, scale: 0.85, opacity: 1 }, html: { x: 70, y: 30, scale: 0.85, opacity: 1 } },
+  'the-split':         { md: { x: 22, y: 60, scale: 1.15, opacity: 1 }, html: { x: 78, y: 60, scale: 1.15, opacity: 1 } },
+  'activity-proof':    { md: { x: 18, y: 18, scale: 0.5, opacity: 1 }, html: { x: 82, y: 18, scale: 0.5, opacity: 1 } },
+  closing:             { md: { x: 28, y: 50, scale: 1.3, opacity: 1 }, html: { x: 72, y: 50, scale: 1.3, opacity: 1 } },
+};
+const SCENE_ORDER = ['opening', 'md-side', 'html-side', 'the-real-question', 'the-split', 'activity-proof', 'closing'];
+
+// ── HeroAnchor: cross-scene hero + continuous micro-motion (no 3s stretch of stillness) ──
+const HeroAnchor = () => {
+  const { time, scene } = useNarration();
+  if (!scene) return null;
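+  const idx = SCENE_ORDER.indexOf(scene.id);
+  const prevId = idx > 0 ? SCENE_ORDER[idx - 1] : scene.id;
+  const fromPos = HERO_KEYS[prevId];
+  const toPos = HERO_KEYS[scene.id];
+  if (!fromPos || !toPos) return null; // scene id missing from HERO_KEYS: render nothing instead of throwing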
+  const transitionDur = Math.min(2.0, scene.duration * 0.45);
+  const t = expoOut(Math.min(1, Math.max(0, (time - scene.start) / transitionDur)));
+  const md = lerpC(fromPos.md, toPos.md, t);
+  const html = lerpC(fromPos.html, toPos.html, t);
+
+  // ── Continuous micro-motion: scale breathing + figure-8 drift (so any 3s window shows change) ──
+  const breath = 1 + Math.sin(time * 0.7) * 0.018;
+  const driftXm = Math.cos(time * 0.32) * 0.6;
+  const driftYm = Math.sin(time * 0.41) * 0.5;
+  const driftXh = Math.sin(time * 0.28) * 0.6;
+  const driftYh = Math.cos(time * 0.37) * 0.5;
+
+  const baseSize = 240; // reduced from 360
+  const renderHero = (label, pos, color, dx, dy) => {
+    const px = (pos.x + dx) * 19.2;
+    const py = (pos.y + dy) * 10.8;
+    return (
+      <div key={label} style={{
+        position: 'absolute', left: px, top: py,
+        transform: `translate(-50%, -50%) scale(${pos.scale * breath})`,
+        opacity: pos.opacity,
+        fontSize: baseSize, fontFamily: F.display, fontWeight: 800,
+        color, lineHeight: 1, letterSpacing: '-0.02em',
+        willChange: 'transform, opacity', pointerEvents: 'none',
+      }}>{label}</div>
+    );
+  };
+  return (
+    <div style={{ position: 'absolute', inset: 0, perspective: '2400px' }}>
+      <div style={{ position: 'absolute', inset: 0, transformStyle: 'preserve-3d', transform: 'rotateX(2deg) rotateY(-1deg)' }}>
+        {renderHero('md', md, C.md, driftXm, driftYm)}
+        {renderHero('html', html, C.html, driftXh, driftYh)}
+      </div>
+    </div>
+  );
+};
+
+// ── BackgroundDrift ────────────────────────────────────────
+const BackgroundDrift = () => {
+  const { time } = useNarration();
+  const dx = Math.sin(time * 0.08) * 16;
+  const dy = Math.cos(time * 0.06) * 12;
+  return (
+    <div style={{
+      position: 'absolute', inset: -40,
+      background: `radial-gradient(ellipse 1400px 800px at ${50 + dx/4}% ${50 + dy/4}%, ${C.paperDeep} 0%, ${C.paper} 60%, ${C.paper} 100%)`,
+      pointerEvents: 'none',
+    }} />
+  );
+};
+
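+// ── Subtitles: Bilibili-style (dark ink text + soft light glow, no background box; lines capped at maxLen = 13 visual chars, never merged across a sentence boundary) ──
+// Each chunk is split into short lines at punctuation; the chunk's time window is shared out in proportion to visual length
+
+// Splitting: first cut sentences at strong punctuation (sentence enders and \n), then merge weak-punctuation segments (commas, enumeration commas, semicolons, colons) up to maxLen
+// Mixed Chinese/English: ASCII letters, digits and half-width punctuation count as 0.5 (approximate visual width)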
+function visualLen(s) {
+  let n = 0;
+  for (const ch of s) n += /[a-zA-Z0-9 .,'":;\-]/.test(ch) ? 0.5 : 1;
+  return n;
+}
+function splitChunkToLines(text, maxLen = 13) {
+  const lines = [];
+  // 1. Cut into sentences at strong punctuation (full-width or half-width), keeping the punctuation
+  const sentences = [];
+  let buf = '';
+  for (const ch of text) {
+    buf += ch;
+    if ('。!?!?\n'.includes(ch)) {
+      if (buf.trim()) sentences.push(buf.trim());
+      buf = '';
+    }
+  }
+  if (buf.trim()) sentences.push(buf.trim());
+
+  // 2. Within each sentence, cut at weak punctuation and merge up to maxLen (never across a sentence boundary)
+  for (const sent of sentences) {
+    if (visualLen(sent) <= maxLen) { lines.push(sent); continue; }
+    // Cut at weak punctuation, full-width or half-width (the mark stays with the preceding segment)
+    const parts = [];
+    let pbuf = '';
+    for (const ch of sent) {
+      pbuf += ch;
+      if (',、;:,;:'.includes(ch)) { parts.push(pbuf); pbuf = ''; }
+    }
+    if (pbuf) parts.push(pbuf);
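+    // Merge segments up to maxLen; flush() hard-cuts any segment that is still too long,
+    // so an overlong segment in the middle of a sentence is split too, not only the last one
+    const flush = (seg) => {
+      if (visualLen(seg) <= maxLen) { lines.push(seg); return; }
+      // Fallback hard cut (rare: a single punctuation-delimited segment exceeds maxLen)
+      let hbuf = '';
+      for (const ch of seg) {
+        hbuf += ch;
+        if (visualLen(hbuf) >= maxLen) { lines.push(hbuf); hbuf = ''; }
+      }
+      if (hbuf) lines.push(hbuf);
+    };
+    let merged = '';
+    for (const p of parts) {
+      if (visualLen(merged) + visualLen(p) <= maxLen) merged += p;
+      else { if (merged) flush(merged); merged = p; }
+    }
+    if (merged) flush(merged);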
+  }
+  return lines.filter(l => l.trim());
+}
+
+const Subtitles = () => {
+  const { time, scene } = useNarration();
+  if (!scene || !scene.chunks) return null;
+  const active = scene.chunks.find(c => time >= c.absoluteStart && time < c.absoluteEnd);
+  if (!active) return null;
+  const lines = splitChunkToLines(active.text);
+  if (lines.length === 0) return null;
+  // Share the chunk duration across lines in proportion to visual length
+  const totalLen = lines.reduce((s, l) => s + visualLen(l), 0);
+  const chunkDur = active.absoluteEnd - active.absoluteStart;
+  let acc = active.absoluteStart;
+  let activeLine = lines[lines.length - 1];
+  let lineStart = active.absoluteStart;
+  for (const line of lines) {
+    const dur = (visualLen(line) / totalLen) * chunkDur;
+    if (time < acc + dur) { activeLine = line; lineStart = acc; break; }
+    acc += dur;
+  }
+  // Per-line fade-in over 0.15s
+  const lineProg = Math.min(1, (time - lineStart) / 0.15);
+  return (
+    <div style={{
+      position: 'absolute', left: 0, right: 0, bottom: 90,
+      display: 'flex', justifyContent: 'center', pointerEvents: 'none', zIndex: 50,
+    }}>
+      <div key={lineStart} style={{
+        fontFamily: '"PingFang SC", "Noto Sans SC", -apple-system, sans-serif',
+        fontSize: 32, fontWeight: 600, color: C.ink,
+        letterSpacing: '0.04em', lineHeight: 1.2, textAlign: 'center',
+        // On the light paper background: dark ink text + a very thin light glow, so the text lifts off the background without looking heavy
+        textShadow: '0 0 6px rgba(245,241,232,0.9), 0 0 12px rgba(245,241,232,0.7), 0 1px 2px rgba(255,255,255,0.5)',
+        opacity: lineProg, transform: `translateY(${(1 - lineProg) * 4}px)`,
+      }}>
+        {activeLine}
+      </div>
+    </div>
+  );
+};
+
+// ── Scene label ────────────────────────────────────────
+const SceneLabel = ({ sceneId, text }) => {
+  const op = useSceneFade(sceneId, 0.4, 0.4);
+  return (
+    <div style={{
+      position: 'absolute', top: 56, left: 80, fontFamily: F.mono, fontSize: 14,
+      color: C.inkMute, letterSpacing: '0.22em', textTransform: 'uppercase', opacity: op,
+    }}>{text}</div>
+  );
+};
+
+// ── Per-scene auxiliary elements ───────────────────────
+const OpeningAux = () => {
+  const op = useSceneFade('opening', 0.6, 1.0);
+  return (
+    <>
+      <Cue id="thariq">{(t, p) => (
+        <div style={{ position: 'absolute', top: 110, left: 100, opacity: op * p, transform: `translateY(${(1-p)*20}px)`, maxWidth: 700 }}>
+          <div style={{ fontFamily: F.mono, fontSize: 14, color: C.inkMute, marginBottom: 10, letterSpacing: '0.12em' }}>2026.05.07 · @THARIQ · CLAUDE CODE</div>
+          <div style={{ fontSize: 56, fontFamily: F.display, fontWeight: 700, lineHeight: 1.05, color: C.ink, fontStyle: 'italic' }}>
+            HTML is the new<br/>markdown.
+          </div>
+        </div>
+      )}</Cue>
+      <Cue id="two-camps">{(t, p) => t && (
+        <div style={{ position: 'absolute', top: 110, right: 100, opacity: op * p, transform: `translateY(${(1-p)*16}px)`, fontFamily: F.mono, fontSize: 18, color: C.inkSoft, textAlign: 'right' }}>
+          <div style={{ fontSize: 38, fontWeight: 700, color: C.ink, letterSpacing: '-0.02em' }}>5,000,000</div>
+          <div style={{ fontSize: 13, color: C.inkMute, letterSpacing: '0.18em', marginTop: 4 }}>阅读 · &lt; 24H</div>
+        </div>
+      )}</Cue>
+    </>
+  );
+};
+
+const MdSideAux = () => {
+  const op = useSceneFade('md-side', 0.6, 0.8);
+  return (
+    <>
+      <Cue id="agents-md">{(t, p) => (
+        <div style={{ position: 'absolute', left: 80, top: 200, opacity: op * p, transform: `translateY(${(1-p)*16}px)` }}>
+          <div style={{ fontFamily: F.mono, fontSize: 13, color: C.inkMute, marginBottom: 6, letterSpacing: '0.12em' }}>AGENTS.md · OpenAI 2025</div>
+          <div style={{ fontSize: 76, fontFamily: F.display, fontWeight: 700, color: C.ink, lineHeight: 0.95 }}>60,000<span style={{ color: C.html }}>+</span></div>
+          <div style={{ fontSize: 18, color: C.inkSoft, marginTop: 4, fontFamily: F.body }}>开源项目采用</div>
+          <div style={{ marginTop: 14, fontFamily: F.mono, fontSize: 12, color: C.inkMute, letterSpacing: '0.1em' }}>AWS · ANTHROPIC · GOOGLE · MICROSOFT · OPENAI</div>
+        </div>
+      )}</Cue>
+      <Cue id="agents-md">{(t, p) => (
+        <div style={{ position: 'absolute', left: 80, top: 460, opacity: op * Math.max(0, p - 0.25) * 1.33, transform: `translateY(${(1-p)*16}px)` }}>
+          <div style={{ fontFamily: F.mono, fontSize: 13, color: C.inkMute, marginBottom: 4, letterSpacing: '0.12em' }}>karpathy/llm-wiki · CLAUDE.md</div>
+          <div style={{ fontSize: 64, fontFamily: F.display, fontWeight: 700, color: C.ink, lineHeight: 0.95 }}>50,000<span style={{ color: C.html }}>★</span></div>
+        </div>
+      )}</Cue>
+      <Cue id="token-saving">{(t, p) => t && (
+        <div style={{ position: 'absolute', left: 80, top: 640, opacity: op * p, transform: `translateY(${(1-p)*14}px)`, padding: '28px 36px', background: C.ink, color: C.paper, minWidth: 540, fontFamily: F.mono }}>
+          <div style={{ fontSize: 11, color: '#999', letterSpacing: '0.2em', marginBottom: 14 }}>CLOUDFLARE 实测 · 同一篇博客</div>
+          <div style={{ display: 'flex', alignItems: 'baseline', gap: 20, marginBottom: 14 }}>
+            <div>
+              <div style={{ fontSize: 11, color: '#777', marginBottom: 2 }}>HTML</div>
+              <div style={{ fontSize: 50, fontWeight: 700, color: C.html, lineHeight: 1 }}>16,180</div>
+            </div>
+            <div style={{ fontSize: 32, color: '#555' }}>→</div>
+            <div>
+              <div style={{ fontSize: 11, color: '#777', marginBottom: 2 }}>md</div>
+              <div style={{ fontSize: 50, fontWeight: 700, color: C.green, lineHeight: 1 }}>3,150</div>
+            </div>
+          </div>
+          <div style={{ fontSize: 70, fontFamily: F.display, fontWeight: 700, color: C.html, lineHeight: 0.95, fontStyle: 'italic' }}>−80% token</div>
+        </div>
+      )}</Cue>
+    </>
+  );
+};
+
+const HtmlSideAux = () => {
+  const op = useSceneFade('html-side', 0.6, 0.8);
+  const items = [
+    { cue: 'spatial', label: '空间信息', desc: 'diff · 调用图 · 架构图', md: '一行字', html: '左右对照', topPx: 220 },
+    { cue: 'dynamic', label: '动态体验', desc: '按钮 · easing · 动效', md: '文字描述', html: '直接看见', topPx: 410 },
+    { cue: 'structured', label: '结构化阅读', desc: '可折叠 · tab · 边栏', md: '线性堆字', html: '真的会读', topPx: 600 },
+  ];
+  return (
+    <>
+      {items.map((it, i) => (
+        <Cue key={it.cue} id={it.cue}>{(t, p) => (
+          <div style={{ position: 'absolute', right: 80, top: it.topPx, opacity: op * p, transform: `translateX(${(1-p)*40}px)`, display: 'flex', alignItems: 'baseline', gap: 22, justifyContent: 'flex-end' }}>
+            <div style={{ fontFamily: F.mono, fontSize: 16, color: C.html, fontWeight: 700, letterSpacing: '0.18em' }}>0{i+1}</div>
+            <div style={{ textAlign: 'right' }}>
+              <div style={{ fontSize: 32, fontFamily: F.display, fontWeight: 600, color: C.ink }}>{it.label}</div>
+              <div style={{ fontSize: 16, color: C.inkMute, fontFamily: F.mono, marginTop: 3 }}>{it.desc}</div>
+              <div style={{ marginTop: 10, display: 'flex', alignItems: 'baseline', gap: 12, justifyContent: 'flex-end', fontFamily: F.body }}>
+                <span style={{ fontSize: 19, color: C.inkMute, textDecoration: 'line-through' }}>md: {it.md}</span>
+                <span style={{ fontSize: 16, color: C.html }}>→</span>
+                <span style={{ fontSize: 19, color: C.html, fontWeight: 600 }}>html: {it.html}</span>
+              </div>
+            </div>
+          </div>
+        )}</Cue>
+      ))}
+    </>
+  );
+};
+
+const RealQuestionAux = () => {
+  const op = useSceneFade('the-real-question', 0.4, 0.4);
+  return (
+    <>
+      <Cue id="reveal">{(t, p) => (
+        <div style={{ position: 'absolute', top: 480, left: 0, right: 0, textAlign: 'center', opacity: op * p }}>
+          <div style={{ fontSize: 26, fontFamily: F.body, color: C.inkMute, marginBottom: 14, fontWeight: 300 }}>这俩根本是在争一个</div>
+          <div style={{ fontSize: 170, fontFamily: F.display, fontWeight: 800, color: C.html, lineHeight: 0.95, letterSpacing: '0.05em', fontStyle: 'italic' }}>蠢问题</div>
+        </div>
+      )}</Cue>
+      <Cue id="question-md">{(t, p) => (
+        <div style={{ position: 'absolute', top: 770, left: 200, opacity: op * p, transform: `translateX(${(1-p)*-20}px)`, fontFamily: F.body, fontSize: 32, color: C.ink, textAlign: 'right', maxWidth: 360 }}>
+          <div style={{ fontFamily: F.mono, fontSize: 13, color: C.md, letterSpacing: '0.18em', marginBottom: 8 }}>MD 党在回答</div>
+          我们用什么<span style={{ color: C.md, fontStyle: 'italic', fontWeight: 700 }}>写</span>?
+        </div>
+      )}</Cue>
+      <div style={{ position: 'absolute', top: 800, left: 0, right: 0, fontSize: 48, color: C.inkMute, textAlign: 'center', fontFamily: F.mono, opacity: op * 0.6 }}>≠</div>
+      <Cue id="question-html">{(t, p) => (
+        <div style={{ position: 'absolute', top: 770, right: 200, opacity: op * p, transform: `translateX(${(1-p)*20}px)`, fontFamily: F.body, fontSize: 32, color: C.ink, maxWidth: 360 }}>
+          <div style={{ fontFamily: F.mono, fontSize: 13, color: C.html, letterSpacing: '0.18em', marginBottom: 8 }}>HTML 党在回答</div>
+          我们给人什么<span style={{ color: C.html, fontStyle: 'italic', fontWeight: 700 }}>看</span>?
+        </div>
+      )}</Cue>
+    </>
+  );
+};
+
+const SplitAux = () => {
+  const op = useSceneFade('the-split', 0.4, 0.6);
+  return (
+    <>
+      <Cue id="split">{(t, p) => (
+        <div style={{ position: 'absolute', top: 110, left: 0, right: 0, textAlign: 'center', opacity: op * p, transform: `translateY(${(1-p)*15}px)` }}>
+          <div style={{ fontSize: 22, color: C.inkMute, fontFamily: F.body, marginBottom: 6 }}>md 和 html 不是替代,是</div>
+          <div style={{ fontSize: 110, fontFamily: F.display, fontWeight: 800, color: C.ink, letterSpacing: '0.04em', lineHeight: 1 }}>
+            分工<span style={{ color: C.html }}>关系</span>
+          </div>
+        </div>
+      )}</Cue>
+      <Cue id="ai-changes">{(t, p) => t && (
+        <div style={{ position: 'absolute', top: 320, left: 0, right: 0, textAlign: 'center', opacity: op * p, fontFamily: F.body, fontSize: 20, color: C.inkSoft, lineHeight: 1.7, maxWidth: 1100, margin: '0 auto' }}>
+          <div style={{ maxWidth: 980, margin: '0 auto' }}>
+            以前你写 md 自己也看 md,所以折中。<br/>
+            AI 出现后,生产成本被 AI 吸收,原来要折中的需求<strong>被拆成了两端的极端最优。</strong>
+          </div>
+        </div>
+      )}</Cue>
+      {/* Production-side / consumption-side labels sit above the heroes so they are not covered */}
+      <Cue id="md-side-win">{(t, p) => (
+        <div style={{ position: 'absolute', top: 470, left: '22%', transform: `translateX(-50%) translateY(${(1-p)*30}px)`, opacity: op * p, textAlign: 'center' }}>
+          <div style={{ fontFamily: F.mono, fontSize: 13, color: C.md, letterSpacing: '0.22em', marginBottom: 6 }}>生产端</div>
+          <div style={{ fontSize: 19, color: C.inkSoft, fontFamily: F.body }}>轻 · 快 · token-efficient</div>
+        </div>
+      )}</Cue>
+      <Cue id="html-side-win">{(t, p) => (
+        <div style={{ position: 'absolute', top: 470, left: '78%', transform: `translateX(-50%) translateY(${(1-p)*30}px)`, opacity: op * p, textAlign: 'center' }}>
+          <div style={{ fontFamily: F.mono, fontSize: 13, color: C.html, letterSpacing: '0.22em', marginBottom: 6 }}>消费端</div>
+          <div style={{ fontSize: 19, color: C.inkSoft, fontFamily: F.body }}>丰富 · 可视化 · 好分享</div>
+        </div>
+      )}</Cue>
+    </>
+  );
+};
+
+const ProofAux = () => {
+  const op = useSceneFade('activity-proof', 0.4, 0.5);
+  return (
+    <>
+      <div style={{ position: 'absolute', top: 320, left: 0, right: 0, textAlign: 'center', opacity: op, fontSize: 28, fontFamily: F.body, color: C.ink }}>
+        最干净的活样本是 <span style={{ color: C.html, fontFamily: F.mono, fontWeight: 700 }}>@thariq</span>
+      </div>
+      <Cue id="thariq-march">{(t, p) => (
+        <div style={{ position: 'absolute', top: 410, left: '50%', transform: `translateX(-50%) translateY(${(1-p)*16}px)`, opacity: op * p, display: 'flex', alignItems: 'center', gap: 22 }}>
+          <div style={{ fontFamily: F.mono, fontSize: 19, color: C.md, fontWeight: 700, minWidth: 90, textAlign: 'right' }}>2026.03</div>
+          <div style={{ width: 12, height: 12, borderRadius: 6, background: C.md }} />
+          <div style={{ fontSize: 23, fontFamily: F.body, color: C.ink, minWidth: 380 }}>《Skills 指南》—— <span style={{ color: C.md }}>核心还是 markdown</span></div>
+        </div>
+      )}</Cue>
+      <Cue id="same-person">{(t, p) => (
+        <div style={{ position: 'absolute', top: 480, left: '50%', transform: `translateX(-50%) translateY(${(1-p)*16}px)`, opacity: op * p, display: 'flex', alignItems: 'center', gap: 22 }}>
+          <div style={{ fontFamily: F.mono, fontSize: 19, color: C.html, fontWeight: 700, minWidth: 90, textAlign: 'right' }}>2026.05</div>
+          <div style={{ width: 12, height: 12, borderRadius: 6, background: C.html }} />
+          <div style={{ fontSize: 23, fontFamily: F.body, color: C.ink, minWidth: 380 }}>《HTML is the new markdown》</div>
+        </div>
+      )}</Cue>
+      <Cue id="same-person">{(t, p) => t && (
+        <div style={{ position: 'absolute', top: 580, left: 0, right: 0, textAlign: 'center', opacity: op * p, fontFamily: F.display, fontSize: 28, color: C.ink, fontStyle: 'italic' }}>
+          同一个人 · 两端各自登顶 · 互不打架
+        </div>
+      )}</Cue>
+      <Cue id="karpathy-lex">{(t, p) => t && (
+        <div style={{ position: 'absolute', top: 700, left: '50%', transform: `translateX(-50%) translateY(${(1-p)*14}px)`, opacity: op * p, padding: '18px 28px', background: C.ink, color: C.paper, display: 'flex', alignItems: 'center', gap: 30 }}>
+          <div style={{ fontFamily: F.mono, fontSize: 12, color: '#999', letterSpacing: '0.2em' }}>KARPATHY × LEX</div>
+          <div style={{ display: 'flex', gap: 20, alignItems: 'center', fontFamily: F.body }}>
+            <div>
+              <div style={{ fontSize: 10, color: '#999', fontFamily: F.mono, marginBottom: 2, letterSpacing: '0.12em' }}>内核</div>
+              <div style={{ fontSize: 19, color: C.md, fontWeight: 600 }}>markdown wiki</div>
+            </div>
+            <div style={{ fontSize: 19, color: '#666' }}>+</div>
+            <div>
+              <div style={{ fontSize: 10, color: '#999', fontFamily: F.mono, marginBottom: 2, letterSpacing: '0.12em' }}>外壳</div>
+              <div style={{ fontSize: 19, color: C.html, fontWeight: 600 }}>动态 HTML</div>
+            </div>
+          </div>
+        </div>
+      )}</Cue>
+    </>
+  );
+};
+
+const ClosingAux = () => {
+  const op = useSceneFade('closing', 0.3, 0.6);
+  return (
+    <>
+      <Cue id="final">{(t, p) => (
+        <div style={{ position: 'absolute', top: 110, left: 0, right: 0, textAlign: 'center', opacity: op * p, transform: `translateY(${(1-p)*12}px)` }}>
+          <div style={{ fontSize: 22, color: C.inkMute, fontFamily: F.body, marginBottom: 12 }}>下次想吵的时候,先问自己 ——</div>
+          <div style={{ fontSize: 68, fontFamily: F.display, fontWeight: 700, color: C.ink, lineHeight: 1.15 }}>
+            你面对的是「<span style={{ color: C.md }}>写</span>」,
+            还是「<span style={{ color: C.html }}>看</span>」?
+          </div>
+        </div>
+      )}</Cue>
+      <Cue id="md-final">{(t, p) => (
+        <div style={{ position: 'absolute', top: 740, left: '28%', transform: `translateX(-50%) translateY(${(1-p)*16}px)`, opacity: op * p, textAlign: 'center' }}>
+          <div style={{ fontSize: 42, fontFamily: F.display, fontWeight: 600, color: C.md, letterSpacing: '0.04em' }}>写</div>
+          <div style={{ fontSize: 22, color: C.inkMute, fontFamily: F.mono, marginTop: 4 }}>↓</div>
+        </div>
+      )}</Cue>
+      <Cue id="html-final">{(t, p) => (
+        <div style={{ position: 'absolute', top: 740, left: '72%', transform: `translateX(-50%) translateY(${(1-p)*16}px)`, opacity: op * p, textAlign: 'center' }}>
+          <div style={{ fontSize: 42, fontFamily: F.display, fontWeight: 600, color: C.html, letterSpacing: '0.04em' }}>看</div>
+          <div style={{ fontSize: 22, color: C.inkMute, fontFamily: F.mono, marginTop: 4 }}>↓</div>
+        </div>
+      )}</Cue>
+    </>
+  );
+};
+
+// ── Main App ───────────────────────────────────────
+const App = () => (
+  <NarrationStage timeline={TIMELINE} audioSrc="_narration/voiceover.mp3" width={1920} height={1080} background={C.paper}>
+    <BackgroundDrift />
+    <HeroAnchor />
+    <SceneLabel sceneId="opening" text="2026.05.07 · X" />
+    <SceneLabel sceneId="md-side" text="MD 党的证据" />
+    <SceneLabel sceneId="html-side" text="HTML 党的证据" />
+    <SceneLabel sceneId="the-real-question" text="真问题" />
+    <SceneLabel sceneId="the-split" text="MD 生产 · HTML 消费" />
+    <SceneLabel sceneId="activity-proof" text="活样本" />
+    <SceneLabel sceneId="closing" text="结语" />
+    <OpeningAux />
+    <MdSideAux />
+    <HtmlSideAux />
+    <RealQuestionAux />
+    <SplitAux />
+    <ProofAux />
+    <ClosingAux />
+    {/* Subtitle layer on top: rendered after the scene content, and Subtitles also sets zIndex: 50, so it covers what is below */}
+    <Subtitles />
+    <div style={{ position: 'absolute', bottom: 24, right: 36, fontSize: 11, color: 'rgba(26,26,26,0.35)', letterSpacing: '0.2em', fontFamily: F.mono, pointerEvents: 'none' }}>
+      Created by Huashu-Design
+    </div>
+  </NarrationStage>
+);
+
+ReactDOM.createRoot(document.getElementById('root')).render(<App />);
+</script>
+</body>
+</html>

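The `Subtitles` component above shares a chunk's measured time window across its lines in proportion to visual length. The following is a minimal standalone sketch of that allocation, assuming the same `visualLen` weighting; `lineWindows` is a hypothetical helper name introduced here for illustration and is not part of the committed file.

```javascript
// Visual width, as in the demo: ASCII letters, digits and common
// half-width punctuation count as 0.5, everything else as 1.
function visualLen(s) {
  let n = 0;
  for (const ch of s) n += /[a-zA-Z0-9 .,'":;\-]/.test(ch) ? 0.5 : 1;
  return n;
}

// Hypothetical helper: split the window [start, end) among the lines
// in proportion to their visual length. Returns one window per line.
function lineWindows(lines, start, end) {
  const total = lines.reduce((sum, l) => sum + visualLen(l), 0);
  if (total === 0) return [];
  let acc = start;
  return lines.map((text) => {
    const dur = (visualLen(text) / total) * (end - start);
    const win = { text, start: acc, end: acc + dur };
    acc += dur;
    return win;
  });
}

// Example: a 3-second chunk shared by one Chinese line and one English word.
const windows = lineWindows(['它只认识', 'token'], 10, 13);
console.log(windows);
```

Precomputing the windows like this makes the timing easy to inspect without mounting React; the component itself computes the same split inline on every frame.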
+ 69 - 0
demos/md-html-narration/script.md

@@ -0,0 +1,69 @@
+---
+title: md还是html,这是个蠢问题
+gap: 0.5
+---
+
+## opening
+前两天,[[cue:thariq]]Claude Code 团队的 Thariq 发了篇爆文。
+标题就一句话,HTML 是新的 markdown。
+他说他几乎不再写 md 文件了,全让 AI 给他生成 HTML。
+500 万阅读,X 上立马吵翻了。
+一派是 md 党,[[cue:two-camps]]觉得 md 才是 AI 时代的源代码。
+另一派觉得 Thariq 说得对,HTML 才是终极答案。
+
+## md-side
+md 党的证据其实挺硬的。
+你看 OpenAI 去年发的 AGENTS.md,[[cue:agents-md]]60000 多个项目用,AWS、Anthropic、Google、微软、OpenAI,AI 半壁江山一起捐进 Linux Foundation 做开放标准。
+Karpathy 的 llm-wiki,主体就是三层 markdown,单一个 CLAUDE.md 文件,5 万 star。
+Cloudflare 实测过一组数据,[[cue:token-saving]]同一篇博客,HTML 一万六千 token,转成 md 只要三千。
+省 80%。
+GitHub 官方也讲过一句,文档不再是描述代码,[[cue:doc-is-code]]文档就是代码。
+
+## html-side
+但 html 党也没说错。
+Thariq 那篇文章里几条论据我都同意。
+第一是空间信息。[[cue:spatial]]diff、调用图、架构图,本来就是有空间维度的,md 把它压成一行字,html 能左右对照,理解效率不是一个量级的。
+第二是动态体验。[[cue:dynamic]]做产品原型,按钮按下去什么颜色、什么 easing 曲线,文字描述再多没用,html 能让你直接看见。
+第三是结构化阅读。[[cue:structured]]可折叠章节、tab 代码块、边栏术语表,跟同样的字线性堆一遍是两种东西。
+Anthropic 现在的 Live Artifacts,HTML 已经从静态产物升级成可以交互、能拉实时数据的 dashboard。
+
+## the-real-question
+我看完想说,[[cue:reveal]]这俩根本是在争一个蠢问题。
+两边都赢了。
+但赢的是不同的问题。
+md 党回答的是,[[cue:question-md]]我们用什么写。
+html 党回答的是,[[cue:question-html]]我们给人什么看。
+这是两个问题。
+怎么会有谁取代谁。
+
+## the-split
+我觉得真问题是这个。
+md 和 html 不是替代关系,[[cue:split]]是分工关系。
+以前你写 md 自己也看 md。
+那时候要折中,所以 md 胜出。
+但 AI 出现后,[[cue:ai-changes]]第一次有了一个新情况。
+生产成本可以被 AI 吸收。
+HTML 那部分太重的代价,AI 替你扛。
+你只负责消费。
+原来要折中的需求,被拆成了两端的极端最优。
+生产端要轻、要快、要 token efficient,[[cue:md-side-win]]那就是 md。
+消费端要丰富、要可视化、要好分享,[[cue:html-side-win]]那就是 html。
+两端各自登顶。
+中间那个折中位置,没人需要了。
+
+## activity-proof
+最干净的活样本是 Thariq 自己。
+3 月份他发了篇 Skills 指南,[[cue:thariq-march]]强调核心还是 markdown。
+5 月份他发了 HTML 是新 markdown。
+同一个人,[[cue:same-person]]两端各自登顶,互不打架。
+Karpathy 和 Lex Fridman 那对组合也一样。
+内核是 markdown wiki,[[cue:karpathy-lex]]外壳是动态 HTML。
+不是 Lex 替换了 Karpathy,是他在 Karpathy 的基础上加了一层消费层。
+
+## closing
+所以下次你想吵这个的时候,[[cue:final]]先问自己一句。
+你现在面对的是「写」,还是「看」。
+写,[[cue:md-final]]用 md。
+看,[[cue:html-final]]用 html。
+工具替你处理切换。
+立场可以放下了。

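The script format above (front matter, `## scene-id` headings, inline `[[cue:id]]` markers) can be parsed in a few lines. The sketch below is only an illustration of the format: `parseScript` and the `charOffset` field are hypothetical names, and the actual `scripts/narrate-pipeline.mjs` is not shown in this section and may work differently. It strips the markers to get the text that would be sent to TTS and records where each cue sat in that text.

```javascript
// Hypothetical parser for the narration script format (illustration only).
// Returns [{ id, text, cues: [{ id, charOffset }] }], where charOffset is the
// position of the cue in the marker-free scene text.
function parseScript(md) {
  const scenes = [];
  let current = null;
  let inFrontMatter = false;
  for (const raw of md.split('\n')) {
    const line = raw.trim();
    if (line === '---') { inFrontMatter = !inFrontMatter; continue; }
    if (inFrontMatter || !line) continue;
    const heading = line.match(/^##\s+(\S+)/);
    if (heading) {
      current = { id: heading[1], text: '', cues: [] };
      scenes.push(current);
      continue;
    }
    if (!current) continue; // ignore text before the first scene heading
    if (current.text) current.text += '\n';
    const re = /\[\[cue:([\w-]+)\]\]/g;
    let last = 0;
    let m;
    while ((m = re.exec(line)) !== null) {
      current.text += line.slice(last, m.index);
      current.cues.push({ id: m[1], charOffset: current.text.length });
      last = m.index + m[0].length;
    }
    current.text += line.slice(last);
  }
  return scenes;
}

const sample = [
  '---', 'title: demo', 'gap: 0.4', '---', '',
  '## intro', '你好,[[cue:question]]世界。', '',
  '## ending', '再见。',
].join('\n');
console.log(JSON.stringify(parseScript(sample), null, 2));
```

A character offset is not a time. Per the commit description, cue times in `timeline.json` come from measured TTS durations, so a pipeline would still need to map each offset to audio time.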
+ 17 - 0
demos/voiceover-demo/script.md

@@ -0,0 +1,17 @@
+---
+title: 什么是 token
+gap: 0.4
+---
+
+## intro
+你有没有想过,[[cue:question]]当我们和 AI 对话的时候,AI 到底是怎么理解我们的话的呢。
+
+## token-1
+答案是它根本不理解汉字,[[cue:reveal]]它只认识 token。
+
+## token-2
+你可以把 token 理解成 AI 的最小信息单位。
+比如「人工智能」这四个字,[[cue:split]]在 AI 眼里可能是两个 token:人工,智能。
+
+## ending
+所以下次看到「百万 token 上下文」这种说法,[[cue:context]]你就知道,它说的是 AI 一次能记住多少个这样的小块。

+ 201 - 0
demos/voiceover-demo/什么是token.html

@@ -0,0 +1,201 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+<meta charset="UTF-8">
+<title>什么是 token · narration demo</title>
+<script crossorigin src="https://unpkg.com/react@18/umd/react.production.min.js"></script>
+<script crossorigin src="https://unpkg.com/react-dom@18/umd/react-dom.production.min.js"></script>
+<script src="https://unpkg.com/@babel/standalone/babel.min.js"></script>
+<style>
+  body { margin: 0; background: #0a0a0a; font-family: -apple-system, BlinkMacSystemFont, "PingFang SC", sans-serif; min-height: 100vh; display: flex; align-items: center; justify-content: center; flex-direction: column; }
+  #root { box-shadow: 0 20px 60px rgba(0,0,0,0.5); }
+  .scene-padding { padding: 120px; height: 100%; box-sizing: border-box; display: flex; flex-direction: column; justify-content: center; }
+</style>
+</head>
+<body>
+<div id="root"></div>
+
+<script type="text/babel">
+// ── timeline.json (inline) ─────────────────────────────────
+const TIMELINE = {
+  "title": "什么是 token",
+  "voice": null,
+  "speed": 1,
+  "gap": 0.4,
+  "totalDuration": 23.808,
+  "scenes": [
+    {"id":"intro","start":0,"end":4.368,"duration":4.368,"audio":"audio/intro.mp3","text":"你有没有想过,当我们和 AI 对话的时候,AI 到底是怎么理解我们的话的呢。","cues":[{"id":"question","offset":1.08,"absoluteTime":1.08}]},
+    {"id":"token-1","start":4.768,"end":7.576,"duration":2.808,"audio":"audio/token-1.mp3","text":"答案是它根本不理解汉字,它只认识 token。","cues":[{"id":"reveal","offset":1.632,"absoluteTime":6.4}]},
+    {"id":"token-2","start":7.976,"end":16.808,"duration":8.832,"audio":"audio/token-2.mp3","text":"你可以把 token 理解成 AI 的最小信息单位。\n比如「人工智能」这四个字,在 AI 眼里可能是两个 token:人工,智能。","cues":[{"id":"split","offset":5.4,"absoluteTime":13.376}]},
+    {"id":"ending","start":17.208,"end":23.664,"duration":6.456,"audio":"audio/ending.mp3","text":"所以下次看到「百万 token 上下文」这种说法,你就知道,它说的是 AI 一次能记住多少个这样的小块。","cues":[{"id":"context","offset":2.376,"absoluteTime":19.584}]}
+  ],
+  "voiceover": "voiceover.mp3"
+};
+
+// ── narration_stage.jsx (inline) ───────────────────────────
+const NarrationStageLib = (() => {
+  const NarrationContext = React.createContext({ time: 0, scene: null, sceneTime: 0, isCueTriggered: () => false, cueProgress: () => 0 });
+
+  function NarrationStage({ timeline, audioSrc, width = 1920, height = 1080, background = '#0e0e0e', controls = true, children }) {
+    const audioRef = React.useRef(null);
+    const [time, setTime] = React.useState(0);
+    const [playing, setPlaying] = React.useState(false);
+    const recording = typeof window !== 'undefined' && window.__recording === true;
+
+    React.useEffect(() => {
+      if (typeof window === 'undefined') return;
+      window.__totalDuration = timeline.totalDuration;
+      window.__ready = true;
+    }, [timeline.totalDuration]);
+
+    React.useEffect(() => {
+      let raf;
+      const tick = () => {
+        if (recording) {
+          if (typeof window.__time === 'number') setTime(window.__time);
+        } else if (audioRef.current && !audioRef.current.paused) {
+          setTime(audioRef.current.currentTime);
+        }
+        raf = requestAnimationFrame(tick);
+      };
+      tick();
+      return () => cancelAnimationFrame(raf);
+    }, [recording]);
+
+    const currentScene = React.useMemo(() => {
+      if (!timeline.scenes) return null;
+      for (let i = 0; i < timeline.scenes.length; i++) {
+        const s = timeline.scenes[i];
+        const next = timeline.scenes[i + 1];
+        if (time >= s.start && (!next || time < next.start)) return s;
+      }
+      return timeline.scenes[0];
+    }, [time, timeline.scenes]);
+
+    const sceneTime = currentScene ? Math.max(0, time - currentScene.start) : 0;
+
+    const allCues = React.useMemo(() => {
+      const map = {};
+      for (const s of timeline.scenes || []) for (const c of s.cues || []) map[c.id] = c;
+      return map;
+    }, [timeline.scenes]);
+
+    const isCueTriggered = React.useCallback((cueId) => { const c = allCues[cueId]; return c ? time >= c.absoluteTime : false; }, [allCues, time]);
+    const cueProgress = React.useCallback((cueId, ramp = 0.5) => { const c = allCues[cueId]; if (!c) return 0; const dt = time - c.absoluteTime; if (dt <= 0) return 0; if (dt >= ramp) return 1; return dt / ramp; }, [allCues, time]);
+
+    const ctx = { time, scene: currentScene, sceneTime, isCueTriggered, cueProgress };
+
+    const handlePlayPause = () => { if (!audioRef.current) return; if (audioRef.current.paused) { audioRef.current.play(); setPlaying(true); } else { audioRef.current.pause(); setPlaying(false); } };
+    const handleSeek = (e) => { if (!audioRef.current) return; const t = parseFloat(e.target.value); audioRef.current.currentTime = t; setTime(t); };
+
+    return (
+      <NarrationContext.Provider value={ctx}>
+        <div style={{ position: 'relative', width, height, background, overflow: 'hidden', color: '#fff', fontFamily: '-apple-system, BlinkMacSystemFont, "PingFang SC", sans-serif' }}>
+          {children}
+        </div>
+        {!recording && <audio ref={audioRef} src={audioSrc} preload="auto" onEnded={() => setPlaying(false)} />}
+        {!recording && controls && (
+          <div style={{ display: 'flex', alignItems: 'center', gap: 12, padding: '12px 16px', background: '#1a1a1a', color: '#ddd', fontFamily: 'monospace', fontSize: 13, width, boxSizing: 'border-box' }}>
+            <button onClick={handlePlayPause} style={{ padding: '6px 14px', background: '#fff', color: '#000', border: 0, borderRadius: 4, cursor: 'pointer', fontWeight: 600 }}>
+              {playing ? '❚❚ Pause' : '▶ Play'}
+            </button>
+            <input type="range" min={0} max={timeline.totalDuration} step={0.01} value={time} onChange={handleSeek} style={{ flex: 1 }} />
+            <span style={{ minWidth: 110, textAlign: 'right' }}>{time.toFixed(2)} / {timeline.totalDuration.toFixed(2)}s</span>
+            <span style={{ padding: '4px 10px', background: '#2a2a2a', borderRadius: 4, minWidth: 100, textAlign: 'center' }}>{currentScene ? currentScene.id : '—'}</span>
+          </div>
+        )}
+      </NarrationContext.Provider>
+    );
+  }
+
+  function Scene({ id, children, keepMounted = false }) {
+    const { scene, sceneTime } = React.useContext(NarrationContext);
+    const isActive = scene && scene.id === id;
+    if (!isActive && !keepMounted) return null;
+    const content = typeof children === 'function' ? children(sceneTime, scene) : children;
+    return <div style={{ position: 'absolute', inset: 0, opacity: isActive ? 1 : 0, pointerEvents: isActive ? 'auto' : 'none', transition: keepMounted ? 'opacity 0.2s' : undefined }}>{content}</div>;
+  }
+
+  function Cue({ id, ramp = 0.5, children }) {
+    const { isCueTriggered, cueProgress } = React.useContext(NarrationContext);
+    return children(isCueTriggered(id), cueProgress(id, ramp));
+  }
+
+  return { NarrationStage, Scene, Cue };
+})();
+const { NarrationStage, Scene, Cue } = NarrationStageLib;
+
+// ── Visual content ───────────────────────────────────────
+const App = () => (
+  <NarrationStage timeline={TIMELINE} audioSrc="_narration_token/voiceover.mp3" width={1920} height={1080} background="#0a0a0a">
+    {/* Scene 1: big question mark as the hook */}
+    <Scene id="intro">
+      <div className="scene-padding" style={{ alignItems: 'center', justifyContent: 'center' }}>
+        <Cue id="question">{(triggered, p) => (
+          <div style={{ fontSize: 320, color: triggered ? '#ffd54a' : '#3a3a3a', fontWeight: 200, transition: 'color 0.4s', transform: `scale(${0.8 + p * 0.2})`, lineHeight: 1 }}>?</div>
+        )}</Cue>
+        <div style={{ fontSize: 56, color: '#aaa', marginTop: 60, letterSpacing: '0.05em', fontWeight: 300 }}>AI 是怎么理解我们的话的</div>
+      </div>
+    </Scene>
+
+    {/* Scene 2: reveal the keyword */}
+    <Scene id="token-1">
+      <div className="scene-padding" style={{ alignItems: 'center', justifyContent: 'center' }}>
+        <div style={{ fontSize: 64, color: '#888', marginBottom: 80, fontWeight: 300 }}>它不认识汉字</div>
+        <Cue id="reveal">{(triggered, p) => (
+          <div style={{
+            fontSize: 280, fontWeight: 700, color: '#ffd54a', letterSpacing: '0.05em',
+            opacity: p, transform: `translateY(${(1 - p) * 40}px)`,
+            fontFamily: 'monospace', textShadow: triggered ? '0 0 40px rgba(255, 213, 74, 0.4)' : 'none'
+          }}>
+            token
+          </div>
+        )}</Cue>
+      </div>
+    </Scene>
+
+    {/* Scene 3: 拆字演示 */}
+    <Scene id="token-2">
+      <div className="scene-padding" style={{ alignItems: 'center', justifyContent: 'center' }}>
+        <div style={{ fontSize: 48, color: '#aaa', marginBottom: 100, fontWeight: 300 }}>token = AI 的最小信息单位</div>
+        <Cue id="split">{(triggered, p) => (
+          <div style={{ display: 'flex', gap: triggered ? 80 : 8, transition: 'gap 0.6s cubic-bezier(0.16, 1, 0.3, 1)' }}>
+            <div style={{ fontSize: 200, fontWeight: 600, color: triggered ? '#ffd54a' : '#fff', padding: triggered ? '40px 60px' : '40px 20px', border: triggered ? '4px solid #ffd54a' : '4px solid transparent', borderRadius: 24, transition: 'all 0.6s cubic-bezier(0.16, 1, 0.3, 1)', background: triggered ? 'rgba(255, 213, 74, 0.05)' : 'transparent' }}>
+              人工
+            </div>
+            <div style={{ fontSize: 200, fontWeight: 600, color: triggered ? '#ffd54a' : '#fff', padding: triggered ? '40px 60px' : '40px 20px', border: triggered ? '4px solid #ffd54a' : '4px solid transparent', borderRadius: 24, transition: 'all 0.6s cubic-bezier(0.16, 1, 0.3, 1)', background: triggered ? 'rgba(255, 213, 74, 0.05)' : 'transparent' }}>
+              智能
+            </div>
+          </div>
+        )}</Cue>
+        <div style={{ fontSize: 36, color: '#666', marginTop: 60, opacity: 0.6 }}>「人工智能」= 2 个 token</div>
+      </div>
+    </Scene>
+
+    {/* Scene 4: 总结 */}
+    <Scene id="ending">
+      <div className="scene-padding" style={{ alignItems: 'center', justifyContent: 'center' }}>
+        <Cue id="context">{(triggered, p) => (
+          <>
+            <div style={{ fontSize: 96, fontWeight: 700, letterSpacing: '0.02em', marginBottom: 40, color: '#fff', opacity: triggered ? 1 : 0.3, transition: 'opacity 0.5s' }}>
+              <span style={{ color: '#ffd54a' }}>1,000,000</span> token
+            </div>
+            <div style={{ fontSize: 48, color: '#888', fontWeight: 300, opacity: p }}>
+              ≈ AI 一次能记住的<span style={{ color: '#fff', fontWeight: 500 }}>「小块」数量</span>
+            </div>
+          </>
+        )}</Cue>
+      </div>
+    </Scene>
+
+    {/* 全局水印 */}
+    <div style={{ position: 'absolute', bottom: 24, right: 32, fontSize: 11, color: 'rgba(255,255,255,0.35)', letterSpacing: '0.15em', fontFamily: 'monospace', pointerEvents: 'none', zIndex: 100 }}>
+      Created by Huashu-Design
+    </div>
+  </NarrationStage>
+);
+
+ReactDOM.createRoot(document.getElementById('root')).render(<App />);
+</script>
+</body>
+</html>

+ 397 - 0
references/voiceover-pipeline.md

@@ -0,0 +1,397 @@
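+# Voiceover Pipeline · Narration-Driven Animation
+
+> A workflow that upgrades animation from "silent visuals + voiceover added in post" to "**write the narration first, then drive the visuals from the measured audio durations**".
+> Suited to 5-20 minute concept explainers, tutorials, and long-form educational videos.
+>
+> Use alongside `references/animation-best-practices.md`: this file covers **how to align narration and visuals**,
+> while animation-best-practices covers **how each frame should move**.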
+
+---
+
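+## 🛑 Iron Rules · Read Before Writing Any Code
+
+> **This cannot be repeated often enough: failure mode #1 for narrated animation is ending up with a PowerPoint that has a voiceover.**
+
+### Rule 1 · The whole film is one continuous motion narrative, not a set of independent scenes
+
+A PowerPoint is 7 slides. What we make is **one film that runs for X minutes**.
+
+**Identity shift**:
+- ❌ You are not "producing content for 7 scenes"
+- ✅ You are "directing one or a few hero elements through X minutes of performance on screen"
+
+**Visual backbone = one or a few hero elements that persist through the whole film**:
+- They appear at t=0 and leave only at the end
+- Each cue is a **state change** of the hero (position / size / color / perspective / form), not "swap in a new element"
+- Scene boundaries exist in the script but **should not exist on screen**: the viewer cannot tell "this is scene 3" and only sees one continuous motion
+
+**Counter-example (a pitfall hit in v1 of this skill · 2026-05-10)**:
+- 7 `<Scene>` blocks, each with its own layout; a scene change = whole-page opacity 1→0, then cut to the next page
+- Every cue = `opacity: p, transform: translateY((1-p)*30px)` (fade-up used monotonously)
+- Result: the viewer's first reaction was "it looks like keynote slides, one after another", and the production quality dropped to zero
+
+**Correct pattern**:
+- Pick 1-2 hero elements (for this document's demo, the two labels "md" and "html" serve as the backbone)
+- Both labels stay on screen **from the first frame to the last**
+- Each "scene" is really one state change of the hero elements
+  - opening: the two labels face each other at the center of the screen
+  - md-side: md grows larger and bolder and takes over the frame, html retreats to a corner in small type; data flows in around md
+  - html-side: html takes over as the lead; md retreats to a corner
+  - the-real-question: both return to the center, now separated by a "≠"
+  - the-split: the two are pushed apart to either side and blank space opens between them
+  - activity-proof: the two flash alternately along a timeline
+  - closing: both settle into their final-answer positions
+- This way the whole film is "md and html performing on screen for X minutes", not 7 independent PPT slides
+
+**Minimal implementation skeleton** (copy and adapt):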
+
+```jsx
+// ── Step 1: define each hero's target state per scene (position / size / opacity) ──
+const HERO_KEYS = {
+  opening:    { md: { x: 50, y: 35, scale: 1.0, opacity: 1 }, html: { x: 50, y: 65, scale: 1.0, opacity: 1 } },
+  'md-side':  { md: { x: 78, y: 50, scale: 1.6, opacity: 1 }, html: { x: 92, y: 8,  scale: 0.25, opacity: 0.4 } },
+  'html-side':{ md: { x: 8,  y: 8,  scale: 0.25, opacity: 0.4 }, html: { x: 22, y: 50, scale: 1.6, opacity: 1 } },
+  // ... one entry per scene; continuous motion runs from the previous scene's final state to this scene's target
+};
+
+// ── Step 2: easing + lerp helpers ──
+const expoOut = t => t === 1 ? 1 : 1 - Math.pow(2, -10 * t);
+const lerp = (a, b, t) => a + (b - a) * t;
+const lerpPos = (from, to, t) => ({
+  x: lerp(from.x, to.x, t), y: lerp(from.y, to.y, t),
+  scale: lerp(from.scale, to.scale, t),
+  opacity: lerp(from.opacity ?? 1, to.opacity ?? 1, t),
+});
+
+// ── Step 3: HeroAnchor component, mounted as a direct child of <NarrationStage>, not inside a <Scene> ──
+const HeroAnchor = () => {
+  const { time, scene, timeline } = useNarration();
+  if (!scene) return null;
+  const idx = timeline.scenes.findIndex(s => s.id === scene.id);
+  const prevId = idx > 0 ? timeline.scenes[idx - 1].id : scene.id;
+  const from = HERO_KEYS[prevId];
+  const to   = HERO_KEYS[scene.id];
+
+  // Spend up to the first ~45% of the scene (capped at 2 s) morphing from the previous state to this one, then hold
+  const transitionDur = Math.min(2.0, scene.duration * 0.45);
+  const t = expoOut(Math.min(1, (time - scene.start) / transitionDur));
+  const md   = lerpPos(from.md,   to.md,   t);
+  const html = lerpPos(from.html, to.html, t);
+
+  // Add subtle breathing so that every frame contains motion (the "every frame must move" rule)
+  const breath = 1 + Math.sin(time * 0.6) * 0.012;
+
+  const renderHero = (label, pos, color) => (
+    <div style={{
+      position: 'absolute', left: `${pos.x}%`, top: `${pos.y}%`,
+      transform: `translate(-50%, -50%) scale(${pos.scale * breath})`,
+      opacity: pos.opacity, color, fontSize: 360, fontWeight: 800,
+      lineHeight: 1, willChange: 'transform, opacity', pointerEvents: 'none',
+    }}>{label}</div>
+  );
+  return <>
+    {renderHero('md',   md,   '#1B4965')}
+    {renderHero('html', html, '#C04A1A')}
+  </>;
+};
+
+// ── Step 4: main component: hero is a direct child of NarrationStage, scene-local auxiliary elements are managed separately ──
+const App = () => (
+  <NarrationStage timeline={TIMELINE} audioSrc="_narration/voiceover.mp3" width={1920} height={1080}>
+    <HeroAnchor />  {/* ← persists across scenes; the visual backbone of the whole film */}
+    {/* scene-local auxiliary elements use useSceneFade for soft fade in/out; no hard cuts */}
+    <MdSideAux />
+    <HtmlSideAux />
+    {/* ... */}
+  </NarrationStage>
+);
+```
+
+**Complete runnable reference**: `demos/md-html-narration/md-html-demo.html` (3 min 21 s, 7 scenes, 21 cues, validated in real use)
+
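+### Rule 2 · No "hard cuts" between scenes
+
+| Wrong pattern (PowerPoint slop) | Correct pattern (cinematic) |
+|---|---|
+| scene A fades `opacity 1→0` as a whole while scene B fades `opacity 0→1` | scene A's core element **morphs into** B (position / size / color transition smoothly) |
+| each scene has its own layout; elements appear and disappear | elements **stay on screen** and only change position and form |
+| `keepMounted=false`, so components unmount the instant the scene changes | hero uses `keepMounted=true` and shares DOM nodes across scenes |
+| subtitle bars / data cards each fade in and out on their own | the subtitle bar is the only "non-hero" entrance; after its hold it **exits together with the hero's motion** |
+
+Implementation notes:
+- **Sharing elements across scenes** → lift the hero to be a direct child of `<NarrationStage>`, **not inside any `<Scene>`**
+- Inside the hero, use the `useNarration()` hook to read `time`, `scene`, and `isCueTriggered`, and decide its own form from the current time
+- `<Scene>` is only for auxiliary elements that appear in that scene alone (data cards, quote blocks, etc.), and **these must not hard-cut either**: enter with expoOut + stagger, exit with a fade that overlaps the next scene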
+
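+### Rule 3 · Every frame must contain motion
+
+**Self-check**: grab **any frame** from the recording (not the second a cue fires).
+- If the frame looks "**completely still**" → wrong. Go back and add base motion (background drift / subtle hero scale / camera pan / parallax)
+- There is always a **base motion** running, even when it is not the focus:
+  - hero element `scale: 1 ↔ 1.02` on a 5-second breathing loop
+  - background `translateX: 0 ↔ -20px` drifting slowly
+  - data cards keep a slight `translateY` jitter after entering (Perlin noise)
+- A completely static frame = PowerPoint slop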
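
A minimal sketch of the always-on base motion described above, assuming time is given in seconds. The helper name `ambientMotion` and the 40 s drift period are illustrative assumptions; the 5 s breathing loop, the 1.02 peak scale, and the 20 px drift come from the bullets. None of this is part of the NarrationStage API.

```javascript
// Hypothetical helper: base motion derived purely from the clock, so any
// sampled frame differs slightly from its neighbors.
function ambientMotion(t, { breathPeriod = 5, breathAmp = 0.02, driftPx = 20, driftPeriod = 40 } = {}) {
  // Breathing: scale oscillates between 1 and 1 + breathAmp once per period.
  const breathPhase = (1 - Math.cos((2 * Math.PI * t) / breathPeriod)) / 2; // 0..1
  const scale = 1 + breathAmp * breathPhase;
  // Drift: the background slides from 0 to -driftPx and back once per period.
  const driftPhase = (1 - Math.cos((2 * Math.PI * t) / driftPeriod)) / 2; // 0..1
  const translateX = -driftPx * driftPhase;
  return { scale, translateX };
}

const frame = ambientMotion(2.5);
console.log(`scale(${frame.scale.toFixed(3)}) translateX(${frame.translateX.toFixed(2)}px)`);
```

Because both values depend only on `t`, the same function works for live playback and for frame-by-frame recording.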
+
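+### Rule 4 · Easing / Stagger / Hold are the baseline
+
+| Item | Required | Forbidden |
+|---|---|---|
+| Easing | `expoOut` as the main curve (`cubic-bezier(0.16, 1, 0.3, 1)`), `overshoot` for emphasis, `spring` for settling | `linear`, `ease`, CSS defaults |
+| Multi-element entrance | 30ms stagger (each element enters 30ms after the previous one) | everything appearing at once |
+| Before a key cue | hold 0.3-0.5s so the viewer can "see" it (previous elements rest for 0.3s, then the cue fires) | running straight into the next segment with no pause |
+| Ending | stop abruptly and hold the last frame for 1s | fade to black |
+
+For detailed rules see §1-§4 of `animation-best-practices.md`.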
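
A minimal sketch of how the easing and stagger rows might combine, assuming times in seconds. `staggeredProgress` and its `duration` default are hypothetical names and values for illustration; `expoOut` uses the same formula as the skeleton earlier in this document.

```javascript
// expoOut easing: fast start, long soft landing.
const expoOut = (t) => (t >= 1 ? 1 : 1 - Math.pow(2, -10 * t));
const clamp01 = (v) => Math.min(1, Math.max(0, v));

// Hypothetical helper: eased progress of the index-th element in a group.
// Each element starts staggerMs later than the one before it.
function staggeredProgress(now, cueTime, index, { staggerMs = 30, duration = 0.6 } = {}) {
  const start = cueTime + (index * staggerMs) / 1000;
  return expoOut(clamp01((now - start) / duration));
}

// Three cards entering 30 ms apart, sampled 50 ms after the cue fires.
const progress = [0, 1, 2].map((i) => staggeredProgress(10.05, 10, i));
console.log(progress.map((p) => p.toFixed(3)));
```

At the sampled instant the first card is furthest along, the second has just started, and the third has not yet begun, which is the cascading entrance the table asks for.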
+
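+### Self-check · First-viewer reaction
+
+When finished, show it to someone who has not seen it (or watch it yourself 24 hours later). What is **their first reaction**?
+
+| Reaction | Rating | Action |
+|---|---|---|
+| "This is a PPT with a voiceover" | Fail | Redo it |
+| "The picture switches along with the voice" | Below passing | Continuous narrative is missing; the hero element does not exist or does not persist |
+| "This thing is moving" | Pass | But nothing memorable |
+| "I want to watch to the end" | Good | The pacing is right |
+| "I want to screenshot this part" | Great | You nailed it |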
+
+---
+
+## Workflow (high level)
+
+```
+                ┌──────────────────────────┐
+                │ Narration script .md     │
+                │ (## scene + [[cue:xx]])  │
+                └──────────────┬───────────┘
+                               │
+                  narrate-pipeline.mjs
+                               │
+                               ▼
+            ┌──────────────────────────────┐
+            │ voiceover.mp3 (full track)   │
+            │ timeline.json (measured)     │
+            └──────────────┬───────────────┘
+                           │
+              ┌────────────┴────────────┐
+              ▼                         ▼
+    ┌─────────────────┐      ┌──────────────────┐
+    │ HTML animation  │      │ Record MP4 + mix │
+    │ (NarrationStage)│      │ render-narration │
+    │ live, audio-sync│      │ → final MP4      │
+    └─────────────────┘      └──────────────────┘
+      Deliverable 1              Deliverable 2
+```
+
+## Narration script format
+
+Place it anywhere in the project directory; the suggested file name is `script.md`:
+
+```markdown
+---
+title: 什么是 LLM
+voice: S_JSdgdWk22   # optional, overrides the default voice from .env
+speed: 1.0           # optional, 0.5-2.0
+gap: 0.4             # silence between scenes in seconds, default 0.3
+---
+
+## intro
+大家好,今天我们 5 分钟讲清楚 LLM 是什么。
+
+## what-is
+LLM 全称 Large Language Model,[[cue:bigmodel]]它是一个有几千亿参数的神经网络。
+本质是一个文字接龙的预测器。
+
+## demo
+比如你输入「今天天气」,[[cue:input]]模型会预测下一个字最可能是什么。
+[[cue:predict]]也许是「真好」,也许是「不错」。
+```
+
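+**Rules**:
+- Scene headings `## scene-id` use English letters / digits + hyphens (e.g. `## what-is`, `## scene-1`)
+- `[[cue:xx]]` goes **in the middle of a key sentence**: at run time the script splits the text at that position, and the moment right after the cue is the trigger point for the visual
+- cue ids are listened for in the animation HTML with `<Cue id="xx">`
+- When writing narration, **focus on rhythm and short sentences**; long sentences come out flat in TTS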
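
A minimal sketch of a pre-flight check for the rules above. `lintScript` is a hypothetical helper, not part of the pipeline, and it does not reuse the pipeline's own parser. Treating duplicate cue ids as a problem is an assumption, made because `<Cue id="xx">` looks cues up by id.

```javascript
// Hypothetical linter: returns a list of human-readable problems in a narration script.
function lintScript(md) {
  const problems = [];
  const seenCues = new Set();
  let sceneId = null;
  for (const line of md.split('\n')) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      sceneId = heading[1];
      // Scene ids: letters, digits and hyphens only.
      if (!/^[A-Za-z0-9-]+$/.test(sceneId)) problems.push(`bad scene id: "${sceneId}"`);
      continue;
    }
    for (const m of line.matchAll(/\[\[cue:([^\]]*)\]\]/g)) {
      if (sceneId === null) problems.push(`cue "${m[1]}" appears before any scene heading`);
      if (!/^[\w-]+$/.test(m[1])) problems.push(`bad cue id: "${m[1]}"`);
      if (seenCues.has(m[1])) problems.push(`duplicate cue id: "${m[1]}"`);
      seenCues.add(m[1]);
    }
  }
  return problems;
}

const sample = ['## intro', 'Hello.', '## 第二段', 'A[[cue:x]]B[[cue:x]]C'].join('\n');
console.log(lintScript(sample));
```

Running a check like this before calling TTS avoids paying for synthesis on a script whose scene or cue ids the animation cannot match.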
+
+## timeline.json schema
+
+```ts
+{
+  title: string,
+  voice: string | null,
+  speed: number,
+  gap: number,
+  totalDuration: number,        // measured length of the whole voiceover.mp3, in seconds
+  voiceover: 'voiceover.mp3',   // path relative to timeline.json
+  scenes: [
+    {
+      id: string,
+      start: number,            // start time of this scene within the full track
+      end: number,
+      duration: number,
+      audio: 'audio/<id>.mp3',  // this scene's own audio (its chunks already concatenated)
+      text: string,             // full scene text with [[cue:xx]] markers stripped
+      // chunks are the subtitle source: each chunk is a piece cut out by a cue, with its measured TTS time window
+      chunks: [
+        {
+          text: string,            // chunk text
+          start: number,           // time relative to the scene
+          end: number,
+          absoluteStart: number,   // absolute time on the full track (aligned with voiceover.mp3)
+          absoluteEnd: number,
+        }
+      ],
+      cues: [
+        {
+          id: string,
+          offset: number,       // time relative to the scene
+          absoluteTime: number, // absolute time on the full timeline
+        }
+      ]
+    }
+  ]
+}
+```
+
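+`absoluteTime` and `absoluteStart/End` are **actually measured**: the pipeline splits each scene's text into chunks at the cues, runs TTS on each chunk separately, and derives each time by summing the measured durations of the preceding chunks. They are **not approximations estimated linearly from character counts**.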
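
A worked sketch of that accumulation, using made-up durations. The variable names and numbers are illustrative only; the field names follow the schema above.

```javascript
// Measured chunk durations in seconds (invented numbers for illustration).
// cueAfter marks a [[cue:id]] that immediately follows the chunk.
const sceneStart = 12.4; // absolute start of this scene on the full track
const measured = [
  { text: 'A', duration: 1.5, cueAfter: 'input' },
  { text: 'B', duration: 2.25, cueAfter: 'predict' },
  { text: 'C', duration: 1.0 },
];

let cursor = 0; // scene-relative time
const chunks = [];
const cues = [];
for (const m of measured) {
  const start = cursor;
  cursor += m.duration; // a chunk ends where the sum of measured durations ends
  chunks.push({
    text: m.text, start, end: cursor,
    absoluteStart: sceneStart + start, absoluteEnd: sceneStart + cursor,
  });
  // A cue fires at the end of the chunk that precedes it.
  if (m.cueAfter) cues.push({ id: m.cueAfter, offset: cursor, absoluteTime: sceneStart + cursor });
}
console.log(cues);
```

Each cue time is therefore only as accurate as the measured durations before it, which is why an estimate based on character counts would drift.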
+
+## Subtitles
+
+> **Subtitles are on by default.** Long narrated videos without subtitles see noticeably lower retention. NarrationStage ships `<Subtitles />` ready to use.
+
+### Usage (one line)
+
+```jsx
+const { NarrationStage, Subtitles } = NarrationStageLib;
+<NarrationStage timeline={TIMELINE} audioSrc="...">
+  {/* your hero / scene content */}
+  <Subtitles />  {/* ← automatically picks the active text from timeline.scenes[].chunks */}
+</NarrationStage>
+```
+
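+### Visual rules (Bilibili style · anti-PowerPoint)
+
+| Item | Rule | Counter-example |
+|---|---|---|
+| Background | **None** (no black bar, no backdrop-blur) | translucent black bar + blur = subtitle bar crushing the picture = PPT feel |
+| Text color | **On light backgrounds use dark ink `#1a1a1a` + white halo**; on dark backgrounds use white text + black halo | white text + black outline on a light background = muddy text |
+| Font size | 32px (1080p video) | <24px is unreadable, >40px competes with the main visual |
+| Font | `PingFang SC` / `Noto Sans SC` (sans-serif, the Bilibili standard) | serif fonts = looks like film subtitles |
+| Position | bottom: 90px (not flush with the edge) | hugging the bottom edge looks cheap |
+| Line length | **≤ 12-13 characters** (in mixed Chinese/English text, each English character counts as 0.5) | >15 characters per line cannot be read in time on a phone |
+| Line breaking | **Never break across a sentence end**: split into sentences on `。!?` first, then merge the pieces split on `,、;:` up to ≤maxLen | hard-cutting by character count, turning 「这是好的」 into 「这是好」+「的」 |
+
+By default `<Subtitles />` follows the rules above with no props needed. For dark scenes: `<Subtitles color="#fff" haloColor="rgba(0,0,0,0.85)" />`.
+
+### Line-splitting algorithm (built into narration_stage.jsx)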
+
+```js
+splitChunkToLines(text, maxLen = 13)
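+// 1. Split into sentences on strong punctuation (。!?\n)
+// 2. Keep any sentence that is ≤ maxLen as-is
+// 3. Otherwise split on weak punctuation (,、;:) and merge the pieces up to ≤ maxLen
+// 4. Fallback: hard cut (rare)
+// Mixed Chinese/English: letters and digits count as 0.5 of a character of visual width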
+```
+
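+If a line looks clearly too long or too short after a chunk is split, **move the cue positions in the narration script** (cues cut the scene into finer chunks) rather than tuning the splitting logic in the front end.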
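
A minimal sketch of the four steps listed above, written from the description only. It is not the built-in `splitChunkToLines`; `splitLines`, `splitKeep`, and `visualLen` are hypothetical names, and counting every ASCII character as 0.5 is a simplifying assumption.

```javascript
// Visual width: ASCII characters count 0.5, everything else (CJK etc.) counts 1.
const visualLen = (s) => [...s].reduce((n, ch) => n + (ch.charCodeAt(0) < 128 ? 0.5 : 1), 0);

// Split after each character found in `marks`, keeping the mark with the preceding text.
function splitKeep(text, marks) {
  const parts = [];
  let buf = '';
  for (const ch of text) {
    buf += ch;
    if (marks.includes(ch)) { parts.push(buf); buf = ''; }
  }
  if (buf) parts.push(buf);
  return parts.map((p) => p.trim()).filter(Boolean);
}

function splitLines(text, maxLen = 13) {
  const lines = [];
  for (const sentence of splitKeep(text, '。!?\n')) {                      // 1. strong punctuation
    if (visualLen(sentence) <= maxLen) { lines.push(sentence); continue; }  // 2. short enough
    let line = '';
    for (const piece of splitKeep(sentence, ',、;:')) {                    // 3. weak punctuation, merged greedily
      if (line && visualLen(line + piece) > maxLen) { lines.push(line); line = ''; }
      line += piece;
      while (visualLen(line) > maxLen) {                                    // 4. fallback hard cut
        let cut = '';
        for (const ch of line) { if (visualLen(cut + ch) > maxLen) break; cut += ch; }
        if (!cut) cut = [...line][0]; // always make progress, even for a tiny maxLen
        lines.push(cut);
        line = line.slice(cut.length);
      }
    }
    if (line) lines.push(line);
  }
  return lines;
}

const text = '今天我们五分钟,讲清楚什么是大模型,它其实很简单。本质是文字接龙。';
console.log(splitLines(text));
```

Lines never span a `。`, because sentences are separated before any merging happens; merging only ever joins pieces of the same sentence.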
+
+## NarrationStage API
+
+```jsx
+import 'assets/narration_stage.jsx';
+const { NarrationStage, Scene, Cue, useNarration } = NarrationStageLib;
+
+<NarrationStage
+  timeline={TIMELINE}                  // contents of timeline.json
+  audioSrc="_narration/voiceover.mp3"  // path relative to the current HTML file
+  width={1920} height={1080}
+  background="#f5f1e8"
+  controls={true}                      // show the bottom playback bar during live playback
+>
+  {/* hero element: persists across scenes, placed as a direct child of NarrationStage */}
+  <HeroAnchor />
+
+  {/* scene-local auxiliary elements: appear only in this scene */}
+  <Scene id="intro">
+    <Cue id="bigmodel">{(triggered, progress) => (
+      <SomeElement style={{ opacity: progress }} />
+    )}</Cue>
+  </Scene>
+</NarrationStage>
+```
+
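+**Hooks**:
+- `useNarration()` returns `{ time, scene, sceneTime, timeline, isCueTriggered, cueProgress }`
+- Read it directly inside custom components; no props need to be passed
+
+**Scene component**:
+- By default it is mounted only while `scene.id === id`
+- Add `keepMounted` to keep it mounted (use when an animation must continue across scenes)
+
+**Cue component**:
+- children must be `(triggered, progress) => ReactNode`
+- progress ramps from 0 to 1 after the cue fires (default ramp 0.6s)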
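
A minimal sketch of how `triggered` and `progress` could be derived from a timeline, assuming a linear ramp. These standalone functions are hypothetical stand-ins that take the timeline and time as arguments; the real helpers live in `narration_stage.jsx` and may differ.

```javascript
// Hypothetical lookup: find a cue by id across all scenes.
function findCue(timeline, id) {
  for (const scene of timeline.scenes) {
    const cue = scene.cues.find((c) => c.id === id);
    if (cue) return cue;
  }
  return null;
}

// True once playback has reached the cue's absolute time.
const isCueTriggered = (timeline, time, id) => {
  const cue = findCue(timeline, id);
  return cue !== null && time >= cue.absoluteTime;
};

// 0 before the cue, rising linearly to 1 over `ramp` seconds after it.
const cueProgress = (timeline, time, id, ramp = 0.6) => {
  const cue = findCue(timeline, id);
  if (cue === null) return 0;
  return Math.min(1, Math.max(0, (time - cue.absoluteTime) / ramp));
};

const timeline = { scenes: [{ id: 'what-is', cues: [{ id: 'bigmodel', offset: 1.8, absoluteTime: 5.2 }] }] };
console.log(isCueTriggered(timeline, 5.5, 'bigmodel'), cueProgress(timeline, 5.5, 'bigmodel'));
```

Feeding `progress` through an easing function such as `expoOut` before using it for opacity or position avoids the linear look.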
+
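+## Time source (dual track)
+
+NarrationStage automatically detects `window.__recording`:
+- **Live mode** (default): follows the audio element's currentTime, so pausing and seeking stay in sync
+- **Recording mode** (render-video.js sets `window.__recording = true`): a self-driven rAF wall clock starting from 0, with `window.__seek(t)` exposed so render-video.js can reset the position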
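
A minimal sketch of the two time sources behind one interface, runnable without a browser. `createClock` is a hypothetical name; `audio` only needs a `currentTime` property and `now` returns milliseconds. How NarrationStage wires this to `requestAnimationFrame` and `window.__seek` is not shown here.

```javascript
// Hypothetical clock factory: same time()/seek() surface for both modes.
function createClock({ recording, audio, now = () => Date.now() }) {
  if (!recording) {
    // Live mode: the audio element is the single source of truth.
    return {
      time: () => audio.currentTime,
      seek: (t) => { audio.currentTime = t; },
    };
  }
  // Recording mode: self-driven wall clock that starts at 0.
  let origin = now();
  return {
    time: () => (now() - origin) / 1000,
    seek: (t) => { origin = now() - t * 1000; },
  };
}

// Usage with stand-ins: a fake audio element and a fake millisecond clock.
let fakeNow = 1000;
const live = createClock({ recording: false, audio: { currentTime: 3.2 } });
const rec = createClock({ recording: true, now: () => fakeNow });
fakeNow += 2500;
console.log(live.time(), rec.time());
```

Keeping both modes behind the same two methods means the animation code never needs to know whether it is being played or recorded.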
+
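+## The four scripts
+
+| Script | Input | Output |
+|---|---|---|
+| `scripts/tts-doubao.mjs` | one piece of text | one mp3 + measured duration |
+| `scripts/narrate-pipeline.mjs` | narration script .md | voiceover.mp3 + timeline.json |
+| `scripts/mix-voiceover.sh` | video + voiceover.mp3 [+ BGM] | MP4 with audio |
+| `scripts/render-narration.sh` | narration HTML + timeline.json | final MP4 (recording + mixing in one pass) |
+
+## .env configuration
+
+`.env` in the skill root directory (already gitignored):
+
+```
+DOUBAO_TTS_API_KEY=<your_key>
+DOUBAO_TTS_VOICE_ID=<your_clone_voice_id>
+DOUBAO_TTS_CLUSTER=volcano_icl
+DOUBAO_TTS_ENDPOINT=https://openspeech.bytedance.com/api/v1/tts
+```
+
+See the `.env.example` template. The Doubao cloned-voice ID is obtained from the Volcengine console.
+
+## Standard workflow (10 steps)
+
+1. **Write the narration script**: the script is the source code. Write the full spoken text first, mark scene headings with `## scene-id`, and insert `[[cue:xx]]` inside key sentences at the point where the visual should trigger
+2. **Run narrate-pipeline**: `node scripts/narrate-pipeline.mjs --script script.md --out-dir _narration`
+3. **Listen to the whole voiceover.mp3**: if the rhythm is off, go back and revise the script. **This step sets the quality ceiling for the whole film**
+4. **🛑 Answer the iron rules before designing**: what is the hero element? What state is it in during each scene? How does it morph across scenes? If you cannot answer, do not write code
+5. **Write the animation HTML**: use NarrationStage plus one or a few hero elements performing across scenes
+6. **Live preview**: open the HTML in a browser, click ▶ Play, and check that visuals and narration stay in sync
+7. **First-viewer self-check**: score it with the first-viewer reaction table above. On failure go back to Step 4 and redo
+8. **Record the video**: `bash scripts/render-narration.sh demo.html --timeline=_narration/timeline.json` (automatically records a silent MP4 and mixes in the voiceover)
+9. **Optional BGM**: add `--bgm-mood=educational` (or tech / tutorial, etc.) to render-narration
+10. **Deliver**: browser HTML (for live demos) + final MP4 (for publishing)
+
+## Troubleshooting
+
+| Problem | Fix |
+|---|---|
+| TTS API returns an error | Check that `DOUBAO_TTS_API_KEY` in .env is correct |
+| A scene's audio is clearly longer/shorter than the script suggests | The text contains odd punctuation or emoji that TTS parses badly → revise the script |
+| cue absoluteTime is inaccurate | ffmpeg had a problem concatenating the chunks within a scene → check that the mp3 encodings are consistent |
+| Recording contains black frames | render-video.js never received the `window.__ready` signal → check that NarrationStage mounted correctly |
+| Recording stutters | The animation has expensive rendering (lots of box-shadow / blur) → simplify or pre-composite |
+| Live playback drifts out of sync | The audio element loaded late → add `preload="auto"` or preload locally |
+
+## When not to use this pipeline
+
+- **Short animations under 60s**: make a silent animation and add the voiceover in post (add-music.sh + a single standalone TTS clip); no timeline-driven setup is needed
+- **BGM-only videos**: use `add-music.sh` to add a preset BGM
+- **Replacing TTS with a human recording**: swap `voiceover.mp3` for the human recording, and write the timeline by hand or measure the scene durations with ffprobe and generate it with a helper script → the rest of the flow is unchanged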
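
A minimal sketch of the "helper script" idea for a human recording, assuming you have already measured each scene's duration yourself (for example with ffprobe). `buildScenes` is a hypothetical name; it fills in only the scene-level fields of the schema and leaves `chunks` and `cues` empty for manual editing.

```javascript
// Hypothetical helper: turn measured per-scene durations into timeline scenes.
// `gap` is the silence between scenes in seconds; pass 0 if pauses are already
// included in the measured durations.
function buildScenes(segments, gap = 0.3) {
  let cursor = 0;
  return segments.map((seg, i) => {
    if (i > 0) cursor += gap;
    const scene = {
      id: seg.id,
      start: cursor,
      end: cursor + seg.duration,
      duration: seg.duration,
      chunks: [], // fill in by hand if subtitles are needed
      cues: [],   // fill in by hand at the moments visuals should trigger
    };
    cursor = scene.end;
    return scene;
  });
}

const scenes = buildScenes([{ id: 'intro', duration: 4 }, { id: 'demo', duration: 6 }], 0.5);
console.log(scenes.map((s) => `${s.id}: ${s.start}-${s.end}`));
```

The resulting array can be placed under `scenes` in a hand-written `timeline.json`, with `totalDuration` set to the measured length of the recording.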
+
+---
+
+**Final reminder**: go back to the iron rules before writing code. **Do not make a PowerPoint with a voiceover.**

+ 127 - 0
scripts/mix-voiceover.sh

@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+# mix-voiceover.sh · Mix voiceover (main vocal track) + optional BGM into an MP4
+#
+# Usage:
+#   bash mix-voiceover.sh <video.mp4> --voiceover=<voice.mp3> [options]
+#
+# Required:
+#   --voiceover=<path>    Path to voiceover mp3 (main vocal track, produced by narrate-pipeline.mjs)
+#
+# Optional:
+#   --bgm=<path>          BGM mp3 path (overrides --bgm-mood)
+#   --bgm-mood=<name>     Pick a preset BGM from assets/ (educational / tech / tutorial / ...)
+#   --bgm-volume=<0-1>    Static BGM volume, default 0.18 (relative to the voice)
+#   --no-ducking          Disable sidechain ducking (on by default: BGM gives way while the voice is speaking)
+#   --voice-volume=<0-2>  Voice volume multiplier, default 1.0
+#   --out=<path>          Output path, default <input>-voiced.mp4
+#
+# Behavior:
+#   - The video stream is stream-copied (no re-encode, fast)
+#   - The voice is always the main track and is required; BGM is optional
+#   - Ducking is on by default: BGM is pushed down to about -10dB while the voice is speaking and recovers when it stops
+#   - Output length = the shorter of video and voiceover (-shortest); BGM is looped and cut to fit
+#
+# Examples:
+#   bash mix-voiceover.sh anim.mp4 --voiceover=narration/voiceover.mp3
+#   bash mix-voiceover.sh anim.mp4 --voiceover=v.mp3 --bgm-mood=educational
+#   bash mix-voiceover.sh anim.mp4 --voiceover=v.mp3 --bgm="$HOME/Music/song.mp3" --bgm-volume=0.12
+#   bash mix-voiceover.sh anim.mp4 --voiceover=v.mp3 --bgm-mood=tech --no-ducking
+#
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+ASSETS_DIR="$SCRIPT_DIR/../assets"
+
+INPUT=""
+VOICEOVER=""
+BGM=""
+BGM_MOOD=""
+BGM_VOLUME="0.18"
+VOICE_VOLUME="1.0"
+DUCKING="1"
+OUTPUT=""
+
+for arg in "$@"; do
+  case "$arg" in
+    --voiceover=*)    VOICEOVER="${arg#*=}" ;;
+    --bgm=*)          BGM="${arg#*=}" ;;
+    --bgm-mood=*)     BGM_MOOD="${arg#*=}" ;;
+    --bgm-volume=*)   BGM_VOLUME="${arg#*=}" ;;
+    --voice-volume=*) VOICE_VOLUME="${arg#*=}" ;;
+    --no-ducking)     DUCKING="0" ;;
+    --out=*)          OUTPUT="${arg#*=}" ;;
+    -*)               echo "未知参数:$arg" >&2; exit 1 ;;
+    *)                INPUT="$arg" ;;
+  esac
+done
+
+if [ -z "$INPUT" ] || [ ! -f "$INPUT" ]; then
+  echo "Usage: bash mix-voiceover.sh <video.mp4> --voiceover=<v.mp3> [--bgm=<b.mp3> | --bgm-mood=<name>]" >&2
+  exit 1
+fi
+if [ -z "$VOICEOVER" ] || [ ! -f "$VOICEOVER" ]; then
+  echo "✗ 缺 --voiceover=<path>" >&2
+  exit 1
+fi
+
+# 解析 BGM 来源
+if [ -z "$BGM" ] && [ -n "$BGM_MOOD" ]; then
+  BGM="$ASSETS_DIR/bgm-${BGM_MOOD}.mp3"
+fi
+if [ -n "$BGM" ] && [ ! -f "$BGM" ]; then
+  echo "✗ BGM 文件不存在: $BGM" >&2
+  echo "  可用 mood: $(ls "$ASSETS_DIR" 2>/dev/null | grep -E '^bgm-.*\.mp3$' | sed 's/^bgm-//;s/\.mp3$//' | tr '\n' ' ')" >&2
+  exit 1
+fi
+
+# 输出路径
+if [ -z "$OUTPUT" ]; then
+  base="${INPUT%.*}"
+  OUTPUT="${base}-voiced.mp4"
+fi
+
+echo "─ mix-voiceover ──────────────"
+echo "  视频:     $INPUT"
+echo "  人声:     $VOICEOVER (vol=$VOICE_VOLUME)"
+if [ -n "$BGM" ]; then
+  echo "  BGM:      $BGM (vol=$BGM_VOLUME, ducking=$DUCKING)"
+else
+  echo "  BGM:      (无)"
+fi
+echo "  输出:     $OUTPUT"
+echo "──────────────────────────────"
+
+# ── ffmpeg filter graph ─────────────────────────────────────
+if [ -z "$BGM" ]; then
+  # 仅人声
+  ffmpeg -y -i "$INPUT" -i "$VOICEOVER" \
+    -filter_complex "[1:a]volume=${VOICE_VOLUME}[a]" \
+    -map 0:v -map "[a]" \
+    -c:v copy -c:a aac -b:a 192k -shortest \
+    "$OUTPUT"
+elif [ "$DUCKING" = "1" ]; then
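+  # Voice + BGM + sidechain ducking
+  # The fade-out must start near the end of the track: afade with st=0 would silence everything after 0.5 s.
+  VOICE_DUR=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$VOICEOVER")
+  FADE_START=$(awk -v d="$VOICE_DUR" 'BEGIN { s = d - 0.5; if (s < 0) s = 0; print s }')
+  # A filter output label can feed only one input, so the voice is split with asplit:
+  # one copy goes to the mix, the other drives the sidechain compressor.
+  ffmpeg -y -i "$INPUT" -i "$VOICEOVER" -i "$BGM" \
+    -filter_complex "
+      [1:a]volume=${VOICE_VOLUME},asplit=2[voice][voice_sc];
+      [2:a]volume=${BGM_VOLUME},aloop=loop=-1:size=2e9[bgm_lo];
+      [bgm_lo][voice_sc]sidechaincompress=threshold=0.04:ratio=8:attack=5:release=300:makeup=1[bgm_ducked];
+      [voice][bgm_ducked]amix=inputs=2:duration=first:dropout_transition=0,afade=t=out:st=${FADE_START}:d=0.5:curve=tri[a]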
+    " \
+    -map 0:v -map "[a]" \
+    -c:v copy -c:a aac -b:a 192k -shortest \
+    "$OUTPUT"
+else
+  # 人声 + BGM 静态混合
+  ffmpeg -y -i "$INPUT" -i "$VOICEOVER" -i "$BGM" \
+    -filter_complex "
+      [1:a]volume=${VOICE_VOLUME}[voice];
+      [2:a]volume=${BGM_VOLUME},aloop=loop=-1:size=2e9[bgm];
+      [voice][bgm]amix=inputs=2:duration=first:dropout_transition=0[a]
+    " \
+    -map 0:v -map "[a]" \
+    -c:v copy -c:a aac -b:a 192k -shortest \
+    "$OUTPUT"
+fi
+
+echo "✓ 完成:$OUTPUT"

+ 315 - 0
scripts/narrate-pipeline.mjs

@@ -0,0 +1,315 @@
+#!/usr/bin/env node
+/**
+ * narrate-pipeline.mjs · L2 long-narration orchestrator
+ *
+ * Input:  markdown narration script (## scene-id splits scenes, [[cue:id]] marks key sentences)
+ * Output: voiceover.mp3 (the full concatenated voice track) + timeline.json (per-scene start/end + absolute cue times)
+ *
+ * Usage:
+ *   node scripts/narrate-pipeline.mjs --script demo.md --out-dir _narration_demo
+ *
+ * Script format:
+ *   ---
+ *   title: 什么是 LLM
+ *   voice: S_JSdgdWk22   # optional, falls back to .env
+ *   speed: 1.0           # optional
+ *   gap: 0.3             # silence between scenes in seconds, default 0.3
+ *   ---
+ *
+ *   ## intro
+ *   大家好,我是花叔。今天我们 5 分钟讲清楚 LLM 是什么。
+ *
+ *   ## what-is
+ *   LLM 全称 Large Language Model,[[cue:bigmodel]]它是一个有几千亿参数的神经网络。
+ *   本质是一个文字接龙的预测器。
+ *
+ * Output layout (under out-dir):
+ *   audio/
+ *     intro.mp3
+ *     what-is.mp3
+ *   voiceover.mp3       full voice track, all scenes concatenated
+ *   timeline.json       schema in references/voiceover-pipeline.md
+ *
+ * Dependencies: tts-doubao.mjs, ffmpeg, ffprobe
+ */
+
+import fs from 'node:fs';
+import path from 'node:path';
+import { execFileSync, execSync } from 'node:child_process';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+const SKILL_ROOT = path.resolve(__dirname, '..');
+const TTS_SCRIPT = path.join(__dirname, 'tts-doubao.mjs');
+
+function parseArgs(argv) {
+  const args = {};
+  for (let i = 2; i < argv.length; i++) {
+    const a = argv[i];
+    if (a === '--script') args.script = argv[++i];
+    else if (a === '--out-dir') args.outDir = argv[++i];
+    else if (a === '--help' || a === '-h') args.help = true;
+  }
+  return args;
+}
+
+function usage() {
+  console.error(`
+narrate-pipeline.mjs · L2 长解说总指挥
+
+  --script <path>     解说稿 .md 文件(必填)
+  --out-dir <path>    输出目录(必填)
+
+输出:<out-dir>/voiceover.mp3 + <out-dir>/timeline.json
+`.trim());
+  process.exit(1);
+}
+
+/**
+ * Parse frontmatter + scene blocks from markdown
+ * Returns { meta, scenes: [{ id, raw }] }
+ */
+function parseScript(md) {
+  const meta = {};
+  let body = md;
+  const fmMatch = md.match(/^---\n([\s\S]*?)\n---\n/);
+  if (fmMatch) {
+    for (const line of fmMatch[1].split('\n')) {
+      const idx = line.indexOf(':');
+      if (idx < 0) continue;
+      const key = line.slice(0, idx).trim();
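+      // Drop trailing inline comments ("voice: X   # optional") so they do not leak into the value
+      const val = line.slice(idx + 1).replace(/\s+#.*$/, '').trim();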
+      meta[key] = val;
+    }
+    body = md.slice(fmMatch[0].length);
+  }
+  const scenes = [];
+  const re = /^##\s+([\w-]+)\s*\n([\s\S]*?)(?=^##\s+[\w-]+\s*\n|$(?![\r\n]))/gm;
+  let m;
+  while ((m = re.exec(body)) !== null) {
+    scenes.push({ id: m[1], raw: m[2].trim() });
+  }
+  return { meta, scenes };
+}
+
+/**
+ * Split a scene's text by [[cue:id]] markers into chunks.
+ * Returns an array of chunks: [{ text, cueAfter? }]
+ *   cueAfter is the cue id that follows this chunk (chunk's end = cue position)
+ *
+ * Example: "A[[cue:x]]B[[cue:y]]C" =>
+ *   chunks: [
+ *     { text: "A", cueAfter: "x" },
+ *     { text: "B", cueAfter: "y" },
+ *     { text: "C" }
+ *   ]
+ */
+function splitByCues(text) {
+  const chunks = [];
+  const re = /\[\[cue:([\w-]+)\]\]/g;
+  let lastIdx = 0;
+  let m;
+  while ((m = re.exec(text)) !== null) {
+    const before = text.slice(lastIdx, m.index).trim();
+    chunks.push({ text: before, cueAfter: m[1] });
+    lastIdx = m.index + m[0].length;
+  }
+  const tail = text.slice(lastIdx).trim();
+  chunks.push({ text: tail });
+  // 过滤空文本块(cue 紧贴段首/段尾时)
+  return chunks.filter((c) => c.text.length > 0 || c.cueAfter);
+}
+
+function getDuration(filePath) {
+  const out = execFileSync('ffprobe', [
+    '-v', 'error',
+    '-show_entries', 'format=duration',
+    '-of', 'default=noprint_wrappers=1:nokey=1',
+    filePath,
+  ], { encoding: 'utf8' });
+  return parseFloat(out.trim());
+}
+
+function callTTS(text, outPath, opts) {
+  const args = ['--text', text, '--out', outPath];
+  if (opts.voice) args.push('--voice', opts.voice);
+  if (opts.speed) args.push('--speed', String(opts.speed));
+  const out = execFileSync('node', [TTS_SCRIPT, ...args], {
+    encoding: 'utf8',
+    stdio: ['ignore', 'pipe', 'inherit'],
+  });
+  return JSON.parse(out.trim());
+}
+
+function ffmpegConcat(inputs, output) {
+  // 用 concat demuxer 合并相同编码的 mp3
+  const listFile = output + '.list';
+  fs.writeFileSync(
+    listFile,
+    inputs.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join('\n'),
+  );
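+  // execFileSync passes arguments directly (no shell), so paths containing quotes or $ are safe
+  execFileSync(
+    'ffmpeg',
+    ['-y', '-f', 'concat', '-safe', '0', '-i', listFile, '-c', 'copy', output],
+    { stdio: ['ignore', 'pipe', 'pipe'] },
+  );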
+  fs.unlinkSync(listFile);
+}
+
+function makeSilence(duration, outPath) {
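+  // execFileSync passes arguments directly (no shell), so outPath needs no quoting
+  execFileSync(
+    'ffmpeg',
+    ['-y', '-f', 'lavfi', '-i', 'anullsrc=r=24000:cl=mono', '-t', String(duration),
+      '-q:a', '9', '-acodec', 'libmp3lame', outPath],
+    { stdio: ['ignore', 'pipe', 'pipe'] },
+  );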
+}
+
+async function main() {
+  const args = parseArgs(process.argv);
+  if (args.help || !args.script || !args.outDir) usage();
+
+  const scriptPath = path.resolve(args.script);
+  const outDir = path.resolve(args.outDir);
+  const audioDir = path.join(outDir, 'audio');
+  const tmpDir = path.join(outDir, '.tmp');
+  fs.mkdirSync(audioDir, { recursive: true });
+  fs.mkdirSync(tmpDir, { recursive: true });
+
+  const md = fs.readFileSync(scriptPath, 'utf8');
+  const { meta, scenes } = parseScript(md);
+  if (scenes.length === 0) {
+    console.error('错:解说稿没有 ## scene 段,至少一段。');
+    process.exit(1);
+  }
+
+  const voice = meta.voice || undefined;
+  const speed = meta.speed ? parseFloat(meta.speed) : 1.0;
+  const gap = meta.gap ? parseFloat(meta.gap) : 0.3;
+
+  console.error(`[narrate] script=${path.basename(scriptPath)} scenes=${scenes.length} voice=${voice || '(env)'} speed=${speed} gap=${gap}s`);
+
+  // 段间静音文件(共用一个)
+  const gapFile = path.join(tmpDir, 'gap.mp3');
+  if (gap > 0) makeSilence(gap, gapFile);
+
+  const timeline = {
+    title: meta.title || path.basename(scriptPath, '.md'),
+    voice: voice || null,
+    speed,
+    gap,
+    totalDuration: 0,
+    scenes: [],
+  };
+
+  let cursor = 0;
+  const sceneAudioFiles = [];
+
+  for (let i = 0; i < scenes.length; i++) {
+    const scene = scenes[i];
+    console.error(`[narrate] (${i + 1}/${scenes.length}) scene="${scene.id}"`);
+
+    const chunks = splitByCues(scene.raw);
+    const chunkFiles = [];
+    const cueRecords = [];
+    const chunkRecords = []; // 每个 chunk 的实测 start/end 段内时间,用于字幕显示
+    let sceneInternalCursor = 0;
+
+    for (let j = 0; j < chunks.length; j++) {
+      const chunk = chunks[j];
+      if (!chunk.text) {
+        // 空文本块(cue 紧贴),跳过 TTS 但仍记录 cue 位置
+        if (chunk.cueAfter) {
+          cueRecords.push({
+            id: chunk.cueAfter,
+            offset: sceneInternalCursor,
+          });
+        }
+        continue;
+      }
+      const chunkPath = path.join(tmpDir, `${scene.id}-${j}.mp3`);
+      const result = callTTS(chunk.text, chunkPath, { voice, speed });
+      const chunkStart = sceneInternalCursor;
+      chunkFiles.push(chunkPath);
+      sceneInternalCursor += result.duration;
+      chunkRecords.push({
+        text: chunk.text,
+        start: chunkStart,
+        end: sceneInternalCursor,
+        duration: result.duration,
+      });
+      console.error(`  chunk ${j}: ${result.duration.toFixed(2)}s · ${chunk.text.length} 字 · ${chunk.text.slice(0, 30)}${chunk.text.length > 30 ? '…' : ''}`);
+      if (chunk.cueAfter) {
+        cueRecords.push({
+          id: chunk.cueAfter,
+          offset: sceneInternalCursor,
+        });
+      }
+    }
+
+    // 合并段内子段
+    const sceneAudio = path.join(audioDir, `${scene.id}.mp3`);
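+    if (chunkFiles.length === 0) {
+      // A scene with no speakable text would otherwise fail later inside ffmpeg with an opaque error
+      throw new Error(`scene "${scene.id}" has no narration text`);
+    } else if (chunkFiles.length === 1) {
+      fs.copyFileSync(chunkFiles[0], sceneAudio);
+    } else {
+      ffmpegConcat(chunkFiles, sceneAudio);
+    }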
+    const sceneDuration = getDuration(sceneAudio);
+
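+    // Append to the full track: gap first (except before the first scene), then the scene.
+    // Advance by the measured length of the silence file rather than the nominal gap,
+    // since mp3 frame padding can make the two differ slightly and the error would accumulate.
+    if (i > 0 && gap > 0) {
+      sceneAudioFiles.push(gapFile);
+      cursor += getDuration(gapFile);
+    }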
+    sceneAudioFiles.push(sceneAudio);
+
+    timeline.scenes.push({
+      id: scene.id,
+      start: cursor,
+      end: cursor + sceneDuration,
+      duration: sceneDuration,
+      audio: path.relative(outDir, sceneAudio),
+      text: scene.raw.replace(/\[\[cue:[\w-]+\]\]/g, ''),
+      // chunks: 用于字幕逐句显示。start/end 是段内相对时间,absoluteStart/absoluteEnd 是整轨绝对时间
+      chunks: chunkRecords.map((c) => ({
+        text: c.text,
+        start: c.start,
+        end: c.end,
+        absoluteStart: cursor + c.start,
+        absoluteEnd: cursor + c.end,
+      })),
+      cues: cueRecords.map((c) => ({
+        id: c.id,
+        offset: c.offset,
+        absoluteTime: cursor + c.offset,
+      })),
+    });
+
+    cursor += sceneDuration;
+  }
+
+  // 合并整轨
+  const voiceoverPath = path.join(outDir, 'voiceover.mp3');
+  ffmpegConcat(sceneAudioFiles, voiceoverPath);
+  timeline.totalDuration = getDuration(voiceoverPath);
+  timeline.voiceover = 'voiceover.mp3';
+
+  fs.writeFileSync(
+    path.join(outDir, 'timeline.json'),
+    JSON.stringify(timeline, null, 2),
+  );
+
+  // 清理 tmp
+  fs.rmSync(tmpDir, { recursive: true, force: true });
+
+  console.error(`\n[narrate] 完成。`);
+  console.error(`  voiceover: ${voiceoverPath}`);
+  console.error(`  timeline:  ${path.join(outDir, 'timeline.json')}`);
+  console.error(`  总时长:    ${timeline.totalDuration.toFixed(2)}s (${(timeline.totalDuration / 60).toFixed(2)} min)`);
+  console.error(`  段数:      ${timeline.scenes.length}`);
+  const totalCues = timeline.scenes.reduce((sum, s) => sum + s.cues.length, 0);
+  console.error(`  cue 数:    ${totalCues}`);
+}
+
+main().catch((err) => {
+  console.error(`narrate-pipeline 失败:${err.message}`);
+  console.error(err.stack);
+  process.exit(1);
+});

+ 136 - 0
scripts/render-narration.sh

@@ -0,0 +1,136 @@
+#!/usr/bin/env bash
+# render-narration.sh · One pass: narrated HTML animation → final MP4 (with voice)
+#
+# Pipeline:
+#   1. render-video.js  records a silent MP4 (length from timeline.totalDuration)
+#   2. mix-voiceover.sh mixes in voiceover.mp3 (BGM optional)
+#   3. Outputs <basename>-narrated.mp4
+#
+# Usage:
+#   bash render-narration.sh <html> --timeline=<path> [options]
+#
+# Required:
+#   <html>                HTML of the narrated animation (should embed NarrationStage with the self-driven rAF recording mode)
+#   --timeline=<path>     Path to timeline.json (totalDuration and the voiceover.mp3 path are read from it)
+#
+# Optional:
+#   --bgm-mood=<name>     BGM preset (educational / tech / tutorial / ...)
+#   --bgm=<path>          Custom BGM file
+#   --bgm-volume=<0-1>    Static BGM volume, default 0.18
+#   --no-ducking          Disable sidechain ducking
+#   --keep-silent         Keep the intermediate silent MP4 for debugging
+#   --out=<path>          Output path, default <html-basename>-narrated.mp4
+#   --width=<px>          Video width (default 1920)
+#   --height=<px>         Video height (default 1080)
+#
+# Examples:
+#   bash render-narration.sh demo.html --timeline=_narration/timeline.json
+#   bash render-narration.sh demo.html --timeline=_narration/timeline.json --bgm-mood=educational
+#
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+SKILL_ROOT="$SCRIPT_DIR/.."
+
+HTML=""
+TIMELINE=""
+BGM_MOOD=""
+BGM=""
+BGM_VOLUME="0.18"
+NO_DUCKING=""
+KEEP_SILENT=""
+OUT=""
+WIDTH="1920"
+HEIGHT="1080"
+
+for arg in "$@"; do
+  case "$arg" in
+    --timeline=*)    TIMELINE="${arg#*=}" ;;
+    --bgm-mood=*)    BGM_MOOD="${arg#*=}" ;;
+    --bgm=*)         BGM="${arg#*=}" ;;
+    --bgm-volume=*)  BGM_VOLUME="${arg#*=}" ;;
+    --no-ducking)    NO_DUCKING="--no-ducking" ;;
+    --keep-silent)   KEEP_SILENT="1" ;;
+    --out=*)         OUT="${arg#*=}" ;;
+    --width=*)       WIDTH="${arg#*=}" ;;
+    --height=*)      HEIGHT="${arg#*=}" ;;
+    -*)              echo "未知参数:$arg" >&2; exit 1 ;;
+    *)               HTML="$arg" ;;
+  esac
+done
+
+if [ -z "$HTML" ] || [ ! -f "$HTML" ]; then
+  echo "Usage: bash render-narration.sh <html> --timeline=<path> [options]" >&2
+  exit 1
+fi
+if [ -z "$TIMELINE" ] || [ ! -f "$TIMELINE" ]; then
+  echo "✗ 缺 --timeline=<path>(timeline.json 由 narrate-pipeline.mjs 生成)" >&2
+  exit 1
+fi
+
+# ── 从 timeline.json 读 totalDuration 和 voiceover 路径 ──
+TIMELINE_DIR="$(cd "$(dirname "$TIMELINE")" && pwd)"
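+# Pass the path as an argument instead of splicing it into the JS source, so quotes in the path cannot break the snippet
+TOTAL_DURATION=$(node -e "console.log(JSON.parse(require('fs').readFileSync(process.argv[1],'utf8')).totalDuration)" "$TIMELINE")
+VOICEOVER_REL=$(node -e "console.log(JSON.parse(require('fs').readFileSync(process.argv[1],'utf8')).voiceover || 'voiceover.mp3')" "$TIMELINE")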
+VOICEOVER="$TIMELINE_DIR/$VOICEOVER_REL"
+
+if [ ! -f "$VOICEOVER" ]; then
+  echo "✗ voiceover.mp3 不存在: $VOICEOVER" >&2
+  exit 1
+fi
+
+# 录制时长 = 总时长 + 1s 安全缓冲
+RECORD_DURATION=$(node -e "console.log(Math.ceil($TOTAL_DURATION + 1))")
+
+HTML_ABS="$(cd "$(dirname "$HTML")" && pwd)/$(basename "$HTML")"
+HTML_DIR="$(dirname "$HTML_ABS")"
+HTML_BASE="$(basename "$HTML" .html)"
+SILENT_MP4="$HTML_DIR/$HTML_BASE.mp4"
+
+if [ -z "$OUT" ]; then
+  OUT="$HTML_DIR/$HTML_BASE-narrated.mp4"
+fi
+
+echo "═══ render-narration ═══════════════════"
+echo "  HTML:        $HTML_ABS"
+echo "  Timeline:    $TIMELINE"
+echo "  Voiceover:   $VOICEOVER"
+echo "  Total dur:   ${TOTAL_DURATION}s (recording ${RECORD_DURATION}s)"
+echo "  Size:        ${WIDTH}×${HEIGHT}"
+[ -n "$BGM_MOOD" ] && echo "  BGM mood:    $BGM_MOOD"
+[ -n "$BGM" ] && echo "  BGM:         $BGM"
+echo "  Output:      $OUT"
+echo "════════════════════════════════════════"
+
+# ── Step 1: record the silent MP4 ──────────────────────
+echo ""
+echo "▸ Step 1/2 · Recording the HTML animation (silent)"
+NODE_PATH=$(npm root -g) node "$SCRIPT_DIR/render-video.js" "$HTML_ABS" \
+  --duration="$RECORD_DURATION" \
+  --width="$WIDTH" \
+  --height="$HEIGHT"
+
+if [ ! -f "$SILENT_MP4" ]; then
+  echo "✗ Silent MP4 was not generated: $SILENT_MP4" >&2
+  exit 1
+fi
+
+# ── Step 2: mix in the voiceover ──────────────────────
+echo ""
+echo "▸ Step 2/2 · Mixing in the voiceover"
+MIX_ARGS=("$SILENT_MP4" "--voiceover=$VOICEOVER" "--out=$OUT")
+[ -n "$BGM_MOOD" ] && MIX_ARGS+=("--bgm-mood=$BGM_MOOD")
+[ -n "$BGM" ]      && MIX_ARGS+=("--bgm=$BGM")
+[ -n "$BGM_MOOD$BGM" ] && MIX_ARGS+=("--bgm-volume=$BGM_VOLUME")
+[ -n "$NO_DUCKING" ] && MIX_ARGS+=("$NO_DUCKING")
+
+bash "$SCRIPT_DIR/mix-voiceover.sh" "${MIX_ARGS[@]}"
+
+# Clean up the intermediate file
+if [ -z "$KEEP_SILENT" ]; then
+  rm -f "$SILENT_MP4"
+fi
+
+echo ""
+echo "✓ Done: $OUT"
+if [ -n "$KEEP_SILENT" ]; then echo "  (intermediate file kept: $SILENT_MP4)"; fi

+ 184 - 0
scripts/tts-doubao.mjs

@@ -0,0 +1,184 @@
+#!/usr/bin/env node
+/**
+ * tts-doubao.mjs · Doubao speech TTS (Volcengine openspeech)
+ *
+ * Usage:
+ *   node scripts/tts-doubao.mjs --text "你好" --out demo.mp3
+ *   node scripts/tts-doubao.mjs --text-file script.txt --out out.mp3 --speed 1.0
+ *
+ * Output:
+ *   - the audio file is written to the --out path
+ *   - stdout prints one line of JSON: {"path":"...","bytes":54321,"duration":12.34,"text_chars":2}
+ *
+ * Requires: Node 18+ (built-in fetch/crypto), ffprobe (measures duration; brew install ffmpeg)
+ *
+ * env (read automatically from .env in the skill root; values already set in process.env take precedence):
+ *   DOUBAO_TTS_API_KEY     required
+ *   DOUBAO_TTS_VOICE_ID    required (voice id)
+ *   DOUBAO_TTS_CLUSTER     default volcano_icl
+ *   DOUBAO_TTS_ENDPOINT    default https://openspeech.bytedance.com/api/v1/tts
+ */
+
+import fs from 'node:fs';
+import path from 'node:path';
+import { execFileSync } from 'node:child_process';
+import { fileURLToPath } from 'node:url';
+import { randomUUID } from 'node:crypto';
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+const SKILL_ROOT = path.resolve(__dirname, '..');
+
+function loadEnv() {
+  const envPath = path.join(SKILL_ROOT, '.env');
+  if (!fs.existsSync(envPath)) return;
+  const text = fs.readFileSync(envPath, 'utf8');
+  for (const line of text.split('\n')) {
+    const trimmed = line.trim();
+    if (!trimmed || trimmed.startsWith('#')) continue;
+    const idx = trimmed.indexOf('=');
+    if (idx < 0) continue;
+    const key = trimmed.slice(0, idx).trim();
+    let val = trimmed.slice(idx + 1).trim();
+    if ((val.startsWith('"') && val.endsWith('"')) || (val.startsWith("'") && val.endsWith("'"))) {
+      val = val.slice(1, -1);
+    }
+    if (!(key in process.env)) process.env[key] = val;
+  }
+}
+loadEnv();
+
+function parseArgs(argv) {
+  const args = { speed: '1.0', encoding: 'mp3' };
+  for (let i = 2; i < argv.length; i++) {
+    const a = argv[i];
+    if (a === '--text') args.text = argv[++i];
+    else if (a === '--text-file') args.textFile = argv[++i];
+    else if (a === '--out') args.out = argv[++i];
+    else if (a === '--speed') args.speed = argv[++i];
+    else if (a === '--voice') args.voice = argv[++i];
+    else if (a === '--encoding') args.encoding = argv[++i];
+    else if (a === '--help' || a === '-h') args.help = true;
+  }
+  return args;
+}
+
+function usage() {
+  console.error(`
+tts-doubao.mjs · Doubao speech TTS
+
+  --text <str>          Text to synthesize
+  --text-file <path>    Read the text from a file (use either this or --text)
+  --out <path>          Output audio path (required)
+  --speed <float>       Speed ratio, default 1.0 (0.5-2.0)
+  --voice <voice_id>    Override the voice id from .env
+  --encoding <ext>      mp3 / wav / pcm, default mp3
+`.trim());
+  process.exit(1);
+}
+
+function getDuration(filePath) {
+  try {
+    const out = execFileSync('ffprobe', [
+      '-v', 'error',
+      '-show_entries', 'format=duration',
+      '-of', 'default=noprint_wrappers=1:nokey=1',
+      filePath,
+    ], { encoding: 'utf8' });
+    return parseFloat(out.trim());
+  } catch (e) {
+    return null;
+  }
+}
+
+async function tts({ text, voice, speed, encoding }) {
+  const apiKey = process.env.DOUBAO_TTS_API_KEY;
+  const cluster = process.env.DOUBAO_TTS_CLUSTER || 'volcano_icl';
+  const endpoint = process.env.DOUBAO_TTS_ENDPOINT || 'https://openspeech.bytedance.com/api/v1/tts';
+  const voiceId = voice || process.env.DOUBAO_TTS_VOICE_ID;
+
+  if (!apiKey) throw new Error('Missing DOUBAO_TTS_API_KEY (check .env)');
+  if (!voiceId) throw new Error('Missing DOUBAO_TTS_VOICE_ID (check .env or pass --voice)');
+
+  const body = {
+    app: { cluster },
+    user: { uid: 'huashu-design' },
+    audio: {
+      voice_type: voiceId,
+      encoding,
+      speed_ratio: parseFloat(speed),
+    },
+    request: {
+      reqid: randomUUID(),
+      text,
+      operation: 'query',
+    },
+  };
+
+  const res = await fetch(endpoint, {
+    method: 'POST',
+    headers: {
+      'x-api-key': apiKey,
+      'Content-Type': 'application/json',
+    },
+    body: JSON.stringify(body),
+  });
+
+  if (!res.ok) {
+    const errText = await res.text();
+    throw new Error(`HTTP ${res.status}: ${errText.slice(0, 500)}`);
+  }
+
+  const json = await res.json();
+  // Standard Doubao response: { code, message, data: "<base64 audio>", ... }
+  // code === 3000 means success
+  if (json.code !== undefined && json.code !== 3000) {
+    throw new Error(`API returned an error: code=${json.code} msg=${json.message || JSON.stringify(json)}`);
+  }
+  if (!json.data) {
+    throw new Error(`API response has no data field: ${JSON.stringify(json).slice(0, 500)}`);
+  }
+  return Buffer.from(json.data, 'base64');
+}
+
+async function main() {
+  const args = parseArgs(process.argv);
+  if (args.help) usage();
+
+  let text = args.text;
+  if (!text && args.textFile) {
+    text = fs.readFileSync(args.textFile, 'utf8').trim();
+  }
+  if (!text) {
+    console.error('Error: missing --text or --text-file');
+    usage();
+  }
+  if (!args.out) {
+    console.error('Error: missing --out');
+    usage();
+  }
+
+  const outPath = path.resolve(args.out);
+  fs.mkdirSync(path.dirname(outPath), { recursive: true });
+
+  const audio = await tts({
+    text,
+    voice: args.voice,
+    speed: args.speed,
+    encoding: args.encoding,
+  });
+
+  fs.writeFileSync(outPath, audio);
+  const duration = getDuration(outPath);
+  const result = {
+    path: outPath,
+    bytes: audio.length,
+    duration,
+    text_chars: text.length,
+  };
+  console.log(JSON.stringify(result));
+}
+
+main().catch((err) => {
+  console.error(`TTS failed: ${err.message}`);
+  process.exit(1);
+});

Some files are not shown because of the large number of changes