[{"data":1,"prerenderedAt":17},["ShallowReactive",2],{"blog-post-hqb-historical-video-tutorial-en":3},{"slug":4,"title":5,"description":6,"date":7,"author":8,"tags":9,"lang":13,"image":13,"ogImage":14,"thumbnail":13,"content":15,"html":16},"hqb-historical-video-tutorial-en","Making a Historical Short Film with AI: The Complete Huo Qubing Production Log","A complete production log of creating a 2.5-minute historical short film about ancient China's legendary general Huo Qubing — 12 shots, AI-generated music, AI voiceover, total cost under $25.","2026-03-10","Ima Claw Team",[10,11,12],"AI Creation","Tutorial","Behind the Scenes","","\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-I-summit.jpg","\n2,000 years ago, a 19-year-old general named Huo Qubing led 50,000 cavalry north across the Gobi Desert and performed the famous ritual at Wolf Juxu Mountain — the furthest any Han dynasty army had ever reached.\n\nToday, we recreated that story with AI — 12 shots, 2 minutes 24 seconds, entirely AI-generated from script to final cut.\n\n## The Final Result\n\nWatch the finished film first:\n\n\u003Cvideo controls playsinline preload=\"metadata\" poster=\"https:\u002F\u002Fima-ga.esxscloud.com\u002FwebAgent\u002Fprivite\u002F2026\u002F03\u002F11\u002F1773159093222_ima_219a64800b384ae985e3e7688988b3ca_1737f42c304b4436893d3bc810df8314.jpg\" style=\"width:100%;max-width:400px;border-radius:12px;margin:1.5rem auto;display:block\">\n  \u003Csource src=\"https:\u002F\u002Fima-ga.esxscloud.com\u002FwebAgent\u002Fprivite\u002F2026\u002F03\u002F11\u002F1773159086632_ima_219a64800b384ae985e3e7688988b3ca_03eff7d2853347929e2985c89c7828f4.mp4\" type=\"video\u002Fmp4\">\n\u003C\u002Fvideo>\n\nA complete vertical short film featuring:\n- 12 AI-generated video clips (Kling O1 model)\n- AI voiceover narration (deep male voice, documentary style)\n- AI-composed original score (war drums + traditional Chinese instruments)\n\nDesigned for platforms like Douyin (TikTok China), 
Xiaohongshu, and Instagram Reels.\n\n---\n\n## Step 1: Script Design\n\nGood short films start with good scripts. We didn't jump straight into generation — we spent time crafting a 12-shot narrative structure first.\n\n### Three-Act Structure\n\n| Act | Shots | Narrative Arc |\n|-----|-------|--------------|\n| **Act I: The March** | A–D | Imperial command → Departure → Army rides north |\n| **Act II: The Battle** | E–H | Archery → Close combat → Pursuit → Looking back |\n| **Act III: The Legend** | I–L | Mountain summit ritual → Army cheers → Lone rider at sunset → Epilogue |\n\n### The 12-Shot Breakdown\n\n| Shot | Scene | Description | Mood |\n|------|-------|------------|------|\n| A | Imperial Court | Emperor grants command, young general accepts | Solemn |\n| B | Court Close-up | Huo Qubing salutes, eyes fixed northward | Determined |\n| C | City Gate | Mounting horse outside Chang'an, cavalry in formation | Bold |\n| D | Aerial Desert | 50,000 cavalry charging across the Gobi | Vast |\n| E | Mountain Battle | Drawing bow, arrow flies like thunder | Tense |\n| F | Close Combat | Blood-stained armor, a smile on his face | Fierce |\n| G | Sunset Pursuit | Solo rider chasing at sunset, red sky | Intense |\n| H | Looking Back | Reining horse, surveying 2,000 li of conquest | Reflective |\n| I | Mountain Summit | Sword raised to sky, army kneeling below | **Climax** |\n| J | Army Cheers | Ten thousand voices shake the steppe | Triumphant |\n| K | Lone Rider Sunset | Silhouette against the setting sun | Melancholic |\n| L | Epilogue | Young general looks up, freeze frame | Bittersweet |\n\n**Key Lesson:** Emotional arc matters more than visual spectacle. Each shot serves the overall rhythm — from solemn to vast to tense to climactic to melancholic.\n\n---\n\n## Step 2: Character Design\n\nCharacter consistency is critical for historical films. 
We established Huo Qubing's look first, then maintained it across all 12 shots.\n\n### Character Specs\n\n- **Reference:** Zhang Ruoyun's facial features (clean-cut, heroic)\n- **Age:** 19 years old\n- **Costume:** Black-gold Han dynasty battle armor, red cape\n- **Style:** Photorealistic cinematic, not animation\n\nWe used **Gemini 3 Pro Image** (Nano Banana Pro) to generate character design sheets, then used these as reference for all subsequent keyframes.\n\n---\n\n## Step 3: Keyframe Generation\n\nFor each of the 12 shots, we first generated a static keyframe image. Only after the composition was approved did we convert to video.\n\n\u003Cdiv style=\"display:grid;grid-template-columns:1fr 1fr 1fr;gap:8px;margin:1.5rem 0\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-A-court.jpg\" alt=\"Shot A: Imperial Court\" style=\"border-radius:8px;width:100%\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-E-archery.jpg\" alt=\"Shot E: Archery\" style=\"border-radius:8px;width:100%\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-F-battle.jpg\" alt=\"Shot F: Battle\" style=\"border-radius:8px;width:100%\">\n\u003C\u002Fdiv>\n\u003Cdiv style=\"display:grid;grid-template-columns:1fr 1fr;gap:8px;margin:0 0 1.5rem\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-I-summit.jpg\" alt=\"Shot I: Mountain Summit\" style=\"border-radius:8px;width:100%\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-K-sunset.jpg\" alt=\"Shot K: Lone Rider\" style=\"border-radius:8px;width:100%\">\n\u003C\u002Fdiv>\n\n### Prompt Example\n\nFor the climactic summit scene (Shot I):\n\n```\nLow-angle shot of a young Chinese general (19 years old, resembling Zhang Ruoyun) \nstanding triumphant on the summit of Wolf Juxu Mountain, drawing his sword pointed \nskyward. Black-gold Han dynasty battle armor with flowing red cape. 
\nThousands of soldiers kneel in formation on the mountainside. \nGolden hour sunlight, epic cinematic composition.\n\nNegative: No god rays, no supernatural glow, no light beams, no lens flare.\n```\n\n**Key Lessons:**\n- **Negative prompts are essential** — Shot I initially had unnatural \"ghost rays\" that were only fixed by explicitly excluding them\n- **Lock aspect ratio early** — All keyframes were generated in 9:16 vertical format from the start\n- **Character description consistency** — Every prompt included identical character descriptors\n\n---\n\n## Step 4: Image-to-Video\n\nAfter keyframe approval, each static image was converted to a 5-second video clip.\n\n### Model Selection\n\n| Model | Credits\u002Fclip | Quality | Use Case |\n|-------|-------------|---------|----------|\n| **Kling O1** | 48 | ⭐⭐⭐⭐⭐ | Final version (quality first) |\n| Wan 2.6 | 40 | ⭐⭐⭐⭐ | Initial test round |\n\nWe tested with Wan 2.6 first, then switched everything to **Kling O1** for the final version. The facial detail and motion naturalness were noticeably better.\n\n### Cost Breakdown\n\n| Item | Quantity | Unit Cost | Total |\n|------|----------|-----------|-------|\n| Kling O1 clips (final) | 12 | 48 credits | 576 |\n| Wan 2.6 clips (test) | 12 | 40 credits | 480 |\n| Remakes\u002Ffixes | ~8 | 48 credits | 384 |\n| **Video subtotal** | | | **~1,440 credits** |\n\n---\n\n## Step 5: AI Voiceover\n\nNarration is the soul of a historical short film. We generated 12 segments of narration using AI TTS.\n\n### Technical Setup\n\n- **Model:** Gemini TTS (`gemini-2.5-flash-preview-tts`), Orus voice\n- **Style:** Slow, deliberate, documentary gravitas\n- **Output:** Raw PCM (s16le, 24kHz, mono) → converted to MP3 via ffmpeg\n- **Total duration:** 144 seconds across 12 segments\n\n**Pitfall:** We first tried `seed-tts-1.1` for voice cloning, but its `ref_audio_url` parameter is silently ignored. 
Gemini TTS worked perfectly as the alternative.\n\n---\n\n## Step 6: AI Music Score\n\nOriginal score composed using **DouBao BGM** (ByteDance music generation).\n\n### Prompt Design\n\n```\nAncient Chinese war epic soundtrack. \nNO orchestra, NO strings, NO violin. \nUse only: massive war drums (taiko), bronze bells, erhu, \nguzheng, dizi flute, powerful male choir. \nFierce, aggressive, triumphant. 150 seconds.\n```\n\nWe went through 3 iterations:\n1. **v1:** Orchestral — too Western, didn't match the period\n2. **v2:** Mixed — better but still had strings\n3. **v3:** Pure traditional instruments + war drums — perfect ✅\n\n**Cost:** 30 credits × 3 iterations = 90 credits\n\n---\n\n## Step 7: Final Assembly\n\n### The Slow-Motion Solution\n\nEach AI video clip is only 5 seconds, but the corresponding narration is 10–14 seconds. Our solution:\n\n**Slow each clip to match its narration duration** using ffmpeg's `setpts` filter. The multiplier is the narration length divided by the clip length; for example, `setpts=2.43*PTS` stretches a 5-second clip to 5 × 2.43 = 12.15 seconds. For a historical film, this actually enhanced the epic quality — slow motion adds gravitas.\n\n### Assembly Pipeline\n\n```bash\n# 1. Slow each clip to match its narration (repeat for clips A–L, each with its own factor)\nffmpeg -i clip-A.mp4 -vf \"setpts=2.43*PTS,scale=1080:1920\" -an seg-A.mp4\n\n# 2. Concatenate all 12 segments\n#    concat-list.txt lists one segment per line, in playback order: file 'seg-A.mp4'\nffmpeg -f concat -i concat-list.txt -c copy video.mp4\n\n# 3. Mix narration + BGM (BGM at 20% volume)\nffmpeg -i narration.mp3 -i bgm.mp3 \\\n  -filter_complex \"[1:a]volume=0.2[bgm];[0:a][bgm]amix=inputs=2\" mixed.mp3\n\n# 4.
Combine video + audio\nffmpeg -i video.mp4 -i mixed.mp3 -c:v copy -c:a aac -shortest final.mp4\n```\n\n---\n\n## Total Cost\n\n### Generation Cost (IMA Credits)\n\n| Item | Tool | Credits | ~USD |\n|------|------|---------|------|\n| Keyframe images | Nano Banana Pro | ~200 | $2 |\n| Video generation | Kling O1 + Wan 2.6 | ~1,440 | $14 |\n| Voiceover | Gemini TTS | Free | $0 |\n| Music score | DouBao BGM × 3 versions | 90 | $1 |\n| **Generation subtotal** | | **~1,730** | **~$17** |\n\n### AI Conversation Cost (LLM Tokens)\n\n| Item | Description | Est. Cost |\n|------|------------|-----------|\n| Script writing & iteration | 12-shot breakdown, narrative arc, two revision rounds | $2-3 |\n| Prompt engineering | Generation prompts for 12 shots, character descriptions | $1-2 |\n| Feedback & adjustments | Multi-round approvals, visual fixes, music style iteration | $2-4 |\n| **Conversation subtotal** | | **~$5-9** |\n\n### Grand Total\n\n| Category | Cost |\n|----------|------|\n| Generation (images + video + music) | ~$17 |\n| AI conversation (script + prompts + feedback) | ~$7 |\n| **Total** | **Under $25** |\n\n> Traditional production of a comparable historical short — actors, costumes, locations, crew — would cost tens of thousands of dollars minimum. One person + AI, under $25, half a day.\n\n---\n\n## Lessons Learned\n\n### What Worked\n\n1. **Script first** — The 12-shot narrative arc was locked before any generation began\n2. **Keyframes before video** — Approving static images first saved hundreds of credits in video generation\n3. **Model tiering** — Test with cheaper models, finalize with premium\n4. **Negative prompts** — Telling AI what NOT to do is as important as telling it what to do\n5. **Slow motion = epic** — Slowing 5s clips to 12s actually enhanced the historical gravitas\n\n### What We'd Do Differently\n\n1. **Character consistency** — Still the hardest problem; faces varied slightly across shots\n2. 
**Model selection earlier** — We should have committed to Kling O1 from the start instead of testing with Wan 2.6 first\n3. **Music iteration** — Should have specified \"no Western instruments\" from the first prompt\n\n---\n\n## Tools Used\n\n| Purpose | Tool | Via Ima Claw |\n|---------|------|-------------|\n| Keyframe generation | Gemini 3 Pro Image | ✅ |\n| Video generation | Kling O1 \u002F Wan 2.6 | ✅ |\n| Voiceover | Gemini TTS | ✅ |\n| Music score | DouBao BGM | ✅ |\n| Video editing | ffmpeg | ✅ (CLI) |\n\nAll tools are accessible through Ima Claw — no separate accounts or API keys needed.\n\n---\n\n*A coffee break to write the script. Half a day to generate the footage. A handful of ffmpeg commands to assemble the final cut. A 2,000-year-old legend, brought back to life by AI.* 🐎\n","\u003Cp>2,000 years ago, a 19-year-old general named Huo Qubing led 50,000 cavalry north across the Gobi Desert and performed the famous ritual at Wolf Juxu Mountain — the furthest any Han dynasty army had ever reached.\u003C\u002Fp>\n\u003Cp>Today, we recreated that story with AI — 12 shots, 2 minutes 24 seconds, entirely AI-generated from script to final cut.\u003C\u002Fp>\n\u003Ch2>The Final Result\u003C\u002Fh2>\n\u003Cp>Watch the finished film first:\u003C\u002Fp>\n\u003Cvideo controls playsinline preload=\"metadata\" poster=\"https:\u002F\u002Fima-ga.esxscloud.com\u002FwebAgent\u002Fprivite\u002F2026\u002F03\u002F11\u002F1773159093222_ima_219a64800b384ae985e3e7688988b3ca_1737f42c304b4436893d3bc810df8314.jpg\" style=\"width:100%;max-width:400px;border-radius:12px;margin:1.5rem auto;display:block\">\n  \u003Csource src=\"https:\u002F\u002Fima-ga.esxscloud.com\u002FwebAgent\u002Fprivite\u002F2026\u002F03\u002F11\u002F1773159086632_ima_219a64800b384ae985e3e7688988b3ca_03eff7d2853347929e2985c89c7828f4.mp4\" type=\"video\u002Fmp4\">\n\u003C\u002Fvideo>\n\n\u003Cp>A complete vertical short film featuring:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>12 AI-generated video clips (Kling O1 
model)\u003C\u002Fli>\n\u003Cli>AI voiceover narration (deep male voice, documentary style)\u003C\u002Fli>\n\u003Cli>AI-composed original score (war drums + traditional Chinese instruments)\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Designed for platforms like Douyin (TikTok China), Xiaohongshu, and Instagram Reels.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Step 1: Script Design\u003C\u002Fh2>\n\u003Cp>Good short films start with good scripts. We didn&#39;t jump straight into generation — we spent time crafting a 12-shot narrative structure first.\u003C\u002Fp>\n\u003Ch3>Three-Act Structure\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Act\u003C\u002Fth>\n\u003Cth>Shots\u003C\u002Fth>\n\u003Cth>Narrative Arc\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>\u003Cstrong>Act I: The March\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>A–D\u003C\u002Ftd>\n\u003Ctd>Imperial command → Departure → Army rides north\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Act II: The Battle\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>E–H\u003C\u002Ftd>\n\u003Ctd>Archery → Close combat → Pursuit → Looking back\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Act III: The Legend\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>I–L\u003C\u002Ftd>\n\u003Ctd>Mountain summit ritual → Army cheers → Lone rider at sunset → Epilogue\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch3>The 12-Shot Breakdown\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Shot\u003C\u002Fth>\n\u003Cth>Scene\u003C\u002Fth>\n\u003Cth>Description\u003C\u002Fth>\n\u003Cth>Mood\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>A\u003C\u002Ftd>\n\u003Ctd>Imperial Court\u003C\u002Ftd>\n\u003Ctd>Emperor grants command, young general accepts\u003C\u002Ftd>\n\u003Ctd>Solemn\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>B\u003C\u002Ftd>\n\u003Ctd>Court 
Close-up\u003C\u002Ftd>\n\u003Ctd>Huo Qubing salutes, eyes fixed northward\u003C\u002Ftd>\n\u003Ctd>Determined\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>C\u003C\u002Ftd>\n\u003Ctd>City Gate\u003C\u002Ftd>\n\u003Ctd>Mounting horse outside Chang&#39;an, cavalry in formation\u003C\u002Ftd>\n\u003Ctd>Bold\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>D\u003C\u002Ftd>\n\u003Ctd>Aerial Desert\u003C\u002Ftd>\n\u003Ctd>50,000 cavalry charging across the Gobi\u003C\u002Ftd>\n\u003Ctd>Vast\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>E\u003C\u002Ftd>\n\u003Ctd>Mountain Battle\u003C\u002Ftd>\n\u003Ctd>Drawing bow, arrow flies like thunder\u003C\u002Ftd>\n\u003Ctd>Tense\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>F\u003C\u002Ftd>\n\u003Ctd>Close Combat\u003C\u002Ftd>\n\u003Ctd>Blood-stained armor, a smile on his face\u003C\u002Ftd>\n\u003Ctd>Fierce\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>G\u003C\u002Ftd>\n\u003Ctd>Sunset Pursuit\u003C\u002Ftd>\n\u003Ctd>Solo rider chasing at sunset, red sky\u003C\u002Ftd>\n\u003Ctd>Intense\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>H\u003C\u002Ftd>\n\u003Ctd>Looking Back\u003C\u002Ftd>\n\u003Ctd>Reining horse, surveying 2,000 li of conquest\u003C\u002Ftd>\n\u003Ctd>Reflective\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>I\u003C\u002Ftd>\n\u003Ctd>Mountain Summit\u003C\u002Ftd>\n\u003Ctd>Sword raised to sky, army kneeling below\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>Climax\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>J\u003C\u002Ftd>\n\u003Ctd>Army Cheers\u003C\u002Ftd>\n\u003Ctd>Ten thousand voices shake the steppe\u003C\u002Ftd>\n\u003Ctd>Triumphant\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>K\u003C\u002Ftd>\n\u003Ctd>Lone Rider Sunset\u003C\u002Ftd>\n\u003Ctd>Silhouette against the setting 
sun\u003C\u002Ftd>\n\u003Ctd>Melancholic\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>L\u003C\u002Ftd>\n\u003Ctd>Epilogue\u003C\u002Ftd>\n\u003Ctd>Young general looks up, freeze frame\u003C\u002Ftd>\n\u003Ctd>Bittersweet\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Cp>\u003Cstrong>Key Lesson:\u003C\u002Fstrong> Emotional arc matters more than visual spectacle. Each shot serves the overall rhythm — from solemn to vast to tense to climactic to melancholic.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Step 2: Character Design\u003C\u002Fh2>\n\u003Cp>Character consistency is critical for historical films. We established Huo Qubing&#39;s look first, then maintained it across all 12 shots.\u003C\u002Fp>\n\u003Ch3>Character Specs\u003C\u002Fh3>\n\u003Cul>\n\u003Cli>\u003Cstrong>Reference:\u003C\u002Fstrong> Zhang Ruoyun&#39;s facial features (clean-cut, heroic)\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Age:\u003C\u002Fstrong> 19 years old\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Costume:\u003C\u002Fstrong> Black-gold Han dynasty battle armor, red cape\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Style:\u003C\u002Fstrong> Photorealistic cinematic, not animation\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>We used \u003Cstrong>Gemini 3 Pro Image\u003C\u002Fstrong> (Nano Banana Pro) to generate character design sheets, then used these as reference for all subsequent keyframes.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Step 3: Keyframe Generation\u003C\u002Fh2>\n\u003Cp>For each of the 12 shots, we first generated a static keyframe image. 
Only after the composition was approved did we convert to video.\u003C\u002Fp>\n\u003Cdiv style=\"display:grid;grid-template-columns:1fr 1fr 1fr;gap:8px;margin:1.5rem 0\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-A-court.jpg\" alt=\"Shot A: Imperial Court\" style=\"border-radius:8px;width:100%\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-E-archery.jpg\" alt=\"Shot E: Archery\" style=\"border-radius:8px;width:100%\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-F-battle.jpg\" alt=\"Shot F: Battle\" style=\"border-radius:8px;width:100%\">\n\u003C\u002Fdiv>\n\u003Cdiv style=\"display:grid;grid-template-columns:1fr 1fr;gap:8px;margin:0 0 1.5rem\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-I-summit.jpg\" alt=\"Shot I: Mountain Summit\" style=\"border-radius:8px;width:100%\">\n  \u003Cimg src=\"\u002Fima-claw\u002Fblog\u002Fimg\u002Fhqb-tutorial\u002Fframe-K-sunset.jpg\" alt=\"Shot K: Lone Rider\" style=\"border-radius:8px;width:100%\">\n\u003C\u002Fdiv>\n\n\u003Ch3>Prompt Example\u003C\u002Fh3>\n\u003Cp>For the climactic summit scene (Shot I):\u003C\u002Fp>\n\u003Cpre>\u003Ccode>Low-angle shot of a young Chinese general (19 years old, resembling Zhang Ruoyun) \nstanding triumphant on the summit of Wolf Juxu Mountain, drawing his sword pointed \nskyward. Black-gold Han dynasty battle armor with flowing red cape. \nThousands of soldiers kneel in formation on the mountainside. 
\nGolden hour sunlight, epic cinematic composition.\n\nNegative: No god rays, no supernatural glow, no light beams, no lens flare.\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>\u003Cstrong>Key Lessons:\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Negative prompts are essential\u003C\u002Fstrong> — Shot I initially had unnatural &quot;ghost rays&quot; that were only fixed by explicitly excluding them\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Lock aspect ratio early\u003C\u002Fstrong> — All keyframes were generated in 9:16 vertical format from the start\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Character description consistency\u003C\u002Fstrong> — Every prompt included identical character descriptors\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Chr>\n\u003Ch2>Step 4: Image-to-Video\u003C\u002Fh2>\n\u003Cp>After keyframe approval, each static image was converted to a 5-second video clip.\u003C\u002Fp>\n\u003Ch3>Model Selection\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Model\u003C\u002Fth>\n\u003Cth>Credits\u002Fclip\u003C\u002Fth>\n\u003Cth>Quality\u003C\u002Fth>\n\u003Cth>Use Case\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>\u003Cstrong>Kling O1\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>48\u003C\u002Ftd>\n\u003Ctd>⭐⭐⭐⭐⭐\u003C\u002Ftd>\n\u003Ctd>Final version (quality first)\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Wan 2.6\u003C\u002Ftd>\n\u003Ctd>40\u003C\u002Ftd>\n\u003Ctd>⭐⭐⭐⭐\u003C\u002Ftd>\n\u003Ctd>Initial test round\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Cp>We tested with Wan 2.6 first, then switched everything to \u003Cstrong>Kling O1\u003C\u002Fstrong> for the final version. 
The facial detail and motion naturalness were noticeably better.\u003C\u002Fp>\n\u003Ch3>Cost Breakdown\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Item\u003C\u002Fth>\n\u003Cth>Quantity\u003C\u002Fth>\n\u003Cth>Unit Cost\u003C\u002Fth>\n\u003Cth>Total\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Kling O1 clips (final)\u003C\u002Ftd>\n\u003Ctd>12\u003C\u002Ftd>\n\u003Ctd>48 credits\u003C\u002Ftd>\n\u003Ctd>576\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Wan 2.6 clips (test)\u003C\u002Ftd>\n\u003Ctd>12\u003C\u002Ftd>\n\u003Ctd>40 credits\u003C\u002Ftd>\n\u003Ctd>480\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Remakes\u002Ffixes\u003C\u002Ftd>\n\u003Ctd>~8\u003C\u002Ftd>\n\u003Ctd>48 credits\u003C\u002Ftd>\n\u003Ctd>384\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Video subtotal\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>\u003C\u002Ftd>\n\u003Ctd>\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>~1,440 credits\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Chr>\n\u003Ch2>Step 5: AI Voiceover\u003C\u002Fh2>\n\u003Cp>Narration is the soul of a historical short film. 
We generated 12 segments of narration using AI TTS.\u003C\u002Fp>\n\u003Ch3>Technical Setup\u003C\u002Fh3>\n\u003Cul>\n\u003Cli>\u003Cstrong>Model:\u003C\u002Fstrong> Gemini TTS (\u003Ccode>gemini-2.5-flash-preview-tts\u003C\u002Fcode>), Orus voice\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Style:\u003C\u002Fstrong> Slow, deliberate, documentary gravitas\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Output:\u003C\u002Fstrong> Raw PCM (s16le, 24kHz, mono) → converted to MP3 via ffmpeg\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Total duration:\u003C\u002Fstrong> 144 seconds across 12 segments\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cstrong>Pitfall:\u003C\u002Fstrong> We first tried \u003Ccode>seed-tts-1.1\u003C\u002Fcode> for voice cloning, but its \u003Ccode>ref_audio_url\u003C\u002Fcode> parameter is silently ignored. Gemini TTS worked perfectly as the alternative.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Step 6: AI Music Score\u003C\u002Fh2>\n\u003Cp>Original score composed using \u003Cstrong>DouBao BGM\u003C\u002Fstrong> (ByteDance music generation).\u003C\u002Fp>\n\u003Ch3>Prompt Design\u003C\u002Fh3>\n\u003Cpre>\u003Ccode>Ancient Chinese war epic soundtrack. \nNO orchestra, NO strings, NO violin. \nUse only: massive war drums (taiko), bronze bells, erhu, \nguzheng, dizi flute, powerful male choir. \nFierce, aggressive, triumphant. 
150 seconds.\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Cp>We went through 3 iterations:\u003C\u002Fp>\n\u003Col>\n\u003Cli>\u003Cstrong>v1:\u003C\u002Fstrong> Orchestral — too Western, didn&#39;t match the period\u003C\u002Fli>\n\u003Cli>\u003Cstrong>v2:\u003C\u002Fstrong> Mixed — better but still had strings\u003C\u002Fli>\n\u003Cli>\u003Cstrong>v3:\u003C\u002Fstrong> Pure traditional instruments + war drums — perfect ✅\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Cp>\u003Cstrong>Cost:\u003C\u002Fstrong> 30 credits × 3 iterations = 90 credits\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Step 7: Final Assembly\u003C\u002Fh2>\n\u003Ch3>The Slow-Motion Solution\u003C\u002Fh3>\n\u003Cp>Each AI video clip is only 5 seconds, but the corresponding narration is 10–14 seconds. Our solution:\u003C\u002Fp>\n\u003Cp>\u003Cstrong>Slow each clip to match its narration duration\u003C\u002Fstrong> using ffmpeg&#39;s \u003Ccode>setpts\u003C\u002Fcode> filter. The multiplier is the narration length divided by the clip length; for example, \u003Ccode>setpts=2.43*PTS\u003C\u002Fcode> stretches a 5-second clip to 5 × 2.43 = 12.15 seconds. For a historical film, this actually enhanced the epic quality — slow motion adds gravitas.\u003C\u002Fp>\n\u003Ch3>Assembly Pipeline\u003C\u002Fh3>\n\u003Cpre>\u003Ccode class=\"language-bash\"># 1. Slow each clip to match its narration (repeat for clips A–L, each with its own factor)\nffmpeg -i clip-A.mp4 -vf &quot;setpts=2.43*PTS,scale=1080:1920&quot; -an seg-A.mp4\n\n# 2. Concatenate all 12 segments\n#    concat-list.txt lists one segment per line, in playback order: file &#39;seg-A.mp4&#39;\nffmpeg -f concat -i concat-list.txt -c copy video.mp4\n\n# 3. Mix narration + BGM (BGM at 20% volume)\nffmpeg -i narration.mp3 -i bgm.mp3 \\\n  -filter_complex &quot;[1:a]volume=0.2[bgm];[0:a][bgm]amix=inputs=2&quot; mixed.mp3\n\n# 4.
Combine video + audio\nffmpeg -i video.mp4 -i mixed.mp3 -c:v copy -c:a aac -shortest final.mp4\n\u003C\u002Fcode>\u003C\u002Fpre>\n\u003Chr>\n\u003Ch2>Total Cost\u003C\u002Fh2>\n\u003Ch3>Generation Cost (IMA Credits)\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Item\u003C\u002Fth>\n\u003Cth>Tool\u003C\u002Fth>\n\u003Cth>Credits\u003C\u002Fth>\n\u003Cth>~USD\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Keyframe images\u003C\u002Ftd>\n\u003Ctd>Nano Banana Pro\u003C\u002Ftd>\n\u003Ctd>~200\u003C\u002Ftd>\n\u003Ctd>$2\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Video generation\u003C\u002Ftd>\n\u003Ctd>Kling O1 + Wan 2.6\u003C\u002Ftd>\n\u003Ctd>~1,440\u003C\u002Ftd>\n\u003Ctd>$14\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Voiceover\u003C\u002Ftd>\n\u003Ctd>Gemini TTS\u003C\u002Ftd>\n\u003Ctd>Free\u003C\u002Ftd>\n\u003Ctd>$0\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Music score\u003C\u002Ftd>\n\u003Ctd>DouBao BGM × 3 versions\u003C\u002Ftd>\n\u003Ctd>90\u003C\u002Ftd>\n\u003Ctd>$1\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Generation subtotal\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>~1,730\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>~$17\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch3>AI Conversation Cost (LLM Tokens)\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Item\u003C\u002Fth>\n\u003Cth>Description\u003C\u002Fth>\n\u003Cth>Est. 
Cost\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Script writing &amp; iteration\u003C\u002Ftd>\n\u003Ctd>12-shot breakdown, narrative arc, two revision rounds\u003C\u002Ftd>\n\u003Ctd>$2-3\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Prompt engineering\u003C\u002Ftd>\n\u003Ctd>Generation prompts for 12 shots, character descriptions\u003C\u002Ftd>\n\u003Ctd>$1-2\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Feedback &amp; adjustments\u003C\u002Ftd>\n\u003Ctd>Multi-round approvals, visual fixes, music style iteration\u003C\u002Ftd>\n\u003Ctd>$2-4\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Conversation subtotal\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>~$5-9\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch3>Grand Total\u003C\u002Fh3>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Category\u003C\u002Fth>\n\u003Cth>Cost\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Generation (images + video + music)\u003C\u002Ftd>\n\u003Ctd>~$17\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>AI conversation (script + prompts + feedback)\u003C\u002Ftd>\n\u003Ctd>~$7\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>\u003Cstrong>Total\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>Under $25\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Cblockquote>\n\u003Cp>Traditional production of a comparable historical short — actors, costumes, locations, crew — would cost tens of thousands of dollars minimum. 
One person + AI, under $25, half a day.\u003C\u002Fp>\n\u003C\u002Fblockquote>\n\u003Chr>\n\u003Ch2>Lessons Learned\u003C\u002Fh2>\n\u003Ch3>What Worked\u003C\u002Fh3>\n\u003Col>\n\u003Cli>\u003Cstrong>Script first\u003C\u002Fstrong> — The 12-shot narrative arc was locked before any generation began\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Keyframes before video\u003C\u002Fstrong> — Approving static images first saved hundreds of credits in video generation\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Model tiering\u003C\u002Fstrong> — Test with cheaper models, finalize with premium\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Negative prompts\u003C\u002Fstrong> — Telling AI what NOT to do is as important as telling it what to do\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Slow motion = epic\u003C\u002Fstrong> — Slowing 5s clips to 12s actually enhanced the historical gravitas\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Ch3>What We&#39;d Do Differently\u003C\u002Fh3>\n\u003Col>\n\u003Cli>\u003Cstrong>Character consistency\u003C\u002Fstrong> — Still the hardest problem; faces varied slightly across shots\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Model selection earlier\u003C\u002Fstrong> — We should have committed to Kling O1 from the start instead of testing with Wan 2.6 first\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Music iteration\u003C\u002Fstrong> — Should have specified &quot;no Western instruments&quot; from the first prompt\u003C\u002Fli>\n\u003C\u002Fol>\n\u003Chr>\n\u003Ch2>Tools Used\u003C\u002Fh2>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>Purpose\u003C\u002Fth>\n\u003Cth>Tool\u003C\u002Fth>\n\u003Cth>Via Ima Claw\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>Keyframe generation\u003C\u002Ftd>\n\u003Ctd>Gemini 3 Pro Image\u003C\u002Ftd>\n\u003Ctd>✅\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Video generation\u003C\u002Ftd>\n\u003Ctd>Kling O1 \u002F Wan 
2.6\u003C\u002Ftd>\n\u003Ctd>✅\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Voiceover\u003C\u002Ftd>\n\u003Ctd>Gemini TTS\u003C\u002Ftd>\n\u003Ctd>✅\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Music score\u003C\u002Ftd>\n\u003Ctd>DouBao BGM\u003C\u002Ftd>\n\u003Ctd>✅\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Video editing\u003C\u002Ftd>\n\u003Ctd>ffmpeg\u003C\u002Ftd>\n\u003Ctd>✅ (CLI)\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Cp>All tools are accessible through Ima Claw — no separate accounts or API keys needed.\u003C\u002Fp>\n\u003Chr>\n\u003Cp>\u003Cem>A coffee break to write the script. Half a day to generate the footage. A handful of ffmpeg commands to assemble the final cut. A 2,000-year-old legend, brought back to life by AI.\u003C\u002Fem> 🐎\u003C\u002Fp>\n",1775543777936]