Creating a Realistic AI Video Podcast

This playbook covers generating podcast audio with NotebookLM and then turning it into a video podcast using two different methods: HeyGen and Hedra.

Part 1: Generating the Podcast Audio with NotebookLM

The first crucial step is to create the natural-sounding audio dialogue for your podcast.

Go to NotebookLM: Open NotebookLM in your browser.
Create a New Notebook: Start a fresh notebook within NotebookLM.
Add Your Source: You can provide information from various sources like a web page, a YouTube video, a Google Doc, or even pasted text. The source material will form the basis of the discussion.
- For a YouTube Video: Copy the YouTube video link. Then, in NotebookLM, click ‘YouTube’, paste the link, and click the blue button to add it as a source.
Generate the Conversation: Click the button to generate the conversation (NotebookLM calls this an Audio Overview). In just a few minutes, NotebookLM will produce audio of two AI voices discussing the topic from your provided source.
Download the Audio: Once the audio is generated, download it. This audio file will be used to create your video.

The output is designed to sound relaxed, human, and relatable, and it presents the subject in a very different way from the original source. You can publish this audio on platforms like Apple Podcasts and Spotify, which are free and have millions of listeners.

Part 2: Turning Audio into Video with AI Avatars

Now that you have your podcast audio, you can turn it into a ‘Talking Heads’ video using AI tools. This playbook covers two methods: using HeyGen avatars or Hedra characters.

Method 1: Using HeyGen Avatars

This method involves making a separate video for each avatar, each one lip-synced to the entire podcast recording, and then cutting between the two videos in a video editor.

Choose Your Avatars in HeyGen:
- You’ll need one male and one female avatar, to match the two voices in the NotebookLM audio.
- Option A: Pick from Library Avatars: HeyGen offers a library of avatars. Library avatars often come with different angles of the same look and are well-lit with nice interior designs, which can be advantageous if you don’t have a perfect lighting setup. For example, Vernon from the library has a nice suit and great interior, plus a second angle.
- Option B: Generate a New Photo Avatar: You can create your own original photo avatar, like Mumi, if you prefer an avatar not available in the library.
- Note on Personal Avatars: While you can make your own personal avatar and cloned voice in HeyGen, this process is more time-consuming.
Create a Video for the First Avatar:
- Once you’ve chosen an avatar (e.g., Vernon), click “Create with AI Studio”.
- Choose “Landscape” for a landscape video.
- Delete Default Script: The studio will load with a default script and voice; delete this as you’ll use your NotebookLM audio.
- Upload NotebookLM Audio: Click to upload the full podcast audio you downloaded from NotebookLM.
- Consider Time Limits: HeyGen caps video length according to your plan (e.g., the Free plan has a 3-minute limit, the Creator plan 5 minutes, and the Team plan up to 30 minutes). If your audio is longer than your plan allows, you’ll need to split it and process it in multiple parts.
- Submit for Processing: Give your video a name, set the resolution (e.g., 1080p), format (MP4), and ensure the HeyGen watermark is off if desired. Then click ‘Submit’. HeyGen will lip-sync the avatar to your audio, and processing usually takes a few minutes.
Create a Video for the Second Avatar: Repeat the exact same steps for your second avatar (e.g., Mumi), uploading the entire NotebookLM audio again. Because each video uses the full recording, each avatar lip-syncs to both voices; the parts where it should not be speaking are removed in the editing step.
Download Both Videos: Once processed, download both avatar videos to your system.
Edit in a Video Editor (e.g., CapCut):
- Import both video clips into your chosen video editor (CapCut is recommended as it’s powerful and has a generous free plan).
- Layer Clips: Place both clips on the timeline on separate tracks, starting at the same point, so they play simultaneously and stay in sync.
- Cut and Alternate: Make a cut on both tracks wherever the speaker changes. On each track, delete the sections where that avatar is not the one speaking, so the video alternates between the two tracks. Continue until the entire video is done.
- Optional: If you used avatars with multiple angles (like Vernon), you can have two tracks for that avatar to cut between angles, though this makes editing slightly more complicated.
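The cut-and-alternate step is easier if you plan it before touching the editor. The sketch below is one possible way to do that in Python, not a feature of CapCut or HeyGen. It assumes you have listened to the audio and noted the times at which the speaker changes, and that the two speakers strictly take turns. The function name `build_cut_list` and the sample timings are made up for illustration.

```python
def build_cut_list(change_times, total_duration, speakers=("Vernon", "Mumi")):
    """Return (start, end, speaker) tuples, alternating speakers at each change.

    change_times: seconds at which the speaker changes, noted by ear (assumed input).
    total_duration: length of the podcast audio in seconds.
    speakers: the two avatar names; the first one is assumed to speak first.
    """
    # Keep only unique change points that fall strictly inside the recording.
    points = sorted({float(t) for t in change_times if 0 < t < total_duration})
    boundaries = [0.0] + points + [float(total_duration)]
    cuts = []
    for i in range(len(boundaries) - 1):
        # Even-numbered stretches belong to the first speaker, odd ones to the second.
        cuts.append((boundaries[i], boundaries[i + 1], speakers[i % 2]))
    return cuts


# Hypothetical timings: print which avatar's clip to keep for each stretch.
for start, end, who in build_cut_list([12.5, 30.0, 41.2], 60):
    print(f"{start:6.1f}s to {end:6.1f}s: keep {who}")
```

Each row tells you where to cut and which track to keep; the other track is the part you delete for that stretch.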

The editing is the most fiddly part of this method, but it is still an incredibly fast way to go from source material to a finished podcast video.
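If your audio is longer than the HeyGen time limit for your plan, you need to work out how many parts to split it into. Here is a minimal sketch of that arithmetic, assuming you know the audio length in seconds and taking the 5-minute Creator limit mentioned above as the example. The function name `plan_parts` is hypothetical, and the even split is just one reasonable choice; in practice you would nudge each boundary to the nearest pause or speaker change.

```python
import math


def plan_parts(total_seconds, limit_seconds):
    """Split a recording into the fewest equal-length parts that fit the limit.

    Returns a list of (start, end) times in seconds.
    """
    if total_seconds <= 0 or limit_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / limit_seconds)
    part_length = total_seconds / count
    return [
        (i * part_length, min((i + 1) * part_length, float(total_seconds)))
        for i in range(count)
    ]


# A 12-minute podcast against a 5-minute limit needs three 4-minute parts.
for start, end in plan_parts(12 * 60, 5 * 60):
    print(f"part from {start:.0f}s to {end:.0f}s")
```

Equal parts are used here rather than "as many full-length parts as possible plus a short remainder" so that no part ends up only a few seconds long.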

Method 2: Using Hedra Characters

This method allows for both speakers to be on screen at the same time and requires separating the audio first.

Separate the NotebookLM Audio (using Audacity):
- Import the full NotebookLM audio into Audacity (free, open-source audio editing software).
- Split the Audio: Use the keyboard shortcut Command + I (Mac) or Control + I (Windows) to split the audio at every point where the speaker changes.
- Duplicate Tracks: Select the whole track and use Command + D (Mac) or Control + D (Windows) to duplicate it, so you have two identical copies.
- Isolate Voices: Label one track for the first speaker (e.g., “Mumi”) and the other for the second (e.g., “Vernon”). Then, delete the surplus parts so each track only contains the voice of its respective speaker. Make sure the remaining clips stay at their original positions (replace the surplus with silence if deleting would shift the audio), so both tracks keep the same length and stay in sync.
- Export Separate Audio: Once done, export each track as its own audio file, giving you one file per speaker.
Create Characters in Hedra:
- Go to Hedra.
- Describe Your First Character: Use a text description to create your character (e.g., “a photo of a friendly woman wearing headphones in a recording booth, body and head both facing directly towards the camera”). Then, hit ‘Create’.
- Upload Individual Audio: In the audio section of Hedra, upload the separate audio track for that specific character (e.g., Mumi’s track for Mumi’s character).
- Generate Video: Hit ‘Generate video’.
Create Video for Second Character: Repeat the same process for your second character, describing them and uploading their individual audio track.
Bring Clips into a Video Editor (e.g., CapCut):
- Import both generated Hedra video clips into your video editor.
- Place and Separate: Place both clips on the timeline and arrange them so one is on the left and the other on the right, allowing both speakers to be on screen simultaneously.
- Play the timeline back to check that the two speakers work together and stay in sync.
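When you place the two clips left and right, each one only gets half of the frame, so part of each clip is cropped. The sketch below estimates how much, which helps you decide how centrally to frame each character. The 1920x1080 canvas and 1280x720 clip size are assumptions for illustration (check the actual size of your Hedra output), and `half_frame_fit` is a hypothetical helper, not a CapCut function.

```python
def half_frame_fit(frame_w, frame_h, clip_w, clip_h):
    """Scale a clip so it fully covers one half of the frame, then report the crop.

    Returns the scale factor, the pixels cropped from each side (after scaling),
    and the horizontal centre of the left and right halves.
    """
    half_w = frame_w / 2
    # Use the larger ratio so the clip covers the half with no empty bars.
    scale = max(half_w / clip_w, frame_h / clip_h)
    scaled_w, scaled_h = clip_w * scale, clip_h * scale
    return {
        "scale": scale,
        "crop_each_side_x": (scaled_w - half_w) / 2,
        "crop_each_side_y": (scaled_h - frame_h) / 2,
        "left_center_x": half_w / 2,
        "right_center_x": half_w * 1.5,
    }


# Assumed sizes: a 1080p canvas and a 1280x720 landscape clip.
print(half_frame_fit(1920, 1080, 1280, 720))
```

With these assumed sizes, a landscape clip loses a quarter of its width on each side, which is one reason to ask for a character facing the camera in the middle of the shot.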

This method provides a nice effect, allowing you to have both speakers visible at the same time.
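The reason the audio separation matters is that each character must stay silent while the other one talks, and both tracks must keep the full length of the recording so they line up on the timeline. The sketch below illustrates that idea on a plain list of audio samples; it is not how Audacity works internally. The function name `isolate_speakers` and the `turns` format (start second, end second, speaker index 0 or 1) are assumptions made for this example.

```python
def isolate_speakers(samples, sample_rate, turns):
    """Build two full-length tracks, one per speaker, silent wherever the other talks.

    samples: the mixed recording as a list of sample values.
    sample_rate: samples per second.
    turns: list of (start_sec, end_sec, speaker_index), speaker_index 0 or 1.
    """
    # Start with two tracks of pure silence, the same length as the original.
    tracks = [[0] * len(samples), [0] * len(samples)]
    for start, end, speaker in turns:
        lo = max(0, int(round(start * sample_rate)))
        hi = min(len(samples), int(round(end * sample_rate)))
        # Copy this turn into the speaker's track at its original position.
        tracks[speaker][lo:hi] = samples[lo:hi]
    return tracks


# Tiny made-up example: 6 samples at 2 samples per second.
mumi, vernon = isolate_speakers(
    [1, 2, 3, 4, 5, 6], 2, [(0, 1, 0), (1, 2.5, 1), (2.5, 3, 0)]
)
print(mumi)
print(vernon)
```

Because the silent stretches are kept rather than removed, the two resulting tracks stay the same length as the original audio.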

And there you have it! You’ve gone from a simple source to a fully-fledged video podcast with realistic AI avatars.