Create a Complete Video Ad with VEO3.1 in 4 Steps
The fastest path from idea to finished video ad
I’ve been itching to play with VEO3.1 more, so over the holidays, that’s what I did.
Below is a video ad I made to promote this Substack. It runs 1:44, which is too long for a video ad, but I wanted to stretch myself toward 2 minutes to force more learning.
The process I used to put this ad together is easy to follow, and you’ll likely find a few tips worth copying.
First, the gotchas:
I ran into numerous hurdles. Unlike the flashy posts on X that promise a Hollywood movie from a single prompt, creating this video wasn’t straightforward for someone who isn’t a video expert. Two of the biggest were:
I rarely managed to get the model to have Teddy and ChatGPT in the same scene with dialogue. It kept giving ChatGPT Teddy’s lines. It couldn’t understand that the computer could talk if a human was in the same scene, so eventually I gave up and had to work around that. These models are far from perfect and it can take a lot of iteration to get the output you want.
I couldn’t get the character to say the name of my website without triggering a copyright infringement. I couldn’t even get Teddy to say my name in any form, e.g., Kieran Flanagan’s Substack. I even tried spelling it phonetically as ‘Keer-un Flan-a-gans’, but that didn’t work either. That’s why there is a low-quality screenshot of my website address on Teddy’s laptop.
Ok, onto what you came for, here’s my 4-step process.
1. The Idea: This is the one step you’ll struggle to outsource to AI. The overarching idea really matters. My idea came from a post I did, which got a lot of engagement across LinkedIn and Substack. I wanted to turn this into a quirky buddy sitcom from the 70s, where ChatGPT was the confident buddy who kept getting Teddy into trouble.
2. The Storyboard: Once you have an overarching idea and theme, you can brief your model of choice to help you craft a storyboard for the ad. I used ChatGPT 5.2 for this, but I’d 100% be testing Sonnet/Opus and Gemini as well.
I’ll say, I didn’t find any of the models good at step 1, the idea. They can brainstorm with you, but I didn’t find any of their ideas that creative. All that to say, an original idea matters a lot.
Here’s the first version of my storyboard; it only shows the first couple of scenes. It breaks the full video out into individual scenes, each with a visual description and audio. You can copy this format for your own videos. Remember that each scene should be 8 seconds or less to fit with VEO3.1.
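If you’d rather keep the storyboard as data than as a document, here is a minimal sketch of the same idea. The Scene fields, the function names, and the two sample scenes are placeholders I made up for illustration; only the 8-second limit and the 1:44 runtime come from this post.

```python
import math
from dataclasses import dataclass

MAX_CLIP_SECONDS = 8  # per-clip limit mentioned in this post


@dataclass
class Scene:
    number: int
    seconds: float
    visual: str  # what the viewer sees
    audio: str   # dialogue, music, sound effects


def too_long(scenes):
    """Return the numbers of scenes that exceed the clip limit."""
    return [s.number for s in scenes if s.seconds > MAX_CLIP_SECONDS]


def min_clips(total_seconds):
    """Fewest clips needed to cover a target runtime."""
    return math.ceil(total_seconds / MAX_CLIP_SECONDS)


# Placeholder scenes, not the real storyboard.
storyboard = [
    Scene(1, 8, "Title card over a 1970s living room", "Funky theme music, applause"),
    Scene(2, 10, "Teddy types at the vintage computer", "Teddy speaks; canned laughter"),
]

print(too_long(storyboard))  # [2] -> scene 2 needs trimming or splitting
print(min_clips(104))        # 13 -> a 1:44 ad needs at least 13 clips
```

The second number is a useful reality check before you start: a 1:44 ad means at least 13 separate clips to plan, generate, and stitch.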
3. The Storyboard + Reference Images: This is the most critical tip. In VEO3.1, you want to use ingredients to video (which is image to video), not text to video. You should create reference images for each scene that VEO3.1 can use to create the 8-second clip. This keeps your video consistent when you stitch all the 8-sec clips together.
ChatGPT can help you plan all the reference images for each scene. This part is time-consuming. I get ChatGPT to summarise which images I need for each 8-sec clip:
Here’s an example of a prompt I crafted with ChatGPT to create the retro 1970s living room.
Create a high-resolution image of a cozy 1970s sitcom living room set.
The room features shag carpet, a low vintage couch, a wooden coffee table, and warm earth-tone colors (browns, oranges, muted yellows). Include retro decor such as patterned curtains, a standing lamp with a fabric shade, framed wall art, and subtle wood paneling.
The space should feel like a classic 1970s television sitcom set — slightly theatrical rather than realistic. Lighting is warm and soft, like studio lighting used for multi-camera sitcoms. Add gentle film grain for a nostalgic feel.
No people visible. The room should feel ready for a character to sit on the floor or couch. The overall mood is friendly, cozy, and unmistakably 1970s.
And here’s the image
The reference images have two purposes:
Reference images for Nano Banana: You’ll use them to keep every image you create consistent. For example, I created a master image for Teddy, and any time I wanted to put him in a different scene, outfit, or expression, I’d create a new image by giving Nano Banana the master image of Teddy.
Reference images for VEO3.1: I’ll show this below; you’ll use them to create the 8-sec video clips.
To manage all of this, I’d recommend moving your storyboard to Figma (or a Figma alternative) so you can add reference images for each scene. Here’s an example of mine:
It looks messy, but it has everything you need for each VEO3.1 clip: the visual description, dialogue, and reference images.
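If your board lives in a spreadsheet or a script instead of Figma, a quick completeness check can catch scenes you forgot to finish. This is only a sketch: the field names, file names, and sample scenes are hypothetical, chosen to mirror the three things listed above.

```python
REQUIRED_FIELDS = ("visual", "dialogue", "reference_images")


def missing_fields(scene_sheet):
    """List the required fields that are absent or empty for one scene."""
    return [f for f in REQUIRED_FIELDS if not scene_sheet.get(f)]


# Hypothetical entries; the file names are placeholders.
board = {
    1: {
        "visual": "Teddy sits on the shag carpet beside the computer",
        "dialogue": "(theme music only)",
        "reference_images": ["teddy_master.png", "computer.png", "living_room.png"],
    },
    2: {
        "visual": "Close-up of Teddy's laptop screen",
        "dialogue": "",
        "reference_images": [],
    },
}

for number, sheet in board.items():
    gaps = missing_fields(sheet)
    if gaps:
        print(f"Scene {number} is missing: {', '.join(gaps)}")
```

Here scene 2 gets flagged because its dialogue and reference images are still empty.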
4. Create Videos: Once I had reference images, I worked with ChatGPT to create the VEO3.1 prompt for each scene, then entered the prompt and reference images into VEO3.1. I found each scene needed a bunch of tweaking to get the output I wanted.
Here’s an example of a prompt ChatGPT helped me create for the opening scene:
Create an 8-second opening shot for a 1970s sitcom.
Use the provided reference images to maintain identity and styling:
1) the main character’s face, hairstyle, and clothing
2) the 1970s computer design
3) the retro 1970s living room environment
Place the main character seated on the shag carpet of the living room, with the vintage 1970s computer positioned beside him on a small table.
The shot should feel like the opening of a classic 1970s television sitcom. Use warm nostalgic studio lighting, subtle film grain, and a gentle slow zoom-in.
Add authentic 1970s sitcom audio: upbeat funky retro theme music, audience applause, and light canned laughter, as if filmed in front of a live studio audience.
At 1 second, fade in a bold retro title card reading:
“My Buddy, ChatGPT!”
The title card should appear in large yellow 1970s-style lettering. Add a cheesy sparkle or chime sound when the title appears.
Keep the title card visible for approximately 2 seconds, then fade it out while the applause and music continue.
Maintain character and prop consistency throughout the shot. End the clip with the slow zoom still continuing, preserving the classic sitcom intro aesthetic for the full 8-second duration.
You can see it directly references the images for the scene. You’ll select ‘ingredients to video’, which allows you to include the images (3 max).
Expect to iterate on the prompt and clips a lot here.
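Since you’ll be rewriting these prompts many times, it can help to assemble them from parts. Below is a rough sketch, not anything VEO3.1 requires: build_prompt is a hypothetical helper that only builds text, and it doesn’t call any VEO3.1 API. The 3-image limit is the one mentioned above, and the sample lines are taken from my opening-scene prompt.

```python
MAX_REFERENCE_IMAGES = 3  # limit noted in this post for 'ingredients to video'


def build_prompt(opening, references, direction):
    """Assemble one scene prompt that names each reference image in order."""
    if len(references) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"{len(references)} reference images given; max is {MAX_REFERENCE_IMAGES}"
        )
    lines = [
        opening,
        "Use the provided reference images to maintain identity and styling:",
    ]
    # Number the references so the prompt can point at each image explicitly.
    lines += [f"{i}) {label}" for i, label in enumerate(references, start=1)]
    lines += direction
    return "\n".join(lines)


prompt = build_prompt(
    "Create an 8-second opening shot for a 1970s sitcom.",
    [
        "the main character's face, hairstyle, and clothing",
        "the 1970s computer design",
        "the retro 1970s living room environment",
    ],
    [
        "Place the main character seated on the shag carpet of the living room.",
        "Use warm nostalgic studio lighting, subtle film grain, and a gentle slow zoom-in.",
    ],
)
print(prompt)
```

Keeping the reference list and the direction lines separate makes it easier to tweak one sentence between attempts without retyping the rest.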
That’s really it. It’s not a complex process, but the storyboard + reference images + ChatGPT helping with the prompt will speed up your progress.
I used iMovie to edit it. I am not an experienced video editor, and there are likely many better tools!
If you make something and post it on LinkedIn, be sure to tag me.
Happy AI’fying,
Kieran









