Generate Drone Video From Photo in 3 Steps (ChatGPT + Gemini + Flow)


December 31, 2025

Want to generate a drone video from a photo, showing you standing on the Swiss Alps and holding a sign that says HAPPY NEW YEAR... without traveling, filming, or hiring a crew?

In this quick tutorial, you’ll create:

A photorealistic image using your uploaded photo
A portrait (9:16) cinematic drone-style video from that image
A built-in voiceover that says: “Happy New Year!”

What you need (Free Versions will work)

Your own clear selfie/photo (front-lit works best)
ChatGPT (Thinking mode)
Google Gemini (Thinking + Image mode)
Google Flow (Frames to Video + Veo 3.1 + Portrait format)

Tip: Use photos you own or have permission to use.

Step 1: Use ChatGPT (Thinking mode) to generate the prompts

Open ChatGPT, switch to Thinking mode, and ask it for:

One Image Prompt (to generate the Swiss Alps image)
One Video Prompt (to animate it into a drone pull-back clip)

✅ Copy and paste the following prompt (Thinking mode)

I want to create an amazing video of me on top of the Swiss Alps. I will upload my own photo, and I need a prompt that generates an image of me holding a sign that says Happy New Year. The generated image must look like me. It should be in Instagram Story format, which means 1080x1920 pixels. Since I will be on the Swiss Alps, the scene will be covered in snow. I am wearing gear suitable for that place, but my head should not be covered so that my face stays visible. If the uploaded photo is of a male, the beard, the turban, and everything else must be preserved; if it is of a female, the hairstyle must stay exactly the same. The image should have New Year vibes, with the sign in my hands. I will then create a video from this image, so also give me a video prompt: a drone shot pulling back to create a very dramatic scene, with a voiceover that says Happy New Year. The voiceover must be integrated into the video prompt. Give me just one image prompt that takes care of everything and one video prompt that takes care of everything.

Step 2: Open Google Gemini (Thinking + Image mode)

Now open Google Gemini and do the following:

Select Thinking
Switch to Image mode / Create image
Upload your photo
Paste the Image Prompt (your own from ChatGPT, or the one given below), generate, and download the image

✅ The prompt that I used is given below:

Create a hyper-realistic, cinematic vertical portrait (1080x1920, Instagram Story) on the Swiss Alps summit with vast snow-covered mountains and a dramatic horizon. Use the uploaded reference photo to match the subject’s identity exactly (same face, skin tone, features, age, expression, and proportions).
If the reference subject is a male Sikh: preserve the turban style, beard shape, and beard density exactly—do not alter, trim, shorten, or replace them. Keep the turban fully visible and unobstructed.
If the reference subject is a female: preserve the exact hairstyle, hairline, parting, texture, and length—do not change it.

Outfit: premium winter mountaineering gear (insulated jacket, snow pants, gloves, boots), but NO hood, NO helmet, NO face covering—the face must be fully visible and clear. The subject is standing confidently on a snowy ridge, slightly angled to camera, with subtle wind-blown snow particles in the air.

The subject is holding a clean, rectangular signboard with both gloved hands at chest level. The sign text must be perfectly legible and correctly spelled in bold, high-contrast letters:
“HAPPY NEW YEAR”
(keep the text flat, centered, not warped, not misspelled).

Mood & lighting: epic New Year vibe—golden sunrise or blue-hour glow, distant celebratory fireworks in the far valley sky (subtle, tasteful), soft cinematic haze, crisp snow detail, ultra-sharp face focus, realistic depth-of-field, natural colors, no filters, no watermark, no extra text.

Step 3: Open Google Flow (Frames to Video + Veo 3.1)

Now you’ll animate your image into a cinematic drone shot using Flow, Google’s AI filmmaking tool powered by Veo.

Open Google Flow
Select Frames to Video
Upload the image you generated in Gemini
Select Veo 3.1
Choose Portrait (9:16)
Paste the Video Prompt (Use your own prompt from ChatGPT)
Generate your video

✅ The prompt that I used is given below:

Generate an image-to-video cinematic shot from the provided still image. Vertical 1080x1920, 7–9 seconds, 24 fps, ultra-realistic. Camera movement: start close/medium on the subject holding the sign, then a smooth drone pull-back and rise (backward + upward) revealing the massive Swiss Alps panorama: wide, dramatic scale, gentle parallax, stabilized, premium cinematic feel.

Motion rules:

Keep the subject’s face identity consistent with the image (no face drift).

Add subtle natural movement only: gentle breathing, tiny head turn toward camera, slight hand micro-movements, jacket fabric responding to wind.

Snow particles drift across frame; light wind gusts; distant fireworks flicker softly in the background.

The sign and its text “HAPPY NEW YEAR” must remain perfectly stable and readable (no wobble, no morphing, no changing letters).

No new text overlays.

Audio (integrated): add a warm, celebratory voiceover that clearly says: “Happy New Year!” (one time, centered around the middle of the clip). Add subtle ambient mountain wind + faint distant celebration atmosphere, mixed low so the voice is clear.

Why these settings: Flow is built for creating cinematic clips and scenes, Veo 3.1 is designed for more realistic output with stronger prompt control, and portrait (9:16) is available as a format option.
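
If you edit the video prompt before pasting it into Flow, it is easy to delete a line that matters. The sketch below is one possible pre-flight check, not a feature of Flow or Veo: the checklist names and patterns in REQUIRED_PATTERNS are assumptions picked from the prompt used in this tutorial, so adjust them to your own prompt.

```python
import re

# Hypothetical checklist: phrases this tutorial's video prompt relies on.
# The names and patterns are illustrative assumptions, not Flow/Veo rules.
REQUIRED_PATTERNS = {
    "camera move": r"pull-?back",
    "sign text": r"HAPPY NEW YEAR",
    "voiceover": r"voice-?over",
    "no overlays": r"no new text overlays|no (subtitles|captions)",
}


def missing_elements(prompt: str) -> list[str]:
    """Return the checklist names whose pattern does not appear in the prompt."""
    return [
        name
        for name, pattern in REQUIRED_PATTERNS.items()
        if not re.search(pattern, prompt, flags=re.IGNORECASE)
    ]


draft = (
    "Smooth drone pull-back revealing the Swiss Alps. "
    "The sign text HAPPY NEW YEAR must remain readable. "
    "Add a warm voiceover that says: Happy New Year!"
)
print(missing_elements(draft))  # ['no overlays']
```

An empty list means every item on the checklist is mentioned somewhere in the prompt. It does not guarantee the video will follow it.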

Why this 3-step workflow works so well

Most people try to do everything in one tool... and then struggle with:

face consistency
readable sign text
cinematic camera movement
audio that feels natural

This pipeline splits the job the smart way:

ChatGPT = prompt architect
Gemini = high-quality still image generation with your photo as reference
Flow + Veo 3.1 = cinematic motion + sound design + storytelling polish

Troubleshooting (fix the 5 most common issues fast)

1) The sign text is misspelled or warped

Add this line at the end of the image prompt:

“Typography must be perfect: crisp, flat, centered, no distortion, no misspellings.”

Text rendering is not always right on the first try, so regenerate the image if the sign still comes out wrong.

2) The face changes slightly in video (face drift)

Add to video prompt:

“Identity lock: face must remain identical to the reference image in every frame.”

3) The tool adds extra text overlays

Add:

“No subtitles, no captions, no overlays, no logos.”

4) The camera move feels “floaty,” not cinematic

Add:

“Stabilized drone gimbal movement, realistic inertia, no jitter, no wobble.”

5) Audio is too loud / messy

Add:

“Keep ambience low; voiceover clean and centered; no music.”

Veo/Flow audio capabilities have been expanding, and the audio is driven by what you write, so the prompt is the place to control the mix.
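
If you run into several of these issues at once, you can keep the five fix lines in one place and append only the ones you need. This is a minimal sketch: the issue keys (such as "face_drift") and the apply_fixes helper are hypothetical names made up for illustration, while the fix lines themselves are the ones listed above.

```python
# Fix lines copied from the troubleshooting list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
FIX_LINES = {
    "sign_text": "Typography must be perfect: crisp, flat, centered, no distortion, no misspellings.",
    "face_drift": "Identity lock: face must remain identical to the reference image in every frame.",
    "overlays": "No subtitles, no captions, no overlays, no logos.",
    "floaty_camera": "Stabilized drone gimbal movement, realistic inertia, no jitter, no wobble.",
    "audio": "Keep ambience low; voiceover clean and centered; no music.",
}


def apply_fixes(prompt: str, issues: list[str]) -> str:
    """Append the fix line for each issue, skipping lines already present."""
    lines = [prompt.rstrip()]
    for issue in issues:
        fix = FIX_LINES[issue]  # raises KeyError for an unknown issue label
        if fix not in prompt and fix not in lines:
            lines.append(fix)
    return "\n".join(lines)


patched = apply_fixes("Drone pull-back over the Alps.", ["face_drift", "audio"])
print(patched)
```

Running apply_fixes again on an already patched prompt leaves it unchanged, so you can re-apply it after each regeneration without stacking duplicate lines.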

Use cases beyond New Year

Once you learn this, you can reuse the same workflow for:

product launches (holding a product placard)
event announcements
brand story intros (“I help X do Y” sign)
travel-style Reels without travel
founder story videos and ads

This is basically a repeatable content machine: 1 photo → infinite cinematic videos.
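
To reuse the workflow, the main thing that changes is the sign text. Here is a small sketch that swaps the sign text into the sign paragraph from the image prompt in Step 2. The sign_paragraph helper and the choice to uppercase the text are assumptions for illustration; the surrounding wording comes from the prompt used in this tutorial.

```python
from string import Template

# Sign paragraph adapted from the Step 2 image prompt, with the sign text
# turned into a placeholder. The helper name below is hypothetical.
SIGN_TEMPLATE = Template(
    "The subject is holding a clean, rectangular signboard with both gloved "
    "hands at chest level. The sign text must be perfectly legible and "
    'correctly spelled in bold, high-contrast letters: "$sign_text" '
    "(keep the text flat, centered, not warped, not misspelled)."
)


def sign_paragraph(sign_text: str) -> str:
    """Build the sign paragraph for one use case."""
    # Collapse stray whitespace and uppercase the text (an assumption that
    # short, all-caps sign text is easier to keep legible).
    cleaned = " ".join(sign_text.split()).upper()
    if not cleaned:
        raise ValueError("sign text must not be empty")
    return SIGN_TEMPLATE.substitute(sign_text=cleaned)


for text in ["Happy New Year", "Launching Today", "I help founders tell stories"]:
    print(sign_paragraph(text))
```

Paste the printed paragraph in place of the sign paragraph in your image prompt, and update the sign text in the video prompt to match.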

Frequently Asked Questions

Can I make this in Instagram Story size (9:16)?

Yes... choose Portrait / 9:16 in your video tool (Flow / Veo 3.1 supports portrait aspect ratios).
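
If you type pixel dimensions into a prompt, it helps to confirm they really are 9:16. The arithmetic sketch below assumes the usual 1080x1920 story canvas and shows that a near-miss such as 1080x1980 is a different shape. The aspect_ratio helper is just an illustrative name.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms, e.g. 1080x1920 becomes 9:16."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


print(aspect_ratio(1080, 1920))  # (9, 16)
print(aspect_ratio(1080, 1980))  # (6, 11)
```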

Does Flow support generating video clips from images/frames?

Yes... Flow is designed to generate clips and scenes, and Google’s help documentation specifically covers generating video clips in Flow. Use the Frames to Video option, as shown in Step 3.

Can Gemini generate images using my uploaded photo?

Yes... Gemini supports image generation/editing workflows where you upload an image and use a text prompt to guide the output.

Can I add voiceover like “Happy New Year” directly in the video prompt?

Yes... Veo/Flow have been adding richer audio capabilities, and asking for speech + ambience directly in the video prompt is now a commonly supported approach in these systems.

Your Action (do this today in 30 minutes)

Generate 1 Alps image using the Image Prompt (Gemini).
Generate 1 drone pull-back video using the Video Prompt (Flow / Veo 3.1).
Post it as a Story/Reel (or save it if you’re not ready to post).

Now share what you create (I’ll review it)

DM me on LinkedIn with:
the word NEWYEAR
and your video link (or upload it in DM)

✅ I’ll reply with one improvement that will instantly make your next video more cinematic (camera, realism, identity lock, or sign text stability).

Want your videos to attract clients, not just views?

1) Take my Free Scorecard

If you’re not sure what story to tell, who you’re for, or why your content isn’t converting, start with the Scorecard. It will show you exactly what’s missing in your clarity + message + offer + content direction.

👉 Get the Scorecard from my website

2) Join the AI Storytelling Bootcamp

If you want the full system to create scroll-stopping story videos that bring leads: