Want to generate a drone video from a photo of yourself standing on the Swiss Alps, holding a sign that says HAPPY NEW YEAR... without traveling, filming, or hiring a crew?
In this quick tutorial, you’ll create:
Tip: Use photos you own or have permission to use.
Open ChatGPT, switch to Thinking mode, and ask it for:
✅ Copy and paste the following prompt (Thinking mode)
I want to create an amazing video of me on top of the Swiss Alps. I will upload my own photo, and I need a prompt that generates an image of me holding a sign that says Happy New Year. The generated image must look like me. It should be in Instagram Story format, which means 1080x1920 pixels. Since I will be on the Swiss Alps, everything should be covered in snow. I am wearing gear suitable for that place, but my head should not be covered so that my face is visible. If the uploaded image is of a male, the beard, the turban, and everything else must be preserved; if it is of a female, the hairstyle must stay exactly the same. The image should have New Year vibes, with the sign in my hands. I am then going to create a video from this image, so also give me a prompt to generate the video from the image: a drone shot pulling back to create a very dramatic scene, with a voiceover that says Happy New Year. The video prompt should be integrated and include the voiceover. Give me just one image prompt that takes care of everything and one video prompt that takes care of everything.
Now open Google Gemini and do the following:
✅ The prompt that I used is given below:
Create a hyper-realistic, cinematic vertical portrait (1080x1920, Instagram Story) on the Swiss Alps summit with vast snow-covered mountains and a dramatic horizon. Use the uploaded reference photo to match the subject’s identity exactly (same face, skin tone, features, age, expression, and proportions).
If the reference subject is a male Sikh: preserve the turban style, beard shape, and beard density exactly—do not alter, trim, shorten, or replace them. Keep the turban fully visible and unobstructed.
If the reference subject is a female: preserve the exact hairstyle, hairline, parting, texture, and length—do not change it.
Outfit: premium winter mountaineering gear (insulated jacket, snow pants, gloves, boots), but NO hood, NO helmet, NO face covering—the face must be fully visible and clear. The subject is standing confidently on a snowy ridge, slightly angled to camera, with subtle wind-blown snow particles in the air.
The subject is holding a clean, rectangular signboard with both gloved hands at chest level. The sign text must be perfectly legible and correctly spelled in bold, high-contrast letters:
“HAPPY NEW YEAR”
(keep the text flat, centered, not warped, not misspelled).
Mood & lighting: epic New Year vibe—golden sunrise or blue-hour glow, distant celebratory fireworks in the far valley sky (subtle, tasteful), soft cinematic haze, crisp snow detail, ultra-sharp face focus, realistic depth-of-field, natural colors, no filters, no watermark, no extra text.
Now you’ll animate your image into a cinematic drone shot using Flow, Google’s AI filmmaking tool powered by Veo.
✅ The prompt that I used is given below:
Generate an image-to-video cinematic shot from the provided still image. Vertical 1080x1920, 7–9 seconds, 24 fps, ultra-realistic. Camera movement: start close/medium on the subject holding the sign, then a smooth drone pull-back and rise (backward + upward) revealing the massive Swiss Alps panorama: wide, dramatic scale, gentle parallax, stabilized, premium cinematic feel.
Motion rules:
Keep the subject’s face identity consistent with the image (no face drift).
Add subtle natural movement only: gentle breathing, tiny head turn toward camera, slight hand micro-movements, jacket fabric responding to wind.
Snow particles drift across frame; light wind gusts; distant fireworks flicker softly in the background.
The sign and its text “HAPPY NEW YEAR” must remain perfectly stable and readable (no wobble, no morphing, no changing letters).
No new text overlays.
Audio (integrated): add a warm, celebratory voiceover that clearly says: “Happy New Year!” (one time, centered around the middle of the clip). Add subtle ambient mountain wind + faint distant celebration atmosphere, mixed low so the voice is clear.
Flow is built specifically for creating cinematic clips and scenes. Veo 3.1 is designed for more realistic output with stronger control, and it explicitly supports portrait video.
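The video prompt asks for 7–9 seconds at 24 fps, with the voiceover centered around the middle of the clip. If you want to sanity-check those numbers before generating, here is a minimal Python sketch. The function name `clip_plan` and its return keys are made up for illustration; they are not part of Flow or Veo.

```python
def clip_plan(duration_s: float, fps: int = 24) -> dict:
    """Return the frame count and midpoint for a clip of the given length.

    Hypothetical helper for planning only; it does not talk to any video tool.
    """
    midpoint_s = duration_s / 2
    return {
        "total_frames": round(duration_s * fps),
        "midpoint_s": midpoint_s,
        "midpoint_frame": round(midpoint_s * fps),
    }


# The prompt allows 7 to 9 seconds, so check each whole-second option.
for seconds in (7, 8, 9):
    print(seconds, clip_plan(seconds))
```

For an 8-second clip, that puts the “Happy New Year!” voiceover at roughly the 4-second mark.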
Most people try to do everything in one tool... and then struggle with:
This pipeline splits the job the smart way:
1) The sign text is misspelled or warped
Add this line at the end of the image prompt:
“Typography must be perfect: crisp, flat, centered, no distortion, no misspellings.”
Complex typography sometimes needs iteration, so regenerate a few times if the text still comes out wrong.
2) The face changes slightly in video (face drift)
Add to the video prompt:
“Identity lock: face must remain identical to the reference image in every frame.”
3) The tool adds extra text overlays
Add:
“No subtitles, no captions, no overlays, no logos.”
4) The camera move feels “floaty,” not cinematic
Add:
“Stabilized drone gimbal movement, realistic inertia, no jitter, no wobble.”
5) Audio is too loud / messy
Add:
“Keep ambience low; voiceover clean and centered; no music.”
Veo/Flow audio capabilities have been expanding, so controlling the mix in the prompt matters.
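Every fix in this troubleshooting list is just one extra line appended to a prompt. If you keep your prompts in a script or notes file, a minimal Python sketch like this can patch them for you. The `FIXES` keys and the `apply_fixes` function are hypothetical names for illustration; only the fix lines themselves come from this tutorial.

```python
# Hypothetical helper: the names here are illustrative, not any tool's API.
FIXES = {
    "sign_text": "Typography must be perfect: crisp, flat, centered, no distortion, no misspellings.",
    "face_drift": "Identity lock: face must remain identical to the reference image in every frame.",
    "overlays": "No subtitles, no captions, no overlays, no logos.",
    "camera": "Stabilized drone gimbal movement, realistic inertia, no jitter, no wobble.",
    "audio": "Keep ambience low; voiceover clean and centered; no music.",
}


def apply_fixes(prompt: str, problems: list[str]) -> str:
    """Append one fix line per reported problem, skipping lines already present."""
    lines = [prompt.rstrip()]
    for problem in problems:
        fix = FIXES[problem]  # raises KeyError for an unknown problem name
        if fix not in prompt and fix not in lines:
            lines.append(fix)
    return "\n".join(lines)


# Shortened stand-in for the full video prompt.
base_video_prompt = "Generate an image-to-video cinematic shot from the provided still image."
patched = apply_fixes(base_video_prompt, ["face_drift", "camera"])
print(patched)
```

Running the same fix twice leaves the prompt unchanged, so you can re-apply fixes after each failed generation without piling up duplicate lines.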
Once you learn this, you can reuse the same workflow for:
This is basically a repeatable content machine: 1 photo → infinite cinematic videos.
Can I make this in Instagram Story size (9:16)?
Yes... choose Portrait / 9:16 in your video tool (Flow / Veo 3.1 supports portrait aspect ratios).
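If you are unsure whether a pixel size really is 9:16, you can reduce it to its simplest ratio. This is a small Python sketch using only the standard library; `aspect_ratio` is a made-up helper name.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to their simplest width:height ratio."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(aspect_ratio(1080, 1920))  # 9:16, the Instagram Story frame
print(aspect_ratio(1080, 1980))  # 6:11, a near miss that is not 9:16
```

A one-digit slip in the height is enough to leave 9:16, so it is worth checking the numbers you type into a prompt.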
Does Flow support generating video clips from images/frames?
Yes... Flow is designed to generate clips and scenes, and Google’s help documentation specifically covers generating video clips in Flow.
Can Gemini generate images using my uploaded photo?
Yes... Gemini supports image generation/editing workflows where you upload an image and use a text prompt to guide the output.
Can I add voiceover like “Happy New Year” directly in the video prompt?
Yes... Veo/Flow have been adding richer audio capabilities, and prompting for speech plus ambience is now a commonly supported approach in these systems.
Now share what you create (I’ll review it)
✅ I’ll reply with one improvement that will instantly make your next video more cinematic (camera, realism, identity lock, or sign text stability).
1) Take my Free Scorecard
If you’re not sure what story to tell, who you’re for, or why your content isn’t converting, start with the Scorecard. It will show you exactly what’s missing in your clarity, message, offer, and content direction.
👉 Get the Scorecard from my website
2) Join the AI Storytelling Bootcamp
If you want the full system to create scroll-stopping story videos that bring leads:
👉 Join the AI Storytelling Bootcamp