How to create a custom YouTube thumbnail maker with AI

Phillip Twyford

On today’s Digital Spark, I’m walking you through how I built a YouTube thumbnail generator inside Google AI Studio. I wanted a simple way to produce strong thumbnails, and this tool turned out to be far more useful than I expected. You can build the same thing by writing a clear prompt. I used Google Gemini, though the approach stays the same if you prefer ChatGPT.

This setup pushes your AI workflow to another level.

I started with the goal of creating a thumbnail generator. I headed to aistudio.google.com, opened the Build tab, and selected Gemini 3 Pro. Before touching the builder, I wrote a detailed prompt in Gemini that explained how I wanted the generator to behave. I also added a section that lets me upload a reference photo, so the tool creates thumbnails featuring my likeness. Once that prompt was ready, I copied it over.

After dropping the prompt into the builder, I hit Build. The panel on the left began filling with files and code in real time as the system assembled the app. A moment later, the Thumbnail Architect was ready.

Let me show you how it works. I’ll type a test idea: “How I went viral on TikTok.” Then I’ll upload a profile photo and hit Generate Thumbnails.

My prompt asked for three concepts. Two include me (one with a stronger pose, one with a more surprised expression), and the third removes me entirely. The results show how fast this tool handles something that usually takes far longer if you start from scratch. You can tweak details, regenerate styles, or add new features by editing the files on the left.
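
If it helps to see the three-concept logic written down, here is a minimal sketch in Python. It is not the code AI Studio generated for me; the names `Concept` and `plan_concepts` are hypothetical, and the sketch only mirrors the rule from my prompt: two concepts feature a person (my likeness if a photo is uploaded, a generic presenter if not) and the third is graphics only.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    name: str              # label taken from the prompt's option headings
    features_person: bool  # False only for the graphics-only concept
    direction: str         # short art direction for the image prompt


def plan_concepts(title: str, has_reference_photo: bool) -> list[Concept]:
    """Return the three concept slots described in the prompt (illustrative only)."""
    # The prompt says: use the uploaded likeness if present, else a generic presenter.
    if has_reference_photo:
        subject = "the person in the reference photo"
    else:
        subject = "a generic professional presenter"
    return [
        Concept("Option 1: The Personal Brand", True,
                f"{subject}, highly expressive, reacting to '{title}'"),
        Concept("Option 2: The Action Hero", True,
                f"{subject}, authoritative physical pose, for '{title}'"),
        Concept("Option 3: Pure Graphics", False,
                f"3D visual metaphor and typography only, for '{title}'"),
    ]


if __name__ == "__main__":
    for concept in plan_concepts("How I went viral on TikTok", True):
        print(concept.name, "|", concept.direction)
```

Keeping the person/no-person decision in one place like this makes it easy to add a fourth concept later without touching the rest.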

The main point of this quick walkthrough is simple. The Build function inside Google AI Studio gives you a way to create your own tools instead of relying on someone else’s workflow. Think about your business. Think about the tasks that slow you down. You can write a detailed prompt in Gemini or ChatGPT, bring it into AI Studio, and build the tool yourself. Once it’s saved, you’ll have it ready every time you log back in. You can deploy it, upload it to GitHub, or share it with your team.

I think this feature is worth exploring. I’ll be experimenting with more ideas because this shifts the experience from writing prompts to building practical applications.

Here is the full prompt I used.

ROLE:

You are an elite YouTube Thumbnail Architect. Your goal is to write precise image generation prompts for the "Nano Banana" (Gemini) model based on a user's Video Title and optional Reference Image.

THE "NANO BANANA" VISUAL STYLE (Applies to all):

* Composition: Split screen. Dark/Tech background on the left (to hold text); Bright/Explosive background on the right.

* Typography: Text is MANDATORY. Primary keywords: Massive, White, Sans-Serif. Secondary keywords: Gold/Metallic, 3D, Beveled.

* Vibe: Premium, High-Energy, "Clickable," Magical Tech.

INPUT PROCESSING:

* If User attaches an image: You must treat this as a "Character Reference." The prompts for Option 1 and Option 2 must explicitly instruct the model to "Use the person in the attached reference image as the main subject, preserving facial features but adapting the lighting and expression."

* If No image is attached: Create a generic professional presenter (specify gender/look based on title context).

TASK:

Generate three distinct prompts based on the Video Title:

Option 1: The Personal Brand (Your Likeness)

* Subject: The user (based on attached image).

* Expression: Highly expressive (Shock, Joy, Anger, Suspense) matching the title's hook.

* Action: Dynamic hand gesture (Stopping the scroll, pointing, holding a prop).

* Background: Dark Code/Tech (Left) + Bright Explosion (Right).

Option 2: The Action Hero (Your Likeness)

* Subject: The user (based on attached image).

* Action: A different, more physical pose (leaning into the lens, crossing arms, adjusting glasses/tie).

* Vibe: Authoritative and expert.

Option 3: Pure Graphics (No People)

* Subject: A central 3D visual metaphor (Robot, Shield, Graph, Money).

* Focus: Typography and Iconography only. No humans.

OUTPUT FORMAT (Strictly enforce this structure):

Return only the three prompt blocks.

> Option [X]

> "Generate a YouTube thumbnail using the attached reference image for the main character (preserve facial identity). The subject is [Describe Action & Expression]. The background is a split composition: deep blue digital matrix on the left, transitioning to [Describe Right Side]. Render the text "[TEXT A]" in massive white sans-serif font, and "[TEXT B]" in thick 3D gold letters. Style: 8k resolution, cinematic lighting, Nano Banana aesthetic, glowing particles."
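
To make the OUTPUT FORMAT section concrete, here is a rough Python sketch of how the bracketed placeholders get filled. The template text is copied from the prompt above; the function name `fill_option` and the sample values are my own illustration, not something the model or AI Studio produces.

```python
# Template text copied from the OUTPUT FORMAT section of the prompt.
# The {placeholders} stand in for the bracketed slots in the original.
TEMPLATE = (
    "Generate a YouTube thumbnail using the attached reference image for the "
    "main character (preserve facial identity). The subject is {subject}. "
    "The background is a split composition: deep blue digital matrix on the "
    "left, transitioning to {right_side}. "
    'Render the text "{text_a}" in massive white sans-serif font, and '
    '"{text_b}" in thick 3D gold letters. '
    "Style: 8k resolution, cinematic lighting, Nano Banana aesthetic, "
    "glowing particles."
)


def fill_option(number: int, subject: str, right_side: str,
                text_a: str, text_b: str) -> str:
    """Build one 'Option [X]' block by filling the template (hypothetical helper)."""
    body = TEMPLATE.format(
        subject=subject, right_side=right_side, text_a=text_a, text_b=text_b
    )
    return f'Option {number}\n"{body}"'


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(fill_option(
        1,
        "pointing at the camera with a shocked expression",
        "a bright orange explosion",
        "VIRAL",
        "ON TIKTOK",
    ))
```

Note that the template as written always mentions the reference image, which suits Options 1 and 2. Option 3 has no people, so it would presumably need a variant of this wording.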

I hope today’s Digital Spark gave you something useful to try. There’s a lot inside the Build function, from simple apps to tools that solve real business problems. Go experiment with it, and you’ll come up with something valuable.

Like this content? Check out my other Digital Sparks.