In an era of increasingly capable foundation models, much of the conversation around AI has shifted from what models can do to how we guide them to do it well. One emerging paradigm in this space is agentic AI - the idea that models can operate more effectively when given autonomy to reason, reflect, and improve over time. Rather than responding once and hoping it's right, an agentic system takes a more iterative, self-critical approach.
In this post, we explore a hands-on implementation of a classic agentic workflow: Generate -> Reflect -> Refine. This reflection loop is one of the building blocks for structured AI reasoning. While the use case involves generating Instagram captions for images, the real focus is on how the model evaluates its own output and iteratively improves it.
Many developers building with LLMs default to one-shot prompting. It's fast, convenient, and surprisingly effective - until it isn't.
When a task demands nuance, creativity, or contextual awareness, a single pass through a model often produces mediocre results. The model might misunderstand the context, generate something too vague or overly verbose, or miss the goal entirely.
Agentic workflows address this limitation. They let a model critique its own output, identify flaws, and propose improvements. And they do this using the model's own capabilities - no additional training, no ensemble of evaluators. Just careful prompting and structured iteration.
The reflection loop works like this:

1. Generate: the model produces an initial output for the task.
2. Reflect: the model reviews that output against explicit criteria and produces feedback.
3. Refine: the model regenerates the output, incorporating that feedback.

Repeat this process until either:

- the output passes the reflection step, or
- a maximum number of iterations is reached.
This structure is simple, modular, and flexible - and it can be applied to almost any generative task: code synthesis, content summarisation, query generation, even image captioning.
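Before we get concrete, here's what that skeleton looks like in code - a minimal, task-agnostic sketch in TypeScript, where generate and reflect are placeholders for whatever your task needs (we'll implement concrete versions of both below):

type Reflection = { pass: boolean; feedback: string };

async function reflectAndRefine(
  generate: (feedback: string | null) => Promise<string>,
  reflect: (output: string) => Promise<Reflection>,
  maxIterations = 3
): Promise<string> {
  let feedback: string | null = null;
  let output = await generate(feedback);

  for (let i = 0; i < maxIterations; i++) {
    const review = await reflect(output);
    if (review.pass) return output; // accepted - exit early
    feedback = review.feedback;
    output = await generate(feedback); // refine using the critique
  }

  return output; // budget exhausted - return the last attempt
}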
To make the pattern concrete, we built a small system where an AI generates Instagram captions based on an image, then reflects on how well the caption aligns with creative, formatting, and stylistic goals (e.g., wit, relevance, emojis, hashtags and rhyme - just for fun).
Let's walk through the implementation of the reflection loop using Google's Gemini API. We'll cover how to:

- define a structured schema for evaluation results,
- generate a caption from an image,
- have the model critique its own caption against explicit criteria, and
- tie everything together in a reflection loop.

All code is written in TypeScript using the @google/genai SDK.
We begin by importing the required modules and defining a simple schema for evaluation results.
import { GoogleGenAI, Type } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const EvaluationStatus = {
  PASS: 'PASS',
  FAIL: 'FAIL',
};

const evaluationSchema = {
  type: Type.OBJECT,
  properties: {
    evaluation: {
      type: Type.STRING,
      enum: [EvaluationStatus.PASS, EvaluationStatus.FAIL],
    },
    feedback: { type: Type.STRING },
    reasoning: { type: Type.STRING },
  },
  required: ['evaluation', 'feedback', 'reasoning'],
};
const model = 'gemini-2.0-flash';
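For reference, a payload that conforms to this schema looks like the following (the values here are illustrative, not real model output):

const exampleEvaluation = {
  evaluation: 'FAIL',
  feedback: 'Add at least 2 emojis and 3 hashtags, and make it rhyme.',
  reasoning: 'The caption is flat and misses several of the formatting requirements.',
};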
This function generates a caption from an image, optionally incorporating feedback from the previous evaluation.
async function generateCaption(
  imageBase64: string,
  feedback: string | null
): Promise<string> {
  const prompt =
    `You are a witty social media assistant. Write a funny Instagram caption for the image above that rhymes like a 2-line verse.
Use at least 2 emojis and 3 hashtags. Be witty and relevant.` +
    (feedback ? `\nPlease revise it using this feedback: ${feedback}` : '') +
    `\nReturn only one final caption. Do not include multiple options or explanations.`;

  const result = await ai.models.generateContent({
    model,
    contents: [
      { inlineData: { mimeType: 'image/jpeg', data: imageBase64 } },
      { text: prompt },
    ],
  });

  return result.text!.trim();
}
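Assuming you already have a base64-encoded JPEG in hand (imageBase64 below is a placeholder) and are inside an async context, using it looks like this:

// First pass: no feedback yet
const draft = await generateCaption(imageBase64, null);

// Subsequent passes: feed the reviewer's critique back in
const revised = await generateCaption(imageBase64, 'Make it rhyme and add hashtags.');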
Here we ask the model to evaluate its own caption. The prompt includes clear, strict criteria to prevent rubber-stamping. Note that the model's response conforms to the evaluationSchema specified earlier.
async function evaluateCaption(imageBase64: string, caption: string) {
  const evalPrompt = `You are a strict Instagram content reviewer. Your job is to find flaws in captions and reject anything that isn't clearly excellent.
Start by listing any weaknesses in the caption. Then determine if it meets the following criteria:
- It is witty or humorous
- It uses at least 2 emojis
- It includes at least 3 hashtags
- It is tightly relevant to the image above
- It rhymes like a 2-line verse
Be sceptical. Most captions should FAIL unless they are truly excellent.
Caption:
"${caption}"`;

  const result = await ai.models.generateContent({
    model,
    contents: [
      { inlineData: { mimeType: 'image/jpeg', data: imageBase64 } },
      { text: evalPrompt },
    ],
    config: {
      responseMimeType: 'application/json',
      responseSchema: evaluationSchema,
    },
  });

  return JSON.parse(result.text!);
}
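Since JSON.parse returns any, it can help to define a small result type that mirrors evaluationSchema - a convenience we're adding here, not something the SDK requires:

interface EvaluationResult {
  evaluation: 'PASS' | 'FAIL';
  feedback: string;
  reasoning: string;
}

// Then, in evaluateCaption:
// return JSON.parse(result.text!) as EvaluationResult;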
This is the core loop that ties it all together. It evaluates your caption, refines it based on feedback, and stops when a caption finally passes - or when we hit a max retry limit.
async function reflectionLoop() {
  const photoURL = 'https://your.image.url.jpg';
  const maxIterations = 3;

  const response = await fetch(photoURL);
  const imageArrayBuffer = await response.arrayBuffer();
  const base64ImageData = Buffer.from(imageArrayBuffer).toString('base64');

  console.log('\n🔄 Starting reflection loop...');
  console.log(`🖼️ Image URL: ${photoURL}`);

  // Optional: seed a first manual caption - this could come from user input. In our scenario it might come from an interface
  // where we ask someone to enter a caption for the image; we then let the LLM review it using the reflection pattern.
  let caption = 'Turtle swimming in the sea';
  let feedback: string | null = null;

  for (let i = 1; i <= maxIterations; i++) {
    console.log(`\n🔁 --- Iteration ${i} ---`);
    console.log(`\n📝 Current Caption:\n"${caption}"`);

    const evaluation = await evaluateCaption(base64ImageData, caption);

    console.log('\n🔍 Reflection Summary:');
    console.log(`- Evaluation: ${evaluation.evaluation}`);
    console.log(`- Reasoning: ${evaluation.reasoning}`);
    console.log(`- Feedback: ${evaluation.feedback}`);

    if (evaluation.evaluation === EvaluationStatus.PASS) {
      console.log('\n✅ Final Caption Accepted:\n');
      console.log(`"${caption}"`);
      return;
    }

    console.log('\n✏️ Refining caption based on feedback...');
    feedback = evaluation.feedback;
    caption = await generateCaption(base64ImageData, feedback);
  }

  console.log('\n⚠️ Max iterations reached. Returning last attempt:\n');
  console.log(`"${caption}"`);
}
Here's an example of the reflection loop's output:
🖼️ Image URL: https://your.image.url.jpg

🔁 --- Iteration 1 ---

📝 Current Caption:
"Turtle swimming in the sea"

🔍 Reflection Summary:
- Evaluation: FAIL
- Reasoning: The caption doesn't meet the criteria for a good Instagram caption. It's too simple and doesn't add any value to the image. To improve it, you could add some humor, emojis, hashtags, and a rhyme.
- Feedback: The caption is too short and lacks wit or humor. It doesn't include emojis or hashtags, and it doesn't rhyme like a 2-line verse. It's also not very engaging or descriptive.

✏️ Refining caption based on feedback...

🔁 --- Iteration 2 ---

📝 Current Caption:
"Just keep swimming, don't you fret,
This shelled dude's got life all set! 🐢🌊 #TurtleTime #OceanLife #VitaminSea"

🔍 Reflection Summary:
- Evaluation: PASS
- Reasoning: The caption meets all the requirements: it rhymes, contains emojis and hashtags, and is highly relevant to the image. It's also witty and humorous.
- Feedback: The caption is great. No improvements needed.

✅ Final Caption Accepted:

"Just keep swimming, don't you fret,
This shelled dude's got life all set! 🐢🌊 #TurtleTime #OceanLife #VitaminSea"
Reflection is an underrated superpower in AI systems. It adds depth, resilience, and structure to what would otherwise be shallow, one-shot responses. And best of all: it's easy to implement.
If you're building LLM workflows, I'd encourage you to start simple - and build in reflection early. It won't just improve your outputs. It'll make your systems feel a lot more like thoughtful collaborators.