Building with Reflection: A Practical Agentic AI Workflow

In an era of increasingly capable foundation models, much of the conversation around AI has shifted from what models can do to how we guide them to do it well. One emerging paradigm in this space is agentic AI - the idea that models can operate more effectively when given autonomy to reason, reflect, and improve over time. Rather than responding once and hoping it's right, an agentic system takes a more iterative, self-critical approach.

In this post, we explore a hands-on implementation of a classic agentic workflow: Generate -> Reflect -> Refine. This reflection loop is one of the building blocks of structured AI reasoning. While the use case involves generating Instagram captions for images, the real focus is on how the model evaluates its own output and iteratively improves it.

Why Reflection?

Many developers building with LLMs default to one-shot prompting. It's fast, convenient, and surprisingly effective - until it isn't.

When a task demands nuance, creativity, or contextual awareness, a single pass through a model often produces mediocre results. The model might misunderstand the context, generate something too vague or overly verbose, or miss the goal entirely.

Agentic workflows address this limitation. They prompt models to critique their own outputs, identify flaws, and propose improvements. And they do this using the model's own capabilities - no additional training, no ensemble of evaluators. Just careful prompting and structured iteration.

The Reflection Loop Pattern

The reflection loop works like this:

  • Generate: The model produces an initial output. (Optionally, the loop can be seeded with an existing output, which the model then reviews and corrects as needed.)
  • Reflect: The model evaluates that output against a set of well-defined criteria.
  • Refine: The model generates a new version, incorporating feedback from the evaluation step.

Repeat this process until either:

  • The output meets all criteria.
  • A maximum number of iterations is reached.

This structure is simple, modular, and flexible - and it can be applied to almost any generative task: code synthesis, content summarisation, query generation, even image captioning.
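Before wiring in a real model, the control flow itself can be sketched in a few lines. Here's a minimal, model-agnostic version in TypeScript, where `generate` and `reflect` are stand-ins for actual LLM calls:

```typescript
// A minimal sketch of the Generate -> Reflect -> Refine loop.
// `generate` and `reflect` are placeholders for real model calls.
type Evaluation = { pass: boolean; feedback: string };

async function refineUntilPass(
  generate: (feedback: string | null) => Promise<string>,
  reflect: (output: string) => Promise<Evaluation>,
  maxIterations = 3
): Promise<string> {
  let feedback: string | null = null;
  let output = await generate(feedback);

  for (let i = 0; i < maxIterations; i++) {
    const evaluation = await reflect(output);
    if (evaluation.pass) return output;   // all criteria met: stop early
    feedback = evaluation.feedback;       // otherwise feed the critique back in
    output = await generate(feedback);
  }
  return output;                          // best effort after max iterations
}
```

Everything that follows is a concrete instance of this skeleton.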

A Real-World Implementation

To make the pattern concrete, we built a small system where an AI generates Instagram captions based on an image, then reflects on how well the caption aligns with creative, formatting, and stylistic goals (e.g., wit, relevance, emojis, hashtags and rhyme - just for fun).

Let's walk through the implementation of the reflection loop using Google's Gemini API. We'll cover how to:

  • Load and encode an image
  • Generate a caption based on that image
  • Evaluate the caption using strict criteria
  • Iterate with feedback until success

All code is written in TypeScript using the @google/genai SDK.

1. Setup

We begin by importing the required modules and defining a simple schema for evaluation results.

import { GoogleGenAI, Type } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const EvaluationStatus = {
  PASS: 'PASS',
  FAIL: 'FAIL',
};

const evaluationSchema = {
  type: Type.OBJECT,
  properties: {
    evaluation: {
      type: Type.STRING,
      enum: [EvaluationStatus.PASS, EvaluationStatus.FAIL],
    },
    feedback: { type: Type.STRING },
    reasoning: { type: Type.STRING },
  },
  required: ['evaluation', 'feedback', 'reasoning'],
};

const model = 'gemini-2.0-flash';
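For reference, a response conforming to this schema parses into an object shaped like the following (a hand-written example, not real model output):

```typescript
// Hand-written example of an object matching evaluationSchema.
const exampleEvaluation = {
  evaluation: 'FAIL',
  feedback: 'Add at least two emojis and three hashtags, and make it rhyme.',
  reasoning: 'The caption is descriptive but has no humour, emojis, or rhyme.',
};
```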

2. Caption Generation

This function generates a caption from an image, optionally incorporating feedback from the previous evaluation.

async function generateCaption(
  imageBase64: string,
  feedback: string | null
): Promise<string> {
  const prompt =
    `You are a witty social media assistant. Write a funny Instagram caption for the image above that rhymes like a 2-line verse.
Use at least 2 emojis and 3 hashtags. Be witty and relevant.
` +
    (feedback ? `\nPlease revise it using this feedback: ${feedback}` : '') +
    `\nReturn only one final caption. Do not include multiple options or explanations.`;

  const result = await ai.models.generateContent({
    model,
    contents: [
      { inlineData: { mimeType: 'image/jpeg', data: imageBase64 } },
      { text: prompt },
    ],
  });

  return result.text!.trim();
}

3. Caption Evaluation (Reflection Step)

Here we ask the model to evaluate its own caption. The prompt includes clear, strict criteria to prevent rubber-stamping. Note that the model's response conforms to the evaluationSchema specified earlier.

async function evaluateCaption(imageBase64: string, caption: string) {
  const evalPrompt = `You are a strict Instagram content reviewer. Your job is to find flaws in captions and reject anything that isn't clearly excellent.

Start by listing any weaknesses in the caption. Then determine if it meets the following criteria:

- It is witty or humorous
- It uses at least 2 emojis
- It includes at least 3 hashtags
- It is tightly relevant to the image above
- It rhymes like a 2-line verse

Be sceptical. Most captions should FAIL unless they are truly excellent.

Caption:
"${caption}"`;

  const result = await ai.models.generateContent({
    model,
    contents: [
      { inlineData: { mimeType: 'image/jpeg', data: imageBase64 } },
      { text: evalPrompt },
    ],
    config: {
      responseMimeType: 'application/json',
      responseSchema: evaluationSchema,
    },
  });

  return JSON.parse(result.text!);
}
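Because JSON.parse returns `any`, it can be worth guarding the parsed object before trusting it, in case the model ever returns something malformed despite the schema. Here's a small runtime type guard mirroring evaluationSchema (our own addition, not part of the SDK):

```typescript
// TypeScript type mirroring evaluationSchema, plus a runtime guard
// to validate whatever JSON.parse hands back.
type CaptionEvaluation = {
  evaluation: 'PASS' | 'FAIL';
  feedback: string;
  reasoning: string;
};

function isCaptionEvaluation(value: unknown): value is CaptionEvaluation {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    (v.evaluation === 'PASS' || v.evaluation === 'FAIL') &&
    typeof v.feedback === 'string' &&
    typeof v.reasoning === 'string'
  );
}
```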

4. Reflection Loop

This is the core loop that ties it all together. It evaluates your caption, refines it based on feedback, and stops when a caption finally passes - or when we hit a max retry limit.

async function reflectionLoop() {
  const photoURL = 'https://your.image.url.jpg';
  const maxIterations = 3;

  const response = await fetch(photoURL);
  const imageArrayBuffer = await response.arrayBuffer();
  const base64ImageData = Buffer.from(imageArrayBuffer).toString('base64');

  console.log('\nšŸš€ Starting reflection loop...');
  console.log(`šŸ–¼ļø Image URL: ${photoURL}`);

  // Optional: seed the loop with a manual caption. This could come from user
  // input, e.g. an interface where someone enters a caption for the image and
  // we let the LLM review it using the reflection pattern.
  let caption = 'Turtle swimming in the sea';
  let feedback: string | null = null;

  for (let i = 1; i <= maxIterations; i++) {
    console.log(`\nšŸ” Iteration ${i}`);
    console.log(`\nšŸ“ Current Caption:\n"${caption}"`);

    const evaluation = await evaluateCaption(base64ImageData, caption);

    console.log('\nšŸ“‹ Reflection Summary:');
    console.log(`- Evaluation: ${evaluation.evaluation}`);
    console.log(`- Reasoning: ${evaluation.reasoning}`);
    console.log(`- Feedback: ${evaluation.feedback}`);

    if (evaluation.evaluation === EvaluationStatus.PASS) {
      console.log('\nāœ… Final Caption Accepted:\n');
      console.log(`"${caption}"`);
      return;
    }

    console.log('\nāœļø Refining caption based on feedback...');
    feedback = evaluation.feedback;
    caption = await generateCaption(base64ImageData, feedback);
  }

  console.log('\nāš ļø Max iterations reached. Returning last attempt:\n');
  console.log(`"${caption}"`);
}
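One limitation worth noting: the loop only passes the latest round of feedback forward, so the model can reintroduce flaws it was already told about. A small tweak is to accumulate every critique and hand the model the full history (mergeFeedback is a hypothetical helper, sketched below):

```typescript
// Merge all accumulated critiques into a single feedback string,
// so each refinement sees every prior round, not just the last one.
function mergeFeedback(history: string[]): string | null {
  if (history.length === 0) return null;
  return history.map((f, i) => `Round ${i + 1}: ${f}`).join('\n');
}
```

Inside the loop, you would push each evaluation.feedback onto a history array and pass mergeFeedback(history) to generateCaption instead of the raw feedback.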
Here is an example run of the reflection loop:

šŸ–¼ļø Image URL: https://your.image.jpg

šŸ” --- Iteration 1 ---

šŸ“ Current Caption:
"Turtle swimming in the sea"

šŸ“‹ Reflection Summary:
- Evaluation: FAIL
- Reasoning: The caption doesn't meet the criteria for a good Instagram caption. It's too simple and doesn't add any value to the image. To improve it, you could add some humor, emojis, hashtags, and a rhyme.
- Feedback: The caption is too short and lacks wit or humor. It doesn't include emojis or hashtags, and it doesn't rhyme like a 2-line verse. It's also not very engaging or descriptive.

āœļø Refining caption based on feedback...

šŸ” --- Iteration 2 ---

šŸ“ Current Caption:
"Just keep swimming, don't you fret,
This shelled dude's got life all set! 🐢🌊 #TurtleTime #OceanLife #VitaminSea"

šŸ“‹ Reflection Summary:
- Evaluation: PASS
- Reasoning: The caption meets all the requirements: it rhymes, contains emojis and hashtags, and is highly relevant to the image. It's also witty and humorous.
- Feedback: The caption is great. No improvements needed.

āœ… Final Caption Accepted:

"Just keep swimming, don't you fret,
This shelled dude's got life all set! 🐢🌊 #TurtleTime #OceanLife #VitaminSea"

Conclusion

Reflection is an underrated superpower in AI systems. It adds depth, resilience, and structure to what would otherwise be shallow, one-shot responses. And best of all: it's easy to implement.

If you're building LLM workflows, I'd encourage you to start simple - and build in reflection early. It won't just improve your outputs. It'll make your systems feel a lot more like thoughtful collaborators.