
Building with Reflection: A Practical Agentic AI Workflow

7 min read

Most of the conversation around AI has shifted from what models can do to how we guide them to do it well. One emerging paradigm in this space is agentic AI: models operating with autonomy to reason, reflect, and improve over time. Rather than responding once and hoping for the best, an agentic system takes an iterative, self-critical approach.

This post walks through a hands-on implementation of a classic agentic workflow: Generate, Reflect, Refine. The use case is generating Instagram captions for images, but the real focus is on how the model evaluates its own output and iteratively improves it.

Why Reflection?

Many developers building with LLMs default to one-shot prompting. It’s fast, convenient, and surprisingly effective, until it isn’t.

When a task demands nuance, creativity, or contextual awareness, a single pass through a model often produces mediocre results. The model might misunderstand the context, generate something too vague or too verbose, or miss the goal entirely.

Agentic workflows address this. They have the model critique its own output: spotting flaws and proposing improvements. And they do this using the model’s own capabilities: no extra training, no ensemble of evaluators. Just careful prompting and structured iteration.

The Reflection Loop Pattern

The reflection loop works like this:

  • Generate: The model produces an initial output. (Alternatively, you can seed the loop with an existing draft, such as a user-written one, and let the model correct it in later iterations.)
  • Reflect: The model evaluates that output against a set of well-defined criteria.
  • Refine: The model generates a new version, incorporating feedback from the evaluation step.

Repeat this process until either:

  • The output meets all criteria.
  • A maximum number of iterations is reached.

Simple, modular, and flexible. You can apply it to almost any generative task: code synthesis, content summarisation, query generation, even image captioning.
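Before the concrete example, here is the pattern itself as a minimal, model-agnostic sketch. The `generate` and `reflect` functions are placeholders for whatever LLM calls you plug in; the names and shape are illustrative, not from any SDK:

```typescript
type Review = { pass: boolean; feedback: string };

// Generate → Reflect → Refine, until the draft passes or we run out of tries.
async function refineUntilPass(
  generate: (feedback: string | null) => Promise<string>,
  reflect: (draft: string) => Promise<Review>,
  maxIterations = 3
): Promise<string> {
  let feedback: string | null = null;
  let draft = await generate(feedback); // Generate
  for (let i = 0; i < maxIterations; i++) {
    const review = await reflect(draft); // Reflect
    if (review.pass) return draft;
    feedback = review.feedback;
    draft = await generate(feedback); // Refine
  }
  return draft; // best effort after hitting the retry limit
}
```

Everything that follows is a specialisation of this loop to one task: captioning an image.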

A Real-World Implementation

To make the pattern concrete, we built a small system where an AI generates Instagram captions based on an image, then reflects on how well the caption aligns with creative, formatting, and stylistic goals (e.g., wit, relevance, emojis, hashtags and rhyme, just for fun).

Here’s what we’ll cover:

  • Load and encode an image
  • Generate a caption based on that image
  • Evaluate the caption using strict criteria
  • Iterate with feedback until success

All code is written in TypeScript using the @google/genai SDK.
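If you want to follow along, the only dependency is the SDK itself. One way to set things up (the environment variable name matches the `process.env.GEMINI_API_KEY` used in the code below):

```shell
# Install the Google Gen AI SDK for JavaScript/TypeScript
npm install @google/genai

# The code reads the key from GEMINI_API_KEY
export GEMINI_API_KEY="your-api-key"
```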

1. Setup

We start by importing the required modules and defining a simple schema for evaluation results.

import { GoogleGenAI, Type } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const EvaluationStatus = {
  PASS: 'PASS',
  FAIL: 'FAIL',
} as const;

const evaluationSchema = {
  type: Type.OBJECT,
  properties: {
    evaluation: {
      type: Type.STRING,
      enum: [EvaluationStatus.PASS, EvaluationStatus.FAIL],
    },
    feedback: { type: Type.STRING },
    reasoning: { type: Type.STRING },
  },
  required: ['evaluation', 'feedback', 'reasoning'],
};

const model = 'gemini-2.0-flash';
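Since `responseSchema` constrains the shape of the reflection output, it is handy to mirror it as a TypeScript type so parsed results are not `any`. This type is a small addition of ours, not part of the SDK:

```typescript
// Mirrors evaluationSchema so parsed results are typed.
type EvaluationResult = {
  evaluation: 'PASS' | 'FAIL';
  feedback: string;
  reasoning: string;
};

// Example of the JSON shape the model is constrained to return:
const sampleEvaluation: EvaluationResult = {
  evaluation: 'FAIL',
  feedback: 'Add a rhyme and a third hashtag.',
  reasoning: 'The caption is relevant but does not rhyme.',
};
```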

2. Caption Generation

This function generates a caption from an image, optionally incorporating feedback from the previous evaluation.

async function generateCaption(
  imageBase64: string,
  feedback: string | null
): Promise<string> {
  const prompt =
    `You are a witty social media assistant. Write a funny Instagram caption for the image above that rhymes like a 2-line verse.
Use at least 2 emojis and 3 hashtags. Be witty and relevant.` +
    (feedback ? `\nPlease revise it using this feedback: ${feedback}` : '') +
    `\nReturn only one final caption. Do not include multiple options or explanations.`;

  const result = await ai.models.generateContent({
    model,
    contents: [
      { inlineData: { mimeType: 'image/jpeg', data: imageBase64 } },
      { text: prompt },
    ],
  });

  return result.text!.trim();
}
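The full example later fetches the image from a URL, but if your image lives on disk you can base64-encode it the same way. `photo.jpg` below is a placeholder path:

```typescript
import { readFileSync } from 'node:fs';

// Base64-encode a local image file for use as inlineData.
function encodeImage(path: string): string {
  return readFileSync(path).toString('base64');
}

// const imageBase64 = encodeImage('photo.jpg');
```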

3. Caption Evaluation (Reflection Step)

Here we ask the model to evaluate its own caption. The prompt includes clear, strict criteria to prevent rubber-stamping. The model’s response conforms to the evaluationSchema specified earlier.

async function evaluateCaption(imageBase64: string, caption: string) {
  const evalPrompt = `You are a strict Instagram content reviewer. Your job is to find flaws in captions and reject anything that isn't clearly excellent.

Start by listing any weaknesses in the caption. Then determine if it meets the following criteria:

- It is witty or humorous
- It uses at least 2 emojis
- It includes at least 3 hashtags
- It is tightly relevant to the image above
- It rhymes like a 2-line verse

Be sceptical. Most captions should FAIL unless they are truly excellent.

Caption:
"${caption}"`;

  const result = await ai.models.generateContent({
    model,
    contents: [
      { inlineData: { mimeType: 'image/jpeg', data: imageBase64 } },
      { text: evalPrompt },
    ],
    config: {
      responseMimeType: 'application/json',
      responseSchema: evaluationSchema,
    },
  });

  return JSON.parse(result.text!);
}
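One practical caveat: `JSON.parse(result.text!)` will throw if the model ever returns malformed output. Even with `responseSchema` in place, a defensive wrapper costs little. This is a hypothetical helper of ours, not part of the original code:

```typescript
type Evaluation = { evaluation: string; feedback: string; reasoning: string };

// Returns the parsed evaluation, or null if the response is not usable JSON.
function parseEvaluation(raw: string): Evaluation | null {
  try {
    const parsed = JSON.parse(raw);
    if (parsed && (parsed.evaluation === 'PASS' || parsed.evaluation === 'FAIL')) {
      return parsed as Evaluation;
    }
    return null;
  } catch {
    return null;
  }
}
```

A `null` result can then be treated like a FAIL in the loop, prompting one more generation attempt instead of crashing the run.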

4. Reflection Loop

This is the core loop that ties it all together. It evaluates your caption, refines it based on feedback, and stops when a caption passes, or when we hit a max retry limit.

async function reflectionLoop() {
  const photoURL = 'https://your.image.url.jpg';
  const maxIterations = 3;

  const response = await fetch(photoURL);
  const imageArrayBuffer = await response.arrayBuffer();
  const base64ImageData = Buffer.from(imageArrayBuffer).toString('base64');

  console.log('\n🚀 Starting reflection loop...');
  console.log(`🖼️ Image URL: ${photoURL}`);

  // Optional: seed the loop with a manual first caption. In practice this could come from
  // user input, e.g. an interface where someone enters a caption for the image and the LLM
  // then reviews it using the reflection pattern.
  let caption = 'Turtle swimming in the sea';
  let feedback: string | null = null;

  for (let i = 1; i <= maxIterations; i++) {
    console.log(`\n🔁 Iteration ${i}`);
    console.log(`\n📝 Current Caption:\n"${caption}"`);

    const evaluation = await evaluateCaption(base64ImageData, caption);

    console.log('\n📋 Reflection Summary:');
    console.log(`- Evaluation: ${evaluation.evaluation}`);
    console.log(`- Reasoning: ${evaluation.reasoning}`);
    console.log(`- Feedback: ${evaluation.feedback}`);

    if (evaluation.evaluation === EvaluationStatus.PASS) {
      console.log('\n✅ Final Caption Accepted:\n');
      console.log(`"${caption}"`);
      return caption;
    }

    console.log('\n✏️ Refining caption based on feedback...');
    feedback = evaluation.feedback;
    caption = await generateCaption(base64ImageData, feedback);
  }

  console.log('\n⚠️ Max iterations reached. Returning last attempt:\n');
  console.log(`"${caption}"`);
  return caption;
}

When we ran the loop, the model rejected its own first attempt and kept refining over a couple of iterations until the caption passed.

Conclusion

Reflection is an underrated capability in AI systems. It bolts on depth, resilience, and structure to what would otherwise be shallow, one-shot responses. And it’s easy to implement.

If you’re building LLM workflows, start simple and wire in reflection early. It won’t just improve your outputs. It’ll make your systems feel a lot more like thoughtful collaborators.