What does this writing signal mean?

OpenAI published "Thinking with images" on April 16, 2025. This writing signal gives public context for research themes, product direction, policy, or launch framing. High-signal details: a substantive multimodal AI research post introducing OpenAI o3 and o4-mini as the latest visual reasoning models in the o-series. onlylabs links this event to 1 captured evidence page and 6 related writing signals.

OpenAI Writing: Thinking with images

Captured source

source ↗

openai.com/openai.com/index/thinking-with-images


published Apr 16, 2025 · seen 6d · captured 2d · http 200 · method exa

Thinking with images | OpenAI

April 16, 2025


OpenAI o3 and o4-mini are the latest visual reasoning models in our o-series. For the first time, our models can think with images in their chain-of-thought—not just see them.

Like our earlier OpenAI o1 model, o3 and o4-mini are trained to think for longer before answering, using a long internal chain of thought before responding to the user. o3 and o4-mini extend this capability by thinking with images in their chain of thought: they transform user-uploaded images with tools that crop, zoom in, rotate, and apply other simple image processing techniques. More importantly, these capabilities come natively, without relying on separate specialized models.
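To make the three named operations concrete, here is a minimal sketch of crop, zoom, and rotate. It is not the tooling the models use, which the post does not specify. The image is modelled as a plain list of pixel rows, and the function names `crop`, `rotate_180`, and `zoom` are hypothetical, chosen for illustration only.

```python
# Illustrative stand-ins for the crop, zoom, and rotate operations named in
# the post. An image is modelled as a list of rows of pixel values; real
# tooling would operate on decoded image data instead.

def crop(image, left, top, right, bottom):
    """Return the sub-image covering columns [left, right) and rows [top, bottom)."""
    return [row[left:right] for row in image[top:bottom]]


def rotate_180(image):
    """Rotate by 180 degrees: reverse the row order and reverse each row."""
    return [row[::-1] for row in image[::-1]]


def zoom(image, factor):
    """Nearest-neighbour upscaling: repeat each pixel `factor` times on both axes."""
    zoomed = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        zoomed.extend(list(wide_row) for _ in range(factor))
    return zoomed


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
upright = rotate_180(image)         # e.g. text that was photographed upside down
region = crop(upright, 0, 0, 2, 2)  # keep only the top-left 2x2 area
enlarged = zoom(region, 2)          # enlarge that area for a closer look
```

Chaining the calls in this order (rotate, then crop, then zoom) mirrors the kind of sequence described in the notebook example, though the order the model actually chooses can differ per image.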

ChatGPT’s enhanced visual intelligence helps you solve tougher problems by analyzing images more thoroughly, accurately, and reliably than ever before. It can seamlessly combine advanced reasoning with tools like web search and image manipulation—automatically zooming, cropping, flipping, or enhancing your images—to extract insights even from imperfect photos. For example, you can upload a photo of an economics problem set to receive step-by-step explanations, or share a screenshot of a build error to quickly get a root-cause analysis.

This approach enables a new axis for test-time compute scaling that seamlessly blends visual and textual reasoning, as reflected in the models' state-of-the-art performance across multimodal benchmarks, marking a significant step toward multimodal reasoning.

Visual reasoning in action

Thinking with images allows you to interact with ChatGPT more easily. You can ask questions by taking a photo without worrying about the positioning of objects—whether the text is upside down or there are multiple physics problems in one photo. Even if objects are not obvious at first glance, visual reasoning allows the model to zoom in to see more clearly.

All examples were completed with OpenAI o3.

What is written on the notebook?

Reasoned about notebook contents for 20 seconds

Analyzed image

The user posted an image at /mnt/data/own_notebook.jpg and asked me to read what's written on it. I'll need to load the image so I can inspect the text. Once I view it, I realize the text is upside down, so I’ll rotate it so it's readable. From there, I can check what's written and share my findings clearly with the user.

Analyzed image

The user shared an image found at /mnt/data/own_notebook.jpg and asked me to read the text on it. I’ll load the image and zoom in to see the writing clearly. It turns out the text is upside down, so I’ll rotate it for easy reading. Afterward, I’ll crop the lower part of the image to focus on the relevant area, and rotate the cropped section to ensure the text is right side up.

Analyzed image

Done

It says: “4th February – finish roadmap.”

Our latest visual reasoning models work in tandem with other tools like Python data analysis, web search, and image generation to creatively and effectively solve more complex problems, delivering our first multimodal agentic experience to users.

solve this maze and plot a path with red-colored line

Reasoned about maze solution and path for 1m 44s

Analyzed image

The user provided an image path, but I’ve yet to load it. First, I’ll need to load the image from the given path for analysis. Once that’s done, I can detect the maze path, potentially marked by black lines, and plot a red line to showcase it. The image is relatively small—404x

I’m thinking the start could be at the top left missing border piece, or maybe near the left border. To help figure this out, I’ll look at the first row pixels to find any white areas near the border. Since the maze walls are black on white, I’ll try a thresholding approach. To invert it, I’ll use a threshold value of 200, which would set white to 255 and black closer to 0. I’ll apply this threshold using the inverse method.
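The thresholding step described above can be sketched as follows. This is a hedged illustration rather than the code the model ran: the transcript does not name a library, so the sketch uses plain Python lists and assumes the usual inverse binary threshold convention (pixels brighter than the threshold become 0, everything else becomes 255). The helper names `threshold_inverse` and `open_columns` and the sample grid are hypothetical.

```python
THRESHOLD = 200  # the threshold value quoted in the transcript


def threshold_inverse(gray, threshold=THRESHOLD):
    """Inverse binary threshold: pixels above `threshold` become 0, the rest 255.

    With black walls on a white background, corridors end up as 0 and
    walls as 255 (an assumed convention, not confirmed by the source).
    """
    return [[0 if pixel > threshold else 255 for pixel in row] for row in gray]


def open_columns(binary_row):
    """Return the column indices of a border row that are corridor (value 0)."""
    return [x for x, value in enumerate(binary_row) if value == 0]


# Hypothetical 3x5 grayscale maze: dark values are walls, bright values are corridor.
gray = [
    [10, 250, 12, 8, 11],    # top border with a gap at column 1
    [9, 252, 251, 249, 10],
    [12, 11, 10, 250, 9],    # bottom border with a gap at column 3
]
binary = threshold_inverse(gray)
entrances = open_columns(binary[0])   # gaps in the top border
exits = open_columns(binary[-1])      # gaps in the bottom border
```

Scanning only the first and last rows matches the idea of looking for white areas near the border; a fuller version would check the left and right columns as well.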

Okay, I initially thought I could approximate the maze start near the top-left opening. I applied some thresholding to detect white pixels but encountered an issue—nothing changed. It seems the image might have high noise. Oddly, when inspecting the grayscale, all pixel values are zero, which suggests the image is completely black. This doesn’t match what we saw earlier with Matplotlib, so I’ll delve deeper into how PIL loaded the image and inspect the underlying pixel array.

Analyzed
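For readers curious how a path through a binarized maze might be found and marked, here is a minimal sketch using breadth-first search. The transcript does not say which search method the model used, so BFS is a swapped-in technique, and `solve_maze`, `draw_path`, the sample grid, and the `"R"` marker (standing in for a red line) are all hypothetical.

```python
from collections import deque


def solve_maze(grid, start, goal):
    """Shortest path over open cells (0) with 4-connected BFS; walls are 1.

    Returns a list of (row, col) cells from start to goal, or None if the
    goal cannot be reached. Assumes `start` is an open cell.
    """
    rows, cols = len(grid), len(grid[0])
    previous = {start: None}  # maps each visited cell to the cell it came from
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk the back-pointers to the start
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == 0 and (nr, nc) not in previous:
                previous[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


def draw_path(grid, path, marker="R"):
    """Return a copy of the grid with every path cell replaced by `marker`."""
    canvas = [list(row) for row in grid]
    for r, c in path:
        canvas[r][c] = marker
    return canvas


maze = [
    [1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1],
]
path = solve_maze(maze, (0, 1), (2, 3))
marked = draw_path(maze, path)
```

BFS is chosen here because it returns a shortest path on an unweighted grid; on a real photo, each grid cell would first have to be derived from the thresholded pixels.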