With the help of AI, Japanese scientists can visualize thoughts from MRI scans better than ever before. And they made it remarkably easy for themselves by using the popular, freely available text-to-image generator Stable Diffusion.
For several years, scientists have been working on “reading” thoughts with the help of mathematical models or artificial intelligence, that is, on visualizing what people see or imagine. Japanese researchers have now succeeded in creating particularly realistic images with a rather unusual tool. The scientists at Osaka University have published their results in a preprint and a brief summary; the study has not yet been peer-reviewed.
So-called functional magnetic resonance imaging (fMRI) is the basis of the pictorial reconstruction of thoughts. It makes neuronal activity in the brain visible by showing changes in blood flow. The University of California at Berkeley, where one of the two authors of the study, Shinji Nishimoto, was doing research in the field at the time, has described how this basically works.
At the time, Nishimoto himself was one of the test subjects who spent several hours in an MRI scanner. He watched movie trailers while blood flow was measured in the area of his brain that processes visual information (the visual cortex).
On the computer, the brain was divided into small, three-dimensional cubes called volumetric pixels, or “voxels.” The recorded brain activity was fed into a computer program that learned to associate the visual patterns in the film with the corresponding brain activity.
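To make that learning step concrete: the following is a minimal sketch, not the Berkeley team’s actual pipeline. It assumes random stand-in data and uses ridge regression from scikit-learn, a common choice for fMRI encoding models; all names and shapes are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical sizes: 1,000 time points, 5,000 visual-cortex voxels,
# and a 512-dimensional feature vector describing each movie frame.
n_timepoints, n_voxels, n_features = 1000, 5000, 512

stimulus_features = np.random.randn(n_timepoints, n_features)  # e.g. motion-energy features
voxel_responses = np.random.randn(n_timepoints, n_voxels)      # measured fMRI signal per voxel

# Fit one regularized linear model per voxel: it predicts that voxel's
# response from the visual features of whatever the subject is watching.
encoding_model = Ridge(alpha=1.0)
encoding_model.fit(stimulus_features, voxel_responses)

# Given new brain activity, the learned mapping can then be inverted
# (or paired with a decoder) to infer what was probably seen.
predicted = encoding_model.predict(stimulus_features[:10])
print(predicted.shape)  # (10, 5000)
```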
So far, very complex models have been used to decode the information obtained, and they often produced only very vague images. For their new approach, Nishimoto and his colleague Yu Takagi used Stable Diffusion instead: a so-called diffusion model, actually designed to generate photorealistic images from text input.
In principle, diffusion models learn by first adding more and more noise to an image until it is unrecognizable, and then undoing the process. Trained this way, a model can produce new data by running randomly sampled noise through the learned denoising steps. Michael Katzlberger explains this very nicely in an article on Stable Diffusion on “Artificial Creativity”.
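Here is a minimal sketch of the forward “noising” step in its standard closed form (the DDPM formulation); the noise schedule and image shape are illustrative, not taken from Stable Diffusion itself.

```python
import torch

# Forward process: gradually mix an image with Gaussian noise.
# betas define how much noise is added at each of T steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def noise_image(x0, t):
    """Return a noised version of image x0 at timestep t (closed form)."""
    noise = torch.randn_like(x0)
    scale_signal = alphas_cumprod[t].sqrt()
    scale_noise = (1.0 - alphas_cumprod[t]).sqrt()
    return scale_signal * x0 + scale_noise * noise, noise

# Training teaches a network to predict `noise` from the noised image;
# sampling then starts from pure noise and removes it step by step.
x0 = torch.randn(1, 3, 64, 64)  # stand-in for a real image
x_t, eps = noise_image(x0, t=500)
print(x_t.shape)  # torch.Size([1, 3, 64, 64])
```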
According to the AI expert Salvatore Raieli, in the solution implemented by the Japanese researchers the image information extracted from the brain scans plays the primary role, and the text conditioning is applied later. One could therefore say it is used for refinement, which matches the images shown in the preprint.
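Schematically, that two-stage idea might look like the sketch below. It is purely illustrative: random stand-in data, deliberately shrunken dimensions, and simple ridge regressions standing in for whatever mappings the authors actually fit between brain activity and Stable Diffusion’s internal representations.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train = 200
n_voxels_early, n_voxels_higher = 500, 800  # toy sizes; real ROIs have thousands of voxels
latent_dim, text_dim = 256, 128             # toy; SD's image latent and text embedding are far larger

early = rng.standard_normal((n_train, n_voxels_early))    # early visual cortex activity
higher = rng.standard_normal((n_train, n_voxels_higher))  # higher visual cortex activity
latents = rng.standard_normal((n_train, latent_dim))      # paired image latents (training targets)
text_embeds = rng.standard_normal((n_train, text_dim))    # paired text-like embeddings

# Stage 1: map brain activity to a coarse image latent (primary signal).
to_latent = Ridge(alpha=1.0).fit(early, latents)
# Stage 2: map brain activity to a conditioning embedding (refinement).
to_text = Ridge(alpha=1.0).fit(higher, text_embeds)

# At test time, both predictions would be handed to Stable Diffusion's
# img2img-style denoising loop: z seeds the image, c steers its semantics.
z = to_latent.predict(early[:1])
c = to_text.predict(higher[:1])
```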
Stable Diffusion is special not only because it is open source, i.e. accessible to everyone. The authors of the study also point out that it is simple and cheap to work with: in principle, Stable Diffusion is an off-the-shelf model that does not have to be specially developed and trained from scratch. And it is lightweight enough to run on home PCs, too.
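That off-the-shelf quality is easy to see in practice. A minimal example of loading and running a pretrained checkpoint with the Hugging Face diffusers library might look like this; the model ID and half precision are typical choices, not prescribed by the study.

```python
import torch
from diffusers import StableDiffusionPipeline

# Download a pretrained Stable Diffusion checkpoint; no training required.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # half precision keeps memory use modest
)
pipe = pipe.to("cuda")  # fits on a consumer GPU

image = pipe("a photograph of a teddy bear on a skateboard").images[0]
image.save("output.png")
```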
The model is not just a very efficient solution, however. According to the Japanese scientists, the combination of image and text conditioning that it makes possible also produces high-resolution images with high fidelity “at the highest level”.
The Japanese study could be an important step towards a practical implementation of thought-visualization technology. Among other things, it could help us better understand what goes on in people who cannot express themselves verbally, such as stroke victims or coma patients. Or paralyzed people could one day use such an interface to control computers with their minds.