Article reprinted from: Machine Heart
AI tools that make images look better often distort them, while tools that keep images faithful to reality often produce less attractive results. How can this tension be balanced?
Image source: Generated by Unbounded AI
In suspense and science-fiction works, we often see scenes like this: a blurry photo appears on a computer screen, an investigator asks to “enhance” the image, and the picture magically sharpens, revealing important clues.
It looks great, but for decades it was pure fiction. Even once AI’s ability to generate images began to grow, real enhancement remained out of reach: “If you just blow up an image, it’s going to be blurry. There’s a lot of detail, but it’s all wrong,” says Bryan Catanzaro, vice president of applied deep learning research at Nvidia.
Recently, however, researchers have begun incorporating AI algorithms into image enhancement tools, making the process easier and more powerful. Even so, there are limits to how much data can be recovered from any image. As researchers continue to push the boundaries of enhancement algorithms, they are finding new ways to cope with these limits, and sometimes even to work around them.
Over the past decade, researchers have begun enhancing images using generative adversarial network (GAN) models, which are capable of generating detailed and impressive pictures.
Tomer Michaeli, an electrical engineer at the Technion – Israel Institute of Technology, said: "The images suddenly look much better." But he was also surprised to find that GAN-generated images scored high on distortion, a measure of how far an enhanced image departs from the underlying ground truth. GAN outputs look beautiful and natural, but they achieve this by "fabricating" or "imagining" details that are not actually accurate, which drives distortion up.
Michaeli observed that the field of photo restoration had split into two camps: one showed beautiful pictures, many of them generated by GANs; the other reported good statistics but showed few pictures, because the pictures didn’t look good.
In 2017, Michaeli and his graduate student Yochai Blau more formally explored how various image enhancement algorithms trade distortion against perceptual quality, using established perceptual-quality metrics that correlate with human subjective judgment. As Michaeli expected, some algorithms achieved very high perceptual quality, while others were very accurate, with very low distortion. But none achieved the best of both worlds; you had to choose one or the other. This is known as the perception-distortion tradeoff.
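The tradeoff can be seen in a toy setting. The sketch below is only an illustration of the concept, not anything from the Blau–Michaeli paper: it compares two restorers of a binary texture when the observation carries no information, one minimizing distortion (the posterior mean) and one behaving like a GAN, sampling a plausible texture with made-up details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a binary +/-1 texture the restorer must reconstruct
# from an observation that carries no information (the worst case).
truth = rng.choice([-1.0, 1.0], size=10_000)

# Estimator A: the distortion-optimal (MMSE) answer is the posterior
# mean -- here all zeros. Low distortion, but flat, unnatural "gray mush".
mmse_estimate = np.zeros_like(truth)

# Estimator B: a GAN-like sample from the prior -- a natural-looking
# texture with the right statistics, but every detail is invented.
sample_estimate = rng.choice([-1.0, 1.0], size=truth.size)

mse_a = np.mean((mmse_estimate - truth) ** 2)    # exactly 1.0
mse_b = np.mean((sample_estimate - truth) ** 2)  # about 2.0: double the distortion

print(f"MMSE estimate distortion:    {mse_a:.2f}")
print(f"Sampled estimate distortion: {mse_b:.2f}")
# The sample has the correct texture statistics (values in {-1, +1}),
# while the low-distortion answer looks like nothing real: the tradeoff
# in miniature.
```

The posterior mean is unbeatable on mean squared error, yet its perceptual quality is poor; the sample looks natural but pays double the distortion.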
Michaeli also challenged other researchers to come up with algorithms that produce the best image quality for a given level of distortion, in order to make a fair comparison between pretty picture algorithms and good statistics algorithms. Since then, hundreds of AI researchers have presented the distortion and perceptual quality of their algorithms, citing the Michaeli and Blau paper that described this tradeoff.
Sometimes the impact of the perception-distortion tradeoff isn’t terrible. For example, Nvidia discovered that high-definition screens don’t render some low-definition visual content well, so in February it launched a tool that uses deep learning to improve the quality of streaming video. In this case, Nvidia’s engineers chose perceptual quality over accuracy, accepting that when the algorithm increases the resolution of a video, it will generate some visual details that weren’t in the original.
“The model is fantasizing. It’s all guesswork,” Catanzaro said. “It’s OK if the super-resolution model guesses wrong most of the time, as long as it’s consistent.”
A view of blood flow in a mouse brain (left) and the same view after using AI tools to improve image quality and accuracy. Image credit: Junjie Yao and Xiaoyi Zhu, Duke University.
Applications in research and medicine, in particular, require greater accuracy. AI technology has made significant progress in imaging, but “it can sometimes bring undesirable side effects, such as overfitting or adding spurious features, so it needs to be treated with extreme caution,” said Junjie Yao, a biomedical engineer at Duke University.
Last year, he described in a paper how AI tools can improve existing methods for measuring blood flow and metabolism in the brain while operating safely on the accurate side of the perception-distortion tradeoff.
One way around the limit on how much data can be extracted from a single image is simply to combine data from more images. Researchers studying the environment through satellite imagery have made progress in fusing visual data from different sources: in 2021, researchers in China and the United Kingdom fused data from two different types of satellites to get a better look at deforestation in the Congo Basin, the world’s second-largest rainforest and one of its most biodiverse regions. They took data from two Landsat satellites, which have been measuring deforestation for decades, and used deep learning techniques to increase the resolution of the images from 30 meters to 10 meters. They then fused this set of images with data from two Sentinel-2 satellites, which have slightly different detector arrays. Their experiments showed that the combined imagery "enables the detection of 11% to 21% more disturbed areas than when using Sentinel-2 or Landsat-7/8 imagery alone."
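Bringing 30 m Landsat pixels onto Sentinel-2’s 10 m grid is a 3× upscale along each axis. The study used a learned deep model for that step; the sketch below, on a made-up patch of values, only illustrates the grid bookkeeping with naive nearest-neighbour replication.

```python
import numpy as np

# Hypothetical 4x4 patch of 30 m Landsat pixels (e.g., vegetation-index values).
landsat_30m = np.arange(16, dtype=float).reshape(4, 4)

# 30 m -> 10 m is a 3x upscale per axis. np.kron with a block of ones
# repeats each coarse pixel as a 3x3 block on the fine grid. A deep
# model would instead predict plausible sub-pixel detail here.
factor = 3
on_10m_grid = np.kron(landsat_30m, np.ones((factor, factor)))

print(landsat_30m.shape, "->", on_10m_grid.shape)  # (4, 4) -> (12, 12)

# Once both sources share the 10 m grid, they can be fused, e.g. by
# stacking them as channels for a change-detection model.
```

The point is simply that fusion requires a common grid: only after the Landsat imagery is resampled to 10 m can it be combined pixel-for-pixel with Sentinel-2 data.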
Where the information in an image runs up against a hard limit, Michaeli proposes another way forward: instead of seeking a single definitive answer for how a low-quality image should be enhanced, let the model show multiple different interpretations of the original. In the paper "Explorable Super Resolution," he showed how image enhancement tools can offer users multiple suggestions. A blurry, low-resolution image of a person wearing what appears to be a gray shirt can be reconstructed into higher-resolution images in which the shirt has black-and-white vertical stripes, horizontal stripes, or a plaid pattern, all equally consistent with the original.
In another example, Michaeli took a low-quality photo of a license plate; a standard AI image enhancer showed that the digit 1 on the plate most likely looked like a 0. But when the image was processed by a different, more open-ended algorithm designed by Michaeli, the digit looked equally likely to be a 0, 1, or 8. Presenting the full set of possibilities rules out the other digits without incorrectly committing to a 0.
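The difference between a point estimate and an explorable set can be sketched in a toy version. Everything below is made up for illustration (a hypothetical 5×3 bitmap digit font and a crude degradation operator, not the paper’s method); here the ambiguity that survives degradation is between 0 and 8.

```python
import numpy as np

# Hypothetical 5x3 bitmap "font" for a few plate digits (illustrative only).
digits = {
    "0": [[1,1,1],[1,0,1],[1,0,1],[1,0,1],[1,1,1]],
    "1": [[0,1,0],[0,1,0],[0,1,0],[0,1,0],[0,1,0]],
    "7": [[1,1,1],[0,0,1],[0,0,1],[0,0,1],[0,0,1]],
    "8": [[1,1,1],[1,0,1],[1,1,1],[1,0,1],[1,1,1]],
}

def degrade(glyph):
    """Crude stand-in for blur + downsampling: collapse each column to its mean."""
    return np.asarray(glyph, dtype=float).mean(axis=0)

# The camera saw an "8", but we only receive the degraded measurement.
observation = degrade(digits["8"])

dist = {d: float(np.abs(degrade(g) - observation).sum()) for d, g in digits.items()}

# A point-estimate enhancer commits to the single closest digit...
best = min(dist, key=dist.get)
# ...while an explorable approach reports every digit consistent with the data.
plausible = sorted(d for d, v in dist.items() if v <= 0.5)

print("single answer:", best)        # "8"
print("plausible set:", plausible)   # ["0", "8"]: 1 and 7 are ruled out
```

The explorable output is weaker but more honest: it rules out 1 and 7 with confidence while refusing to decide between 0 and 8, which the degraded measurement genuinely cannot distinguish.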
Such techniques can mitigate the model’s hallucinations, but that all-powerful, crime-solving “enhance” button remains a dream.
As different fields explore the perception-distortion tradeoff in their own ways, how much information can be extracted from AI-enhanced images, and how far those images can be trusted, remains a core question.
“We should remember that in order to output these beautiful images, the algorithm is just making up the details,” Michaeli said.
Original link: https://www.quantamagazine.org/the-ai-tools-making-images-look-better-20230823/
