A Deep Look Inside Paintings

February 28, 2019

Light and space are the origin of everything, the two gametes that engendered the Universe. Likewise, in painting as a window onto the Universe, space becomes canvas and light emerges in the form of colorful pigments. Somebody once said that painting is just that: light and space. And it remains a mystery how a canvas and a handful of pigments can condense the deepest corners of the human soul.

In particular, the light confined in the Meninas’ room seems to contain the whole history of painting. Everything can be explained there, and any emotion can be found impregnated in Velazquez’s strokes. Not in vain has the atmosphere of that room inspired painters, fascinated artists, and hypnotized audiences for centuries, up to our own days. The mere idea of getting not inside the canvas but inside the room itself is so attractive that it triggers innumerable imaginary visions. Imagining yourself inside the painting next to Velazquez, walking among the Meninas, facing the Kings’ mirror, or looking through that mysterious door at the end of the room amounts to a journey through space and time, across the wrinkles of the Universe.

I don’t remember when I started to be deeply fascinated by paintings. A few months ago, I discovered that a neural network could estimate depth from a single image, and I instantly dreamed of taking a deep look inside the Meninas’ room.

A Deep Look Inside Paintings is a video artwork that invites the viewer on an imaginary journey through the hidden third dimension of paintings. The depth that the painter’s perspective embeds in a 2D canvas is not only unfolded; a 3D space is also created to navigate through, from which a whole new vision of the painting arises. In this process, the rendering inaccuracies only add to the drama and beauty of these paintings. The electronic textures in Debussy’s Clair de lune, like magic powder, bring the audio-reactive canvases to life, connecting the classical world with today’s digital one.

Neural Models

One of the evolutionary strategies for understanding depth in monocular vision, that is, looking through a single eye, relies on the (visual) memory that apparent size decreases with distance. In other words, if a person in a picture appears as large as a building, the person is probably nearer than the building. In fact, size scaling was one of the first strategies painters used to give perspective to their paintings. Conversely, it is also exploited to create absurd and amusing visual illusions such as the well-known Ames room.
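The size cue can be made concrete with the ideal pinhole camera model, in which the drawn height of an object equals the focal length times its real height divided by its distance. The following is only a minimal sketch of that relation; the function names and all the numbers are illustrative assumptions, not something taken from the artwork.

```python
def projected_height(real_height_m, distance_m, focal_length_px):
    """Apparent height in pixels under an ideal pinhole camera: h = f * H / Z."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return focal_length_px * real_height_m / distance_m


def distance_for_height(real_height_m, height_px, focal_length_px):
    """Invert the same relation to recover distance: Z = f * H / h."""
    if height_px <= 0:
        raise ValueError("projected height must be positive")
    return focal_length_px * real_height_m / height_px


# Illustrative numbers only: a 1.7 m person and a 17 m building,
# both drawn 100 px tall, with an assumed focal length of 1000 px.
person_z = distance_for_height(1.7, 100.0, 1000.0)
building_z = distance_for_height(17.0, 100.0, 1000.0)

# Same drawn size, so the object that is smaller in reality must be nearer.
print(person_z < building_z)
```

Under these assumptions the building has to stand ten times farther away than the person to appear at the same size, which is exactly the cue described above.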

Nowadays, deep neural networks for computer vision have been trained on so many objects, from so many perspectives, in so many locations and lighting conditions, that they end up learning what should be near and what should be far in a scene. One of the first neural networks to estimate depth from a single image was Monodepth. However, this network was trained on street images and does not generalize well to other types of scenes. That later led me to Megadepth, which was trained on thousands of images of city landscapes and, impressively, generalized better to indoor scenes and human bodies, delivering much cooler results.

Projection and lights

To project depth into space, focal (perspective) projection is used instead of parallel projection. However, most of the paintings do not present an evident optical perspective. Thus, as a compromise, all the 3D models are projected with a similar, fairly long focal distance. As for lighting, no external spotlight illuminates the 3D canvases. The light contained inside the paintings is so marvelous that it inherently generates the chiaroscuro atmospheres.
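As a rough illustration of what projecting depth into space can mean, the sketch below back-projects every pixel of a depth map into a 3D point with an ideal pinhole camera. It is not the code behind the video: the function name `backproject`, the principal point placed at the image centre, and the tiny hand-written depth map are all assumptions made for the example. Note also that monocular networks typically predict depth only up to an unknown scale, so the units here are arbitrary.

```python
def backproject(depth, focal_length_px):
    """Turn a depth map (a list of rows) into a list of (x, y, z) points.

    Assumes an ideal pinhole camera whose principal point is the image
    centre. Image rows grow downwards, so y is flipped to make +y point up.
    """
    height = len(depth)
    width = len(depth[0])
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Perspective back-projection: offsets from the centre grow with depth.
            x = (u - cx) * z / focal_length_px
            y = (cy - v) * z / focal_length_px
            points.append((x, y, z))
    return points


# Hypothetical 2x3 depth map: a near top row and a farther bottom row.
toy_depth = [
    [2.0, 2.0, 2.0],
    [4.0, 4.0, 4.0],
]
points = backproject(toy_depth, focal_length_px=2.0)
```

In this model, a longer focal length shrinks the lateral spread of the points relative to their depth, so differences in depth fan the points out less, which is one way to read the choice of a long focal distance for paintings without an evident optical perspective.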

The deep look

Looking carefully at the generated depth maps, especially when they are projected in 3D, one can see that buildings and other elements such as roofs, windows, and bridges are nicely reconstructed. In fact, Megadepth was specifically trained for that purpose. Furthermore, human silhouettes were also taken into account to some extent during training, so people are gracefully reconstructed too. However, it is noticeable that the depth of several heads, curiously women’s heads, was not estimated correctly. In contrast, Millet’s pitchfork, for instance, is nicely molded, even though the network has probably never seen a tool like this before. Regardless of the inaccuracies and warped volumes, from an artistic point of view new pictorial landscapes emerged beautifully, which filled me with excitement.