Magical realism and image labeling

Today a labmate showed me a beautiful paper by Adela Barriuso. She works in a small clothing shop in Spain; during the long hours without customers, she annotates images for computer vision datasets. Over the years she has annotated more than 250,000 objects.

What does it mean to see for yourself after hours and days and years of seeing for machines? In the foreword, Antonio Torralba writes:

After a full day of labeling images, when you walk on the street or drive back home, you see the world in a different way. You see polygons outlining objects, you start thinking about what they are, and you are especially bothered by occlusions.

And then, Barriuso’s words:

The road to my home (I live far away from the city) passes by a landscape with lots of trees, wineries and even a small river. Now I look at this landscape in a different light because I want to recognize every tree, every bush and I try to think how each of those elements would look inside a photograph.

Perhaps it was that Barriuso’s words were translated to English from Spanish by Torralba—perhaps it was that idyllic imagery of trees, wineries, and river—but I was reminded immediately of my college lessons on Latin American magical realism (which I also read in translation, on idyllic spring afternoons at Stanford).

But what exactly was the connection? I thought at first of Gabriel García Márquez’ One Hundred Years of Solitude. In the novel, the inhabitants of Macondo begin suffering from a mysterious amnesia. In response, José Arcadio Buendía begins labeling:

With an inked brush he marked everything with its name: table, chair, clock, door, wall, bed, pan. He went to the corral and marked the animals and plants: cow, goat, pig, hen, cassava, caladium, banana. Little by little, studying the infinite possibilities of a loss of memory, he realized that the day might come when things would be recognized by their inscriptions but that no one would remember their use. Then he was more explicit. The sign that he hung on the neck of the cow was an exemplary proof of the way in which the inhabitants of Macondo were prepared to fight against loss of memory: This is the cow. She must be milked every morning so that she will produce milk, and the milk must be boiled in order to be mixed with coffee to make coffee and milk.

The figures in the paper, with every pixel marked by a label, are like an illustration of Macondo during this period, with every object dutifully tagged. The sign on the cow, which explains the relationship between the animal and other objects in the world, reminds us that labels alone are not enough for understanding. Of course, this is an important lesson learned by computer vision researchers, who are still far from human-like “common-sense” visual reasoning.

Tugging further on the thought of memory, I found myself thinking next of Jorge Luis Borges’ short story Funes the Memorious. Ireneo Funes is a boy who suffers from the opposite of amnesia: perfect memory of all his experiences. It affects his perception, forcing him to swallow the endless complexity of the visual world:

We, at one glance, can perceive three glasses on a table; Funes, all the leaves and tendrils and fruit that make up a grape vine.

Re-reading this moment in the story, I was reminded of Barriuso’s frustration at labeling a photograph of a fruit-market, which is so busy as to make the eye glaze over. But something curious happens when she takes a closer look in LabelMe, the image labeling software she uses: “as I wanted to write about the fact that some images are impossible to label, I realized that labeling this image was not impossible.” Held still in LabelMe’s zoomed viewport, even the busiest scenes become tractable for the human eye to digest bit by bit.

Funes, too, finds himself doing some “labeling” at one point:

Locke, in the seventeenth century, postulated (and rejected) an impossible language in which each individual thing, each stone, each bird and each branch, would have its own name; Funes once projected an analogous language, but discarded it because it seemed too general to him, too ambiguous. In fact, Funes remembered not only every leaf of every tree of every wood, but also every one of the times he had perceived or imagined it.

Here I thought of Barriuso’s meditations on taxonomy. She labels images in English, which is not her first language. To work with this unfamiliar vocabulary, of which she can only memorize a small portion, she keeps a series of carefully organized notebooks: “I was making my own ontologies on the fly,” she writes. Compare this to Funes, whose infinite memory allows him a kind of intellectual wastefulness that, according to the narrator, denies Funes higher-level intelligence.

With no effort, he had learned English, French, Portuguese and Latin. I suspect, however, that he was not very capable of thought. To think is to forget differences, generalize, make abstractions. In the teeming world of Funes, there were only details, almost immediate in their presence.

As a researcher it is hard to read that passage and not think of today’s massive data-driven AI systems: language models that have memorized nearly every fact on the internet but continue to fail at everyday reasoning tasks. Reading Barriuso’s reflections, I am struck by the thoughtfulness and sheer ingenuity that she put into her work. It contrasts dramatically with the relative heavy-handedness of the Funes-like overparametrized convolutional neural networks that would eventually ingest her labels to do the same task without any thoughtfulness or ingenuity at all.

There is more to say about how this paper conjures for me the fantastical: about the Lilliputian difficulty of distinguishing a large bush from a small tree when absolute scale is lost, and about Barriuso’s choice not to label objects visible “through the looking-glasses” of mirrors, windows, and crystal cake-domes. But this is enough for now; I have a paper deadline soon, and it is getting dark outside!