Can humans figure out how deep learning AI thinks?

It all started with centipedes, pandas, and a floating fire truck…

May 19, 2020

Deep learning AI is all around us.

It runs self-driving cars. It makes medical diagnoses. It recognizes your face and your voice — for translating, for voice to text, for “hey, Alexa.” For a variety of applications with different levels of urgency, we trust deep learning AI.

Incredibly powerful, capable of sifting through and finding patterns in unimaginably large data sets, like finding constellations in the stars, deep learning algorithms can often outperform human beings. So we use them; sometimes, we trust them. But deep learning AIs, with their neural networks, are not like other machine learning algorithms.

We have no idea what they are thinking; they are black boxes. And researchers are working to crack them open with two main methods: dissection and mapping.

At Auburn University, Anh Nguyen, an assistant professor of computer science and software engineering, is dissecting image recognition deep learning algorithms to analyze them bit-by-bit. And Sameer Singh, an assistant professor of computer science at UC Irvine, is creating attribution maps — essentially a “heatmap” of what an algorithm is focusing on — to help understand what could make a natural language algorithm (how Alexa understands, and speaks with, you) start saying things that are, well…racist.

Both approaches have benefits and drawbacks. Breaking the algorithm apart can help computer scientists and programmers gain a granular understanding of what is happening, but could mean nothing to a lay person. And an attribution map, while easier to read, does not provide the same amount of detail as a dissection.

But why are these AIs black boxes to begin with? Why don’t we understand what computer programs, written by humans, are thinking?

Deep Learning vs Machine Learning

Machine learning is a form of artificial intelligence in which the AI uses a vast amount of “ground truth” data to train itself for a given output; the classic example is recognizing a cat. Feed a machine learning algorithm thousands of photos labeled “cat,” and it can learn to identify cats. Playing the equivalent of thousands of years of a game can teach it to play that game.

The idea of machine learning dates back to the 1950s, Nguyen says, but only more recently have computers had the horsepower to effectively crunch enough data to make it useful. By the 1990s, machine-learning algorithms were using simple but effective concepts to learn. But more complex problems required more complex algorithms, inspired by the one behind our eyes. And that’s where deep learning comes in.

“Deep learning” AI, unlike older machine learning methods, doesn’t need structured data to feed on. It uses what is called an artificial neural net. Inspired by the human brain, where many neurons work together, a neural net stacks layers upon layers of “neurons” through which the AI considers the data, and each layer provides a different interpretation. These interpretations work together to classify the data.
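To make the layered idea concrete, here is a minimal Python sketch of data passing through two layers of artificial neurons. Everything in it is an assumption made for illustration: the weights are hand-picked rather than learned, and the names (`dense`, `forward`, `HIDDEN_W`, and so on) are hypothetical. Real networks have millions of such numbers, which is part of why they are so hard to read.

```python
import math

def relu(x):
    """A common activation: pass positive values through, clip negatives to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One layer: each 'neuron' takes a weighted sum of its inputs, adds a bias,
    and applies an activation function."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical hand-picked weights (a trained network would learn these).
HIDDEN_W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # 3 hidden neurons, 2 inputs each
HIDDEN_B = [0.0, 0.0, 0.0]
OUTPUT_W = [[1.0, -1.0, 0.2], [-1.0, 1.0, 0.2]]    # 2 output classes
OUTPUT_B = [0.0, 0.0]

def forward(inputs):
    """Run the input through the hidden layer, then the output layer."""
    hidden = dense(inputs, HIDDEN_W, HIDDEN_B, relu)
    scores = dense(hidden, OUTPUT_W, OUTPUT_B, lambda s: s)
    return hidden, softmax(scores)

hidden, probs = forward([2.0, 1.0])
print("hidden layer activations:", hidden)
print("class probabilities:", probs)
```

The hidden layer's numbers are the "interpretation" that the next layer works from. With only five neurons the path from input to output can still be traced by hand, which stops being true very quickly as layers are added.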

The problem is that these systems are so dense and complex, human beings cannot understand them. We know the input (the data or task), and we know the output (the answers or results) that the deep learning AI provides. But what happens in between is a black box.

How and why the AI got from A to B is locked up under those layers of neural feedback.

Besides being unsettling, computer programs that we don’t understand can do unpredictable things, and it’s hard to reverse engineer or correct them when they go wrong.

“In general, it boils down to the question of why,” says Nguyen. “Why neural networks behave this way, and not the other way.”

Centipedes, Pandas, and a Floating Fire Truck

Neural networks are exceptionally good at recognizing images. Feed them enough data, and they can tease out patterns and differences invisible to the human eye. This ability gets put to use in a variety of applications. Some are matters of life and death, like an autonomous vehicle detecting a pedestrian or a diagnostic tool detecting cancer.

No matter how advanced, a neural net is still brittle: when presented with something outside the range of data it was trained on, it can break down. A deep learning AI is often superior to a human in a specific, narrowly defined task. But because of its brittleness, when it fails, it fails spectacularly.

Since the algorithm is a black box, it can be difficult-to-impossible to identify why it botched its output. When the image being recognized incorrectly is a tumor or a pedestrian, the consequences can be fatal. And some very strange images can cause these failures.

Data that can royally mess up an AI is called “adversarial,” and it can cause a usually reliable neural network to make truly bizarre mistakes. Fields of static, wavy chevrons, and reef-fish-colorful stripes will confidently be declared centipedes or pandas.

“We discovered, shockingly, that somehow the networks are fooled by these bizarre patterns,” Nguyen says, “something we never imagined.”
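One common way to build an adversarial input is to nudge every input value slightly in whichever direction pushes the model toward the wrong answer. The Python sketch below does this for a toy linear classifier. It is only an illustration under stated assumptions: the four-pixel "image," the weights, the labels, and the step size `epsilon` are all invented, and this is not the procedure Nguyen's team used to produce their patterns.

```python
# Toy linear "image classifier": a positive score means "panda".
WEIGHTS = [0.9, -0.6, 0.4, -0.8]  # hypothetical learned weights, one per pixel
BIAS = 0.0

def score(pixels):
    """Weighted sum of the pixel values plus a bias."""
    return sum(w * p for w, p in zip(WEIGHTS, pixels)) + BIAS

def predict(pixels):
    return "panda" if score(pixels) > 0 else "not panda"

def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def perturb(pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that raises the score.
    For a linear model, the gradient of the score with respect to a pixel
    is simply that pixel's weight, so its sign gives the direction."""
    return [p + epsilon * sign(w) for p, w in zip(pixels, WEIGHTS)]

original = [0.2, 0.5, 0.3, 0.4]
adversarial = perturb(original, epsilon=0.2)

print(predict(original), round(score(original), 2))
print(predict(adversarial), round(score(adversarial), 2))
```

No single pixel moves by more than 0.2, yet the label flips, because many small changes all push the score the same way. In a real image with millions of pixels, each individual change can be far too small for a person to notice.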

Normal images can confuse deep learning AI too. Like a prankster god, Nguyen can take a 3D model of a firetruck or school bus and put it anywhere, any way, in a photo. Flip the firetruck upside down, and the AI sees a bobsled; zoom in close on a bus’s windows, and it becomes a punching bag. Something inside the black box is going haywire.

To find out what, Nguyen created a tool called DeepVis to dissect the algorithm. The program isolates and displays what individual neurons are recognizing.

“The idea here is to understand what each single neuron is doing,” Nguyen says. Using this program, Nguyen can see which neuron is detecting what basic objects in an image. From here, he can begin to break apart how it is learning. By examining every neuron, it should be possible to understand how the deep learning algorithm arrived at its output.
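The neuron-by-neuron idea can be sketched in a few lines of Python. This is not DeepVis itself, only a toy stand-in under stated assumptions: the three neurons, their weights, and the four probe images are hypothetical, and a real network's neurons respond to far messier features.

```python
def relu(x):
    """Pass positive values through, clip negatives to zero."""
    return max(0.0, x)

# Hypothetical hidden layer with three neurons looking at a 4-pixel "image"
# laid out as [top-left, top-right, bottom-left, bottom-right].
NEURON_WEIGHTS = {
    "neuron_0": [1.0, 1.0, -1.0, -1.0],
    "neuron_1": [1.0, -1.0, 1.0, -1.0],
    "neuron_2": [1.0, -1.0, -1.0, 1.0],
}

# Simple probe images used to see what each neuron responds to.
PROBES = {
    "bright top row": [1, 1, 0, 0],
    "bright left column": [1, 0, 1, 0],
    "bright diagonal": [1, 0, 0, 1],
    "all bright": [1, 1, 1, 1],
}

def activation(weights, pixels):
    """How strongly one neuron fires for one image."""
    return relu(sum(w * p for w, p in zip(weights, pixels)))

def strongest_probe(weights):
    """Name of the probe image that makes this neuron fire the most."""
    return max(PROBES, key=lambda name: activation(weights, PROBES[name]))

report = {name: strongest_probe(w) for name, w in NEURON_WEIGHTS.items()}
for neuron, probe in report.items():
    print(neuron, "responds most to:", probe)
```

Here each neuron turns out to be a detector for one simple pattern. Doing the same for every neuron in a large network produces an enormous amount of detail, which is the trade-off the next paragraphs describe.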

Even with DeepVis, the black box may not be fully opened. The sheer complexity of the legion of neurons can be hard for a human to understand: it is a grey box.

Because of that complexity, dissection is most useful for AI developers, Nguyen says. The amount of detail it provides can help developers gain the deeper understanding of the neural network’s training needed to crack their black box.

But that same amount of detail makes it very difficult for a person who isn’t a computer scientist — like the doctor looking at the tumors — to understand what is going on.

For a more user-friendly peek under the hood, mapping may be the way to go.

What Is Deep Learning Thinking? Attribution Maps Try to Show Us

It’s hilarious — and a little sick — what adversarial data can do to a deep learning AI. A text-generating AI presented with a car crash of random letters and words can react in … interesting ways.

“It started generating racist text,” says UC Irvine’s Sameer Singh, who focuses on cracking the black box of natural language processing (NLP) algorithms, the algorithms that understand and reply to us. “Completely racist.”

To find out what parts of the text the AI is looking at, Singh uses a tool called an attribution map. Insert language into the text-generating NLP algorithm, and the attribution map will highlight certain parts, showing you what is “lighting up” inside the neural net, perhaps a certain letter combination.
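One simple way to build such a map is to remove each word in turn and measure how much the model's output changes. The Python sketch below uses that leave-one-out approach on a toy scoring function. Both are assumptions for illustration: the "model" is a hypothetical word list rather than a neural net, and leave-one-out is just one of several attribution techniques, not necessarily the one Singh's tools use.

```python
# Hypothetical toy "model": scores how strongly a sentence reads as negative.
NEGATIVE_WORDS = {"terrible": 2.0, "boring": 1.0, "awful": 2.0}

def model_score(tokens):
    """Sum the negativity weights of the words in the input."""
    return sum(NEGATIVE_WORDS.get(t.lower(), 0.0) for t in tokens)

def attribution_map(tokens):
    """Leave-one-out attribution: a token's importance is how much the
    score drops when that token is removed from the input."""
    base = model_score(tokens)
    return [
        (tok, base - model_score(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]

def render(attributions):
    """Text 'heatmap': one '#' for every 0.5 points of attribution."""
    return "\n".join(
        f"{tok:<10}{'#' * int(round(a / 0.5))}" for tok, a in attributions
    )

tokens = "The plot was terrible and boring".split()
attributions = attribution_map(tokens)
print(render(attributions))
```

The words with the longest bars are the ones the output depends on most. The map says nothing about why those words matter inside the model, only that they do.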

“The things that are highlighted have a big impact on the prediction or the output of the model,” Singh says. Using this information, Singh can then use intentional adversarial triggers to try to find problems and understand connections in the deep learning algorithms.

Singh’s team developed these particular triggers by using words they discovered the algorithm was keying in on. They then modified those words, following the template of what their maps said the algorithm was most “interested” in. The end result is a chain of words and semi-misspellings that evoked racist rhetoric.

Attribution maps are especially helpful for interpreting AIs for lay people. A doctor using a medical model may not know how to read a neuron-by-neuron breakdown. But if they see an attribution map that shows where on an image the algorithm is focusing, that could be enough to give them the gist of what the AI is thinking.

Like the neuron-by-neuron approach, with its overwhelming complexity, attribution maps have drawbacks of their own.

“The thing to understand is that these attribution maps are also approximate,” Singh says — and different map generators may not agree with each other.

But an approximate understanding may be the best we can get.

The Human Black Box

Complex problems practically require deep learning algorithms. Deep learning AI is ubiquitous and mysterious, says Nguyen — like Ripley on LV-426, we are surrounded by aliens. As the algorithms become more complex, capable, and impenetrable, the questions around black boxes move further into the philosophical: Is it fair to require complete transparency of a neural net when our own is still mysterious?

Human thought itself may be, at best, a grey box. We know some structures and have an idea how they work. But the exact meanings of “thought” and “consciousness” are still unknown. Why would a system modeled after the human brain be any different?

Although important, such abstract puzzles are not as immediately pressing as the black boxes driving down the street or identifying cancerous tumors. Researchers like Nguyen and Singh seem to be fighting something as unstoppable as night itself: the more challenging the problem, the more complex the neural net, the blacker the box.

“The entire field is far from solving it,” Nguyen says. He believes we may end up settling for a grey box. If you have a complex model that works marvelously, it will likely not be interpretable, Nguyen says.

“No free lunch.”