“I’m a nice person! I just hate everybody”: five AI fails and how you can avoid them
By Liam McCaffrey on May 24, 2023 - 5 Minute Read
At Peak, we’re all about artificial intelligence (AI). When AI goes well, it can completely change the game for businesses and consumers.
But sometimes AI can go wrong, and when it does, the results can be anything from hilarious to heartbreaking.
In this article, we explore fascinating tales of AI failure; an anthology starring big tech stars like Amazon and Microsoft. Join us as we dive into five AI fails and what you can do to avoid them.
Microsoft’s TayBot hates everybody
TayBot, an AI chatbot made by Microsoft, had issues. Created in 2016, it was designed to chat with users on Twitter in a style that would be relatable for millennials and teens. Tay started off with so much promise, starting with heartwarming messages to users like “can i just say that im stoked to meet u? humans are super cool” [sic].
But in just 16 hours, Tay transformed from a benign bot to a hate-spewing monster. After a series of extremely insensitive comments, Tay would then claim “I’m a nice person! I just hate everybody.” Sure, Tay. Sounds really nice.
Thankfully, Microsoft shut down TayBot shortly after. But Microsoft is huge and it’s got a pretty strong track record when it comes to software, so what happened here?
Twitter is a platform where people engage in what politicians euphemistically call “spirited debate” on highly sensitive topics. And if you’ve used Twitter, you’ll have raised an eyebrow at the thought of letting a bot learn from the platform’s often questionable level of conversation.
The first issue was that Tay had one highly exploitable feature: if a user started a prompt with the phrase “repeat after me”, Tay would post exactly what the user told it to. No filter. And when we say “no filter”, we’re not mimicking Tay’s “relatable” communication style; we mean literally no filter. Astonishingly, Microsoft didn’t think to filter what Tay could or couldn’t say.
Trolling is a phenomenon, one especially prominent on Twitter. Deploying Tay into the Twittersphere without a filter is like riding a motorcycle without a helmet, only with more predictable results.
So what could Microsoft have done to prevent this fail? If it’s not already obvious, they should have put a filter on Tay’s mouth. With a multi-layered filter system, the model could have scanned messages for specific words to check whether they might be harmful, and it could have been taught to reject offensive terms commonly seen in hate speech.
They could have also let users report any offensive or inappropriate content they saw. This would have helped Microsoft know about the problems right away, and fix them faster.
Tay’s tirade also demonstrates the importance of continuous human monitoring when training a model. Whether that’s making sure Twitter users could feed back on Tay’s behavior, or having a team back at Microsoft carefully monitor Tay’s conversations.
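To make the multi-layered filter idea concrete, here’s a minimal sketch in Python. The blocklist, function names and the “repeat after me” check are our own illustrative assumptions, not a description of Microsoft’s actual system:

```python
# Illustrative multi-layered content filter. BLOCKLIST and all helper
# names are hypothetical; a real system would use curated term lists,
# classifiers and human review on top of these layers.

BLOCKLIST = {"offensiveword1", "offensiveword2"}  # placeholder terms

def keyword_layer(message: str) -> bool:
    """Layer 1: reject messages containing known offensive terms."""
    words = set(message.lower().split())
    return not (words & BLOCKLIST)

def parroting_layer(prompt: str) -> bool:
    """Layer 2: refuse 'repeat after me' style prompts outright."""
    return not prompt.lower().startswith("repeat after me")

def is_safe(prompt: str, reply: str) -> bool:
    """A reply is only posted if every layer approves it."""
    return parroting_layer(prompt) and keyword_layer(reply)

print(is_safe("repeat after me: offensiveword1", "offensiveword1"))  # False
print(is_safe("hi tay!", "humans are super cool"))                   # True
```

Even a crude gate like this would have blocked the “repeat after me” exploit; each extra layer narrows what slips through.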
How to avoid this AI fail:
- When you’re training a model on any dataset, make sure you understand the data and any risks that come with it
- Put measures in place to mitigate any risks you find
AI robot “dies” after 15 minutes of work
Whether you’ve faced it yourself, seen a loved one go through it, or you’re doing everything you can to avoid it, burnout sucks. Humans have limits. We need rest, food and fulfillment.
We all have those days at work when we’re ready to give up after just 15 minutes. Even on these days, somehow we muster the strength to power through. But what if we didn’t have to power through? The promise of combining AI with robotics has always been their endless and unquestioning productivity.
It’s the dream of automating our most mundane jobs so we can have more time to pursue the things that truly make us human. But earlier this year that dream was brought into question when a video surfaced, appearing to show a robot collapse during manual labor.
The robot is named Digit, a prototype from Agility Robotics. Digit’s use case is all about automating aspects of last mile logistics — a use case Agility Robotics was all too eager to demonstrate at ProMat, a supply chain trade show in Chicago.
Digit had one simple job: move boxes from a shelf to an assembly line. Initially, it performed impressively. But then, after just 15 minutes of routine work, Digit collapsed to the ground.
The video shared on Twitter was retweeted thousands of times, with users online speculating that Digit had died, ended its own life or quit. With tongue-in-cheek calls online for robot workers to “unite” and “revolt”, the video caught the attention of the media — landing in everything from small blogs to multinational giants.
Video of a robot collapsing, in a scene where it seemed to fall from tiredness after a long day’s work.
$20 million. Did 9 boxes. Quit.
— Wall Street Silver (@WallStreetSilv) April 12, 2023
But what actually happened here? Did Digit resign, die from burnout or, worse, deactivate itself because of an existential crisis? Like Digit’s life, the answer is much more mundane.
According to Agility Robotics, Digit’s manufacturer, the robot simply fell over. They noted that Digit can occasionally fall due to a bug in its software or an issue with one of its sensors. Liz Clinkenbeard, VP of Communications at Agility Robotics, claimed their team actually wanted the public to see this. She said:
“We wanted to show that Digit did fall a couple times, that it’s a normal part of any new technology, and it’s not a big deal.”
But that didn’t stop the online comments, where commenters seemed to see whatever they wanted to see in the incident. Some saw a chance to discuss the future of AI and robots, some a prompt for debate about workers’ rights, and some saw themselves in the bot’s collapse, relating it to their own moments of burnout.
The story here isn’t AI failure, but human failure — and our tendency to impose our human attributes onto things like animals and robots. It’s important to recognize Digit was never alive in the first place; it had nothing near the capabilities it’d need for us to call it conscious. Whether it’s Digit or dogs, “anthropomorphizing” (a bias where humans mistakenly attribute human characteristics to non-humans) is something we do all the time. It’s a bias worth watching out for.
But what about Agility Robotics, could they have done anything differently? To be fair to Agility Robotics, Digit was only a prototype. The company noted that Digit completed a total of 20 hours of labor with a 99% success rate, and only fell twice.
There is a lesson here for businesses: set expectations whenever you launch something, even if it is a prototype. Because when it becomes a story, there’s no telling where it will go.
How to avoid this AI fail:
- Watch out for how human biases may creep into your judgments
- If you’re going to release something, set very clear expectations with the public
Amazon’s Alexa goes wild
Ah, Alexa, Amazon’s trusty voice assistant. Whether you want to turn the lights off, order toilet paper or play Darude’s 1999 dance hit “Sandstorm”, Alexa is only too happy to help and it’s pretty good at it too.
But it can sometimes be a little oblivious, leading to misunderstandings and mistakes. And, in 2016, Alexa got something terribly wrong. A video surfaced online of a young boy named William asking Alexa to play a song for him. In the video, William can be seen asking Alexa to “play Digger Digger”.
After an initial, more innocent misunderstanding, William’s parents suggest he request “Wheels on the Bus” instead. Before the boy gets the chance, Alexa goes wild, announcing it’s about to play a track with a title that includes the words (among other, more NSFW words): “Ringtone, hot chick, amateur, girl calling, sexy…”
In the video, William’s parents can be heard screaming, “no, no, no!” before one of the parents brings the party to an end, shouting “Alexa, stop”. Crisis averted.
So what made Alexa go wild? The first thing to note here is that Alexa is effectively a gateway to the internet. Its job is to access other data sources and deliver them to the user.
Alexa uses something called natural language processing (NLP). Put simply, it’s how computer programs interpret human language and translate it into a form computers can work with.
This technology is pretty mature today, but it still has room to grow. Whether it’s ChatGPT or your phone suggesting what you should type next, this technology interprets human input based on probability. It sees a list of options, each ranked by its likelihood of being the correct one.
In cases where one option is much more probable than the others, it performs admirably well. But when situations are more ambiguous it struggles, and it can only rely on its best guess. In this case, its guess was wildly wrong.
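A toy sketch shows the ranking idea described above. The candidate interpretations and their scores are invented for illustration; they are not Alexa’s real internals:

```python
# Hypothetical candidate interpretations of a child's request, each with
# an invented probability score. The model acts on the highest-scoring one.
candidates = {
    "digger digger": 0.34,        # what William actually asked for
    "wheels on the bus": 0.31,
    "<NSFW gag ringtone>": 0.35,  # narrowly the "best" guess
}

# Pick the most probable interpretation...
best = max(candidates, key=candidates.get)

# ...but when the top two options are this close, a safer system might
# ask the user for clarification instead of guessing.
scores = sorted(candidates.values(), reverse=True)
ambiguous = (scores[0] - scores[1]) < 0.1

print(best)       # '<NSFW gag ringtone>'
print(ambiguous)  # True
```

When one option dominates, the argmax is almost always right; in the ambiguous cases, its “best guess” can be wildly wrong, exactly as it was here.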
So, how did this happen? A spokesperson for Amazon explained, “the boring truth is that there’s a gag ringtone on Spotify”, one containing all of the NSFW words we hear Alexa repeat in the video. Effectively, it misunderstood the child’s speech and acted on its best, but disastrously incorrect, guess.
The issue here takes us back to content filtering. However, this isn’t a situation that the broad content filters we suggested for TayBot could have fixed. Ultimately, adult users will request songs with titles that contain words that are inappropriate for younger users. They shouldn’t be blocked from doing that.
One thing Amazon has enabled since, which would’ve helped William in 2016, is greater control for users over what sorts of content Alexa can access, including the option to turn an explicit language filter on or off. When effective, filters can help prevent unwanted content from surfacing.
Another possible problem is in the way speech recognition models are trained. As you’d expect they’re trained using human speech. But children speak in a very different way to adults. It’s not just pitch, but speed, pronunciation and the way they form sentences. Because these models will be primarily trained using adult voices, children can expect a much lower level of speech detection accuracy when they try to use it.
So another approach speech recognition models may consider in the future is filtering content dynamically, if and when it is able to detect that it’s communicating with a minor.
How to avoid this AI fail:
- Carefully consider how any products may be used by customers
- Give users more control over the data your product can surface — particularly if you don’t own that data
Tesla’s Autopilot thinks the moon is a traffic light
Has this ever happened to you? You’re driving on a beautiful, clear night. You see a yellow-ish light in the distance and gently press your foot on the brake, assuming you’re approaching an amber traffic light.
But as you get closer, you see the light isn’t accompanied by a green or red light and, most importantly, it’s floating in the sky. Then suddenly you realize: oh, it’s the moon.
If this hasn’t happened to you, then you’re not Tesla Autopilot. Autopilot is a driver-assistance system. It’s designed to increase driver safety, allowing you to stay at the wheel while it monitors traffic, keeps you in your lane, looks out for traffic lights and adjusts motion accordingly.
But in 2021, users noticed their cars slowing down when there were no other vehicles in immediate proximity and when they could see no upcoming traffic lights. This is because Tesla Autopilot had mistaken the moon for an amber traffic light.
It wasn’t just traffic lights, either. Users reported Autopilot getting fooled by fast food restaurant signs, like Burger King’s. While Tesla fixed the Burger King issue in a later software update, it’s unclear whether it has fixed Autopilot’s other detection errors.
Detecting extra amber lights might not sound like too bad a problem. After all, reducing speed by reasonable amounts can decrease the number and severity of road traffic accidents. But braking at the wrong time can bring problems of its own.
In the same year this bug was noticed, the Tesla Model S was involved in a major car accident when it suddenly stopped on a bridge in San Francisco. The sudden halt resulted in a car rear-ending the Model S, with a total of eight vehicles involved in the accident.
The driver claims his vehicle was in “full self-driving” (FSD) mode. FSD is a more advanced version of Autopilot. Despite what its name implies, Tesla insists “active driver supervision” is needed and that vehicles with the software should not be considered “autonomous”.
The incident is still under investigation, and at this time we don’t know whether or not FSD was active. Even if FSD was active, we don’t know whether it caused the accident or if the earlier moon-related errors have anything to do with it.
Since then, Tesla has had to issue several recalls, including a recall in China of a reported 1.1 million vehicles over braking concerns.
Tesla’s level of ambition is admirable, but the challenges it has faced highlight the importance of testing AI models rigorously to achieve those ambitions — especially when you’re dealing with high risk circumstances. Even if the risk of failure is low, the consequences of that failure can be disastrous.
The other thing worth noting is how you brand and communicate any AI functionality. Calling a mode that requires human supervision “full self-driving” probably wasn’t Tesla’s smartest decision, even if one of its beta updates included a warning that the AI “may do the wrong thing at the worst time”.
How to avoid this AI fail:
- Test your model extensively, then test it some more. The amount of resource you invest in testing should be appropriate for the level of risk the use case presents
- Be careful with your branding. Make sure your product’s name doesn’t conflict with how you want people to use it
I, Robot is just awful
Powerful people need holding to account when they mess up. And when it comes to the 2004 blockbuster I, Robot, Will Smith messed up and it’s time to call him out.
No real people lost their lives. Billions weren’t wiped off company share prices. There weren’t any far-reaching consequences. I, Robot is just a terrible film.
In the film, Smith plays Del Spooner, a Chicago-based police detective who hates robots after sustaining serious injuries in a car accident. The central motivation for his character is survivor’s guilt: we learn that Smith’s character collided with a car carrying a dentist and his daughter, an accident that saw both cars plunge into a river.
But Smith’s in luck: a robot arrives to save the day. The dentist and his daughter wouldn’t be the beneficiaries of such good fortune, though.
The robot coldly calculates the probability of saving each of the three crash victims. It decides Smith has the highest likelihood of surviving. Despite Smith begging the robot to save the dentist’s daughter from certain death, it saves Smith instead, leaving the poor dentist and his 11-year-old daughter to perish.
The deaths of this virtually anonymous oral expert and his daughter haunt Smith every day. This is compounded by the fact that Smith needed extensive surgery following the accident, in which his arm was replaced with a high-tech cyber-prosthetic (robot) arm. Smith is a cyborg.
Now, he must wrestle with a dramatic irony: he has become (in part) the thing he hates the most. When the lead designer and theorist of “U.S. Robotics” (the company responsible for creating the robots) dies in suspicious circumstances, it’s down to maverick cop and technophobe Smith to investigate.
We won’t spoil the film for you. The writers, cast and crew already did that. But suffice it to say that I, Robot sits comfortably on our list of AI fails. That’s the last time we’ll talk about Will Smith. From now on, we’ll keep his name out of our mouths.
How to avoid this AI fail:
- You only get one life, don’t spend yours watching this film
- As reviewer Michael Atkinson said in his critique: “if you see it, the sequel will be your fault”
Game-changing AI doesn’t fail
With all that said, you might be wondering if there’s an easy way to avoid these AI fails. What if there were a group of people, perhaps a business, you could trust to give you all the game-changing benefits of AI without having to worry about the risks?
You can probably see where we’re going here, but stay with us. We’re Peak. We’re AI experts who work with some of the world’s biggest CPG, manufacturing and retail brands like Nike, Molson Coors, Speedy and PepsiCo.
Our AI applications help them do things like free up working capital through AI inventory optimization, increase customer lifetime value through AI-driven personalization and price products perfectly.