

Engineering the Future: How AI and Robotics Are Shaping Our Everyday Lives

by Parker Kleinman | Oct 2, 2024

  • The article explores the journey of developing AI-powered robots at Google X, focusing on the challenges of creating autonomous machines capable of performing everyday tasks in unpredictable real-world environments.
  • It highlights the distinction between traditional programming and AI-driven end-to-end learning, emphasizing the complexity of teaching robots to adapt and learn tasks like humans.
  • The piece reflects on the potential and limitations of current robotics technology, stressing that meaningful breakthroughs will require vast amounts of real-world data and patient innovation.

 

In early January 2016, I joined Google X, Alphabet’s secretive innovation lab. My role was to figure out what to do with the employees and technology left over from nine robot companies that Google had acquired. It was a confusing time. Andy Rubin, the father of Android who had overseen these acquisitions, had suddenly left, and the project lacked clear direction. Larry Page and Sergey Brin occasionally dropped by to offer guidance, but it was Astro Teller, head of Google X, who had taken responsibility for the robot group. At the time, the lab was affectionately known as the “moonshot factory,” a place dedicated to solving the world’s hardest problems.

Astro Teller convinced me that Google X—or simply X, as we began to call it—would be different from other corporate innovation labs. The company had patient capital, allowing for long-term investments and risk-taking. For someone like me, with a history of starting and selling tech companies, the idea of tackling large, meaningful problems in a place like X felt right. I believed that Google had the resources and vision to take big bets, and AI-powered robots—those that could live and work alongside humans—seemed like an audacious but necessary endeavor.

Founded in 2010, Google X had already launched projects like the self-driving car effort that became Waymo, Google Glass, flying energy windmills, and stratospheric balloons aimed at delivering internet access. The lab was deliberately separated from the main Google campus to foster its own culture of bold experimentation. To succeed at X, projects needed to meet three criteria. First, they had to address a problem affecting hundreds of millions, or even billions, of people. Second, they needed a breakthrough technology offering a new way to solve the problem. Finally, they required a radical business or product solution that sounded, at least initially, just on the right side of crazy. This framework became the hallmark of a “moonshot.”

When I sat down with Astro to discuss what we might do with the acquired robot companies, it became clear that the task was daunting. Most robots at that time were large, dangerous, and confined to factories, where they operated in controlled environments. Our challenge was to create robots that would be safe and useful in everyday settings, working side by side with humans. The problem was massive—aging populations, shrinking workforces, and global labor shortages loomed on the horizon. The breakthrough technology would be artificial intelligence, which even in 2016, we knew would play a crucial role. Our radical solution was to create fully autonomous robots capable of performing a growing list of tasks in daily life.

Building such robots was going to take time, a lot of patience, and a willingness to try crazy ideas and fail. We knew it would require significant breakthroughs in AI and robotics technology, and very likely cost billions of dollars. Still, there was a deep conviction within the team that the convergence of AI and robotics was inevitable. What had previously been the realm of science fiction was about to become reality.

Every week or so, my mother, who lived in Oslo and suffered from advanced Parkinson’s disease, would call and ask the same question: “When are the robots coming?” She depended on caregivers to assist her with daily tasks, but she hoped robots might one day help her with smaller, more personal challenges. I would tell her that it would take time, but her desire for help echoed the broader need we were addressing. Our robots were intended to assist people in everyday tasks, allowing them to live more independently.

However, building robots that could navigate the messy, unpredictable human world was incredibly hard. Robotics is a systems problem, and robots are only as strong as their weakest link. For example, if a robot’s vision system has trouble perceiving objects in direct sunlight, it may go blind when a ray of sunlight hits it through a window. If its navigation system can’t handle stairs, the robot may tumble down and hurt itself—or worse, hurt someone nearby. These are the real challenges of building robots that can work safely and reliably in human environments. The unpredictability of the world, from changing light conditions to cluttered spaces, makes it extraordinarily difficult to program robots to perform even simple tasks like grasping a cup or opening a door.

There are two main approaches to integrating AI in robotics. The first is a hybrid method, where certain subsystems, like vision or object recognition, are powered by AI, while the rest of the robot’s behavior is governed by traditional programming. For instance, a robot might use AI to identify a cup on a table, but once the cup is recognized, it follows pre-programmed instructions to pick it up. The second approach, known as end-to-end learning (e2e), aims to have the robot learn an entire task, like tidying up a room, by exposing it to large amounts of training data. In this way, the robot learns through trial and error, much like a child learns to pick up objects by watching others and practicing.
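To make the distinction concrete, here is a minimal Python sketch of where the learned pieces sit in each approach. Every name in it (the detector, arm, and policy objects and their methods) is a hypothetical stand-in, not the actual Everyday Robots software; the point is only that the hybrid version keeps scripted motion logic, while the end-to-end version lets a single trained policy drive the motors directly.

```python
# A minimal sketch of where the learned pieces sit in each approach.
# Every name here (detector, arm, policy and their methods) is a
# hypothetical stand-in, not the actual Everyday Robots software.

# --- Hybrid: AI perceives, hand-written code acts ---------------------------
def hybrid_pick_up_cup(camera_image, detector, arm):
    """A learned vision model finds the cup; scripted motion logic grasps it."""
    detections = detector.detect(camera_image)         # learned subsystem
    cups = [d for d in detections if d.label == "cup"]
    if not cups:
        return False                                   # nothing to pick up
    target = cups[0].position_3d                       # (x, y, z) in the arm's frame
    arm.move_above(target, clearance_m=0.10)           # pre-grasp pose (scripted)
    arm.move_to(target)                                # descend (scripted)
    arm.close_gripper()                                # scripted grasp
    return arm.is_holding_object()

# --- End-to-end: one learned policy maps raw observations to motor commands --
def e2e_control_step(policy, camera_image, joint_angles):
    """The trained policy outputs motor commands directly from raw inputs;
    there is no hand-written grasp logic to fall back on."""
    observation = {"image": camera_image, "joints": joint_angles}
    return policy.predict(observation)                 # e.g. joint velocities
```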

At one point, Larry Page told me that to solve our robot problem, all we needed were 17 machine-learning researchers. It was a classic Page insight, both difficult to grasp and profoundly simple. The number was not what mattered; the message was. For Larry, the key to success lay in small, focused teams working on big problems. End-to-end learning, where robots would eventually learn to perform complex tasks without step-by-step programming, was the real breakthrough we needed to pursue. If we could demonstrate that robots could master an e2e task, we would be well on our way to success.

One of our researchers, Peter Pastor, ran a “robot arm farm”—a lab filled with robot arms repeatedly practicing the task of picking up objects like sponges or Lego blocks. When the project began, the robots had a 7% success rate. Over time, through trial and error, they improved. After months of training, they reached a 70% success rate. This was an important milestone: it marked the first time one of our robots had learned a task through reinforcement, rather than being explicitly programmed to do it.
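The cycle behind an arm farm can be pictured roughly as the sketch below. It assumes hypothetical `arms` and `model` objects and compresses away everything hard (resets, safety limits, the learning algorithm itself); it only shows the trial-and-error loop of attempting grasps, logging outcomes, and retraining on the accumulated data.

```python
# A highly simplified sketch of the trial-and-error loop: robots attempt
# grasps, outcomes are logged, and the grasp model is retrained on the
# growing dataset. `arms` and `model` are illustrative assumptions; the
# real system also handled resets, safety limits, and the learning
# algorithm itself, none of which appear here.

import random

def run_arm_farm(arms, model, episodes_per_round=1000, rounds=100):
    dataset = []                                      # (observation, action, success)
    for _ in range(rounds):
        for _ in range(episodes_per_round):
            arm = random.choice(arms)
            obs = arm.observe()                       # camera view of the bin
            action = model.propose_grasp(obs)         # where and how to grasp
            success = arm.execute_grasp(action)       # True if an object was lifted
            dataset.append((obs, action, success))
        model.train(dataset)                          # learn from all attempts so far
        recent = dataset[-episodes_per_round:]
        rate = sum(s for _, _, s in recent) / len(recent)
        print(f"success rate this round: {rate:.0%}")
```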

Even with this progress, it was clear that teaching robots to perform useful tasks in the real world would require much more data. We built a cloud-based simulator, creating more than 240 million virtual robots to run tasks in parallel. These simulated robots practiced picking up objects, interacting with their environments, and refining their skills. The idea was that by the time we transferred these algorithms to physical robots, they would have already “learned” a great deal in their simulated worlds.
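A rough sketch of the parallel-simulation idea, assuming a hypothetical `SimulatedRobotEnv` and `Policy`: many randomized virtual episodes run concurrently, their experience is pooled, and a single policy is trained on the combined data before the cycle repeats. In practice this ran on a real physics simulator and distributed cloud infrastructure, not one machine’s process pool.

```python
# Sketch of the parallel-simulation idea, with `SimulatedRobotEnv` and
# `Policy` as hypothetical placeholders: run many randomized virtual
# episodes concurrently, pool the experience, train, and repeat.

from multiprocessing import Pool

def rollout(seed):
    """Run one simulated episode and return the experience it generated."""
    env = SimulatedRobotEnv(seed=seed)      # randomized scene: objects, lighting, clutter
    policy = Policy.load_latest()
    trajectory = []
    obs = env.reset()
    for _ in range(200):                    # fixed-length episode
        action = policy.act(obs)
        obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        if done:
            break
    return trajectory

if __name__ == "__main__":
    with Pool(processes=64) as pool:                   # many workers in parallel
        episodes = pool.map(rollout, range(10_000))    # 10,000 randomized episodes
    experience = [step for episode in episodes for step in episode]
    Policy.load_latest().train(experience)             # update the policy, then repeat
```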

The rise of systems like ChatGPT, which took enormous amounts of data to create, underscored the complexity of the problem we faced. While language models like GPT could generate convincing text, robots needed an even greater amount of real-world data to learn to perform complex tasks. Robots are already leveraging large language models and vision systems to understand speech and recognize objects, but these capabilities alone aren’t enough for them to function autonomously in our unpredictable world. It will take many thousands, perhaps millions, of robots doing tasks in the real world to collect the data needed to train robust AI models.
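That division of labor might look something like the sketch below: a language model turns a spoken request into a structured plan, a vision model grounds the object references, and a small set of learned skills does the physical work. Every interface here (`parse_request`, `locate`, `execute_skill`) is an assumption for illustration, and the last step is exactly the part that still demands enormous amounts of real-world data.

```python
# Hypothetical sketch of that division of labor: a language model turns a
# spoken request into a structured plan, a vision model grounds the object
# references, and learned skills do the physical work. None of these
# interfaces (parse_request, locate, execute_skill) are real APIs.

def handle_request(utterance, llm, vision, robot):
    # 1. Language: "Throw away the cup on my desk" ->
    #    [{"skill": "pick", "object": "cup"}, {"skill": "place", "object": "trash can"}]
    plan = llm.parse_request(utterance)

    # 2. Vision: locate each referenced object in the current camera frame.
    frame = robot.camera.capture()
    for step in plan:
        step["location"] = vision.locate(frame, step["object"])

    # 3. Control: execute the skills. This is the part that still needs vast
    #    amounts of real-world data to work reliably.
    for step in plan:
        robot.execute_skill(step["skill"], step["location"])
```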

At Everyday Robots, as our project came to be called, we also wrestled with the question of what form these robots should take. Some argued they should mimic humans, with legs and arms, to better navigate spaces designed for people. Others believed a simpler design, such as robots with wheels, would be more efficient and easier to build. The debates were intense, but one remark by Vincent Dureau, a senior engineering manager who used a wheelchair, brought clarity. He said, “I figure that if I can get there, the robots should be able to get there.” His comment helped shift the focus from mimicry to functionality.

Eventually, our robots started performing small, useful tasks, like tidying desks in the office. Using AI to identify objects and people, they picked up cups and wrappers and disposed of them. While these achievements seemed minor, they represented significant progress in solving one of the most difficult parts of robotics: making AI systems work reliably in messy, everyday environments.

Although we had come far, the road ahead was still long. Building robots that could clean hotel rooms, tidy restaurants, or perform other complex tasks would require a blend of AI and traditional programming for years to come. Despite the potential of end-to-end learning, we were realistic about the pace of progress. The dream of fully autonomous robots working seamlessly alongside humans is still on the horizon, but at Everyday Robots, we were determined to make it happen, one small task at a time.

 

WRITTEN BY

Parker Kleinman
