Install open-source AI in a commercial robot and it’ll clean your room

Meta and NYU’s robot can navigate and clean rooms it’s never seen before.

Using just open-source AIs, researchers got a commercial robot to find and move objects around a room it had never entered before. The bot isn’t perfect, but it suggests we might not be as far from sharing our homes with domestic robots as experts previously believed.

“Just completely impossible”: Demo videos of robots cleaning kitchens, making snacks, and doing other chores might have you hoping your days of loading the dishwasher are numbered, but AI experts predict we’re still a decade away from handing even a fraction of our chores over to bots.

“There is a very pervasive feeling in the [robotics] community that homes are difficult, robots are difficult, and combining homes and robots is just completely impossible,” Mahi Shafiullah, a PhD student at NYU Courant, told MIT Technology Review.


Open-source, off-the-shelf: A major holdup in the home robot revolution is the fact that building a robot that could work in anyone’s home is a lot harder than training one to work in a controlled lab environment.

A new study — co-led by Shafiullah and involving researchers from NYU and AI at Meta — suggests we might be closer to domestic robots than we think, though.

Using only open-source software, they modified a commercially available robot so that it could move objects around a room it had never entered before on demand. They call the system “OK-Robot,” and detail the work in a paper shared on the preprint server arXiv.

“Simply tell the robot what to pick and where to drop it in natural language, and it will do it,” tweeted Lerrel Pinto, who co-led the study along with Shafiullah.

How it works: The bot at the core of the OK-Robot system is called Stretch (you can buy one for just $19,950, plus shipping and taxes). Stretch has a wheeled base, a vertical pole, and a robotic arm that can slide up and down the pole. At the end of the arm is a gripper that allows the bot to grasp objects.

To turn the robot into something humans can talk to, the team equipped it with vision-language models (VLMs) — AIs trained to understand both images and words — as well as pre-trained navigation and grasping models.
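To get a feel for what a vision-language model does here, the sketch below shows open-vocabulary object lookup in miniature: embed the text command and candidate object crops from the room scan in a shared space, then pick the best match. Every name in it (`embed_text`, `embed_image`, the detection list) is an illustrative stand-in, not OK-Robot's actual code; a real system would plug in an off-the-shelf CLIP-style encoder pair instead of the random-vector stubs used so the example runs.

```python
import numpy as np

# Stand-ins for a pre-trained vision-language model. A real system would
# use an off-the-shelf image/text encoder pair; these stubs return random
# unit vectors so the sketch runs end to end.
EMBED_DIM = 512
rng = np.random.default_rng(0)

def embed_text(query: str) -> np.ndarray:
    """Map a natural-language query to a unit vector."""
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)

def embed_image(crop) -> np.ndarray:
    """Map an image crop of a candidate object to a unit vector."""
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)

def locate_object(query: str, detections: list[dict]) -> dict:
    """Return the detection whose image embedding best matches the text query."""
    q = embed_text(query)
    scores = [float(q @ embed_image(d["crop"])) for d in detections]
    return detections[int(np.argmax(scores))]

# The detections would come from the room scan; here they are placeholders.
detections = [
    {"crop": None, "position": (1.2, 0.4, 0.8)},
    {"crop": None, "position": (2.5, 1.1, 0.7)},
]
best = locate_object("the soda can on the desk", detections)
print("navigate to", best["position"])
```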

They then created a 3D video of a room using the iPhone app Record3D and shared it with the robot — that process took about six minutes. After that, they could give the robot a text command to move an object in the room to a new location, and it would locate the object and move it.
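Put together, the workflow the paper describes (scan once, then command in plain language) amounts to a short loop: find the object, drive to it, grasp it, drive to the destination, release. The function names below are hypothetical placeholders for the pre-trained mapping, navigation, and grasping modules, not OK-Robot's actual API.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    x: float
    y: float
    theta: float  # heading of the wheeled base

# Hypothetical placeholders for the pre-trained modules the system composes.
def build_semantic_map(scan_video_path: str) -> dict:
    """Turn the roughly six-minute iPhone (Record3D) scan into a queryable 3D map."""
    return {"scan": scan_video_path}

def find(semantic_map: dict, query: str) -> Pose:
    """Ask the vision-language model where `query` is in the map."""
    return Pose(1.0, 2.0, 0.0)

def navigate_to(pose: Pose) -> None:
    """Drive the wheeled base near the target location."""
    print(f"driving to ({pose.x:.1f}, {pose.y:.1f})")

def grasp_at(pose: Pose) -> bool:
    """Use the pre-trained grasping model to close the gripper on the object."""
    print("grasping object")
    return True

def release() -> None:
    print("dropping object")

def move_object(semantic_map: dict, pick_query: str, place_query: str) -> bool:
    """'Move the soda can to the box': find, navigate, grasp, navigate, drop."""
    pick_pose = find(semantic_map, pick_query)
    navigate_to(pick_pose)
    if not grasp_at(pick_pose):
        return False
    place_pose = find(semantic_map, place_query)
    navigate_to(place_pose)
    release()
    return True

# One-time setup per room, then arbitrary language commands.
room = build_semantic_map("living_room.r3d")
move_object(room, "the soda can", "the box")
```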

They tested OK-Robot in 10 rooms. In each room, they chose 10 to 20 objects that could fit in the robot’s gripper and told it to move them, one at a time, to another part of the room (“Move the soda can to the box,” “Move the Takis on the desk to the nightstand,” etc.).

Overall, the robot had a 58.5% success rate at completing the tasks. But in rooms that were less cluttered, its success rate was much higher: 82.4%.

A flow chart from Liu et al. (2024) shows where OK-Robot ran into trouble while moving objects.

Looking ahead: Even though OK-Robot can only do one thing (and doesn’t always do it right), the fact that it relies on off-the-shelf models and doesn’t require any special training to work in a new environment — just a video of the room — is pretty remarkable.

The team’s next step is to open-source their code so that others can build on what they’ve started, potentially getting domestic robots to do our chores sooner than predicted.

“I think once people start believing home robots are possible, a lot more work will start happening in this space,” said Shafiullah.

