Generative and embodied AI may lead to general-purpose robots, but developers must navigate speedbumps, says Converge.
The post When will we get the ChatGPT of robotics? The future of embodied AI is bright appeared first on The Robot Report.

Micropsi has developed deterministic, rather than probabilistic, robot programming. Source: Micropsi Industries
With the success of generative AI, there has been much discussion around the potential for bringing the kind of flexible intelligence found in large language models into the physical world. This is often called “embodied AI,” and it is one of the most profoundly transformative opportunities in the global economy.
I would like to argue that the future for embodied AI is bright, but the path forward is far less straightforward than the path for AI in the purely digital realm. The road to a “ChatGPT for robotics” has a number of speedbumps, and new breakthroughs are required for the idea to become a reality. This has implications for startup founders and investors, which I’ll attempt to distill into a few recommendations.
More robotic automation is an inevitability, and all of the uncertainty resides in the question of how, not if. Since its acquisition of Kiva Systems in 2012, Amazon has deployed over 750,000 robots in its warehouses. Startups and investors are attempting to triangulate the next applications that can achieve this level of alignment between robotic capabilities and market needs.
The trajectory of AI is a key variable in this triangulation process, and powerful new models could be absolute game changers. So where do we stand in the development of these models? I have spoken with experienced roboticists and those developing robotic foundation models to better understand this question.
One step at a time toward embodied AI
The aim of cutting-edge embodied AI research is to create robot intelligence that is general-purpose rather than task-specific — flexible enough to handle new or highly dynamic use cases without the need for dedicated training. The promise of general-purpose robotic foundation models is twofold.
First, they would dramatically expand the number of use cases addressable by robotics. Second, they would shorten the historically long commercialization timelines for robotics systems.
Both of these promises are being fulfilled in the purely digital realm by foundation models like GPT-4, Gemini, Claude, and Llama. These models have opened the door to countless new use cases while putting small, single-purpose AI models on a fast track to obsolescence, as the former can do the same job as the latter while eliminating the engineering investment required to do bespoke model training.
General-purpose models have become the de facto way to build almost anything in AI. One might speculate that a new ChatGPT-like model will come to dominate robotics application development.
However, I do not think this will be the case in the near term. Instead, my expectation is that generative AI techniques will gradually infuse robotics rather than reshape the landscape overnight, and that they will co-exist with classical robotics for some time.
Robotics has been advancing steadily thanks to generative AI techniques, even if they may not be headline-grabbing. Startups building today are already using techniques that promise more flexible, generalized intelligence and faster time-to-market. They are just not relying on a single “world model” as the foundation of their application.
For example, Diffusion Policy leverages diffusion models, the same technique that underpins AI image generators, to generate robot behavior. The resulting models are highly flexible and require less training data, but for now they are usually still trained on a task-by-task basis. Another promising generative AI technique is Neural Radiance Fields (NeRF), which can reconstruct 3D scenes from 2D images and have applications in robotics like the creation of novel training data.
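The core idea behind diffusion-based action generation can be sketched in a few lines: start from pure noise and iteratively denoise it toward a plausible action trajectory. The sketch below is purely illustrative — the `toy_denoiser` is a hand-written stand-in for the learned, observation-conditioned network a real Diffusion Policy uses, and all names and constants are invented for this example.

```python
import numpy as np

NUM_STEPS = 10  # number of denoising iterations (illustrative)

def toy_denoiser(noisy_actions, observation, step):
    # Stand-in for a trained denoising network. It nudges the trajectory
    # toward a target derived from the observation, more strongly at
    # later (less noisy) steps.
    target = np.linspace(observation, observation + 1.0, len(noisy_actions))
    weight = (step + 1) / NUM_STEPS
    return noisy_actions + weight * (target - noisy_actions)

def sample_trajectory(observation, horizon=8, seed=0):
    rng = np.random.default_rng(seed)
    actions = rng.normal(size=horizon)  # begin from pure noise
    for step in range(NUM_STEPS):       # iteratively denoise
        actions = toy_denoiser(actions, observation, step)
    return actions

traj = sample_trajectory(observation=0.5)
print(traj.shape)  # (8,) — one action per timestep in the horizon
```

The appeal of the approach is that the same sampling loop works for very different behaviors; only the learned denoiser changes.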
General-purpose models do have the potential to become the basis for robotics development, and the promise of the approach has been highlighted by research models like Google’s RT-X and Physical Intelligence’s π0.
An important proof point from these models is that they have been demonstrated to be greater than the sum of their parts. When training data from many tasks is included, the model performs better on an individual task than if it had been trained solely on that task.
Yet the approach faces speedbumps on the road to adoption related to data, determinism, and compute. More breakthroughs are needed before this category of models is ready for production.
Three speedbumps with foundation models
The first speedbump is that there does not appear to be a corpus of data ready-made for training a foundation model about interacting with the physical world, in contrast to the abundance of web-scale text, image, and audio data that made existing foundation models possible. Perception models have become very powerful, but connecting perception to actuation is challenging.
To achieve the scale necessary for a true foundation model, I believe significant investment will need to go into mechanisms for collecting more data, as well as experimentation to understand the effectiveness of different types of training data. For example, the extent to which videos of humans performing tasks can contribute to model performance remains unclear. I do believe that, with a combination of ingenuity and investment, powerful large-scale training data can be assembled.
A likely trajectory is that powerful models with significant pre-training will emerge in the next few years, but they will require supplemental training data to perform well on any specific task. This is akin to the fine-tuning of large language models, but it will be more essential, because fewer capabilities will work “out of the box” with robotics models.
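The fine-tuning pattern described above — a frozen pretrained backbone plus a small head fit on supplemental, task-specific data — can be sketched minimally. Here the “backbone” is a random linear feature map standing in for pretrained features, and the task is a hidden linear mapping; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
BACKBONE = rng.normal(size=(4, 16))  # frozen "pretrained" weights (stand-in)

def features(x):
    # Frozen feature extractor: (n, 4) inputs -> (n, 16) features.
    return x @ BACKBONE

# Supplemental data for one specific task (a hidden linear mapping).
w_true = np.array([1.0, -2.0, 0.5, 3.0])
x_train = rng.uniform(-1, 1, size=(200, 4))
y_train = x_train @ w_true

# Fit only the small task head by least squares on the frozen features;
# the backbone is never updated.
head, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

# The cheap head adapts the general-purpose backbone to this task.
x_test = rng.uniform(-1, 1, size=(50, 4))
err = np.abs(features(x_test) @ head - x_test @ w_true).max()
print(err < 1e-6)  # True: the head recovers this task exactly
```

The economics mirror the argument in the text: the expensive pre-training is amortized across tasks, while per-task adaptation stays cheap.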
The second speedbump relates to determinism and reliability. Outside of robotics, the importance of determinism varies widely by application, and the most successful early generative AI applications are ones where determinism is not important. In robotics, determinism is critical. Setting aside safety, the return on investment (ROI) of robotics is usually dependent on throughput, and time spent on error resolution destroys throughput.
So far, research on robotics foundation models has emphasized novelty and not reliability. There is a substantial amount of effort going into methods for mitigating the non-determinism of generative AI models — broadly, not just in robotics — so I believe this problem can be addressed, but probably not in one fell swoop. This is an argument for a co-existence of deterministic and non-deterministic models.
In order to balance flexibility with reliability, our portfolio company Micropsi Industries, which automates high-variance tasks for some of the world’s largest manufacturers, uses neural networks that are deterministic rather than probabilistic.
The third speedbump for robotics foundation models is that in robotics, compute often needs to be done at the edge, making inference a challenge. Robots must be cost-effective, and today, many applications will not support the cost of adding enough GPUs to run inference for the most powerful models.
This problem is potentially the most tractable of the three I’ve mentioned. It is expected that roboticists will take large models as a starting point and use distillation techniques to create smaller, more focused models with fewer resource requirements. However, this will necessarily reduce the models’ generality and is contrary to the idea of a robot that can do anything.
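Distillation, the technique mentioned above, trains a small “student” model to mimic a large “teacher” so the student can run on cheaper edge hardware. The sketch below is a toy illustration — both models are simple scalar functions, and all names and constants are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in for a large pretrained model's outputs.
    return 3.0 * x + 1.0

# Student: y = w*x + b, trained by gradient descent on the teacher's
# outputs (the "soft targets") rather than on original labels.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    x = rng.uniform(-1, 1, size=32)
    err = (w * x + b) - teacher(x)
    # Gradients of mean squared error with respect to w and b.
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(round(w, 2), round(b, 2))  # should approach the teacher's 3.0 and 1.0
```

In practice the student has far fewer parameters than the teacher, which is exactly what reduces the edge inference cost — at the price of generality, as noted above.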
Our portfolio company RGo Robotics supplies its Perception Engine to a broad range of mobile robot OEMs, and across this base of robot makers, it expects smaller, cheaper models to continue to be popular in cost-sensitive use cases. Hardware is continuously improving in price/performance, so what is practical to run at the edge will evolve.
Techniques like quantization are also making it possible to effectively reduce the size of large models. Hybrid approaches are also possible, in which some compute is done in the cloud and some on-device.
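Quantization can be sketched concretely: store float32 weights as int8 plus a scale factor, cutting memory roughly 4x at a small, bounded accuracy cost. This is a minimal symmetric per-tensor scheme for illustration only; production toolchains are considerably more sophisticated.

```python
import numpy as np

def quantize(weights):
    # Map float weights into the int8 range [-127, 127] with one scale.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from int8 storage.
    return q.astype(np.float32) * scale

w = rng_w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize(w)
w_hat = dequantize(q, scale)

print(q.dtype)                                  # int8 — 4x smaller storage
print(float(np.abs(w - w_hat).max()) <= scale)  # True: error within one step
```

The round-trip error is bounded by half a quantization step, which is why moderately quantized models often lose little accuracy while fitting on edge hardware.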
Recommendations for the genAI, embodied AI era
While the world is increasingly digital, we still live in a physical world, and the interaction of the digital with the physical has unbounded scope for growth.
Observers often ask why AI can write an essay or a piece of music but not do something menial like load the dishwasher. The latter will likely be feasible in the near term, and more to the point, the same question is being asked about physical processes in industries worth trillions of dollars. This makes embodied AI one of the most profound opportunities in the global economy.
Robotics is making tremendous progress, and I see robots becoming critical enablers in industries where they were never present before, while established robotics markets are benefiting from new embodied AI innovations. Generative AI will be a transformative element of the path forward for robotics, but my conclusion for now is that it will be a gradual process rather than an overnight shift that fundamentally changes how robotics companies are built.
At the same time, it would be foolish to underestimate the ability of innovators to overcome the challenges I’ve outlined, but it is very difficult to predict when a breakthrough will occur. As a result, my recommendations for entrepreneurs starting robotics companies today are:
- Focus on a high-value application and determine the best way to address it, without being wedded to any one approach. Know all the nuances of the application inside out, because it is often a buried detail that kills the economic viability of a robotics solution.
- Assess where new generative AI techniques can solve previously unsolvable problems. View generative AI as a tool rather than a solution in itself.
- Expect that most of your engineering hours will be devoted to robustness and hardening, not new capabilities.
- Study the playbooks of the most successful robotics companies and see what aspects make sense to emulate. I don’t believe the recipe for a successful robotics company, whether in regard to value proposition, product development, or go-to-market strategy, has fundamentally changed.
If you are working on a new robotics startup or innovating around the application of generative AI to physical world automation, I’d love to hear from you.
About the author
James Falkoff is a partner at Converge, a venture capital firm based in Boston and Silicon Valley focused on intelligent automation and the intersection of the physical and digital worlds. He has been an investor in the technology industry for 19 years.