What Went Down

  • On March 7, Zhiyuan Robotics co-founder Peng Zhihui teased, “Something big is coming next week.”
  • The internet went wild. Over 100,000 people tuned in to see what the buzz was about.
  • On March 10, AGIbot revealed Genie Operator-1 (GO-1) — their first large-scale, universal embodied base model.

On the morning of March 10, AGIbot Robotics revealed the answer: Genie Operator-1 (GO-1), the company's first large-scale universal embodied base model. In the launch video, the robot toasts bread, makes coffee, and delivers breakfast to your hands without a hitch.

Officials claim that GO-1 not only has strong generalization capabilities, but can also adapt quickly to new scenarios and tasks with very little data, or even zero samples.

As early as the end of 2024, AGIbot launched AgiBot World, a large-scale, high-quality dataset containing more than 1 million trajectories, covering 217 tasks across five major scenarios. It is on this vast "data gold mine" that GO-1 achieved efficient training and broad generalization in a short period of time; AgiBot World is the "invisible hero" behind GO-1.

So how does the GO-1 base model actually perform, and what does it mean for the robotics industry?

According to official statements, beyond expanding the robot's motor capabilities, GO-1 more importantly strengthens its AI capabilities, greatly increasing the robot's practical value.

In the demonstration video released by AGIbot, GO-1 showed strong learning ability: by watching videos of human operations, it can quickly master new skills and apply them efficiently to real tasks. The video also shows GO-1's powerful object-tracking ability: even when the cup is moved around at random, it still completes the pouring action accurately.

Second, GO-1 demonstrated very strong generalization capabilities.

Unlike traditional models that require massive amounts of training data, GO-1 can generalize rapidly from only a few hundred samples. For example, in the demonstration, after completing the water-pouring task, GO-1 switches seamlessly to a new task of toasting bread and spreading jam without additional training. This not only demonstrates GO-1's adaptability to diverse tasks, but also reflects its core advantage of learning from minimal data.

At the same time, GO-1's cross-embodiment capability provides strong technical support for multi-robot collaboration. A video released by AGIbot shows two robots working together on a complex task: one receives guests at the front desk while the other focuses on making coffee. This collaboration reflects GO-1's efficiency and adaptability.

Traditional embodied models are usually designed for a single robot body (hardware embodiment), which leads to two major problems: low data utilization and limited deployment. GO-1, by contrast, supports multiple embodiments and can quickly migrate between different robot forms, significantly improving data utilization and reducing deployment costs.
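The source does not describe how GO-1 implements cross-embodiment transfer, but one common pattern is to share a single policy backbone and attach small per-embodiment decoder heads that map a fixed-size latent action into each robot's joint space. The sketch below illustrates that pattern only; the embodiment names, latent size, and DoF counts are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: one shared latent action of fixed size, mapped through
# per-embodiment linear heads whose output size matches that robot's DoF count.
LATENT_DIM = 32
EMBODIMENTS = {"dual_arm_g1": 14, "single_arm_gripper": 7}  # invented names/DoFs

heads = {name: rng.standard_normal((LATENT_DIM, dof))
         for name, dof in EMBODIMENTS.items()}

def decode(latent, embodiment):
    """Project the shared latent action into this embodiment's joint space."""
    return latent @ heads[embodiment]

latent = rng.standard_normal(LATENT_DIM)
for name, dof in EMBODIMENTS.items():
    assert decode(latent, name).shape == (dof,)
print("ok")
```

Because the backbone's output is embodiment-agnostic, trajectories collected on one robot still train the shared layers, which is what makes data reusable across hardware in this design.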

It is also worth noting that the GO-1 model works with AGIbot's complete data reflow system, which lets it continuously evolve by learning from problem data encountered during real-world execution. The system captures problematic data from actual runs, especially execution errors and anomalies, and continuously improves GO-1's performance through manual review and model optimization.

For example, in the demonstration, the robot made a mistake when placing a coffee cup. The system immediately fed the relevant data back and optimized the model in a targeted way, so the next attempt was more accurate.
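The reflow loop described above (capture failures, route them to human review, keep the approved ones for retraining) can be sketched as a small buffer. This is a minimal illustration of the workflow, not AGIbot's actual system; all class and field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    task: str
    success: bool
    notes: list

@dataclass
class ReflowBuffer:
    """Collects failed episodes for human review and later fine-tuning."""
    pending_review: list = field(default_factory=list)
    approved: list = field(default_factory=list)

    def capture(self, episode):
        # Only failures / anomalies are routed back for review.
        if not episode.success:
            self.pending_review.append(episode)

    def review(self, keep_fn):
        # A human annotator decides which failures become training data.
        for ep in self.pending_review:
            if keep_fn(ep):
                self.approved.append(ep)
        self.pending_review.clear()

buffer = ReflowBuffer()
buffer.capture(Episode("place_cup", success=False, notes=["cup misplaced"]))
buffer.capture(Episode("pour_water", success=True, notes=[]))
buffer.review(lambda ep: True)
print(len(buffer.approved))  # 1: only the failed episode was kept
```

The key design choice is that successful episodes are never queued, so the review workload scales with the error rate rather than with total robot hours.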

GO-1 also adds a voice interaction mode for the robot, making it much easier for users to express their needs naturally in real-world scenarios.

Behind GO-1's striking performance lies a distinctive model architecture.

GO-1 uses the Vision-Language-Latent-Action (ViLLA) architecture, which combines a multimodal large model with a Mixture-of-Experts (MoE) system and is divided into three modules that work together:

VLM (Vision-Language Model): based on InternVL-2B, it processes multi-view visual input, force signals, and language input to achieve scene perception and command understanding.

Latent Planner: by predicting latent action tokens, it transfers action knowledge from heterogeneous internet data to robot tasks, addressing the shortage of high-quality real-robot data.

Action Expert: generates high-frequency, dexterous action sequences based on a diffusion model to ensure precise execution.
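The three modules above form a pipeline: the VLM produces a scene embedding, the Latent Planner predicts discrete latent action tokens conditioned on it, and the Action Expert decodes those into a dense action chunk. The sketch below shows only the data flow; the embedding size, token vocabulary, action horizon, DoF count, and all function bodies are stand-ins, not GO-1's real implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def vlm_encode(images, instruction):
    """Stand-in for the InternVL-based VLM: fuse multi-view vision and
    language into a single scene embedding (assumed size 256)."""
    return rng.standard_normal(256)

def latent_planner(scene_embedding, n_tokens=8):
    """Predict a short sequence of discrete latent action tokens
    (assumed vocabulary of 512); here, random placeholders."""
    return rng.integers(0, 512, size=n_tokens)

def action_expert(scene_embedding, latent_tokens, horizon=16, dof=14):
    """Diffusion-style decoder stub: iteratively refine noise into a
    high-frequency action chunk of shape (horizon, dof)."""
    actions = rng.standard_normal((horizon, dof))
    for _ in range(4):
        # Toy "denoising" step standing in for the learned score network.
        actions = 0.75 * actions
    return actions

scene = vlm_encode(images=["cam_head", "cam_wrist"],
                   instruction="pour water into the cup")
tokens = latent_planner(scene)
actions = action_expert(scene, tokens)
print(actions.shape)  # (16, 14)
```

The latent tokens are the bridge that lets web-scale video (which has no robot actions) supervise the planner, while only the Action Expert needs real-robot trajectories.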

Industry insiders believe GO-1's model architecture is quite simple, without much innovation. It mainly integrates existing work, data, and training methods. Compared with previous models, the only new addition is the Latent Planner, which is just a few Transformer layers and not complicated.

Sui Wei, vice president of Digua Robotics, said that AGIbot's work directly addresses the industry's core pain point, data, and will do much to advance the embodied-intelligence industry. However, compared with the large model itself, the most valuable asset here is the dataset.


According to reports, GO-1's underlying support is a super-large-scale robot dataset called AgiBot World, which contains more than 1 million trajectories collected by 100 real robots, covering more than 100 real-world scenarios and 217 specific tasks.


The dataset is built on the AgiBot G1 hardware platform and collected by more than 100 homogeneous robots. It provides high-quality, open-source robot manipulation data and supports challenging tasks across a variety of real-life scenarios. The latest version of the AgiBot World dataset contains 1 million trajectories totaling 2,976.4 hours, covering 87 skills and 106 scenarios.
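A quick back-of-the-envelope check of the reported figures shows what these trajectories look like individually: 2,976.4 hours spread over 1 million trajectories comes out to roughly 10.7 seconds per trajectory, i.e. short manipulation clips rather than long episodes.

```python
# Sanity-check the reported AgiBot World statistics.
total_hours = 2976.4
n_trajectories = 1_000_000

avg_seconds = total_hours * 3600 / n_trajectories
print(round(avg_seconds, 1))  # 10.7 seconds per trajectory on average
```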


Meanwhile, AgiBot World goes beyond basic tabletop tasks in laboratory environments, such as grasping and placing, to focus on real-world scenarios involving dual-arm manipulation, dexterous hands, and collaborative tasks.


Compared with the industry's existing dataset, Open X-Embodiment, AGIbot's data is larger and offers better quality, standardization, and consistency. The Open X-Embodiment dataset mixes many different robot embodiments, and its data formats vary widely, which significantly interferes with model training.

However, although AGIbot's dataset has reached a certain scale, it is still only a small starting point and has not yet produced a qualitative leap in robot capabilities. Test results show that GO-1's performance is greatly improved over previous models, but its success rate on pouring water, bussing tables, and restocking beverages is still below 80%.


Sui Wei said that at this stage, the model is not the core bottleneck of the robotics industry. The real challenges lie in two areas: first, hardware has not yet converged; bionic designs such as grippers, dexterous hands, and tactile sensors are not standardized. Second, because robot bodies cannot yet be deployed at scale, the amount of data remains insufficient.


At present, data collection in the robotics industry relies mainly on teleoperation, including virtual-reality (VR) devices, isomorphic wearable rigs, and motion-capture equipment. But this data collection is expensive and lacks clear commercial value to support it, making it hard for the data-closed-loop flywheel to spin up quickly.


By comparison, data collection in the autonomous driving industry costs almost nothing: on-board perception systems continuously stream data back, forming an efficient closed loop.


At the end of the GO-1 release video, viewers found an Easter egg: AGIbot previewed its next embodied intelligent robot product, though no launch date was given. AGIbot then posted on Weibo that "there will be a surprise tomorrow," instantly filling the industry with anticipation again.


The rise of large models has driven explosive evolution in the AI industry, and people are especially curious about how they can advance robotics and embodied intelligence. GO-1, from Zhiyuan co-founder Peng Zhihui (Zhihuijun), looks like a good starting point. Clearly, embodied AI is too large for any single company to complete alone; only open-source collaboration can truly accelerate the robotics industry's evolution.

Posted by Leo Jiang