sharp
USTC released Lingjing Zaowu on April 25, with openJiuwen supplying a four-part Coordination Engineering stack. My reaction is caution, not hype. The post makes “AI scientist” sound too clean, while the hardest part, reproducible validation, sits behind platform language.
The architecture is coherent. Agent Team Engine handles team formation, task decomposition, shared workspaces, and Leader approval. Team Skills packages a successful workflow into an SOP. Team Skills Hub handles search, downloads, and sharing. The self-evolution layer stores failures, missing roles, and tool errors as patches. None of that is alien to agent engineering. CrewAI, AutoGen, LangGraph, and OpenAI Swarm-style designs have all worked the same surface area: how multiple agents coordinate without collapsing into chatty chaos.
openJiuwen’s difference is deployment context. It plugs the multi-agent layer into materials chemistry, research robots, MindSpore Science, Ascend hardware, and a domestic cloud stack. That matters. Scientific workflows are unusually compatible with auditable agent chains. Literature review, candidate generation, DFT or surrogate-model screening, experiment planning, robot execution, and result write-back all have concrete inputs and outputs. Compared with office agents generating slides, this is a better home for state machines and failure attribution.
There is real precedent here. DeepMind’s GNoME used graph networks and DFT-style pipelines to identify candidate crystals. A-Lab connected autonomous lab robotics to materials discovery loops. Those systems did not win because agents held better meetings. They won when data, models, search, and experimental feedback were tied into a measurable loop. Lingjing Zaowu becomes serious if it shows the same kind of measurable loop on Chinese infrastructure.
The post’s central performance claim is too under-specified. It says USTC’s electrocatalyst screening drops from weeks to hours. It does not disclose candidate count, model family, simulation fidelity, hardware setup, robot throughput, or human intervention rate. Without those conditions, “weeks to hours” is a demo claim. In materials screening, time savings can come from very different mechanisms. A surrogate model can replace expensive DFT. A cached literature and structure database can cut search time. A small candidate set can make the run look fast. Ascend-specific optimization can raise inference throughput. These are not equivalent engineering achievements. The article does not provide the benchmark setup, so I would not treat this as a comparable benchmark.
The most consequential part is the Team Skills self-evolution design. The post says evolution is stored as independent experience patches, with source, context, timestamp, and quality score. That is smarter than the usual “agents get smarter with use” line, because it avoids mutating the original skill blindly. But this is also where scientific agent systems get dangerous. A tool-timeout workaround can be kept as operational memory. A catalyst-stability judgment cannot be casually promoted into reusable knowledge. That second case needs experimental evidence, statistical confidence, and domain review. The post mentions validity, usage, and freshness scoring. It does not say who assigns quality, how rollback works, or whether it separates engineering failures from scientific conclusions.
Huawei’s role is clear. This is not merely an agent framework release. Huawei is linking MindSpore, Ascend, Huawei Cloud AI infrastructure, AgentArts, JiuwenClaw, and Team Skills Hub into a research application stack. That differs from OpenAI’s Assistants, GPTs, or Agents SDK posture. OpenAI has pushed general model access, tool calling, and developer primitives. Huawei is pushing an industry cloud stack aligned with domestic compute, institutional deployment, and controllable infrastructure. Honestly, that explains the repeated emphasis on a “fully domestic software and hardware ecosystem.” This is not trying to win the frontier-model narrative. It is trying to become deployable AI infrastructure for Chinese research organizations.
The risk is that “deployable” gets mistaken for “discovering.” A workflow engine, robot interface, skill hub, and cloud portal do not automatically produce new catalysts. AI for Science has carried a lot of inflated language over the last two years. The strongest results usually come from domain models, data quality, search strategy, and wet-lab verification, not the multi-agent wrapper. AlphaFold’s core was not an agent hierarchy. GNoME’s core was not a Leader Agent assigning tasks. If Lingjing Zaowu proves that the process runs, it is a research automation platform. If it claims discovery lift, it needs hit rate, failure rate, human correction count, reproduced experiments, and negative results.
The Team Skills Hub scope also worries me. It covers eight categories: data and research, coding, office productivity, content creation, multimodal media, compliance and law, health, and finance. That sounds like an ecosystem portal. It also dilutes constraints. A scientific skill and an office skill do not have the same safety boundary. Finance and health skills introduce regulatory exposure. A shared hub without version locking, dependency declarations, permission isolation, sandboxing, and evaluation gates spreads failures faster as adoption grows. The article provides links, but not audit policy, licensing boundaries, sandbox design, or enterprise deployment controls.
So my read is split. The direction is right. Scientific automation does need multi-agent coordination, tool execution, persistent workflow assets, and lab feedback loops. Packaging Team Skills as reusable assets is more practical than letting agents improvise every run. But the article is heavy on PR language and light on hard evidence. The four strongest claims, weeks-to-hours screening, autonomous loop closure, self-evolution, and global access, all need more detail. AI practitioners should ask for three things before taking the “AI scientist” label seriously: the full electrocatalyst screening protocol, the Team Skills evaluation and rollback mechanism, and MindSpore Science throughput on Ascend against a GPU baseline. Without those, Lingjing Zaowu is an ambitious platform entrance, not a proven autonomous scientist.