[WBC 01] - Whole Body Control as Software Architecture

A technical overview of Whole Body Control and the software boundaries needed to implement it on floating-base robots.

WBC01. What Is Whole-Body Control?

Whole-Body Control, or WBC, is a term that appears frequently in humanoid robotics.

The phrase is used in several different contexts. In classical robotics, WBC often refers to a model-based controller that coordinates multiple tasks under contact and actuation constraints. More recently, the same phrase also appears in learning-based humanoid control, where a neural policy coordinates locomotion, manipulation, balance, and recovery behaviors across the full body.

These usages are related, but they are not identical.

At the most general level, WBC means that the robot is not controlled as a set of isolated limbs. The arms, legs, torso, contacts, balance, and posture are controlled together as one coupled system.

For a humanoid, this matters immediately.

If the robot reaches toward an object, the hand trajectory is only one part of the problem. The robot also has to maintain balance, avoid slipping, respect joint limits, keep the torso stable, and stay within actuator limits. A controller that only tracks the hand pose while ignoring the rest of the body is not a whole-body controller. It is an arm controller attached to a robot that may fall over.

That is the basic motivation.

WBC is the control layer that tries to answer the following question:

Given a robot with many degrees of freedom,
a floating base,
contacts with the environment,
and multiple objectives,

what should the whole robot do right now?

The exact answer depends on the control paradigm. In classical model-based WBC, this question is usually answered through rigid-body dynamics, task Jacobians, contact constraints, and optimization. In learning-based WBC, this question may be answered by a neural policy trained across many simulated situations.

This post starts from the model-based definition, then discusses how it relates to modern learning-based WBC.

1. Why WBC is not just task-space control

A fixed-base manipulator is a useful starting point.

For a conventional 6-DoF or 7-DoF robot arm bolted to a table, the main objective is often end-effector pose tracking.

desired end-effector position
desired end-effector orientation

The control structure is relatively direct:

desired end-effector motion
→ Jacobian
→ joint velocity, acceleration, or torque

The base is fixed to the world. The robot motion is mostly described by the actuated joints.

In that setting, it is often acceptable to think:

q   ≈ joint positions
q̇   ≈ joint velocities
τ   ≈ joint torques

This approximation is no longer sufficient for humanoids.

A humanoid has a floating base. The torso is not attached to the world. The robot is connected to the environment only through contacts: feet on the ground, hands on objects, or other body parts depending on the task.

The floating base has six degrees of freedom:

3 translational DoF
3 rotational DoF

But no actuator acts directly on the floating base. The motors are located at the joints. As a result, the robot controls its base only indirectly, through joint torques and the contact forces they produce.

The control path looks like this:

joint torques
→ joint motion
→ contact forces
→ whole-body dynamics
→ base and CoM motion

This is the main difference from a fixed-base arm.

A humanoid does not simply move its hand. It moves its hand while maintaining a physically feasible whole-body state.

For example:

The hand should move toward the object.
The torso should remain balanced.
The center of mass should stay in a stable region.
The feet should not slip.
The joints should avoid hard limits.
The actuator torques should remain within limits.

These objectives do not have equal priority.

Balance is usually more important than hand tracking. Contact feasibility is more important than posture preference. Torque limits are not optional. Friction constraints are not tuning suggestions.

This naturally leads to task hierarchy.

Higher-priority objectives:
  maintain balance
  preserve contacts
  satisfy physical constraints

Lower-priority objectives:
  track hand pose
  regulate posture
  use redundancy efficiently

This is the classical motivation for WBC.

WBC is not just “control for robots with many joints.” It is a framework for coordinating many objectives on a floating-base, underactuated, contact-constrained robot.

2. Classical model-based WBC

A useful reference point is the Stanford line of whole-body control work by Sentis and Khatib.

In the 2005 IJHR paper, “Synthesis of Whole-Body Behaviors Through Hierarchical Control of Behavioral Primitives,” whole-body behavior is described as the composition of multiple behavioral primitives. These primitives can represent different movement criteria: center of gravity behavior, hand behavior, leg behavior, head behavior, body posture, joint-limit constraints, contacts, collision avoidance, and so on.

The important part is the hierarchy.

The paper organizes primitives into three broad categories:

Constraints
Operational tasks
Postures

Constraints are the highest priority. They represent conditions that should not be violated, such as contacts, joint limits, near-body obstacles, and self-collision constraints.

Operational tasks are the main tasks the robot wants to accomplish, such as manipulation, locomotion, vision, hand tracking, foot tracking, or center-of-mass control.

Postures use the residual redundancy. They can regulate body shape, symmetry, comfort, effort, or other secondary objectives.

This hierarchy is a central idea.

The robot should not allow an operational task to violate a constraint. The hand task should not break a contact constraint. A posture objective should not disturb balance. Lower-priority behaviors are projected into the null space of higher-priority behaviors.

In simple terms:

constraints first
tasks second
posture last
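The null-space projection described above can be sketched at the velocity level. This is a simplified illustration, not the formulation from the paper: Sentis and Khatib work with dynamically consistent (mass-weighted) inverses, whereas this sketch assumes a plain Moore-Penrose pseudoinverse from NumPy. The function name, the Jacobians, and the targets are made up for the example.

```python
import numpy as np


def prioritized_velocities(J1, dx1, J2, dx2):
    """Two-level strict hierarchy at the velocity level.

    Task 1 is solved first. Task 2 may only use motion that lies in the
    null space of task 1, so it cannot disturb the higher-priority task.
    """
    n = J1.shape[1]
    J1_pinv = np.linalg.pinv(J1)
    dq1 = J1_pinv @ dx1                   # minimum-norm solution for task 1
    N1 = np.eye(n) - J1_pinv @ J1         # projector onto the null space of J1
    # Task 2 only corrects what task 1 left over, restricted to null(J1).
    dq2 = np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ dq1)
    return dq1 + N1 @ dq2


# Illustrative 3-DoF example (values are assumptions, not from a real robot).
J1 = np.array([[1.0, 0.0, 0.0]])   # high priority: constrains coordinate 0
J2 = np.array([[1.0, 1.0, 0.0]])   # low priority: overlaps with task 1
dx1 = np.array([0.5])
dx2 = np.array([0.2])

dq = prioritized_velocities(J1, dx1, J2, dx2)
```

In this example the second task has to reach its target using coordinate 1 only, because coordinate 0 is already claimed by the first task.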

The 2006 ICRA paper, “A Whole-Body Control Framework for Humanoids Operating in Human Environments,” extends this view into a more explicit humanoid control framework. It discusses control primitives, task prioritization, operational-space control, free-floating dynamics, and support contacts.

This is where WBC becomes more clearly connected to humanoids.

A humanoid is modeled as a free-floating system with six virtual unactuated degrees of freedom attached to the base. Supporting contacts, such as feet on the ground, are included in the control hierarchy. The controller must account for both free-floating dynamics and reaction forces at the contact points.

This is the model-based WBC view:

The robot has a dynamics model.
The base is unactuated.
Contacts create support forces.
Tasks are represented through Jacobians.
Constraints have priority.
Posture uses remaining redundancy.
The controller computes feasible whole-body motion and force.

This is different from a simple end-effector controller.

The controller is not only asking:

Where should the hand go?

It is asking:

Can the hand move there
while contacts remain feasible,
balance is maintained,
torques stay within limits,
and higher-priority constraints are not violated?

3. A compact equation, a large implementation

A common starting point for floating-base control is the rigid-body dynamics equation:

M(q) q̈ + h(q, q̇) = Sᵀτ + Jᵀλ

The notation varies by paper and codebase, but the meaning is roughly:

M(q)       mass matrix
q̈         generalized acceleration
h(q, q̇)   nonlinear effects: Coriolis, centrifugal, gravity
Sᵀτ        actuator torques mapped into generalized forces (S selects the actuated coordinates)
Jᵀλ        contact forces mapped into generalized forces through the contact Jacobian J
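One way to see why the base can only be controlled indirectly is to split the equation into base rows and joint rows. The block below assumes the six base coordinates are ordered first, so that S = [0  I]. The subscripts b (base) and j (joints) are notation introduced here for illustration.

```latex
\begin{bmatrix} M_{bb} & M_{bj} \\ M_{jb} & M_{jj} \end{bmatrix}
\begin{bmatrix} \ddot{q}_b \\ \ddot{q}_j \end{bmatrix}
+
\begin{bmatrix} h_b \\ h_j \end{bmatrix}
=
\begin{bmatrix} 0 \\ \tau \end{bmatrix}
+
\begin{bmatrix} J_b^{\top} \\ J_j^{\top} \end{bmatrix} \lambda

% Base rows only: no actuator torque appears.
M_{bb}\,\ddot{q}_b + M_{bj}\,\ddot{q}_j + h_b = J_b^{\top} \lambda
```

Under this ordering, the base rows contain no τ term. The base acceleration is determined by the contact forces and the coupling with the joint accelerations.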

The equation is compact. The implementation is not.

Each symbol implies a software decision.

How is q represented?
Does q include the floating base?
Is the base orientation stored as a quaternion?
Is q̇ the same dimension as q?
Which coordinates are actuated?
How is the selection matrix S defined?
How are contact Jacobians stacked?
What is the ordering of λ?
Who owns the robot model?
Who updates the dynamics?
Who decides which contacts are active?

This is why WBC quickly becomes a software architecture problem.

The equation itself is not enough. A deployable controller needs a structure for robot state, dynamics computation, task construction, contact management, optimization, state machines, command conversion, and hardware communication.

4. Learning-based WBC

Recently, the phrase WBC has also become common in learning-based humanoid control.

In this context, WBC often means a learned policy that coordinates the entire body rather than controlling locomotion and manipulation as separate modules.

For example, learning-based whole-body control methods may train a policy to handle:

locomotion
manipulation
disturbance recovery
whole-body motion imitation
multi-mode command following
loco-manipulation

This direction is natural.

Some behaviors are difficult to specify cleanly with a hand-written model-based controller. Footstep recovery, dynamic balance recovery, rich contact transitions, and multi-contact behaviors can be hard to encode as explicit tasks and constraints. Reinforcement learning can expose the robot to many situations in simulation and optimize a policy over a broad distribution of states.

Learning-based WBC also changes the software shape.

At runtime, the control interface may look simple:

observation → neural network → action

The action might be joint position targets, velocity targets, torque commands, residual commands, contact schedules, or task-space targets. The architecture can appear much simpler than a classical WBC stack with task objects, contact specifications, QP matrices, and command adapters.

But the complexity has not disappeared. It has moved.

In learning-based WBC, much of the complexity moves into:

reward design
curriculum design
policy architecture
simulation fidelity
domain randomization
dataset quality
action space design
debugging policy failures
sim-to-real transfer

The controller may be simple to run, but it can be difficult to inspect.

If a model-based controller fails, we can often ask specific questions:

Which task became infeasible?
Which constraint is active?
Which torque limit is binding?
Which Jacobian is singular?
Which contact force violated the friction cone?
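Questions like these can be answered by inspecting the solver output directly. As a minimal sketch, the hypothetical helper below reports which joints are sitting at their torque limit. The function name, tolerance, and numbers are assumptions for illustration.

```python
from typing import List, Sequence


def binding_torque_limits(
    tau: Sequence[float], tau_max: Sequence[float], tol: float = 1e-6
) -> List[int]:
    """Return the indices of joints whose torque magnitude is at its limit."""
    return [
        i
        for i, (t, limit) in enumerate(zip(tau, tau_max))
        if abs(t) >= limit - tol
    ]


# Illustrative solver output: joint 1 is saturated, joint 2 is close but not binding.
tau_sol = [10.0, -40.0, 39.9]
tau_limits = [40.0, 40.0, 40.0]
saturated = binding_torque_limits(tau_sol, tau_limits)
```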

With a learned policy, the failure mode may be harder to localize. The policy may output an action that is internally consistent from the network’s perspective but physically difficult for the low-level controller or hardware to track.

This is one of the main practical issues.

A learned policy can generate a target that is outside the tracking capability of the lower-level system. It may request a motion that violates actuator bandwidth, produces excessive torque demand, changes contact too aggressively, or assumes a contact condition that is not actually present.

In that sense, learning-based WBC is not automatically simpler. It is simpler at the runtime interface, but often harder at the training, debugging, and validation layers.

5. Model-based and learning-based WBC have different purposes

The point is not that model-based WBC is still important and therefore learning-based control is wrong.

That framing is not useful.

The better view is:

model-based WBC and learning-based WBC solve different parts of the problem

Learning-based control is strong when the behavior is difficult to explicitly model or hand-design.

Examples include:

disturbance recovery
dynamic footstep adjustment
multi-contact adaptation
whole-body motion imitation
locomotion-manipulation coordination
robust behavior over a wide state distribution

Instead of writing every contact transition and recovery strategy by hand, the policy can learn behavior from simulation, demonstrations, rewards, or motion data.

Model-based WBC is strong when we want interpretability and direct control over physical behavior.

Examples include:

task-space position tracking
force control
impedance control
torque limit handling
contact force constraints
task prioritization
posture regulation
debuggable constraint violations

In operational-space-control-style WBC, position and force control have a clear physical interpretation. Gains correspond to stiffness, damping, or desired impedance. A task has a Jacobian. A force has a frame. A constraint has a direction. A torque limit has a meaning.

This makes model-based WBC useful when we care about predictable low-level behavior.

For example, if we want to change the apparent stiffness of a hand task, a model-based controller gives us a direct knob:

increase Kp → higher apparent stiffness
increase Kd → more damping
change force target → different contact behavior

That does not mean tuning is easy. It means the tuning parameter has a physical interpretation.
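The physical meaning of the gains can be made concrete with a one-dimensional spring-damper model of a hand task. This is a deliberately reduced sketch: it assumes a single task axis and ignores the robot dynamics entirely, and the function names are made up for the example.

```python
def impedance_force(kp: float, kd: float,
                    x_des: float, x: float,
                    v_des: float, v: float) -> float:
    """Task-space spring-damper law along one axis."""
    return kp * (x_des - x) + kd * (v_des - v)


def static_deflection(kp: float, f_ext: float) -> float:
    """Steady-state position error under a constant external force.

    At rest the spring term balances the disturbance: kp * error = f_ext.
    """
    return f_ext / kp


soft = static_deflection(kp=500.0, f_ext=10.0)    # softer hand
stiff = static_deflection(kp=1000.0, f_ext=10.0)  # stiffer hand
```

Doubling Kp halves the deflection caused by the same push, which is the sense in which the gain maps directly to apparent stiffness.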

A neural policy can also learn compliant behavior, but the relationship between a network weight and the resulting impedance is not usually something we can inspect directly.

So the difference is not:

model-based = old
learning-based = new

The difference is closer to:

learning-based control:
  learns behavior over many situations

model-based WBC:
  exposes physical structure and enforces interpretable objectives

Both are useful.

For real humanoid deployment, a hybrid architecture is likely.

A learned policy may generate high-level intent, task targets, contact schedules, recovery behaviors, or residual commands. A model-based layer may still handle tracking, impedance, torque limits, contact constraints, safety filters, or hardware command conversion.

The boundary will depend on the robot and the task.

6. Why model-based WBC software is difficult

Learning-based WBC can have a simple runtime interface:

input observation
→ neural network
→ output action

Model-based WBC does not look like that.

The controller must explicitly represent the robot, tasks, constraints, contacts, priorities, dynamics, solver variables, and actuator commands.

This becomes clear when looking at an inverse-dynamics WBC formulation.

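A typical IHWBC-style problem may optimize over the generalized acceleration q̈ and the reaction force f_r. Here f_r plays the role of the contact force λ in the dynamics equation above, while λ_q and λ_f are scalar regularization weights, not forces: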

minimize over q̈, f_r

  Σᵢ wᵢ || Jᵢ q̈ + J̇ᵢ q̇ - ẍᵢᵈ ||²
  + w_f || f_rᵈ - f_r ||²
  + λ_q || q̈ ||²
  + λ_f || f_r ||²

subject to constraints such as:

floating-base dynamics
friction cone constraints
reaction force limits
joint acceleration limits
torque limits

The important point is not the exact equation.

The important point is what the equation requires from the software.

The WBC input is not simply:

x_des
f_r_des

The input is closer to:

robot state
active tasks
active contacts
task priorities
task gains
desired accelerations
force targets
contact constraints
actuator limits

A task is not just a target pose.

A task contains information such as:

controlled frame
task dimension
task Jacobian
J̇q̇ term
desired position or force
desired velocity
desired acceleration
feedback gains
weight or hierarchy level
activation condition
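As a rough sketch, the fields listed above could be grouped into a small data structure. The class and field names below are hypothetical. The sketch assumes the task stores only the frame name and that the Jacobian and the J̇q̇ term are queried from the robot model when the solver needs them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Task:
    """One motion objective (hypothetical structure for illustration)."""

    frame: str               # controlled frame; Jacobian comes from the robot model
    dim: int                 # task dimension
    kp: List[float]          # proportional gain per task axis
    kd: List[float]          # derivative gain per task axis
    weight: float = 1.0      # soft weight in the cost
    priority: int = 0        # hierarchy level, 0 = highest
    active: bool = True      # activation state

    def desired_acceleration(
        self,
        x: Sequence[float], xd: Sequence[float],
        x_des: Sequence[float], xd_des: Sequence[float],
        xdd_des: Sequence[float],
    ) -> List[float]:
        """Feed-forward acceleration plus PD feedback on each axis."""
        return [
            xdd_des[i]
            + self.kp[i] * (x_des[i] - x[i])
            + self.kd[i] * (xd_des[i] - xd[i])
            for i in range(self.dim)
        ]


hand = Task(frame="left_hand", dim=3, kp=[100.0] * 3, kd=[20.0] * 3)
acc = hand.desired_acceleration(
    x=[0.0, 0.0, 0.0], xd=[0.0, 0.0, 0.0],
    x_des=[0.1, 0.0, 0.0], xd_des=[0.0, 0.0, 0.0],
    xdd_des=[0.0, 0.0, 0.0],
)
```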

A contact specification contains information such as:

contact frame
contact Jacobian
contact force dimension
friction coefficient
normal direction
force limits
contact activation state
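A contact specification can be sketched in the same way. The example below assumes a point contact whose force is expressed in the contact frame with z along the normal, and it uses a linearized friction pyramid instead of the exact cone. The class name, fields, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContactSpec:
    """One point contact (hypothetical structure for illustration)."""

    frame: str            # contact frame name
    mu: float             # friction coefficient
    fz_max: float         # upper limit on the normal force
    active: bool = True   # contact activation state

    def is_force_feasible(self, fx: float, fy: float, fz: float) -> bool:
        """Check a force against the friction pyramid and the force limit."""
        if not self.active:
            # An inactive contact must not carry any force.
            return fx == fy == fz == 0.0
        return (
            0.0 <= fz <= self.fz_max       # push only, within the limit
            and abs(fx) <= self.mu * fz    # no slip along x
            and abs(fy) <= self.mu * fz    # no slip along y
        )


foot = ContactSpec(frame="left_foot", mu=0.5, fz_max=800.0)
```

For instance, `foot.is_force_feasible(100.0, 0.0, 400.0)` is accepted because the tangential force is below μ·fz = 200, while a tangential force of 250 at the same normal force would be rejected.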

A solver solution is also not automatically a hardware command.

The solver may output:

q̈_sol
f_r_sol
τ_sol

But the actuator interface may require:

q_cmd
qdot_cmd
kp_cmd
kd_cmd
tau_ff_cmd

Therefore, the controller needs a command conversion layer:

q̈_sol
→ integrate once → qdot_cmd
→ integrate again → q_cmd
→ clamp to limits
→ rate limit
→ hardware command
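The conversion chain can be sketched as a small function. This is one possible interpretation, under stated assumptions: integration starts from the previous command rather than the measured state, the clamp applies to joint velocity and position limits, and the rate limit bounds the change in position command per control cycle. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def accel_to_joint_command(
    q_prev: Sequence[float], qdot_prev: Sequence[float],
    qddot_sol: Sequence[float], dt: float,
    q_min: Sequence[float], q_max: Sequence[float],
    qdot_max: Sequence[float], max_step: float,
) -> Tuple[List[float], List[float]]:
    """Integrate solver accelerations into limited position/velocity commands."""
    q_cmd: List[float] = []
    qdot_cmd: List[float] = []
    for i in range(len(qddot_sol)):
        # Integrate acceleration, then clamp to the velocity limit.
        v = clamp(qdot_prev[i] + qddot_sol[i] * dt, -qdot_max[i], qdot_max[i])
        # Integrate velocity, then clamp to the joint position limits.
        p = clamp(q_prev[i] + v * dt, q_min[i], q_max[i])
        # Rate limit: bound the change relative to the previous command.
        p = q_prev[i] + clamp(p - q_prev[i], -max_step, max_step)
        q_cmd.append(p)
        qdot_cmd.append(v)
    return q_cmd, qdot_cmd


q_cmd, qdot_cmd = accel_to_joint_command(
    q_prev=[0.0, 0.99], qdot_prev=[0.0, 0.0],
    qddot_sol=[100.0, 10.0], dt=0.01,
    q_min=[-1.0, -1.0], q_max=[1.0, 1.0],
    qdot_max=[0.5, 0.5], max_step=0.002,
)
```

In the example, the first joint asks for an aggressive acceleration and ends up limited by both the velocity clamp and the rate limit, while the second joint passes through unchanged.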

This is where WBC software becomes complicated.

The difficulty is not only solving the QP. The difficulty is keeping the software boundaries clean.

7. WBC as software architecture

For this project, I want to treat WBC as a software architecture problem.

The goal is not to implement one isolated model-based controller. The goal is to design a ros2_control-based architecture that can support several control modes under a common robot interface.

The architecture should be able to support:

model-based WBC
learning-based planning
learning-based WBC
model-based trajectory optimization
MPC
low-level hardware command conversion

The central idea is to separate responsibilities.

A practical stack should have at least the following layers:

RobotSystem
  Owns robot state, kinematics, dynamics, frame transforms, and model queries.

Task
  Represents motion, force, constraint, or posture objectives.

ContactSpec
  Represents active contacts, contact Jacobians, friction cones, and force limits.

Planner / Policy
  Generates task targets, contact schedules, trajectories, or policy actions.

WBCSolver
  Converts active tasks and contacts into a dynamics-level solution.

CommandAdapter
  Converts solver or policy outputs into hardware-compatible commands.

FSM
  Selects the active control mode, task set, contact set, and safety behavior.

HardwareInterface
  Sends commands to drivers and reads actuator states.

The important part is that these layers should not collapse into each other.

The solver should not know about CAN/EtherCAT/RS485 packet encoding. The hardware interface should not know task hierarchy. A task should not directly write motor commands. A learned policy should not bypass safety and command limits unless that is an explicit design choice. The FSM should coordinate modes, not secretly patch every controller failure.

This separation is what makes hybrid control possible.

For example, a learned policy and a model-based WBC solver can both produce commands through the same command adapter, or they can both generate targets for a shared low-level controller. A model-based trajectory optimizer can generate references that become tasks. MPC can provide short-horizon targets. A learned policy can provide residuals or contact schedules.
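The first combination, two command sources sharing one adapter, can be sketched with a structural interface. Everything below is hypothetical: the class names, the `compute` method, and the stub outputs are placeholders for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class JointCommand:
    q: List[float]        # position targets
    tau_ff: List[float]   # feed-forward torques


class CommandSource(Protocol):
    """Anything that turns an observation into a joint command."""

    def compute(self, observation: Sequence[float]) -> JointCommand: ...


class StubWbcSolver:
    """Stand-in for a model-based solver (returns fixed values)."""

    def compute(self, observation: Sequence[float]) -> JointCommand:
        return JointCommand(q=[0.1, 0.2], tau_ff=[5.0, -3.0])


class StubPolicy:
    """Stand-in for a learned policy that requests too much torque."""

    def compute(self, observation: Sequence[float]) -> JointCommand:
        return JointCommand(q=[0.1, 0.2], tau_ff=[100.0, -100.0])


class CommandAdapter:
    """Single place where command limits are enforced."""

    def __init__(self, tau_max: float) -> None:
        self.tau_max = tau_max

    def to_hardware(self, cmd: JointCommand) -> JointCommand:
        limited = [max(-self.tau_max, min(self.tau_max, t)) for t in cmd.tau_ff]
        return JointCommand(q=list(cmd.q), tau_ff=limited)


def control_step(source: CommandSource, adapter: CommandAdapter,
                 observation: Sequence[float]) -> JointCommand:
    return adapter.to_hardware(source.compute(observation))


adapter = CommandAdapter(tau_max=40.0)
from_solver = control_step(StubWbcSolver(), adapter, observation=[0.0])
from_policy = control_step(StubPolicy(), adapter, observation=[0.0])
```

Because both sources go through the same adapter, the torque limit is applied regardless of which one is active, and neither source needs to know how commands reach the hardware.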

The architecture should make these combinations possible without rewriting the hardware layer every time.

That is the main project goal.

References

O. Khatib, “A Unified Approach for Motion and Force Control of Robot Manipulators: The Operational Space Formulation,” IEEE Journal on Robotics and Automation, 1987.
L. Sentis and O. Khatib, “Synthesis of Whole-Body Behaviors Through Hierarchical Control of Behavioral Primitives,” International Journal of Humanoid Robotics, 2005.
L. Sentis and O. Khatib, “A Whole-Body Control Framework for Humanoids Operating in Human Environments,” IEEE International Conference on Robotics and Automation, 2006.
Z. Fu, X. Cheng, and D. Pathak, “Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion,” CoRL, 2022.
T. He et al., “HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots,” 2024.
Agility Robotics, “Training a Whole-Body Control Foundation Model,” 2025.
NVIDIA, “GR00T-WholeBodyControl.”