[CAN 07] - CAN I/O Logic in ROS 2 Hardware Interfaces
Read/write timing, CAN I/O patterns, TX pacing, buffering, and stale feedback handling for CAN-based ROS 2 hardware interfaces
Previous Posts:
- What is CAN?
- Setting up SocketCAN on Linux
- SocketCAN Communication with ESP32
- Gripper Motor Control with CAN Bus
- PCAN Device Driver Installation on Linux
- From CAN Frames to ROS 2 Control
Overview
In the previous post, we discussed the basic role of a ROS 2 Control hardware interface.
read()
hardware feedback → ROS 2 state
write()
ROS 2 command → hardware command
For CAN-based hardware, this becomes:
read()
CAN frame → decoded joint state
write()
joint command → encoded CAN frame
This description is correct, but incomplete.
In practice, encoding and decoding CAN frames is only part of the difficulty. The harder part is deciding:
- when to read
- when to write
- how often to write
- how to handle incoming bursts
- how to avoid blocking the control loop
- how to detect stale feedback
- how to avoid overwhelming the actuator driver
A CAN hardware interface is therefore not just a protocol translator. It is also an I/O scheduling layer between an event-driven CAN bus and a periodic control loop.
flowchart LR
CAN[CAN bus<br/>event-driven frames]
HW[Hardware Interface<br/>I/O scheduling]
CTRL[ROS 2 Control<br/>periodic loop]
CAN <--> HW
HW <--> CTRL
Control Loops Are Periodic, CAN Frames Are Not
A controller usually runs at a fixed rate.
sequenceDiagram
participant C as Controller Loop
loop Fixed control period
C->>C: read()
C->>C: update()
C->>C: write()
end
CAN traffic does not necessarily follow the same timing. Frames arrive when devices send them. Some devices broadcast periodically. Some only respond to queries. Some reply after receiving commands.
sequenceDiagram
participant D1 as Sensor
participant D2 as Motor Driver
participant H as Host
D1->>H: periodic state frame
H->>D2: command frame
D2->>H: feedback frame
H->>D2: diagnostic query
D2->>H: diagnostic response
The hardware interface must connect these two timing models.
CAN side:
asynchronous frame stream
ROS 2 Control side:
periodic read → update → write loop
This is the main reason CAN I/O logic needs explicit design.
Common CAN I/O Patterns
Before implementing read() and write(), the device I/O pattern should be identified.
Most CAN devices used in robot hardware follow one of three patterns.
| Pattern | Description | Typical Devices |
|---|---|---|
| Periodic broadcast | The device sends state at a fixed rate without being asked. | sensors, IMUs, encoder boards |
| Query-response | The host sends a query, and the device replies with state or configuration data. | diagnostics, parameter reads, low-rate devices |
| Command-response | The host sends a command, and the device replies with feedback. | motor drivers, smart actuators |
Pattern 1: Periodic Broadcast
In this pattern, the device sends state using its own internal timer.
sequenceDiagram
participant Device
participant Host
loop Device update rate
Device->>Host: State frame
end
This pattern is common for sensors.
Examples:
- IMU
- force sensor
- tactile sensor board
- encoder board
- temperature monitor
The host does not request every sample. The hardware interface should collect incoming frames and update the latest-state cache.
flowchart LR
CAN[CAN frames] --> RX[Receive / decode]
RX --> Cache[Latest-state cache]
Cache --> Read[read()]
Read --> State[ROS 2 state interfaces]
The main risk is that frames arrive faster than the application drains them. If the receive buffer accumulates old frames, the controller may use stale data even though the device is still transmitting correctly.
Pattern 2: Query-Response
In this pattern, the host explicitly requests data.
sequenceDiagram
participant Host
participant Device
Host->>Device: Query frame
Device->>Host: Response frame
This pattern is useful for diagnostics, configuration reads, and low-rate data.
Examples:
- firmware version
- device parameter read
- diagnostic status
- error code request
- calibration data
It is usually not ideal as the main mechanism for high-rate joint feedback.
A blocking query inside read() can delay the control loop.
flowchart TD
Read[read()] --> Query[Send query]
Query --> Wait[Wait for response]
Wait --> Decode[Decode response]
Decode --> State[Update state]
Wait -.-> Problem[Control loop blocked]
A safer structure is to keep query-response logic outside the main periodic control path.
flowchart LR
Loop[Control loop] --> Read[Read latest state]
Loop --> Write[Write command]
QueryTask[Query task] --> Send[Send query]
Send --> Resp[Receive response]
Resp --> Diag[Update diagnostic state]
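If a dedicated query thread is not wanted, a variation on the structure above is to keep only the query bookkeeping inside the loop and never wait for the reply. The sketch below is one possible shape for that bookkeeping, not the implementation used in this post: the `QueryScheduler` class, its method names, and the interval and timeout values are all hypothetical. The loop calls `poll()` once per cycle, and the RX path calls `on_response()` when the matching reply has been decoded.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: decides when to send a low-rate query without blocking.
class QueryScheduler
{
public:
  using Clock = std::chrono::steady_clock;
  enum class Action { None, SendQuery, TimedOut };

  QueryScheduler(Clock::duration interval, Clock::duration timeout)
  : interval_(interval), timeout_(timeout)
  {
    assert(interval > Clock::duration::zero());
    assert(timeout > Clock::duration::zero());
  }

  // Called once per control cycle. Never waits for the device.
  Action poll(Clock::time_point now)
  {
    if (pending_) {
      if (now - sent_at_ < timeout_) {
        return Action::None;  // still waiting for the reply
      }
      pending_ = false;  // give up on this query
      ++timeouts_;
      return Action::TimedOut;
    }
    if (!sent_once_ || now - sent_at_ >= interval_) {
      pending_ = true;
      sent_once_ = true;
      sent_at_ = now;
      return Action::SendQuery;  // caller transmits the query frame
    }
    return Action::None;
  }

  // Called from the RX path when the response frame has been decoded.
  void on_response() { pending_ = false; }

  unsigned timeouts() const { return timeouts_; }

private:
  Clock::duration interval_;
  Clock::duration timeout_;
  Clock::time_point sent_at_{};
  bool pending_ = false;
  bool sent_once_ = false;
  unsigned timeouts_ = 0;
};
```

The control loop only acts on the returned `Action`, so a slow or missing diagnostic reply costs a counter increment rather than a delayed cycle.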
Pattern 3: Command-Response
In this pattern, the host sends a command and the device returns feedback.
sequenceDiagram
participant Host
participant Device
loop Control rate
Host->>Device: Command frame
Device->>Host: Feedback frame
end
This pattern is common for motor drivers and smart actuator modules.
The command frame may contain:
- desired position
- desired velocity
- control gains
- feedforward torque
- mode command
The response frame may contain:
- measured position
- measured velocity
- measured current
- temperature
- status flags
This pattern maps naturally to a control loop.
flowchart TD
Read[read()] --> State[Update joint state]
State --> Ctrl[Controller update]
Ctrl --> Cmd[Command interface]
Cmd --> Write[write()]
Write --> CAN[Send CAN command]
CAN --> FB[Receive feedback frame]
FB --> Read
However, the feedback frame should not automatically be interpreted as the state after the command has been applied.
Depending on the driver firmware, the response may contain:
- the latest measured state before the command was applied
- the latest measured state after the command was accepted
- a cached state from the actuator control loop
- a status packet generated independently of the command timing
The hardware interface should treat the response as timestamped feedback, not as a perfectly synchronized result of the command.
I/O Frequency Is a Hardware Constraint
The update frequency of a CAN hardware interface should not be derived from the ROS 2 control rate alone.
It must also consider:
- CAN bitrate
- number of devices on the bus
- number of frames per device
- feedback frame rate
- command frame rate
- frame size
- driver-side receive buffer
- firmware parsing rate
- actuator control-loop rate
A common mistake is to assume that if the ROS 2 controller runs at 1 kHz, every actuator command should also be sent at 1 kHz.
That is not always possible or useful.
For example, with 8 actuators:
8 command frames × 1000 Hz = 8000 command frames/s
8 feedback frames × 1000 Hz = 8000 feedback frames/s
This gives 16000 frames/s before considering diagnostics, retransmissions, or protocol overhead.
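To put that number in context, it helps to estimate how many frames the bus can carry at all. The sketch below assumes classical CAN (not CAN FD), 11-bit identifiers, 8-byte payloads, and a 1 Mbit/s bitrate. The function names are illustrative. A frame of this kind occupies 111 bit times without stuff bits and up to 135 with worst-case bit stuffing.

```cpp
#include <cassert>

// Bit times for one classical CAN data frame with an 11-bit identifier:
// 44 framing bits + 8 bits per data byte + 3 bits of interframe space.
inline int frame_bits_no_stuffing(int data_bytes)
{
  assert(data_bytes >= 0 && data_bytes <= 8);
  return 44 + 8 * data_bytes + 3;
}

// Worst case: one stuff bit per 4 bits across the 34 + 8n stuffable bits.
inline int frame_bits_worst_case(int data_bytes)
{
  return frame_bits_no_stuffing(data_bytes) + (34 + 8 * data_bytes - 1) / 4;
}

// Fraction of the bus consumed by a given frame rate (1.0 = fully saturated).
inline double bus_load(double frames_per_second, int bits_per_frame, double bitrate_bps)
{
  assert(bitrate_bps > 0.0);
  return frames_per_second * bits_per_frame / bitrate_bps;
}
```

Under these assumptions, `bus_load(16000.0, frame_bits_no_stuffing(8), 1e6)` is about 1.78, so 16000 frames/s is well above what the bus can carry even before stuffing (roughly 9000 frames/s). Shorter payloads, CAN FD, or a lower command rate change the result, which is why the calculation is worth repeating for the actual protocol.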
Even if the host CAN adapter accepts the outgoing frames, the actuator driver may not process them reliably if they arrive as a burst.
flowchart LR
Host[Host PC] -->|write succeeds| Kernel[Kernel TX queue]
Kernel --> Adapter[CAN adapter]
Adapter --> Bus[CAN bus]
Bus --> Driver[Motor driver RX]
Driver --> Firmware[Firmware parser]
Firmware -. may not keep up .-> Drop[Command ignored<br/>or overwritten]
A successful CAN_Write() call (PCAN-Basic) or write() on a SocketCAN socket means only that the frame was accepted by the host-side transmit path. It does not necessarily mean the actuator firmware consumed and applied the command.
This distinction matters when several frames are sent back-to-back.
for (const auto& frame : command_frames) {
can_socket.writeFrame(frame);
}
This can create a burst.
sequenceDiagram
participant Host
participant Driver
Host->>Driver: command motor 1
Host->>Driver: command motor 2
Host->>Driver: command motor 3
Host->>Driver: command motor 4
Host->>Driver: command motor 5
Host->>Driver: command motor 6
Host->>Driver: command motor 7
Host->>Driver: command motor 8
Note over Driver: Frames arrive close together.<br/>Firmware may not process all of them.
For this reason, the transmit path often needs pacing.
for (const auto& frame : command_frames) {
  can_socket.writeFrame(frame);
  // Give the receiving firmware time to consume this frame before the next one.
  std::this_thread::sleep_for(tx_frame_delay);
}
The delay does not need to be large. The correct value depends on the CAN adapter, bus bitrate, actuator driver, and firmware implementation.
The goal is not to slow the robot unnecessarily. The goal is to avoid sending bursts that the receiving device cannot process.
sequenceDiagram
participant Host
participant Driver
Host->>Driver: command motor 1
Note over Host: small delay
Host->>Driver: command motor 2
Note over Host: small delay
Host->>Driver: command motor 3
Note over Host: small delay
Host->>Driver: command motor 4
Note over Host: small delay
Host->>Driver: command motor 5
Note over Host: small delay
Host->>Driver: command motor 6
Note over Host: small delay
Host->>Driver: command motor 7
Note over Host: small delay
Host->>Driver: command motor 8
The correct command rate and inter-frame delay must be tuned on the actual hardware.
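Because the delay is spent inside write(), it also consumes part of the control period. A quick budget check, sketched below, can catch values that cannot work before they reach hardware. It assumes the delay is applied only between frames (a loop that also sleeps after the last frame spends one more delay), and the function names and the `max_fraction` parameter are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using std::chrono::microseconds;

// Time write() spends sleeping when a delay separates consecutive frames.
inline microseconds pacing_time(std::size_t frame_count, microseconds inter_frame_delay)
{
  assert(inter_frame_delay.count() >= 0);
  if (frame_count < 2) {
    return microseconds{0};  // nothing to separate
  }
  return inter_frame_delay * static_cast<microseconds::rep>(frame_count - 1);
}

// True if pacing uses at most `max_fraction` of the control period.
inline bool pacing_fits(microseconds pacing, microseconds control_period, double max_fraction)
{
  assert(max_fraction > 0.0 && max_fraction <= 1.0);
  return static_cast<double>(pacing.count()) <=
         static_cast<double>(control_period.count()) * max_fraction;
}
```

For example, 8 frames with a 200 µs delay need 1400 µs of pacing, which cannot fit in a 1 kHz loop (1000 µs period). With a 50 µs delay the pacing time drops to 350 µs. Since sleep_for() may sleep longer than requested, the measured time should be treated as at least this much.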
Two devices can use the same CAN bitrate and still behave differently.
Same CAN bitrate
Same number of command frames
Different actuator firmware
Different RX buffer size
Different parser timing
Different result
The outcome depends strongly on the hardware. High-end motor drivers may tolerate dense traffic and bursty command streams because they have deeper buffering. Cheaper actuator modules may have smaller receive buffers, slower firmware loops, or weaker handling of back-to-back frames.
CAN I/O frequency should therefore be treated as a hardware parameter, not only a software parameter.
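One simple way to expose the command rate as a tunable parameter is to send commands only on every Nth control cycle. The sketch below is an illustration under that assumption; the `TxRateDivider` name and the divisor value are hypothetical and not part of ros2_control.

```cpp
#include <cassert>

// Hypothetical helper: lets write() transmit on every Nth control cycle only.
class TxRateDivider
{
public:
  explicit TxRateDivider(unsigned divisor) : divisor_(divisor)
  {
    assert(divisor >= 1);
  }

  // Call once per control cycle; true means "send command frames this cycle".
  bool should_send()
  {
    const bool send = (cycle_ == 0);
    cycle_ = (cycle_ + 1) % divisor_;
    return send;
  }

private:
  unsigned divisor_;
  unsigned cycle_ = 0;
};
```

With a 1 kHz control loop and a divisor of 4, commands go out at 250 Hz while read() and the controller update still run every cycle. Whether the actuator behaves well at the reduced rate still has to be verified on the device.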
Blocking vs Non-Blocking Read
A direct blocking read is simple.
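can_frame frame;
// With a default (blocking) socket, read() waits until a frame arrives.
const ssize_t n = read(socket_fd, &frame, sizeof(frame));
if (n == static_cast<ssize_t>(sizeof(frame))) {
  decode(frame);  // decode only complete frames
}
However, if no frame is available, the call blocks until one arrives.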
flowchart TD
Read[read()] --> HasFrame{Frame available?}
HasFrame -->|yes| Decode[Decode frame]
HasFrame -->|no| Block[Block]
Block --> Delay[Control loop delayed]
This is usually not acceptable inside a periodic control loop.
A non-blocking read avoids this.
can_frame frame;
while (can_socket.readFrameNonBlocking(frame)) {
decode(frame);
}
The idea is to drain all currently available frames without waiting for new ones.
flowchart TD
Read[read()]
Read --> Check{Frame available?}
Check -->|yes| Decode[Decode frame]
Decode --> Check
Check -->|no| Snapshot[Use latest decoded state]
Snapshot --> Return[Return to control loop]
This pattern is useful when the control loop itself is responsible for polling RX frames.
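On SocketCAN, one way to get this behavior is recv() with the MSG_DONTWAIT flag, which returns immediately with EAGAIN or EWOULDBLOCK when no frame is queued. The sketch below is a minimal illustration, not the readFrameNonBlocking() used in this post. The function name `drain_can_frames` and the `max_frames` cap are assumptions, and the code is Linux-specific.

```cpp
#include <cassert>
#include <cerrno>
#include <linux/can.h>
#include <sys/socket.h>
#include <sys/types.h>

// Reads every frame that is already queued on `fd` without waiting for new ones.
// Returns the number of frames handled, or -1 on a short read or socket error.
template <typename OnFrame>
int drain_can_frames(int fd, OnFrame&& on_frame, int max_frames = 64)
{
  assert(max_frames > 0);
  int count = 0;
  while (count < max_frames) {  // cap the work done in one control cycle
    can_frame frame{};
    const ssize_t n = ::recv(fd, &frame, sizeof(frame), MSG_DONTWAIT);
    if (n == static_cast<ssize_t>(sizeof(frame))) {
      on_frame(frame);
      ++count;
      continue;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      break;  // queue is empty: return to the control loop
    }
    if (n < 0 && errno == EINTR) {
      continue;  // interrupted by a signal: retry
    }
    return -1;
  }
  return count;
}
```

Inside read(), the callback would decode the frame and update the cached device state. The cap keeps the drain time bounded if frames arrive faster than they can be processed.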
Latest-State Cache
The controller should not depend on a specific CAN frame arriving exactly during the current read() call.
A more robust structure is to maintain a latest-state cache.
flowchart LR
CAN[CAN frames] --> Decode[Decode]
Decode --> Cache[Latest-state cache]
Cache --> Read[read()]
Read --> State[ROS 2 state interfaces]
The cache stores the most recent valid feedback per device.
struct DeviceFeedback
{
double position;
double velocity;
double effort;
rclcpp::Time stamp;
bool valid;
};
std::vector<DeviceFeedback> latest_feedback_;
Then read() copies the latest valid feedback into ROS 2 state interfaces.
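hardware_interface::return_type CanSystem::read(
  const rclcpp::Time& time,
  const rclcpp::Duration& /*period*/)
{
  // Drain every frame that is already queued; never wait for new ones.
  can_frame frame;
  while (can_bus_.readFrameNonBlocking(frame)) {
    // decodeFeedback() is assumed to return the device index and decoded values.
    const auto feedback = protocol_.decodeFeedback(frame);
    const auto id = static_cast<std::size_t>(feedback.device_id);
    if (id >= latest_feedback_.size()) {
      continue;  // ignore frames from unknown devices
    }
    auto& entry = latest_feedback_[id];
    entry.position = feedback.position;
    entry.velocity = feedback.velocity;
    entry.effort = feedback.effort;
    entry.stamp = time;
    entry.valid = true;
  }
  for (std::size_t i = 0; i < latest_feedback_.size(); ++i) {
    if (!latest_feedback_[i].valid) {
      continue;  // no feedback received yet for this device
    }
    hw_positions_[i] = latest_feedback_[i].position;
    hw_velocities_[i] = latest_feedback_[i].velocity;
    hw_efforts_[i] = latest_feedback_[i].effort;
  }
  return hardware_interface::return_type::OK;
}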
The cache should also support freshness checks.
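// Check valid first so an entry that was never stamped is not compared.
const bool is_fresh =
  latest_feedback_[i].valid &&
  (time - latest_feedback_[i].stamp) < rx_stale_timeout_;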
A decoded value is not always a usable value. It must also be recent enough.
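The freshness rule can be tested in isolation. The sketch below swaps rclcpp::Time for std::chrono::steady_clock so it compiles without ROS 2; that swap, the `StampedFeedback` type, and the `stale_devices` helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Simplified cache entry: one value plus the time it was received.
struct StampedFeedback
{
  double position = 0.0;
  Clock::time_point stamp{};
  bool valid = false;
};

// An entry is fresh only if it was received at all and is recent enough.
inline bool is_fresh(const StampedFeedback& fb, Clock::time_point now, Clock::duration timeout)
{
  return fb.valid && (now - fb.stamp) < timeout;
}

// Returns the indices of all devices whose feedback is missing or too old.
inline std::vector<std::size_t> stale_devices(
  const std::vector<StampedFeedback>& cache,
  Clock::time_point now,
  Clock::duration timeout)
{
  assert(timeout > Clock::duration::zero());
  std::vector<std::size_t> stale;
  for (std::size_t i = 0; i < cache.size(); ++i) {
    if (!is_fresh(cache[i], now, timeout)) {
      stale.push_back(i);
    }
  }
  return stale;
}
```

What to do with a stale device (hold the last state, stop commanding it, or return an error from read()) is a policy decision that depends on the robot.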
WritePlan and TX Pacing
The transmit path should be explicit.
Instead of writing CAN frames directly in several places, the hardware interface can build a write plan and pass it to a frame executor.
flowchart TD
Cmd[ROS 2 command interfaces]
Cmd --> Convert[Convert to device commands]
Convert --> Encode[Encode CAN frames]
Encode --> Plan[WritePlan]
Plan --> Exec[Frame executor]
Exec --> Bus[CAN bus]
A simple write plan may contain:
struct WritePlan
{
std::vector<can_frame> frames;
std::chrono::microseconds inter_frame_delay;
std::chrono::microseconds frame_timeout;
};
The hardware-specific write() function builds the plan.
hardware_interface::return_type CanSystem::write(
const rclcpp::Time& time,
const rclcpp::Duration& period)
{
WritePlan plan;
for (size_t i = 0; i < hw_commands_.size(); ++i) {
plan.frames.push_back(
protocol_.encodeCommand(i, hw_commands_[i]));
}
plan.inter_frame_delay = tx_frame_delay_;
plan.frame_timeout = tx_frame_timeout_;
if (!frame_executor_.execute(plan)) {
return hardware_interface::return_type::ERROR;
}
return hardware_interface::return_type::OK;
}
The executor applies the transmit policy.
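bool CanFrameExecutor::execute(const WritePlan& plan)
{
  for (std::size_t i = 0; i < plan.frames.size(); ++i) {
    if (!can_bus_.writeFrame(plan.frames[i], plan.frame_timeout)) {
      return false;  // stop on the first failed or timed-out write
    }
    // Pace only between frames; no delay is needed after the last one.
    const bool is_last = (i + 1 == plan.frames.size());
    if (!is_last && plan.inter_frame_delay.count() > 0) {
      std::this_thread::sleep_for(plan.inter_frame_delay);
    }
  }
  return true;
}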
This separation makes it easier to tune the transmit path without changing the controller or protocol code.
Example CanSystem Structure
A useful structure is to separate raw CAN I/O, frame execution, and robot-specific hardware logic.
flowchart LR
System[CanSystem]
System -->|inherits| Base[CanSystemBase]
System -->|owns| Bus[CanBus]
System -->|owns| Exec[CanFrameExecutor]
Bus --> Interface[CanInterface<br/>SocketCAN / PCAN]
Each layer has a limited responsibility.
| Layer | Responsibility |
|---|---|
| CanInterface | Direct access to SocketCAN, PCAN, or another CAN backend |
| CanBus | RX polling, TX sending, optional observer dispatch |
| CanFrameExecutor | TX execution, pacing, timeout handling, RX freshness |
| CanSystemBase | Common read() / write() structure |
| CanSystem | Robot-specific command conversion and state update |
The base class keeps the main structure small.
bool CanSystemBase::read()
{
const bool ok = update_measurements_();
refresh_state_snapshot_();
return ok;
}
bool CanSystemBase::write()
{
  // Skip transmission while the commands are not valid.
  if (!commands_are_valid()) {
    invalidate_previous_commands();
    return true;  // not an error: there is nothing to send this cycle
  }
  WritePlan plan;
  build_write_plan_(plan);
  store_previous_commands();
  return execute_write_plan_(plan);
}
The concrete system provides the hardware-specific hooks.
bool CanSystem::update_measurements_()
{
return frame_executor_.poll_rx();
}
Incoming frames can be dispatched through an observer.
can_bus_.add_rx_observer([this](const CanFrame& frame) {
std::lock_guard<std::mutex> lock(state_mutex_);
if (protocol_.dispatch_rx_frame(frame, devices_)) {
frame_executor_.mark_rx_frame();
}
});
The receive path becomes:
flowchart TD
Read[read()]
Read --> Poll[frame_executor.poll_rx()]
Poll --> Bus[CanBus.poll_rx()]
Bus --> Frame[Receive CAN frame]
Frame --> Observer[RX observer]
Observer --> Dispatch[dispatch_rx_frame()]
Dispatch --> Device[Update device state]
Dispatch --> Mark[mark_rx_frame()]
Read --> Snapshot[refresh_state_snapshot()]
The state snapshot can include both decoded joint state and hardware health information.
void CanSystem::refresh_state_snapshot_()
{
std::lock_guard<std::mutex> lock(state_mutex_);
update_joint_states_from_devices();
state_snapshot_.stamp = now();
state_snapshot_.has_fresh_rx =
frame_executor_.has_fresh_rx(rx_stale_timeout_);
state_snapshot_.transport_healthy =
frame_executor_.transport_healthy();
copy_joint_state_to_snapshot();
}
The write path is similarly separated.
void CanSystem::build_write_plan_(WritePlan& plan)
{
std::lock_guard<std::mutex> lock(state_mutex_);
convert_joint_commands_to_device_commands();
protocol_.append_command_frames(
devices_,
device_commands_,
plan.frames);
plan.inter_frame_delay = tx_frame_delay_;
plan.frame_timeout = tx_frame_timeout_;
}
Then the executor sends the frames.
bool CanSystem::execute_write_plan_(const WritePlan& plan)
{
return frame_executor_.execute(plan);
}
The full write path is:
flowchart TD
Cmd[Joint commands]
Cmd --> Convert[Joint-to-device command conversion]
Convert --> Frames[Encode CAN command frames]
Frames --> Plan[WritePlan]
Plan --> Exec[CanFrameExecutor]
Exec --> Bus[CanBus.writeFrame]
This structure avoids putting all transport, protocol, timing, and robot-specific logic directly inside read() and write().
Tuning Procedure
A practical tuning process is:
1. Start with a conservative command rate.
2. Add inter-frame delay between command frames.
3. Check that every device responds consistently.
4. Monitor stale feedback and missed responses.
5. Increase command frequency gradually.
6. Reduce inter-frame delay only after the bus and drivers are stable.
7. Re-test with all devices enabled, not only one actuator.
The final values should be selected based on measured hardware behavior.
Important parameters include:
- control loop rate
- command transmission rate
- feedback rate
- inter-frame TX delay
- RX stale timeout
- TX frame timeout
- maximum allowed missing feedback count
These values are not universal. They depend on the adapter, bus topology, device firmware, and actuator hardware.
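The "maximum allowed missing feedback count" can be tracked with a small per-device counter. The sketch below is one possible shape for it; the `FeedbackWatchdog` class and its method names are hypothetical. It assumes a command-response device that should answer once per command cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts consecutive cycles without feedback per device.
class FeedbackWatchdog
{
public:
  FeedbackWatchdog(std::size_t device_count, unsigned max_missed)
  : missed_(device_count, 0U), seen_(device_count, 0), max_missed_(max_missed)
  {
  }

  // Call when a feedback frame from `device` has been decoded.
  void on_feedback(std::size_t device)
  {
    assert(device < seen_.size());
    seen_[device] = 1;
  }

  // Call once per command cycle, after the RX queue has been drained.
  void end_cycle()
  {
    for (std::size_t i = 0; i < seen_.size(); ++i) {
      missed_[i] = seen_[i] ? 0U : missed_[i] + 1U;
      seen_[i] = 0;
    }
  }

  unsigned missed(std::size_t device) const
  {
    assert(device < missed_.size());
    return missed_[device];
  }

  // A device is healthy while its consecutive misses stay within the limit.
  bool healthy(std::size_t device) const { return missed(device) <= max_missed_; }

private:
  std::vector<unsigned> missed_;
  std::vector<char> seen_;
  unsigned max_missed_;
};
```

Logging `missed()` per device while the command rate is raised and the inter-frame delay is lowered shows which actuator starts dropping responses first.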
USB-CAN Converter Bottleneck
Many prototype robots use USB-CAN converters.
This is convenient, but it adds another layer between the control loop and the CAN bus.
```mermaid
flowchart LR
App[ROS 2 hardware interface]
OS[OS scheduler / driver]
USB[USB stack]
Conv[USB-CAN converter]
CAN[CAN bus]
Driver[Actuator driver]
App --> OS
OS --> USB
USB --> Conv
Conv --> CAN
CAN --> Driver
```
Even if the CAN bus is configured at 1 Mbps, the host application is not directly writing bits onto the CAN wires. Frames pass through the operating system, USB stack, converter firmware, and driver buffers before reaching the bus.
This can introduce:
- additional latency
- timing jitter
- transmit buffering
- receive buffering
- bursty frame delivery
- device-dependent throughput limits
This is important when the hardware interface sends many frames at a high rate.
```mermaid
flowchart LR
App[Application writes frames] --> Queue[Host / USB / converter queue]
Queue --> Burst[Frames released as burst]
Burst --> CAN[CAN bus]
CAN --> Device[Device RX buffer]
```
From the application side, several writes may return successfully.
```cpp
for (const auto& frame : frames) {
  can_interface.writeFrame(frame);
}
```
However, this does not guarantee that the frames were placed on the CAN bus with the intended spacing. The USB-CAN converter may buffer them and transmit them later, sometimes in a burst.
This matters for actuator drivers with limited receive buffering or slow command parsing.
```text
Application write succeeds
≠ frame transmitted immediately
≠ actuator firmware processed the command
```
The bottleneck is not always the nominal CAN bitrate.
For high-rate control, the limiting factor may be:
- USB-CAN converter firmware
- USB polling interval
- driver implementation
- host-side queue size
- converter-side TX buffer
- converter-side RX buffer
- application read frequency
Different USB-CAN converters can behave very differently. Some devices handle dense traffic and timestamping well. Others are acceptable for diagnostics but unreliable for high-rate multi-actuator control.
This difference is often more visible when the system scales from one actuator to many actuators.
```text
One actuator:
  low bus load
  low burst pressure
  most converters appear to work

Eight actuators:
  higher command rate
  higher feedback rate
  more bursts
  converter behavior becomes important
```
For this reason, USB-CAN converter selection should be treated as part of the hardware design, not just as a cable choice.
When using a USB-CAN converter, the hardware interface should be tuned with the converter in the loop.
Practical checks include:
- test with all actuators enabled
- monitor missed feedback
- monitor stale state events
- increase command rate gradually
- add inter-frame delay if needed
- compare behavior across converters if possible
- avoid assuming that successful host writes imply successful device processing
For low-rate diagnostics, most USB-CAN converters are sufficient. For high-rate multi-actuator control, the converter can become the bottleneck even when the CAN bitrate looks adequate.
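One way to compare converters is to record the arrival time of each feedback frame and look at the gaps between them. The sketch below assumes the timestamps are already collected in microseconds, for example when each frame is received by the application. The `GapStats` type, the function name, and the burst threshold are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Summary of the gaps between consecutive frame arrivals.
struct GapStats
{
  double min_us;
  double max_us;
  double mean_us;
  std::size_t burst_gaps;  // gaps shorter than the burst threshold
};

// `arrival_us` holds arrival times in microseconds, in receive order.
inline GapStats inter_arrival_stats(const std::vector<double>& arrival_us, double burst_threshold_us)
{
  assert(arrival_us.size() >= 2);
  const double first_gap = arrival_us[1] - arrival_us[0];
  GapStats stats{first_gap, first_gap, 0.0, 0};
  double sum = 0.0;
  for (std::size_t i = 1; i < arrival_us.size(); ++i) {
    const double gap = arrival_us[i] - arrival_us[i - 1];
    stats.min_us = std::min(stats.min_us, gap);
    stats.max_us = std::max(stats.max_us, gap);
    sum += gap;
    if (gap < burst_threshold_us) {
      ++stats.burst_gaps;
    }
  }
  stats.mean_us = sum / static_cast<double>(arrival_us.size() - 1);
  return stats;
}
```

A device that broadcasts every 1000 µs but shows many gaps far below that, followed by long pauses, is probably being delivered in bursts by the converter or the host stack rather than misbehaving on the bus.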
Summary
A CAN hardware interface is not only an encoding and decoding layer.
It must also define the I/O timing policy.
The main points are:
- CAN devices follow different I/O patterns.
- Periodic broadcast, query-response, and command-response require different read/write logic.
- Blocking reads should be avoided inside a periodic control loop.
- Incoming frames should be drained and stored in a latest-state cache.
- Command frames may need inter-frame delay to avoid burst problems.
- A successful host-side CAN write does not guarantee actuator firmware processing.
- Command frequency and TX pacing must be tuned on real hardware.
A useful design separates the system into:
CanInterface
raw CAN backend
CanBus
RX/TX transport
CanFrameExecutor
pacing, timeout, freshness
CanSystem
ROS 2 Control hardware interface