Most hardware debugging starts with an imprecise statement.
I think the hardware is broken.
This may be true. But it is not yet a useful debugging statement.
In a CAN-based robot, the failure can be in many places.
flowchart TD
A["I think the hardware is broken"] --> B["Application code"]
A --> C["Hardware interface"]
A --> D["CAN adapter"]
A --> E["CAN bitrate / ID map"]
A --> F["Device firmware mode"]
A --> G["Power"]
A --> H["Wiring"]
A --> I["Termination"]
A --> J["Physical signal"]
The goal is not to immediately prove that the hardware is broken.
The goal is to turn a vague failure into a smaller, testable statement.
Bad:
I think the hardware is broken.
Better:
No CAN frames are visible in candump.
Better:
Feedback frames are visible, but command frames do not change actuator state.
Better:
TX error counter increases when sending commands.
Better:
The bus works with one actuator but fails when all actuators are connected.
Better:
CANH-CANL resistance is 120 Ω instead of 60 Ω.
This post follows one practical rule:
Start with the least invasive check.
Move toward the physical layer only when the previous layer cannot explain the failure.
In other words, do not start by disassembling the robot. Do not start by replacing the motor driver. Do not start with an oscilloscope unless there is a reason to go that low.
Start by observing the bus.
The debugging flow should move from easy observations to more invasive measurements.
flowchart TD
A["I think the hardware is broken"] --> B["1. Observe raw CAN traffic<br/>candump / PCAN-View"]
B --> C{"Expected frames<br/>at expected rate?"}
C -->|Yes| D["Likely software / configuration issue"]
D --> D1["Check CAN ID map"]
D --> D2["Check protocol encoding"]
D --> D3["Check protocol decoding"]
D --> D4["Check actuator mode"]
D --> D5["Check command timing"]
C -->|No| E["2. Check CAN interface state"]
E --> F{"Interface healthy?"}
F -->|No| G["Fix adapter / driver / interface setup"]
F -->|Yes| H["3. Check device power and mode"]
H --> I{"Device powered<br/>and expected to transmit?"}
I -->|No| J["Fix power / enable / boot mode"]
I -->|Yes| K["4. Check bitrate and CAN ID assumptions"]
K --> L{"Bitrate and IDs correct?"}
L -->|No| M["Fix bitrate / ID map / device configuration"]
L -->|Yes| N["5. Check wiring and termination"]
N --> O{"Wiring and termination OK?"}
O -->|No| P["Fix CANH/CANL / GND / connector / termination"]
O -->|Yes| Q["6. Check error counters"]
Q --> R{"Errors increasing?"}
R -->|Yes| S["Check ACK / CRC / sample point / signal integrity"]
R -->|No| T["Re-check protocol and device behavior"]
S --> U["7. Use oscilloscope"]
T --> U
This flowchart is not the only possible debugging sequence. It is a conservative one. It tries to avoid unnecessary hardware disassembly by using observable evidence first.
Start with the simplest observable signal.
On Linux with SocketCAN:
candump can0
With PCAN tools, use PCAN-View and select the correct channel and bitrate.
The first question is:
Do I receive the expected CAN frames
at the expected rate
with the expected CAN IDs?
This one question immediately separates many failure modes.
flowchart TD
A["Run candump / PCAN-View"] --> B{"Any frames visible?"}
B -->|No| C["Bus may not be transmitting<br/>or host cannot receive"]
B -->|Yes| D{"Expected IDs visible?"}
D -->|No| E["ID map / device ID / mode issue"]
D -->|Yes| F{"Expected rate?"}
F -->|No| G["Timing / mode / busload issue"]
F -->|Yes| H["Raw CAN receive path is likely working"]
Look for three things:
1. Are frames visible?
2. Are the expected CAN IDs visible?
3. Are they arriving at the expected rate?
Example:
Expected:
ID 0x201 at 500 Hz
Observed:
ID 0x201 at 500 Hz
This means the CAN bus, device transmission, adapter, and basic receive path are probably working.
The remaining problem may be software-side.
Possible causes include:
- wrong CAN ID map
- wrong byte order
- wrong scaling
- wrong command mode
- wrong enable sequence
- wrong joint-to-actuator mapping
- timeout threshold too strict
- controller frequency mismatch
- TX pacing problem
If the expected frames are visible, do not open the robot yet. Check the protocol and software assumptions first.
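The three checks listed earlier in this section (frames visible, expected IDs, expected rate) can be scripted against a saved log. The sketch below is a minimal example, assuming the log was recorded with `candump -L can0`, which writes lines such as `(1700000000.000000) can0 201#0011223344556677`. The function names and the 10 % tolerance are illustrative choices, not part of any tool.

```python
import re
from collections import defaultdict

# One line of `candump -L` output: "(timestamp) interface id#data"
LOG_LINE = re.compile(r"\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#")


def frame_rates(log_lines):
    """Return {can_id: average rate in Hz} from candump -L style lines."""
    stamps = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match:
            stamps[int(match.group(3), 16)].append(float(match.group(1)))
    rates = {}
    for can_id, times in stamps.items():
        span = times[-1] - times[0]
        # At least two frames over a non-zero span are needed for a rate.
        rates[can_id] = (len(times) - 1) / span if span > 0 else 0.0
    return rates


def check_expected(rates, expected, tolerance=0.1):
    """Compare observed rates against {can_id: expected_hz}."""
    report = {}
    for can_id, expected_hz in expected.items():
        observed = rates.get(can_id)
        if observed is None:
            report[can_id] = "missing"
        elif abs(observed - expected_hz) > tolerance * expected_hz:
            report[can_id] = "wrong rate"
        else:
            report[can_id] = "ok"
    return report
```

For example, `check_expected(frame_rates(open("can.log")), {0x201: 500})` reports whether ID 0x201 shows up at roughly 500 Hz. The file name `can.log` is a placeholder.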
No traffic in candump does not always mean the bus is broken.
Some devices broadcast periodically. Some devices only respond to queries. Some devices only respond after a command.
flowchart LR
A["Device I/O Pattern"] --> B["Periodic broadcast"]
A --> C["Query-response"]
A --> D["Command-response"]
B --> B1["Expect frames without host command"]
C --> C1["Send query first"]
D --> D1["Send command first"]
sequenceDiagram
participant Device
participant Host
loop Fixed device rate
Device->>Host: State frame
end
For a periodic sensor, frames should appear without sending anything.
Examples:
- IMU
- force sensor
- tactile sensor board
- encoder board
If no frames appear, check power, bitrate, device mode, wiring, and termination.
sequenceDiagram
participant Host
participant Device
Host->>Device: Query frame
Device->>Host: Response frame
For a query-response device, candump may show nothing until a query is sent.
Example checks:
- send device status query
- request firmware version
- request diagnostic register
- request encoder value
If the query frame is visible but no response appears, check CAN ID, device ID, mode, bitrate, and ACK/error counters.
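A query-response check can be scripted with nothing but the Python standard library on Linux. The sketch below is a rough example, not a driver: the query ID, the payload, and the 100 ms timeout are assumptions that depend on the device protocol, and the helper names are invented for illustration. It handles classic CAN frames with standard IDs only.

```python
import socket
import struct

# Classic SocketCAN frame layout: 32-bit ID, 8-bit length, 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"


def build_frame(can_id, data):
    """Pack a classic CAN frame (up to 8 data bytes) for a raw CAN socket."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def parse_frame(frame):
    """Unpack a raw frame into (can_id, data)."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, frame)
    return can_id, data[:length]


def query(interface, query_id, payload, timeout_s=0.1):
    """Send one query frame and return the next frame seen on the bus.

    Linux only. Returns (can_id, data), or None if nothing arrives in time.
    The returned frame is whatever arrives first, so the caller still has
    to check that its ID is the expected response ID.
    """
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((interface,))
        sock.settimeout(timeout_s)
        sock.send(build_frame(query_id, payload))
        try:
            return parse_frame(sock.recv(struct.calcsize(CAN_FRAME_FMT)))
        except socket.timeout:
            return None
```

A call such as `query("can0", 0x601, b"\x01")` uses a made-up ID and payload. If it returns `None` while `candump` shows the query frame on the bus, the checks listed above apply.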
sequenceDiagram
participant Host
participant Device
Host->>Device: Command frame
Device->>Host: Feedback frame
Many motor drivers use this pattern.
A motor may not send full feedback until it receives a valid command or enters the correct operation mode.
If the command frame is visible but no feedback appears, check:
- command ID
- device ID
- enable sequence
- operation mode
- command payload
- watchdog timeout
- driver fault state
If raw traffic is missing, check the host-side CAN interface.
On Linux:
ip -details link show can0
Useful fields include:
- interface state
- bitrate
- sample point
- restart-ms
- RX/TX packet counts
- RX/TX error counters
- bus-off state
A healthy interface should normally be UP and ERROR-ACTIVE.
flowchart TD
A["ip -details link show can0"] --> B{"Interface UP?"}
B -->|No| C["Bring interface up"]
B -->|Yes| D{"ERROR-ACTIVE?"}
D -->|Yes| E["Interface state is likely OK"]
D -->|ERROR-WARNING| F["Check increasing error counters"]
D -->|ERROR-PASSIVE| G["Bus has many errors"]
D -->|BUS-OFF| H["Node stopped participating<br/>restart and fix bus issue"]
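The same decision can be made by a script that reads the text printed by `ip -details link show can0`. The sketch below assumes the output contains fragments like those in `SAMPLE`. The exact layout varies with the iproute2 version and the driver (for example, `berr-counter` is only shown when the driver reports it), so treat the sample text and the regular expressions as assumptions to adapt.

```python
import re

# Illustrative output only; real output differs between systems.
SAMPLE = """\
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.750
"""

CAN_STATES = "ERROR-ACTIVE|ERROR-WARNING|ERROR-PASSIVE|BUS-OFF|STOPPED|SLEEPING"


def parse_can_link(text):
    """Extract the fields that matter for a quick health check."""
    info = {}
    flags = re.search(r"<([^>]*)>", text)
    info["up"] = bool(flags) and "UP" in flags.group(1).split(",")
    state = re.search(rf"state ({CAN_STATES})", text)
    info["state"] = state.group(1) if state else None
    errors = re.search(r"berr-counter tx (\d+) rx (\d+)", text)
    info["tx_err"] = int(errors.group(1)) if errors else None
    info["rx_err"] = int(errors.group(2)) if errors else None
    bitrate = re.search(r"bitrate (\d+)", text)
    info["bitrate"] = int(bitrate.group(1)) if bitrate else None
    return info


def is_healthy(info):
    """Healthy here means UP and ERROR-ACTIVE, as described above."""
    return info["up"] and info["state"] == "ERROR-ACTIVE"
```

In practice the text would come from `subprocess.run(["ip", "-details", "link", "show", "can0"], capture_output=True, text=True).stdout`.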
Bring the interface up with the bitrate the bus actually uses. The commands here assume 1 Mbit/s.
sudo ip link set can0 down
sudo ip link set can0 up type can bitrate 1000000
If the interface enters BUS-OFF, its transmit error counter has exceeded 255 and the controller has stopped participating in the bus.
Common causes include:
- wrong bitrate
- no ACK from other nodes (on its own this usually stops at ERROR-PASSIVE rather than BUS-OFF)
- CANH/CANL wiring problem
- missing or incorrect termination
- severe noise or signal integrity issue
All nodes on the same CAN bus must use the same bitrate.
If the bitrate is wrong, the bus may show no useful frames, or the error counters may increase rapidly.
flowchart LR
A["Host bitrate"] --> C{"Same bitrate?"}
B["Device bitrate"] --> C
C -->|Yes| D["Frames can be decoded"]
C -->|No| E["Errors / no valid frames"]
Also check CAN IDs.
A device may be working correctly but using a different ID than expected.
Expected:
motor feedback ID = 0x201
Observed:
only 0x181 appears
This is not necessarily a hardware failure. It may be an ID map, device configuration, or firmware mode issue.
Check:
- device ID DIP switch or EEPROM setting
- firmware default CAN ID
- command ID vs feedback ID
- standard ID vs extended ID
- bootloader ID vs runtime ID
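Comparing expected and observed IDs is a simple set difference. The sketch below assumes the IDs are copied from `candump` as hex strings, where standard (11-bit) IDs are printed with 3 hex digits and extended (29-bit) IDs with 8. The helper names are illustrative.

```python
def parse_id(token):
    """Return (numeric_id, is_extended) for an ID as printed by candump."""
    return int(token, 16), len(token) == 8


def compare_ids(expected_tokens, observed_tokens):
    """Split IDs into present, missing, and unexpected groups."""
    expected = {parse_id(t) for t in expected_tokens}
    observed = {parse_id(t) for t in observed_tokens}
    return {
        "present": sorted(expected & observed),
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


# The case from the text: 0x201 expected, only 0x181 observed.
result = compare_ids(["201"], ["181"])
```

Because the extended flag is part of the comparison, `compare_ids(["201"], ["00000201"])` reports 0x201 as missing: the same number sent as an extended frame is a different ID on the bus.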
A CAN device can be physically connected but still not transmit useful frames.
flowchart TD
A["Device connected to CAN"] --> B{"Logic power present?"}
B -->|No| C["No communication"]
B -->|Yes| D{"Correct mode?"}
D -->|No| E["Only heartbeat / no feedback / no response"]
D -->|Yes| F{"Enabled?"}
F -->|No| G["May ignore commands"]
F -->|Yes| H["Should communicate according to protocol"]
Check:
- logic power
- motor power
- enable pin
- enable command
- emergency stop state
- boot mode
- firmware mode
- device ID configuration
- fault state
Many motor drivers do not send normal feedback until they are enabled. Some devices only transmit heartbeat frames before entering operation mode.
If the software-visible state does not explain the issue, move to the physical bus.
Start with termination. This is still a low-effort check.
Power off the bus before measuring resistance. Powered transceivers drive the lines and make the ohmmeter reading unreliable.
Measure resistance between CANH and CANL.
Expected:
approximately 60 Ω
Typical readings:
~60 Ω:
two 120 Ω terminations are present
~120 Ω:
only one termination is present
very high / open:
no termination or open wiring
less than 60 Ω:
too many terminations or possible short
flowchart TD
A["Measure CANH-CANL resistance<br/>with bus powered off"] --> B{"Measured resistance"}
B -->|"~60 Ω"| C["Likely correct termination"]
B -->|"~120 Ω"| D["Only one terminator"]
B -->|"Open / very high"| E["No termination or open circuit"]
B -->|"<60 Ω"| F["Too many terminators or short"]
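The expected readings follow from resistors in parallel: two 120 Ω terminators give 60 Ω, and a third would pull the reading down to 40 Ω. The sketch below reproduces that arithmetic and the classification above. The 15 % tolerance is an arbitrary assumption; real readings also include cable resistance and the input resistance of the transceivers.

```python
def parallel(*resistors_ohm):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


TWO_TERMINATORS = parallel(120.0, 120.0)  # about 60 ohms
ONE_TERMINATOR = 120.0


def classify_termination(measured_ohm, tolerance=0.15):
    """Map a CANH-CANL resistance reading to a likely cause."""

    def near(nominal):
        return abs(measured_ohm - nominal) <= tolerance * nominal

    if near(TWO_TERMINATORS):
        return "two terminators (expected)"
    if near(ONE_TERMINATOR):
        return "only one terminator"
    if measured_ohm > ONE_TERMINATOR:
        return "no termination or open wiring"
    if measured_ohm < TWO_TERMINATORS:
        return "too many terminators or possible short"
    return "between 60 and 120 ohms: re-measure"
```

For example, `classify_termination(parallel(120, 120, 120))` lands in the "too many terminators" case.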
Then check wiring.
- CANH to CANH
- CANL to CANL
- common ground when required
- connector pinout
- shield connection
- cable length
- branch length
- loose contacts
Swapped CANH/CANL is common. Loose connectors are also common in robot prototypes because cables move, bend, and pull during testing.
If frames appear intermittently or the interface enters error states, check the error counters.
ip -details link show can0
Rough interpretation:
TX errors increasing:
the adapter may be transmitting but not receiving ACK
RX errors increasing:
frames may be corrupted or sampled incorrectly
bus-off:
too many errors; the controller stopped participating
flowchart TD
A["Error counters increasing"] --> B{"TX errors?"}
A --> C{"RX errors?"}
B -->|Yes| D["No ACK / wrong bitrate / wiring / no active node"]
C -->|Yes| E["CRC errors / sample point / noise / signal integrity"]
D --> F["Check bitrate, ACK, termination, wiring"]
E --> G["Check bitrate, sample point, waveform"]
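The interface states map onto the two error counters through fixed thresholds in the CAN fault-confinement rules: the warning level starts at 96, error-passive begins above 127, and bus-off is reached when the transmit counter exceeds 255. The sketch below encodes those thresholds and adds a hypothetical helper that compares two counter samples taken a moment apart.

```python
def error_state(tx_err, rx_err):
    """Return the controller state implied by the two error counters."""
    if tx_err > 255:
        return "BUS-OFF"
    if tx_err > 127 or rx_err > 127:
        return "ERROR-PASSIVE"
    if tx_err >= 96 or rx_err >= 96:
        return "ERROR-WARNING"
    return "ERROR-ACTIVE"


def counter_trend(before, after):
    """Compare two (tx_err, rx_err) samples and suggest where to look."""
    hints = []
    if after[0] > before[0]:
        hints.append("TX errors rising: check ACK, bitrate, termination, wiring")
    if after[1] > before[1]:
        hints.append("RX errors rising: check bitrate, sample point, waveform")
    return hints
```

A trend is more informative than a single reading, because the counters also decrease after successful frames.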
Every CAN frame carries a CRC, and each receiver checks it. A node that receives the frame without error acknowledges it by driving a dominant bit in the ACK slot.
If the transmitter sees no ACK, it treats the transmission as failed, increments its transmit error counter, and normally retries.
sequenceDiagram
participant TX as Transmitter
participant RX as Other CAN node
TX->>RX: CAN frame
RX-->>TX: ACK if frame is valid
Note over TX: No ACK increases TX error counter
This means a node can keep attempting to send frames while every attempt fails at the CAN protocol level.
An oscilloscope is useful, but it should not be the first tool in this flow.
Use it when simpler checks cannot explain the failure.
Typical cases:
- no frames are visible even though a device should transmit
- error counters increase
- the bus works with one device but fails with multiple devices
- communication works at low bitrate but fails at high bitrate
- communication is intermittent
- the robot works on the bench but fails after assembly
- long cables, custom connectors, or moving harnesses are used
At this point, inspect CANH and CANL directly.
flowchart TD
A["Oscilloscope check"] --> B["CANH"]
A --> C["CANL"]
A --> D["Differential signal"]
D --> E["Dominant / recessive levels"]
D --> F["Ringing"]
D --> G["Reflections"]
D --> H["Slow edges"]
D --> I["Noise"]
D --> J["Ground offset"]
Check:
- dominant and recessive levels
- differential voltage
- ringing
- reflections
- slow edges
- noise
- ground offset
- missing termination
- excessive termination
- connector or harness problems
If the waveform is wrong, software changes will not fix the problem.
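When reading levels off the scope, it helps to know what a high-speed CAN receiver decides. The sketch below uses the commonly quoted ISO 11898-2 receiver thresholds (differential voltage of 0.9 V or more reads as dominant, 0.5 V or less reads as recessive) and nominal levels of about 3.5 V / 1.5 V for dominant and 2.5 V / 2.5 V for recessive. Treat these figures as nominal assumptions and check the transceiver datasheet for the actual limits.

```python
DOMINANT_MIN_V = 0.9   # differential voltage at or above this reads as dominant
RECESSIVE_MAX_V = 0.5  # differential voltage at or below this reads as recessive


def bus_state(canh_v, canl_v):
    """Classify one CANH/CANL sample the way a receiver would."""
    diff = canh_v - canl_v
    if diff >= DOMINANT_MIN_V:
        return "dominant"
    if diff <= RECESSIVE_MAX_V:
        return "recessive"
    return "undefined"


def common_mode(canh_v, canl_v):
    """Average of the two lines; a large shift from about 2.5 V hints at ground offset."""
    return (canh_v + canl_v) / 2.0
```

A sample such as `bus_state(2.9, 2.2)` falls into the undefined region, which is the kind of weak dominant level that heavy loading or a damaged transceiver can produce.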
After raw CAN communication is verified, return to the software stack.
At this point, the question should be more specific than “the hardware is broken.”
Examples:
The feedback frame arrives, but the decoded position is wrong.
The command frame is visible, but the driver does not leave idle mode.
The driver responds in PCAN-View, but not from my hardware interface.
The bus works with one actuator, but feedback becomes stale with eight actuators.
The command ID is correct, but the payload byte order is wrong.
Then inspect:
- CAN ID map
- byte order
- scaling
- signed vs unsigned conversion
- standard vs extended ID
- command sequence
- actuator mode
- watchdog timeout
- TX pacing
- feedback timeout
- joint-to-actuator mapping
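Byte order, signedness, and scaling errors are easy to see by decoding the same bytes under each assumption. The sketch below uses a made-up two-byte position field and a made-up scale of 0.01 degrees per count; neither comes from a real device.

```python
import struct

# Hypothetical raw bytes of a 16-bit position field in a feedback frame.
raw = bytes([0xFF, 0x38])

# Hypothetical scale factor: degrees per count.
DEG_PER_COUNT = 0.01


def decode_all(field):
    """Decode a 2-byte field under every byte order / signedness combination."""
    return {
        "big-endian signed": struct.unpack(">h", field)[0],
        "big-endian unsigned": struct.unpack(">H", field)[0],
        "little-endian signed": struct.unpack("<h", field)[0],
        "little-endian unsigned": struct.unpack("<H", field)[0],
    }


for name, counts in decode_all(raw).items():
    print(f"{name}: {counts} counts = {counts * DEG_PER_COUNT:.2f} deg")
```

The four interpretations give -200, 65336, 14591, and 14591 counts. Only one of them is a plausible joint angle for a given robot, which is usually enough to identify the correct layout.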
flowchart TD
A["Raw CAN works"] --> B["Check protocol"]
B --> C["CAN ID"]
B --> D["Byte order"]
B --> E["Scaling"]
B --> F["Signed / unsigned"]
B --> G["Command sequence"]
B --> H["Mode / enable state"]
B --> I["Timing / TX pacing"]
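Timing problems that only appear with many actuators are often plain bus-load arithmetic. The sketch below estimates the load for classic CAN frames with 11-bit IDs, using the usual worst-case bit-stuffing bound. It assumes one command frame and one feedback frame per actuator per cycle, with 8 data bytes each, which may not match a given protocol.

```python
def frame_bits(data_bytes, worst_case_stuffing=True):
    """Bits on the wire for a classic 11-bit-ID frame, including 3 interframe bits."""
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        # Stuff bits can occur in the 34 header/CRC bits plus the data bits.
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


def bus_load(frames_per_second, data_bytes, bitrate):
    """Fraction of the bus capacity used, assuming worst-case stuffing."""
    return frames_per_second * frame_bits(data_bytes) / bitrate


# One actuator vs eight actuators at 500 Hz on a 1 Mbit/s bus,
# each with one command and one feedback frame per cycle.
one_actuator = bus_load(1 * 2 * 500, 8, 1_000_000)
eight_actuators = bus_load(8 * 2 * 500, 8, 1_000_000)
```

Under these assumptions one actuator uses about 13.5 % of the bus, while eight actuators need about 108 % in the worst case, so some frames must be delayed no matter how healthy the wiring is.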
Use this checklist from least invasive to most invasive.
1. Observe raw CAN traffic with candump or PCAN-View.
2. Check whether expected IDs appear at the expected rate.
3. Identify the device I/O pattern: broadcast, query-response, or command-response.
4. Check the CAN interface state and bitrate.
5. Check device power, enable state, and firmware mode.
6. Check CAN ID assumptions.
7. Measure CANH-CANL termination resistance.
8. Check wiring, connectors, and grounding.
9. Check error counters and bus-off state.
10. Use an oscilloscope if software-visible evidence is insufficient.
11. Return to protocol and software debugging only after raw CAN behavior is verified.
The goal is not to prove that the hardware is broken.
The goal is to find the first observable layer where the system stops behaving as expected.
A hardware failure should be the conclusion of the debugging process,
not the starting assumption.