Week 8: Reinforcement Learning for Control
The RL Paradigm in Robotics
Reinforcement Learning (RL) has revolutionized legged locomotion. Instead of hand-tuning PID gains for every joint, we define a reward function and let the robot learn to walk through trial and error in simulation.
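The core loop is simple: the agent acts, the environment returns an observation and a scalar reward, and the policy is updated to maximize cumulative reward. Here is a minimal sketch of that interaction using Gymnasium's MuJoCo Humanoid task (Gymnasium is not part of our lab stack; it is shown only to make the loop concrete, and the random policy is a stand-in for a learned one):

import gymnasium as gym  # requires: pip install "gymnasium[mujoco]"

env = gym.make("Humanoid-v4")
obs, info = env.reset(seed=0)

for _ in range(1000):
    # A trained policy would map obs -> action; we sample randomly here.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # The scalar `reward` is the only learning signal the agent receives.
    if terminated or truncated:
        obs, info = env.reset()
env.close()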
Key Algorithms
- PPO (Proximal Policy Optimization): The workhorse of on-policy RL. Stable and relatively easy to tune (see the clipped-objective sketch after this list).
- SAC (Soft Actor-Critic): Off-policy and sample-efficient; a good fit for manipulation.
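PPO's stability comes from its clipped surrogate objective, which caps how far a single update can move the policy away from the one that collected the data. A minimal PyTorch sketch of that loss, assuming `log_probs_new`, `log_probs_old`, and `advantages` have already been computed from a rollout batch (the names are illustrative, not tied to any particular library):

import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Unclipped and clipped surrogate objectives.
    surr_unclipped = ratio * advantages
    surr_clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic minimum of the two, negated because optimizers minimize.
    return -torch.min(surr_unclipped, surr_clipped).mean()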
Sim-to-Real Gap
A policy trained in simulation often fails in the real world due to:
- Inaccurate physics modeling (friction, mass distribution).
- Sensor noise.
- Latency.
Solution: Domain Randomization. We vary mass, friction, and delays during training so the policy never overfits to one simulator configuration and stays robust across the range of plausible real-world values (a sketch follows below).
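To make this concrete, here is a hedged sketch of per-episode randomization, assuming a hypothetical `env` object with `set_friction` and `set_base_mass` setters and an `action_delay_steps` attribute (these names are illustrative, not Isaac Lab's API):

import random

# Ranges are assumptions; in practice they should bracket your best
# estimate of the real robot's physical parameters.
FRICTION_RANGE = (0.4, 1.2)
MASS_SCALE_RANGE = (0.8, 1.2)
DELAY_RANGE_STEPS = (0, 3)

def randomize_physics(env):
    """Resample physics parameters at the start of each episode."""
    env.set_friction(random.uniform(*FRICTION_RANGE))
    env.set_base_mass(env.nominal_mass * random.uniform(*MASS_SCALE_RANGE))
    # Emulate control latency by delaying actions a random number of steps.
    env.action_delay_steps = random.randint(*DELAY_RANGE_STEPS)

# Training loop (schematic):
# for episode in range(num_episodes):
#     randomize_physics(env)
#     run_episode(env, policy)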
Lab: Training a Walking Policy
We will use Isaac Lab (formerly Orbit) on top of Isaac Sim.
Step 1: Install Isaac Lab
Follow the official NVIDIA documentation for your specific GPU setup.
Step 2: Define the Task (Python)
# Pseudo-code for a humanoid walking task (schematic: Isaac Lab splits much
# of this between the env class and its manager configuration)
import torch

# Import path depends on the Isaac Lab version, e.g.:
# from omni.isaac.lab.envs import ManagerBasedRLEnv

class HumanoidWalkTask(ManagerBasedRLEnv):
    def __init__(self, cfg):
        super().__init__(cfg)
        # Observation space: joint positions, joint velocities, base linear velocity.
        # Action space: joint position targets for the low-level PD controllers.

    def _get_rewards(self):
        # Reward tracking the commanded planar velocity: dot product of the
        # base's actual xy velocity with the commanded xy velocity.
        lin_vel_reward = torch.sum(self.base_lin_vel[:, :2] * self.commands[:, :2], dim=1)
        # Penalize energetic cost via squared joint torques.
        torque_penalty = torch.sum(torch.square(self.dof_torques), dim=1)
        # The 0.01 weight trades walking speed against efficiency.
        return lin_vel_reward - 0.01 * torque_penalty
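In practice a single velocity term rarely produces a natural gait; walking rewards typically combine several shaped terms. A hedged sketch of two common additions, an upright-posture penalty and an action-rate penalty (the tensor names and the weights follow common legged-RL conventions but are assumptions here, not Isaac Lab specifics):

import torch

def extra_reward_terms(projected_gravity, actions, last_actions):
    # Upright posture: gravity projected into the base frame has zero xy
    # components when the torso is level, so any tilt is penalized.
    orientation_penalty = torch.sum(torch.square(projected_gravity[:, :2]), dim=1)
    # Smoothness: penalize large step-to-step changes in the action targets.
    action_rate_penalty = torch.sum(torch.square(actions - last_actions), dim=1)
    return -5.0 * orientation_penalty - 0.01 * action_rate_penalty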
Step 3: Train
./isaaclab.sh -p scripts/rsl_rl/train.py --task Humanoid-v1 --headless
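The --headless flag disables the viewer, which speeds up training substantially when simulating thousands of parallel environments. Once training finishes, checkpoints land in a logs directory; assuming the standard Isaac Lab layout, where a play.py script sits alongside train.py, you can replay the learned gait with the viewer enabled:

./isaaclab.sh -p scripts/rsl_rl/play.py --task Humanoid-v1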