Model Predictive Control

2 sources - 9 claims

The paper treats MPC as a motivating setting where related optimization problems are repeatedly solved online for changing initial states. The MPC experiment varies only the initial state while fixing the dynamics, dimensions, cost matrices, horizon, and box constraints. For MPC with fixed rho, learned policies reduce mean iteration count by about 18 percent and best runtime by about 17 percent. For MPC with adaptive rho, learned policies reduce iterations by up to 11 percent, but runtime differences are small.

For the grasping controller, the MPC objective combines contact risk, expected contact cost, visual perception cost, and terminal belief entropy. The entropy term encourages information-gathering actions. The controller selects the lowest-risk component action sequence subject to a failure-probability bound and executes the first action. It optimizes only finger motions during grasp execution, and the method assumes an obstacle-free tabletop setup with a fixed pre-grasp arm configuration.
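
The selection rule stated for the grasping controller (pick the lowest-risk candidate action sequence that satisfies a failure-probability bound, then execute only its first action) can be sketched in a few lines of Python. This is a minimal illustration, not the source's implementation: the Candidate class, its field names, the equal weights, the weighted-sum form of the objective, and the use of the objective as a tie-breaker are all assumptions made for the example. The source only says that the objective "combines" the four terms and does not give its functional form.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One candidate action sequence with precomputed scores (hypothetical container)."""
    actions: Sequence[float]        # assumed: one finger command per horizon step
    contact_risk: float
    expected_contact_cost: float
    perception_cost: float
    terminal_entropy: float
    failure_probability: float


def objective(c: Candidate, weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Assumed weighted sum of the four terms named in the text."""
    w_risk, w_contact, w_percept, w_entropy = weights
    return (w_risk * c.contact_risk
            + w_contact * c.expected_contact_cost
            + w_percept * c.perception_cost
            + w_entropy * c.terminal_entropy)


def select_first_action(candidates: Sequence[Candidate],
                        max_failure_probability: float) -> Optional[float]:
    """Return the first action of the lowest-risk feasible candidate.

    Returns None when no candidate meets the failure-probability bound.
    Ties in contact risk are broken by the assumed objective value.
    """
    feasible = [c for c in candidates
                if c.failure_probability <= max_failure_probability]
    if not feasible:
        return None
    best = min(feasible, key=lambda c: (c.contact_risk, objective(c)))
    return best.actions[0]


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the selection rule.
    candidates = [
        Candidate((0.1, 0.2), 0.05, 1.0, 0.5, 0.3, failure_probability=0.30),
        Candidate((0.3, 0.1), 0.20, 0.8, 0.4, 0.2, failure_probability=0.05),
        Candidate((0.5, 0.0), 0.10, 0.9, 0.6, 0.1, failure_probability=0.08),
    ]
    print(select_first_action(candidates, max_failure_probability=0.10))
```

In a receding-horizon loop, this function would be called once per control step: the belief and the candidate scores are recomputed, the first action of the selected sequence is applied, and the rest of the sequence is discarded. Here the first candidate has the lowest contact risk but violates the bound, so it is filtered out before the minimum is taken.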