Article: Learning Contact-Rich Manipulation Skills with Guided Policy Search

Autonomous learning of object manipulation skills can enable robots to acquire rich behavioral repertoires that scale to the variety of objects found in the real world. However, current motion skill learning methods typically restrict the behavior to a compact, low-dimensional representation, limiting its expressiveness and generality. In this paper, we extend a recently developed policy search method and use it to learn a range of dynamic manipulation behaviors with highly general policy representations, without using known models or example demonstrations. Our approach learns a set of trajectories for the desired motion skill by using iteratively refitted time-varying linear models, and then unifies these trajectories into a single control policy that can generalize to new situations. To enable this method to run on a real robot, we introduce several improvements that reduce the sample count and automate parameter selection. We show that our method can acquire fast, fluent behaviors after only minutes of interaction time, and can learn robust controllers for complex tasks, including stacking large Lego blocks, putting together a plastic toy, placing wooden rings onto tight-fitting pegs, and screwing bottle caps onto bottles.
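The abstract says trajectories are learned using iteratively refitted time-varying linear models. As a minimal sketch, assuming plain per-time-step least squares (the authors' fitting procedure may add priors or regularization that this omits), one refit step over a batch of rollouts can look like the following. The function name `fit_time_varying_linear` and the array layout are hypothetical.

```python
import numpy as np

def fit_time_varying_linear(X, U):
    """Fit x_{t+1} ~= A_t x_t + B_t u_t + c_t separately at each time step.

    Hypothetical layout: X is (N, T+1, dx) states from N rollouts,
    U is (N, T, du) actions. Returns lists of A_t, B_t, c_t.
    """
    N, T_plus_1, dx = X.shape
    du = U.shape[2]
    As, Bs, cs = [], [], []
    for t in range(T_plus_1 - 1):
        # Regressors [x_t, u_t, 1] for every rollout at time t.
        Z = np.hstack([X[:, t], U[:, t], np.ones((N, 1))])
        # Least-squares solve Z @ W ~= x_{t+1}; W has shape (dx + du + 1, dx).
        W, *_ = np.linalg.lstsq(Z, X[:, t + 1], rcond=None)
        As.append(W[:dx].T)           # rows multiplying x_t give A_t^T
        Bs.append(W[dx:dx + du].T)    # rows multiplying u_t give B_t^T
        cs.append(W[-1])              # constant column gives c_t
    return As, Bs, cs

# Synthetic check: generate rollouts from known time-varying dynamics, then refit.
rng = np.random.default_rng(0)
N, T, dx, du = 20, 5, 2, 1
A_true = rng.normal(size=(T, dx, dx))
B_true = rng.normal(size=(T, dx, du))
c_true = rng.normal(size=(T, dx))
U = rng.normal(size=(N, T, du))
X = np.zeros((N, T + 1, dx))
X[:, 0] = rng.normal(size=(N, dx))
for t in range(T):
    X[:, t + 1] = X[:, t] @ A_true[t].T + U[:, t] @ B_true[t].T + c_true[t]
As, Bs, cs = fit_time_varying_linear(X, U)
```

Because these synthetic transitions are noise-free and each time step has more rollouts (20) than regressors (4), the fit recovers the true matrices. With few noisy rollouts per step, plain least squares can become poorly conditioned, so practical fits typically regularize it.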

Article: Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

Achieving efficient and scalable exploration in complex domains poses a major challenge in reinforcement learning. While Bayesian and PAC-MDP approaches to the exploration problem offer strong formal guarantees, they are often impractical in higher dimensions due to their reliance on enumerating the state-action space. Hence, exploration in complex domains is often performed with simple epsilon-greedy methods. To achieve more efficient exploration, we develop a method for assigning exploration bonuses based on a concurrently learned model of the system dynamics. By parameterizing our learned model with a neural network, we are able to develop a scalable and efficient approach to exploration bonuses that can be applied to tasks with complex, high-dimensional state spaces. We demonstrate our approach on the task of learning to play Atari games from raw pixel inputs. In this domain, our method offers substantial improvements in exploration efficiency when compared with the standard epsilon-greedy approach. As a result of our improved exploration strategy, we are able to achieve state-of-the-art results on several games that pose a major challenge for prior methods.
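A rough sketch of an exploration bonus driven by a concurrently learned dynamics model: the bonus is the squared error of the model's next-state prediction, so transitions the model has already learned earn little extra reward. This assumes a linear model on precomputed state features in place of the paper's neural network and pixel encoding; the class `PredictionErrorBonus`, its parameters `lr` and `beta`, and the running-max normalization are illustrative choices, not the authors' exact formulation.

```python
import numpy as np

class PredictionErrorBonus:
    """Exploration bonus from the prediction error of an online dynamics model.

    Assumption: a linear model W @ [s; onehot(a)] stands in for a neural
    network, and states are already low-dimensional feature vectors.
    """

    def __init__(self, state_dim, n_actions, lr=0.01, beta=0.05):
        self.n_actions = n_actions
        self.W = np.zeros((state_dim, state_dim + n_actions))
        self.lr = lr            # SGD step size for the dynamics model
        self.beta = beta        # hypothetical bonus scale
        self.max_error = 1e-8   # running max, used to normalize errors into [0, 1]

    def _features(self, s, a):
        onehot = np.zeros(self.n_actions)
        onehot[a] = 1.0
        return np.concatenate([s, onehot])

    def bonus(self, s, a, s_next):
        z = self._features(s, a)
        residual = s_next - self.W @ z
        error = float(residual @ residual)
        self.max_error = max(self.max_error, error)
        # One SGD step on 0.5 * ||s_next - W z||^2, so repeated transitions stop paying a bonus.
        self.W += self.lr * np.outer(residual, z)
        return self.beta * error / self.max_error

    def augmented_reward(self, r, s, a, s_next):
        # Environment reward plus the novelty bonus.
        return r + self.bonus(s, a, s_next)

# Usage: the same transition repeated becomes "familiar" and its bonus shrinks.
explorer = PredictionErrorBonus(state_dim=2, n_actions=2)
s, s_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
bonuses = [explorer.bonus(s, 0, s_next) for _ in range(200)]
novel = explorer.bonus(np.array([0.0, 1.0]), 1, np.array([1.0, 0.0]))
```

Unlike epsilon-greedy, which perturbs actions uniformly at random, this shapes the reward itself, so the agent is steered toward transitions its model predicts poorly.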

Article: End-to-End Training of Deep Visuomotor Policies

Policy search methods based on reinforcement learning and optimal control can allow robots to automatically learn a wide range of tasks. However, practical applications of policy search tend to require the policy to be supported by hand-engineered components for perception, state estimation, and low-level control. We propose a method for learning policies that map raw, low-level observations, consisting of joint angles and camera images, directly to the torques at the robot’s joints. The policies are represented as deep convolutional neural networks (CNNs) with 92,000 parameters. The high dimensionality of such policies poses a tremendous challenge for policy search. To address this challenge, we develop a sensorimotor guided policy search method that can handle high-dimensional policies and partially observed tasks. We use the Bregman alternating direction method of multipliers (BADMM) to decompose policy search into an optimal control phase and a supervised learning phase, allowing CNN policies to be trained with standard supervised learning techniques. This method can learn a number of manipulation tasks that require close coordination between vision and control, including inserting a block into a shape sorting cube, screwing on a bottle cap, fitting the claw of a toy hammer under a nail with various grasps, and placing a coat hanger on a clothes rack.
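The split into an optimal control phase and a supervised learning phase can be illustrated with a toy alternating-direction loop. This sketch uses a squared-Euclidean penalty (the plain-ADMM special case of Bregman ADMM), a hypothetical quadratic trajectory cost, and a scalar linear policy instead of a CNN; the function `admm_consensus` and its parameters `rho` and `iters` are illustrative only and do not reproduce the paper's algorithm.

```python
import numpy as np

def admm_consensus(x, u_star, rho=1.0, iters=500):
    """Toy ADMM split of 'trajectory optimization' and 'policy regression'.

    Trajectory phase: minimize the hypothetical cost ||u - u_star||^2.
    Supervised phase: fit a scalar linear policy pi(x) = theta * x by least squares.
    Agreement constraint: u_i == theta * x_i for every sample i.
    """
    u = u_star.astype(float).copy()
    theta = 0.0
    lam = np.zeros_like(u)  # Lagrange multipliers on the constraint u - theta * x = 0
    for _ in range(iters):
        # Trajectory phase: closed-form minimizer of the cost plus augmented-Lagrangian terms.
        u = (2.0 * u_star - lam + rho * theta * x) / (2.0 + rho)
        # Supervised phase: least-squares regression of the policy onto u + lam / rho.
        target = u + lam / rho
        theta = float(x @ target / (x @ x))
        # Dual ascent pushes the two phases toward agreement.
        lam = lam + rho * (u - theta * x)
    return u, theta

rng = np.random.default_rng(0)
x = rng.normal(size=50)                        # hypothetical observations
u_star = 0.7 * x + 0.1 * rng.normal(size=50)   # hypothetical cost-optimal actions
u, theta = admm_consensus(x, u_star)
```

At convergence the policy cannot represent `u_star` exactly, so the constraint forces the trajectory phase to settle on the best actions the policy can reproduce: here `theta` equals the least-squares fit of `u_star` on `x`, and `u` matches `theta * x`. The point of the split is that the policy update is an ordinary regression, which is what lets a large network be trained with standard supervised tools.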

Article: Gradient Estimation Using Stochastic Computation Graphs

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world. Estimating the gradient of this loss function from samples lies at the core of gradient-based learning algorithms for these problems. We introduce the formalism of stochastic computation graphs (directed acyclic graphs that include both deterministic functions and conditional probability distributions) and describe how to easily and automatically derive an unbiased estimator of the loss function’s gradient. The resulting algorithm for computing the gradient estimator is a simple modification of the standard backpropagation algorithm. The generic scheme we propose unifies estimators derived in a variety of prior work, along with the variance-reduction techniques therein. It could assist researchers in developing intricate models involving a combination of stochastic and deterministic operations, enabling, for example, attention, memory, and control actions.
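The two classic unbiased estimators that such a scheme unifies can be seen on a single Gaussian node, a sketch rather than the paper's general graph algorithm. For z ~ N(theta, 1) and cost f(z) = z², the expectation is theta² + 1, so the true gradient is 2·theta. The score-function (likelihood-ratio) estimator multiplies the cost by the gradient of log N(z; theta, 1), which is (z - theta), and needs no derivative of f; the pathwise (reparameterization) estimator writes z = theta + eps and differentiates through f. The function names here are illustrative.

```python
import numpy as np

def score_function_grad(theta, f, n_samples, rng):
    """Score-function estimate of d/dtheta E_{z ~ N(theta, 1)}[f(z)].

    Uses d/dtheta log N(z; theta, 1) = (z - theta); f need not be differentiable.
    """
    z = rng.normal(loc=theta, scale=1.0, size=n_samples)
    return float(np.mean(f(z) * (z - theta)))

def pathwise_grad(theta, df, n_samples, rng):
    """Pathwise estimate: z = theta + eps, so d f(z)/d theta = f'(z)."""
    eps = rng.normal(size=n_samples)
    return float(np.mean(df(theta + eps)))

def f(z):
    return z ** 2        # E[f(z)] = theta**2 + 1, true gradient 2 * theta

def df(z):
    return 2.0 * z

rng = np.random.default_rng(0)
theta = 1.5
g_score = score_function_grad(theta, f, 1_000_000, rng)
g_path = pathwise_grad(theta, df, 1_000_000, rng)
```

Both estimates land near 2 * 1.5 = 3.0, but on this example the per-sample score-function terms have much higher variance than the pathwise ones, which is why variance-reduction techniques such as baselines matter when a stochastic node cannot be reparameterized.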

Article: Deep Learning Helicopter Dynamics Models

We consider the problem of system identification of helicopter dynamics. Helicopters are complex systems, coupling rigid body dynamics with aerodynamics, engine dynamics, vibration, and other phenomena. As a result, they pose a challenging system identification problem, especially when considering non-stationary flight regimes. We pose the dynamics modeling problem as direct high-dimensional regression and, inspired by recent results in deep learning, represent the helicopter dynamics with a Rectified Linear Unit (ReLU) Network Model, a hierarchical neural network model. We provide a simple method for initializing the parameters of the model, along with optimization details for training. We describe three baseline models and show that the ReLU Network Model significantly outperforms them in experiments on real data, indicating that the model can capture useful structure in system dynamics across a rich array of aerobatic maneuvers. Specifically, the ReLU Network Model achieves a 58% overall improvement in RMS acceleration prediction over state-of-the-art methods. Predicting acceleration along the helicopter’s up-down axis is empirically found to be the most difficult, and there the ReLU Network Model improves by 60% over the prior state of the art. We discuss explanations of these performance gains and also investigate the impact of hyperparameters in the novel model.
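To make the reported metric concrete, here is a sketch of a ReLU network forward pass together with RMS error and relative improvement. The layer sizes (12 inputs, 32 hidden units per layer, 6 outputs), the fan-in scaled random initialization, and the helper names are hypothetical, not the paper's architecture or initialization scheme, and the paper's exact aggregation of RMS error across axes and maneuvers may differ.

```python
import numpy as np

def relu_network(x, weights, biases):
    """Fully connected network: ReLU on every hidden layer, linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    return h @ weights[-1] + biases[-1]

def rms_error(pred, target):
    """Root-mean-square error over all samples and output dimensions."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def relative_improvement(baseline_rms, model_rms):
    """Fractional reduction in RMS error versus a baseline (0.58 means 58%)."""
    return (baseline_rms - model_rms) / baseline_rms

# Hypothetical sizes: 12 input features, two hidden layers of 32 units, 6 predicted accelerations.
rng = np.random.default_rng(0)
sizes = [12, 32, 32, 6]
# Simple fan-in scaled random init (an assumption, not the paper's initialization).
weights = [rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
x = rng.normal(size=(5, 12))
pred = relu_network(x, weights, biases)
```

Under this reading, a baseline RMS error of 10.0 reduced to 4.2 (in whatever acceleration units the data use) is `relative_improvement(10.0, 4.2)`, a 58% improvement.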