What does this writing signal mean?

OpenAI Writing: Solving Rubik’s Cube with a robot hand

Captured source

source ↗

openai.com/openai.com/index/solving-rubiks-cube

Solving Rubik’s Cube with a robot hand

Source ↗

published Oct 15, 2019seen 6dcaptured 2dhttp 200method exa

Solving Rubik’s Cube with a robot hand | OpenAI

October 15, 2019

Solving Rubik’s Cube with a robot hand

Read paper Watch all videos

Photo: Eric Haines

Loading…

We’ve trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand. The neural networks are trained entirely in simulation, using the same reinforcement learning code as OpenAI Five⁠ paired with a new technique called Automatic Domain Randomization (ADR). The system can handle situations it never saw during training, such as being prodded by a stuffed giraffe⁠. This shows that reinforcement learning isn’t just a tool for virtual tasks, but can solve physical-world problems requiring unprecedented dexterity.

Human hands let us solve a wide variety of tasks. For the past 60 years of robotics, hard tasks which humans accomplish with their fixed pair of hands have required designing a custom robot for each task⁠. As an alternative, people have spent many decades trying to use general-purpose robotic hardware⁠, but with limited success due to their high degrees of freedom. In particular, the hardware we use here is not new—the robot hand we use has been around for the last 15 years—but the software approach is.

Since May 2017, we’ve been trying to train a human-like robotic hand to solve the Rubik’s Cube⁠. We set this goal because we believe that successfully training such a robotic hand to do complex manipulation tasks lays the foundation for general-purpose robots. We solved the Rubik’s Cube in simulation in July 2017. But as of July 2018, we could only manipulate a block⁠ on the robot. Now, we’ve reached our initial goal.

A full solve of the Rubik’s Cube. This video plays at real-time and was not edited in any way.

Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it. Our robot still hasn’t perfected its technique⁠ though, as it solves the Rubik’s Cube 60% of the time (and only 20% of the time for a maximally difficult⁠ scramble).

Our approach

We train neural networks to solve the Rubik’s Cube in simulation⁠ using reinforcement learning and Kociemba’s algorithm⁠ for picking the solution steps.A Domain⁠ randomization⁠ enables networks trained solely in simulation to transfer to a real robot.

The biggest challenge we faced was to create environments in simulation diverse enough to capture the physics of the real world. Factors like friction, elasticity and dynamics are incredibly difficult to measure and model for objects as complex as Rubik’s Cubes or robotic hands and we found that domain randomization alone is not enough.

To overcome this, we developed a new method called Automatic Domain Randomization (ADR), which endlessly generates progressively more difficult environments in simulation.B This frees us from having an accurate model of the real world, and enables the transfer of neural networks learned in simulation to be applied to the real world.

ADR starts with a single, nonrandomized environment, wherein a neural network learns to solve Rubik’s Cube. As the neural network gets better at the task and reaches a performance threshold, the amount of domain randomization is increased automatically. This makes the task harder, since the neural network must now learn to generalize to more randomized environments. The network keeps learning until it again exceeds the performance threshold, when more randomization kicks in, and the process is repeated.

One of the parameters we randomize is the size of the Rubik’s Cube (above). ADR begins with a fixed size of the Rubik’s Cube and gradually increases the randomization range as training progresses. We apply the same technique to all other parameters, such as the mass of the cube, the friction of the robot fingers, and the visual surface materials of the hand. The neural network thus has to learn to solve the Rubik’s Cube under all of those increasingly more difficult conditions.

Domain randomization required us to manually specify randomization ranges, which is difficult since too much randomization makes learning difficult but too little randomization hinders transfer to the real robot. ADR solves this by automatically expanding randomization ranges over time with no human intervention. ADR removes the need for domain knowledge and makes it simpler to apply our methods to new tasks. In contrast to manual domain randomization, ADR also keeps the task always challenging with training never converging.

We compared ADR to manual domain randomization on the block flipping task, where we already had a strong baseline⁠. In the beginning ADR performs worse in terms of number of successes on the real robot. But as ADR increases the entropy, which is a measure of the complexity of the environment, the transfer performance eventually doubles over the baseline—without human tuning.

Analysis

Testing for robustness

Using ADR, we are able to train neural networks in simulation that can solve the Rubik’s Cube on the real robot hand. This is because ADR exposes the network to an endless variety of randomized simulations. It is this exposure to complexity during training that prepares the network to transfer from simulation to the real world since it has to learn to quickly identify and adjust to whatever physical world it is confronted with.

To test the limits of our method, we experiment with a variety of perturbations while the hand is solving the Rubik’s Cube. Not only does this test for the robustness of our control network but also tests our vision network, which we here use to estimate the cube’s position and orientation.

We find that our system trained with ADR is surprisingly robust to perturbations even though we never trained with them: The robot can successfully perform most flips and face rotations under all tested perturbations, though not at peak performance.

Emergent meta-learning

We believe that meta-learning⁠, or learning to learn, is an important prerequisite for building general-purpose systems, since it enables them to quickly adapt to changing conditions in their environments. The hypothesis behind ADR is that a memory-augmented networks combined with a sufficiently randomized environment leads to emergent meta-learning, where the network implements a learning algorithm that allows itself to rapidly adapt its behavior to the environment it is deployed in.C

To…

Excerpt shown — open the source for the full document.