Photo by @beaz

Google AI and Evolutionary Strategies

Exploring evolutionary strategies and model-agnostic meta-learning in robotics

If you are confused by the headline, do not be alarmed.

“…moving trained policies from “sim-to-real” remains one of the greatest challenges of modern robotics, due to the subtle differences encountered between the simulation and real domains, termed the “reality gap”.”

The Google AI researchers are discussing the rapid development of more accurate simulation engines.

Two common approaches to bridging this reality gap are:

  1. Offline reinforcement learning
  2. Domain randomisation
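Domain randomisation can be sketched as sampling a fresh set of simulator physics parameters for every training episode, so the policy must work across a whole family of dynamics. The parameter names and ranges below are hypothetical, purely for illustration:

```python
import numpy as np

def randomized_sim_params(rng):
    """Domain randomisation sketch: draw simulator physics parameters
    from broad ranges each episode (hypothetical names and ranges)."""
    return {
        "mass_kg":      rng.uniform(8.0, 15.0),   # robot mass
        "friction":     rng.uniform(0.4, 1.2),    # ground friction coefficient
        "motor_torque": rng.uniform(0.7, 1.0),    # torque scaling (e.g. low battery)
        "latency_s":    rng.uniform(0.0, 0.04),   # actuation latency
    }

rng = np.random.default_rng(0)
# one randomised configuration per training episode
episodes = [randomized_sim_params(rng) for _ in range(3)]
```

A policy trained under enough such variation has a better chance of covering the dynamics it will actually meet on hardware.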
Above: In the game of Pong, the policy could take the pixels of the screen and compute the probability of moving the player’s paddle (in green, on the right) Up, Down, or neither. (Picture by OpenAI)
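The idea in the caption can be sketched as a toy policy that maps flattened screen pixels to a probability distribution over the three paddle actions. This is a hypothetical linear policy for illustration, not OpenAI's actual model:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: logits -> probabilities."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class PixelPolicy:
    """Toy linear policy: flattened screen pixels -> action probabilities."""
    def __init__(self, n_pixels, n_actions=3, seed=0):
        rng = np.random.default_rng(seed)
        # small random weights; in practice these are learned
        self.W = rng.normal(scale=0.01, size=(n_actions, n_pixels))

    def action_probs(self, pixels):
        return softmax(self.W @ pixels.ravel())

policy = PixelPolicy(n_pixels=80 * 80)
screen = np.zeros((80, 80))             # stand-in for a preprocessed frame
probs = policy.action_probs(screen)     # probabilities for [Up, Down, Neither]
```

The agent then samples an action from `probs`, which is what makes the policy stochastic.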

“In ES, we forget entirely that there is an agent, an environment, that there are neural networks involved, or that interactions take place over time, etc.”

The optimisation is a “guess and check” process.
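That guess-and-check loop can be written down as a basic evolutionary-strategies optimiser: perturb the parameters with Gaussian noise (guess), score each perturbation (check), and move toward the better guesses. The toy objective and all hyperparameters below are illustrative:

```python
import numpy as np

def evolution_strategies(f, theta, sigma=0.1, lr=0.02, pop=50, iters=200, seed=0):
    """Basic ES: treat f as a black box, no gradients required."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        eps = rng.normal(size=(pop, theta.size))            # guess: random perturbations
        rewards = np.array([f(theta + sigma * e) for e in eps])  # check: score each guess
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # standardise
        theta = theta + (lr / (pop * sigma)) * (eps.T @ rewards)  # move toward good guesses
    return theta

# toy objective: maximise f(x) = -||x - 3||^2, optimum at x = [3, 3]
f = lambda x: -np.sum((x - 3.0) ** 2)
best = evolution_strategies(f, np.zeros(2))
```

Note that nothing in the loop knows about agents, environments, or time steps; it only sees parameters in and a score out, exactly as the quote above describes.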

Their algorithm quickly: “adapts a legged robot’s policy to dynamics changes. In this example, the battery voltage dropped from 16.8V to 10V which reduced motor power, and a 500g mass was also placed on the robot’s side, causing it to turn rather than walk straight. The policy is able to adapt in only 50 episodes (or 150s of real-world data).”

“at a high level, meta-learning learns to solve an incoming task quickly without completely retraining from scratch, by combining past experiences with small amounts of experience from the incoming task.”

Most of the past experiences come cheaply from simulation.

“However, increasing the policy’s stochasticity may also benefit exploration, as the policy needs to use random actions to probe the type of environment to which it adapts.”

“ES-MAML, an algorithm that leverages a drastically different paradigm for high-dimensional optimisation — evolutionary strategies.”

That is Evolutionary Strategies — Model Agnostic Meta-Learning.

This approach updates the policy based solely on the total reward collected, treating the whole problem as black-box optimisation.

“This flexibility is critical for efficient adaptation of locomotion meta-policies. Our results show that adaptation with ES can be conducted with a small number of additional on-robot episodes. Thus, ES is no longer just an attractive alternative to the state-of-the-art algorithms, but defines a new state of the art for several challenging RL tasks.”
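As a rough illustration of that adaptation phase, the sketch below starts from meta-learned parameters and spends a small budget of episodes on ES updates under the new dynamics. The `rollout_reward` function, population size, and toy objective are assumptions for the sketch, not the paper's actual setup:

```python
import numpy as np

def es_adapt(rollout_reward, theta_meta, episodes=50, sigma=0.05, lr=0.05, pop=10, seed=0):
    """Adaptation sketch: from a meta-learned starting point, spend a
    small episode budget on ES perturbations under the new dynamics."""
    rng = np.random.default_rng(seed)
    theta = theta_meta.copy()
    for _ in range(episodes // pop):    # each update costs `pop` episodes
        eps = rng.normal(size=(pop, theta.size))
        r = np.array([rollout_reward(theta + sigma * e) for e in eps])
        r = (r - r.mean()) / (r.std() + 1e-8)
        theta = theta + (lr / (pop * sigma)) * (eps.T @ r)
    return theta

# toy "changed dynamics": reward now peaks at all-ones parameters
rollout = lambda th: -np.sum((th - 1.0) ** 2)
theta_meta = np.zeros(4)                # stand-in for meta-learned parameters
theta_adapted = es_adapt(rollout, theta_meta)
```

The point of the meta-learning step is that `theta_meta` already sits close to good solutions, so a budget of around 50 episodes can be enough to close the remaining gap.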

For a more in-depth read, check out the original Google AI blog post, which goes into far more detail.

Written by

AI Policy and Ethics. Student at the University of Copenhagen, MSc in Social Data Science. All views are my own.
