google-deepmind/android_env


Description: RL research on Android devices.

Language: Python

License: Apache-2.0

Stars: 1223

Forks: 111

Open issues: 1

Created: 2021-04-21T09:34:44Z

Pushed: 2026-06-09T02:27:02Z

Default branch: main

Fork: no

Archived: no

README:

AndroidEnv - The Android Learning Environment

AndroidEnv is a Python library that exposes an Android device as a Reinforcement Learning (RL) environment. The library provides a flexible platform for defining custom tasks on top of the Android Operating System, including any Android application. Agents interact with the device through a universal action interface - the touchscreen - by sending localized touch and lift events to the system. The library processes these events and returns pixel observations and rewards as provided by specific task definitions. For example, rewards might be given for events such as successfully scrolling down a page, sending an email, or achieving some score in a game, depending on the research purpose and how the user configures the task.

![tests](https://github.com/google-deepmind/android_env/actions/workflows/tests.yml) ![PyPI version](https://badge.fury.io/py/android-env) ![Downloads](https://pepy.tech/project/android-env)

Environment features

There are a number of aspects that make AndroidEnv a challenging yet suitable environment for Reinforcement Learning research:

  • Allowing agents to interact with a system used daily by billions of users around the world, AndroidEnv offers a platform for RL agents to navigate, learn tasks and have direct impact in real-world contexts. The environment wraps a simulated Android device, which runs independently from the environment, completely unaltered, and works in exactly the same way as the devices that humans use, exposing exactly the same features and services.

  • The platform offers a virtually infinite range of possible tasks, all sharing a common action interface. The library facilitates the design of Reinforcement Learning tasks for any existing or custom-built Android application. For example, it exposes the broad world of Android games, ranging from card games and puzzle games to time-reactive games, all requiring a diverse set of action combinations and interaction types.

  • The environment runs on top of a real-time simulation of an Android device. In other words, the environment dynamics do not wait for the agent to deliberate, and the speed of the simulation cannot be increased.

  • The observation is a collection of RGB values corresponding to the displayed pixels on the screen. The exact screen resolution depends on the simulated device, but it is generally large by RL standards. Users have the option of downsampling each observation.

  • The learning environment has an interesting, complex action space unique to the touchscreen interface of Android.

  • The raw, hybrid action space consists of a continuous tuple signifying the action location, and a discrete signal determining whether the agent wants to touch the screen or lift its virtual finger.

  • Raw actions are highly composable: the Android UI and most applications were designed so that they could be intuitively navigated via common touchscreen gestures such as tapping, scrolling, swiping, pinching, and drag and drop. This is still the case in AndroidEnv: to trigger meaningful changes in the environment, the agent often has to perform carefully timed and positioned sequences of raw actions. For example, in order to navigate to the next image in a photo gallery, the agent would have to perform a *swipe*, touching the screen multiple times, gradually shifting the actions' positions to the right. Thus, in most contexts raw actions do not trigger changes in the state of the environment unless correctly chained together to make up a human gesture.

  • The action interface is closely related to the observation space, as meaningful touch and lift events are often either co-localized with or strongly correlated with the location or movement of salient objects in the observation. For example, the position of a button on the screen aligns with the location of the actions that trigger the button press.

  • The library provides tools for flexibly **altering the action interface** if needed for particular studies, such as discretization or hard-coding gesture skills. Still, we believe that the real challenge remains in devising agents that are capable of dealing with a large suite of diverse tasks, through acting and learning in the complex unifying action interface.
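
The action-space points above can be made concrete with a small, library-independent sketch. It models a raw action as a discrete touch/lift signal plus a continuous location, chains raw actions into a swipe, and shows one possible discretization onto a grid. The names `ActionType`, `RawAction`, `swipe` and `discretize`, the enum values, and the normalized [0, 1] coordinates are assumptions made for illustration; they are not AndroidEnv's actual API.

```python
import enum
from dataclasses import dataclass


class ActionType(enum.IntEnum):
    """Hypothetical discrete signal; the values are illustrative only."""
    TOUCH = 0
    LIFT = 1


@dataclass(frozen=True)
class RawAction:
    """One raw action: a discrete type plus a continuous (x, y) location.

    Coordinates are assumed to be normalized to [0, 1] on each axis.
    """
    action_type: ActionType
    x: float
    y: float


def swipe(start, end, num_touches=5):
    """Chain raw actions into a swipe: touches moving from start to end, then a lift."""
    if num_touches < 2:
        raise ValueError("a swipe needs at least two touch actions")
    (x0, y0), (x1, y1) = start, end
    actions = []
    for i in range(num_touches):
        t = i / (num_touches - 1)  # interpolation fraction from 0.0 to 1.0
        actions.append(
            RawAction(ActionType.TOUCH, x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        )
    # Lifting the virtual finger at the end point completes the gesture.
    actions.append(RawAction(ActionType.LIFT, x1, y1))
    return actions


def discretize(cell_index, grid_size=3):
    """Map a discrete grid cell index to a touch at the centre of that cell."""
    if not 0 <= cell_index < grid_size * grid_size:
        raise ValueError("cell_index out of range")
    row, col = divmod(cell_index, grid_size)
    return RawAction(
        ActionType.TOUCH, (col + 0.5) / grid_size, (row + 0.5) / grid_size
    )


if __name__ == "__main__":
    for action in swipe(start=(0.2, 0.5), end=(0.8, 0.5)):
        print(action)
    print(discretize(4))
```

A single `RawAction` on its own rarely changes the device state; the list returned by `swipe` is what corresponds to a human gesture. The grid in `discretize` trades spatial precision for a smaller, purely discrete action set.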

Getting started

Installation

The easiest way to get AndroidEnv is with pip:

$ python3 -m pip install android-env

Please note that /examples are not included in this package.

Alternatively, you can clone the repository and install from its main branch:

$ git clone https://github.com/google-deepmind/android_env/
$ cd android_env
$ python3 -m pip install .
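
After either installation route, a quick check using only the Python standard library can confirm that the package is visible to the interpreter. This is a minimal sketch: the helper name `describe_install` is hypothetical, and it assumes the import name is `android_env` (matching the repository name) and the distribution name is `android-env` (as in the pip command above).

```python
import importlib.metadata
import importlib.util


def describe_install(module_name="android_env", dist_name="android-env"):
    """Return a short status string for an installed (or missing) package."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        return (
            f"{module_name} is not importable; "
            f"try: python3 -m pip install {dist_name}"
        )
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # The module exists but no distribution metadata was found under dist_name.
        version = "unknown (no distribution metadata found)"
    return f"{module_name} is importable, version {version}"


if __name__ == "__main__":
    print(describe_install())
```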

Update: the environment now runs on Windows, but please keep in mind that this option is not well-maintained or widely supported, as Unix-based systems are the primary target platforms of this project.

Create a simulator

Before running the environment, you will need access to an emulated Android device. For instructions on creating a virtual Android device, see the Emulator guide.

Define a task

Then, you…
