The `Agent` class is predefined in the `agent/` folder, with implementations for the OpenAI interface (based on oneapi) and the currently deployed GLM interface. To add a new base model, you need to:
- Create a new Python file under the `agent/` directory and, referring to `agent/model/OpenAIAgent`, implement your model call by inheriting the `Agent` class. The input of the `act` function is already organized in the OpenAI message format, and its output should be a string. If your model's input format differs from OpenAI's, refer to the `format_history` function in `claude_model` and the `prompt_to_message` function in `qwen_model` for the necessary modifications: `format_history` reorganizes the conversation history, and `prompt_to_message` converts the current turn's prompt and image input (if any) into the single-turn format of your model. (A minimal sketch of such an agent is given at the end of this section.)
- Import your new class in `agent/__init__.py`.
- Replace the content under `agent` in the config file used by `eval.py` with:
  ```yaml
  agent:
    name: Your Agent Module Name
    args:
      max_new_tokens: 512
  ```
Make sure the name matches your implemented class name; the content under `args` will be passed to your class's `__init__` function.
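For orientation, here is a minimal sketch of what such a subclass might look like. The import path, the constructor signature, and the placeholder body of `act` are assumptions for illustration only; use `agent/model/OpenAIAgent` as the authoritative reference.

```python
# Minimal sketch of a custom agent -- adapt to the real Agent interface.
from agent import Agent  # assumed import path

class MyModelAgent(Agent):
    def __init__(self, max_new_tokens=512, **kwargs):
        # `max_new_tokens` arrives from the `args` section of the config file.
        super().__init__(**kwargs)
        self.max_new_tokens = max_new_tokens

    def act(self, messages):
        # `messages` follows the OpenAI chat format, e.g.
        # [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}]
        # Replace the placeholder below with a real call to your model and
        # return the model's reply as a plain string.
        last_user_turn = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        return f"(placeholder response to: {last_user_turn})"
```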
When writing a new task, it is equally important to verify, through actual running results, that your code is correct. Please follow the steps below to ensure each new task is error-free.
- Write your task. A task consists of a yaml file, an evaluation method, and installation of the corresponding mobile app.
    - The task's yaml file should be written with reference to the existing files under `evaluation/config` and must include `task_id`, `task`, `metric_type`, and `metric_func`. `adb_query` is only needed when the results have to be queried via adb commands. Although `category` is not yet in use, it is strongly recommended to add it. (An illustrative yaml file is sketched after this list.)
    - The evaluation method needs to inherit the `evaluation/task/SingleTask` class. After each recorded operation, the `judge` function is executed, and its return value is a dict: `{"judge_page": bool, "1": bool, ..., "complete": bool}`. The code records the judgment result of the last page where `judge_page` is `True`, and `complete` should only be set to `True` if all judgment points are correct. For tasks that compare return values, the `check_answer` method is already implemented; set `final_ground_truth` to the standard answer before calling it. (A sketch of such a class is also given after this list.)
    - Referring to other tasks, import all evaluation methods in `evaluation/app_name/__init__.py` into the `function_map` class.
    - To ensure the model can execute the launch command correctly, add the app name and its corresponding package name to `templates/packages/apps_dict`. The package name can be obtained by executing `adb -s {device} shell dumpsys window | grep mCurrentFocus | awk -F '/' '{print $1}' | awk '{print $NF}'`.
- Execute your task using at least the most advanced agent and generate evaluation results. If necessary, quickly perform the correct operations yourself during the intervals between model operations, so that the recording captures the correct result page between two model operations; this lets you test whether your code can detect it.
- Use the `tools/check_result_multiprocess.py` script to generate screenshots of each step. Focus on checking whether the screenshots of correct model operations are indeed judged as correct.
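For reference, a hypothetical task yaml might look like the one below. All field values are placeholders; only the required keys (`task_id`, `task`, `metric_type`, `metric_func`) come from the rules above, and the accepted values should be taken from the existing files under `evaluation/config`.

```yaml
# Illustrative task definition -- values are placeholders, not a real task.
task_id: example_contacts_001
task: "Add a contact named Alice in the Contacts app."
metric_type: single          # copy whichever value the existing configs use
metric_func: check_contact_added
category: contacts           # not used yet, but recommended
# adb_query: ...             # only when the result must be queried via adb
```

A matching evaluation class could then be sketched as follows, assuming `SingleTask` calls a `judge` hook roughly as described above; the import path, the argument name, and the page-inspection logic are illustrative:

```python
# Illustrative evaluation class -- adapt to the actual SingleTask interface.
from evaluation.task import SingleTask  # assumed import path

class CheckContactAdded(SingleTask):
    def judge(self, page):
        # Default: this page is not a result page worth judging.
        result = {"judge_page": False, "1": False, "complete": False}
        # Only judge pages that look like the contact list (illustrative check).
        if "Contacts" in page:
            result["judge_page"] = True
            result["1"] = "Alice" in page     # judgment point 1
            result["complete"] = result["1"]  # True only if all points pass
        return result
```

Remember to also register the new class via `function_map` in `evaluation/app_name/__init__.py` and to add the app's package name to `templates/packages/apps_dict`, as described above.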
If you want to define a mobile snapshot different from the Android Eval snapshot, follow these steps:
- Download the related Docker files from this link: https://drive.google.com/file/d/1xpPEzVof5hrt5bQY6BHm_4Uoyq5mJQNb/view?usp=drive_link
- Extract the file, enter the extracted folder, and then run:
  ```bash
  docker build -t android_eval_no_avd:latest .
  ```
- Configure your AVD snapshot on an x86_64 machine (it is recommended to configure it directly using Android Studio). Note that the default installed Android AVD type is:
  ```dockerfile
  RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'platform-tools' 'emulator' 'system-images;android-33;google_apis;x86_64'"
  RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'build-tools;33.0.0'"
  RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'platforms;android-33'"
  ```
If you want to configure an AVD for a different Android version, modify the corresponding version numbers in the Dockerfile, as shown in the sketch below. Note that the version numbers must match exactly; otherwise, the installed image will not be able to read the existing cache.
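For example, targeting API 34 instead of API 33 would look roughly like this (a sketch only; the same API level must then also be used when creating the AVD and in the `target` / `image.sysdir.1` entries further below):

```dockerfile
RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'platform-tools' 'emulator' 'system-images;android-34;google_apis;x86_64'"
RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'build-tools;34.0.0'"
RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'platforms;android-34'"
```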
- You can use the following command to generate the AVD image used in Docker:
  ```bash
  python tools/modify_mobile_to_docker.py \
      --avd_dir /Path/to/your/.android/avd \
      --device_name "your device name" \
      --save_dir /Path/to/your/save/avd
  ```
Alternatively, you can make the modifications manually: find your `.avd` folder and `.ini` file through Android Studio -> Virtual Devices Manager -> Right-click -> Show on Disk, and apply the following changes:
In `Pixel_7_Pro_API_33.ini`, modify `path` and `path.rel` to the following paths:
```ini
avd.ini.encoding=UTF-8
path=/root/.android/avd/device name.avd
path.rel=avd/device name.avd
target=android-33
```
In `Pixel_7_Pro_API_33.avd/config.ini`, modify the following paths:
```ini
...
image.sysdir.1 = system-images/android-33/google_apis/x86_64/
...
skin.path = /root/.android/skins/pixel_7_pro
...
```
Keep the other contents unchanged.
- Start a container from the image and copy your `.avd` folder and `.ini` file into it:
  ```bash
  docker run -it android_eval_no_avd:latest /bin/bash
  docker cp "/path/to/your/device name.avd" container_id:/root/.android/avd
  docker cp "/path/to/your/device name.ini" container_id:/root/.android/avd
  ```
After completing the above, you can execute the following inside the container to verify that the installation succeeded:
```bash
emulator -avd "device name" -no-window -no-audio -no-snapshot-save
```
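If you want a more explicit check that the emulator has actually booted inside the container, one option (not part of the original instructions) is to query the boot property with adb from a second shell in the same container:

```bash
# Wait until the emulator registers with adb, then check the boot flag
# (prints 1 once the system has finished booting).
adb wait-for-device
adb shell getprop sys.boot_completed
```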