This is an experimental plugin for Appium that enables test automation of mobile apps using Test.ai's machine-learning element type classifier. It allows you to find Appium elements using a semantic label (like "cart" or "microphone" or "arrow") instead of having to dig through your app hierarchy. The same labels can be used to find elements with the same general shape across different apps and different visual designs.
In addition to being an Appium plugin, this project contains a small standalone server, along with clients in various programming languages, that provides the same functionality for Selenium (see further below).
If you haven't worked with Appium element finding plugins before, you should first check out the Appium element finding plugins doc.
First, you'll need some system dependencies related to image processing.

On macOS:

```
brew install pkg-config cairo pango libpng jpeg giflib
```

On Linux (Debian/Ubuntu):

```
sudo apt-get install pkg-config libcairo2-dev libpango* libpng-dev libjpeg-dev giflib*
```
If you run into issues, you may have to install each package individually.
On Windows: TBD (not yet tested or supported).
Appium's element finding plugin feature is experimental, so you will need Appium version 1.9.2-beta.2 at a minimum. Also, be sure you are using either the XCUITest driver (for iOS) or the UiAutomator2 or Espresso driver (for Android). The older iOS and Android drivers do not support the required Appium capabilities, and are deprecated in any case.
If you wish to take advantage of the object detection mode for the plugin (see below), you'll need Appium 1.13.0 or higher.
To make this plugin available to Appium, you have three options:
- Simply go to the directory where Appium is installed (whether a git clone, or installed in the global `node_modules` directory by NPM), and run `npm install test-ai-classifier` to install this plugin into Appium's dependency tree and make it available.
- Install it anywhere on your filesystem and use an absolute path as the module name (see below).
- Install it globally (`npm install -g test-ai-classifier`) and make sure your `NODE_PATH` is set to the global `node_modules` dir.
Element finding plugins are made available via a special locator strategy, `-custom`. To tell Appium which plugin to use when this locator strategy is requested, send in the module name and a selector shortcut as the `customFindModules` capability. For example, to use this plugin, set the `customFindModules` capability to something like `{"ai": "test-ai-classifier"}` (here `ai` is the "selector shortcut" and `test-ai-classifier` is the "module name"). This will enable access to the plugin when using selectors of the form `ai:foo` (or simply `foo` if this is the only custom find module you are using with Appium).
In addition to this capability, you'll need to set another Appium capability, `shouldUseCompactResponses`, to `false`. This directs Appium to include extra information about elements while they are being found, which dramatically speeds up the process of getting inputs to this plugin.
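Putting these pieces together, here is a minimal sketch of session capabilities (the platform, device, and app values below are illustrative placeholders; only `customFindModules` and `shouldUseCompactResponses` relate to this plugin):

```js
// Sketch of Appium session capabilities for using this plugin.
// deviceName and app are placeholders -- substitute your own values.
const capabilities = {
  platformName: 'iOS',
  automationName: 'XCUITest',
  deviceName: 'iPhone Simulator',
  app: '/path/to/your.app',
  customFindModules: {ai: 'test-ai-classifier'},
  shouldUseCompactResponses: false,
};
```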
In your test, you can now make new `findElement` calls, for example:

```js
driver.findElement('-custom', 'ai:cart');
```
The above command (which will differ for each Appium client, of course) will use this plugin to find a shopping cart element on the screen.
How did we know we could use "cart" as a label? There is a predefined list of available labels in `lib/labels.js`; check there to see if the elements you want to find match any of them.
Using the `testaiConfidenceThreshold` capability, you can set a confidence threshold below which the plugin will refuse to consider elements as matching your label. This capability should be a number between 0 and 1, where 1 means confidence must be perfect, and 0 means no confidence at all is required. It is a useful capability to set after reading the Appium logs from a failed element find; the plugin reports the highest confidence of any element that matched your label, which you can use to adjust the threshold. The default confidence threshold is `0.2`.
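For example, to demand stronger matches than the default allows, you could raise the threshold (a sketch extending the hypothetical capabilities object from above):

```js
// Only accept elements the classifier labels with at least 0.5 confidence
capabilities.testaiConfidenceThreshold = 0.5;
```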
There are two ways that this plugin can attempt to find elements:
- The default mode uses Appium to get a list of all leaf-node elements, and can be specified by setting the `testaiFindMode` capability to `element_lookup`. Images of these elements are collected and sent to the Test.ai classifier for labeling. Matched elements are returned to your test script as full-blown `WebElement`s, just as if you were using any of the standard Appium locator strategies.
- The alternative mode takes a single screenshot, and uses an object detection network to attempt to identify screen regions of interest. These regions are then sent into the classifier for labeling. Matched regions are returned to your test script as Appium ImageElements (meaning that all you can do with them is click/tap them). This mode can be specified by setting the `testaiFindMode` capability to `object_detection`. By default, no output is logged from the native object detection code (apart from what TensorFlow itself does), but this can be turned on by setting the `testaiObjDetectionDebug` capability to `true` (see the capability sketch just after this list).
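As a sketch (again extending the hypothetical capabilities object from above), selecting object detection mode with its debug output enabled looks like this:

```js
// Use the screenshot-based object detection mode instead of element lookup
capabilities.testaiFindMode = 'object_detection';
// Also log output from the native object detection code
capabilities.testaiObjDetectionDebug = true;
```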
Each of these modes comes with different benefits and drawbacks:
Element lookup mode returns full-blown elements to your test script, which means you can perform any standard actions on them. However, leaf-node elements are not always easy for the classifier to label. For example, in iOS it is common to have a single element with both an icon and text as part of the element, and this kind of element will never be labeled with high confidence. Element lookup mode is also especially slow in cases where there are many elements.
Object detection mode is not limited to actual UI elements, as it deals only with an image of the screen. So, it can accurately find icons to label even if those icons are mixed with other content in their UI element form. Object detection is currently slow, but in principle it is faster (at least in the limit) than element lookup mode. The main drawback is that elements returned to your script are really just representations of screen regions, not full-blown UI elements. So all that can be done with them is clicking/tapping them (of course, that's typically all you would do with an icon anyway).
Currently, detecting objects in a screenshot is quite slow.
Object detection mode relies on C/C++ code which is built on install. This code is portable but may not compile on some systems.
The TensorFlow network used to run the object detection strategy is provided as a free download by Test.ai, and downloaded automatically on install. If something goes wrong or you want to download it manually, you can run:
```
node ./build-js/lib/download.js
```
This will not re-download the model if the MD5 hash of the model online matches what is currently downloaded.
While the functionality provided by this project is available as a plugin for direct use with Appium, it can also be used for arbitrary purposes. In that case, it must be run as a server, which accepts connections from clients written in a number of languages. These clients can ask the server to classify images. The clients also expose a method that takes a Selenium driver object and finds elements matching a label.
Run the server with:

```
test-ai-classifier -h <HOST> -p <PORT>
```

The default host is `127.0.0.1` and the default port is `50051`.
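For example, invoking it with the default values spelled out explicitly (shown purely for illustration):

```
test-ai-classifier -h 127.0.0.1 -p 50051
```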
For information on how to use the clients to take advantage of the server's functionality, see the repositories for each of them:
There are some limitations to how the Selenium support works, because it relies on the `getElementScreenshot` functionality, which is not yet well supported by all the major browsers. (In my testing, Chrome was the most reliable.)
There are some tests, but they must be run ad hoc. See the tests themselves for assumptions.