TESTED is a semi-supervised, multi-objective, model-based explanation system. The code is a refactoring of decades of work by dozens of Ph.D. students.
TESTED assumes that the best way to test "it" is to watch someone else (specifically, stakeholders 1) try to break "it". TESTED lets people explore more and fix more, while sampling less of the system.
TESTED is not a pretty GUI. Rather, it is a programming toolkit that shows that these kinds of tools are (very) simple to build. For example, everything here is just a few hundred lines of Lua (or less). And all of that code shares similar structures (so once you can code one tool, you can code much of the others).
So what is TESTED really about?
- is it about how to reconfigure broken things to make them better?
- is it about requirements engineering?
- is it about software engineering?
- is it about configuration?
- is it about data mining?
- is it about testing?
To which the answer is "yes". All these things share the same underlying methods and challenges. Which means tools built for one of these tasks can help with all the others 2 3.
TESTED aims to support "stakeholder testing". This is a kind of black-box testing aimed at offering a big-picture summary of some code, not just for the developers, but also for those who have to use the software. More than just acceptance testing (which is usually some contractual thing), stakeholder testing aims to measure, then mitigate, problems with the system. That is, normal testing just finds bugs, while stakeholder testing finds the gradients along which the system can slip up or down to better or worse behavior.
Install Lua:

```
brew install lua   # on Mac OS/X
lua -v             # TESTED needs Lua 5.3 or later
```

Check out this repo:

```
git clone https://github.com/timm/tested
```

Check your installation:

```
cd tested/src
lua 101.lua -g all # should show one crash and several passes
```
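To give a feel for what that last command is checking, here is a minimal, hypothetical sketch of a "-g" style example runner (the file name, the `eg` table, and the specific examples below are illustrative assumptions, not the real 101.lua). Each example is a small function that returns true on success; "all" runs every example inside a protected call and counts the failures, so one deliberately broken example shows up as a single crash among several passes:

```lua
-- eg.lua : a tiny, self-contained sketch of a "-g all" style example runner.
-- (Illustrative only: the names and examples here are assumptions, not the real 101.lua.)
local eg = {}

function eg.hello() return "hello" == "hello" end   -- passes
function eg.maths() return 2 + 2 == 4 end           -- passes
function eg.crash() return nil + 1 end              -- deliberately raises an error

-- Run one example inside pcall so a crashing example cannot stop the others.
local function run(name)
  local ok, passed = pcall(eg[name])
  if not ok then         print("CRASH", name); return 1
  elseif not passed then print("FAIL ", name); return 1
  else                   print("PASS ", name); return 0 end
end

-- Run every example (sorted, for repeatable output) and count the failures.
local function all()
  local fails, names = 0, {}
  for name, _ in pairs(eg) do names[1 + #names] = name end
  table.sort(names)
  for _, name in ipairs(names) do fails = fails + run(name) end
  os.exit(fails)  -- exit status = number of failures (handy for Makefiles and CI)
end

-- e.g.  lua eg.lua -g all      or      lua eg.lua -g hello
if arg[1] == "-g" then
  if arg[2] == "all" then all() else os.exit(run(arg[2])) end
end
```

Returning the failure count as the exit status is a common trick: it lets a Makefile or CI job fail the build whenever any example fails.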
When people like me (i.e., developers) write software that is used by other people (i.e., the stakeholders), those other people should be able to verify that the software was built right, and validate that the right software was built.
Such "stakeholder testing" is challenging since, often, stakeholders may not understand everything about what goes on inside the code. Hence stakeholder testing needs special kinds of tools that helps helps humans find the best things or fix the worst things; without having to offer too much information on each thing.
The central claim of TESTED is that these tools are surprisingly easy to build. To say that another way:
- people can (and should) understand AI systems;
- then use those systems to build a better world.
Every tool is less than a few hundred lines of Lua code, and all those tools share most of the same internal structure. Students can learn this simpler approach to AI as a set of weekly homeworks where they recode the tools in any language at all (except Lua). Graduate students can also do a final four-week project where they try to improve on a stakeholder testing tool, called "fishing", provided in this kit.
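To make "the same internal structure" concrete, here is one possible shape for such a file, offered only as a sketch (the names `help`, `the`, `lib`, and `eg` below are placeholders; the tools in this kit may organize things differently). The common pattern is: a help string at the top, a settings table parsed out of that help string, a small library of functions, a set of runnable examples, and a tiny main that dispatches on the command line:

```lua
-- skeleton.lua : one possible shape for a tool in this style (a sketch, not the real code).
local help = [[
skeleton: demo of the common file layout
USAGE: lua skeleton.lua [OPTIONS]
OPTIONS:
  -g  action  start-up action      = nothing
  -s  seed    random number seed   = 937162211
]]

-- (1) settings: pull "-flag key ... = default" lines out of the help string
local the = {}
for key, default in help:gmatch("\n%s+[-]%S%s+(%S+)[^=]+=%s+(%S+)") do
  the[key] = tonumber(default) or default
end

-- (2) library: a few small functions shared by every tool
local lib = {}
function lib.push(t, x) t[1 + #t] = x; return x end

-- (3) examples: each one is a little demo-cum-test that returns true or false
local eg = {}
function eg.the()  print("seed", the.seed); return the.seed == 937162211 end
function eg.push() local t = {}; lib.push(t, 22); return t[1] == 22 end

-- (4) main: run whatever example the command line asked for
for n, flag in ipairs(arg or {}) do
  if flag == "-g" and eg[arg[n + 1]] then
    print(eg[arg[n + 1]]() and "PASS" or "FAIL", arg[n + 1])
  end
end
```

Run it with something like `lua skeleton.lua -g the` or `lua skeleton.lua -g push`. Because every tool repeats this layout, recoding one of them in another language mostly means porting these four parts; what changes from tool to tool is the data mining or optimization logic in the middle.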
(Just an aside: the way I do homeworks is that every week, everyone has to submit something, even if it is broken. Homeworks can be submitted multiple times, so I grade them "2" (for "good"), "1" (for "invited to resubmit"), or "0" (for "bad" or "no submission").)
Better methods for searching for better solutions matter. There are too many examples of terrible software solutions. For example:
- Amazon had to scrap an automated recruiting tool as it was found to be biased against women.
- Widely used face recognition software was found to be biased against dark-skinned women and dark-skinned men.
- Google Translate, the most popular translation engine in the world, shows gender bias. "She is an engineer, He is a nurse", translated into Turkish and then back into English, becomes "He is an engineer, She is a nurse" [5].
- Chapter six of Safiya Noble's book Algorithms of Oppression 4 tells the sad tale of how a design quirk of Yelp ruined a small business. As one of Noble's interviewees put it: "Black people don't 'check in' and let people know where they're at when they sit in my (hairdressing salon). They already feel like they are being hunted; they aren't going to tell the Man where they are". Hence, that salon fell in the Yelp ratings (losing customers) since its patrons rarely pressed the "check in" button.
For our purposes, the important point of the Noble example is this:
- if software designers had been more intentional about soliciting feedback from the Black community...
- then they could have changed how check-ins are weighted in the overall Yelp rating system.
As to the other examples, in each case there was some discriminatory effect which was easy to detect and repair [5], but developers just failed to test for those biases.
There is a solution to all these problems:
- if a small group of people build software for the larger community,
- then they need to listen more to the concerns of the larger community.
For that to work, the smaller group of developers have to admit the larger group into their design processes, either via:
- changing the reward structures such that there are inducements for the few to listen to the many (e.g. by better government legislation or professional standards); or
- inclusion practices that admit the broader community into the developer community; or
- review practices where the developers can get better and faster feedback from the community.
The first two of these points require major organizational changes. This repository is more about the third point, which can be said another way: from an ethical perspective, it is good practice to give software to stakeholders and let them try to break it.
Footnotes
1. Definition: "Stakeholders" are individuals or organizations having a right, share, claim, or interest in a system or in its possession of characteristics that meet their needs and expectations (ISO/IEC/IEEE 2015).
2. Amritanshu Agrawal, Tim Menzies, Leandro L. Minku, Markus Wagner, and Zhe Yu. "Better Software Analytics via 'DUO': Data Mining Algorithms Using/Used-by Optimizers." Empirical Software Engineering 25, 3 (May 2020), 2099-2136. https://doi.org/10.1007/s10664-020-09808-9
3. For more on the mysterious machine that runs deep within testing, SE, requirements engineering, configuration, etc., see my Ph.D. thesis. In summary, by the time you can test "it" you can also exercise "it"; i.e., properly designed, a good test engine is also a good execution engine. For years I tried coding all this up in a logical framework. Then I found ways to use data mining for much faster, scalable, approximate reasoning. So now I offer my private theory-of-everything in a procedural framework, embedded with some data mining tools. Specifically, data miners divide a space and optimizers tell you how to jump around that space.
4. Noble, Safiya Umoja. Algorithms of Oppression. New York University Press, 2018.
5. Chakraborty, Joymallya, Suvodeep Majumder, and Tim Menzies. "Bias in Machine Learning Software: Why? How? What to Do?" Foundations of Software Engineering, 2021.