-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decoupling test generation and test execution #41
Comments
This sounds like something reactive streams are well suited for. Reactive streams can be infinite, can be cancelled, are asynchronous and can be easily parallelised. |
I don't necessarily want the streams themselves to be asynchronous (producing a test is almost an instant operation, it's executing which takes time, so there is little gain in having produces being asynchronuous). So it might be overkill for something we can express using a simple iterator. However, if the library for reactive streams already gives a lot of reusable building blocks (we think we will use), it might be worth looking into it. Reusable blocks which I imagine being used for testing are: bounds on test suites (for termination), interleaving of test suites (zip), extending test sequences (map), removing test cases (filter). If your suggestion is to implement this ourselves, I would rather go for something simple like iterators. If your suggestion is to use a library, then I would like to know whether you have experience with such a lib. Cheers, |
Another pro of Joshua's proposal is the possibility to use the Generally speaking, the It would be ideal if, in turn, the test generation methods could use an up-to-date (partial) hypothesis from It would be interesting to see how such a setup would compare to the conventional active learning approach in terms of performance and results. I am under the impression that active learning algorithms (at least some variations of L*, much less so TTT) add overhead (in terms of queries asked), that do not benefit the result of the learning process. |
So my plan was to have an interface What I envision for active learning is something where we can write the following (where all these components are provided in the library): EquivalenceOracle<I, O> eqOracle = new TestingOracle(new Bound(1000, new Interleave(new WMethod(...), new LeeYannakakisMethod(...)))); where |
I like the idea and think we should implement this in LearnLib! The one method that does not match this pattern is a random walk - but I think that is OK. |
Asynchronous execution does have benefits for parallel execution, but I agree that the performance gain would not be that big.
That's exactly the purpose of these libraries. For example, see the list of ReactiveX operators or the reactor-core Flux JavaDoc (which implements Reactive Streams). Unfortunately terminology differs a little between the ReactiveX and Reactive Streams camps, but the ideas are mostly the same. Your example: EquivalenceOracle<I, O> eqOracle = new TestingOracle(new Bound(1000, new Interleave(new WMethod(...), new LeeYannakakisMethod(...)))); could be implemented with standard operators (
This definitely shouldn't be something you want to implement by yourself. Project Reactor has a very nice core library, although documentation for version 3 is still lacking. RxJava is a little more mature, but it does not implement Reactive Streams by default. I have positive experiences with Reactor in other projects and I'd be happy to help here as well. |
@fhowar You are right about the random walk oracle. The design I propose is "word-based". I think it will work well for all equivalence oracles which use a membership oracle (the random walk oracle does not do that). If the random walk oracle is the only exception, I think it's fine to just leave it like that. If you can think of more of these examples, that would be nice, we need more examples! @praseodym I have to admit I am a bit overwhelmed by all the terminology (especially considering I am trying to solve an easy problem). But I can get used to that. One question though. It seems the test executor will be a subscriber in that model. That means I have to implement something like: class TestExecutor {
CE findCounterExample(...) {
// start generation
// and subscribe
// and return ???
}
void onNext(...) {
// execute test
// if counter example, return it to learner
}
void onCompletion(...) {
// return null to learner
}
} I do not really see yet, how to use the subscriber/publisher design here. Do I have to fork a thread in class TestExecutor {
CE findCounterExample(...) {
Iterator it = ...
for(Word w : it) {
// execute test
// if counter example, return it
}
return null;
}
} Also keep in mind that each hypothesis will have its own generation. But that I only want to construct the "chain" of operations once. |
@Jaxan I agree that it is definitely overwhelming at first. Actually it might not be a bad idea to start off implementing the iterator style and then adding reactive streams later (but of course before implementing things like An iterator/iterable can be wrapped into a Flux (Publisher) quite easily; executing tests could be done with something like: counterExample = Flux.fromIterable(() -> testGenerator)
.take(10)
.filter(query -> testForCounterExample(query, hypothesis))
.next(); Or with parallel membership queries (of course, counterExample = Flux.fromIterable(() -> testGenerator)
.take(10)
.parallel()
.runOn(Schedulers.parallel())
.filter(query -> testForCounterExample(query, hypothesis))
.sequential()
.next(); Implementing your own |
Just to give an update: next week, @praseodym, @ricksmet, others and myself will have a meeting to discuss this (among other learning related things). I will give an update after the meeting. |
I'm quite fond of this idea as well and I think both @Jaxan and @praseodym have pointed out valid requirements/concerns already. I'm not a fan of adding a big dependency such as RXJava for such a "small" use case. Also this would mean some kind of "vendor lock-in" because everyone who wants to use this feature would have to exclusively use (and understand) the chosen library. Yet I think, some comfort (e.g. interleaving multiple sources) is quite nice to have. In my opinion a good middle ground is given by using Java 8's I dabbled around a little bit with Streams in my learnlib fork. I added the basic interfaces you guys already proposed and tried to model some of the existing EQ oracles with the new approach, which already lead to cleaner code. (I know, I butchered the "batched" mode of the RandomWord oracle, but I just wanted a use case for unlimited streams). I also used the JOOL library to model some of the more advanced stuff, where multiple sources are cross-joined. (Btw, I think this library is quite nice, because it's relatively small and since But there are still some things I need to further think about. One thing is parallelism in Streams. In the best case, we could somehow cover the ParallelEQOracle with this approach as well. The other thing is: A big advantage of this approach is the lazy evaluation of generated test words. However, a lot of EQ oracles still compute all test words in a bulk operation and then simply break the iteration loop, once a counterexample is found. So to fully utilize this approach, there still needs to be some work done in lazily computing things like a characterizing set. But for now I'm interested in your opinion about this proposal. Do you think this API is rich enough to allow all the crazy stuff, yet slim enough to not pull in a specific framework? Or do you have any other improvement suggestions? |
Hi @mtf90, I gave your fork a quick look, and it seems very nice! Especially, the W-method is elegantly coded. Do you have any idea how a state-dependent methods will look like (e.g. the Wp-method)? About the parallelism of different test methods, I see two use cases:
I am not sure if both use cases have to be solved within the same framework. I can imagine solving both in different ways. I would say it has a low priority. I am not against using libraries, per se. As long as someone can still use LearnLib without having to write against those interfaces himself. JOOL seems to do that, since everything is just a java-Stream in the end. I'm all for it! Thanks for the great work! |
Regarding parallelism, I was initially wondering if I could somehow use the
Your second point is a little bit more complex, though. With the Lastly, your point on state dependent methods is quite good, because I haven't thought about that yet. In case of the WpMethod it is actually quite easy, because the state is local to the test generation, so you can simply incorporate it into the stream construction. See e.g. my commit mtf90/learnlib-ce-streams@7a123961d2b072bec384ad4569efcb978860ac5c (again, since JOOL only supports sequential streams, we don't have to worry about threading/synchronization). However, oracles such as the |
Just to add a short note here: the Reactive Streams API has been available for a while, and adding only the 3KB API dependency (i.e. not an actual implementation) could probably suffice. If someone were to use these features they could pull in Reactor or RxJava 2.x. No need to wait for Java 9 here. It's been a while since I used LearnLib, so it might well be possible to add these features in a completely separate module (i.e. outside of LearnLib entirely). To me, that would be preferred above any specific interface implementation. |
I looked into reactive streams a little bit further and start to see the appeal of it. It's nice that they have a separate API (I previously thought the situation was like with JavaRx1) and it's even nicer that they plan to be conformant to the JDK9 interfaces (such as Guava functions extending I tried to integrate reactive-streams in mtf90/learnlib-ce-streams@ba8cd064b5a7b91aed648943bcffc1b79674bce9 in a similar approach to the stream-based oracle, but I think there needs to be (a lot) more work done to fullfill the RS contract and integrate into existing EQOracle interfaces. For fully utilizing the asynchronous aspect we would also have to ramp up the ParallelOracle API, to (e.g.) delegate single items to free MembershipOracles as they come. While this definitely sounds interesting from a HPC standpoint, I'm not sure about the cost/value ratio for now. Especially for this FR, which is originally not concerned with performance, but convenience / clean API design. If think, if you plan to use RS a Publisher -> Iterator -> Stream chain will integrate in the proposed draft just fine. |
Many of the equivalence oracles in LearnLib are based on testing (this includes
RandomWordEQOracle
,W(p)MethodEQOracle
,CompleteExplorationEQOracle
, ...) and so each of these classes will contain code like this (in many flavours):It is a bit annoying to have to do this every time one comes up with a new test generation algorithm. My proposal is to introduce the concept of a test generator, then you implement the above code just once (i.e. one can make an equivalence oracle from any test generator).
Pros:
RandomWordEQOracle
for example does this).There are no cons ;-), besides having yet another abstraction.
Ideally we would implement this with coroutines, but Java does not have them. So I think iterators will do fine. Note that a collection is not sufficient, since many test generation methods are infinite. A supplier is also not sufficient because it cannot stop (some test generations methods may be finite).
In terms of interfaces, I think we would like to have two concepts. First there is the test generator, which may be just an iterator. Then there is a test generation method which has a function taking a hypothesis and returning a test generator. I have no opinion on whether we should use standard Java interfaces (like iterator) or define our own.
What are your thoughts?
Also should this be in LearnLib, or as something separate?
The text was updated successfully, but these errors were encountered: