
Document backend_config.ini #623

Closed
matsduf opened this issue Jun 29, 2020 · 9 comments · Fixed by #684
Labels
A-Documentation Area: Documentation only.
Milestone

Comments

@matsduf
Contributor

matsduf commented Jun 29, 2020

Not all settings in backend_config.ini are documented in Configuration.md, e.g. ZONEMASTER.force_hash_id_use_in_API_starting_from_id is not.

@matsduf matsduf added the A-Documentation Area: Documentation only. label Jun 29, 2020
@matsduf matsduf added this to the v2020.1 milestone Jun 29, 2020
@matsduf matsduf modified the milestones: v2020.1, v2020.2 Sep 21, 2020
@ghost ghost linked a pull request Dec 21, 2020 that will close this issue
@matsduf matsduf modified the milestones: v2020.2, v2021.1 Feb 9, 2021
@matsduf matsduf assigned ghost Apr 15, 2021
@ghost

ghost commented Apr 26, 2021

There are two remaining keys to document:

  • number_of_processes_for_frontend_testing
  • number_of_processes_for_batch_testing

Their names suggest that these keys limit the number of processes per usage type.

However, looking at the current code, I can't find anything implementing such a per-type limitation. What I did find is the following usage, where both values are added together to compute the maximum total number of processes.

https://github.com/zonemaster/zonemaster-backend/blob/5b485f7f88caa545150e011fdb835b7767941b30/lib/Zonemaster/Backend/Config.pm#L359..L410

sub new_PM {
    my $self = shift;

    my $maximum_processes = $self->NumberOfProcessesForFrontendTesting() + $self->NumberOfProcessesForBatchTesting();
    ...
    my $pm = Parallel::ForkManager->new( $maximum_processes );
    ...
}
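In effect, the two settings only feed one shared cap, so frontend and batch jobs compete for the same pool. A minimal Python sketch of that behavior (names and values are illustrative, not the actual backend code):

```python
# Illustrative values for the two backend_config.ini keys.
number_of_processes_for_frontend_testing = 20
number_of_processes_for_batch_testing = 20

# The only limit actually enforced is the sum, as in Config.pm's new_PM.
maximum_processes = (number_of_processes_for_frontend_testing
                     + number_of_processes_for_batch_testing)

def can_start_worker(running):
    """A new worker may start as long as the shared pool has a free slot.
    The job type (frontend or batch) plays no role in the decision."""
    return len(running) < maximum_processes

# 40 batch jobs alone can fill the whole pool, starving frontend jobs:
running = ["batch"] * 40
print(can_start_worker(running))  # False: pool is full even with 0 frontend jobs
```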

I then tried to find code that limits the creation of processes for each usage type (frontend or batch), without success.

Going through the git history I was able to find code where the limitation was explicit. This is quite old, but it is from around the time when NumberOfProfessesForFrontendTesting and NumberOfProfessesForBatchTesting (sic) were added (4df3d9b):

https://github.com/zonemaster/zonemaster-backend/blob/4df3d9bd8ac43e8ddfccf4df7aca9b7ed2acc699/JobRunner/execute_tests.pl#L68..L84

sub can_start_new_worker {
	my ($priority, $test_id) = @_;
	my $result = 0;
	
	my @nb_instances = split(/\n+/, `ps -ef | grep "execute_zonemaster_P$priority.pl" | grep -v "sh -c" | grep -v grep | grep -v tail`);
	my @same_test_id = split(/\n+/, `ps -ef | grep "execute_zonemaster_P$priority.pl $test_id " | grep -v "sh -c" | grep -v grep | grep -v tail`);
	
	my $max_slots = 0;
	if ($priority == 5) {
		$max_slots = $batch_slots;
	}
	elsif ($priority == 10) {
		$max_slots = $frontend_slots;
	}
	
	$result = 1 if (scalar @nb_instances < $max_slots && !@same_test_id);
}

if (can_start_new_worker($priority, $h->{id})) {
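For contrast, the old per-type limit can be paraphrased like this (a Python sketch of the shell-based check above; the slot values are hypothetical):

```python
# Hypothetical paraphrase of the old per-type check in execute_tests.pl.
BATCH_SLOTS = 20     # limit for priority 5 (batch) workers
FRONTEND_SLOTS = 20  # limit for priority 10 (frontend) workers

def can_start_new_worker(priority, test_id, running):
    """running: list of (priority, test_id) pairs for workers currently alive."""
    same_priority = [r for r in running if r[0] == priority]
    same_test = [r for r in running if r == (priority, test_id)]
    if priority == 5:
        max_slots = BATCH_SLOTS
    elif priority == 10:
        max_slots = FRONTEND_SLOTS
    else:
        max_slots = 0
    # A worker starts only if its own type has a free slot and the
    # same test is not already running.
    return len(same_priority) < max_slots and not same_test

# Batch workers filling all batch slots do not block a frontend worker:
running = [(5, f"batch-{i}") for i in range(20)]
print(can_start_new_worker(5, "batch-new", running))   # False: batch slots full
print(can_start_new_worker(10, "frontend-1", running)) # True: frontend slots free
```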

So it looks like execute_tests.pl did apply the limit per process type. That file is no longer in the codebase, and it seems that its replacement has not kept the separate logic for frontend and batch processes (see commit 55f72d9):

https://github.com/zonemaster/zonemaster-backend/blob/55f72d9ead3ab0e06f81b4a9d390d0972ebe14de/script/zm_wb_daemon#L42..L44

my $maximum_processes =
  Zonemaster::WebBackend::Config->NumberOfProfessesForFrontendTesting() +
  Zonemaster::WebBackend::Config->NumberOfProfessesForBatchTesting();

Unless I am missing something, the idea of limiting the number of processes per usage type has not been kept in the code. So I suggest we decide what to do about batch processing as proposed in #743.

The best I can come up with at this point is to document that these two values are added together and that the sum defines the total number of allowed processes. Despite their names, they do not limit the number of processes per type (frontend or batch).

@ghost

ghost commented Apr 26, 2021

@matsduf and @mattias-p, can you have a look at my comment above? (I don't know whether editing a comment to add @-mentions triggers a notification on your side.)

@matsduf
Contributor Author

matsduf commented Apr 26, 2021

I think it would be fine to deprecate both number_of_processes_for_frontend_testing and number_of_processes_for_batch_testing, and replace them with a single new key with a shorter name for limiting the number of processes. We should also set a reasonable default value for it, so that the key can be left out of the default config.ini. I guess 10 could be such a value.

I think it is unclear, however, what will happen when you reach the limit.

Before we do anything about #743 we should contact RIPE that we could be using the batch function.
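To make the suggested replacement concrete, a hypothetical backend_config.ini fragment (the key name and default value are illustrative only, not an agreed design):

```ini
[ZONEMASTER]
; Hypothetical single key replacing the two per-type keys.
; Total number of concurrent test agent worker processes.
; With a sensible default (e.g. 10), the key could be omitted
; from the shipped config.ini entirely.
number_of_processes = 10
```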

@mattias-p
Member

I think it would be fine to deprecate both number_of_processes_for_frontend_testing and number_of_processes_for_batch_testing, and replace that with a new key with a shorter name for limiting the number of processes.

I agree with this. If you want two test agent worker pools it seems more reasonable to just run two test agent daemons.

We should also set a reasonable default value on that so that the default config.ini can be without the key. I guess 10 could be such a value.

I tried to look around to see what other people use. In one page of search engine hits I found two examples. The first one, Gunicorn, defaults to 1 worker. The second one, parpool (a MATLAB parallelization feature, as far as I can tell), defaults to one worker per physical core.

Having a default value of 1 worker for the test agent makes sense to me. One worker per core would also make sense in a way, if CPU utilization during a test were close to 100%; but from watching top during a single test I saw a CPU utilization of around 50%. Another thing to consider is that if the test agent workers hog too much CPU, the rpcapi workers would also be affected.

I think it is unclear, however, what will happen when you reach the limit.

Unclear in what way? Do you mean that there might be bugs in Parallel::ForkManager?

@matsduf
Contributor Author

matsduf commented Apr 26, 2021

I think it is unclear, however, what will happen when you reach the limit.

Unclear in what way? Do you mean that there might be bugs in Parallel::ForkManager?

Well, unclear to me, I guess. :-)

@mattias-p
Member

mattias-p commented Apr 27, 2021

The way I read the design, Parallel::ForkManager grows its worker pool until it reaches the given limit. If new test requests come in at a higher rate than the worker pool can handle, they start queuing up. Each test request has a priority field that is used when consuming the queue. If more than one test request has the same priority value, they are consumed in FIFO order.
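That consumption order can be sketched with a priority queue where a monotonically increasing counter breaks ties, giving FIFO order within a priority level (a self-contained Python sketch, not the actual backend code; the backend reads its queue from the database):

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker: preserves insertion order
queue = []

def enqueue(priority, test_id):
    # heapq pops the smallest tuple first, so negate the priority to
    # consume higher priority values first; the counter makes equal
    # priorities come out in FIFO order.
    heapq.heappush(queue, (-priority, next(counter), test_id))

def dequeue():
    _, _, test_id = heapq.heappop(queue)
    return test_id

# Priority 10 (frontend) beats priority 5 (batch); equal priorities are FIFO:
enqueue(10, "frontend-1")
enqueue(5, "batch-1")
enqueue(10, "frontend-2")

print([dequeue() for _ in range(3)])
# → ['frontend-1', 'frontend-2', 'batch-1']
```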

@matsduf
Contributor Author

matsduf commented Apr 27, 2021

@mattias-p, thank you!

@ghost

ghost commented Mar 23, 2022

I think that all settings are now properly documented. Can we close this issue?

@matsduf
Contributor Author

matsduf commented Mar 23, 2022

Fine to close. If we later find something that is not properly documented, it is better to open a new issue and specify what is missing.

@matsduf matsduf closed this as completed Mar 23, 2022