Debugging support in the aggregation service: Feedback Requested #42
Comments
At Criteo, we are using the aggregation service when testing the end-to-end pipeline of ARA reports. We have been using the Aggregation Service for months, and have faced several issues when trying to run aggregation jobs. While the setup documentation is really clear, it turns out that most of our efforts w.r.t. the aggregation service were spent not deploying or maintaining it, but in debugging it. Here we give some ideas of features that we think would greatly enhance our visibility when debugging aggregation jobs, as well as insight on information we think should be part of the aggregation service documentation.
1. More details on PRIVACY_BUDGET_EXHAUSTED errors
Root causes for aggregation jobs failing to execute are currently very obscure, and it’s hard to know where the error lies. This is specifically the case for PRIVACY_BUDGET_EXHAUSTED errors. It would be a lot easier for us to locate and fix errors if an aggregation service failure could give information on either:
- The report(s) causing the error, or at least the sharedID's information (or sharedIDs' information) related to the issue
- The jobId of the aggregations that were related to the error, be it the aggregation that failed, but also any other, previous aggregation, that could have consumed the privacy budget for the faulty sharedIDs
2. Additional documentation on the AWS internal architecture
To simplify the understanding of the AS structure in AWS, it would be helpful to have a document explaining the various components of the aggregation service (job queue on SQS, job status table in DynamoDB, workers on EC2, access through API Gateway, etc.). Knowing what type of information is exposed via AWS tools, its format, and where to look for it would all be useful. Additionally, once changes are made to the AS running online by the adtechs, a new deployment using Google’s cloned repositories will probably override the specific settings reached at that point (although we haven’t done this ourselves).
It would be interesting to add more options when filling in the
3. Additional information on optimization of the AS within and without the AWS infrastructure
The sizing guidance provides useful guidelines for choosing EC2 instance types depending on batch sizes. However, in our tests we observed that splitting the aggregation load into thousands of small batches (which is necessary to batch the data per client) leads to long end-to-end execution times, at least if done in a naive way, even if the processing times for individual batches are short. In order to facilitate the tuning of this process for AdTechs it would be useful to have:
Is there any plan to address the debugging of PRIVACY_BUDGET_EXHAUSTED errors? We ran into the same issue.
@evazkj We plan to launch some tooling for post-disaster budget recovery (after errors, misconfigurations, and so on), and this tooling can also be used to test batching configurations to prevent PRIVACY_BUDGET_EXHAUSTED errors.
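In the meantime, one way to narrow down which earlier job consumed the budget is to keep a local ledger on the adtech side, before submitting jobs. The sketch below is only an illustration under assumptions: `BudgetKey` is a hypothetical stand-in for a shared ID (its fields are chosen for the example and it does not reproduce the service's real shared ID derivation), and `BudgetLedger` is not part of the Aggregation Service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetKey:
    """Hypothetical stand-in for a shared ID (NOT the service's real derivation)."""
    reporting_origin: str
    destination: str
    scheduled_hour: str  # assumed: scheduled report time truncated to the hour


class BudgetLedger:
    """Local record of which job first consumed each budget key."""

    def __init__(self):
        self._consumed = {}  # BudgetKey -> job_id

    def conflicts(self, keys):
        """Return {key: earlier_job_id} for every key that was already consumed."""
        return {k: self._consumed[k] for k in set(keys) if k in self._consumed}

    def record(self, job_id, keys):
        """Mark keys as consumed by job_id; refuse the whole batch on any clash."""
        clashes = self.conflicts(keys)
        if clashes:
            raise ValueError(f"{len(clashes)} key(s) already consumed: {clashes}")
        for k in set(keys):
            self._consumed[k] = job_id


# Usage: check a candidate batch before submitting it as a job.
ledger = BudgetLedger()
k1 = BudgetKey("https://adtech.example", "https://shop.example", "2024-01-15T13")
k2 = BudgetKey("https://adtech.example", "https://shop.example", "2024-01-15T14")
ledger.record("job-001", [k1])
print(ledger.conflicts([k1, k2]))  # maps k1 to the job that already used it
```

In practice the ledger would need to be persisted (a file or a table) so that it survives between batching runs; the in-memory dictionary here only keeps the example short.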
Hi,
The Aggregation service team is looking for your feedback to improve debugging support in the service.
Adtechs can already get metrics for their jobs (status, errors, execution time, etc.) from the cloud metadata (DynamoDB on AWS and Spanner on GCP).
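As a rough illustration of how such job metadata could be summarised once exported, here is a minimal sketch. It assumes the records have already been read from the metadata table into plain dictionaries; the field names (`status`, `error`, `received_at`, `finished_at`) and the status values are hypothetical and chosen for the example, not the actual table schema.

```python
from collections import Counter
from statistics import mean


def summarize_jobs(records):
    """Summarise exported job records (hypothetical schema, epoch-second timestamps)."""
    by_status = Counter(r["status"] for r in records)
    errors = Counter(r["error"] for r in records if r.get("error"))
    # Only jobs with a finish timestamp contribute to the execution-time average.
    durations = [
        r["finished_at"] - r["received_at"]
        for r in records
        if r.get("finished_at") is not None
    ]
    return {
        "by_status": dict(by_status),
        "errors": dict(errors),
        "mean_seconds": mean(durations) if durations else None,
    }


# Usage with made-up records.
sample = [
    {"status": "FINISHED", "received_at": 0, "finished_at": 60},
    {"status": "FINISHED", "received_at": 10, "finished_at": 130,
     "error": "PRIVACY_BUDGET_EXHAUSTED"},
    {"status": "IN_PROGRESS", "received_at": 20, "finished_at": None},
]
print(summarize_jobs(sample))
```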
We are exploring other metrics, traces and logs that can provide a better understanding of the job processing within the Trusted Execution Environment without impacting privacy. We are considering providing CPU and memory metrics and total execution time traces for the adtech deployment and will benefit from your feedback on other metrics that adtech may find useful.
We are also considering adding logs that give information about job processing for debugging purposes, such as ‘Job at data reading stage’. This is subject to review and approval considering user privacy.
Your inputs will be reviewed by the Privacy Sandbox team. We welcome any feedback on debugging Aggregation Service jobs.
Thank you!