Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a monthly report of youtube API keys #854

Closed
benoit74 opened this issue Oct 25, 2023 · 13 comments · Fixed by #855
Closed

Provide a monthly report of youtube API keys #854

benoit74 opened this issue Oct 25, 2023 · 13 comments · Fixed by #855
Assignees

Comments

@benoit74
Copy link
Collaborator

The content team would like to have monthly overview of which Youtube API keys are used by which recipes, to check everything is in order.

API keys should probably not be displayed in the report but only referenced by name ; and they should not be stored in code, so a map of {API key name => API key value hash} is probably the way to go.

Report creation must be automated, how to push it to content team could probably be manual in a first step, or it could be an issue on Github, to be discussed.

@benoit74 benoit74 self-assigned this Oct 25, 2023
@benoit74
Copy link
Collaborator Author

Code gathering the data is ready, and as discussed I store only key hashes in the configuration to avoid centralizing all API keys in plain text in one file. This code will run once per month, as a kubernetes job.

@RavanJAltaie @Popolechien
Is it ok for you if we create one issue per month in zim-request which reports API keys used and ask you to review, adapt configuration if needed and then you can close the issue on your own?

@kelson42 @rgaudin
Are you ok with this design idea? Can we reuse the Github token already used by Github CI to setup the k8s job?

@rgaudin
Copy link
Member

rgaudin commented Oct 26, 2023

Yes, that seems appropriate.

@Popolechien
Copy link
Contributor

We'll need a bit of briefing beforehand to make but yeah, should be manageable.

@benoit74
Copy link
Collaborator Author

We just have discussed about it right now with Stephane and Ravan and have agreed that:

  • information on which recipe is using which API key is only the bare minimum
  • knowing the real API usage per recipe (and hence per API key) would be much more valuable to allow the content team to fix what needs to be
  • since this is a complex information to retrieve due to its dynamic nature, a good first estimate could be the number of videos per channel (this is only proxy data, but a good educated guess)

During the meeting, I agreed to investigate how feasible this would be to provide this information.

While writing this comment, I realize that this effort around API keys usage / monitoring is maybe useless since we might get rid of these API keys once we achieve to implement openzim/youtube#177

Rather than investigating this issue (having a monthly report of API keys usage) I will probably take time to investigate the root cause (could we get rid of API keys). I will keep you informed.

@rgaudin
Copy link
Member

rgaudin commented Oct 26, 2023

I don't understand the last sentence. We can't use current code without API Keys. We can get rid of API Keys by implementing mentioned ticket. There's no technical challenge but it's a piece of work.

I also don't think it's wise use of time to fetch API keys usage or nb of videos.

I think the distribution of recipes with API keys is a good start that's both easy to deploy and easy to assess.

@benoit74
Copy link
Collaborator Author

In the last sentence I meant I would spend my time more wisely on mentioned issue (which would allow us to get rid of API keys) rather than fetching number of videos per API key.

@rgaudin
Copy link
Member

rgaudin commented Oct 27, 2023

Ah ok 👍

@benoit74
Copy link
Collaborator Author

I just tried to grab number of videos manually and indeed it is quite fast to do, it took me 5 minutes to process 13 recipes, since we have 134 recipes it would take about 1 hour to process them all, it is not a very tedious task and definitely not a full project to get the full list.

So:

  • I will push in production the monthly report of API keys as currently available, i.e. with only the list of recipes per API key
  • I consider it is feasible to manually create a file with the amount of videos per channel (updating it manually once a year is definitely feasible)

List of videos per recipe I explored:

- cest-pas-sorcier_fr_all: 1900 videos
- voa_learning_english_all_playlists: 5500 videos
- universcience-tv_fr_all ???? (channel is down ?)
- crashcourse_en_all 1500
- madrasa 6100
- mali-pour-les-nuls 8
- tedmed_en_all 791
- wikistage_mul_all ???? (channel is down ?)
- litterature-audiobooks-poetry_fr 671
- los_miserables_audiobook ???? (channel is down ?)
- 2021_ted_countdown_global 49
- teded_en_all 2100
- ubongo_sw 573

@benoit74
Copy link
Collaborator Author

@RavanJAltaie @Popolechien: you are the two persons who will be assigned the created issues, correct?

@Popolechien
Copy link
Contributor

Popolechien commented Oct 27, 2023 via email

@kelson42
Copy link
Contributor

kelson42 commented Oct 28, 2023

The content team would like to have monthly overview of which Youtube API keys are used by which recipes, to check everything is in order.

I would prefer to have a description of what is exactly the problem the content team faces, instead of running to the proposal they have in their mind. This is the only way we can have an open discussion about what would be the best solution to tackle the problem.

AFAIK, the problem is that we keep having Youtube tasks failing because the scraper runs again API keys quota limitations. The diagnostic is not easy because we have many API keys and this is not always the same recipe failing (depending the scheduling). I was pretty sure I had open a ticket about this years ago, but I can not find it anymore.

Anyway, IMO, this is not a problem for a human, this is a problem to solve for a computer. API keys should not be assigned to the recipe, but -dynamically- to the task or even to the scraper at runtime.

@kelson42
Copy link
Contributor

I was pretty sure I had open a ticket about this years ago, but I can not find it anymore.

This was #558, unfortunately I don't have written the rationals. My bad.

@rgaudin
Copy link
Member

rgaudin commented Oct 28, 2023

We've discussed this already ; we think it's less work to use ytdlp than implementing this properly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants