Collecting netlab usage data #1481

Open
ipspace opened this issue Nov 3, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@ipspace
Owner

ipspace commented Nov 3, 2024

It would be great to know how people use netlab; currently, we can only guess as we get little feedback and zero hard data.

The proposal for collecting usage data and eventually uploading it is in docs/roadmap/usage.md. Feedback or PRs against that file are most welcome.

@ssasso
Collaborator

ssasso commented Nov 3, 2024

Some very rough ideas:

Receiving and storing data can be achieved using Cloudflare Workers + KV or D1 storage, or with AWS Lambda + DynamoDB (plus putting some limits on it).

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

(Edit: if we need more resources for collecting and storing data we could apply for this? https://blog.cloudflare.com/expanding-our-support-for-oss-projects-with-project-alexandria )

@DanPartelly
Collaborator

DanPartelly commented Nov 3, 2024

Ivan's proposed collection mechanism stores the data in a plain-text YAML dictionary, so any user can see what is collected, and the upload is user-triggered, so I guess this covers the issue.

I would also be interested in which host OSes netlab is used on.

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

@ipspace
Owner Author

ipspace commented Nov 3, 2024

I would also be interested in which host OSes netlab is used on.

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

For example, uname -a produces a printout from which someone might be able to deduce the Ubuntu release, but that's way beyond my capabilities. Anyway, according to this gist (https://gist.github.com/natefoo/814c5bf936922dad97ff), the whole thing is a bit of a mess.
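
One option that might sidestep parsing uname output is reading /etc/os-release, which many current distributions ship. The following is only a minimal sketch under that assumption; the file path default, the function names (parse_os_release, host_os_summary), and the choice of reporting just ID and VERSION_ID are illustrative and not part of the netlab proposal.

```python
import shlex
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines (os-release format) into a dictionary."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        try:
            # Values may be shell-quoted; shlex.split removes the quotes
            parts = shlex.split(raw)
            data[key] = parts[0] if parts else ""
        except ValueError:
            # Unbalanced quotes: fall back to a plain strip
            data[key] = raw.strip("\"'")
    return data


def host_os_summary(path: str = "/etc/os-release") -> dict:
    """Return only the distribution ID and version (hypothetical helper)."""
    try:
        info = parse_os_release(Path(path).read_text(encoding="utf-8"))
    except OSError:
        # File missing or unreadable: report unknown instead of failing
        info = {}
    return {
        "id": info.get("ID", "unknown"),
        "version": info.get("VERSION_ID", "unknown"),
    }


if __name__ == "__main__":
    print(host_os_summary())
```

Reporting only two short fields would keep the results easy to aggregate, and the "unknown" fallback means the collection step never fails on a host without that file.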

@ipspace
Owner Author

ipspace commented Nov 3, 2024

Receiving and storing data can be achieved using Cloudflare Workers + KV or D1 storage, or with AWS Lambda + DynamoDB (plus putting some limits on it).

These days I would definitely go with Cloudflare Workers + KV/D1/R2.

To demonstrate that we do not collect sensitive data, we could also show the collected data and some reporting?

"The user could inspect the usage data with netlab usage show" ;) https://github.com/ipspace/netlab/blob/dev/docs/roadmap/usage.md?plain=1#L19

@ssasso
Collaborator

ssasso commented Nov 3, 2024

Ok for the inspection of collected data, but seeing some "reporting stats" could be interesting imho

@jbemmel
Collaborator

jbemmel commented Nov 3, 2024

Ok for the inspection of collected data, but seeing some "reporting stats" could be interesting imho

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

@DanPartelly
Collaborator

I like this, conceptually. There is nothing like letting the user watch the data.

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

@DanPartelly
Collaborator

Sure, I'll look into it, and yes, you are right, this can be a can of worms. I had to fight it recently with CMake; its Linux detection sucks, so I had to override the variables.

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?

@jbemmel
Collaborator

jbemmel commented Nov 3, 2024

I like this, conceptually. There is nothing like letting the user watch the data.

That's why I was thinking a GitHub repo might be a nice option - it puts the (anonymized) reported stats in a public place that people can go look at if they want to - not hidden in some backend database

Maybe we could even talk to GitHub and make this into an officially supported feature. Usage data for open source projects voluntarily provided by GitHub users would be a great addition - I think many projects would use that

@DanPartelly
Collaborator

Perhaps the best way to determine the OS name without descending into madness is to use a systemd component, hostnamectl. It returns the correct distro name in its output. It will, of course, only work on systems using systemd, but in 2024 all mainstream distros use it. Where it will fail is on musl libc-based distros, which still use alternate init systems by necessity (Alpine, Void Linux, Chimera), and on specialty distributions (embedded ... whatever).

Agree. Would you please document how we could collect that in a way that would work on most Linux distros while still providing reasonably easy-to-interpret results?
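
The hostnamectl idea could be sketched roughly as follows. This assumes the plain-text output contains an "Operating System:" line, as it does on common systemd releases; the function names (parse_hostnamectl, detect_os_name) and the five-second timeout are hypothetical choices for illustration. On hosts without systemd the sketch returns None rather than guessing.

```python
import subprocess
from typing import Optional


def parse_hostnamectl(output: str) -> Optional[str]:
    """Extract the 'Operating System' value from hostnamectl text output."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Operating System":
            return value.strip()
    return None


def detect_os_name() -> Optional[str]:
    """Run hostnamectl and return the distro name, or None if unavailable."""
    try:
        result = subprocess.run(
            ["hostnamectl"],
            capture_output=True,
            text=True,
            timeout=5,      # assumed limit so a hung command cannot block netlab
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # hostnamectl is not installed (for example on Alpine) or did not answer
        return None
    if result.returncode != 0:
        return None
    return parse_hostnamectl(result.stdout)


if __name__ == "__main__":
    print(detect_os_name() or "unknown")
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser against captured output from different distributions without running the command.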

4 participants