Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jupyterhub: update recommended paths for public share folders #481

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mishaschwartz
Copy link
Collaborator

Overview

The recommended public share folders in the env.local.example file create a conflict with the default PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR path when both are enabled and mounted on a Jupyterlab container. This change updates the recommended paths for the public share folders to avoid this conflict and adds a warning helping users to avoid this conflict.

Note: the conflict arises when PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR is mounted to a container as read-only volume and then Jupyterhub tries to mount the public share folder within that volume. Since the parent volume is read-only, the second volume mount fails.

Changes

Non-breaking changes
None, documentation only

Breaking changes
None

Related Issue / Discussion

Additional Information

Links to other issues or sources.

CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

Copy link
Collaborator

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure exactly what is causing the conflict or how to work around it (been a while I didn't tackle this complicated volume mount problem), but the PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR is purposely set to public to align with Weaver's public output location, which both are configured together, and if not, uses the same public default. Whether set or not, both location should change and still resolve the same paths.

The idea is that inside Jupyter, you could access the publicly shared files (ie: the ones dumped in .../wpsoutputs/public), allowing fast access to data rather than going through the URL and loop-back. However, those are read-only given their public share to avoid everyone editing them.

The public location is automatically picked if running a Weaver process (that is allowed publicly) while not being logged in, or explicitly requesting to make it public (with X-Wps-Output-Context header). Otherwise, it places the wps/job outputs under the nested dir by user-id, which should be mounted as user-workspaces by Cowbird in Jupyter.

So, unless I misunderstand the conflict here, setting explicitly different paths is not really a fix? It avoids the feature entirely.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 26, 2024
@mishaschwartz
Copy link
Collaborator Author

@fmigneault this doesn't affect the directories created by cowbird and used by weaver and the WPS birds.

This is a recommended change to the mechanism that jupyterhub uses to share files between users.

@fmigneault
Copy link
Collaborator

@mishaschwartz
I'm not sure to follow.

The following configurations mount the volume in cowbird purposely so it monitors the creation/removal/update of jupyter directories within user-workspaces/public-share, and applies the corresponding Magpie permissions to the relevant services matching the cowbird mappings such that those same files have (from the point of view of the user), the same set of permissions/access-level via the other interfaces (THREDDS, WPS-Outputs, etc.).

export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR=public
export COWBIRD_PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR='${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}/wps_outputs'

c.DockerSpawner.volumes[join(os.environ['WORKSPACE_DIR'], os.environ['PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR'])] = {
"bind": join(notebook_dir, os.environ['PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR']),
"mode": "ro"
}

# NOTE:
# This value should correspond to the directory defined for public data on WPS outputs.
# The output hierarchy would be:
#
# ${WEAVER_WPS_OUTPUTS_DIR}/
# users/
# <user-id>/
# <job-id>/
# ${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}/
# <job-id>/
#
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR = "${PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR}" or "public"
PUBLIC_USER_CONTEXTS = {
"public", # shortcut notation to avoid instance-specific configuration
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR,
}

I don't see which other purpose that dir could have.

@mishaschwartz
Copy link
Collaborator Author

mishaschwartz commented Nov 26, 2024

@fmigneault

We're talking about this code, not the cowbird paths. This has nothing to do with wps outputs or cowbird managed file permissions:

# Sample below will allow for sharing notebooks between Jupyter users.
# Note all shares are public.
#
### public-read paths
#
## /data/jupyterhub_user_data/public-share/
#public_read_on_disk = join(jupyterhub_data_dir, 'public-share')
#
## /notebook_dir/public-share/
#public_read_in_container = join(notebook_dir, 'public-share')
#
#c.DockerSpawner.volumes[public_read_on_disk] = {
# 'bind': public_read_in_container,
# 'mode': 'ro',
#}
#
### public-share paths
#
## /data/jupyterhub_user_data/public-share/{username}-public
#public_share_on_disk = join(public_read_on_disk, '{username}-public')
#
## /notebook_dir/mypublic-share
#public_share_in_container = join(notebook_dir, 'mypublic-share')
#
#c.DockerSpawner.volumes[public_share_on_disk] = {
# 'bind': public_share_in_container,
# 'mode': 'rw',
#}
#
## If enabling the public-share paths above, make sure that the paths in the container don't conflict
## with the PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR path.
#
### create dir with proper permissions
#
#def custom_create_dir_hook(spawner):
# username = spawner.user.name
#
# perso_public_share_dir = public_share_on_disk.format(username=username)
#
# for dir_to_create in [public_read_on_disk, perso_public_share_dir]:
# if not os.path.exists(dir_to_create):
# os.mkdir(dir_to_create, 0o755)
#
# subprocess.call(['chown', '-R', '1000:1000', public_read_on_disk])
#
# # call original create_dir_hook() function
# create_dir_hook(spawner)
#
#c.Spawner.pre_spawn_hook = custom_create_dir_hook
#"

Cowbird should not interact with these paths and should not change permissions. If a user with the username "X" puts a file in their mypublic-share folder it become publicly available to everyone at the public-share/X/ path.
Cowbird doesn't need to touch this.

@mishaschwartz
Copy link
Collaborator Author

To clarify further... the feature that is enabled by the jupyterhub code snippet in env.local.example predates the cowbird feature you describe here #392 (comment)

It was never intended to be used with the cowbird feature. It is a workaround that allows sharing files between users that should probably be deprecated as soon as we have a better alternative. But for now, if you want to enable this feature and cowbird, you need to make sure their mount points do not overlap or you will encounter an error as reported in #392

@fmigneault
Copy link
Collaborator

What I mean is that the paths were purposely selected to match those "conveniently", so Cowbird can be enabled as a drop-in replacement of the basic sharing feature, and add the advanced permission sync capabilities.

I agree, they are not used together. It is one or the other. But the idea is that you don't need to completely refactor the entire data/workspace directory hierarchy when switching feature.

If both directories are maintained simultaneously with different paths, it just creates 2 distinct hierarchies that have different capabilities (which users are not fully aware of the implications), and that just causes confusion.

So what I'm saying is that changing the path is not a "proper fix" for #392. It's a workaround (which might be valid), but I think we must further resolve the switch to Cowbird capabilities to make it work as intended instead. Using the "deprecated" approach should not be recommended. The message should be clear about that (they should never be enabled simultaneously).

env.local.example file create a conflict with the default PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR path when both are enabled and mounted on a Jupyterlab container.

In a way, the fact that errors are raised by conflicting permissions is a good indication that something is misconfigured, outside the intended approach.

@mishaschwartz
Copy link
Collaborator Author

I think we must further resolve the switch to Cowbird capabilities to make it work as intended instead

Can you explain this more please. What additional changes need to be added to cowbird to "make it work as intended"?

Using the "deprecated" approach should not be recommended
they should never be enabled simultaneously

Sure, I agree with this. I'm happy to just delete the entire section from env.local.example.

@fmigneault
Copy link
Collaborator

Cowbird is enabled by default, and its job is to manage cross-service access to corresponding resources.
Therefore, if a file is shared publicly by users via their Jupyter session, it should be handled by Cowbird/Magpie handshakes like for every other service involving access managing. There is really no reason to have another way of sharing files indirectly with another directory.

Cowbird's directory monitoring is supposed to work with Jupyter sessions. So, unless something is broken in that feature, I don't see a reason why users should care about the name of that directory duplicating the feature. No conflict if there's no duplicate.

@mishaschwartz
Copy link
Collaborator Author

So to clarify, the user experience for publicly sharing a file in cowbird is exactly the same:

  • a user named "X" puts a file in their mypublic folder
  • that file turns up in everyone's public/X-public folder

@tlvu
Copy link
Collaborator

tlvu commented Nov 26, 2024

FYI, I have not had time to look at this yet. Was sick yesterday and recovering today.

Just quickly the existing poor-man sharing is what we have been using since Cowbird did not exist before and it served us very well.

No problem to update the default so both can play nice together but I'd rather keep it than removed it since it is 100% reliable so far and is what our current users are used to.

I'd rather not have support calls from our rather large user base because of behavior change, so that's why I'd like to keep the existing poor-man sharing. It is not activated by default anyways so no problem there.

So to clarify, the user experience for publicly sharing a file in cowbird is exactly the same:

* a user named "X" puts a file in their `mypublic` folder

* that file turns up in everyone's `public/X-public` folder

This is very interesting !!! Is this compatible with the existing data from the poor-man sharing?

Meaning before Cowbird, use the poor-man sharing is activated. With Cowbird enabled, poor-man sharing disabled, and Cowbird taking over the existing poor-man sharing file path, can Cowbird show the old data from poor-man sharing? Meaning the switch is completely seemless to the user?

@fmigneault
Copy link
Collaborator

Yes. That is the intent. But you're not limited to "public".

You can have a "users/user-x" hierarchy that is deny-access by default (except for "user-x"), and have "user-y" granted access to the user-workspace of "user-x" (eg: colleagues working together). So, they end up with a shared space, but it's not shared to everyone (public) on the platform.

And because Cowbird works by events with Magpie, you could have very fancy webhook configurations based on any actions on the services. The user-workspace uses a watchdog to trigger on-create/delete/updated of the file-system. Removing a permission on Magpie side will take effect as a reaction to remove the link that virtually gives access to the file via Jupyter user-workspaces (and vice versa).

I'm fine with keeping the current alternative so we have a fallback until ready to change. Just have to keep in mind that using different paths un purpose will require messing with the vars and actual file hierarchy when doing the switch. However, that allows reverting without having to chown/chmod all the files (the "conflict" mentioned by the PR).

@mishaschwartz
Copy link
Collaborator Author

mishaschwartz commented Nov 27, 2024

Yes. That is the intent.

Ok great. But is this actually implemented in any way at the moment or does this system of webhooks need to be created?

I'm asking because I'm having a really hard time finding this in any of the documentation:

and I'm not able to get this functionality working on a test instance either.

@fmigneault
Copy link
Collaborator

The permission-sync mappings of webhooks and the service "handlers" being monitored by Cowbird are defined here:
https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/components/cowbird/config/cowbird/config.yml.template

The FS monitor function are (mostly) defined here:
https://github.com/Ouranosinc/cowbird/blob/master/cowbird/handlers/impl/filesystem.py

Magpie webhooks reception / trigger of user/group/permission modifications are done here:
https://github.com/Ouranosinc/cowbird/blob/master/cowbird/handlers/impl/magpie.py

What each handler does and how it interacts with the others is very custom-made for the platform, so you have to dig in the code to understand what actually happens.

@mishaschwartz
Copy link
Collaborator Author

Just so that we're on the same page:

  • I have read through most of the cowbird source code over the last few years
  • Despite that, I am not an expert on the cowbird source code
  • Linking to the cowbird source code is not helping me discover the answer to my question

Please...

With cowbird as it is currently set up in birdhouse-deploy can users share files?

  • yes
  • no
  • I don't know

If the answer is yes ... what actions do users need to take to share files?
If the answer is no ... what changes do we need to make to the cowbird or birdhouse-deploy code in order to allow them to share files?

@fmigneault
Copy link
Collaborator

[x] I don't know

It was in the process of being implemented, but our development funding ended about at the same time. So, I do not have a complete overview of the current state and latest code/configs applied.

I believe all the triggers and configs for the main user-workspace directory of each user in jupyter is configured, such that fs-monitor events will sync permissions of the files in the directories with the other services as applicable. The nested "user/user-x" sharing might not be fully completed.

Sharing between specific users might not be configured yet.
@ChaamC Was working on it.

@mishaschwartz
Copy link
Collaborator Author

Ok thank you

Let's see if @ChaamC can help us out with what has or hasn't been implemented and we'll continue the discussion from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 [BUG]: jupyterlab server fails to spawn due to read-only volume mount
3 participants