[NEW] Add scripting languages (for EVAL, etc.) using module API #1261

Open
zuiderkwast opened this issue Nov 4, 2024 · 18 comments

Comments

@zuiderkwast
Contributor

The problem/use-case that the feature addresses

  1. Allow adding more scripting languages
  2. Allow replacing the vendored Lua implementation with another one (e.g. LuaJIT or Lua 5.4)

Description of the feature

Looking at functions.c, work has started that should allow different "engines" for functions (FUNCTION LOAD, FCALL, etc.). For example, there is a function to register an engine. Currently, Lua is the only engine, implemented in function_lua.c. There is some separation here.

Extend the existing modularity to the module API: ValkeyModule_RegisterScriptingEngine or similar.

The module registers a callback that is invoked to execute code for commands such as EVAL, EVALSHA and FCALL.

At the beginning of an EVAL script, users can add a shebang, a line like #!lua followed by optional flags or parameters, to select the scripting engine. This mechanism already exists, but currently only "lua" exists. A module should be able to provide its own languages.
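The shebang handling described above can be sketched as follows. This is a simplified Python model, not Valkey's C code; the helper name `parse_shebang` and its error handling are assumptions made for illustration. It splits the first line into an engine name and `key=value` options such as `flags=no-writes`.

```python
from typing import Dict, Optional, Tuple


def parse_shebang(script: str) -> Tuple[Optional[str], Dict[str, str], str]:
    """Split a script into (engine name, options, body).

    The engine name is None when the script has no shebang line.
    """
    if not script.startswith("#!"):
        return None, {}, script

    # Everything up to the first newline is the shebang line.
    first_line, _, body = script.partition("\n")
    tokens = first_line[2:].split()
    if not tokens:
        raise ValueError("shebang line names no engine")

    engine = tokens[0]
    options: Dict[str, str] = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed shebang option: {token!r}")
        options[key] = value
    return engine, options, body
```

For example, `parse_shebang("#!lua flags=no-writes\nreturn 1")` yields the engine `lua`, the option `flags=no-writes` and the body `return 1`, while a script without a shebang yields no engine name and is returned unchanged.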

To add a Lua engine in parallel to the built-in Lua implementation, the module can register with a different name such as "lua5.4" or "luajit". For a module to be able to replace the default "lua" engine, the built-in Lua support needs to be disabled. For that, see #1204.
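The naming rule above can be modelled with a small registry. This is a toy Python sketch under stated assumptions: the class `EngineRegistry`, its methods and the callback shape are hypothetical and are not the proposed module API. It only shows why a module can register "luajit" next to a built-in "lua", but can take the name "lua" only once the built-in engine is gone.

```python
from typing import Callable, Dict, List

# Hypothetical callback shape: (code, keys, args) -> result.
EngineCallback = Callable[[str, List[str], List[str]], object]


class EngineRegistry:
    """Toy model of a table that maps engine names to callbacks."""

    def __init__(self) -> None:
        self._engines: Dict[str, EngineCallback] = {}

    def register(self, name: str, callback: EngineCallback) -> None:
        # A name can be owned by only one engine at a time.
        if name in self._engines:
            raise ValueError(f"engine {name!r} is already registered")
        self._engines[name] = callback

    def unregister(self, name: str) -> None:
        # Raises KeyError if no engine has this name.
        del self._engines[name]

    def names(self) -> List[str]:
        return sorted(self._engines)


def builtin_lua(code: str, keys: List[str], args: List[str]) -> object:
    return "built-in lua: " + code


def module_luajit(code: str, keys: List[str], args: List[str]) -> object:
    return "module luajit: " + code
```

With the built-in engine registered as "lua", `register("luajit", module_luajit)` succeeds and `register("lua", module_luajit)` raises. After `unregister("lua")`, which stands in for disabling the built-in Lua support, the module can claim the default name.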

Alternatives you've considered

...

Additional information

Related discussions:

@hpatro
Contributor

hpatro commented Nov 4, 2024

This will be a good addition to Valkey to provide the underlying abstraction to support new engines easily.

A few questions that come to mind, some of which were discussed in the weekly meeting:

  1. Do we support multiple engines at a given point in time?
  2. Do we plan to host other first party engines in the near future like V8 engine? KeyDB in the past created this: https://github.com/Snapchat/ModJS
  3. If yes, do we route them based on the shebang (do we fail a script without a shebang?)
  4. Or do we introduce a new command/sub-command for each engine?
  5. We would need a certain API to describe the supported engine and its version.

@neomantra
Contributor

neomantra commented Nov 4, 2024

Allow replacing the vendored Lua implementation with another one (e.g. LuaJIT or Lua 5.4)

One cannot easily create parallel Lua engines. This is due to symbol collisions with the statically linked Lua, discussed here and here. Disabling the Lua support makes this straightforward at the expense of losing the built-in Lua. Otherwise, module implementors must carefully move their Lua symbols so that they do not collide (I never tried this). But then they can't use system-installed Lua libraries in their modules (maybe that's fine).

At least once it is figured out for one Lua module, it will be figured out for all. Then it's a documentation issue =). Valkey could rename its Lua symbols since it is building and statically linking Lua from source. I'm not sure what that might break elsewhere.

I hadn't tried this since 2016, but having just upgraded valkey-mod_luajit, I loaded it into a Lua-enabled Valkey, and it does segfault because it uses the wrong symbols.

@zuiderkwast
Contributor Author

@hpatro

  1. Do we support multiple engines at a given point in time?

Yes.

  2. Do we plan to host other first party engines in the near future like V8 engine? KeyDB in the past created this: https://github.com/Snapchat/ModJS

What do you mean by host? Officially support or vendor? It's not impossible. I have no answer.

  3. If yes, do we route them based on the shebang (do we fail a script without a shebang?)

ModJS adds a new command EVALJS. Modules can always add their own commands, but the idea here is to provide an API for modules to hook into EVAL and FUNCTIONs. We can extend this to triggers or events of some sort.

Scripts without a shebang are Lua scripts for backward compatibility, at least by default. We could add a config to change the default engine, but I imagine that all other languages will use a shebang.
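The routing rule above can be sketched in a few lines. This is a hypothetical Python model, not server code: the function `select_engine` and its `default_engine` parameter are assumptions standing in for the possible config mentioned in this comment.

```python
from typing import Set


def select_engine(script: str, available: Set[str], default_engine: str = "lua") -> str:
    """Pick the engine for a script: the shebang wins, otherwise the default."""
    if script.startswith("#!"):
        # The engine name is the first token after "#!" on the first line.
        tokens = script.split("\n", 1)[0][2:].split()
        name = tokens[0] if tokens else ""
    else:
        # No shebang: fall back to the default for backward compatibility.
        name = default_engine
    if name not in available:
        raise LookupError(f"no scripting engine named {name!r}")
    return name
```

Here `select_engine("return 1", {"lua"})` picks `lua`, a script starting with `#!js` picks `js` when such an engine is available, and passing `default_engine="luajit"` models a server whose default engine was replaced.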

  4. Or do we introduce a new command/sub-command for each engine?

They should be able to take advantage of the framework provided by EVAL + EVALSHA + SCRIPT LOAD, FUNCTION LOAD + FCALL, etc.

There's a difference between scripts and functions. Scripts are part of an application and are written by the application developers while functions are assumed to be installed by a database admin. The caller of the function doesn't need to know which language the function was written in.
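The point that a function's caller does not need to know the implementation language can be illustrated with a toy model. The class `FunctionStore` and the engine names used with it are hypothetical; this is a Python sketch of the idea, not how the server stores function libraries.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape of a loaded function: (keys, args) -> result.
FunctionImpl = Callable[[List[str], List[str]], object]


class FunctionStore:
    """Toy model: functions are looked up by name, whatever engine loaded them."""

    def __init__(self) -> None:
        self._functions: Dict[str, Tuple[str, FunctionImpl]] = {}

    def load(self, engine: str, functions: Dict[str, FunctionImpl]) -> None:
        # Check all names first so a conflicting library is not half-loaded.
        for name in functions:
            if name in self._functions:
                raise ValueError(f"function {name!r} already exists")
        for name, impl in functions.items():
            self._functions[name] = (engine, impl)

    def fcall(self, name: str, keys: List[str], args: List[str]) -> object:
        if name not in self._functions:
            raise LookupError(f"function {name!r} not found")
        # The engine is recorded at load time; the caller never passes it.
        _engine, impl = self._functions[name]
        return impl(keys, args)
```

An admin might load `add` through one engine and `concat` through another; the application then calls `fcall("add", [], ["2", "3"])` without naming either engine.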

  5. We would need a certain API to describe the supported engine and its version.

Yes. Do you have a suggestion? INFO? A new subcommand of FUNCTION or SCRIPT?

@zuiderkwast
Contributor Author

One cannot easily create parallel Lua engines. This is due to symbol collisions with the statically linked Lua, discussed here and here.

@neomantra We discussed this, but we were not sure why. Dynamically linked symbols don't collide with statically linked symbols, do they? Does Lua itself use dynamic linking for its modules?

Worst case, we can only have one Lua at a time. 😢

@hpatro
Contributor

hpatro commented Nov 4, 2024

  2. Do we plan to host other first party engines in the near future like V8 engine? KeyDB in the past created this: https://github.com/Snapchat/ModJS

What do you mean by host? Officially support or vendor? It's not impossible. I have no answer.

Yeah, I was trying to see what Valkey's stance is on supporting other scripting engines.

@neomantra
Contributor

Dynamically linked symbols don't collide with statically linked symbols, do they? Does Lua itself use dynamic linking for its modules?

Yes, since they use the exact same symbol names and the linker can't disambiguate. LuaJIT is intended to be a drop-in replacement for Lua 5.1. Loading Lua 5.4 would be similar. There are common names like lua_State and all the API functions.

I had a bit of a conversation with Claude (for this chat, "better" IMO than ChatGPT) on how to get around it, and it is tricky and platform-specific. I don't have a sharing account, but it wasn't something I could do in 2016. Not sure if it is gauche to share prompts:

  • can I use statically linked PUC Lua 5.1 and dynamically linked LuaJIT in the same process?
  • can you tell me more about the symbol space collision?
  • what about a dynamically linked PUC Lua with a dynamically linked LuaJIT?
  • can I hide luajit inside the shared library?
  • but would my shared library know to use the luajit symbols instead of the static symbols?

I did test this on both ARM/OSX and x64/Linux and got segfaults on both.

I realized I quoted the wrong section earlier and my reply was meant to suggest you shouldn't make this a goal:

To add a Lua engine in parallel to the built-in Lua implementation, the module can register with a different name such as "lua5.4" or "luajit".

@madolson madolson moved this to Todo in Valkey 8.1 Nov 5, 2024
@rjd15372
Contributor

rjd15372 commented Nov 6, 2024

@zuiderkwast I would like to implement the WASM engine using this approach. I can include the module API changes as part of the work I'm doing with WASM.

@zuiderkwast
Contributor Author

@rjd15372 sounds great, but I'd prefer a separate PR for only the module API and a dummy engine module for testing that just returns the script code back or something.
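A dummy engine of the kind suggested here could look like the following. This is a Python sketch of the idea only; the real test module would be written in C against the module API, and the names `echo_engine` and `run_script` as well as the decision to pass only the body after the shebang are assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical engine callback shape: (code, keys, args) -> result.
Engine = Callable[[str, List[str], List[str]], object]


def echo_engine(code: str, keys: List[str], args: List[str]) -> object:
    """Dummy engine: does not interpret anything, just returns the code."""
    return code


def run_script(engines: Dict[str, Engine], script: str,
               keys: List[str], args: List[str]) -> object:
    """Dispatch a script to the engine named on its shebang line."""
    first_line, _, body = script.partition("\n")
    if not first_line.startswith("#!"):
        raise ValueError("this sketch only handles scripts with a shebang")
    tokens = first_line[2:].split()
    if not tokens or tokens[0] not in engines:
        raise LookupError("unknown scripting engine")
    # Assumption: the engine receives the script body without the shebang line.
    return engines[tokens[0]](body, keys, args)
```

Because the engine returns its input, a test can check the plumbing (engine lookup, argument passing, reply handling) without depending on any real language runtime.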

@rjd15372
Contributor

rjd15372 commented Nov 6, 2024

@zuiderkwast sure, I wasn't implying that all work would be in a single PR. I was thinking along the same lines as you.

@rjd15372
Contributor

rjd15372 commented Nov 8, 2024

@zuiderkwast @madolson I opened PR #1277 with the changes to the module API.

@PingXie
Member

PingXie commented Nov 19, 2024

I am generally aligned with the proposal of extending scripting language support via modules. I think it strikes a good balance between extensibility and complexity.

@PingXie
Member

PingXie commented Nov 19, 2024

Great questions, @hpatro!

  1. Do we support multiple engines at a given point in time?

Yes, but there should be one "inbox" engine: the current Lua one. All others will come in via modules.

  2. Do we plan to host other first party engines in the near future like V8 engine? KeyDB in the past created this: https://github.com/Snapchat/ModJS

If by "first party" you meant "inbox", I think there should be one and only one, i.e., the current Lua engine. Others will be shipped out of band via modules.

  3. If yes, do we route them based on the shebang (do we fail a script without a shebang?)

Makes sense.

  4. Or do we introduce a new command/sub-command for each engine?

This would be bad coupling.

  5. We would need a certain API to describe the supported engine and its version.

Can we encode the version in the shebang as well? #!<module>-<version>?
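One way to read the `#!<module>-<version>` idea is sketched below. This is a hypothetical Python helper, not an agreed format: the name `split_engine_version` and the choice to split at the last hyphen are assumptions made for illustration.

```python
from typing import Optional, Tuple


def split_engine_version(token: str) -> Tuple[str, Optional[str]]:
    """Split an engine token like 'luajit-2.1' into (name, version).

    A token without a hyphen has no version part.
    """
    name, sep, version = token.rpartition("-")
    if not sep:
        return token, None
    if not name or not version:
        raise ValueError(f"malformed engine token: {token!r}")
    return name, version
```

With this rule, `luajit-2.1` splits into `luajit` and `2.1`, while `lua` and `lua5.4` carry no separate version. The sketch also exposes an ambiguity: an engine whose name itself contains a hyphen would be split in the wrong place unless a version is always given, so the separator would need to be chosen with care.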

@PingXie
Member

PingXie commented Nov 19, 2024

There's a difference between scripts and functions. Scripts are part of an application and are written by the application developers while functions are assumed to be installed by a database admin. The caller of the function doesn't need to know which language the function was written in.

@zuiderkwast what are your thoughts on supporting other scripting languages in FUNCTION?

@PingXie
Member

PingXie commented Nov 19, 2024

btw, we should capture the details in an RFC once we wrap up the discussion, I think.

@zuiderkwast
Contributor Author

@zuiderkwast what are your thoughts on supporting other scripting languages in FUNCTION?

My thought is that it's a good idea, and it is already implemented in #1277. But EVAL is very easy to use, and preparations for that have started in #1312. I'd like a scripting engine module to provide both FUNCTION and EVAL. It doesn't seem that hard to achieve both.

@zuiderkwast
Contributor Author

The main languages/engines I can imagine being used are WASM and JavaScript (e.g. V8), because they are very well sandboxed by design, while Lua isn't. I can see other Lua versions provided by modules too, like LuaJIT, Lua 5.4 and Luau. For the Lua variants, it would be nice to allow a module to be the default engine if built-in Lua is disabled, so that the many applications written for regular Lua can benefit without modification.

@hwware
Member

hwware commented Nov 19, 2024

I don't object to supporting more scripting languages, but all of them should come via modules, even WASM in the future. We had better keep only Lua in the core, but we could give users and developers an option to enable or disable it.

@zuiderkwast
Contributor Author

Yes, module API for all new languages. We already agreed on this in the meeting some weeks ago.

Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

6 participants