Hot-patch a running Python process! With syntactic macros in the REPL!
Inspired by Swank in Common Lisp. Need to fix a space probe in flight? Hot-patching has you covered.
To easily start an instance to play around with:

```bash
python3 -m unpythonic.net.server
```

It will bind to `127.0.0.1` on ports `1337` (main channel) and `8128` (control channel). When you want to shut it down, press `Ctrl+C` in the server's terminal window.
Multiple clients may be connected simultaneously to the same server. Each client gets an independent REPL session. The top-level namespace is shared between all sessions.
The actual REPL console you get when you connect to the server depends on what you have installed in the environment where the server is running. The following will be tried in order. The first one that imports successfully wins:
- If you have `mcpyrate` installed, you will get `mcpyrate.repl.console.MacroConsole`. (Recommended.)
- Otherwise you will get `code.InteractiveConsole`, and macro support will not be enabled.
In a REPL session:
- `print()` works as usual. Stdout is redirected to the client only in the REPL session's main thread.
- If you must, look at the value of `sys.stdout` (et al.) in the REPL session's main thread. After the REPL server has been started, it's actually an `unpythonic.Shim` that holds the underlying stream in an `unpythonic.ThreadLocalBox`, so you can get the stream from there if you really need to. For any thread that hasn't sent a value into that box, the box will return the default, which is the original stdin/stdout/stderr of the server process.
- `server_print()` prints to the original stdout of the server process.
- To view docstrings, use `doc(obj)` instead of `help(obj)`. (See the session sketch after this list.)
  - Do not use `help(obj)` in the REPL. Trying to do that hangs the client, because `help`'s stdin/stdout are not redirected properly.
  - `doc(obj)` just prints the docstring without paging, while emulating `help`'s dedenting. It's not a perfect solution, but it should work well enough to view docstrings of live objects in a live Python process.
  - If you want to look at docstrings for definitions currently on disk instead, just use a regular IPython session or similar.
  - `doc(obj)` also works when `obj` is a syntactic macro.
    - For example, `from unpythonic.syntax import macros, let`, and then `doc(let)`.
- For very rudimentary job control (spawn a background task, retrieve its result later, possibly in another session), see `bg` and `fg`.
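For example, a session might look like this (a sketch; output omitted, and assuming `mcpyrate` is installed so that the macro import works):

```
>>> from unpythonic.syntax import macros, let   # import a macro for use in this session
>>> doc(let)                                    # print its docstring, without paging
>>> server_print("hello")                       # goes to the server's original stdout
```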
Technical:

- IPv4 only for now. IPv6 would be nice, but that is something for a later release.
- Tested only on Linux (with CPython 3.6 and PyPy3).
  - At least the PTY stuff on the server side is *nix-specific.
  - Also, I make no guarantees that `select.select` is not called on an fd that is not a socket.
  - Probably possible to make this work on Windows, but I don't need that. PRs are welcome, though.
```bash
python3 -m unpythonic.net.client 127.0.0.1
```

This connects to the REPL server running on `localhost`, and opens a REPL session, where:
- Line editing (GNU `readline`) is available, with history, and remote tab completion.
  - When you use tab completion, the client transparently queries the server for completions.
  - History is currently not saved; it is only available back to the start of the session. For the intended use case this is probably enough.
    - If you need to preserve commands across sessions, maybe store them in a file?
    - If you need to refresh large sections of code, consider hot-patching with the help of `importlib.reload`, instead of pasting new definitions directly into the REPL. It's possible to `importlib.reload(somemodule)` from the REPL.
- Pressing `Ctrl+D` at the prompt politely asks to disconnect. If the server fails to respond for whatever reason, following that with `Ctrl+C` forces a client-side disconnect.
- At any other time, pressing `Ctrl+C` in a REPL session sends a `KeyboardInterrupt` to the remote.
  - This works by injecting a `KeyboardInterrupt` asynchronous exception into the thread running that particular session. Any other threads in the process running the server are unaffected. (See the sketch after this list.)
  - That feature is actually documented in the CPython C API docs, and is part of the public API. But it's a bit hard to find, and it was never intended to be called from Python code without writing a custom C extension. It just happens that `ctypes.pythonapi` makes that possible. Thanks go to LIU Wei and Federico Ficarelli for the original detective work.
  - Due to technical reasons, remote `Ctrl+C` currently only works on CPython. Support for PyPy3 would be nice, but it is currently not possible. See `unpythonic.misc.async_raise` and #58 for details.
  - Be sure to press `Ctrl+C` just once. Hammering the key combo may raise a `KeyboardInterrupt` locally in the code that is trying to send the remote `KeyboardInterrupt` (or in code waiting for the server's response), thus forcibly terminating the client. Starting immediately after the server has responded, remote `Ctrl+C` is available again. (The server indicates this by sending the text `KeyboardInterrupt`, possibly with a stack trace, and then giving a new prompt, just like a standard interactive Python session does.)
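The gist of the injection technique, as a minimal sketch (illustrative only; the function name, argument handling, and error handling here are this sketch's, not necessarily those of `unpythonic.misc.async_raise`):

```python
import ctypes

def async_raise(thread_id, exc_type):
    """Inject an exception class into another thread (CPython only)."""
    # PyThreadState_SetAsyncExc is part of the public CPython C API;
    # ctypes.pythonapi lets us reach it without writing a C extension.
    n = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(thread_id),
                                                   ctypes.py_object(exc_type))
    if n == 0:
        raise ValueError("no such thread")
    if n > 1:  # should not happen; revert the damage if it does
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(thread_id), None)
        raise SystemError("PyThreadState_SetAsyncExc affected more than one thread")

# e.g. async_raise(session_thread.ident, KeyboardInterrupt)
```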
If you don't need tab completion or `Ctrl+C` support, the main channel is also `netcat` compatible. Use `rlwrap` to get basic readline functionality (history):

```bash
rlwrap netcat localhost 1337
```

The really barebones way, no history:

```bash
netcat localhost 1337
```
To start the REPL server from your own code:

```python
from unpythonic.net import server
server.start(locals={})
```

That's all.
- The REPL server is strictly opt-in; it must be imported and started explicitly. There's no way to turn it on in a running process that didn't opt in when it started.
- The `locals` parameter of `start` specifies the top-level namespace of REPL sessions served by the server. (See the sketch after this list.)
  - If this is one of your modules' `globals()`, you can directly write to that namespace in the REPL simply by assigning to variables. E.g. `x = 42` will actually do `mymod.x = 42`, where `mymod` is the module that called `server.start(locals=globals())`.
  - If you want a namespace that's only accessible from (and shared by) REPL sessions, use an empty dictionary: `server.start(locals={})`.
  - For write access to module-level globals in other modules, access them as module attributes, like in Manhole. For example, `sys.modules['myothermod'].x`.
- There's no need to call `server.stop()` manually; it is automatically registered as an `atexit` handler.
- The server runs in the background, in a daemon thread. By design, it doesn't hijack any existing thread.
  - This does mean that if you shut down your app in any way (or if your app crashes), the server will also shut down immediately. This will forcibly disconnect clients, if any remain.
  - We use threads instead of an asyncio model to let you choose whatever async framework you want for the rest of your app. This is important especially because, as of 2020, the game-changer Trio is a thing, but not yet the de facto standard. Not many simultaneous sessions are needed, so the OS can easily spare the resources to run a classical threaded server.
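A minimal sketch of embedding the server in an application (the module, `counter`, and `step` are made-up names for this example; only the documented `server.start(locals=...)` is used):

```python
# myapp.py -- embed the REPL server in an application.
import time

from unpythonic.net import server

counter = 0   # module-level state: visible and rebindable from REPL sessions

def step():
    global counter
    counter += 1

def main():
    # The top-level namespace of REPL sessions is this module's globals,
    # so `counter = 42` in a session actually rebinds myapp.counter.
    server.start(locals=globals())
    while True:
        step()
        time.sleep(1)

if __name__ == "__main__":
    main()
```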
A general-purpose REPL server, by its very nature, is essentially an opt-in back door. While the intended use is to allow you to hot-patch your running app, the server gives access to anything that can be imported, including `os` and `sys`. It is trivial to use it as a shell that just happens to use Python as the command language, or to obtain traditional shell access (e.g. `bash`) via it.

This particular REPL server has no authentication support whatsoever. The server accepts any incoming connection, with no way to audit who it is. There is no encryption for network traffic, either.

Note that this is exactly the same level of security (i.e. none whatsoever) as provided by the Python REPL itself. If you have access to `python`, you have access to the system, with whatever privileges the `python` process itself runs under.
Therefore, to remain secure:
- Only bind the server to the loopback interface. This is the default. This ensures connections only come from users who can log in to the machine running your app. (Physical local access or an SSH session are both fine.)
- Only enable the server if you trust every user who can log in with REPL access under your credentials. The two most common scenarios are:
  - The app runs on your local machine, where you are the only user, or which otherwise has no untrusted human users.
  - The app runs on a dedicated virtual server, which runs only your app.
In both cases, access control, auditing and encrypted connections (SSH) can then be provided by the OS itself.
As usual, the legends are exaggerated.
Making full use of hot-patching requires foresight and adherence to a particular programming style. Some elements of this style may be considered antipatterns in programs that are not designed for hot-patching. An example is saving important data in global variables, so that it can later be accessed from the REPL, instead of keeping as much as possible in the locals of a function. Hot-patching and a functional style are opposing goals: for hot-patching (and its cousin, live programming), local state is poison.
- It is possible to reload arbitrary modules.
  - Just use `importlib.reload`.
  - But if someone has from-imported anything from the module you're reloading, tough luck. The from-import will refer to the old version of the imported object, unless you also reload the module that performed the from-import. Good luck catching all of them.
- You can access only things which you can refer to through the top-level namespace.
  - Keep in mind you can access the top-level namespace of any module via `sys.modules`.
  - If the logic you need to hot-patch happens to be inside a closure, tough luck. The only way then is to replace the thing that produces the closures (if that happens to live at the top level), and re-instantiate the closure.
  - So think ahead, and store the things you need to be able to access in a container reachable from the top-level namespace.
- You may need to `box` a lot more stuff than usual. (See the first sketch after this list.)
  - Especially things that you provide for other modules to from-import from yours.
  - A module should export a `box` containing the useful thing, not directly the thing itself, since the thing may get replaced later. When a caller indirects via the box, it always gets the latest version of the thing.
- It is impossible to patch a running loop.
  - Unless it's an FP loop with the body defined as a function at the top level. In that case it's possible to rebind the function name that refers to the loop body. (See the second sketch after this list.)
  - In `unpythonic`, this setup is possible using `@trampolined` (but not `@looped`, because `@looped` overwrites the def'd name with the loop's return value). Define the loop body as a `@trampolined` top-level function, and start the loop by calling this function from wherever you want. Python's dynamic name lookup will then ensure that during each iteration, the latest definition is always used.
- Even if you replace a class definition, any existing instances will still use the old definition.
  - Though you could retroactively change their `__class__`. Automating that is exactly what ActiveState recipe 160164 is for.
- There is no way to save a "Python image".
  - Python wasn't really designed, as a language, for the style of development where an image is kept running for years and hot-patched as necessary.
  - We don't have anything like SBCL's `save-lisp-and-die`, or indeed the difference between `defvar` (initialize only if the variable does not already exist in the running process) and `defparameter` (always initialize) (for details, see Chapter 6 in Peter Seibel's Practical Common Lisp).
  - So what we have is not "image-based programming" as in Common Lisp. If you need to restart the process, it needs to cold-boot in the usual manner.
  - Therefore, in Python, never just hot-patch; always change your definitions on disk as well, so your program will run with the new definitions also the next time it's cold-booted.
    - Once you're done testing, reload those definitions in the live process, if you need/want to.
- The next best thing: if you want to semi-transparently persist objects between program runs, look into ZODB.
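To illustrate the box pattern, here is a minimal sketch, assuming `unpythonic`'s `box`/`unbox` API; the module and function names are made up for this example:

```python
# config.py -- export a box, not the thing itself.
from unpythonic import box

def compute_step(x):
    return 2 * x

step_impl = box(compute_step)    # other modules from-import the *box*

# user.py -- indirect through the box at call time.
from unpythonic import unbox
from config import step_impl

def run(x):
    return unbox(step_impl)(x)   # always calls whatever is currently in the box

# Hot-patching later, e.g. from a REPL session:
#   import sys
#   sys.modules["config"].step_impl.set(new_compute_step)
```

And a sketch of a hot-patchable FP loop built on `@trampolined` (again, the names are made up; the point is that `work` is looked up from module globals on every iteration, so rebinding the top-level name takes effect on the next iteration):

```python
from unpythonic import trampolined, jump

@trampolined
def work(state=0):
    # ... one iteration's worth of work ...
    if state >= 1_000_000:       # toy termination condition
        return state
    # Rebinding the module-level name `work` (e.g. from a REPL session)
    # makes the next iteration use the new definition.
    return jump(work, state + 1)
```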
Happy live hacking!
To make the objects in your hot-patchable program persistent, see ZODB [1] [2] [3]. It originally comes from Zope, which as of early 2020 is dead, but ZODB itself is very much alive.
ZODB can semi-transparently store and retrieve any Python object that subclasses `persistent.Persistent`. (I haven't tried that class as a mixin, though. That could be useful for persisting the `unpythonic` containers `box`, `cons`, and `frozendict`.)
Here persistent means the data lives on disk. This is not to be confused with the other sense of persistent data structures, as in immutable ones, as in pyrsistent.
Persistence using ZODB is only semi-transparent. You have to explicitly assign your to-be-persisted object into the DB instance to track it, and call `transaction.commit()` to apply pending changes. (Explicit is better than implicit.) Any data stored under the DB root dict (recursively) is saved.
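A minimal sketch of that workflow, using the standard ZODB API (the file name and the `Counter` class are made up for this example):

```python
import ZODB, ZODB.FileStorage
import persistent
import transaction

class Counter(persistent.Persistent):
    def __init__(self):
        self.value = 0

storage = ZODB.FileStorage.FileStorage("appdata.fs")   # hypothetical store file
db = ZODB.DB(storage)
connection = db.open()
root = connection.root          # the DB root; supports attribute access

root.counter = Counter()        # assign into the root so ZODB tracks the object
root.counter.value += 1
transaction.commit()            # apply pending changes

connection.close()
db.close()
```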
Notes:
- Stuff that's related to ZODB, but not obvious from the name: the `persistent` and `transaction` packages (in the top-level namespace for packages), and `.fs` files (the ZODB database store).
- ZODB saves only data, not code. It uses `pickle` under the hood, so functions are always loaded from their definitions on disk.
  - ZODB is essentially `pickle` on ACID: atomicity, consistency, isolation, durability.
- API style concerns:
  - Transactions also have a context manager interface; modern style is to use that.
  - The DB root exposes both a dict-like interface and an attribute-access interface. So `dbroot['x'] = x` and `dbroot.x = x` do the same thing; the second way is modern style. The old way is useful mainly when the key is not a valid Python identifier.
- Think of the DB root as the top-level namespace for persistent storage.
  - So it's a bit like our `dyn`: a special place into which you can store attributes, and which plays by its own rules.
  - Place all of your stuff into a container object, and store only that at the top level, so that ZODB database files for separate applications can easily be merged later if the need arises. Also, the performance of the DB root isn't tuned for storing a large number of objects directly at the top level. If you need a scalable container, look into the various `BTrees` in ZODB.
- Naming conventions, with semantics (see the sketch after these notes):
  - Any attributes beginning with `_v_` are volatile, i.e. not saved.
    - Use this naming scheme to mark your caches and such (when stored in an attribute of a persistent object).
    - Volatile attributes may suddenly vanish between any two method invocations, if the object instance is in the saved state, because ZODB may choose to unload saved instances at any time to conserve memory.
  - Any attributes beginning with `_p_` are reserved for use by the ZODB machinery.
    - For example, set `x._p_changed = True` to force ZODB to consider an object instance `x` as modified.
    - This can be useful e.g. when `x.foo` is a builtin `list` that was mutated by calling its methods. Otherwise ZODB won't know that `x` has changed when we `x.foo.append("bar")`. Another way to signal the change to ZODB is to rebind the attribute: `x.foo = x.foo`.
- If your class subclasses `Persistent`, you're not allowed to later change your mind on this (i.e. make it non-persistent), if you want the storage file to remain compatible. See ZODB issue #99.
- Be careful when changing which data attributes your classes have; this is a database schema change and needs to be treated accordingly.
  - New data attributes can be added, but if you remove or rename old ones, the code in your class needs to account for that (checking for the presence of the old attribute, migrating it), or you'll need to write a script to migrate your stored objects as an offline batch job.
- The ZODB docs were unclear on the point, and there was nothing else on it on the internet, so I tested it myself: ZODB handles properties correctly.
  - The property itself is recognized as a method. Only raw data attributes are stored into the DB.
  - After an object instance is loaded from the DB, reading a property will unghost it, just like reading a data attribute.
- Beware of persisting classes defined in `__main__`, because the module name must remain the same when the data is loaded back.
- The ZODB tutorial says: "Non-persistent objects are essentially owned by their containing persistent object and if multiple persistent objects refer to the same non-persistent subobject, they'll (eventually) get their own copies."
  - So beware: anything whose object identity and relationships should be preserved should be made a persistent object.
  - ZODB has persistent list and dict types, if you need them.
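A minimal sketch of the `_v_` and `_p_` conventions in use (the class and attribute names are made up for this example):

```python
import persistent

class Dataset(persistent.Persistent):
    def __init__(self, values):
        self.values = list(values)          # ordinary data attribute: persisted
        self._v_total = None                # volatile: never written to the DB

    def total(self):
        # The volatile attribute may be missing after the instance is
        # reloaded (or unloaded to conserve memory), so re-derive it.
        cached = getattr(self, "_v_total", None)
        if cached is None:
            cached = self._v_total = sum(self.values)
        return cached

    def add(self, x):
        self.values.append(x)               # mutating a plain list in place...
        self._p_changed = True              # ...so tell ZODB this instance changed
        self._v_total = None                # invalidate the volatile cache
```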
Support for syntactic macros, in a REPL connected to a live Python process, is why this feature is included in `unpythonic`, instead of just recommending Manhole, socketserverREPL, or similar existing solutions.
Also, the focus is subtly different from most similar projects. This server is primarily intended for hot-patching, not so much for debugging. So we don't care about debugger hooks, or instantly embedding a REPL into a particular local scope (to give the full Python user experience for examining program state, pausing the thread that spawned the REPL). We care about running the REPL server in the background (listening for connections as part of normal operation of your app), and making write access to module globals easy.
A hot-patching REPL server is also useful for agile development of old-school computational science scripts that run directly via `python3 -m mysolver` (no Jupyter notebook there), because it reduces the burden of planning ahead.
Seeing the first plots from a new study often raises new questions... some of which could be answered by re-plotting the same data (that often took two hours to compute) in alternative ways. Which would be easy if you could get your hands on the NumPy arrays your program just finished computing. But the program doesn't yet have the code to save anything to disk, because the run was supposed to be just for testing. You know that when you close that last figure window, the process will terminate, and all that delicious data will be gone.
If the arrays can be accessed from module scope, an embedded REPL server can still save the day. You just connect to your running process while it's still live, and in the REPL, save whatever you want, before closing that last figure window and letting the process terminate. It's all about having a different kind of conversation with your scientific problem. (Cf. Paul Graham on software development in On Lisp; original quotation.)
SSH with key-based authentication is the primary future direction of interest. It would enable security, making actual remote access feasible.
This may be added in an eventual v2.0 (using Paramiko), but right now it's not on the immediate roadmap. This would allow a client to be sure the server is who it claims to be, as well as letting users log in based on an `authorized_keys` file. It would also make it possible to audit who has connected and when.
What we want is to essentially treat our Python REPL as the shell for the SSH session. There are a lot of Paramiko client examples on the internet (oddly, with a focus mainly on security testing), but demo_server.py in the distribution seems to be the only server example, and leaves unclear important issues such as how to set up a session and a shell. Reading paramiko/server.py as well as paramiko/transport.py didn't make me much wiser.
So right now, I'm not going to bother with SSH support. If interested, help is welcome.
Back in 0.14.x, drop-in replacing `code.InteractiveConsole` in `unpythonic.net.server` with `macropy.core.console.MacroConsole` gave rudimentary macro support. However, to have the same semantics as in the `imacropy.iconsole` IPython extension, a custom console was needed. This was added to `imacropy` as `imacropy.console.MacroConsole`. An updated version of this technology (including an updated IPython extension) was then included in `mcpyrate`.
Why `mcpyrate.repl.console.MacroConsole`:

- Catches and reports import errors when importing macros.
- Allows importing the same macros again in the same session, to refresh their definitions.
  - When you `from somemod import macros, ...`, this console automatically first reloads `somemod`, so that a macro import always sees the latest definitions.
- Makes viewing macro docstrings easy.
  - When you import macros, beside loading them into the macro expander, the console automatically imports the macro stubs as regular runtime objects. They're functions, so just look at their `__doc__`.
  - This also improves UX. Without loading the stubs, `from unpythonic.syntax import macros, let` would not define the name `let` at runtime. Now it does, with the name pointing to the macro stub.
- IPython-like `obj?` and `obj??` syntax to view the docstring and source code of `obj`. (See the session sketch after this list.)
- Can list the macros imported to the session, using the command `macros?`.
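For example, a macro-enabled session might look like this (a sketch; output omitted):

```
>>> from unpythonic.syntax import macros, let   # loads the macro and binds the stub `let`
>>> let?                                        # view the docstring of the `let` macro
>>> let??                                       # view its source code
>>> macros?                                     # list macros imported to this session
```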
For historical interest, refer to and compare `imacropy/iconsole.py` and `macropy/core/console.py`. The result was `imacropy/console.py`. Now this technology lives in `mcpyrate/repl/console.py` and `mcpyrate/repl/iconsole.py`.