**Navigation**

- README
- Pure-Python feature set
- Syntactic macro feature set
- Examples of creating dialects using `mcpyrate`
- REPL server
- Troubleshooting
- Design notes
- Essays
- Additional reading
- Contribution guidelines
## Rule #1: Code and/or documentation contributions are welcome!
- **Scope: language extensions and utilities.**
  - Lispy, haskelly, and/or functional features all fit. It can be anything from a small utility function to a complete (optional) change of language semantics.
    - *Lisp* should be understood in the familial sense, including e.g. Common Lisp, Scheme (including the SRFI extensions), and Racket.
    - Some lispy features are actually imperative, not functional. This is fine. Just like Python, Lisp is a multi-paradigm language.
    - Borrowing ideas from random other languages is fine, too. The `dbg[]` macro was inspired by Rust. What next... Julia?
  - If a feature is large, and useful by itself, a separate project may be The Right Thing.
    - Consider `pydialect` and `imacropy`, which are closely related to `unpythonic`, but separate. (Those features are now part of `mcpyrate`, but the point stands.)
    - The hot-patching REPL server (`unpythonic.net`) is a borderline case. Hot-patching live processes is a legendary Common Lisp feature (actually powered by Swank [0] [1] [2] [3]), so arguably it belongs; but the machinery is large and only loosely coupled to the rest of `unpythonic`, which favors splitting it off into a separate project.
  - When in doubt, osmosis: if it feels like a particular feature is missing, chances are high that it will fit.
- **Motivation: teaching, learning, bringing Python closer to perfection, increasing productivity.**
  - `unpythonic` started as a collection of code analysis exercises, developed while teaching the special course RAK-19006: Python 3 in Scientific Computing at Tampere University of Technology, spring term 2018.
  - Aim at clarity. Target generally intelligent readers, who are not necessarily familiar with the specific construct you're writing about, or even with the language that may have inspired it.
    - For example, the special course was aimed at M.Sc. and Ph.D. students in the engineering sciences. It's common in fields of science involving computing to be proficient in mathematics, yet not have much programming experience.
  - Aim at increasing your own future productivity. Is Python missing some batteries and/or language features?
  - Not all ideas have to be good, especially right away. Generally the only way to find out is to try, and then iterate.
- **Be pythonic: find pythonic ways to do unpythonic things.**
  - Fitting user expectations of how Python behaves beats mathematical elegance.
    - For example, `scanr` returns its results in the same order as it scans them, even though this breaks Haskell tradition.
- **Be obsessively correct.**
  - Get the terminology right. This promotes clear thinking. For example:
    - A function definition has (formal) *parameters*, which are filled by *arguments* at call time.
    - *Dynamic assignment* is descriptive, while *dynamic scoping* is nonsense, because scope is arguably a lexical concept (cf. *dynamic extent*).
    - Sometimes different subcultures have different names for the same ideas (e.g. Python's *unpacking* vs. Lisp's *destructuring*). When there is no universal standard, feel free to pick one option, but list the alternative names if reasonably possible. Bringing a touch of the Rosetta stone helps discoverability.
    - If I have made a terminology mistake, please challenge it, to get it fixed in a future release.
  - Lack of robustness is a bug. The code should do the right thing in edge cases, possibly in corner cases too.
    - For example, `memoize` catches and caches exceptions. The singleton-related abstractions (`Singleton`, `sym` and `gsym`) worry about the thread-safety of constructing the singleton instance. All custom data structure types worry about pickling.
  - When it doesn't make sense to cover all corner cases, think it through, and give examples (in documentation) of what isn't covered.
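  For instance, the exception-caching behavior of `memoize` mentioned above can be demonstrated with a minimal sketch (the call counter is only for illustration):

  ```python
  from unpythonic import memoize

  calls = 0

  @memoize
  def risky(x):
      global calls
      calls += 1
      raise ValueError("boom")

  for _ in range(2):
      try:
          risky(42)
      except ValueError:
          pass

  assert calls == 1  # the exception was cached; the body ran only once
  ```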
- **Be obsessively complete when going the extra mile adds value.**
  - For example:
    - Not only a summarizing `minmax` utility, but `running_minmax` as well. The former is then just a one-liner expressed in terms of the latter.
    - `foldl` accepts multiple iterables, has a switch to terminate either on the shortest or on the longest input, and takes its arguments in a curry-friendly order. It also requires at least one iterable, so that `curry` knows to not trigger the call until at least one iterable has been provided.
    - `curry` changes Python's reduction semantics to be more similar to Haskell's, to pass extra arguments through, and keep calling if an intermediate result is a function, and there are still such passed-through arguments remaining. This extends what can be expressed concisely; for example, a classic lispy `map` is `curry(lambda f: curry(foldr, composerc(cons, f), nil))`. Feed that a function and an iterable, and get a linked list with the mapped results. Note the arity mismatch; `f` is 1-to-1, but `cons` is 2-to-1. (See the sketch below.)
    - `curry` also supports our `@generic` functions, and named return values...
  - Make features work together when it makes sense. Aim at composability. Try to make features orthogonal when reasonably possible, so that making them work together requires no extra effort. When not possible, purposefully minimizing friction in interaction between features makes for a coherent, easily understandable language extension.
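  To make the lispy `map` example concrete, here is a minimal sketch. It assumes `unpythonic`'s `ll` linked-list constructor; the rest of the names come from the example above:

  ```python
  from unpythonic import curry, foldr, composerc, cons, nil, ll

  # A classic lispy `map`: fold from the right, consing the mapped values.
  lispy_map = curry(lambda f: curry(foldr, composerc(cons, f), nil))

  # `curry` passes the extra argument through to the intermediate result,
  # so the iterable can be supplied in the same call as the function.
  assert lispy_map(lambda x: 2 * x, ll(1, 2, 3)) == ll(2, 4, 6)
  ```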
- **Be concise but readable, like in mathematics.**
  - If an implementation could be made shorter, perhaps it should. But resist the temptation to golf. If readability or features would suffer, that's when to stop.
    - Some extra complexity (and hence length) is tolerable when it adds value.
    - Leaving a commented-out, elegant illustrative example version of the code (in addition to the battle-ready production version that's actually in use) can be useful to document the core idea. See `memoize` and `curry` for examples.
  - As a rough guideline, at most ~700 lines per module. Python is expressive; that's a lot.
    - Language extensions and utilities tend to be much more dense than e.g. GUI or database code.
    - Functional style can also enable very dense code by taking advantage of the ability to define custom higher-order functions and use function composition.
    - Prefer ~100-300 SLOC per module, if reasonably possible.
    - Don't obsess over it, if going over allows a better solution.
    - Docstrings, comments and blanks don't count towards SLOC. The included `countlines.py` is what I've used for measurement.
- **Test aggressively.**
  - Besides guarding against regressions, automated tests also serve as documentation (possibly with useful comments).
  - Write tests for each module (unit), as well as any relevant interactions between features (integration).
  - Keep in mind The Practical Test Pyramid (Ham Vocke, 2018).
- **Document aggressively.**
  - Useful docstrings for public API functions are mandatory in release-worthy code.
    - Explain important points, omit boilerplate.
    - Use reStructuredText syntax.
      - Prefer basic features that enhance readability, yet avoid ruining the plain-text aesthetics.
    - Having no docstring is better than having a placeholder docstring.
      - If a function is not documented, make that fact explicit, to help static analyzers flag it as needing documentation.
  - To help discoverability, the full documentation `doc/features.md` (or `doc/macros.md`, as appropriate) should contain at least a mention of each public feature. Examples are nice, too.
  - Features that have non-obvious uses (e.g. `@call`), as well as those that cannot be assumed to be familiar to developers mostly working in Python (e.g. Common Lisp style conditions and restarts), should get a more detailed explanation.
In short: regular code is in `unpythonic`, macros are in `unpythonic.syntax`, and REPL server related stuff is in `unpythonic.net`.
Automated tests are in `tests` (note plural), under the directory whose modules they test. The test runner is, unsurprisingly, `runtests.py`, at the top level. Yes, I know many developers prefer to separate the `src` and `tests` hierarchies at the top level; we currently don't, mostly for historical reasons.
For coverage analysis, `coverage.py` works fine for analyzing statement coverage. Block macros do cause some false negatives, but this is minor.
We use a custom testing framework, which lives in the modules `unpythonic.test.fixtures` (note singular `test`, part of the framework name) and `unpythonic.syntax.testingtools`. It uses conditions and restarts to communicate between individual tests and the testset, which acts as a reporter.
In retrospect, given that the main aim was compact testing syntax for macro-enabled Python code (without installing another import hook, doing which would disable the macro expander), it might have made more sense to make the testing macros compile to pytest. But hey, it's short, may have applications in teaching... and now we can easily write custom test runners, since the testing framework is just a `mcpyrate` library. It's essentially a *no-framework* (cf. "NoSQL"), which provides the essentials and lets the user define the rest.
(The whole framework is about 1.8k SLOC, counting docstrings, comments and blanks; under 700 SLOC if counting only active code lines. Add another 800 SLOC (all) / 200 SLOC (active code lines) for the condition system.)
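For orientation, here is a sketch of what a minimal test module using the framework looks like. The names assumed here are the `test[]` macro and the `session`/`testset` fixtures; see `doc/features.md` for the authoritative API:

```python
from unpythonic.syntax import macros, test  # noqa: F401, `test` is a macro
from unpythonic.test.fixtures import session, testset

def runtests():
    with testset("arithmetic"):  # the testset gathers and reports results
        test[2 + 2 == 4]
        test[2 * 3 == 6]

if __name__ == '__main__':
    # Needs the macro expander; run e.g. via the `macropython` bootstrapper.
    with session("example"):
        runtests()
```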
Since `unpythonic` is a relatively loose collection of language extensions and utilities, that's about it for the 30 000 ft (9 144 m) view.
To study a particular feature, just start from the entry point that piques your interest, and follow the definitions recursively. For convenience, use an IDE or Emacs's `anaconda-mode` to stay sane. Look at the automated tests; those double as usage examples, sometimes containing finer points that didn't make it into the prose documentation.
`curry` has some cross-cutting concerns, but nothing that a grep wouldn't find. Same goes for the multiple-dispatch system (particularly `@generic`).
The `lazify` and `continuations` macros are the most complex (and perhaps fearsome?) parts. As for the lazifier, grep also for `passthrough_lazy_args` and `maybe_force_args`. As for continuations, read the `tco` macro first, and keep in mind how that works when reading `continuations`. The `continuations` macro is essentially what academics call "a standard CPS transformation", plus some technical details due to various bits of impedance mismatch.
`unpythonic.syntax.scopeanalyzer` is an unfortunate artifact that is needed to implement macros that interact with Python's scoping rules, notably `let`. Fortunately, the language reference explicitly documents what is needed for a lexical scope analysis of Python. So we have just implemented that (better, as an AST analysis, rather than scanning the surface syntax text).
As of the first half of 2021, the main target platforms are CPython 3.8 and PyPy3 for Python 3.7 (as of April 2021, PyPy3 does not have a 3.8 version yet). The code should run on 3.6 or any later Python. We have a GitHub workflow that runs the test suite on CPython 3.6 through 3.9, and on PyPy3.
- **Use semantic versioning.**
  - For now (early 2020), there's a leading zero, but the intent is to drop it sooner rather than later.
- **Use from-imports.**
  - The `unpythonic` style is `from ... import ...`. (See the sketch below.)
  - The from-import syntax is mandatory for macro imports in user code, anyway, since macro expanders for Python (including `mcpyrate`) support only `from ... import macros, ...` for importing macros. We just use the same style also for regular imports.
  - For imports of certain features of `unpythonic` (including, but not limited to, `curry`), our macro code depends on those features being referred to by their original bare names at the use site. This won't work if the `import ...` syntax is used. For the same reason, locally renaming `unpythonic` features with `from ... import ... as ...` should be avoided.
  - For imports of stuff from outside `unpythonic`, it's a matter of convention. Sometimes `import ...` can be clearer.
  - No star-import `from ... import *`, except in the top-level `__init__.py`, where it is allowed for re-exporting the public APIs.
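  A minimal sketch of the house style (the imported names are just examples):

  ```python
  # Macro imports in user code must use the from-import syntax:
  from unpythonic.syntax import macros, autocurry  # noqa: F401

  # Regular imports follow the same style, keeping the original bare names:
  from unpythonic import curry, foldl
  ```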
- **Try to pick good names.**
  - "There are only two hard things in Computer Science: cache invalidation and naming things." --Phil Karlton
  - Try to pick a short, descriptive name that doesn't need to be changed soon.
    - Preferably one word, but if there is none that will fit, and no chance for a newly coined yet obvious word (e.g. SymPy's `lambdify`), then perhaps two or more words separated by underscores must do the job... for now.
  - When coining a new term, try to quickly check that it's not in common use for something entirely different in some other programming subculture.
    - Observe the two different meanings of `throw`/`catch` in Common Lisp vs. C++/Java.
- **Try to keep backward compatibility. But build for the long term.**
  - Practicality beats purity. Backward compatibility is important.
  - But sometimes elegance beats practicality. If a breaking change makes the code much simpler, it may be The Right Thing... to schedule for the next major version milestone.
- **Avoid external dependencies.**
  - Diametrically opposite to the sensible approach for most software projects, but `unpythonic` is meant as a standalone base to build on. Few dependencies make it easy to install, and less likely to break.
  - `mcpyrate` is fine, but keep the macro features (and the `mcpyrate` dependency) strictly optional. `unpythonic` should be able to run, without its macro features, on a standard Python. (See the sketch below.)
  - Macros can depend on regular code. `unpythonic.syntax` is a subpackage, so the parent level `__init__.py` has already finished running when it's imported.
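  For illustration, one common way to keep a dependency strictly optional is a guarded import; this is a generic sketch, not necessarily how `unpythonic` itself does it:

  ```python
  # Probe for the optional macro-expander dependency at import time.
  try:
      import mcpyrate  # noqa: F401
      has_macros = True
  except ImportError:
      has_macros = False
  ```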
- **Build a tower of abstractions.**
  - Internal dependencies are encouraged. It is precisely the role of a language extension and utility library to make code shorter. As long as the result imports without errors, it's fine.
    - For example, some parts already use `dyn`, some `box`, yet others `scanl` or `foldl`.
- **Export explicitly.**
  - In regular (non-macro) code, any names intended as exports must be present in `__all__`. Although not everyone uses it, this is the official mechanism to declare a module's public API, recommended by PEP 8. (See the sketch below.)
    - It makes re-export trivial, so that `from .somemodule import *`, in the top-level `__init__.py`, automatically pulls in the public API, and only the public API, of `somemodule`.
    - This in turn helps properly support `from unpythonic import *` for interactive sessions.
    - While "anything without a leading underscore is public" is often a reasonable guideline, that includes also any imports done by the module... which more often than not should not be blindly re-exported. So be explicit.
    - Only populate `__all__` explicitly, manually, to allow IDEs and static analysis tools to work properly. (No tricks. See commented-out code in `unpythonic.llist` for an example of a bad idea.)
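  A minimal sketch of the convention (the module and function names are hypothetical):

  ```python
  # somemodule.py
  from collections import deque  # an import; should NOT be blindly re-exported

  __all__ = ["frob"]  # the explicit public API: populated manually, no tricks

  def frob(xs):
      """Return a deque built from `xs`."""
      return deque(xs)

  def _helper():
      """Private: underscore prefix, and not listed in `__all__`."""
  ```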
- **Be curry-friendly whenever reasonably possible.**
  - Even though it is more pythonic to pass arguments by name, passing them positionally should not be ruled out.
  - Parameters that change the least often, and hence are meaningful to partially apply for, should go on the left.
    - For higher-order functions this usually means the user function on the left, data on the right.
  - To say `def f(other, args, *things)` in situations where passing in at least one `thing` is mandatory, the correct signature is `def f(other, args, thing0, *things)`. (See the sketch below.)
    - This makes it explicit that at least a `thing0` must be supplied before `f` can be meaningfully called. `curry` needs this information to work properly.
    - The syntactic issue is that Python has a Kleene star `*args`, but no Kleene plus `+args` that would require at least one.
    - The standard library doesn't bother, so e.g. `map` may fire prematurely when curried. (Hence we provide a curry-friendly wrapper for `map`.)
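  A sketch of the "Kleene plus" signature pattern; the function itself is hypothetical:

  ```python
  from unpythonic import curry

  # `thing0` makes "at least one thing" explicit in the signature,
  # so `curry` knows not to trigger the call before one is supplied.
  def send_all(sender, thing0, *things):
      return [sender(thing) for thing in (thing0, *things)]

  f = curry(send_all)
  g = f(str.upper)  # no `thing0` yet: the call does not fire
  assert g("a", "b") == ["A", "B"]
  ```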
- **Be functional (FP) when it makes sense.**
  - Don't mutate input unnecessarily. Construct and/or edit a copy instead, and return that.
    - Macros are an exception to this. Syntax transformers usually edit the AST in place, not build a copy.
  - If there is a useful value that could be returned, return it, even if the function performs a mutating operation. This allows chaining operations. (See the sketch below.)
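  A trivial sketch of returning a useful value from a mutating operation (hypothetical helper):

  ```python
  def append_and_get(lst, x):
      """Append `x` to `lst` in place, and return `lst` to allow chaining."""
      lst.append(x)
      return lst

  assert append_and_get(append_and_get([], 1), 2) == [1, 2]
  ```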
- **Refactor aggressively. Extract reusable utilities.**
  - When implementing something, if you run into an empty niche, add the missing utility, and implement your higher-level functionality in terms of it.
  - This keeps code at each level of abstraction short, and exposes parts that can later be combined in new ways.
- **Compile-time or run-time?**
  - For anyone new to making programming languages: there's a reason the terms *static*/*lexical*/*compile-time* and *dynamic*/*run-time* are grouped together.
  - At compile time (macros), you have access to the source code (or AST), including its lexical structure. (I.e. what is defined inside what, in the source code text.)
    - You also have access to the macro bindings of the current expander, because for the macros, it's run time.
    - A block macro (`with mac:`) takes effect for the *lexical* content of that block.
  - At run time (regular code), you have access to run-time bindings of names (e.g. whether `curry` refers to `unpythonic.fun.curry` or something else), and the call stack.
    - Keep in mind that in Python, knowing what a name at the top level of a module (i.e. a "global variable") points to is only possible at run time. Although it's uncommon, not to mention bad practice in most cases, any code anywhere may change the top-level bindings in any module (via `sys.modules`).
    - A run-time context manager (`with mgr:`) takes effect for the *dynamic extent* of that block. (See the sketch below.)
  - Try to take advantage of whichever is the most appropriate for what you're doing.
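  To illustrate dynamic extent with a run-time construct, a minimal sketch using `unpythonic`'s `dyn`:

  ```python
  from unpythonic import dyn

  def g():
      # `dyn.x` is looked up at run time, from the dynamic extent of the caller.
      print(dyn.x)

  with dyn.let(x=42):  # takes effect for the dynamic extent of this block
      g()              # prints 42, even though `g` is defined outside the block
  ```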
- **Follow PEP 8 style, including the official recommendation to violate PEP 8 when the guidelines do not apply.** Specific to `unpythonic`:
  - Conserve vertical space when reasonable. Even on modern laptops, a display can only fit ~50 lines at a time.
    - `x = x or default` for initializing `x` inside the function body of `def f(x=None)` (when it makes no sense to publish the actual default value) is concise and very readable. (See the sketch below.)
    - But avoid overusing if-expressions.
  - Line width ~110 columns. This still allows two columns in a modern editor.
    - Can locally go a character or three over, if that gives globally a more pleasing layout for the text. Examples include a long word at the end of a line, or if a full stop wouldn't fit.
    - No line breaks in URLs, even if over 110 characters. URLs should be copy'n'pasteable, as well as allow `link-hint-open-link-at-point` to work in Emacs.
  - European style: one space between a full stop and the next sentence.
  - A blank line in code plays the role of a paragraph break in prose.
    - Insert a blank line when the topic changes, or when doing so significantly increases clarity.
    - Sometimes a group of related very short methods or functions looks clearer without blank lines separating them. This makes the grouping explicit.
  - One blank line after most function definitions, as well as class definitions.
    - Two blank lines only if the situation already requires a blank line, and the topic changes.
    - Use your judgment. E.g. maybe there should be two blank lines between the class definitions of `ThreadLocalBox` and `Shim`, but on the other hand, maybe not. The deciding question is whether we consider them as belonging to the same conceptual set of abstractions.
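  A minimal sketch of the default-initialization idiom. Note that `or` treats any falsy `x` (such as `0`, `""`, or `[]`) as "not supplied"; when such values are meaningful inputs, test `x is None` instead:

  ```python
  def greet(name=None):
      name = name or "world"  # the actual default stays out of the signature
      return f"Hello, {name}!"

  assert greet() == "Hello, world!"
  assert greet("unpythonic") == "Hello, unpythonic!"
  ```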
- **Contracts and type checking.**
  - Contracts. State clearly in the docstrings what your code expects and what it provides. Semantics are crucial. Type is nice to know, but not terribly important.
    - What is the service the function provides? (General explanation.)
    - What are the requirements on its input, for it to perform that service? (Parameters.)
    - What happens if those requirements are violated? (E.g. possible exception types, and why each of them may be raised. Obvious ones may be omitted.)
    - Provided those requirements are satisfied, what is guaranteed about its output? (Return value(s).)
    - Are there invariants the function preserves? (Notes.)
    - In case it's not obvious, is the function pure? This is important in a multi-paradigm language.
  - Dynamic type checking is ok, both `isinstance` and duck variants.
    - A thin abstraction layer (in the sense of On Lisp) might not even need a type check of its own, if, upon a type error, it's clear from the resulting stack trace that the function crashed because a specific argument didn't make sense. This keeps the codebase clutter-free.
    - In cases where an error message more specific to your function is helpful, a dynamic type check and a corresponding `raise` upon failure is The Right Thing.
    - Report what was expected, and what the actual value was instead - at least the type, but preferably also the value. This helps track down bugs in code using that function, based on the stack trace alone.
  - Contract validation at run time is also ok - it's a Turing-complete language, after all.
    - If the requirements on the input don't hold, then the caller is at fault. If the guarantees about output don't hold, then the function broke its own contract. This helps narrow down bugs.
    - Obviously, some properties cannot be automatically checked (e.g., is the given iterable finite?). The docstring is the most important.
  - To help avoid accidental transpositions of arguments in function calls, take advantage of Python's named arguments.
    - Passing arguments positionally can be risky. Even static type declarations won't help detect accidental transpositions of arguments that have the same type.
    - Any arguments that don't have a standard ordering are good candidates to be made keyword-only.
      - E.g. a triple of coordinates is almost always ordered `(x, y, z)`, so individual coordinate arguments are a good candidate to be passed positionally. But the `src` and `dst` parameters in a file copy operation could be defined either way around. So to prevent bugs, The Right Thing is to require stating, at the call site, which is intended to be which. (See the sketch below.)
  - Static type declarations are not considered part of `unpythonic` style.
    - Technically, Python's object model is based on the prototype paradigm, so strictly speaking the language doesn't even have static types. This is perfectly fine.
    - Because in Python it is possible to arbitrarily monkey-patch object instances, there's no guarantee that two object instances that advertise themselves as having the same type are actually alike in any meaningful way. `isinstance` only checks if the object instance was minted by a particular constructor. Until someone retroactively changes the type. (Python's `isinstance` is a close relative of JavaScript's `instanceof`.)
    - In practice, there is rarely a need to exploit all of this dynamism. Actual Python code is often much more static, which allows things like the static type checker `mypy` to work.
    - So, static type declarations are not frowned upon, but I likely won't bother with any for the code I write for the library.
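  A sketch combining two of the points above: a specific dynamic type check with an informative error message, and keyword-only parameters for arguments with no standard ordering. The function is hypothetical, not part of `unpythonic`:

  ```python
  import shutil

  def copy_file(*, src, dst):
      """Copy the file at path `src` to path `dst`. Both are keyword-only."""
      for name, value in (("src", src), ("dst", dst)):
          if not isinstance(value, str):  # report what was expected, and what we got
              raise TypeError(f"expected `{name}` to be a `str` path, "
                              f"got {type(value)} with value {value!r}")
      shutil.copyfile(src, dst)

  # The call site must state which path is which:
  # copy_file(src="a.txt", dst="b.txt")
  ```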
- **Macros.**
  - Macros are the nuclear option of software engineering.
    - Only make a macro when a regular function can't do what is needed.
    - Sometimes a regular code core with a thin macro layer on top, to improve the user experience, is the appropriate solution for minimizing magic. See `do`, `let`, `autocurry`, `forall` for examples.
  - A macro entry point can be just a thin wrapper around the relevant syntax transformer: a regular function, which takes and returns an AST.
    - You can have an expr, block and decorator macro with the same name, in the same module, by making the macro interface into a dispatcher. See the `syntax` kwarg in `mcpyrate`.
  - Macros and syntax transformers should be sensibly organized into modules, just like any other regular (non-macro) code.
    - The docstring should usually be placed on the macro entry point, so the syntax transformer typically does not need one.
  - If your syntax transformer (or another one it internally uses) needs `mcpyrate` `**kw` arguments:
    - Declare the relevant `**kw`s as parameters for the macro entry point, therefore requesting `mcpyrate` to provide them. Stuff them into `dyn` using `with dyn.let(...)`, and call your syntax transformer, which can then get the `**kw`s from `dyn`. See the existing macros for examples, and the sketch below.
    - Using `dyn` keeps the syntax transformer call signatures clean, while limiting the dynamic extent of what is effectively a global assignment. If we used only function parameters, some of the high-level syntax transformers would have to declare `expander` just to pass it through, possibly through several layers, until it reaches the low-level syntax transformer that actually needs it. Avoiding such a parameter definition cascade is exactly the use case `dyn` was designed for.
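    A sketch of the pattern, with hypothetical macro and transformer names; `syntax` and `expander` stand for the `mcpyrate`-provided kwargs mentioned above:

    ```python
    from unpythonic import dyn

    def mymac(tree, *, syntax, expander, **kw):  # macro entry point
        """[syntax, expr] Explain what `mymac[]` does here."""
        with dyn.let(_macro_expander=expander):  # limited dynamic extent
            return _mymac_transform(tree)

    def _mymac_transform(tree):  # the actual syntax transformer
        expander = dyn._macro_expander  # no need to thread it through parameters
        # ... edit `tree` here ...
        return tree
    ```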
  - If a set of macros shares common utilities, but those aren't needed elsewhere, that's a prime candidate for placing all that in one module.
    - See e.g. `tailtools.py`, which implements `tco` and `continuations`. The common factor is tail-position analysis.
- **Violate these guidelines when it makes sense to do so.**