Introducing venvstacks: layered Python virtual environments

In our recent announcement of Apple MLX support in LM Studio 0.3.4, we alluded to a Python utility for creating an "... integrated set of independently downloadable Python application environments".

Today we're excited to open source this utility: introducing venvstacks! (GitHub repo: https://github.com/lmstudio-ai/venvstacks)

venvstacks enabled us to ship our MLX engine, which is a Python application, within LM Studio without requiring the end user to install any Python dependencies.

venvstacks is live on PyPI:

$ pip install --user venvstacks

Meet `venvstacks`

Machine learning and AI libraries for Python are big. Really big.

— not Douglas Adams

venvstacks is a new venv-based Python project which uses Python's sitecustomize.py environment setup feature to chain together three layers of Python virtual environments:

"Runtime" layers: environments containing the desired version of a specific Python interpreter
"Framework" layers: environments containing desired versions of key Python frameworks
"Application" layers: environments containing components to be launched directly

While the layers are archived and published separately, their dependency locking is integrated, allowing the application layers to share dependencies installed in the framework layers, and the framework layers to share dependencies installed in the runtime layers.
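
To make the layering concrete, the sketch below shows the general standard-library mechanism a `sitecustomize.py` hook can use to expose lower layers to an upper one: appending each lower layer's `site-packages` directory to `sys.path` with `site.addsitedir()`. This is a minimal illustration under assumptions, not the hook venvstacks actually generates; the `chain_layers` helper and the directory layout are hypothetical.

```python
import os
import site
import sys


def chain_layers(lower_site_dirs):
    """Hypothetical helper: make lower layers' packages importable here.

    Directories are appended to the end of sys.path in the order given
    (e.g. framework, then runtime), so entries already present, such as
    this layer's own site-packages, keep priority over inherited ones.
    """
    added = []
    for site_dir in lower_site_dirs:
        path = os.path.abspath(site_dir)
        if path in sys.path:
            continue  # already chained
        # addsitedir() appends to sys.path and also processes any .pth files
        site.addsitedir(path)
        added.append(path)
    return added
```

Because lower layers are appended rather than prepended, an application layer that pins its own copy of a package would shadow the inherited one; integrated locking is what keeps such duplicates from diverging in the first place.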

Using `venvstacks` to build and publish Python environments

Defining environment stacks

The environment layers to be published are defined in a venvstacks.toml stack specification, with a separate array of tables for each kind of layer definition.

For example, the following specification defines a pair of applications which use scikit-learn as a shared framework layer with numpy preinstalled in the runtime layer, all running in a controlled Python 3.11 base runtime:

[[runtimes]]
name = "cpython@3.11"
fully_versioned_name = "cpython@3.11.10"
requirements = [
    "numpy",
]

[[frameworks]]
name = "sklearn"
runtime = "cpython@3.11"
requirements = [
    "scikit-learn",
]

[[applications]]
name = "classification-demo"
launch_module = "launch_modules/sklearn_classification.py"
frameworks = ["sklearn"]
requirements = [
    "scikit-learn",
]

[[applications]]
name = "clustering-demo"
launch_module = "launch_modules/sklearn_clustering.py"
frameworks = ["sklearn"]
requirements = [
    "scikit-learn",
]

Locking environment stacks

$ venvstacks lock sklearn_demo/venvstacks.toml

The lock subcommand takes the layer requirements defined in the specification and performs a single, complete resolution across all of the environment stacks together. This ensures that the layers can be published separately, but still work as expected when deployed together on a target system. The locking mechanism is designed so that a layer is only affected by changes to the lower-layer modules it actually uses, so upper layers don't need to be rebuilt for every change to a lower layer.
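
One way to picture that selective invalidation property is to derive each layer's lock input only from its own requirements plus the lower-layer pins it actually uses, so that unrelated changes below it leave the key, and therefore the layer, untouched. This is a hypothetical illustration of the idea, not the venvstacks locking algorithm; `layer_lock_key` and the version numbers are made up for the example.

```python
import hashlib


def layer_lock_key(own_requirements, lower_pins, used_from_lower):
    """Hypothetical: hash a layer's inputs, ignoring lower-layer pins it doesn't use.

    own_requirements: requirement strings declared by this layer
    lower_pins: {distribution name: pinned version} locked in lower layers
    used_from_lower: names of lower-layer distributions this layer depends on
    """
    relevant = sorted(
        f"{name}=={lower_pins[name]}" for name in used_from_lower if name in lower_pins
    )
    payload = "\n".join([*sorted(own_requirements), "--", *relevant])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


runtime_pins = {"numpy": "2.1.2", "pip": "24.2"}  # illustrative versions only
before = layer_lock_key(["scikit-learn"], runtime_pins, {"numpy"})

# Bumping a pin the framework layer doesn't use leaves its key unchanged...
after_pip = layer_lock_key(["scikit-learn"], {**runtime_pins, "pip": "24.3"}, {"numpy"})
# ...while bumping one it does use changes the key, signalling a relock.
after_numpy = layer_lock_key(["scikit-learn"], {**runtime_pins, "numpy": "2.1.3"}, {"numpy"})

print(before == after_pip, before == after_numpy)  # True False
```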

Building environment stacks

$ venvstacks build sklearn_demo/venvstacks.toml

The build subcommand converts the layer specifications and their locked requirements into working Python environments (either a base runtime environment, or a layered virtual environment based on one of the defined runtime environments). If the environments have not already been explicitly locked, the build step locks them as needed.

This command is also a "build pipeline" command that allows locking, building, and publishing to be performed in a single step (see the command line help for details).

Publishing environment layer archives

$ venvstacks publish --tag-outputs --output-dir demo_artifacts sklearn_demo/venvstacks.toml

Once the environments have been successfully built, the publish command converts each layer into a separate, reproducible binary archive. Each archive is suitable for transferring to another system and unpacking there, after which the unpacked environments can run the included applications. The only additional step needed on the target system is a small post-installation step, using a Python script embedded in the built layer archives, that relinks the deployed environments with each other in their deployed locations.
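
The "reproducible" part of that description generally comes down to removing build-to-build noise from the archive: a fixed member order, normalized ownership, clamped modification times, and a gzip header without a timestamp. The sketch below applies those common techniques with the standard library's `tarfile` and `gzip` modules. It is an illustration under assumptions rather than the venvstacks archive format; `pack_layer` and its `clamp_mtime` parameter are hypothetical, and it only stores files and symlinks (empty directories are skipped).

```python
import gzip
import os
import tarfile
from pathlib import Path


def pack_layer(layer_dir, archive_path, clamp_mtime):
    """Hypothetical: pack layer_dir into a byte-for-byte reproducible .tar.gz."""
    layer_dir = Path(layer_dir)

    def normalize(info):
        # Clamp timestamps and strip ownership details that vary between builds
        info.mtime = min(int(info.mtime), clamp_mtime)
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    # filename="" and mtime=0 keep the archive name and build time out of the gzip header
    with open(archive_path, "wb") as raw, \
         gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz, \
         tarfile.open(fileobj=gz, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for root, dirs, files in os.walk(layer_dir):
            dirs.sort()  # os.walk honours in-place sorting: deterministic traversal
            # Symlinked directories are not descended into, so store them as links
            linked_dirs = [d for d in dirs if os.path.islink(os.path.join(root, d))]
            for name in sorted(linked_dirs + files):
                path = Path(root, name)
                # Archive paths are relative to the parent, so the layer name is kept
                arcname = path.relative_to(layer_dir.parent).as_posix()
                tar.add(str(path), arcname=arcname, recursive=False, filter=normalize)
    return Path(archive_path)
```

Packing the same tree twice, even after its files are touched again, should then yield identical bytes as long as the new timestamps are at or above the clamp value.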

Metadata regarding the layer definitions and the published artifacts is published alongside the archives (to demo_artifacts/__venvstacks__/ in the given example). This metadata captures both input details (such as the hashes of the locked requirements and the included launch modules) and output details (such as the exact size and hash of the built layer archive).
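
The output side of that metadata can be computed from the finished archive alone. As a rough sketch (the JSON field names and the `archive_metadata`/`write_metadata` helpers are hypothetical and do not reflect the actual venvstacks metadata schema), the exact size and a SHA-256 digest can be gathered like this, hashing in chunks so multi-gigabyte layers never need to fit in memory:

```python
import hashlib
import json
from pathlib import Path


def archive_metadata(archive_path, chunk_size=1 << 20):
    """Hypothetical: record an archive's exact size and SHA-256 hash."""
    path = Path(archive_path)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in fixed-size chunks to keep memory use flat for large layers
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "archive_name": path.name,
        "archive_size": path.stat().st_size,
        "archive_sha256": digest.hexdigest(),
    }


def write_metadata(archive_path, metadata_dir):
    """Write the metadata as a JSON file in a separate metadata directory."""
    metadata_dir = Path(metadata_dir)
    metadata_dir.mkdir(parents=True, exist_ok=True)
    meta = archive_metadata(archive_path)
    out = metadata_dir / f"{meta['archive_name']}.json"
    out.write_text(json.dumps(meta, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return out
```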

Locally exporting environment stacks

$ venvstacks local-export --output-dir demo_export sklearn_demo/venvstacks.toml

Anyone considering venvstacks is likely to be working with some large layer archives (a fully built PyTorch archive weighs in at multiple gigabytes, for example), so packing and unpacking those archives can take a substantial amount of time.

To avoid that overhead when iterating on layer definitions and launch module details, the local-export subcommand copies the built environments to a different location on the same system. Most of the filtering steps used when packing and unpacking the archives are still applied; the omissions are details related to reproducible builds, such as clamping the maximum file modification times to known values.
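
Conceptually, a local export is a filtered directory copy. The sketch below uses `shutil.copytree()` with an ignore filter and preserved symlinks; the patterns it drops (bytecode caches) are an assumption chosen for illustration, not the documented venvstacks filter list, and `export_layer` is a hypothetical name. In line with the description above, it makes no attempt to normalize timestamps.

```python
import shutil
from pathlib import Path

# Assumption: patterns a deployment copy might reasonably omit
EXCLUDED = shutil.ignore_patterns("__pycache__", "*.pyc")


def export_layer(env_dir, output_dir):
    """Hypothetical: copy a built layer to output_dir/<layer name>, filtering caches."""
    env_dir = Path(env_dir)
    target = Path(output_dir) / env_dir.name
    if target.exists():
        shutil.rmtree(target)  # re-exporting replaces the previous copy
    # symlinks=True copies links as links instead of duplicating their targets
    shutil.copytree(env_dir, target, symlinks=True, ignore=EXCLUDED)
    return target
```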

Locally exporting environments produces much of the same metadata as publishing layer archives, but the details related specifically to the published archive (such as its size and expected contents hash) are necessarily omitted.

`venvstacks` in LM Studio

The open source mlx-engine is deployed in the LM Studio desktop application as a launch module in an application layer environment that declares the required runtime package dependencies, running atop an MLX framework layer and a CPython 3.11 base runtime layer.

The use of venvstacks then allows LM Studio to introduce additional MLX-based features without needing to duplicate the MLX framework layer, and additional Python-based features without needing to duplicate the Python runtime layer.

Over time, the ability to distribute multiple application, framework, and even base runtime layers in parallel allows for graceful migrations to newer component versions without any disruption to LM Studio users.

Trying `venvstacks` for yourself

The initial release of venvstacks is available from the Python Package Index, and can be installed with pipx (and similar tools):

$ pipx install venvstacks

For additional usage information, consult the venvstacks project documentation, and the command line help:

$ venvstacks --help

 Usage: venvstacks [OPTIONS] COMMAND [ARGS]...

 Lock, build, and publish Python virtual environment stacks.

╭─ Options ───────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                     │
╰─────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────╮
│ build          Build (/lock/publish) Python virtual environment stacks.         │
│ local-export   Export layer environments for Python virtual environment stacks. │
│ lock           Lock layer requirements for Python virtual environment stacks.   │
│ publish        Publish layer archives for Python virtual environment stacks.    │
╰─────────────────────────────────────────────────────────────────────────────────╯
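
For CI scripts, the same commands can be driven from Python with `subprocess`. This sketch only uses the subcommands and options that appear in this post (`lock`, `build`, `publish`, `--tag-outputs`, `--output-dir`); the `pipeline_commands` and `run_pipeline` helpers are hypothetical, and the sketch assumes the `venvstacks` executable is on `PATH` (for example, after `pipx install venvstacks`).

```python
import shutil
import subprocess


def pipeline_commands(spec_path, output_dir):
    """Hypothetical: the lock, build, publish sequence from this post as argv lists."""
    spec = str(spec_path)
    return [
        ["venvstacks", "lock", spec],
        ["venvstacks", "build", spec],
        ["venvstacks", "publish", "--tag-outputs", "--output-dir", str(output_dir), spec],
    ]


def run_pipeline(spec_path, output_dir, dry_run=False):
    """Run each step in order; check=True stops at the first failing command."""
    commands = pipeline_commands(spec_path, output_dir)
    if not dry_run and shutil.which("venvstacks") is None:
        raise FileNotFoundError("venvstacks not found on PATH")
    for argv in commands:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands
```

Calling `run_pipeline("sklearn_demo/venvstacks.toml", "demo_artifacts", dry_run=True)` prints the commands without running them. The explicit `lock` step is optional, since `build` locks as needed, but keeping it separate makes lock failures easy to spot in CI logs.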

Contributing to `venvstacks` development

venvstacks is MIT licensed and is developed on GitHub:

https://github.com/lmstudio-ai/venvstacks

If you have a suitable use case, the easiest way to contribute to venvstacks development is just to try it out, and let us know how that goes. What did you like, what did you dislike, what just plain broke?

If anything does break, please open an issue (if the problem hasn't already been reported). If you're not sure whether some behaviour is a bug, or would just like to provide general feedback rather than file specific issues or suggestions, the Discord channels mentioned below are the best way to get in touch with the developers directly. The "Packaging" category on https://discuss.python.org/ is also a reasonable place to provide feedback.

We also already have a lot of ideas for ways in which venvstacks could be improved.

While we've recorded many of those ideas because we plan to implement them ourselves, we recorded others because we think they're interesting and would be open to seeing them included, even though we don't have any immediate need for them.

For additional information, consult the venvstacks developer documentation.

Discuss venvstacks in general in the new #venvstacks channel on the Python Packaging Discord Server.

Discuss the use of mlx-engine and venvstacks in LM Studio in the #dev-chat channel on the LM Studio Discord Server.

Download LM Studio for Mac / Windows / Linux from https://lmstudio.ai/download.
