
Homebrew python with a private github repo and poetry

Blog Python Homebrew Poetry
Mike Wyer

The Challenge

I have a python library with a CLI, built using the awesome typer library. The dependencies are managed with poetry and the users are mostly gophers with a robust dislike of python.

Getting your own python code working is usually pretty straightforward. Getting someone else’s python code working is usually a bit of a nightmare. Getting the right python version installed, configuring venvs, using their package / dependency manager, etc. etc.

“Wouldn’t it be great if there was a homebrew package?!”

Yes, it surely would. But how do we get from here (python code in a private github repo) to there (packaged binaries in a ruby-based packaging system)?

TL;DR: Example code in my github repo

Assemble the pieces

Homebrew docs have a page on building python formulae but it is very quiet on the subject of private github repos, doesn’t mention poetry at all, and refers to homebrew-pypi-poet which was last released 6 years ago when python 3.6 was still considered new and cool. Well, “new” anyway.

This blog post got me to an initial build of a Cask (a pre-built zipped “binary” built using pyinstaller).

It was a start, and did at least provide a way for folks to download a packaged build that would run using its own embedded python binary. No need to manage a python install on the machine.

Note: throughout this project I rely on github release assets, so these steps only work after a new github release has been created. They work fine if it is a draft release, but it still needs to be a release rather than arbitrary files in the repo.

Version 1.0 - a Cask from a private repo

Let’s say you have a github org called “my_org_name”, with a python repo called “pybinary_repo” and we’re going to create an executable called “pybinary”.

This needs the gh formula installed, and credentials for your private repo(s):

gh auth login --hostname github.com -p https -w

Now you can set up a Tap (a github repo with a homebrew- prefix), eg homebrew-tools, and add it to homebrew with brew tap my_org_name/tools. Then in homebrew-tools, create Cask/pybinary.rb:


ORG = "my_org_name"
REPO = "pybinary_repo"

cask "pybinary" do

  # Update version and sha256 to release a new Cask
  version "1.0"
  sha256 "cbeafe76301d2f814487ee6631bc0cbf0708d90034c8a3ab3b8be7a0840aa029"

  depends_on formula: "gh"

  url do
    assets = GitHub.get_release(ORG, REPO, "v#{version}").fetch("assets")
    zip_url = assets.find{|a| a["name"] == "pybinary-#{version}.zip"}.fetch("url")
    [zip_url, header: [
      "Accept: application/octet-stream",
      "Authorization: bearer #{GitHub::API.credentials}"
    ]]
  end
  name "pybinary"
  desc "Epic CLI tool"
  homepage ""

  # Documentation: https://docs.brew.sh/Brew-Livecheck
  livecheck do
    url "https://github.com/#{ORG}/#{REPO}/releases"
  end


  binary "pybinary-#{version}/pybinary"

  caveats do
    "Please run 'xattr -r -d com.apple.quarantine #{staged_path}' to remove the quarantine flag"
  end

  postflight do
    ohai "Removing quarantine flag"
    system_command "/usr/bin/xattr", args: ["-r", "-d", "com.apple.quarantine", staged_path]
    ohai "Unpacking the PyBinary CLI tool"
    # Actually try running it, to spot any problems with packaging asap:
    system_command "#{staged_path}/pybinary-#{version}/pybinary", args: ["--version"]
  end
  # Documentation: https://docs.brew.sh/Cask-Cookbook#stanza-zap
  zap trash: ""
end
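
For anyone who doesn't read Ruby, the url block above does two things: it looks up the release for the version tag, then picks the asset whose name matches the zip. Here is a rough Python sketch of the same lookup, run against a hand-written dict rather than a live API response. The find_asset_url name and the sample data are made up for illustration.

```python
from typing import Any


def find_asset_url(release: dict[str, Any], asset_name: str) -> str:
    """Return the API url of the release asset called asset_name."""
    for asset in release.get("assets", []):
        if asset["name"] == asset_name:
            return asset["url"]
    raise LookupError(f"No asset named {asset_name!r} in release")


# Made-up stand-in for the JSON that GitHub returns for a release.
sample_release = {
    "tag_name": "v1.0",
    "assets": [
        {
            "name": "notes.txt",
            "url": "https://api.github.com/repos/my_org_name/pybinary_repo/releases/assets/1",
        },
        {
            "name": "pybinary-1.0.zip",
            "url": "https://api.github.com/repos/my_org_name/pybinary_repo/releases/assets/2",
        },
    ],
}

version = "1.0"
zip_url = find_asset_url(sample_release, f"pybinary-{version}.zip")
print(zip_url)
```

The url field is the API address of the asset, not the browser download link. That is why the cask sends the Accept: application/octet-stream header: without it the API returns metadata instead of the file.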

Creating the zip file that is going to become the Cask is fairly straightforward, assuming you already have a working venv and poetry package defined for your code.

  1. Install pyinstaller: poetry add pyinstaller (if you don’t already have it)
  2. Bundle the script into a versioned directory: poetry run pyinstaller src/main.py -n pybinary-1.0 (assuming your code is in src/main.py, otherwise use the actual path to the main script)
  3. Change into the “dist” directory (output from pyinstaller): cd dist
  4. Rename the main executable back to its base name: mv pybinary-1.0/pybinary-1.0 pybinary-1.0/pybinary
  5. Zip up the code: zip -r pybinary-1.0.zip pybinary-1.0

You could add this to a Makefile to simplify the process:

POETRY_VERSION=1.8
POETRY=$(shell PATH="$$HOME/bin:$$PATH" command -v poetry${POETRY_VERSION})
VER=$(shell PATH="$$HOME/bin:$$PATH" poetry${POETRY_VERSION} version -s)
CLI_VER=pybinary-${VER}

dist/${CLI_VER}.zip:
        ${POETRY} run pyinstaller src/main.py -n ${CLI_VER} \
        && cd dist \
        && ${CLI_VER}/${CLI_VER} --version \
        && mv ${CLI_VER}/${CLI_VER} ${CLI_VER}/pybinary \
        && zip -r ${CLI_VER}.zip ${CLI_VER}

cli-zip: dist/${CLI_VER}.zip

I have several poetry versions installed because I work on repos owned by other people who haven’t upgraded yet. Poetry is generally stable across patch releases, but almost always has incompatibilities across minor versions, so it works fine to have poetry 1.8.3 installed as poetry1.8 in my PATH.

I make life easier by explicitly setting this repo’s required poetry version in the Makefile and checking it exists before doing anything else.

With this setup, I can run make cli-zip and it will build a zip with the current version number from the poetry configuration.

When building a release, I run make cli-zip and add the file to the assets in github.

Then I update the version and sha256 in the pybinary.rb cask file, save, commit, and push the homebrew-tools repo.
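
The sha256 in the cask has to match the zip that was uploaded to the release. A minimal sketch for computing it, using only the standard library (the sha256_of name is mine, not part of any tool mentioned here):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so a large zip is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of(Path("dist/pybinary-1.0.zip")) should give the same value as shasum -a 256 dist/pybinary-1.0.zip, and that is the string to paste into the sha256 line of the cask.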

Users of the code need to initially install the tap and can then install the cask:

  1. brew install gh
  2. gh auth login --hostname github.com -p https -w
  3. brew tap my_org_name/tools
  4. brew install pybinary

When the release is updated, a brew update; brew upgrade will install the new version.

This works, but it feels kinda ugly. The zipfile can get big, and the workaround to remove the quarantine flags is a bit of a hack.

Can we do it with a Formula instead of a Cask, and build a “proper” homebrew bottle? Yes.

I have now removed the Cask definition from my own repo, but it may be good enough for other folks so I’m happy to share what I learned.

Building a python Formula

This is where our packaging process gets about 80% more complex in order to provide a 20% nicer experience for the end user.

In order to build from source, we cannot use the same url hack to download the release assets from github. Or at least, I couldn’t make that work.

Which means we need our own downloader implementation, based on the existing Curl download strategy. This is informed by, but simpler than, other folks’ solutions.

I call it repo.rb:

# Custom downloader for private repo.
class GitHubPrivateRepositoryReleaseDownloadStrategy < CurlDownloadStrategy
  def initialize(url, name, version, **meta)
    parse_url_pattern(url)
    super
  end
  def parse_url_pattern(url)
    url_pattern = %r{https://github.com/([^/]+)/([^/]+)/releases/download/([^-]+)-([0-9.]+)(\.arm\S+)}
    unless url =~ url_pattern
      raise CurlDownloadStrategyError, "Invalid url pattern for GitHub Release."
    end
    _, @owner, @repo, pkg, version, filename = *url.match(url_pattern)
    @tag = "v#{version}"
    @filename = "#{pkg}--#{version}#{filename}"
  end
  def download_url
    "https://api.github.com/repos/#{@owner}/#{@repo}/releases/assets/#{asset_id}"
  end
  private
  def _fetch(url:, resolved_url:, timeout:)
    # HTTP request header `Accept: application/octet-stream` is required.
    # Without this, the GitHub API will respond with metadata, not binary.
    curl_download download_url, "--header", "Accept: application/octet-stream", "--header", "Authorization: Bearer #{GitHub::API.credentials}", to: temporary_path, timeout: timeout
  end
  def asset_id
    @asset_id ||= resolve_asset_id
  end
  def resolve_asset_id
    release_assets = fetch_release_assets
    assets = release_assets.select { |a| a["name"] == @filename }
    raise CurlDownloadStrategyError, "Asset file not found." if assets.empty?
    assets.first["id"]
  end
  def fetch_release_assets
    GitHub.get_release(@owner, @repo, @tag).fetch("assets")
  end
end
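
The least obvious part of repo.rb is parse_url_pattern, which turns the bottle URL that homebrew asks for into a release tag and an asset name. Here is the same regex ported to Python as a sketch, so you can see what it extracts. The example URL is my assumption of the shape the regex expects, built from the root_url used in the formula.

```python
import re

URL_PATTERN = re.compile(
    r"https://github\.com/([^/]+)/([^/]+)/releases/download/([^-]+)-([0-9.]+)(\.arm\S+)"
)


def parse_bottle_url(url: str) -> dict[str, str]:
    """Split a bottle URL into the pieces needed to find the release asset."""
    match = URL_PATTERN.search(url)
    if match is None:
        raise ValueError("Invalid url pattern for GitHub Release.")
    owner, repo, pkg, version, suffix = match.groups()
    return {
        "owner": owner,
        "repo": repo,
        "tag": f"v{version}",
        # The uploaded bottle file has a double hyphen between name and version.
        "asset_name": f"{pkg}--{version}{suffix}",
    }


info = parse_bottle_url(
    "https://github.com/my_org_name/pybinary_repo/releases/download/"
    "pybinary-1.0.arm64_sonoma.bottle.tar.gz"
)
print(info)
```

Two limits fall straight out of the pattern: it only matches bottles whose platform tag starts with "arm", and the package name cannot contain a hyphen. Both are fine for this setup, but the regex would need loosening for Intel bottles or a hyphenated formula name.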

Now we can build a Formula/pybinary.rb:

# Homebrew Formula for pybinary
# We need a custom downloader to use github release assets from our private repos
require_relative 'repo.rb'

class Pybinary < Formula
  include Language::Python::Virtualenv

  desc "Epic CLI tool"
  homepage ""
  url "https://github.com/my_org_name/pybinary_repo.git",
    branch: "main",
    tag: "v1.0"

  license ""

  depends_on "python@3.12"
  depends_on "rust" => :build
  depends_on "python-setuptools" => :build

  bottle do
    root_url "https://github.com/my_org_name/pybinary_repo/releases/download",
      using: GitHubPrivateRepositoryReleaseDownloadStrategy
    sha256 cellar: :any, arm64_sonoma: "4a922d718e7e616ab4f59eb4615ec78d2b70c96d4b737d4c2a1d8e5df716d675"
  end

  eval(IO.read(File.join(File.expand_path(File.dirname(__FILE__)), 'resources.rb')))

  def install
    # Handle changes to clang / xcode paths in most recent xcode update.
    # Without this, grpcio fails to build due to a missing cstddef include.
    ENV.append_to_cflags "-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1"
    virtualenv_install_with_resources
  end

  test do
    # Not needed
    system "false"
  end
end

Oh wait- what’s that resources.rb?

Oh yeah, this is where it gets fun.

Resources? Dependencies? Requirements? “DLL Hell is back, baby!”

The homebrew python docs offer this helpful advice:

You can use brew update-python-resources to help you write resource stanzas.

What does it do? Queries the names of installed python modules, then goes and grabs the latest version of each. Whether or not they are compatible with each other.

Great, except for things like the protobuf library which must be a matching version for any other modules relying on protobufs (eg any google client code).

It’s pretty disappointing to watch a build run for 10-20 minutes, succeed, and then the resulting code crashes on startup.

One frustrating part of this is that we already have poetry to do all the dependency resolution, and it already has a list of specific versions of every module which will satisfy all the constraints. But the homebrew code has no idea about poetry, nor any way to input the same constraints that we have in pyproject.toml.

What’s the workaround?

Export the poetry.lock file to requirements.txt format, which has very simple module_name==version constraints and a list of matched sha256 digests for the downloadable packages of the module: poetry export -o requirements.txt
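
To make the format concrete, here is a small sketch that parses one exported entry into a name, a version, and its list of digests. The package name and the hashes in the sample are made up, and parse_requirement is just an illustrative helper.

```python
import re


def parse_requirement(entry: str) -> tuple[str, str, list[str]]:
    """Parse one requirements.txt entry (possibly spread over several lines)."""
    # Fold the backslash-continued lines into a single line.
    joined = " ".join(part.strip().rstrip("\\").strip() for part in entry.splitlines())
    # The name==version token comes before any environment marker or hash.
    spec = joined.split(";", 1)[0].split("--hash", 1)[0].strip()
    name, version = spec.split("==")
    name = name.split("[", 1)[0]  # drop extras such as pkg[extra]
    hashes = re.findall(r"--hash=sha256:([0-9a-f]+)", joined)
    return name, version, hashes


# Made-up entry in the shape that poetry export writes.
sample = "\n".join(
    [
        'example-pkg[extra]==1.2.3 ; python_version >= "3.12" and python_version < "4.0" \\',
        "    --hash=sha256:" + "a" * 64 + " \\",
        "    --hash=sha256:" + "b" * 64,
    ]
)

name, version, hashes = parse_requirement(sample)
print(name, version, len(hashes))
```

Each entry usually carries several digests, one per downloadable file (wheels for different platforms plus the sdist), which is what lets a later step pick a specific file and still be sure it is the one poetry locked.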

Then use something like make_resources.py


"""Generate Homebrew resource stanzas from a poetry-exported requirements.txt."""
import subprocess
from pathlib import Path
from typing import Optional

import requests
import typer
from rich.console import Console

# Assumption: resources.rb sits next to the formula in the tap checkout; adjust to suit.
FORMULA_DIR = Path("Formula")

app = typer.Typer()
console = Console()
errs = Console(stderr=True)


@app.command()
def brew_resources(file: Optional[Path] = None, output: Optional[Path] = None) -> None:
    """Outputs brew resource stanzas from requirements.txt."""
    if file is None:
        subprocess.run(["make", "requirements.txt"], check=True)
        file = Path("requirements.txt")
    if output is None:
        output = FORMULA_DIR / "resources.rb"
    pypi_base_url = "https://pypi.org/pypi"
    seen = set()
    resource_blocks = []
    with open(file, "r", encoding="utf-8") as infh:
        for line in infh:
            line = line.strip()
            if line.startswith("-e") or line.startswith("#") or line == "":
                continue
            # Join the continuation lines (the --hash entries) onto the package line.
            while line.endswith("\\"):
                line = line[:-1] + next(infh, "").strip()
            # The first token is name==version; the rest holds the markers and hashes.
            pkg_spec, _, constraints = line.partition(" ")
            pkg_spec = pkg_spec.rstrip(";")
            try:
                pkg_name, pkg_version = pkg_spec.split("==")
            except ValueError:
                errs.print(f"Skipping invalid line: {line}")
                continue
            # Ignore package extras
            pkg_name = pkg_name.split("[", 1)[0]
            if pkg_name in seen:
                continue
            seen.add(pkg_name)

            pkg_info_url = f"{pypi_base_url}/{pkg_name}/{pkg_version}/json"
            response = requests.get(pkg_info_url, timeout=10)

            if response.status_code != 200:
                errs.print(f"Failed to fetch package info for {pkg_name}=={pkg_version} using {pkg_info_url}")
                continue

            pkg_info = response.json()
            selected_url = None
            # I prefer the pre-built wheels, especially for google client code.
            # The sdist tar.gz files can be huge, and I'm trying for a quick, simple, repeatable build.
            # Only files whose sha256 appears in the exported lock data are accepted.
            for suffix in ["-none-any.whl", ".tar.gz"]:
                selected_url = next(
                    (
                        url_info
                        for url_info in pkg_info["urls"]
                        if url_info["url"].endswith(suffix) and url_info["digests"]["sha256"] in constraints
                    ),
                    None,
                )
                if selected_url:
                    break
            if not selected_url:
                errs.print(f"No distribution found for {pkg_name}=={pkg_version}")
                continue

            download_url = selected_url["url"]
            sha256 = selected_url["digests"]["sha256"]

            resource_block = (
                f'resource "{pkg_name}" do\n' f'  url "{download_url}"\n' f'  sha256 "{sha256}"\n' "end\n\n"
            )

            resource_blocks.append(resource_block)
    with open(output, "w", encoding="utf-8") as outfh:
        outfh.writelines(resource_blocks)
    console.print(f"{len(resource_blocks)} resources written to {output}")


if __name__ == "__main__":
    app()
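
The heart of the script above is the choice of which file to download for each module. Pulled out on its own, and run against made-up data in the shape of the "urls" list from the PyPI JSON API, the rule looks roughly like this (pick_distribution and the sample entries are illustrative only):

```python
from typing import Any, Optional

# Pure-python wheels first, then fall back to the sdist.
PREFERRED_SUFFIXES = ("-none-any.whl", ".tar.gz")


def pick_distribution(
    urls: list[dict[str, Any]], locked_hashes: set[str]
) -> Optional[dict[str, Any]]:
    """Return the first preferred file whose sha256 is in the lock data."""
    for suffix in PREFERRED_SUFFIXES:
        for info in urls:
            if info["url"].endswith(suffix) and info["digests"]["sha256"] in locked_hashes:
                return info
    return None


# Made-up file list: an sdist, a platform-specific wheel, and a pure-python wheel.
urls = [
    {"url": "https://files.example/pkg-1.0.tar.gz", "digests": {"sha256": "s" * 64}},
    {"url": "https://files.example/pkg-1.0-cp312-cp312-macosx_11_0_arm64.whl", "digests": {"sha256": "p" * 64}},
    {"url": "https://files.example/pkg-1.0-py3-none-any.whl", "digests": {"sha256": "w" * 64}},
]
locked = {"s" * 64, "p" * 64, "w" * 64}

chosen = pick_distribution(urls, locked)
print(chosen["url"])
```

One consequence worth knowing: a module that only publishes platform-specific wheels never matches the first suffix, so it falls back to the sdist and gets compiled during the build. I assume that is why the formula needs build-time dependencies such as rust.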

During a package release, you can now build a bottle of the homebrew formula:

  1. Update the resources by exporting from poetry to requirements.txt then building resources.rb
  2. Update the version in Formula/pybinary.rb to match the github release tag.
  3. HOMEBREW_NO_INSTALL_FROM_API=1 brew install --build-bottle my_org_name/tools/pybinary
  4. HOMEBREW_NO_INSTALL_FROM_API=1 brew bottle my_org_name/tools/pybinary
  5. Add the newly created pybinary--*.bottle.tar.gz to the github release assets.
    1. I could not seem to prevent the creation of a file with a double hyphen.
    2. But that is what homebrew expects, and it works.
  6. Update Formula/pybinary.rb to update the sha256 cellar digest to match the new bottle. This enables users to quickly install a pre-packaged build and save themselves 10+ minutes of building from source.
  7. Save, commit, and push the tap repo.
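
Step 6 is easy to get wrong by hand, so it can be scripted. This is a sketch of one way to swap the digest in the bottle block, assuming the sha256 line looks like the one in the formula above. set_bottle_sha256 is a hypothetical helper, not something homebrew provides.

```python
import re


def set_bottle_sha256(formula_text: str, platform: str, new_digest: str) -> str:
    """Replace the 64-character digest that follows `platform:` in a formula."""
    pattern = re.compile(rf'({re.escape(platform)}:\s*")[0-9a-f]{{64}}(")')
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{new_digest}{m.group(2)}", formula_text
    )
    if count != 1:
        raise ValueError(f"Expected exactly one sha256 entry for {platform}, found {count}")
    return updated


old_line = (
    '    sha256 cellar: :any, arm64_sonoma: '
    '"4a922d718e7e616ab4f59eb4615ec78d2b70c96d4b737d4c2a1d8e5df716d675"\n'
)
new_line = set_bottle_sha256(old_line, "arm64_sonoma", "0" * 64)
print(new_line)
```

Read Formula/pybinary.rb as text, pass it through with the digest of the new bottle file, and write it back before committing the tap.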

Again, users just set up their gh credentials, add the my_org_name/tools tap, and brew install pybinary.

But this time, it’s using the brew-managed python binary and standard pre-built wheels (python module packages) for as many dependencies as possible.