Letting Claude loose on scikit-lego.

2025-06-13

TLDR: I wanted to explore the GitHub integration of Claude. It works pretty well, but a human in the loop is still needed and there are a bunch of hiccups to overcome. These might get fixed, but right now it's not as good as it could be, and for open-source projects there's a risk of abuse.

I figured it was time to see if we could also automate some things on GitHub with Claude, so I decided to have it work on a PR for scikit-lego. It's a scikit-learn utility library that I made, and it had an open issue, #695, that is reasonably well documented but does require fairly deep knowledge of how scikit-learn works and how the project is set up.

Setup

Setting this up was relatively easy. A lot of it is automated when you run this command in Claude Code:

/install-github-app

This configures all the keys and gives you a nice step-by-step interface to walk through. However, I did notice it was slightly counterintuitive to get things working right from the start. This is partly because there tends to be a bit of a delay: it can really take 30 seconds before Claude responds on GitHub issues.

Long story short, this is the workflow that I ended up with after iterating.

```yaml
name: Claude Code

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review:
    types: [submitted]

jobs:
  claude:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
    runs-on: ubuntu-latest
    permissions:
      contents: write
      issues: write
      pull-requests: write
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Run Claude Code
        id: claude
        uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
```
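The `if:` condition in that workflow boils down to "only run when the text that triggered the event mentions @claude". As a rough aid for reasoning about which events will fire, here is a minimal Python sketch of the same logic. The `should_run` function and the nested-dict payload are hypothetical stand-ins for GitHub's event payload; only the field paths (`comment.body`, `review.body`, `issue.body`, `issue.title`) are taken from the YAML. It assumes, per the GitHub Actions expression docs, that `contains()` is case-insensitive and treats a null body as an empty string.

```python
from typing import Any

MENTION = "@claude"


def _text(payload: dict[str, Any], *path: str) -> str:
    """Follow a path of keys into a nested payload; return '' if anything is missing or null."""
    node: Any = payload
    for key in path:
        if not isinstance(node, dict):
            return ""
        node = node.get(key)
    return node if isinstance(node, str) else ""


def _mentions(text: str) -> bool:
    # contains() in GitHub expressions ignores case, so compare lowercased text.
    return MENTION in text.lower()


def should_run(event_name: str, payload: dict[str, Any]) -> bool:
    """Mirror the workflow's `if:` expression for the four subscribed event types."""
    if event_name in ("issue_comment", "pull_request_review_comment"):
        return _mentions(_text(payload, "comment", "body"))
    if event_name == "pull_request_review":
        return _mentions(_text(payload, "review", "body"))
    if event_name == "issues":
        return _mentions(_text(payload, "issue", "body")) or _mentions(
            _text(payload, "issue", "title")
        )
    # Any other event never reaches the job.
    return False
```

For example, `should_run("issue_comment", {"comment": {"body": "yes please, do that"}})` is `False`: a reply without the mention never starts the job, no matter which thread it lands in.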

Towards a review

From here, I notified Claude on issue #695, which describes a linear embedding trick for scikit-learn pipelines. Claude replied with a branch that contained the work, which I could turn into pull request #748 with a single click.

The description of the issue. It's not a hard one, but it's certainly not trivial either.

Claude is also configured to review any PR, so the first PR that Claude made was reviewed by Claude before I got a chance to look at it. The review picked up a few mistakes, which was good to see, but it did not trigger an update to the PR. That has to be triggered manually.

The first PR looked reasonable, but there were linting errors, and I also noticed that Claude didn't write any documentation for the new feature. A lot of these complaints could be fixed by adding a claude.md file to the project.

In hindsight, what really helped here is that the issue was pretty clear and included an example implementation that motivated what needed to be done. Without that example in the issue, I can imagine the implementation would not have gone as smoothly.

I was also generally impressed with how much Claude knew about the details of implementing scikit-learn components. It's a popular library, I know. But the convention that attributes set within .fit() should end with an underscore (like self.estimator_) is not universally known.
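For readers who haven't run into that convention: `__init__` only stores the hyperparameters it was given, and everything learned during `.fit()` gets a trailing underscore, so an unfitted estimator simply doesn't have those attributes yet. Below is a minimal, dependency-free sketch of the idea. The `MeanCenterer` class is hypothetical and not part of scikit-lego; real code would subclass `sklearn.base.BaseEstimator` and use `sklearn.utils.validation.check_is_fitted` instead of the hand-rolled `hasattr` check.

```python
class MeanCenterer:
    """Toy transformer that subtracts per-column means learned in fit()."""

    def __init__(self, with_mean=True):
        # Hyperparameters are stored unchanged; no learning happens here.
        self.with_mean = with_mean

    def fit(self, X, y=None):
        columns = list(zip(*X))
        n_rows = len(X)
        # Learned state gets a trailing underscore and only exists after fit().
        if self.with_mean:
            self.means_ = [sum(col) / n_rows for col in columns]
        else:
            self.means_ = [0.0] * len(columns)
        self.n_features_in_ = len(columns)
        return self  # returning self enables est.fit(X).transform(X)

    def transform(self, X):
        # Simplified stand-in for sklearn's check_is_fitted().
        if not hasattr(self, "means_"):
            raise RuntimeError("MeanCenterer is not fitted yet; call fit() first.")
        return [[value - mean for value, mean in zip(row, self.means_)] for row in X]
```

With `X = [[1, 2], [3, 4]]`, `MeanCenterer().fit(X).transform(X)` gives `[[-1.0, -1.0], [1.0, 1.0]]`, and the learned `means_` is `[2.0, 3.0]`.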

As expected?

A lot of the steps in "the flow" really do work as expected: I can ping Claude from an issue, from a PR, and from a review message. At the same time, the collaboration between GitHub and Claude feels mismatched at times.

  • I would prefer it if Claude held off on its first review until I had reviewed the code myself. Otherwise I might be too influenced by its review. You might be able to script this behavior, but it is not the default.
  • The text that Claude writes on GitHub is bonkers long. Prepare to scroll when you read a review. Maybe a claude.md file could rein this in, but for now it's too verbose.
  • When you make a small change to a PR, Claude re-reviews the work. But it re-reviews the entire PR, not just the small change you've added. Again, be prepared to do a lot of scrolling.
  • You really need to mention @claude in every reply, in every thread, for it to trigger any work. Replying with "yes please, do that" is not enough.
  • It can be hard to know whether Claude is working or not. When you respond, it can take 30 seconds before you see a reply, and sometimes a hard browser refresh seems necessary. Some sort of notifier/ping would be nice here.
  • You know how you can write "this PR fixes #issue-id" so that the issue gets closed once the PR is merged? PRs made by Claude somehow lack that line, so you need to add it manually.
  • The Claude user makes its own branch, and I noticed that CI checks don't always understand what's going on. In one instance I had to commit a small change manually for CI to run normally.
  • Claude isn't amazing when it comes to fixing ruff linting errors. I suspect part of the issue is that it doesn't look at the linting errors in the first place, but I could be wrong about that.
  • It's more expensive than I would have thought: one larger PR and one smaller one together cost $4.22. That's cheap when you consider the amount of work it does, but less cheap when you realise how hard it is to control the tokens it consumes and emits. You also cannot use a Max subscription here; everything has to go through the API. For an open-source project I initially worried that this was a risk, but it seems that Claude review only triggers if the original author has write access to the repo.
It took me a while to figure out that a response only comes if I mention @claude here.

A lot of these issues can be fixed over time. But right now, it's not as good as it could be.

I am eager to look towards a future where some of the open-source work could be handed off to Claude asynchronously. The tech in Claude might be close to ready, but the DX on GitHub isn't there yet. As it stands, it might be more pragmatic to run Claude locally from the command line in a separate terminal and to check in on it from there.

For now, I will be turning Claude off on the repo, mainly because it feels like the Cursor offering will work better for me, but who knows ... I might change my mind after a few iterations.