Letting Claude loose on scikit-lego.
TLDR: it works pretty well. A human in the loop is still needed and there are a few small hiccups to overcome, but I am eager to try it out more.
I figured it was time to see if we could also automate some things from GitHub. I've used Claude a lot from the command line and was impressed with a lot of things, but hand-holding it from the command line is fundamentally different from seeing if Claude can actually pick up a full PR on its own. With that in mind, I figured it might be good to have it work on a PR for scikit-lego. It's a scikit-learn utility library that I made, and I have this one issue that is reasonably well documented but should also require Claude to know about some of the internals.
Setup
Setting this up was relatively easy. A lot is automated when you run this in Claude Code:
/install-github-app
This configures all the keys and gives you a nice step-by-step interface to walk through. However, I did notice it was slightly counterintuitive to get things working right from the start. This is in part because there tends to be a bit of a delay: it really can take 30 seconds before Claude responds on GitHub issues.
Long story short, this is the workflow that I ended up with after iterating.
name: Claude Code
on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review:
    types: [submitted]
jobs:
  claude:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
    runs-on: ubuntu-latest
    permissions:
      contents: write
      issues: write
      pull-requests: write
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - name: Run Claude Code
        id: claude
        uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
Towards a review
From here, I notified Claude on issue #695, which implements a linear embedding trick for scikit-learn pipelines. Claude replied with a branch that contained the work, which I could turn into pull request #748 with a single click.

Claude is also configured to review any PR. So the first PR that Claude made was first reviewed by Claude before I could get a chance to look.
The first PR looked reasonable, but there were linting errors, and I also noticed that Claude didn't write any documentation for the new feature. A lot of these complaints could be fixed by adding a claude.md file (link) to the project.
It should be stressed that Claude is not perfect when it comes to reviews. After it reviewed itself, it failed to notice that you do want to clone estimators for safety. I forgot to do that. I had to point that out, so the human really needs to remain in the loop.
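To make the cloning point concrete, here is a minimal sketch. It is not the code from the PR: `WrappedEstimator` is a hypothetical meta-estimator made up for illustration, and the toy data is invented. The only real scikit-learn pieces are `BaseEstimator`, `LinearRegression` and `sklearn.base.clone`, which builds a new unfitted estimator with the same parameters.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LinearRegression


class WrappedEstimator(BaseEstimator):
    """Hypothetical meta-estimator, only here to illustrate cloning."""

    def __init__(self, estimator):
        # Store the argument untouched; __init__ should not do any work.
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a fresh copy so the object the caller passed in stays unfitted.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)


# Toy data, invented for this example: y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

base = LinearRegression()
model = WrappedEstimator(base).fit(X, y)

print(hasattr(base, "coef_"))              # the original was never fitted
print(hasattr(model.estimator_, "coef_"))  # the clone holds the learned state
```

Without the `clone` call, fitting the wrapper would mutate `base`, and anything else that shares that object (a grid search, a second pipeline) would silently see the fitted state.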
As expected
I gotta say, a lot of these steps really do work as expected. I can ping Claude from an issue, mention it on the PR, and mention it in a review message on the PR. It all works. It's all asynchronous too. So I could have my first look after breakfast and maybe have another review after lunch.
I was generally also impressed with how Claude knew about some of the details of how to implement scikit-learn. It's a popular library, I know. But the convention that properties set within .fit() should end with an underscore (like self.estimator_) is not universally known.
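For readers who have not seen the trailing-underscore convention, a small sketch may help. The `MeanCenterer` transformer below is a made-up example, not something from scikit-lego or the PR. It relies on `check_is_fitted` from `sklearn.utils.validation`, which raises `NotFittedError` when the named fitted attribute is missing.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that subtracts the column means seen in fit."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned state gets a trailing underscore and only exists after fit().
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        # Fails with NotFittedError if fit() has not set mean_ yet.
        check_is_fitted(self, "mean_")
        return np.asarray(X, dtype=float) - self.mean_


centerer = MeanCenterer()
try:
    centerer.transform([[1.0, 2.0]])
except NotFittedError:
    print("not fitted yet")

centerer.fit([[1.0, 2.0], [3.0, 4.0]])
print(centerer.transform([[1.0, 2.0]]))
```

The split matters because arguments stored in `__init__` (no underscore) describe the configuration, while underscore attributes describe what was learned from data.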
In hindsight, one thing that really does help here is that I have a pretty clear issue with an implementation that really helps motivate what needs to be done. I can imagine that without having the example in the issue, this implementation might not have gone as smoothly.
The setup works pretty well, but a fair warning on a few things:
- The text that Claude writes inside of GitHub is too long. Text that it writes during the review phase does feel like you're reading pages and it feels like it should be much, much shorter. Maybe claude.md could fix this.
- You really need to @claude in every response in every thread for it to trigger any work. Replying with a "yes please that" is not enough.
- It can be hard to know if Claude is working or not. When you respond it can take 30 seconds before you see a reply. Some sort of notifier/ping would be nice here.
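The reason a plain "yes please that" does nothing follows from the `if:` condition in the workflow: the job only runs when the triggering text contains `@claude`. As a rough sketch, the same logic can be written out in Python. The `should_trigger` and `mentions_claude` functions are hypothetical helpers that mirror the expression, not part of the action itself, and the payload dictionaries are simplified stand-ins for the GitHub event. The lowercase comparison reflects that `contains()` in GitHub Actions expressions is not case sensitive.

```python
def mentions_claude(text):
    """Case-insensitive substring check, mirroring contains() in the workflow."""
    return "@claude" in (text or "").lower()


def should_trigger(event_name, payload):
    """Hypothetical mirror of the workflow's `if:` expression."""
    if event_name in ("issue_comment", "pull_request_review_comment"):
        return mentions_claude(payload.get("comment", {}).get("body"))
    if event_name == "pull_request_review":
        return mentions_claude(payload.get("review", {}).get("body"))
    if event_name == "issues":
        issue = payload.get("issue", {})
        return mentions_claude(issue.get("body")) or mentions_claude(issue.get("title"))
    return False


# A follow-up without the mention is ignored; the same reply with it is picked up.
print(should_trigger("issue_comment", {"comment": {"body": "yes please that"}}))
print(should_trigger("issue_comment", {"comment": {"body": "@claude yes please that"}}))
```

Each comment is evaluated on its own, so there is no notion of an ongoing conversation: every reply that should lead to work needs the mention.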
