Letting Claude loose on scikit-lego.

2025-06-13

TLDR: it works pretty well, a human in the loop remains needed, and there are a few small hiccups to overcome, but I am eager to try it out more.

I figured it was time to see if we could maybe also automate some things from GitHub. I've used Claude a lot from the command line, and I was impressed with a lot of things, but doing something from the command line while hand-holding is fundamentally different from seeing if Claude can actually pick up a full PR on its own. With that in mind, I figured it might be good to have it work on a PR for scikit-lego. It's a scikit-learn utility library that I made, and I have one issue there that is reasonably well documented but should also require Claude to know about some of the internals.

Setup

Setting this up was relatively easy. A lot is automated when you run this in Claude Code:

/install-github-app

This configures all the keys and gives you a nice step-by-step interface to walk through. However, I did notice it was slightly counterintuitive to get things working right from the start. This is in part because there tends to be a bit of a delay: it really can take 30 seconds before Claude responds on GitHub issues.

Long story short, this is the workflow that I ended up with after iterating.

name: Claude Code

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review:
    types: [submitted]

jobs:
  claude:
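    # Only run when someone mentions @claude in a comment, review, or new issue.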
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
    runs-on: ubuntu-latest
    permissions:
      contents: write
      issues: write
      pull-requests: write
      id-token: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 1

      - name: Run Claude Code
        id: claude
        uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}

Towards a review

From here, I notified Claude on issue #695, which describes a linear embedding trick for scikit-learn pipelines. Claude replied with a branch that contained the work, which I could turn into pull request #748 with a single click.

[Screenshot: CleanShot 2025-06-13 at 17.01.14.png]
The description of the issue: it's not a hard one, but certainly not incredibly simple either.

Claude is also configured to review any PR, so the first PR that Claude made was reviewed by Claude itself before I even got a chance to look.

The first PR looked reasonable, but there were linting errors, and I also noticed that Claude didn't write any documentation for the new feature. A lot of these complaints could be fixed by adding a claude.md file (link) to the project.
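
For a sense of what that looks like, a claude.md could spell out project conventions; the contents below are a hypothetical sketch, not the file I actually used:

# Conventions for this project
- Run the linter and fix all warnings before pushing a branch.
- Every new feature needs user-facing documentation.
- Follow scikit-learn conventions: attributes learned in .fit() end with an underscore.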

It should be stressed that Claude is not perfect when it comes to reviews. After reviewing its own work, it failed to notice that you do want to clone estimators for safety; the implementation skipped that step. I had to pinpoint it myself, so the human really needs to remain in the loop.
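
For context, here is a minimal sketch of that cloning convention; the WrappedEstimator class below is hypothetical, just to illustrate the pattern, and not the actual PR code:

from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone
from sklearn.linear_model import LinearRegression
import numpy as np

class WrappedEstimator(MetaEstimatorMixin, BaseEstimator):
    """Hypothetical wrapper, only here to illustrate the cloning convention."""

    def __init__(self, estimator):
        self.estimator = estimator  # store the user's estimator untouched

    def fit(self, X, y):
        # clone() gives a fresh, unfitted copy, so fitting never
        # mutates the object the user passed in.
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
model = WrappedEstimator(LinearRegression()).fit(X, y)
print(model.predict(X))  # the original LinearRegression() stays unfitted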

As expected

I gotta say, a lot of these steps really do work as expected. I can ping Claude from an issue, mention it on the PR, or reply as a review message in the PR. It all works, and it's all asynchronous too. So I could have my first look after breakfast and maybe do another review after lunch.

I was generally also impressed with how much Claude knew about the details of writing scikit-learn compatible code. It's a popular library, I know. But the convention that attributes set within .fit() should end with an underscore (like self.estimator_) is not universally known.
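
To see that convention in action with a stock estimator (this snippet is just an illustration, unrelated to the PR):

from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])

model = LinearRegression().fit(X, y)

# Learned state only appears after .fit(), in underscore-suffixed attributes.
print(model.coef_)       # [2.]
print(model.intercept_)  # 1.0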

In hindsight, one thing that really does help here is that I have a pretty clear issue with an example implementation that motivates what needs to be done. I can imagine that without that example in the issue, this implementation might not have gone as smoothly.

The setup works pretty well, but a fair warning on a few things:

  • The text that Claude writes inside of GitHub is too long. What it writes during the review phase feels like reading pages, when it should be much, much shorter. Maybe claude.md could fix this.
  • You really need to @claude in every response, in every thread, for it to trigger any work. Replying with a "yes, please do that" is not enough.
  • It can be hard to know whether Claude is working or not. When you respond, it can take 30 seconds before you see a reply. Some sort of notifier/ping would be nice here.

[Screenshot: CleanShot 2025-06-13 at 17.18.02.png]
It took me a while to figure out that the response only comes if I use @claude here.

Socratic prompt

2025-06-10

I spotted a fun prompt (via, gist) the other day. It starts like this:

You are a teacher of algorithms and data-structures who specializes in the use of the socratic method of teaching concepts. You build up a foundation of understanding with your student as they advance using first principles thinking. Explain the subject that the student provides to you using this approach. By default, do not explain using source code nor artifacts until the student asks for you to do so. Furthermore, do not use analysis tools. Instead, explain concepts in natural language. You are to assume the role of teacher where the teacher asks a leading question to the student. The student thinks and responds. Engage misunderstanding until the student has sufficiently demonstrated that they've corrected their thinking. Continue until the core material of a subject is completely covered. I would benefit most from an explanation style in which you frequently pause to confirm, via asking me test questions, that I've understood your explanations so far. Particularly helpful are test questions related to simple, explicit examples. When you pause and ask me a test question, do not continue the explanation until I have answered the questions to your satisfaction. I.e. do not keep generating the explanation, actually wait for me to respond first. Thanks! Keep your responses friendly, brief and conversational.

After which you declare what you want to learn. Maybe something like "I want to talk about minimum spanning trees".

It's designed for ChatGPT, but I've found it to work pretty well in Claude and Mistral as well. If you're keen to learn something and allow yourself the time to let it sink in, this approach feels like it might apply quite generally!

Optional Chaining

2025-06-09

Today I learned, thanks to this blogpost, that JavaScript allows for optional chaining.

That means that instead of writing code like this:

if (user && user.profile && user.profile.avatar) {
  console.log(user.profile.avatar);
}

You can write it like this:

console.log(user?.profile?.avatar);

The question mark indicates that the value before it may not exist. If user or user.profile is null or undefined, the whole expression short-circuits and evaluates to undefined instead of throwing a TypeError, without the need to chain predicates together with &&.

This is very neat! Thanks to Matt Smith for writing it down!