Today I learned that there is a computer vision dataset of 196 invasive species. It's a pretty big dataset too: 19K images with bounding box annotations, plus another 1.2M unlabelled images. The dataset comes with a paper as well as a website, and there are a few interesting things about it.
For starters, the dataset respects a taxonomy, which means that the classes can be associated with each other hierarchically. That can be useful because some species belong to the same family.
But the dataset even goes a step further by taking life cycles into account. Insects of a specific species look different when they are still larvae, and you may also be interested in what the eggs look like.
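To make the idea concrete, here is a minimal sketch of what a label that carries both the taxonomy and the life stage could look like. This is my own illustration and not the dataset's actual schema: the `Label` class, its field names, the `shared_rank` helper and the placeholder taxon names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    # Hypothetical fields; the real dataset may store this differently.
    family: str
    genus: str
    species: str
    life_stage: str  # e.g. "egg", "larva", "adult"


def shared_rank(a: Label, b: Label) -> str:
    """Return the most specific taxonomic rank at which two labels agree."""
    if a.family != b.family:
        return "none"
    if a.genus != b.genus:
        return "family"
    if a.species != b.species:
        return "genus"
    return "species"


# Placeholder names, not real taxa from the dataset.
adult = Label("FamilyA", "GenusA", "species_a", "adult")
larva = Label("FamilyA", "GenusA", "species_a", "larva")
cousin = Label("FamilyA", "GenusB", "species_b", "adult")

print(shared_rank(adult, larva))   # same species, different life stage
print(shared_rank(adult, cousin))  # only the family is shared
```

With labels like this, a misclassification within the same family can be treated as a smaller mistake than one across families, and the life stage stays available as a separate attribute.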
The paper gives more details on how they used CLIP to construct the embeddings as well as some benchmarks on common models. But I thought it was interesting to see domain knowledge seeping into the data collection methods like this. Neat!