• vunderba 30 minutes ago
    I can’t speak to the specifics, but I’ve always theorized that one of the problems was difficulty understanding periodic data (hands, banana bunches, piano keys), particularly in the context of a larger cohesive image (768x512 in the SD days).

    Those kinds of structures tend to cause issues. One possible reason Flux got much better at handling them is simply scale: at around twelve billion parameters, it’s roughly four times the size of its SDXL predecessors.

    On a slightly related note, I actually added a test around deliberately 4- and 6-fingered hands to my comparison site, because hands had become such a solved problem that they turned into an interesting benchmark in reverse: it lets me check whether models can generate images that run against the enormous bias in the training data toward hands with exactly five digits.

    https://genai-showdown.specr.net/#count-tyrone-rugen

  • cheevly 4 hours ago
    The same way text models improved.
    [-]
    • kelseyfrog 4 hours ago
      Remind me what that was?
      [-]
      • yorwba 3 hours ago
        More trainable parameters, more data, higher quality data.