I think that your problem statement is missing the most important information:
Is the order/position of the sub-tiles known or not? If yes, it should be fairly trivial (as
@DrJohnZoidberg shows). If not, then the problem becomes a lot more complex.
I am assuming the latter. In that case, one has to assemble the image using some sort of
edge similarity measure.
The simple algorithm: Pick a random tile to start with. Then find the tile that best matches one of its edges (use something like the sum of squared differences between adjoining edge pixels as the measure). Attach it, find the next best-fitting tile, and so on. Of course, this greedy approach may end up unable to fit all tiles.
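A minimal sketch of that edge measure, assuming the tiles are equally sized NumPy arrays (height x width, optionally with a channel axis). The names `edge`, `edge_ssd` and `best_neighbour` are made up for illustration:

```python
import numpy as np

OPPOSITE = {"top": "bottom", "bottom": "top", "left": "right", "right": "left"}

def edge(tile, side):
    """Return the pixels along one side of a tile, as floats to avoid uint8 wraparound."""
    if side == "top":
        return tile[0].astype(np.float64)
    if side == "bottom":
        return tile[-1].astype(np.float64)
    if side == "left":
        return tile[:, 0].astype(np.float64)
    if side == "right":
        return tile[:, -1].astype(np.float64)
    raise ValueError(f"unknown side: {side}")

def edge_ssd(a, side, b):
    """Sum of squared differences between `side` of tile a and the opposite side of tile b."""
    d = edge(a, side) - edge(b, OPPOSITE[side])
    return float(np.sum(d * d))

def best_neighbour(tiles, i, candidates):
    """Return (score, side, j): the candidate j that fits best against any side of tile i."""
    return min(
        (edge_ssd(tiles[i], side, tiles[j]), side, j)
        for j in candidates
        for side in OPPOSITE
    )
```

Calling `best_neighbour(tiles, i, unplaced)` repeatedly gives the greedy loop described above. Square tiles are assumed, so that every side of one tile can be compared against every side of another.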
To improve on the above, one can look for best edge-pairs: To start with, find the best pair of matching edges over all tiles, and glue those two tiles together into a larger piece. Then look for the best match between the exposed edges of the placed tiles and the edges of the unplaced tiles. If a candidate tile would touch more than one edge of the placed tiles, apply the fit metric to all touching edges.
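The "score all touching edges" step could look roughly like this. It is only a sketch: `placed` is assumed to be a dict mapping grid cells `(row, col)` to tile indices, and averaging the per-edge scores (rather than summing them) is my own choice, so that cells with two neighbours are not penalised for having more edges.

```python
import numpy as np

# Neighbour offset (d_row, d_col) -> (our touching edge, the neighbour's touching edge)
NEIGHBOURS = {
    (-1, 0): (lambda t: t[0], lambda n: n[-1]),        # neighbour above
    (1, 0): (lambda t: t[-1], lambda n: n[0]),         # neighbour below
    (0, -1): (lambda t: t[:, 0], lambda n: n[:, -1]),  # neighbour to the left
    (0, 1): (lambda t: t[:, -1], lambda n: n[:, 0]),   # neighbour to the right
}

def placement_cost(tiles, placed, candidate, pos):
    """Mean SSD over every placed tile that touches cell `pos`; inf if none does."""
    r, c = pos
    costs = []
    for (dr, dc), (ours, theirs) in NEIGHBOURS.items():
        j = placed.get((r + dr, c + dc))
        if j is None:
            continue
        d = ours(tiles[candidate]).astype(np.float64) - theirs(tiles[j]).astype(np.float64)
        costs.append(float(np.sum(d * d)))
    return sum(costs) / len(costs) if costs else float("inf")

def best_placement(tiles, placed):
    """Try every unplaced tile in every free cell bordering the placed cluster."""
    used = set(placed.values())
    free = {(r + dr, c + dc) for (r, c) in placed for (dr, dc) in NEIGHBOURS} - set(placed)
    return min(
        (placement_cost(tiles, placed, k, pos), k, pos)
        for k in range(len(tiles)) if k not in used
        for pos in free
    )
```

Starting from `placed = {(0, 0): some_tile}` and repeatedly adding the result of `best_placement` grows the cluster one tile at a time. Negative cell coordinates are fine; they just mean the cluster grew up or to the left of the starting tile.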
Additional thoughts: You may want to apply an information criterion to the edges as well. The best candidate would then be something like: score for closest match plus score for more information. Here, information means the diversity of pixel colours along the edge (e.g., the square root of the sum of squared deltas between adjacent pixels along the same edge). This avoids matching edges that are, for example, a single flat colour and treating them as good fits. You could also look more than one pixel in from the edge (so one layer into the border of the tile) and use a similar edge metric, or even a gradient-based edge difference metric (do the pixel deltas follow a gradient, possibly even a non-horizontal or non-vertical one?).
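As a rough sketch of that combined score: the information term below follows the formula given above, but how to combine it with the match score is an assumption on my part. Here the cost is simply the SSD minus a weighted information bonus, and `weight` is a hypothetical tuning parameter that would need adjusting per image set.

```python
import numpy as np

def edge_information(edge):
    """sqrt of the sum of squared deltas between adjacent pixels along one edge."""
    e = np.asarray(edge, dtype=np.float64)
    return float(np.sqrt(np.sum(np.diff(e, axis=0) ** 2)))

def match_cost(edge_a, edge_b, weight=1.0):
    """Lower is better: SSD between the two edges, minus a bonus for informative edges."""
    a = np.asarray(edge_a, dtype=np.float64)
    b = np.asarray(edge_b, dtype=np.float64)
    ssd = float(np.sum((a - b) ** 2))
    # Take the less informative of the two edges, so a flat edge earns no bonus
    info = min(edge_information(a), edge_information(b))
    return ssd - weight * info
```

With this, two identical flat edges score 0, while two identical textured edges score below 0, so the textured match is preferred even though both have a perfect SSD.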
I expect that for most images you would get quite far with the above. With a pre-trained AI model, you could probably also score the entire assembled image by how well it matches "shuffled" versus "non-shuffled" prompts, which would let you differentiate between hard-to-discern tile placements.
Also, one could train a model that tells you whether or not a tile fits into its surrounding tiles. This will likely do something similar to the edge metrics above implicitly, but may do better, since it is more likely to contextualize the contents of the tiles properly. It's trivial to get the data to train such a model, since all you need are images that you tile up and feed in.
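To illustrate how cheap the training data is, here is one possible way to produce labelled examples from a single image. It only covers horizontal neighbours, and the names `split_tiles` and `labelled_pairs` are made up for this sketch. The model itself is left out.

```python
import numpy as np

def split_tiles(image, rows, cols):
    """Cut an image into a rows x cols grid of equally sized tiles (remainder pixels dropped)."""
    h, w = image.shape[0] // rows, image.shape[1] // cols
    return [[image[r * h:(r + 1) * h, c * w:(c + 1) * w] for c in range(cols)]
            for r in range(rows)]

def labelled_pairs(image, rows, cols, rng):
    """Yield (left_tile, right_tile, label): 1 for true neighbours, 0 for a random mismatch.

    Needs at least three tiles, otherwise there is nothing to draw a mismatch from.
    """
    grid = split_tiles(image, rows, cols)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    for r in range(rows):
        for c in range(cols - 1):
            yield grid[r][c], grid[r][c + 1], 1
            # Negative example: any tile except the left tile itself and its true right neighbour
            others = [p for p in cells if p not in ((r, c), (r, c + 1))]
            rr, cc = others[rng.integers(len(others))]
            yield grid[r][c], grid[rr][cc], 0
```

For example, `list(labelled_pairs(img, 4, 4, np.random.default_rng(0)))` gives 12 positive and 12 negative pairs from one image.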
EDIT: I see it is the latter case, good.
(It took me a few minutes to write the above.)