[)roi(]
The error function can't simply be scaled up as a multiple of 255, because the error isn't a factor of the colour component but of the error distribution; this dither, when applied correctly, has a distinct honeycomb effect on the pixels, and losing floating-point precision produces something quite different. As to optimisation, I implied the results are pre-calculated, i.e. the float division is not executed for every pixel, since it is unrelated to the pixel values; in the case of the texture, though, the cost would instead be in its storage / retrieval.

You're not even using the right terminology. What do you mean by "optimized out"? The compiler doesn't have advance knowledge of the data that will be run, so it can't optimize out anything we've discussed. Do you perhaps mean "early out" or "skip" paths not taken? That won't happen either, because this is a SIMT/SIMD architecture, which will rather execute both sides of the branch and predicate the results for short code blocks. The error function can easily be scaled up to X/255, which will either have hardware to normalize it to 0-1, or will simply require that it be multiplied by 1/255, which is much faster than running through a flattened condition tree.
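For anyone following along, the 1/255 normalisation mentioned above amounts to a single constant multiply per pixel, and a short branch on this kind of hardware is generally compiled to predicated instructions rather than skipped. A minimal CUDA-style sketch (the function names are mine, nothing here is from the actual kernel):

// Normalise an 8-bit channel to 0..1 with one multiply; the compiler
// folds 1.0f / 255.0f into a constant, so no per-pixel division happens.
__device__ float normalise8(unsigned char c)
{
    return c * (1.0f / 255.0f);
}

// On SIMT hardware a short if/else like this is typically turned into
// predicated instructions: both sides are evaluated for the warp and the
// result is selected per lane, so there is no "skipped" path.
__device__ float hardStep(float v, float threshold)
{
    if (v < threshold)
        return 0.0f;
    else
        return 1.0f;
}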
As to the branch, sure it has a cost (that I don't doubt), but to understand how much, it would probably be far simpler to compare the frame rendering times of the current version of the kernel against an alternative that avoids the branch by using textures. Where I'm not so convinced is the cost of dependent texture reads: they aren't free even when served from the cache, and I'm not sure that cost is, as you imply, marginal in comparison with the branches.
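If it helps, that comparison could be done with the usual CUDA event timers around each kernel variant. A rough harness along these lines (the launch callback is a placeholder, not the real code):

#include <cuda_runtime.h>

// Time an arbitrary kernel launcher over a number of frames and return the
// average per-frame time in milliseconds. Call it once for the branch-based
// kernel and once for the texture/LUT variant, then compare the two numbers.
float timeKernelMs(void (*launch)(), int frames)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < frames; ++i)
        launch();                      // e.g. a wrapper that does ditherBranchy<<<grid, block>>>(...)
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / frames;
}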
True error diffusion dithering is in any case impossible to implement on the GPU, as all the good algorithms require the calculated error to be proportionately diffused across multiple neighbouring pixels that haven't been processed yet (carry forward); each pixel's result therefore depends on the pixels processed before it, so the work can't be split into independent per-pixel threads.
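To make the carry-forward point concrete, here is the standard Floyd-Steinberg loop in plain host code (greyscale, 1-bit output, purely illustrative). Note how each pixel's quantisation depends on error already pushed into it by earlier pixels, which is exactly what doesn't map onto independent GPU threads:

// Serial Floyd-Steinberg: the loop order matters, because the error of each
// pixel is spread into neighbours that have not yet been visited.
void floydSteinberg(float* img, int w, int h)   // img values in 0..1
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float old   = img[y * w + x];
            float quant = old < 0.5f ? 0.0f : 1.0f;
            float err   = old - quant;
            img[y * w + x] = quant;

            // carry the error forward into the unprocessed neighbours
            if (x + 1 < w)     img[y * w + x + 1]       += err * 7.0f / 16.0f;
            if (y + 1 < h) {
                if (x > 0)     img[(y + 1) * w + x - 1] += err * 3.0f / 16.0f;
                               img[(y + 1) * w + x]     += err * 5.0f / 16.0f;
                if (x + 1 < w) img[(y + 1) * w + x + 1] += err * 1.0f / 16.0f;
            }
        }
    }
}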
The GPU-compatible dithering formulation I've used is an adaptation of ordered dithering: an error is applied to the current pixel based on its value and on its offset relative to its neighbouring pixels, which my code determines by taking the pixel coordinates modulo the chosen dither matrix size. It's this linkage to the neighbouring pixels that I believe invalidates the advantage that could be derived from textures and the cache; if I'm correct, it nullifies any ability to optimise the texture cache reads. Whether that ends up worse than, the same as, or better than the ugly duckling is anyone's guess. Performance profiling would be needed.
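For reference, the textbook form of the technique looks roughly like this as a CUDA kernel. This is a generic sketch with the standard 4x4 Bayer matrix, not my actual code, but it shows the coordinate-modulo indexing I'm describing:

// Each thread handles one pixel; the threshold depends only on
// (x mod 4, y mod 4), so threads stay fully independent.
__constant__ float bayer4[4][4] = {
    {  0.0f,  8.0f,  2.0f, 10.0f },
    { 12.0f,  4.0f, 14.0f,  6.0f },
    {  3.0f, 11.0f,  1.0f,  9.0f },
    { 15.0f,  7.0f, 13.0f,  5.0f }
};

__global__ void orderedDither(const float* in, float* out, int w, int h, float levels)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // offset chosen by the pixel's position modulo the matrix size (4 here)
    float threshold = (bayer4[y & 3][x & 3] + 0.5f) / 16.0f - 0.5f;

    // perturb the value, then round to the nearest multiple of 1/levels
    float v = in[y * w + x] + threshold / levels;
    out[y * w + x] = floorf(v * levels + 0.5f) / levels;
}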
Anyway, I'm in two minds about whether it's really worth the extra effort, though I do agree that the nearest-neighbour colour downsampling for the reduced palettes would probably benefit from using a texture as opposed to all the ifs.
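The texture / LUT idea for that palette step would look something like this: precompute the nearest palette entry for every quantised RGB value once on the CPU, then the kernel does a single lookup instead of the if tree. Again just a sketch with made-up names; the table could equally be bound as a texture so the dependent reads come through the texture cache:

// One thread per pixel; lut holds 32*32*32 precomputed nearest-palette colours
// (5 bits per channel), so the kernel is a single dependent read, no branching.
__global__ void mapToPalette(const uchar4* in, uchar4* out,
                             const uchar4* lut, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;

    uchar4 p  = in[i];
    int   idx = (p.x >> 3) * 32 * 32 + (p.y >> 3) * 32 + (p.z >> 3);
    out[i] = lut[idx];                 // nearest palette colour, precomputed
}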
As always, thanks for the thought-provoking debate.