Faster parallel computing

More like "Faster parallel computing" for developers who don't understand anything about memory access or caching.
 
what I was thinking, lol


I'm not so sure it's an obvious bleh! How things are laid out in memory is decided by the compiler (in line with the ABI). Whilst I can request a block of a resource, it's ultimately up to the compiler and the system to accommodate that request. Milk sounds like a different approach.

In a lot of ways it sounds more like another level of optimisation (re the implied reference to closures), similar to (but not the same as) the extra compiler optimisations performed at, for example, the LLVM IR, Swift SIL or Rust MIR layers.
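
To illustrate the layout point, here is a small sketch using Python's `ctypes`, which follows the platform's native C layout rules. The `Sample` struct and its field names are made up for the example, and the exact numbers depend on the platform ABI, so I'm not quoting any output.

```python
import ctypes


class Sample(ctypes.Structure):
    # Hypothetical struct, equivalent to C: struct { char tag; int value; char flag; }
    _fields_ = [
        ("tag", ctypes.c_char),
        ("value", ctypes.c_int),
        ("flag", ctypes.c_char),
    ]


# Bytes I actually asked for, member by member.
member_bytes = sum(ctypes.sizeof(field_type) for _, field_type in Sample._fields_)

# Where each member really ended up once alignment rules were applied.
layout = {name: getattr(Sample, name).offset for name, _ in Sample._fields_}

print("sum of members:", member_bytes)
print("sizeof(Sample):", ctypes.sizeof(Sample))
print("offsets:", layout)
```

On most platforms the struct comes out larger than the sum of its members, because padding is inserted to keep `value` aligned. That padding is chosen by the ABI rules, not by me.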
 
[)roi(];18420058 said:
I'm not so sure it's an obvious bleh! How things are laid out in memory is decided by the compiler (in line with the ABI). Whilst I can request a block of a resource, it's ultimately up to the compiler and the system to accommodate that request. Milk sounds like a different approach.

In a lot of ways it sounds more like another level of optimisation (re the implied reference to closures), similar to (but not the same as) the extra compiler optimisations performed at, for example, the LLVM IR, Swift SIL or Rust MIR layers.

It sounds like Milk adds some sort of pre-pass that works out which iterations of a parallelized loop access the same memory, and then schedules the iterations it expects to touch the same cache lines to run on the same core. This can definitely reduce memory bandwidth usage in some scenarios, but I doubt it helps in very many cases, and it would definitely do worse than something I would implement by hand (given that, unlike the compiler, I am aware of the full context of the algorithm and the run-time order of the data).

Given that Milk is an OpenMP extension, I doubt it affects memory layout at all.
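
To make that guess concrete, here is a minimal sketch of the kind of pre-pass I have in mind. It is not Milk's actual implementation (the article gives no specifics). The function names, the 64-byte line size, the 8-byte element size and the round-robin core assignment are all my own assumptions, and it pretends the array starts on a cache-line boundary.

```python
from collections import defaultdict

CACHE_LINE_BYTES = 64  # assumed line size; common on x86 but not universal
ELEMENT_BYTES = 8      # assumed element size (e.g. one double)


def group_by_cache_line(indices):
    """Pre-pass: bucket loop iterations by the cache line their access lands in."""
    per_line = CACHE_LINE_BYTES // ELEMENT_BYTES
    buckets = defaultdict(list)
    for iteration, idx in enumerate(indices):
        # Assumes the array begins on a cache-line boundary.
        buckets[idx // per_line].append(iteration)
    return buckets


def assign_to_cores(buckets, num_cores):
    """Give each whole bucket to a single core (simple round-robin by line number)."""
    schedule = [[] for _ in range(num_cores)]
    for line in sorted(buckets):
        schedule[line % num_cores].extend(buckets[line])
    return schedule


# Iteration i of the loop touches data[indices[i]]: a scattered, indirect access.
indices = [0, 17, 3, 16, 9, 1, 8, 23]
buckets = group_by_cache_line(indices)
schedule = assign_to_cores(buckets, num_cores=2)

for core, iterations in enumerate(schedule):
    print(f"core {core} runs iterations {iterations}")
```

Each cache line is then only ever touched by one core. The catch is that the bucketing itself costs a pass over the indices, which is why I doubt it pays off unless the accesses are badly scattered.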
 
As I implied with:
"...Milk sounds like a different approach"
Unfortunately it lacks specifics, so we're left with conjecture. Yet considering it's from MIT, it probably has some measure of validity.

I'm not sure I've ever seen a new approach escape trial by fire. So whilst the idea might appear only superficially novel, the same was said of many of the optimisations that we've come to rely upon.
 