Faster parallel computing

cguy

Executive Member
Joined
Jan 2, 2013
Messages
5,724
More like "Faster parallel computing" for developers who don't understand anything about memory access or caching.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,281
cguy said:
More like "Faster parallel computing" for developers who don't understand anything about memory access or caching.
Just what I was thinking, lol.

I'm not so sure it's an obvious bleh! How things are laid out in memory is decided by the compiler (in line with the ABI); whilst I can request a block of resources, it's ultimately up to the compiler and the system to accommodate that request. Milk sounds like a different approach.
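To illustrate the point about layout being decided by the compiler and ABI, here is a minimal C++ sketch. The struct names and the `padding_bytes` helper are made up for this example; the exact sizes and offsets depend on the platform, which is rather the point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structs: the same three members, declared in different orders.
struct Mixed {
    char tag;             // 1 byte
    std::uint32_t value;  // must start at an offset aligned for uint32_t
    char flag;            // 1 byte, usually followed by tail padding
};

struct Reordered {
    std::uint32_t value;
    char tag;
    char flag;
};

// Bytes of padding the compiler added to T, given the summed size of its
// members (passed in by the caller; an assumption of this sketch).
template <typename T>
constexpr std::size_t padding_bytes(std::size_t member_bytes) {
    assert(member_bytes <= sizeof(T));
    return sizeof(T) - member_bytes;
}
```

On a typical x86-64 ABI `sizeof(Mixed)` is likely 12 and `sizeof(Reordered)` 8, but the programmer only chooses the declaration order; the offsets and padding come from the compiler following the ABI.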

In a lot of ways it sounds more like another level of optimisation (re the implied reference to closures), similar to (but not the same as) the extra compiler optimisation performed at, for example, the LLVM IR, Swift SIL or Rust MIR layers.
 

cguy

[)roi(];18420058 said:
I'm not so sure it's an obvious bleh! How things are laid out in memory is decided by the compiler (in line with the ABI); whilst I can request a block of resources, it's ultimately up to the compiler and the system to accommodate that request. Milk sounds like a different approach.

In a lot of ways it sounds more like another level of optimisation (re the implied reference to closures), similar to (but not the same as) the extra compiler optimisation performed at, for example, the LLVM IR, Swift SIL or Rust MIR layers.
It sounds like Milk adds some sort of pre-pass that works out which iterations of a parallelized loop access the same memory, and then schedules the iterations it expects to touch the same cache lines to run on the same core. This can definitely reduce memory bandwidth in some scenarios, but I doubt it helps in very many cases, and it would definitely be worse than something I would implement (given that, unlike the compiler, I am aware of the full context of the algorithm and the run-time order of the data).

Given that Milk is an OpenMP extension, I doubt it affects memory layout at all.
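The kind of pre-pass guessed at above can be sketched in C++ roughly as follows. This is not Milk's actual implementation, which the thread has no details on; `order_by_cache_line`, `histogram` and the 64-byte line size are assumptions made purely for illustration. The sketch reorders the iterations of an indirect-access loop so that iterations hitting the same cache line of the target array run back to back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86-64 but not universal.
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical helper (not part of Milk or OpenMP): returns the iteration
// numbers 0..idx.size()-1 reordered so that iterations whose target element
// falls in the same cache line are adjacent. Assumes the target array starts
// on a cache-line boundary.
std::vector<std::size_t> order_by_cache_line(const std::vector<std::size_t>& idx,
                                             std::size_t elem_bytes) {
    assert(elem_bytes > 0 && elem_bytes <= kCacheLineBytes);
    std::vector<std::size_t> order(idx.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Stable sort keeps the original order among iterations on the same line.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return idx[a] * elem_bytes / kCacheLineBytes <
                                idx[b] * elem_bytes / kCacheLineBytes;
                     });
    return order;
}

// Indirect accumulation, counts[idx[i]] += 1, visiting iterations in
// cache-line order instead of their original order.
std::vector<std::uint32_t> histogram(const std::vector<std::size_t>& idx,
                                     std::size_t bins) {
    std::vector<std::uint32_t> counts(bins, 0);
    for (std::size_t i : order_by_cache_line(idx, sizeof(std::uint32_t))) {
        assert(idx[i] < bins);
        ++counts[idx[i]];
    }
    return counts;
}
```

The loop here is sequential; a parallel version would presumably hand each contiguous run of the reordered iterations to one thread, so that a given cache line is only touched by one core. Whether the sorting cost pays off depends on the access pattern, which is the doubt raised in this post.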
 

[)roi(]

cguy said:
It sounds like Milk adds some sort of pre-pass that works out which iterations of a parallelized loop access the same memory, and then schedules the iterations it expects to touch the same cache lines to run on the same core. This can definitely reduce memory bandwidth in some scenarios, but I doubt it helps in very many cases, and it would definitely be worse than something I would implement (given that, unlike the compiler, I am aware of the full context of the algorithm and the run-time order of the data).

Given that Milk is an OpenMP extension, I doubt it affects memory layout at all.
As I implied with:
...Milk sounds like a different approach.
Unfortunately it lacks specifics, so we're left with conjecture. Yet considering it's from MIT, it probably has some measure of validity.

I'm not sure I've ever seen a new approach escape trial by fire. So whilst the idea might appear only superficially novel, so too did many of the optimisations that we've come to rely upon.
 