That looks cool but also kind of normal. I've done some tests before with memory scaling on my system and gotten similar results -
View attachment 1749893
View attachment 1749895
That's about 19% better read, 20% better write, 18% better copy, 12% better latency. That's been the differential between green stick memory and memory with racing stripes since forever really. Which kind of leads on to my point.
In the real world, that increased bandwidth and reduced latency hasn't mattered much. It's a couple of percent here and there outside of synthetic benchmarks, with some applications (especially HEVC / x264 encoding and rendering) not caring at all. I'd be interested to see if that changes, and why, but my gut tells me it won't, because code isn't written with the expectation that the extra memory bandwidth is available.
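To put a rough number on that, here's a back-of-the-envelope sketch using an Amdahl's-law style estimate. It assumes only some fraction of an application's runtime is actually waiting on memory; the fractions below are made-up illustrations, not measurements, and the 1.19 factor is just the ~19% read improvement from the results above.

```python
def overall_speedup(memory_bound_fraction, memory_speedup):
    """Amdahl-style estimate: only the memory-bound share of runtime gets faster."""
    remaining = (1.0 - memory_bound_fraction) + memory_bound_fraction / memory_speedup
    return 1.0 / remaining


# Hypothetical shares of runtime spent waiting on memory (assumed, not measured).
for fraction in (0.05, 0.10, 0.25, 0.50):
    gain_percent = (overall_speedup(fraction, 1.19) - 1.0) * 100.0
    print(f"{fraction:.0%} memory-bound -> about {gain_percent:.1f}% faster overall")
```

If a workload only spends around a tenth of its time stalled on memory, a 19% faster memory subsystem works out to well under 2% overall, which lines up with the "couple of percent here and there" experience.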
Memory overclocking is interesting but it's not rewarding from a daily use point of view - just kinda neat to see "number go up"
Sure, but it is still dependent on the interconnect, and I am making a correction to my previous post. The memory clock driver is not a CCD, but a CKD (client clock driver). My brain isn't always working. It is the client-module counterpart of the registering clock driver (RCD) found on server RDIMMs. As technology advances, more logic is moved onto the modules, which increases latency. For normal consumers it is not a big deal, since we have small workloads compared to compute nodes; however, we still have an interconnect between logic blocks. A CKD takes that job off the CPU side by re-timing and redistributing the clock sent by the processor on the module itself, essentially managing the clock to cut down noise. Fewer errors. Jitter mitigated. You could get much better timings without reducing the clock, because the syncing is done on the CKD, allowing improved interconnectivity. Frequencies be damned.
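A quick sketch of why jitter matters more as data rates climb. The only hard fact used here is that DDR moves one transfer per unit interval, so DDR5-6400 has 1/6400 of a microsecond per bit. The 10 ps jitter figure is purely a made-up illustration, not a spec value or a measurement.

```python
def unit_interval_ps(transfer_rate_mts):
    """Time available per transfer, in picoseconds, for a rate given in MT/s."""
    return 1e6 / transfer_rate_mts


def jitter_share(transfer_rate_mts, jitter_ps):
    """Fraction of each unit interval consumed by a given amount of clock jitter."""
    return jitter_ps / unit_interval_ps(transfer_rate_mts)


ASSUMED_JITTER_PS = 10.0  # hypothetical value, for illustration only

for rate in (4800, 6400, 8000):
    share = jitter_share(rate, ASSUMED_JITTER_PS)
    print(f"DDR5-{rate}: {unit_interval_ps(rate):.2f} ps per transfer, "
          f"jitter uses {share:.1%} of it")
```

The same amount of jitter eats a bigger slice of the timing window at every speed step, which is the argument for cleaning up the clock on the module rather than leaving it all to the CPU and the board traces.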
I'm no engineer, but I have done some reading on the standards as published by JEDEC. Though this is more applicable to compute nodes and, as you pointed out, "some applications", quicker data access could positively change the relationship between the CPU and RAM. The question is, how positive?
Potentially, cache could also be more optimally handled. Process synchronisation will be improved. Overall reduced voltage. The CPU not having to send multiple clocks to the DIMM (or SO-DIMM) will also reduce the CPU's power consumption and reduce pin usage. The CPU package could be made more compact as a result.
Generally, CAMM is the bigger RAM improvement right now, and CAMM could also have a CKD. However, unlike SO-DIMM, CAMM shortens the traces between the module and the CPU. Again: higher speeds, reduced power consumption, errors mitigated.
PCI-e is also evolving. Improved retimers (and switches) using less power, reducing noise. There could be huge gains made in this decade. Moving towards ecosystem synchronous harmony.
Yeah, software will need to be written to take full advantage of the above. CAMM is already being adopted, so channel loss between the CPU and that endpoint will be reduced. How this will translate into gains, only reviews will tell, but I do think that CKD will show promise. Basically everything on the PCI-e roadmap is addressed by the CKD at a RAM level.