Intel launches new 28-core Xeon CPU

PhireSide

Executive Member
Joined
Dec 31, 2006
Messages
8,685
When overclocked this bad boy (along with the necessary motherboard and cooling solution) pulls more watts than a small microwave.

Damn.
 

Messugga

Honorary Master
Joined
Sep 4, 2007
Messages
10,409
Weird core count. Does anyone have any insight as to why they went for 24? 3x 8-core CPUs glued together?
 

Wasabee!

Expert Member
Joined
Apr 5, 2012
Messages
4,270
  • Ryzen Threadripper 2920X: 12-cores, 24-threads, clocked at 3.5GHz to 4.3GHz
  • Ryzen Threadripper 2950X: 16-cores, 32-threads, clocked at 3.5GHz to 4.4GHz
  • Ryzen Threadripper 2970WX: 24-cores, 48-threads, clocked at 3.0GHz to 4.2GHz
  • Ryzen Threadripper 2990WX: 32-cores, 64-threads, clocked at 3.0GHz to 4.2GHz
Or wait a few months and get a Zen 2 Threadripper.
 

Genisys

Executive Member
Joined
Jan 12, 2016
Messages
9,476
  • Ryzen Threadripper 2920X: 12-cores, 24-threads, clocked at 3.5GHz to 4.3GHz
  • Ryzen Threadripper 2950X: 16-cores, 32-threads, clocked at 3.5GHz to 4.4GHz
  • Ryzen Threadripper 2970WX: 24-cores, 48-threads, clocked at 3.0GHz to 4.2GHz
  • Ryzen Threadripper 2990WX: 32-cores, 64-threads, clocked at 3.0GHz to 4.2GHz
Or wait a few months and get a Zen 2 Threadripper.
Or just buy the best product for your use case and don't care about the never-ending AMD vs Intel fight. After all, if you are spending more than R40 000 on a CPU, odds are you are not going to care about the AMD vs Intel crap, and considering it's a Xeon you are probably not going to be buying only one.
 

Wasabee!

Expert Member
Joined
Apr 5, 2012
Messages
4,270
Or just buy the best product for your use case and don't care about the never-ending AMD vs Intel fight. After all, if you are spending more than R40 000 on a CPU, odds are you are not going to care about the AMD vs Intel crap, and considering it's a Xeon you are probably not going to be buying only one.
You need to define 'best product', as it could mean most expensive with the best performance. Value, power consumption, heat output, security, performance, etc. are all factors in 'best product'. Since I'm sure most applications of this CPU will be commercial, I think most buyers will actually look very carefully at all these factors and ensure they've made the right decision.

Saying that you 'just' buy the 'best product' is like saying you just pop into the corner shop and get some milk. In most applications this CPU will be an asset, part of some infrastructure or system working to recover its capex cost, and most buyers will think very carefully before buying it - especially with AMD being so aggressive in this market space.

Just read up on what first-gen Threadripper/Zen did to this market; Intel has changed its whole strategy because of AMD since then.
 

cguy

Expert Member
Joined
Jan 2, 2013
Messages
4,878
I'm a muppet. Meant 28. Which is also a weird core count.
Heh. To answer your question, it isn't split into multiple chips. There is some symmetry on the chip that will usually make the core count divisible by 2 and usually 4. There are other units on the chip such as the PCIE root complex, snooping hardware (home agent), UPI interfaces, memory controllers, etc. that are laid out with the core units (which usually also include a dedicated L1/L2 and slice of L3). Because they are laid out together, and these other units are of different sizes, the core count can be pretty arbitrary.

AMD has taken to splitting into multi-chip modules. The NUMA characteristics of this are a nightmare, but it does allow some production efficiencies.
 
Last edited:

Messugga

Honorary Master
Joined
Sep 4, 2007
Messages
10,409
Heh. To answer your question, it isn't split into multiple chips. There is some symmetry on the chip that will usually make the core count divisible by 2 and usually 4. There are other units on the chip such as the PCIE root complex, snooping hardware (home agent), UPI interfaces, memory controllers, etc. that are laid out with the core units (which usually also include a dedicated L1/L2 and slice of L3). Because they are laid out together, and these other units are of different sizes, the core count can be pretty arbitrary.
Noted.
I would've thought that it would be simpler to produce CPUs with core counts that are a power of two, just from the point of view of simplifying actually working with the things.
AMD Naples CPUs are pretty much four desktop CPUs interconnected on a die-to-die level, if memory serves. In such a scenario, I'd expect that one would want the dies to be close in terms of performance so as not to have too big a disparity between workloads being returned by the cores in the CPU.

Anyway, I'll go educate myself on the architectures in order to understand your statements better.
 

cguy

Expert Member
Joined
Jan 2, 2013
Messages
4,878
Noted.
I would've thought that it would be simpler to produce CPUs with core counts that are a power of two, just from the point of view of simplifying actually working with the things.
AMD Naples CPUs are pretty much four desktop CPUs interconnected on a die-to-die level, if memory serves. In such a scenario, I'd expect that one would want the dies to be close in terms of performance so as not to have too big a disparity between workloads being returned by the cores in the CPU.

Anyway, I'll go educate myself on the architectures in order to understand your statements better.
BTW, I edited in a note about the AMD parts after I posted.

The AMD MCM will typically be multiple interconnected chips with identical performance characteristics. They have variable clocking, but can hit the same peak unless overclocked.

My comment on NUMA referred to how data is shared among the caches and memory (fetch from local L3, remote L3, memory connected to the local die, memory connected to a remote die, local PCIE, remote PCIE, etc.); this creates a lot more variability in both latencies and bandwidths, so it's harder to optimize performance.
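
Something like this rough libnuma sketch (my own illustration - node numbers, buffer size and the walk itself are purely for demonstration) shows the local-vs-remote gap I mean: pin the thread to one node, bind one buffer to that node and one to the farthest node, and time a walk over each.

/* Rough sketch: local vs remote NUMA access time with libnuma.
 * Build (illustrative): gcc numa_probe.c -O2 -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (64UL * 1024 * 1024)   /* 64 MiB, arbitrary */

static double walk(volatile char *buf, size_t size) {
    struct timespec t0, t1;
    unsigned long sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < size; i += 64)   /* touch one byte per cache line */
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }
    int far = numa_max_node();
    numa_run_on_node(0);                               /* keep this thread on node 0   */
    char *local  = numa_alloc_onnode(BUF_SIZE, 0);     /* memory bound to node 0       */
    char *remote = numa_alloc_onnode(BUF_SIZE, far);   /* memory bound to the far node */
    if (!local || !remote) { perror("numa_alloc_onnode"); return 1; }

    memset(local, 1, BUF_SIZE);                        /* fault pages in before timing */
    memset(remote, 1, BUF_SIZE);
    printf("local  walk: %.4f s\n", walk(local, BUF_SIZE));
    printf("remote walk: %.4f s\n", walk(remote, BUF_SIZE));

    numa_free(local, BUF_SIZE);
    numa_free(remote, BUF_SIZE);
    return 0;
}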
 

Messugga

Honorary Master
Joined
Sep 4, 2007
Messages
10,409
BTW, I edited in a note about the AMD parts after I posted.

The AMD MCM will typically be multiple interconnected chips with identical performance characteristics. They have variable clocking, but can hit the same peak unless overclocked.

My comment on NUMA referred to how data is shared among the caches and memory (fetch from local L3, remote L3, memory connected to the local die, memory connected to a remote die, local PCIE, remote PCIE, etc.); this creates a lot more variability in both latencies and bandwidths, so it's harder to optimize performance.
I was actually looking at HPC architecture (SISD, MISD, SIMD, MIMD, hybrid versions, NUMA, etc.) for my studies this week, so this topic feels quite relevant for me at present. :)
 

cguy

Expert Member
Joined
Jan 2, 2013
Messages
4,878
I was actually looking at HPC architecture (SISD, MISD, SIMD, MIMD, hybrid versions, NUMA, etc.) for my studies this week, so this topic feels quite relevant for me at present. :)
Great. It’s fascinating writing code for one architecture vs the other. Even which type of algorithm is optimal may change. Check out the Bitonic Sort.
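
For reference, here's a minimal serial sketch of the compare-exchange network bitonic sort is built on (my own illustration, assumes a power-of-two input length); the fixed, data-independent comparison pattern is what makes it attractive on SIMD and GPU hardware despite the extra comparisons.

/* Minimal serial bitonic sort sketch (power-of-two length assumed). */
#include <stdio.h>

#define ASC  1
#define DESC 0

static void compare_swap(int a[], int i, int j, int dir) {
    if ((a[i] > a[j]) == dir) {            /* swap so the pair matches the direction */
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

static void bitonic_merge(int a[], int lo, int n, int dir) {
    if (n > 1) {
        int k = n / 2;
        for (int i = lo; i < lo + k; i++)
            compare_swap(a, i, i + k, dir);
        bitonic_merge(a, lo, k, dir);
        bitonic_merge(a, lo + k, k, dir);
    }
}

static void bitonic_sort(int a[], int lo, int n, int dir) {
    if (n > 1) {
        int k = n / 2;
        bitonic_sort(a, lo, k, ASC);        /* first half ascending        */
        bitonic_sort(a, lo + k, k, DESC);   /* second half descending      */
        bitonic_merge(a, lo, n, dir);       /* merge the bitonic sequence  */
    }
}

int main(void) {
    int a[8] = {3, 7, 4, 8, 6, 2, 1, 5};
    bitonic_sort(a, 0, 8, ASC);
    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}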
 

Messugga

Honorary Master
Joined
Sep 4, 2007
Messages
10,409
Great. It’s fascinating writing code for one architecture vs the other. Even which type of algorithm is optimal may change. Check out the Bitonic Sort.
Will do. I have a set of CFD-based exercises to do that demonstrate some of the constraints you might run into on a given architecture, depending on which parameters you pass the code at run time. As I'm sure you know, ARCHER is a hybrid system with what amounts to all of the above-mentioned architectures implemented in one way, shape or form, with various bottlenecks depending on whether you're on-CPU, on-node, between nodes, between racks, etc.
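
To make that on-node vs between-node split concrete, here's a toy hybrid MPI + OpenMP probe (my own sketch, not from the course material): OpenMP threads share memory within a node, while MPI ranks pass messages between nodes.

/* Toy hybrid MPI + OpenMP probe.
 * Build (illustrative): mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 0.0;
    #pragma omp parallel reduction(+:local)   /* on-node, shared-memory part */
    {
        local += omp_get_thread_num() + 1;    /* stand-in for real per-thread work */
    }

    double total = 0.0;                       /* between-node, message-passing part */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks x %d threads, combined result: %.0f\n",
               size, omp_get_max_threads(), total);

    MPI_Finalize();
    return 0;
}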
 

cguy

Expert Member
Joined
Jan 2, 2013
Messages
4,878
Will do. I have a set of CFD-based exercises to do that demonstrate some of the constraints you might run into on a given architecture, depending on which parameters you pass the code at run time. As I'm sure you know, ARCHER is a hybrid system with what amounts to all of the above-mentioned architectures implemented in one way, shape or form, with various bottlenecks depending on whether you're on-CPU, on-node, between nodes, between racks, etc.
Sounds like a good set of exercises. When implementing such systems I always add measurements of how long individual operations take, as well as how long one resource is waiting on another. Sifting through it afterwards often yields surprising results.
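
Something along these lines (a rough sketch of my own, the workload and the wait are just stand-ins): per-operation timers plus a separate counter for time spent blocked on another resource.

/* Rough sketch of per-operation timing plus wait-time accounting. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

typedef struct { double compute_s, wait_s; long ops; } op_stats;

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Placeholder workload, and a stand-in for e.g. a receive or a disk read. */
static void do_compute(void)    { for (volatile int i = 0; i < 1000000; i++); }
static void wait_for_data(void) { struct timespec t = {0, 2000000}; nanosleep(&t, NULL); }

int main(void) {
    op_stats s = {0};
    for (int step = 0; step < 100; step++) {
        double t0 = now();
        wait_for_data();                /* time blocked on the other resource */
        double t1 = now();
        do_compute();                   /* time doing useful work             */
        double t2 = now();
        s.wait_s    += t1 - t0;
        s.compute_s += t2 - t1;
        s.ops++;
    }
    printf("%ld ops: %.3f s compute, %.3f s waiting (%.0f%% of total)\n",
           s.ops, s.compute_s, s.wait_s,
           100.0 * s.wait_s / (s.compute_s + s.wait_s));
    return 0;
}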
 

Messugga

Honorary Master
Joined
Sep 4, 2007
Messages
10,409
Sounds like a good set of exercises. When implementing such systems I always add measurements of how long individual operations take, as well as how long one resource is waiting on another. Sifting through it afterwards often yields surprising results.
That's one of the outputs. I've been doing a lot of charts, logging of timings, and so on. The entire module is geared towards performance optimization of software for HPC and understanding what works in which scenarios, and why. It's been pretty trivial thus far, but with the CFD work it's ramping up in complexity and difficulty. I have a paper to write over the next couple of weeks summarizing all of this stuff.
This is the sort of work that I feel makes it worthwhile to spend the extra money on a foreign education. I don't believe there are many, if any, such opportunities at SA universities at present.
 

cguy

Expert Member
Joined
Jan 2, 2013
Messages
4,878
That's one of the outputs. I've been doing a lot of charts, logging of timings, and so on. The entire module is geared towards performance optimization of software for HPC and understanding what works in which scenarios, and why. It's been pretty trivial thus far, but with the CFD work it's ramping up in complexity and difficulty. I have a paper to write over the next couple of weeks summarizing all of this stuff.
This is the sort of work that I feel makes it worthwhile to spend the extra money on a foreign education. I don't believe there are many, if any, such opportunities at SA universities at present.
Yup, I’ve never heard of anything close in SA. Some GPU courses, but even those were somewhat basic.

Real HPC skills are in high demand. Think of how beneficial it is if you can get results 10x faster or use 10x fewer resources on a $50m supercomputer.
 