AMD Bulldozer

Bollocks to your definition of a CPU and what constitutes a core.

So by your logic, if a CPU does not have an FPU, MMU, cache etc. on the same die, or shares them between cores, it's not a true CPU? Wonder why they've called these things CPUs for the last 30 years or so, then.

I guess, if you really think about it, each module can actually execute two actual processes at the same time. There's no queuing, and none of that hyper-threading where a process sits queued until, say, the FPU frees up and the queued process can then run, meaning you can kinda process more things at the same time, as long as the core has some free resources...

Bulldozer allows two processes to actually run simultaneously. In theory, if you code it right, you can have one thread doing FPU calculations and general manipulation, with the other thread doing all the other good stuff like piecing data together and sending it elsewhere (this is the benefit of shared cache... each processor can call for a value). Obviously, what's happening now is that Windows just sees general cores and will throw any thread at them, which means you get a skewed application of resources. So instead of getting two processes, one doing integer + floating point calc and the other just doing integer calc, Windows could assign two threads to a module, each wanting to do floating point calc. When the latter happens, one thread has to be queued, and this slows down the process.
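
To put rough numbers on that scheduling point, here is a toy model in Python. Everything in it is an assumption made for illustration: each thread is just a list of "int" or "fp" operations, every operation takes one cycle, each core has its own integer unit, and the module's shared FPU accepts one "fp" operation per cycle. The name cycles_on_module is made up, and the real FPU is wider and smarter than this, so treat it as a sketch of the idea, not of actual Bulldozer timing.

```python
def cycles_on_module(thread_a, thread_b):
    """Cycles needed to run two threads on one module in this toy model.

    Each thread is a list of "int" or "fp" operations, one cycle each.
    Every core has its own integer unit, so "int" ops never collide.
    The module has a single FPU that accepts one "fp" op per cycle.
    """
    ia = ib = 0          # next operation index for each thread
    prefer_a = True      # alternate FPU priority so neither thread starves
    cycles = 0
    while ia < len(thread_a) or ib < len(thread_b):
        op_a = thread_a[ia] if ia < len(thread_a) else None
        op_b = thread_b[ib] if ib < len(thread_b) else None
        if op_a == "fp" and op_b == "fp":
            # Both want the shared FPU: only one issues, the other waits.
            if prefer_a:
                ia += 1
            else:
                ib += 1
            prefer_a = not prefer_a
        else:
            # No FPU conflict: both threads advance in the same cycle.
            if op_a is not None:
                ia += 1
            if op_b is not None:
                ib += 1
        cycles += 1
    return cycles


int_work = ["int"] * 100
fp_work = ["fp"] * 100
print(cycles_on_module(fp_work, int_work))  # one FP thread paired with an integer thread
print(cycles_on_module(fp_work, fp_work))   # two FP threads fighting over the FPU
```

In this model the mixed pairing finishes in the time of one thread, while two floating point threads on the same module take twice as long, which is the "skewed application of resources" case.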


A Bulldozer module is a dual core then: it can execute two threads simultaneously. Whether it shares an FPU between the two cores is irrelevant. It still runs two processes at the same time.

It's like saying you don't have two hands because you need both of them to lift heavy objects, but for other tasks, you can use them simultaneously... doesn't make sense, does it ;).
 
You're preaching to the converted :D

There are actually companies out there that manufacture dual-core CPUs without all the extras like FPUs etc., but they are mostly used in embedded markets. By that other dude's definition those would not be dual-core CPUs, if I follow his logic.
 
I was trying to add to your argument :p.

Anyway, in my studies this year, I was taught how to program PIC controllers to do cool stuff :p. While you might think the Mickey Mouse microprocessors don't really compare, they are actually very similar to the big boy processors.

Ultimately, the biggest thing you want a processor/processor core to have, and what actually defines what the processor is able to do, is an ALU.

ALU (arithmetic logic unit) - This is what does addition/subtraction/multiplication/division. For all intents and purposes it could do just addition and still be an ALU.

There's loads of other stuff that your processor can do and should have, but fundamentally speaking, you really just need this (the ALU), some memory and control over a few inputs/outputs, and then you have actually got a 'processor'... While its applications would be limited, it can still process mathematical functions - hence processor.
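
As a rough illustration of "ALU + some memory + a few inputs/outputs = processor", here is a tiny made-up machine in Python. The instruction names (IN, LOAD, ADD, STORE, OUT), the 8-bit word size and the functions alu_add and run are all invented for this sketch; it is not how a PIC or any real chip is organised.

```python
def alu_add(a, b, bits=8):
    """Add-only ALU: returns (result, carry) for a fixed word width."""
    total = a + b
    mask = (1 << bits) - 1
    return total & mask, int(total > mask)


def run(program, memory, inputs):
    """Execute a list of (opcode, argument) pairs on a one-register machine."""
    acc = 0                 # single accumulator register
    pending = list(inputs)  # values waiting on the input port
    outputs = []            # values written to the output port
    for op, arg in program:
        if op == "IN":
            acc = pending.pop(0)
        elif op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc, _carry = alu_add(acc, memory[arg])
        elif op == "STORE":
            memory[arg] = acc
        elif op == "OUT":
            outputs.append(acc)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return outputs


# Multiply the input by 3 using nothing but addition.
triple = [("IN", None), ("STORE", 0), ("ADD", 0), ("ADD", 0), ("OUT", None)]
print(run(triple, memory=[0] * 4, inputs=[7]))
```

The only arithmetic the machine knows is addition, yet the little program still triples its input, which is the sense in which an add-only ALU is enough to call the thing a processor.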

Things like FPUs are called coprocessors. They only handle operations on floating point numbers (these need special handling because of their format). The FPU only handles operations; it doesn't actually control inputs and outputs. It won't and can't do anything without the main core telling it what to do, so in a sense the FPU is an extension, not a processor itself.
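
To show what "special handling because of their format" means, here is a small Python sketch that pulls apart a 32-bit IEEE 754 float into its sign, exponent and mantissa fields. The helper name float32_fields is made up; the struct calls are standard library.

```python
import struct


def float32_fields(x):
    """Split a number, stored as a 32-bit IEEE 754 float, into its bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits, leading 1 is implied
    return sign, exponent, mantissa


for value in (1.0, -2.5):
    sign, exponent, mantissa = float32_fields(value)
    print(value, sign, exponent - 127, hex(mantissa))
```

An integer adder can't just add two of these bit patterns together: the exponents have to be aligned and the result renormalised, and that is the work the FPU does.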

So having two processors that share an FPU, still makes it two processors ;). lol
 
PIC is really cool, I also used them years ago and also made some money out of them doing PS modding :D

I'm from a time of 6502, Z80 etc where I did some machine code and assembly language (which I love) so I fully comprehend what you are on about.
 
Wicked :D, ha ha. I really love programming in Assembly. C++ gets a little bit... abstract and iffy when you're using a lot of pointers... whereas in Assembly, it's all pointers :D.

Anyway, there's no real way to claim that the Bulldozer isn't 8 cores. It might not be 8 'normal' cores (as per what we have grown used to in the x86 processor sector), but it is still 8 individual cores.
 
Then go to Wikipedia and tell them that, 'cause that's where the definition comes from. They use those modules as their answer to hyperthreading, but hyperthreading doesn't turn your 4-core CPU into an 8-core, now does it?
 
Hmmm, it's by no means an answer to hyperthreading. It's a way to increase the threading of a processor by actually sticking in another processor. Hyperthreading allows another process to be queued on a processor, so that when some of that processor's resources free up, it can be processed. I'll do an analogy:

Say you are at a cafe, and they sell you a list of items.

So now you can get the following:
Hot dog, chips, coffee, tea.

So now this cafe only has one till. A normal processor (in-order) will do the following:

Jake wants all four items, so he just goes to each of the stands, acquiring each of the items he wants, and then goes to the till and pays.

Joe only wants a hot-dog and a coffee, so he goes to the hot-dog stand and collects the item, then goes to the chips stand and does nothing, then goes to the coffee stand and collects coffee, then goes to the tea stand and gets nothing, then finally goes and pays...

In cases of the former, it's fine, but in the latter, we see it's not really efficient.

So then you have out-of-order, which means that in Joe's case:
He goes to the hot-dog stand, gets the hot-dog, then goes straight to the coffee stand, gets coffee, then goes to pay. Which effectively cuts out two stands he has to go to.

Remember, only one person can queue at a time here though. So this means that when Jake is in the process of collecting, Joe must wait for Jake to leave before he can enter the queue.

Hyper-threading works by letting another process use the left-over resources.

So now there's another person behind Joe, John, who only wants chips and tea.

So while Joe moves to the hot-dog stand, John moves to the chips stand, and they collect their respective items simultaneously, which gives the illusion that there are more stands available than there really are (hence you see two threads on one core...), and when it comes to paying, Joe was in the queue first, so he pays first and leaves before John...

The limitation, though, is what happens if you get two Jakes: you still get the same queuing you got before, so you don't save any time at all.


Bulldozer basically approaches the problem in saying:
Ok, not many people want tea, but we have to sell it. So why not have two hot-dog stands, two chips stands, and two coffee stands, but only one tea stand.

Thus, you are accepting two people at the exact same time who can get the same items, with the exception of tea, where there might be queuing. But because it isn't frequent to have a person wanting tea, the chances of two people wanting tea are even slimmer, so it's rare to get queuing happening there!

In effect, there are two processors, but they share a 'unit' that isn't actually used frequently. So instead of paying another attendant, you just share. This does mean it might take twice as long in the worst case, but the worst case is infrequent...
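
The cafe can be sketched as a small simulation. This is a toy under loud assumptions: two customers are admitted at once, each works through their order one item per time step, and a stand can serve only as many customers per step as it has attendants. The function serve, the stand names and the orders are all made up for illustration.

```python
def serve(customers, attendants):
    """Time steps needed to serve every customer admitted at the same time.

    customers:  list of orders, each a list of stand names visited in order
    attendants: dict mapping stand name -> customers it can serve per step
    """
    pending = [list(order) for order in customers]
    steps = 0
    while any(pending):
        busy = {}  # how many customers each stand has served this step
        for order in pending:
            if not order:
                continue
            stand = order[0]
            if busy.get(stand, 0) < attendants[stand]:
                busy[stand] = busy.get(stand, 0) + 1
                order.pop(0)  # served; otherwise the customer waits a step
        steps += 1
    return steps


# Hyper-threading style cafe: one of each stand, shared by both customers.
shared = {"hotdog": 1, "chips": 1, "coffee": 1, "tea": 1}
# Bulldozer style cafe: everything doubled except the tea stand.
module = {"hotdog": 2, "chips": 2, "coffee": 2, "tea": 1}

joe, john = ["hotdog", "coffee"], ["chips", "tea"]
hotdog_fans = [["hotdog"] * 3, ["hotdog"] * 3]
tea_fans = [["tea"] * 3, ["tea"] * 3]

for name, cafe in (("shared", shared), ("module", module)):
    print(name, serve([joe, john], cafe), serve(hotdog_fans, cafe), serve(tea_fans, cafe))
```

Joe and John never collide, so both cafes serve them equally fast. The two hot-dog fans take twice as long in the shared cafe but not in the doubled-up one, and the two tea drinkers queue in both, which is the rare worst case.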


Why AMD's processors aren't performing, though, is that they hired slow attendants that don't operate as fast as Intel's attendants.

I hope this makes sense to you, it makes sense in my head :p ha ha ha.
 
I repeat
http://en.wikipedia.org/wiki/Multi-core_processor
A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared memory inter-core communication methods. Common network topologies to interconnect cores include bus, ring, two-dimensional mesh, and crossbar. Homogeneous multi-core systems include only identical cores, heterogeneous multi-core systems have cores which are not identical. Just as with single-processor systems, cores in multi-core systems may implement architectures such as superscalar, VLIW, vector processing, SIMD, or multithreading.

You are staring so hard at that 'module' that you can't see the processors/cores

I'm done with you.
 
A dual-core Bulldozer processor has a single module, a quad-core processor has two modules and an octo-core processor has four modules.
http://www.wikipedia.org/wiki/Bulldozer_(microarchitecture)

Read this as well
http://www.hardwaresecrets.com/article/Inside-the-AMD-Bulldozer-Architecture/1078/3
 
each module consists of two cores; do you need an infographic?

no lol
If they marketed it as 4 modules/8 threads and not 4 modules/8 cores, people would begin to compare Bulldozer with any other CPU by using their usual "apples-to-apples" (i.e. "cores-to-cores" or, more precisely, "threads-to-threads") method.

Do you know what an ALU is? Those integer cores are equivalent to the ALUs you get on a GPU.

Those so-called cores need a multithreaded application, otherwise they're not going to do anything. It's the same as Intel's hyperthreading concept, only they split a core in two, one half to handle each thread. Like hyperthreading, it's useless if the program can't use it.

On a multiprocessor or multi-core system, the threads or tasks will actually run at the same time, with each processor or core running a particular thread or task.
http://www.wikipedia.org/wiki/Thread_(computer_science)

But those so-called cores can handle only one thread each.

That's why Bulldozer's performance is so poor in the benchmarks. People are comparing cores to cores instead of comparing thread to thread. If a program is not multithreaded, those cores perform poorly. That's saying it's basically one core divided in two, one half for each thread. If you can call that 8 cores, then you can also call the 2600K with hyperthreading an 8-core CPU, because with multithreaded apps hyperthreading excels.
 
I'm just gonna throw in a bunch of things here...

Here's how I understand it:

The main thing that is missed here is that you are talking about a Zambezi chip, not a Bulldozer chip. Bulldozer refers to the modules, which consist of 2 cores per module, each core with its own L1 cache, and 2 MB of both L2 and L3 cache for each Bulldozer module, with the total 8 MB L3 available to all Bulldozer modules.

Guru3D

Each Zambezi processor will have Bulldozer units, or segmented modules. I can explain this very simply, one Bulldozer module is two logical AMD64 CPU cores tied together. With four Bulldozer modules you thus get eight logical CPU cores. Have a peek at the die photo below where you will understand it more easily after observing.
Zambezi.jpg

Along with that they have the following in terms of cache:

The CPU cores have 128 KB of Level 1 Cache in total, 16 KB/Core, a 64-byte cacheline, 4-way associative, write-through.
Then there is 8 MB of Level 2 Cache, 2 MB per ”Bulldozer” module, a 64-byte cacheline, 16-way associative.

Then there is a third cache.
AMD has designed a shared 8 MB L3 cache with 64-way associativity for both cores in a “Bulldozer” module. Each Bulldozer unit (each two cores) has 2 MB L3 cache.
Each “module” can access the entirety of the 8MB L3, although cache partitioning happens on a core & thread generation basis so it’s virtually impossible for one “core” to get “allocated” the full 8MB of L3.
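
The totals in the quoted cache description follow from the per-core and per-module figures. A quick check in Python, using only the numbers quoted above (the constant names are made up):

```python
# Figures taken from the quoted description above.
MODULES = 4
CORES_PER_MODULE = 2
L1_PER_CORE_KB = 16
L2_PER_MODULE_MB = 2
L3_TOTAL_MB = 8

cores = MODULES * CORES_PER_MODULE
total_l1_kb = cores * L1_PER_CORE_KB
total_l2_mb = MODULES * L2_PER_MODULE_MB
l3_share_per_module_mb = L3_TOTAL_MB / MODULES

print(f"{cores} cores, {total_l1_kb} KB L1, {total_l2_mb} MB L2, "
      f"{l3_share_per_module_mb:g} MB L3 per module")
```

So the 128 KB of Level 1 cache is the sum over all eight cores, and the 8 MB of Level 2 is the sum over the four modules.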

Intel works as follows:

4 cores, each with their own L1 and L2 cache and 8 MB shared L3 for all 4 cores.

The Sandy Bridge cache memory consists of:
a 32KB L1 Data cache, 32KB Instruction cache (= 64KB L1)
a 256KB L2 cache per core.
Then there's a nice L3 cache that is shared in-between the CPU cores which is 8MB in total for the Core i7 2600 processors and 6MB for the Core i5 2500.
The L3 cache is where the magic happens, surrounding the segments inside the die, the L3 cache sits in the physical form of a ringbus. Thus the L3 cache can be used by the processor cores and also the graphics core.

All I'm saying is that you should not be comparing Intel's design with AMD's. It's completely different...

The fact that they have an L1 cache per core says a lot (8 L1 caches already lets me know there are 8 cores). All they did was share the L2 and L3 cache between 2 cores, i.e. 1 Bulldozer module, while making the full 8 MB L3 available to all the Bulldozer modules.

Intel has 4x L1 caches for their 4 cores, but each core has its own L2 cache and a shared L3 cache for the cores and on-chip graphics.

AMD has a Zambezi chip, with Bulldozer 'modules' on it. The design was done in a way so that they

look for a way to maximize peak bandwidth across the different cores, and maximize the use of silicon area through the use of shared modules.

Functions with high utilization (such things as Integer pipelines, Level1 data caches) are dedicated in each core.

The other units are now effectively shared between two cores and include: Fetch, Decode, Floating point pipelines, and the Level2 cache. This design allows two Cores to each use a larger, higher-performance function unit (ex: floating point unit) as they need it with less total die area than having separate, smaller function units for each Core.

Look, all I'm saying is that they have completely different designs that they have used to achieve what they wanted to achieve.

I agree with everyone here that they are not performing as well as they should, but I know there is a Linux fix for the use of modules vs cores, with which the Zambezi chips perform quite well.

Only linky I could find now... Not the best, but it's early in the morning...


In the end, the problem lies with the OS not being able to properly manage the way that the Zambezi handles its threads and applications, with the Bulldozer modules containing 2 cores each (not to be mean, but that's where you are stuck as well, Shovenose).
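
For what "properly manage" could look like, here is a minimal sketch of module-aware thread placement in Python. It assumes, purely for illustration, that logical CPUs are numbered so that CPU n sits in module n // 2; real numbering depends on the OS and firmware. The functions spread_across_modules and modules_used are made up, and this is not the actual Windows or Linux fix.

```python
def spread_across_modules(n_threads, modules=4, cores_per_module=2):
    """Pick logical CPU ids, using one core per module before doubling up."""
    order = [module * cores_per_module + core
             for core in range(cores_per_module)
             for module in range(modules)]
    return order[:n_threads]


def modules_used(cpus, cores_per_module=2):
    """Set of module indices touched by a list of logical CPU ids."""
    return {cpu // cores_per_module for cpu in cpus}


naive = list(range(4))              # a module-unaware choice: CPUs 0, 1, 2, 3
aware = spread_across_modules(4)    # one thread per module
print(naive, sorted(modules_used(naive)))
print(aware, sorted(modules_used(aware)))
```

With four threads the naive choice packs them into two modules, so pairs of threads share a front end and FPU, while the module-aware choice gives each thread a module to itself. On Linux a process could apply such a choice with os.sched_setaffinity; whether spreading or packing is actually faster for a given workload is something to measure, not assume.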

Is it wrong of AMD to release it now? I don't know. All I know is that Zambezi might not be the best, but it all points to Win 8 being released in August with Piledriver supposedly around the same time, and then I will have a look at the results.

In any case, this is now too long a thing and I don't even know how much sense this post makes, but I'm posting it now..
 
Bit lame imho to blame an OS, but if that's true we'd see the results when the fix is out...
 
Agree 100% with you there.. Don't think it is right blaming the OS, but that is what the results point to (reading reviewers) and if that is the case and Win 8 gets a fix for it or Win 7 gets one earlier, then it will be fun.

I'm a bit undecided regarding this, because I'm not against bringing out Tech that needs something else to change (or a little ahead of its time) as long as the change will bring good results in the end.
 
I'm curious (not having looked into it): does Linux also "make it appear slow"?
 
I don't follow the whole Bulldozer setup as closely, but from what I read it was below par, also regarding the threading between cores and modules, but there was a fix for it and now it seems to be doing quite well...
 