Forums - Open Redstone Engineers
Architecture Comparison,questions. .. - Printable Version

+- Forums - Open Redstone Engineers (https://forum.openredstone.org)
+-- Forum: ORE General (https://forum.openredstone.org/forum-39.html)
+--- Forum: Tutorials (https://forum.openredstone.org/forum-24.html)
+---- Forum: Advanced Tutorials (https://forum.openredstone.org/forum-26.html)
+----- Forum: Concepts (https://forum.openredstone.org/forum-28.html)
+----- Thread: Architecture Comparison,questions. .. (/thread-5096.html)



RE: Architecture Comparison,questions. .. - TSO - 11-21-2014

No, x86 has a lot of other shit that ARM does not have, and it becomes extremely difficult to strip that down to a smaller number of pages. I would guess you could get it down to maybe 900 pages if you really compressed the wording. Most of what that 3439 page manual covers is the memory systems, a brief overview of coprocessors, and all the hex instructions with the limitations of each operation (fun fact: the variable shift and rotate instructions can only take their count from cl), but the biggest difference is that x86 has a variable length instruction set, while ARM is always a 32 bit (IIRC) instruction set. Also, every x86 instruction has a varying execution time, because some idiot thinks microcoding is more speed efficient than a hardware solution.
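To make the decode-width point concrete, here's a toy sketch (the opcodes and lengths are made up, nothing like real x86 or ARM encodings) of why fixed-width fetch is trivial while variable-length fetch is inherently serial:

```python
# Toy illustration of fixed- vs variable-length instruction fetch.
# The opcode bytes and lengths below are invented, NOT real x86/ARM encodings.

def decode_fixed(stream, width=4):
    """Fixed 32-bit encoding (ARM-style): instruction N starts at N*width."""
    return [stream[i:i + width] for i in range(0, len(stream), width)]

# length of each toy instruction, keyed by its first (opcode) byte
TOY_LENGTHS = {0x90: 1, 0xB8: 5, 0x0F: 2}

def decode_variable(stream):
    """Variable-length encoding (x86-style): the decoder must examine the
    opcode byte before it even knows where the NEXT instruction starts."""
    out, i = [], 0
    while i < len(stream):
        n = TOY_LENGTHS[stream[i]]
        out.append(stream[i:i + n])
        i += n
    return out

fixed = decode_fixed(bytes(range(8)))                            # two 4-byte ops
var = decode_variable(bytes([0x90, 0xB8, 1, 2, 3, 4, 0x0F, 0]))  # 1+5+2 bytes
print(len(fixed), len(var))  # 2 3
```

With the fixed encoding every instruction boundary is known up front, so many instructions can be fetched and decoded in parallel; with the variable one, instruction N+1 can't even be located until instruction N has been at least partially decoded, which is a big part of why x86 front ends are so complicated.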

The x86 book does not cover out of order execution, either. That's what the 1200 page x86 general optimization manual is for, and then you will also need the processor family's optimization manual (400ish pages), and finally the multithreading development manual (oddly enough, just under 200 pages) if you want to take advantage of hyperthreading or just split tasks efficiently. Onboard hardware also gets its own book, even though it is integral to the system. I haven't read into it too much yet, but I think the manuals actually cover things like exactly how the out of order execution is performed.


RE: Architecture Comparison,questions. .. - LordDecapo - 11-22-2014

TSO... microcode is strictly to port old x86 programs to newer, nicer architectures, right? We have found fast ways to do the most common cases, so the rare ones suffer from microcode here and there, but it's a worthy compromise given the massive speed boosts of modern systems compared to when the original x86 ISA was designed.
And comparing x86 to ARM is simply unfair: x86 is based in the late 70's and 80's, with a tad bit of new sprinkled on, while ARM is made for better optimization of today's applications, mostly mobile applications.
ARM's full manual would be about the same size; I can get them if you like. ARM has a lot of nuances that you would need the full write up on (I'd bet it's 700-1300 pages, depending on the writing style) in order to properly be able to, say, make a raw binary code compiler.


And everyone: in this write up I will give simplified versions and examples of the logic and algorithms being used, as well as architectural layouts of system components, with short and sweet descriptions of each of these components and algorithms.
I will compare and contrast the pros and cons, and anything biased of mine will be in italics, so you know it's stuff that I have decided I like best from looking it up and/or attempting to make the system in MC and/or Logisim.

I plan to learn an HDL soon after I get this done, along with the current version of my ISA; then I can build it there as something IRL applicable, so I can check whether the statements I make are true or false.


RE: Architecture Comparison,questions. .. - Magazorb - 11-22-2014

x86 isn't one of the more complicated ones, just so you're informed XD. Memory systems can be stripped down to 100 pages if you know how to word them right (tricky but doable), but yes, x86 is definitely better documented. I'll probably start laying some concrete down for my ISA sometime over the next month Smile

But yes, an explanation for people to get the basics of different arch stuff would be useful Big Grin


RE: Architecture Comparison,questions. .. - TSO - 11-22-2014

@LD:
I don't need the Intel manuals, because I'm looking at them right now, as I have been for weeks. I'm going to say it again: they all total at least 6000 pages, and the x86 manual is almost 4000 of those.

Also, x86 processors all use microcoding for nearly all operations. It is also used to manage things like security permissions and operating system interface.

Intel®64 and IA-32 Architectures Optimization Reference Manual Wrote:2.2.1 Intel® Microarchitecture Code Name Sandy Bridge Pipeline Overview


Figure 2-4 depicts the pipeline and major components of a processor core that’s based on Intel microarchitecture code name Sandy Bridge. The pipeline consists of

• An in-order issue front end that fetches instructions and decodes them into micro-ops (micro-operations). The front end feeds the next pipeline stages with a continuous stream of micro-ops from the most likely path that the program will execute.

• An out-of-order, superscalar execution engine that dispatches up to six micro-ops to execution, per cycle. The allocate/rename block reorders micro-ops to "dataflow" order so they can execute as soon as their sources are ready and execution resources are available.

• An in-order retirement unit that ensures that the results of execution of the micro-ops, including any exceptions they may have encountered, are visible according to the original program order.

The flow of an instruction in the pipeline can be summarized in the following progression:

1. The Branch Prediction Unit chooses the next block of code to execute from the program. The processor searches for the code in the following resources, in this order:

a. Decoded ICache

b. Instruction Cache, via activating the legacy decode pipeline

c. L2 cache, last level cache (LLC) and memory, as necessary

[Image: 3703342.png]
2. The micro-ops corresponding to this code are sent to the Rename/retirement block. They enter into the scheduler in program order, but execute and are de-allocated from the scheduler according to data-flow order. For simultaneously ready micro-ops, FIFO ordering is nearly always maintained.

Micro-op execution is executed using execution resources arranged in three stacks. The execution units in each stack are associated with the data type of the instruction.

Branch mispredictions are signaled at branch execution. It re-steers the front end which delivers micro-ops from the correct path. The processor can overlap work preceding the branch misprediction with work from the following corrected path.

3. Memory operations are managed and reordered to achieve parallelism and maximum performance. Misses to the L1 data cache go to the L2 cache. The data cache is non-blocking and can handle multiple simultaneous misses.

4. Exceptions (Faults, Traps) are signaled at retirement (or attempted retirement) of the faulting instruction.

Each processor core based on Intel microarchitecture code name Sandy Bridge can support two logical processors if Intel HyperThreading Technology is enabled.

2.1 THE HASWELL MICROARCHITECTURE


The Haswell microarchitecture builds on the successes of the Sandy Bridge and Ivy Bridge microarchitectures. The basic pipeline functionality of the Haswell microarchitecture is depicted in Figure 2-1. In general, most of the features described in Section 2.1.1 - Section 2.1.4 also apply to the Broadwell microarchitecture. Enhancements of the Broadwell microarchitecture are summarized in Section 2.1.6.

The Haswell microarchitecture offers the following innovative features:
[Image: 9006977.png]

• Support for Intel® Advanced Vector Extensions 2 (AVX2), FMA
• Support for general-purpose new instructions to accelerate integer numerics and encryption.
• Support for Intel Transactional Synchronization Extensions (TSX)
• Each core can dispatch up to 8 micro-ops per cycle
• 256-bit data path for memory operation, FMA, AVX floating-point and AVX2 integer execution units
• Improved L1D and L2 cache bandwidth
• Two FMA execution pipelines
• Four arithmetic logical units (ALUs)
• Three store address ports
• Two branch execution units
• Advanced power management features for IA processor core and uncore sub-systems
• Support for optional fourth level cache

The microarchitecture supports flexible integration of multiple processor cores with a shared uncore subsystem consisting of a number of components including a ring interconnect to multiple slices of L3 (an off-die L4 is optional), processor graphics, integrated memory controller, interconnect fabrics, etc. An example of the system integration view of four CPU cores with uncore components is illustrated in Figure 2-2.

2.1.1 The Front End

The front end of Intel microarchitecture code name Haswell builds on that of Intel microarchitecture code name Sandy Bridge and Intel microarchitecture code name Ivy Bridge; see Section 2.2.2 and Section 2.2.7. Additional enhancements in the front end include:

• The uop cache (or decoded ICache) is partitioned equally between two logical processors.
• The instruction decoders will alternate between each active logical processor. If one sibling logical processor is idle, the active logical processor will use the decoders continuously.
• The LSD/micro-op queue can detect small loops up to 56 micro-ops. The 56-entry micro-op queue is shared by two logical processors if Hyper-Threading Technology is active (Intel microarchitecture Sandy Bridge provides duplicated 28-entry micro-op queue in each core).

2.1.2 The Out-of-Order Engine

The key components and significant improvements to the out-of-order engine are summarized below:

Renamer: The Renamer moves micro-ops from the micro-op queue to bind to the dispatch ports in the Scheduler with execution resources. Zero-idiom, one-idiom and zero-latency register move operations are performed by the Renamer to free up the Scheduler and execution core for improved performance.

Scheduler: The Scheduler controls the dispatch of micro-ops onto the dispatch ports. There are eight dispatch ports to support the out-of-order execution core. Four of the eight ports provide execution resources for computational operations. The other four ports support memory operations of up to two 256-bit loads and one 256-bit store in a cycle.

Execution Core: The scheduler can dispatch up to eight micro-ops every cycle, one on each port. Of the four ports providing computational resources, each provides an ALU, and two of these execution pipes provide dedicated FMA units. With the exception of the division/square-root and STTNI/AESNI units, most floating-point and integer SIMD execution units are 256-bit wide. The four dispatch ports servicing memory operations consist of two dual-use ports for load and store-address operations, plus a dedicated third store-address port and one dedicated store-data port. All memory ports can handle 256-bit memory micro-ops. Peak floating-point throughput, at 32 single-precision operations per cycle and 16 double-precision operations per cycle using FMA, is twice that of Intel microarchitecture code name Sandy Bridge.

The out-of-order engine can handle 192 uops in flight compared to 168 in Intel microarchitecture code name Sandy Bridge.

2.1.3 Execution Engine
The following table summarizes which operations can be dispatched on which port.
The reservation station (RS) is expanded to 60 entries deep (compared to 54 entries in Intel microarchitecture code name Sandy Bridge). It can dispatch up to eight micro-ops in one cycle if the micro-ops are ready to execute. The RS dispatches a micro-op through an issue port to a specific execution cluster, arranged in several stacks to handle specific data types or granularities of data.

When a source of a micro-op executed in one stack comes from a micro-op executed in another stack, a delay can occur. The delay occurs also for transitions between Intel SSE integer and Intel SSE floating-point operations. In some of the cases the data transition is done using a micro-op that is added to the instruction flow. Table 2-23 describes how data, written back after execution, can bypass to micro-op execution in the following cycles.
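The "dataflow order" the quoted overview keeps referring to can be sketched with a toy scheduler. This is a gross simplification: the register names, latencies, and one-op-per-cycle limit are all invented for illustration, and nothing here is Intel's actual allocate/rename/RS logic.

```python
# Toy out-of-order dispatch: micro-ops enter in program order, but each
# cycle the oldest uop whose source registers are ready gets dispatched.
# Registers, latencies, and the 1-uop-per-cycle limit are all invented.

def dispatch(uops):
    """uops: list of (name, srcs, dest, latency). Returns dispatch order."""
    ready_at = {"r1": 0, "r2": 0}   # architectural regs, ready at cycle 0
    pending = list(uops)            # "reservation station", program order
    order, cycle = [], 0
    while pending:
        for i, (name, srcs, dest, lat) in enumerate(pending):
            # oldest-first scan = FIFO ordering among simultaneously ready uops
            if all(s in ready_at and ready_at[s] <= cycle for s in srcs):
                order.append(name)
                ready_at[dest] = cycle + lat  # result forwardable from here on
                del pending[i]
                break
        cycle += 1
    return order

prog = [("load", ["r1"], "r3", 3),        # slow: r3 not ready for 3 cycles
        ("add",  ["r3", "r2"], "r4", 1),  # stuck waiting on the load
        ("mov",  ["r2"], "r5", 1)]        # independent: overtakes the add
print(dispatch(prog))  # ['load', 'mov', 'add']
```

Even though the mov comes last in program order, it executes before the stalled add, which is exactly the "execute as soon as their sources are ready" behaviour described above; the in-order retirement unit (not modeled here) would still make results visible in program order.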

That comparison is not unfair. At some point in the late 70's, programmers began to realize that all this microcoding was equivalent to a runtime compiler interfacing to a much simpler processor than what the microcoding modeled. RISC was born from the idea that there is a loss in processor power when it is forced to spend some of its time going through microcode, figuring out what the hell you just asked it to do, and finally decoding that into what was often an entirely different instruction set. The RISC model is to place all of the operations' coding on the programmer and then have only that basic core processor that takes the micro-ops the CISC microcode would have produced. In essence, you write the compiled output of the microcode yourself instead of having the CPU do it for you. (Also, I thought ARM was a lot older, but it doesn't matter, because I know for a fact that some of Sun Microsystems' RISC processors are damn near as old as x86.)
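The "you write the microcode's output yourself" idea can be shown with a toy decomposition; the micro-op names and encodings here are hypothetical, not any real CPU's microcode.

```python
# Toy sketch of the CISC-vs-RISC point above: one memory-operand CISC
# instruction expands into the same load/compute/store micro-ops that a
# RISC programmer would write by hand. Names are invented, not real uops.

def microcode(instr):
    """Expand a toy memory-destination CISC instruction into micro-ops."""
    op, dest, src = instr            # e.g. ("add", "[0x100]", "eax")
    if dest.startswith("["):         # memory destination: load / op / store
        return [("load",  "tmp", dest),
                (op,      "tmp", src),
                ("store", dest,  "tmp")]
    return [instr]                   # register-only ops pass straight through

cisc = ("add", "[0x100]", "eax")     # one CISC instruction...
risc = [("load", "tmp", "[0x100]"),  # ...vs what a RISC programmer writes
        ("add",  "tmp", "eax"),
        ("store", "[0x100]", "tmp")]
print(microcode(cisc) == risc)  # True
```

The CISC machine performs this expansion in hardware (or microcode ROM) on every execution; the RISC philosophy moves that work to the compiler or programmer, once, ahead of time.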

@maga:
The memory system is about 100 (very important) pages and just explains the instruction fetch delay depending upon which cache level you are accessing, as well as data types, all the registers, the memory limitations based upon the execution mode, and much, much more.


RE: Architecture Comparison,questions. .. - LordDecapo - 11-22-2014

Hey TSO, you do know that straight x86 code didn't require as much microcode, right?
Originally it had just some straight ops that were decoded just like normal old ops; the MAIN and original purpose of expanding microcode use was keeping backward compatibility with older programs, whose instruction sets had much more complicated ops, so those had to be microcoded to run on an x86 subsystem.
Notice how a lot of new CPUs have all this stuff on them... then, stuck in a corner of the cores, there's an "x86 unit" or whatever each processor designer decided to call it.

And x86 did have a lengthy and overcomplicated memory system that requires much paperwork... but it's also rather shit compared to current day setups.


RE: Architecture Comparison,questions. .. - Magazorb - 11-22-2014

(11-22-2014, 06:44 AM)TSO Wrote: snip

Yorn, man, you're just a stubborn fan-boy of the x86. Fact: most IPs use 90% of the things x86 uses and then add a ton of features more. Most documentation for architectures uses references to x86 that would be several hundred pages long for all the nitty gritty and then explains the differences, saving them many pages; it goes back to the old programming saying "Don't repeat yourself". Stop following your father blindly. He says things about modern programmers and how they "have the speed to not have to program efficiently". THAT IS WRONG! Coders don't program; don't ever confuse a coder and a programmer. A coder is a retard and a programmer is an engineer. The fact that you consistently blindly argue against us as if you're god is becoming annoying; the majority of your statements are incorrect and you treat them like law.

Do your research please: x86 isn't a big architecture, hence why other architectures incorporate it; there are some architectures that are so big that adding x86 to them would be a pain. Guess what, ARM is one of those. They aren't an architecture, they're an IP company with many architectures, and some of them really are good and much more advanced than x86. Haswell is x86 so heavily expanded toward nothing but serial computation (in fact, back in the early 2000's Intel made a 7GHz processor prototype as an attempt to get even more serial computation out of their cores; they failed miserably, requiring so much power and gaining so little performance) that they just continue adding as much as they can to gather as many hints as possible about what's going to happen and start speculating. This wasn't defined in x86 originally, but I'll continue saying that x86 is whatever Intel wants to call it (although they did a dirty move to call it theirs ages ago when it was some other guy's, and then got lawsuits made so that AMD couldn't use it, which resulted in all the different names for what's still x86 but with extensions; yes, Intel has done some bad stuff, but they seem mostly nice these days).

If you really insist on me giving you a nice example of what destroys x86: CBEA is one (this actually dominated the computational market in the 2000's; no x86 machine could keep up, and even today the aging arch, with no updates, can still outperform modern x86 in computation), PCP is another, and Mill is a third.

Now to go back to microcode: did you know most code is tuned for Intel's "x86", so it runs significantly better on Intel hardware than on AMD? No, probably not. And code that's tuned for AMD tends to run not so well on Intel's chips. This is a fact, and people still do this; you're just too ignorant to open your eyes and realise it's still being done (in fact more now than ever before, and it will only continue to expand as many alternative archs get funding to become competitive; heck, AMD has released statements saying they will support other archs as well as x86 and have processors that will happen to run code for both).

Your arguments come from where? Where's your evidence, where are your references? You just make points over and over that we can't find anything to support. We do our research a lot (as of late, Lord more so than me), and we tend to have an idea of what we're talking about, but then you just seem to argue against what we say pointlessly. Then you seem to think you're right, then you stray from what you originally said and pretend that's what you said in the beginning, once it agrees with what we were saying the whole time, and then pretend you were right... but if we ever make open suggestions, or point out that that's what we said at the start, you don't respond.

It's unlike me to question other people's research, as it's not often I can say I've researched what they have, but now I have to question: where on earth is your research coming from? My assumption is your dad, as it's very x86 and that would have been the arch of the time, but you seem unacquainted with any other archs... and you seem stubborn that other archs aren't competitive. I can't comprehend where on earth this is coming from...

I'm sorry if this seems like a forward attack, but I really need to rant about this because it's bugging me, and as much as I like to learn new things, questioning people seems to be very good for learning Big Grin

So please do share with us where your knowledge has come from.


RE: Architecture Comparison,questions. .. - TSO - 11-25-2014

@Magazorb
I have waited two days to answer this because it was quite upsetting and insulting. I therefore felt the need to calm myself before responding.

(11-22-2014, 06:17 PM)Magazorb Wrote:
TSO Wrote:snip

Yorn, man, you're just a stubborn fan-boy of the x86

You quoted a post where I was responding to LD's statement that Intel doesn't microcode often, in which I showed two prominent examples where they do (and the rest of the architectures in that manual also use it heavily), and then stated that I think the use of microcoding in x86 is inefficient and stupid. Your response is that I love Intel and x86? The only reason I even know it instead of ARM or something more efficient is that I am using a computer with a Sandy Bridge architecture.

Magazorb Wrote: Fact: most IPs use 90% of the things x86 uses and then add a ton of features more. Most documentation for architectures uses references to x86 that would be several hundred pages long for all the nitty gritty and then explains the differences, saving them many pages; it goes back to the old programming saying "Don't repeat yourself". Stop following your father blindly. He says things about modern programmers and how they "have the speed to not have to program efficiently". THAT IS WRONG! Coders don't program; don't ever confuse a coder and a programmer. A coder is a retard and a programmer is an engineer. The fact that you consistently blindly argue against us as if you're god is becoming annoying; the majority of your statements are incorrect and you treat them like law.

I can see the value in what you are saying about me, but I don't see how this is relevant to the discussion at this time. I am not a god, I am incorrect very often, in fact; but in this discussion, I directly cited the Intel manual on this topic. Last time I checked, Intel generally is a solid source for information about Intel.

Magazorb Wrote: Do your research please: x86 isn't a big architecture, hence why other architectures incorporate it; there are some architectures that are so big that adding x86 to them would be a pain. Guess what, ARM is one of those. They aren't an architecture, they're an IP company with many architectures, and some of them really are good and much more advanced than x86. Haswell is x86 so heavily expanded toward nothing but serial computation (in fact, back in the early 2000's Intel made a 7GHz processor prototype as an attempt to get even more serial computation out of their cores; they failed miserably, requiring so much power and gaining so little performance) that they just continue adding as much as they can to gather as many hints as possible about what's going to happen and start speculating. This wasn't defined in x86 originally, but I'll continue saying that x86 is whatever Intel wants to call it (although they did a dirty move to call it theirs ages ago when it was some other guy's, and then got lawsuits made so that AMD couldn't use it, which resulted in all the different names for what's still x86 but with extensions; yes, Intel has done some bad stuff, but they seem mostly nice these days).

I know x86 isn't the biggest architecture, and I know it's not the most efficient. The prior direction of the discussion was me saying that it was unnecessarily large (and, by extension, so are most larger architectures). (Also, that's not exactly how that dirty move went down, but I'm not going to get into it right now.)

Magazorb Wrote: If you really insist on me giving you a nice example of what destroys x86: CBEA is one (this actually dominated the computational market in the 2000's; no x86 machine could keep up, and even today the aging arch, with no updates, can still outperform modern x86 in computation), PCP is another, and Mill is a third.

I thought everything was a better alternative to x86, but if there are only three, I guess I must be wrong.

Magazorb Wrote:Now to go back to microcode: did you know most code is focused on Intel's "x86", so it runs significantly better on Intel hardware than on AMD? No, probably not. And code that's focused on AMD tends to run not so well on Intel's chips. This is a fact, and people still do this; you're just too ignorant to open your eyes and realise it's still being done (in fact more now than ever before, and this will only continue to expand, with many alternative archs getting funding to become competitive; heck, AMD has released statements saying they will support other archs as well as x86 and have processors that will just happen to run code for the other)

Funny thing, I actually did, which is why I was specifying Intel. The reason that AMD and Intel differ in efficiency is that AMD does not have a licence for Intel's microcode (except on the i80386 and a few others); they only have an x86 licence.

On that topic, rather recently, Intel threatened to revoke said x86 licence after AMD acquired an ARM licence and stated they were going to make a dual-compatible CPU. I get the feeling that they are going to make it dual-compatible through microcoding and a recognition algorithm; otherwise one would have to specify in the program which architecture to use. I would assume that such a header would be unrecognized by other systems, meaning they would fail to run the program properly. Even then, the processor must default to one of the two modes when initially loading the operating system or any other program... it just sounds like a mess to me.

Magazorb Wrote:Your arguments come from where? Where's your evidence, where are your references? You just make points over and over that we can't find anything to support. We do our research a lot (as of late, Lord more so than me), but we tend to have an idea what we're talking about, and then you just seem to argue against what we say pointlessly. Then you seem to think you're right, then you seem to stray from what you originally said and pretend that's what you said in the beginning, when it agrees with what we were saying the whole time, and then pretend you were right... but if we ever make open suggestions, or point out that that's what we said at the start but you said no, you don't respond.

I would check... oh, I don't know... the link that was right there in the post, or, if you didn't see that, the manual's name, which I showed right in the quote header. If I don't respond, it's because you were right, meaning the conversation need not go on. What's the problem with that? Oftentimes we are saying the same thing, and through either a communication error or a difference in notation we come to a disagreement, which we then spend like four pages bickering about until one of us gives up. Once we both realize that we were saying the same thing, the conversation ends.

Magazorb Wrote:It's unlike me to question other people's research, as it's not often I can say I've researched what they have, but now I have to question: where on earth is your research coming from? My assumption is your dad, as it's very x86 and that would have been the arch of the time, but you seem unaware of any other archs... and you seem stubborn that other archs aren't competitive. I can't comprehend where on earth this is coming from...

Actually, my father doesn't know much about how the computer makes it happen, he just knows how to code for it. Even then, he has forgotten a lot. My research comes from a lot of places. In this discussion (about x86), though, it comes entirely from the x86 manuals. Again, though, my position in the discussion was that x86 was an inefficient pain in the ass with too much microcode and too many superfluous opcodes compared to other architectures.

Magazorb Wrote:I'm sorry if this seems like a forward attack, but I really need to rant about this, because it's bugging me, and as much as I like to learn new things, questioning people seems to be very good for learning Big Grin

So please do share with us where your knowledge has come from.

Basically, it all comes down to the Intel manuals (but only for this discussion), Wikipedia, and an ancient book on Boolean logic and the algebra of sets (unless math has changed, I'm going to say this is still a valid source). Most other times, I'm coming up with solutions to problems myself without research, but we haven't really had a discussion around that because those are mostly conceptual ideas.

@LordDecapo
Yes, I was aware that the 8086 did not have as much microcode, but let's examine even something as simple as pushl %eax. It gets broken down into what is equivalently subl $0x4, %esp followed by movl %eax, (%esp). Intel x86 processors will convert that PUSH command into those two micro-operations, and I think AMD does as well, but that depends upon how they optimize the stack machine. I can open that x86 manual to a random opcode, and there is pseudocode for that operation describing what the microcode performs.
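For concreteness, here's a tiny Python model of that decomposition (my own sketch; the register and memory dictionaries are stand-ins for illustration, not anything Intel specifies):

```python
# Model of a 32-bit PUSH decomposing into two simpler micro-operations:
#   pushl %eax  ==  subl $0x4, %esp  then  movl %eax, (%esp)

def push32(regs: dict, mem: dict, src: str) -> None:
    """Perform PUSH as the two micro-ops described above."""
    regs["esp"] = (regs["esp"] - 4) & 0xFFFFFFFF  # micro-op 1: subl $0x4, %esp
    mem[regs["esp"]] = regs[src]                  # micro-op 2: movl %eax, (%esp)

regs = {"esp": 0x1000, "eax": 0xDEADBEEF}
mem = {}
push32(regs, mem, "eax")  # ESP drops to 0x0FFC; EAX lands at the new top of stack
```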

In fact, let's do just that. (Everything is from the Intel 64 and IA-32 Architectures Software Developer's Manual; my own comments are inside brackets.)

Randomly checked Opcodes: AAA, FLDENV, BLSMSK, MINPD, MOVS, MOV, MUL, PUSH, SAR/SAL/SHR/SHL


Code:
opcode: AAA: ASCII adjust after addition
description:
Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either  case, bits 4 through 7 of the AL register are set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

Operation:
      IF 64-Bit Mode
            THEN
                 #UD;
            ELSE
                  IF ((AL AND 0FH) > 9) or (AF = 1)
                          THEN
                                AL ← AL + 6;
                                AH ← AH + 1;
                                AF ← 1;
                                CF ← 1;
                                AL ← AL AND 0FH;
                           ELSE
                                AF ← 0;
                                CF ← 0;
                                AL ← AL AND 0FH;
                 FI;
      FI;
____________________________________________________________________

opcode: FLDENV: Load x87 FPU Environment

description:
loads the complete x87 FPU environment from me... [okay, that obviously is going to have a lot of microcode, and this description is massive]

operation:
FPUControlWord ← SRC[FPUControlWord];
FPUStatusWord ← SRC[FPUStatusWord];
FPUTagWord ← SRC[FPUTagWord];
FPUDataPointer ← SRC[FPUDataPointer];
FPUInstructionPointer ← SRC[FPUInstructionPointer];
FPULastInstructionOpcode ← SRC[FPULastInstructionOpcode];

____________________________________________________________________

opcode: BLSMSK
description:
Sets all the lower bits of the destination operand to "1" up to and including lowest set bit (=1) in the source operand. If source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

operation:
temp ← (SRC-1) XOR (SRC) ;
SF ← temp[OperandSize -1];
ZF ← 0;
IF SRC = 0
      CF ← 1;
ELSE
      CF ← 0;
FI
DEST ← temp;

____________________________________________________________________

opcode: MINPD: Return Minimum Packed Double-Precision Floating-Point Values
description:
Performs an SIMD compare of the packed double-precision floating-point values in the first source operand and the second source operand and returns the minimum value for each pair of values to the destination operand.

If the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.

VEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

operation:
MIN(SRC1, SRC2)
{
IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST ← SRC2;
    ELSE IF (SRC1 = SNaN) THEN DEST ← SRC2; FI;
    ELSE IF (SRC2 = SNaN) THEN DEST ← SRC2; FI;
    ELSE IF (SRC1 < SRC2) THEN DEST ← SRC1;
    ELSE DEST ← SRC2;
FI;
}
MINPD (128-bit Legacy SSE version)
DEST[63:0] ← MIN(SRC1[63:0], SRC2[63:0])
DEST[127:64] ← MIN(SRC1[127:64], SRC2[127:64])
DEST[VLMAX-1:128] (Unmodified)

VMINPD (VEX.128 encoded version)
DEST[63:0] ← MIN(SRC1[63:0], SRC2[63:0])
DEST[127:64] ← MIN(SRC1[127:64], SRC2[127:64])
DEST[VLMAX-1:128] ← 0

VMINPD (VEX.256 encoded version)
DEST[63:0] ← MIN(SRC1[63:0], SRC2[63:0])
DEST[127:64] ← MIN(SRC1[127:64], SRC2[127:64])
DEST[191:128] ← MIN(SRC1[191:128], SRC2[191:128])
DEST[255:192] ← MIN(SRC1[255:192], SRC2[255:192])

____________________________________________________________________

opcode: MOVS: Move Data From String to String
description: [it's really long, but in short, this opcode moves strings through memory]

operation:
DEST ← SRC;

Non-64-bit Mode:

IF (Byte move)
THEN IF DF = 0
THEN
(E)SI ← (E)SI + 1;
(E)DI ← (E)DI + 1;
ELSE
(E)SI ← (E)SI – 1;
(E)DI ← (E)DI – 1;
FI;
ELSE IF (Word move)
THEN IF DF = 0
(E)SI ← (E)SI + 2;
(E)DI ← (E)DI + 2;
FI;
ELSE
(E)SI ← (E)SI – 2;
(E)DI ← (E)DI – 2;
FI;
ELSE IF (Doubleword move)
THEN IF DF = 0
(E)SI ← (E)SI + 4;
(E)DI ← (E)DI + 4;
FI;
ELSE
(E)SI ← (E)SI – 4;
(E)DI ← (E)DI – 4;
FI;
FI;

64-bit Mode:

IF (Byte move)
THEN IF DF = 0
THEN
(R|E)SI ← (R|E)SI + 1;
(R|E)DI ← (R|E)DI + 1;
ELSE
(R|E)SI ← (R|E)SI – 1;
(R|E)DI ← (R|E)DI – 1;
FI;
ELSE IF (Word move)
THEN IF DF = 0
(R|E)SI ← (R|E)SI + 2;
(R|E)DI ← (R|E)DI + 2;
FI;
ELSE
(R|E)SI ← (R|E)SI – 2;
(R|E)DI ← (R|E)DI – 2;
FI;
ELSE IF (Doubleword move)
THEN IF DF = 0
(R|E)SI ← (R|E)SI + 4;
(R|E)DI ← (R|E)DI + 4;
FI;
ELSE
(R|E)SI ← (R|E)SI – 4;
(R|E)DI ← (R|E)DI – 4;
FI;
ELSE IF (Quadword move)
THEN IF DF = 0
(R|E)SI ← (R|E)SI + 8;
(R|E)DI ← (R|E)DI + 8;
FI;
ELSE
(R|E)SI ← (R|E)SI – 8;
(R|E)DI ← (R|E)DI – 8;
FI;
FI;

____________________________________________________________________

opcode: MOV: Move
description: Copies second operand (source operand) to the first operand (destination operand). The source operand can... [jesus christ this description is long]

operation: [damn, I thought there wouldn't be any microcode for this instruction, but there is if a memory location is one or both operands]
DEST ← SRC;

Loading a segment register while in protected mode results in special checks and actions, as described in the following listing. These checks are performed on the segment selector and the segment descriptor to which it points.

IF SS is loaded
THEN
IF segment selector is NULL
THEN #GP(0); FI;
IF segment selector index is outside descriptor table limits
or segment selector's RPL ≠ CPL
or segment is not a writable data segment
or DPL ≠ CPL
THEN #GP(selector); FI;
IF segment not marked present
THEN #SS(selector);
ELSE
SS ← segment selector;
SS ← segment descriptor; FI;
FI;
IF DS, ES, FS, or GS is loaded with non-NULL selector
THEN
IF segment selector index is outside descriptor table limits
or segment is not a data or readable code segment
or ((segment is a data or nonconforming code segment)
or ((RPL > DPL) and (CPL > DPL))
THEN #GP(selector); FI;
IF segment not marked present
THEN #NP(selector);
ELSE
SegmentRegister ← segment selector;
SegmentRegister ← segment descriptor; FI;
FI;
IF DS, ES, FS, or GS is loaded with NULL selector
THEN
SegmentRegister ← segment selector;
SegmentRegister ← segment descriptor;
FI;

____________________________________________________________________

opcode: MUL: Unsigned Integer Multiply

description:
Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in AL, AX, or EAX (depending on the size of the operand); the source operand is located in a general purpose register or memory location...

The result is stored in AX, register pair DX:AX, or register pair EDX:EAX... [and so on]

operation:
IF (Byte operation)
THEN
AX ← AL ∗ SRC;
ELSE (* Word or doubleword operation *)
IF OperandSize = 16
THEN
DX:AX ← AX ∗ SRC;
ELSE IF OperandSize = 32
THEN EDX:EAX ← EAX ∗ SRC; FI;
ELSE (* OperandSize = 64 *)
RDX:RAX ← RAX ∗ SRC;
FI;
FI;

____________________________________________________________________

opcode: PUSH: Push Word, Doubleword, or Quadword Onto the Stack

description:
Decrements the stack pointer then stores the source operand on the top of the stack. Operand and... [this one is huge too]

operation:
IF StackAddrSize = 64
THEN
IF OperandSize = 64
THEN
RSP ← RSP – 8;
Memory[SS:RSP] ← SRC; (* push quadword *)
ELSE IF OperandSize = 32
THEN
RSP ← RSP – 4;
Memory[SS:RSP] ← SRC; (* push dword *)
ELSE (* OperandSize = 16 *)
RSP ← RSP – 2;
Memory[SS:RSP] ← SRC; (* push word *)
FI;
ELSE IF StackAddrSize = 32
THEN
IF OperandSize = 64
THEN
ESP ← ESP – 8;
Memory[SS:ESP] ← SRC; (* push quadword *)
ELSE IF OperandSize = 32
THEN
ESP ← ESP – 4;
Memory[SS:ESP] ← SRC; (* push dword *)
ELSE (* OperandSize = 16 *)
ESP ← ESP – 2;
Memory[SS:ESP] ← SRC; (* push word *)
FI;
ELSE (* StackAddrSize = 16 *)
IF OperandSize = 32
THEN
SP ← SP – 4;
Memory[SS:SP] ← SRC; (* push dword *)
ELSE (* OperandSize = 16 *)
SP ← SP – 2;
Memory[SS:SP] ← SRC; (* push word *)
FI;
FI;  

____________________________________________________________________

opcode: SAR,SAL,SHR,SHL: Shift

description:
Shifts the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted... [describes the effected flags and the four types of shift]

operation:
IF 64-Bit Mode and using REX.W
THEN
countMASK ← 3FH;
ELSE
countMASK ← 1FH;
FI
tempCOUNT ← (COUNT AND countMASK);
tempDEST ← DEST;
WHILE (tempCOUNT ≠ 0)
DO
IF instruction is SAL or SHL
THEN
CF ← MSB(DEST);
ELSE (* Instruction is SAR or SHR *)
CF ← LSB(DEST);
FI;
IF instruction is SAL or SHL
THEN
DEST ← DEST ∗ 2;
ELSE
IF instruction is SAR
THEN
DEST ← DEST / 2; (* Signed divide, rounding toward negative infinity *)
ELSE (* Instruction is SHR *)
DEST ← DEST / 2 ; (* Unsigned divide *)
FI;
FI;
tempCOUNT ← tempCOUNT – 1;
OD;
(* Determine overflow for the various instructions *)
IF (COUNT and countMASK) = 1
THEN
IF instruction is SAL or SHL
THEN
OF ← MSB(DEST) XOR CF;
ELSE
IF instruction is SAR
THEN
OF ← 0;
ELSE (* Instruction is SHR *)
OF ← MSB(tempDEST);
FI;
FI;
ELSE IF (COUNT AND countMASK) = 0
THEN
All flags unchanged;
ELSE (* COUNT not 1 or 0 *)
OF ← undefined;
FI;
FI;

____________________________________________________________________
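As a sanity check on how simple some of these "operations" really are, here's a small Python model of the BLSMSK pseudocode quoted above, for a 32-bit operand (my own sketch, not Intel's microcode; the flag handling follows the quoted Operation section):

```python
def blsmsk32(src: int):
    """Model the quoted BLSMSK pseudocode (32-bit operand size)."""
    mask = 0xFFFFFFFF
    temp = ((src - 1) & mask) ^ src   # temp <- (SRC - 1) XOR SRC
    flags = {
        "SF": (temp >> 31) & 1,       # SF <- temp[OperandSize - 1]
        "ZF": 0,                      # ZF <- 0
        "CF": 1 if src == 0 else 0,   # CF <- 1 iff SRC = 0
    }
    return temp, flags                # DEST <- temp

# All bits up to and including the lowest set bit of 0b101000 get set:
temp, flags = blsmsk32(0b101000)      # temp == 0b001111
```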

It's actually difficult to find an instruction whose operation is a single micro-operation (i.e., one that is its own microcode), but I found two for you.


Code:
opcode: ADD: Add

description:
Adds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination... [after this, it's just limits on what memory classes and types you can use]

operation:
DEST ← DEST + SRC;

____________________________________________________________________

opcode: SBB : Integer Subtraction with Borrow

description:
Adds the source operand (second operand) and the carry (CF) flag, and subtracts the result from the destination operand (first operand). The result of the subtraction is stored... [it goes on and on and on and on]

operation:
DEST ← (DEST – (SRC + CF)) [I would assume that the add/subtract unit can pull all of this off in one move]

And there we have it. It seems like only the oldest and most basic operations can actually get by without their own microcode. On top of the written micro-operations, the system also runs checks against the IOPL flag, CS, and the control registers on nearly all operations, for security reasons. Then, microcoding is used to control the out-of-order execution algorithm as well as the in-order retirement unit. Finally, I can't guarantee that the actual binary values of operations and operands are preserved through the micro-op decoder.
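To see how few steps even a "multi-step" instruction like AAA really takes, here's a Python model of the AAA pseudocode quoted earlier, for the non-64-bit-mode path (my own sketch; it models the architectural effect described in the manual, not the actual micro-op sequence):

```python
def aaa(al: int, ah: int, af: int):
    """Model the quoted AAA pseudocode (compatibility/legacy mode)."""
    if (al & 0x0F) > 9 or af == 1:
        al = (al + 6) & 0xFF        # AL <- AL + 6
        ah = (ah + 1) & 0xFF        # AH <- AH + 1
        af = cf = 1                 # AF <- 1; CF <- 1
    else:
        af = cf = 0                 # AF <- 0; CF <- 0
    al &= 0x0F                      # AL <- AL AND 0FH (both branches)
    return al, ah, {"AF": af, "CF": cf}

# After ADD of unpacked BCD 9 + 8, AL = 0x11 with AF set;
# AAA adjusts this to the unpacked BCD digits AH = 1, AL = 7 (i.e., 17).
al, ah, flags = aaa(0x11, 0x00, 1)
```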


RE: Architecture Comparison,questions. .. - Magazorb - 11-26-2014

(11-25-2014, 01:25 AM)TSO Wrote: @Magazorb
I have waited two days to answer this because it was quite upsetting and insulting. I therefore felt the need to calm myself before responding.

(11-22-2014, 06:17 PM)Magazorb Wrote:
TSO Wrote:snip

Yorn, man, you're just a stubborn fan-boy of the x86

You quoted a post where I was responding to LD's statement that Intel doesn't microcode often, in which I showed two prominent examples where they do (and the rest of the architectures in that manual also use it heavily), and then stated that I think the use of microcoding in x86 is inefficient and stupid. Your response is that I love Intel and x86? The only reason I even know it instead of ARM or something more efficient is that I am using a computer with a Sandy Bridge architecture.

Magazorb Wrote:Fact: most IPs use 90% of the things x86 uses and then add a ton more features. Most documentation for architectures would otherwise need references to x86 several hundred pages long for all the nitty-gritty before explaining the differences, so this saves them many pages; it goes back to the old programming saying "Don't repeat yourself". Stop following your father blindly. He says things about modern programmers and how they "have the speed to not have to program efficiently": THAT IS WRONG! Coders don't program; don't ever confuse a coder and a programmer. A coder is a retard and a programmer is an engineer. The fact that you consistently and blindly argue against us as if you're god is becoming annoying; the majority of your statements are incorrect and you treat them like law.

I can see the value in what you are saying about me, but I don't see how this is relevant to the discussion at this time. I am not a god, I am incorrect very often, in fact; but in this discussion, I directly cited the Intel manual on this topic. Last time I checked, Intel generally is a solid source for information about Intel.

Magaxorb Wrote:Do your research please, x86 isn't a big architecture, hence why other architectures incorpate it, their are some architectures that are so big that adding x86 to it would be a pain, guess what ARM is one of those, they arent a architecture there a IP, a company with many architectures and some of them really are good and much more advanced then x86, Haswell is so heavily expanded on x86 to give nothing but serial computations (infact back in earily 2000's intel made a 7GHz processor prototype as a attempty to get even more serial computation out of their cores, they failed measable with requiring so much power and gained so little performance) that they just contiune adding as much as they can to get as many instructions about what's going to happen and start speculating, this wasn't defined in x86 orginaly, but i'll contiune saying that x86 is as what intell wants to call it (although they did a dirty move to call it theirs ages ago when it was some other guys and then got law suits made so that AMD couldn't use it but that resulted as having all the different names for what's still x86 but with extensions, yes intel had done some bad stuff but they seem mostly nice these days)

I know x86 isn't the biggest architecture, and I know it's not the most efficient. The prior direction of the discussion was me saying that it was unnecessarily large (and, by extension, so are most larger architectures). (Also, that's not exactly how that dirty move went down, but I'm not going to get into it right now.)

Magazorb Wrote:If you really insist for me to give you a nice example of what destroys x86? CBEA is one (this actual dominated the computational market in the 2000's, no x86 machine could keep up, even today the aging arch with no update still can outperform modern x86 in computation), PCP is another, Mill is also a 3rd.

I thought everything was a better alternative to x86, but if there are only three, I guess I must be wrong.

Magazorb Wrote:now to go back to microcode, did you know most code is focused on intels "x86" so they run signicantly better on intel hardware then AMD? no probably not, and code that's focused on AMD tends to run not so well on intels chips, this is a fact and people do still do this, just because your too ignorant to open your eyes to realise it's still being done (infact more now then ever before and this will only ever continue to expand with many alternative archs getting funding to become competitive, heck AMD has released statements saying they will support other archs as well as x86 and have proccessors that will just happen to run code of the other)

Funny thing, I actually did, which was why I was specifying Intel. The reason that AMD and Intel differ in efficiency is because AMD does not have licencing for Intel microcode (except on the i80386 and a few others), they only have an x86 licence.

On that topic, rather recently, Intel threatened to revoke said x86 licence after AMD acquired an ARM licence and stated they were going to make a dual compatible CPU. I get the feeling that they are going to make it dual compatible through microcoding and a recognition algorithm, otherwise one would have to specify in the program which architecture to use. I would assume that such a header would be unrecognized by other systems, meaning that they would fail to run the program properly. Even then, the processor must default to one of the two modes when initially loading the operating system or any other programs... it just sounds like a mess to me.

Magazorb Wrote:you're arguments come from where? where's your evidence, where's your references? like you just make points over and over that we can't find anything to support them, we do our research a lot, as of late lord more so then me, but we tend to have a idea what we're talking about and then you just seem to arguagainst what we say pointlessly, then you seem to think your right, then you seem to stray from what you orginaly said and pretend that's what you said in the begining when it agrees with what we was saying the whole time and then pretend you was right... but if we ever make open suggestions or point it out that's what we said at the start but you said no you don't respond.

I would check... oh I don't know... the link that was right there in the post, or if you didn't see that, check the manual's name which I showed right in the quote header. If I don't respond it's because you were right, meaning that the conversation need not go on. What's the problem in that? Often times we are saying the same thing and through either communication error or a difference in notation we come to a disagreement, which we then spend like four pages bickering about until one of us gives up. Once we both realize that we were saying the same thing, the conversation ends.

Magazorb Wrote:It's unlike me to question other peoples research as it's not often can i say I've researched what they have, but now i have to question, where on earth is your research coming from? my assumption is your dad as it's very x86 and that would have been the arch of the time, but you seem unacknowledged of any other archs... and you seem suborn that other archs aren't competitive, i can't comprehend where on earth this seems to be comming from...

Actually, my father doesn't know much about how the computer makes it happen, he just knows how to code for it. Even then, he has forgotten a lot. My research comes from a lot of places. In this discussion (about x86), though, it comes entirely from the x86 manuals. Again, though, my position in the discussion was that x86 was an inefficient pain in the ass with too much microcode and too many superfluous opcodes compared to other architectures.

Magazorb Wrote:I'm sorry if this seems like a forward attack but i really need to rant about this about you because it's bugging me and being as much as i like to learn new things question people seems to very good for learning Big Grin

So please do share with us where your knowledge has came from.

Basically it all comes down to the Intel manuals (but only for this discussion), wikipedia, and an ancient book on Boolean logic and the algebra of sets (unless math has changed, I'm going to say this is still a valid source). Most of the other times, I'm coming up with a solution to problems myself without research, but we haven't really had a discussion around that because those are mostly conceptual ideas.

@LordDecapo
Yes, I was aware that the 8086 did not have as much microcode, but let's examine even something as simple as pushl %eax. It gets broken down into what is equivalently subl %esp, $0x4 followed by movl %eax, %(esp). Intel x86 processors will convert that PUSH command to those two micro operations, and I think AMD does as well, but that depends upon how they optimize the stack machine. I can view that x86 manual, open to a random opcode, and there is pseudocoding for that operation that describes what the microcode performs.

In fact, let's do just that. (Everything is from the Intel 64 and IA-32 Architectures Software Developer's Manual, my own comments are inside bracets)

Randomly checked Opcodes: AAA, FLDENV, BLSMSK, MINPD, MOVS, MOV, MUL, PUSH, SAR/SAL/SHR/SHL


Code:
opcode: AAA: ASCII adjust after addition
description:
Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either  case, bits 4 through 7 of the AL register are set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

Operation:
      IF 64-Bit Mode
            THEN
                 #UD;
            ELSE
                  IF ((AL AND 0FH) > 9) or (AF = 1)
                          THEN
                                AL ← AL + 6;
                                AH ← AH + 1;
                                AF ← 1;
                                CF ← 1;
                                AL ← AL AND 0FH;
                           ELSE
                                AF ← 0;
                                CF ← 0;
                                AL ← AL AND 0FH;
                 FI;
      FI;
____________________________________________________________________

opcode: FLDENV: Load x87 FPU Environment

description:
loads the complete x87 FPU environment from me... [okay, that obviously is going to have a lot of microcode, and this description is massive]

operation:
FPUControlWord ← SRC[FPUControlWord];
FPUStatusWord ← SRC[FPUStatusWord];
FPUTagWord ← SRC[FPUTagWord];
FPUDataPointer ← SRC[FPUDataPointer];
FPUInstructionPointer ← SRC[FPUInstructionPointer];
FPULastInstructionOpcode ← SRC[FPULastInstructionOpcode];

____________________________________________________________________

opcode: BLSMSK
description:
Sets all the lower bits of the destination operand to "1" up to and including lowest set bit (=1) in the source operand. If source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

operation:
temp ← (SRC-1) XOR (SRC) ;
SF ← temp[OperandSize -1];
ZF ← 0;
IF SRC = 0
      CF ← 1;
ELSE
      CF ← 0;
FI
DEST ← temp;

____________________________________________________________________

opcode: MINPD: Return Minimum Packed Double-Precision Floating-Point Values
description:
Performs an SIMD compare of the packed double-precision floating-point values in the first source operand and the second source operand and returns the minimum value for each pair of values to the destination operand.

If the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. If a value in the second operand is an SNaN, that SNaN is forwarded unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN source operand (from either the first or second operand) be returned, the action of MINPD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.

VEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

operation:
MIN(SRC1, SRC2)
{
IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST ← SRC2;
    ELSE IF (SRC1 = SNaN) THEN DEST ← SRC2; FI;
    ELSE IF (SRC2 = SNaN) THEN DEST ← SRC2; FI;
    ELSE IF (SRC1 < SRC2) THEN DEST ← SRC1;
    ELSE DEST ← SRC2;
FI;
}
MINPD (128-bit Legacy SSE version)
DEST[63:0] ← MIN(SRC1[63:0], SRC2[63:0])
DEST[127:64] ← MIN(SRC1[127:64], SRC2[127:64])
DEST[VLMAX-1:128] (Unmodified)

VMINPD (VEX.128 encoded version)
DEST[63:0] ← MIN(SRC1[63:0], SRC2[63:0])
DEST[127:64] ← MIN(SRC1[127:64], SRC2[127:64])
DEST[VLMAX-1:128] ← 0

VMINPD (VEX.256 encoded version)
DEST[63:0] ← MIN(SRC1[63:0], SRC2[63:0])
DEST[127:64] ← MIN(SRC1[127:64], SRC2[127:64])
DEST[191:128] ← MIN(SRC1[191:128], SRC2[191:128])
DEST[255:192] ← MIN(SRC1[255:192], SRC2[255:192])
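To make the asymmetry concrete, here is a minimal Python model of one 64-bit lane of the MIN() pseudocode above (the function name and structure are mine, not Intel's):

```python
import math

def minpd_lane(src1, src2):
    """One 64-bit lane of MINPD, following the MIN() pseudocode above.

    The asymmetry is the point: if both inputs are zero (of either sign)
    or either input is a NaN, the *second* source wins, so MINPD is not
    commutative.
    """
    if src1 == 0.0 and src2 == 0.0:          # +0.0 == -0.0 under IEEE compare
        return src2
    if math.isnan(src1) or math.isnan(src2):
        return src2                          # NaN handling: SRC2 is forwarded
    return src1 if src1 < src2 else src2     # ordinary minimum
```

So `minpd_lane(0.0, -0.0)` returns -0.0 while `minpd_lane(-0.0, 0.0)` returns +0.0; swap the operands and the answer changes, which is why the manual suggests the compare/AND/ANDN/OR emulation when you need the NaN operand preserved.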

____________________________________________________________________

opcode: MOVS: Move Data From String to String
description: [it's really long, but in short, this opcode moves strings through memory]

operation:
DEST ← SRC;

Non-64-bit Mode:

IF (Byte move)
THEN IF DF = 0
THEN
(E)SI ← (E)SI + 1;
(E)DI ← (E)DI + 1;
ELSE
(E)SI ← (E)SI – 1;
(E)DI ← (E)DI – 1;
FI;
ELSE IF (Word move)
THEN IF DF = 0
THEN
(E)SI ← (E)SI + 2;
(E)DI ← (E)DI + 2;
ELSE
(E)SI ← (E)SI – 2;
(E)DI ← (E)DI – 2;
FI;
ELSE IF (Doubleword move)
THEN IF DF = 0
THEN
(E)SI ← (E)SI + 4;
(E)DI ← (E)DI + 4;
ELSE
(E)SI ← (E)SI – 4;
(E)DI ← (E)DI – 4;
FI;
FI;

64-bit Mode:

IF (Byte move)
THEN IF DF = 0
THEN
(R|E)SI ← (R|E)SI + 1;
(R|E)DI ← (R|E)DI + 1;
ELSE
(R|E)SI ← (R|E)SI – 1;
(R|E)DI ← (R|E)DI – 1;
FI;
ELSE IF (Word move)
THEN IF DF = 0
THEN
(R|E)SI ← (R|E)SI + 2;
(R|E)DI ← (R|E)DI + 2;
ELSE
(R|E)SI ← (R|E)SI – 2;
(R|E)DI ← (R|E)DI – 2;
FI;
ELSE IF (Doubleword move)
THEN IF DF = 0
THEN
(R|E)SI ← (R|E)SI + 4;
(R|E)DI ← (R|E)DI + 4;
ELSE
(R|E)SI ← (R|E)SI – 4;
(R|E)DI ← (R|E)DI – 4;
FI;
ELSE IF (Quadword move)
THEN IF DF = 0
THEN
(R|E)SI ← (R|E)SI + 8;
(R|E)DI ← (R|E)DI + 8;
ELSE
(R|E)SI ← (R|E)SI – 8;
(R|E)DI ← (R|E)DI – 8;
FI;
FI;
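All of those branches reduce to one rule: add or subtract the element width depending on DF. A Python sketch of the per-iteration pointer update (the helper name is mine; the copy itself is just DEST ← SRC):

```python
def movs_step(si, di, size, df):
    """Pointer update for one MOVS iteration.

    size is the element width in bytes (1, 2, 4, or -- in 64-bit mode -- 8);
    DF = 0 auto-increments both pointers, DF = 1 auto-decrements them.
    Returns the updated (SI, DI) pair.
    """
    delta = -size if df else size
    return si + delta, di + delta
```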

____________________________________________________________________

opcode: MOV: Move
description: Copies second operand (source operand) to the first operand (destination operand). The source operand can... [jesus christ this description is long]

operation: [damn, I thought there wouldn't be any microcode for this instruction, but there is if one or both operands are memory locations]
DEST ← SRC;

Loading a segment register while in protected mode results in special checks and actions, as described in the following listing. These checks are performed on the segment selector and the segment descriptor to which it points.

IF SS is loaded
THEN
IF segment selector is NULL
THEN #GP(0); FI;
IF segment selector index is outside descriptor table limits
or segment selector's RPL ≠ CPL
or segment is not a writable data segment
or DPL ≠ CPL
THEN #GP(selector); FI;
IF segment not marked present
THEN #SS(selector);
ELSE
SS ← segment selector;
SS ← segment descriptor; FI;
FI;
IF DS, ES, FS, or GS is loaded with non-NULL selector
THEN
IF segment selector index is outside descriptor table limits
or segment is not a data or readable code segment
or ((segment is a data or nonconforming code segment)
and ((RPL > DPL) or (CPL > DPL)))
THEN #GP(selector); FI;
IF segment not marked present
THEN #NP(selector);
ELSE
SegmentRegister ← segment selector;
SegmentRegister ← segment descriptor; FI;
FI;
IF DS, ES, FS, or GS is loaded with NULL selector
THEN
SegmentRegister ← segment selector;
SegmentRegister ← segment descriptor;
FI;
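The SS-load branch of that listing can be sketched in Python. The descriptor layout here (`writable_data`, `dpl`, `present`) is a made-up flattening for illustration, not the real 8-byte segment descriptor format:

```python
class GP(Exception):       # stands in for #GP(selector)
    pass

class SSFault(Exception):  # stands in for #SS(selector)
    pass

def load_ss(sel, table, cpl):
    """Sketch of the SS-load checks above.

    sel is {'index': ..., 'rpl': ...}; table entries are dicts with
    'writable_data', 'dpl', 'present'.  Returns the descriptor that
    would be cached into the hidden part of SS.
    """
    if sel["index"] == 0:                     # NULL selector -> #GP(0)
        raise GP(0)
    if sel["index"] >= len(table):            # outside descriptor table limits
        raise GP(sel["index"])
    desc = table[sel["index"]]
    if (sel["rpl"] != cpl                     # RPL must equal CPL
            or not desc["writable_data"]      # must be writable data segment
            or desc["dpl"] != cpl):           # DPL must equal CPL
        raise GP(sel["index"])
    if not desc["present"]:                   # segment not marked present
        raise SSFault(sel["index"])
    return desc                               # SS <- selector + descriptor
```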

____________________________________________________________________

opcode: MUL: Unsigned Integer Multiply

description:
Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in AL, AX, EAX, or RAX (depending on the size of the operand); the source operand is located in a general purpose register or memory location...

The result is stored in AX, register pair DX:AX, register pair EDX:EAX, or register pair RDX:RAX... [and so on]

operation:
IF (Byte operation)
THEN
AX ← AL ∗ SRC;
ELSE (* Word, doubleword, or quadword operation *)
IF OperandSize = 16
THEN
DX:AX ← AX ∗ SRC;
ELSE IF OperandSize = 32
THEN EDX:EAX ← EAX ∗ SRC;
ELSE (* OperandSize = 64 *)
RDX:RAX ← RAX ∗ SRC;
FI;
FI;
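The common thread in every branch is an unsigned widening multiply whose double-width product is split across two registers. A quick Python model, with a generic bit-width parameter in place of the register names:

```python
def mul(acc, src, bits):
    """Unsigned widening multiply, as in MUL.

    Returns the (high, low) halves of the double-width product -- i.e.
    DX:AX for bits=16, EDX:EAX for bits=32, RDX:RAX for bits=64.
    """
    mask = (1 << bits) - 1
    product = (acc & mask) * (src & mask)   # full 2*bits-wide product
    return product >> bits, product & mask  # split into high and low halves
```

For example, a 16-bit multiply of 0xFFFF by 0xFFFF overflows into DX: the product 0xFFFE0001 comes back as high half 0xFFFE, low half 0x0001.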

____________________________________________________________________

opcode: PUSH: Push Word, Doubleword, or Quadword Onto the Stack

description:
Decrements the stack pointer then stores the source operand on the top of the stack. Operand and... [this one is huge too]

operation:
IF StackAddrSize = 64
THEN
IF OperandSize = 64
THEN
RSP ← RSP – 8;
Memory[SS:RSP] ← SRC; (* push quadword *)
ELSE IF OperandSize = 32
THEN
RSP ← RSP – 4;
Memory[SS:RSP] ← SRC; (* push dword *)
ELSE (* OperandSize = 16 *)
RSP ← RSP – 2;
Memory[SS:RSP] ← SRC; (* push word *)
FI;
ELSE IF StackAddrSize = 32
THEN
IF OperandSize = 64
THEN
ESP ← ESP – 8;
Memory[SS:ESP] ← SRC; (* push quadword *)
ELSE IF OperandSize = 32
THEN
ESP ← ESP – 4;
Memory[SS:ESP] ← SRC; (* push dword *)
ELSE (* OperandSize = 16 *)
ESP ← ESP – 2;
Memory[SS:ESP] ← SRC; (* push word *)
FI;
ELSE (* StackAddrSize = 16 *)
IF OperandSize = 32
THEN
SP ← SP – 4;
Memory[SS:SP] ← SRC; (* push dword *)
ELSE (* OperandSize = 16 *)
SP ← SP – 2;
Memory[SS:SP] ← SRC; (* push word *)
FI;
FI;  
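Every branch above is the same two steps at a different width: decrement the stack pointer by the operand size, then store. A Python sketch, with a dict standing in for the memory at [SS:SP]:

```python
def push(memory, sp, value, op_size):
    """PUSH: decrement the stack pointer by the operand size, then store.

    memory is a dict keyed by address (a stand-in for Memory[SS:SP]);
    op_size is 2, 4, or 8 bytes.  Returns the new stack pointer.
    """
    sp -= op_size          # RSP/ESP/SP <- SP - op_size
    memory[sp] = value     # Memory[SS:SP] <- SRC
    return sp
```

With StackAddrSize = 64 and OperandSize = 64, two pushes leave the stack pointer 16 lower, with the most recent value at the lowest address.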

____________________________________________________________________

opcode: SAR,SAL,SHR,SHL: Shift

description:
Shifts the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted... [describes the affected flags and the four types of shift]

operation:
IF 64-Bit Mode and using REX.W
THEN
countMASK ← 3FH;
ELSE
countMASK ← 1FH;
FI;
tempCOUNT ← (COUNT AND countMASK);
tempDEST ← DEST;
WHILE (tempCOUNT ≠ 0)
DO
IF instruction is SAL or SHL
THEN
CF ← MSB(DEST);
ELSE (* Instruction is SAR or SHR *)
CF ← LSB(DEST);
FI;
IF instruction is SAL or SHL
THEN
DEST ← DEST ∗ 2;
ELSE
IF instruction is SAR
THEN
DEST ← DEST / 2; (* Signed divide, rounding toward negative infinity *)
ELSE (* Instruction is SHR *)
DEST ← DEST / 2 ; (* Unsigned divide *)
FI;
FI;
tempCOUNT ← tempCOUNT – 1;
OD;
(* Determine overflow for the various instructions *)
IF (COUNT AND countMASK) = 1
THEN
IF instruction is SAL or SHL
THEN
OF ← MSB(DEST) XOR CF;
ELSE
IF instruction is SAR
THEN
OF ← 0;
ELSE (* Instruction is SHR *)
OF ← MSB(tempDEST);
FI;
FI;
ELSE IF (COUNT AND countMASK) = 0
THEN
All flags unchanged;
ELSE (* COUNT not 1 or 0 *)
OF ← undefined;
FI;
FI;
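A bit-at-a-time Python model of the loop above, including CF and the count-dependent OF rule (flags for count = 0 are simplified to returning None rather than "unchanged"):

```python
def shift(dest, count, width, kind):
    """Model of the SAL/SHL/SHR/SAR loop above.

    kind is 'shl' (same as SAL), 'shr', or 'sar'.
    Returns (result, CF, OF); OF is None unless count == 1, matching
    the listing, and CF is None if no shift happened.
    """
    mask = (1 << width) - 1
    count &= 0x3F if width == 64 else 0x1F        # countMASK
    temp = dest
    cf = None
    for _ in range(count):
        if kind == "shl":
            cf = (dest >> (width - 1)) & 1        # CF <- MSB(DEST)
            dest = (dest << 1) & mask
        else:
            cf = dest & 1                         # CF <- LSB(DEST)
            if kind == "sar":
                msb = dest >> (width - 1)
                dest = (dest >> 1) | (msb << (width - 1))  # sign-propagating
            else:                                 # shr: logical
                dest >>= 1
    if count == 1:
        if kind == "shl":
            of = ((dest >> (width - 1)) & 1) ^ cf # MSB(DEST) XOR CF
        elif kind == "sar":
            of = 0
        else:
            of = (temp >> (width - 1)) & 1        # MSB(tempDEST)
    else:
        of = None                                 # unchanged or undefined
    return dest, cf, of
```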

____________________________________________________________________

It's actually difficult to find an instruction that is its own microcode (one whose opcode decodes to a single micro-op), but I found two for you.


Code:
opcode: ADD: Add

description:
Adds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination... [after this, it's just limits on what memory classes and types you can use]

operation:
DEST ← DEST + SRC;

____________________________________________________________________

opcode: SBB : Integer Subtraction with Borrow

description:
Adds the source operand (second operand) and the carry (CF) flag, and subtracts the result from the destination operand (first operand). The result of the subtraction is stored... [it goes on and on and on and on]

operation:
DEST ← (DEST – (SRC + CF)) [I would assume that the add/subtract unit can pull all of this off in one move]
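That one-liner is exactly what makes SBB useful: chaining the borrow gives multi-word subtraction. A Python sketch of the semantics (the width parameter is mine; the real instruction picks it by operand size):

```python
def sbb(dest, src, cf, width=64):
    """SBB: DEST <- DEST - (SRC + CF), returning (result, borrow-out).

    The borrow-out becomes the CF input of the next, more significant
    SBB, which is how arbitrarily wide integers are subtracted one
    word at a time.
    """
    mask = (1 << width) - 1
    raw = (dest & mask) - ((src & mask) + cf)
    return raw & mask, 1 if raw < 0 else 0
```

For example, a 128-bit subtract done as two 64-bit limbs: subtract the low limbs first, then feed the borrow into the high-limb SBB.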

And there we have it. It seems like only the oldest and most basic of operations can actually get by without their own microcode. On top of the written micro-operations, the system also runs a check against the IOPL flags, CS, and the control registers on nearly all of the operations, for security reasons. Then, microcoding is used to control the out-of-order execution algorithm as well as the in-order retirement unit. Finally, I can't guarantee the actual binary values of operations and operands are preserved through the micro-op decoder.

Nice response, good sir, better than I had anticipated; apologies for being a dick like that, though.

Though, a few notes (in no particular order):

It's an open forum; anyone can read and respond to anything.

I never said there were only 3, nor did I directly imply it; I did say "for example".
About the microcode: it's done because of the architecture, with other companies building microarchitectures similar to Intel's without Intel's licences; they can't just patent things that everyone uses openly. The difference in microcode, despite how little of it there is now, comes from differences in microarchitecture: Intel goes for a heavily vectorized approach while AMD doesn't. (AMD's architecture is really interesting; when you break it down, it's remarkable how they get so much performance from so few units on such old dies, though vectorizing MIMD to the extent Intel managed is a really hard achievement.)

My questioning of where you get things from wasn't just one case. It's that you shoot us down for being wrong with evidence that doesn't prove we're wrong.

Wasn't your argument that programmers are bad nowadays because of a lack of programming in microcode, unlike those who just go derp derp derp, "maNz Iz ar pr0graMmar"? (I refer to coders with extremely poor programming skill; unfortunately this is about 70% of "programmers".)

Supporting code designed for another architecture really isn't that hard, as long as it isn't an afterthought; there are many ways you can go about it. The ways you listed are some, but more exist.
If the difference was obvious enough, you could even have HW figure it out, though I would imagine you would know what code you're going to execute once you're running with a kernel (only speculation).


That concludes the notes.

I must apologize for offending you; however, being self-critical can help. I'll fully admit that at times I forget to look into all the details before declaring laws XD. My point, although I worded it very aggressively and perhaps, in hindsight, offensively, is that you should be willing to accept incorrectness in your points and have a more open mind about how people respond to what you say Smile

Anyhow have a nice time.


RE: Architecture Comparison,questions. .. - TSO - 11-26-2014

(11-26-2014, 12:34 AM)Magazorb Wrote: Nice response, good sir, better than I had anticipated; apologies for being a dick like that, though.

Though, a few notes (in no particular order):

It's an open forum; anyone can read and respond to anything.
Not certain exactly what that was responding to, but yeah... that's kinda the purpose of open forums.

Magazorb Wrote:I never said there were only 3, nor did I directly imply it; I did say "for example".

I was joking about there being only three; as I said, I've always thought of x86 as one of the less efficient architectures.

Magazorb Wrote:My questioning of where you get things from wasn't just one case. It's that you shoot us down for being wrong with evidence that doesn't prove we're wrong.

Other than the thing with my computer and its forwarding, I can't think of anywhere else I've done this.

Magazorb Wrote:About the microcode: it's done because of the architecture, with other companies building microarchitectures similar to Intel's without Intel's licences; they can't just patent things that everyone uses openly. The difference in microcode, despite how little of it there is now, comes from differences in microarchitecture: Intel goes for a heavily vectorized approach while AMD doesn't. (AMD's architecture is really interesting; when you break it down, it's remarkable how they get so much performance from so few units on such old dies, though vectorizing MIMD to the extent Intel managed is a really hard achievement.)

Magazorb Wrote:Wasn't your argument that programmers are bad nowadays because of a lack of programming in microcode, unlike those who just go derp derp derp, "maNz Iz ar pr0graMmar"? (I refer to coders with extremely poor programming skill; unfortunately this is about 70% of "programmers".)

So, I talked to some people. I couldn't get the guy from HP, he was too busy designing a part for... that one thing ('cuz that's not vague), but I did get hold of the one from Intel, who then Skyped his friend who works across the street at AMD (they used to work together across the other street at HP), and we all had a little discussion.

Microcoding is done because a processor with enough die space to actually perform all 1200 x86 operations in hardware would be entirely redundant (to which I asked why x86 is so damn redundant; they didn't have an answer), a few hundred times larger, require thousands of times the power, and be a few hundred (the same few hundred as before) times slower due to internal bussing delays. (Actually, has HP announced anything really, really big (besides job terminations) in like... the last year? I'll mention it if they have.)

Instead, microcoding is used to compile the x86 assembly into the machine code that the individual analytic ports (AMD calls them SPs or something like that) use. Using Haswell as an example, each core has eight of these ports, and each port has a certain range of operations it can perform. For Intel, the actual clock rates on the ports vary depending upon the size of the x86 instruction received and the amount of microcoding and port sharing the operation requires; Intel calls this Turbo Boost or something like that. Due to things and stuff, AMD can't have the same microcoding for legal reasons, and doesn't build processors with the same internal machine code anyway (in general, microcoding is a lot like a runtime compiler). As you said, Intel focuses on SIMD and parallelism; AMD... didn't really seem to have a focus (I couldn't get much from him, probably because he didn't know me). All I could really get was that their goal was to just integrate everything as one chip that does everything.

Now, both agreed that microcoding is sometimes idiotic. In all honesty, how often are you going to need to perform a double triple vector string back-flip shift (this is not a real x86 opcode), which they just happen to have the opcode for, if you can even remember it exists when you get to it? At some time in the late 70's, some architect realized that he could just code the double triple vector string back-flip shift himself and not have the CPU waste time converting that abstract opcode into the smaller micro-ops that weren't that hard to code, and RISC was born. The idea being that microcode was slowing down processors, and there wasn't really any reason why the assembly programmer or a compiler couldn't just write out what the opcode mapped to anyway. Apple used Sun Microsystems (their building and fab used to be behind HP, across the other other street, right next to HP's old fab, which is now Agilent), which had RISC processors, which stomped the piss out of x86 when it came to SIMD shit like... music, graphics, drawing, et cetera, and turned Apple into the computer of the jobless hippie artist. >.>

On the other hand, microcode is absolutely necessary for things like Intel's out-of-order engine and the reordering thing on the other side, which are areas where RISC processors suffer, giving rise to the CISC processors' comeback after out-of-order execution was successfully implemented in the late 90's/early 2000's.

Magazorb Wrote:Supporting code designed for another architecture really isn't that hard, as long as it isn't an afterthought; there are many ways you can go about it. The ways you listed are some, but more exist.
If the difference was obvious enough, you could even have HW figure it out, though I would imagine you would know what code you're going to execute once you're running with a kernel (only speculation).

I was thinking of how to do it efficiently, and really the only way is if it doesn't have to check and doesn't have to alter all your files on install to include a header of some type. The only problem I see is that as far as the computer is concerned, it's all just 1's and 0's, and I'm sure that every part of ARM codes to something in x86 due to its sheer size (obviously, that something is not the right thing), so I don't know how it could be done without some kind of microcode that runs through the program until it starts to question what exactly the program is doing or it finds something that is not a supported opcode. Defaulting to ARM makes that easier, because then it's just checking the first byte (I think) of every 32 bits and seeing if it can find an unsupported opcode, instead of trying to decipher varying-length opcodes and instruction sets that can be any length and all that x86 bullshit.

Also, the AMD guy would not comment when I asked about that, but (for the lulz) I'm sure Intel will try to revoke AMD's x86 licence if they make any headway on this. (heheh, capitalism)

Magazorb Wrote:That concludes the notes.

I must apologize for offending you; however, being self-critical can help. I'll fully admit that at times I forget to look into all the details before declaring laws XD. My point, although I worded it very aggressively and perhaps, in hindsight, offensively, is that you should be willing to accept incorrectness in your points and have a more open mind about how people respond to what you say Smile

Anyhow have a nice time.

It's all good.


RE: Architecture Comparison,questions. .. - Magazorb - 11-26-2014

There are many ways to be efficient; stop condemning everything to the idea of one. Black implies white and white implies black; should polar opposites ever be able to mix in infinitely many ratios, you get a grey-scale. The world isn't absolute, and it isn't healthy to think it is.

And that licence stuff was a big mess between Intel and AMD that went on for many years; it wasn't any good for anyone, which is why neither does it anymore, and it would be unsurprising if employees of either company are obligated not to share their individual views.