Wow, doesn't he know how to bitch? XD Yeah, sorry about the FPU stuff. I did write up a post that I meant to put up, I think I just never pressed send, but I lost it and can't be bothered to write a full apology about my misunderstanding of FPUs again. So yes, take what I say about FPUs lightly, but I am aware there are 64-bit units that can be broken down to run several smaller precisions (whether they need extra hardware for that or not I've no idea, so don't quote me on the exact details, but they exist, they do... look it up).
Also, I DID make it clear they have several control units. Modern GPUs are, for the most part, M-SIMD processors, meaning one CU controls many processing elements all doing virtually the same thing, and there are several arrays of those. For AMD it's 40/44 CUs in the 290/290X respectively, and I suspect the 390X will be 64, each with 64 PEs, thus 64x64 = 4096 cores (64 works rather beautifully here, so it's easy to see why someone would choose it).
Also, to correct you: a lot of the time in smaller GPGPU tasks GPUs do have to handle several things at once, especially when you branch; this is well documented in GPGPU research. I never once said or tried to suggest that GPUs run as MIMD, though they do have some aspects that loosely relate.
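To illustrate the lockstep point, here's a minimal C++ sketch of one control unit driving a block of PEs, and why a branch hurts: both sides of the branch get stepped through, with lanes masked off. Purely illustrative (plain C++, hypothetical lane count), not how any particular GPU actually implements its predication.

    // One control unit, many lanes, all stepping through the same instruction.
    // A branch doesn't let lanes go their separate ways; both paths get walked,
    // and lanes that didn't take a path simply have their writes masked off.
    #include <array>

    constexpr int LANES = 64;  // hypothetical: 64 PEs sharing one control unit

    void simd_step(std::array<float, LANES>& data) {
        std::array<bool, LANES> mask{};

        // every lane evaluates the condition together
        for (int i = 0; i < LANES; ++i) mask[i] = data[i] > 0.0f;

        // "taken" path: all lanes walk it, writes disabled where mask is false
        for (int i = 0; i < LANES; ++i) if (mask[i]) data[i] *= 2.0f;

        // "not taken" path: walked as well, masked the other way around
        for (int i = 0; i < LANES; ++i) if (!mask[i]) data[i] = 0.0f;
    }

The point being: if half the lanes take each side, you pay for both sides, which is exactly the divergence cost the GPGPU papers keep going on about.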
Also, I did mean TFLOPS (that kind of typo would have been obvious to self-correct XD). If you're just here for the argument, really, don't bother; I don't care for arguing. I pointed out some points, and the fact they aren't 100% accurately explained wasn't the point; I gave it at really low accuracy on purpose, because most people don't care about knowing absolutely everything.
Also, the bandwidth I gave was bandwidth. I'm fully aware that how fast they execute instructions is a separate thing; however, there are times on a GPU where it has to load a lot of data rather quickly (normally in GPGPU), and that's when the bandwidth proves its worth, and wider buses have shown useful improvements in the past. Hence why both AMD and Nvidia have been trying to increase memory bandwidth (Nvidia, I believe, did a 512-bit memory interface once, then bailed on it due to production costs, while AMD pushed for it on the 290(X)). Either way, it was a bandwidth figure, not an instruction-rate figure.
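Just to show where a figure like that comes from, a back-of-envelope sketch in C++ (the 5 Gbps/pin effective GDDR5 rate is my assumption here, not something off a spec sheet in front of me):

    #include <cstdio>

    int main() {
        const double bus_bits     = 512.0;  // 290X-style memory interface width
        const double gbps_per_pin = 5.0;    // assumed effective GDDR5 data rate per pin
        const double gb_per_s     = (bus_bits / 8.0) * gbps_per_pin;
        std::printf("peak memory bandwidth ~ %.0f GB/s\n", gb_per_s);  // ~320 GB/s
        return 0;
    }

Which lands around the sort of number that gets quoted for those cards, and obviously it's a peak figure, not what you sustain.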
No offence, but if GPUs only needed to do graphics processing they wouldn't have gone the way of general cores; the whole transition to cores happened because it really did increase how versatile GPUs were. AMD did a huge thing with core count and Nvidia got left behind back in the 400-series days, and many people still choose AMD for its double precision performance over Nvidia's.
Do please try to remember that like 10 years ago you would be hard pressed to find a GPU with general-purpose cores on it; they normally had dedicated processors for particular graphics tasks. Think of them almost like ASICs, but with some flexibility.
I do know a little about image processing, but my post wasn't about image processing, it was about compute, which image processing also uses. And yes, for image processing the GPU mostly runs uniformly, but it still has several passes to go through (many more nowadays) to resolve a single image. I'm aware that every pixel gets processed and blah blah blah; again, that wasn't the sole purpose of my post. If that's all you want, you can go make yourself an ASIC that does nothing but image processing; it would get you better performance, but GPUs do a whole lot more than they used to.
Don't get me wrong, I don't mean to be rude. I made some observations about where Nvidia was lagging behind AMD (mainly double precision), stated some of the things I believed were the reasons, then made a few speculations about future specs based on recent architectures from both vendors, then explained the M-SIMD arrays they tend to have.
Also, about CUs: Nvidia only recently moved to several CUs per GPU; they seriously did run several hundred cores off a single CU, and it was just impractical, hence why they moved away from it.
I could have also explained how CPUs address memory as a flat 1D array in their instruction sets while GPUs normally work with 2D layouts as well; furthermore, the two are built very differently. I'm not going to explain all the differences, they should be apparent enough.
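Roughly what I mean by 1D vs 2D addressing, sketched in plain C++ (no real GPU API here; the "pitch" idea is just drawn the way GPU memory allocators tend to expose 2D data):

    #include <cstddef>
    #include <vector>

    // Flat 1D view: one linear offset into a single array (the CPU-ish way).
    float load_flat(const std::vector<float>& buf, std::size_t width,
                    std::size_t x, std::size_t y) {
        return buf[y * width + x];
    }

    // Pitched 2D view: each row starts at an aligned "pitch" in bytes,
    // which is the style GPU APIs usually hand back for 2D allocations.
    float load_pitched(const float* base, std::size_t pitch_bytes,
                       std::size_t x, std::size_t y) {
        const char* row_start = reinterpret_cast<const char*>(base) + y * pitch_bytes;
        return reinterpret_cast<const float*>(row_start)[x];
    }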
But may I ask you something: if you have two single precision FPUs, what's to stop you adding a few muxes and a bit of the extra logic required to behave like a double precision FPU, or vice versa? Just because the units are different doesn't mean they can't be combined for it.
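The same general idea exists in software too, for what it's worth: two single-precision values can carry more precision together than either one alone. Here's the classic two-sum trick in C++ (a software analogy only, not a claim about how the hardware muxing would actually be wired up):

    // Knuth/Moller "TwoSum": s + e is the exact sum of a and b, even though each
    // of s and e is only a single-precision float. Two floats together hold more
    // information than one float ever could.
    inline void two_sum(float a, float b, float& s, float& e) {
        s = a + b;
        float bb = s - a;               // the part of b that made it into s
        e = (a - (s - bb)) + (b - bb);  // the rounding error that got lost
    }

Pairs like (s, e) are what the "float-float" and double-double libraries are built on.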
As for input rates, I never said anything about them. For the most part GPUs make close to full use of the bandwidth going into them, but not so much for returning values; that normally isn't an issue for anything you'd run on a consumer grade GPU anyhow.
Also, I'd advise learning C++ if you want to program in this much depth, or else learn all the other ISAs related to processors, the features they bring to ASM, and how to make full use of them. (Most people will say they know ASM the moment they can do 5+3; big deal. The genuinely good programmers will probably decline to claim they know it in any depth even when they're in the top 10% in terms of programming skill, purely because they realise how much they don't know. Get yourself there and you're doing well.)
I hope this clarifies some of what I was saying in that post, and I don't mind if this goes on a while. Again, apologies for the lack of FPU knowledge.
P.S. Tell your father he's correct, nobody does, because C and C++ compile so well into ASM that writing it by hand is pretty pointless. The only reason the better programmers still look at it is to see if they can squeeze out little optimisations, or to spot whether there's a small issue with something; normally it's used for optimising and for understanding what's going on better.
But only a fool will believe Java will do everything; those who believe in good programming practice typically believe it's a human's job, not a job for a tool.
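To give an idea of what "looking at the ASM" actually means in practice, a tiny sketch (assuming GCC or Clang on the command line; the function itself is just a made-up example):

    // saxpy.cpp -- a small function worth peeking at in assembly form
    void saxpy(float a, const float* x, float* y, int n) {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
    // Dump the assembly with:   g++ -S -O2 saxpy.cpp -o saxpy.s
    // then read saxpy.s to see whether the loop got vectorised or unrolled --
    // that's the kind of small optimisation check people actually do.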