This is just my personal blog. This is where I come to rant, or just type about whatever happens to be on my mind at the time.
Jan 10th 2019: ARM vs ARM:
I have decided to begin looking into how different implementations of the ARM ISA are being done, as well as other methods of implementing some of the features of the ISA. It has been an interesting set of learning experiences.
As far as the optimal HW implementation of the core ARM ISA goes, there has been very little change in what is possible since the ARMv2. Even the 32-bit R15 implementations have the same core ISA with little added to it (MRS/MSR, UMULL/UMLAL/SMULL/SMLAL, and CLZ are the only additions). Yes, there have also been a number of extensions, and I am omitting the divide instructions of the ARMv7 and later.
Beginning by looking at the core ISA and its implementations, as well as other methods of implementation not yet applied to the ARM ISA, I have realized that the ARM ISA has more potential than most realize:
A seven stage fan-in pipeline seems to be optimal for the ARM ISA.
It would be possible to make most STM/LDM operations complete in a single effective cycle with some exceptions.
In-order multiple issue of instructions is fairly easy, without adding too much to the transistor count (easier than out of order, and easier to optimize code for). We could have up to 4 instructions per clock in many cases.
Multiply operations could be separated from the main pipeline, thus not causing any delay when taking more than a single clock.
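To make the in-order multiple issue idea a bit more concrete, here is a rough Python sketch of the grouping rule only. It is a toy model, not my simulator: the names Instr, ins and issue_groups are made up for this example, every instruction is assumed to take one cycle, and a group is simply closed as soon as an instruction reads or writes something an earlier instruction in the same group writes (flags could be modelled as one more register name).

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str          # assembly text, for display only
    dest: frozenset    # registers written (hypothetical, hand-annotated)
    srcs: frozenset    # registers read (hypothetical, hand-annotated)


def ins(text, dest, srcs):
    """Small helper so the example programs stay readable."""
    return Instr(text, frozenset(dest), frozenset(srcs))


def issue_groups(program, width=4):
    """Pack instructions, in program order, into groups issued together.

    A group is closed when it is full, or when the next instruction
    reads or writes a register written earlier in the same group.
    """
    groups = []
    current = []
    written = set()
    for instr in program:
        hazard = bool(written & (instr.srcs | instr.dest))
        if current and (hazard or len(current) == width):
            groups.append(current)
            current, written = [], set()
        current.append(instr)
        written |= instr.dest
    if current:
        groups.append(current)
    return groups


# Four independent instructions: they can all go in one group.
independent = [
    ins("ADD r0, r1, r2", {"r0"}, {"r1", "r2"}),
    ins("SUB r3, r4, r5", {"r3"}, {"r4", "r5"}),
    ins("ORR r6, r7, r8", {"r6"}, {"r7", "r8"}),
    ins("EOR r9, r10, r11", {"r9"}, {"r10", "r11"}),
]

# Four instructions that each need the result of the one before.
chained = [
    ins("ADD r0, r1, r2", {"r0"}, {"r1", "r2"}),
    ins("ADD r0, r0, r3", {"r0"}, {"r0", "r3"}),
    ins("ADD r0, r0, r4", {"r0"}, {"r0", "r4"}),
    ins("ADD r0, r0, r5", {"r0"}, {"r0", "r5"}),
]

for name, prog in (("independent", independent), ("chained", chained)):
    print(name, "takes", len(issue_groups(prog)), "issue cycle(s)")
```

Under these assumptions the independent sequence issues in a single cycle and the chained one needs four, which is why "up to 4 instructions per clock" depends so much on how the code is written.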
For experimental purposes I am spending a little time working on a simulation of what I feel could be a good implementation of a CPU using the ARM ISA. For this I am concentrating on the ARMv1 ISA, at least for testing my ideas (OK, there is no multiply, that will come later). I will not claim to be great; everything I do is nothing more than making use of what I have learned from others.
With Reason, I Study This:
I am a RISC OS user, and have been for a long time. With the ARMv8 we are starting to see implementations that no longer support the ARM ISA, replacing it with the AArch64 ISA. As the API of RISC OS is closely tied to the ARM ISA, we need an option for moving forward.
By implementing an ARM ISA based CPU whose performance per clock (MIPS per MHz) comes close to that of the high end CPU implementations, we can ensure the future of RISC OS.
I think that the newest ARM ISA we can use at present without licensing is the ARMv2 ISA, though it should not be long before we can use the ARMv3 ISA (and that will give us everything).
Now to think about adding a section to my site on CPU design, something I have played with since before university. I have always enjoyed toying with CPU design, and loved the multiple instruction issue research we did in university.
Jan 10th 2019: Lessons of Expectation Learned: A story of looking at GCC's output:
I decided to take a look at the code produced by gcc with the -O3 option to see how it does in optimisation. I was hoping to learn something about optimizing code, perhaps something I did not know. This was a mistake.
I found code sequences that are terrible. Some of the code uses registers/flags that are the destination of one instruction as a source in the very next instruction, sometimes for four or more instructions in a row. Sometimes this is even in a tight loop. Stack operations are done in ways that are slow by their nature for what is being done. And there are so many more terribly slow things.
Further, when compiling code that does not make any use of the standard library at all, GCC will still produce code to call the standard library many times. This seems like an obvious optimisation to me: do not call the library if it is not used. Apparently the maintainers of GCC do not see it.
I could forgive these poor implementations if the code being compiled did not give room to produce better results. So I hand compiled the code after this, just doing things the way I automatically do, and tested the results against each other. The result was that my crude hand compiled versions of a few tests ran between 1.2 and 4 times faster than the GCC versions. This was unexpected, especially as GCC's ability to optimize is an often quoted reason to avoid Assembly language.
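To show what I mean by back-to-back dependencies, here is a small Python sketch that finds the longest run of consecutive instructions where each one reads something the instruction directly before it wrote. The function name longest_dependent_run and the loop below are made up for illustration; the loop is hand written to show the pattern and is not actual GCC output, and the register sets are annotated by hand.

```python
def longest_dependent_run(program):
    """program is a list of (text, written, read) tuples.

    Returns the length of the longest run of consecutive instructions
    in which each instruction reads a register or flag written by the
    instruction immediately before it.
    """
    best = run = 1 if program else 0
    for prev, cur in zip(program, program[1:]):
        if set(prev[1]) & set(cur[2]):
            run += 1      # cur must wait for prev's result
        else:
            run = 1       # chain broken, start counting again
        best = max(best, run)
    return best


# Hypothetical tight loop: sum words with saturation at 255.
loop = [
    ("LDR   r3, [r1], #4", {"r3", "r1"}, {"r1"}),
    ("ADD   r0, r0, r3", {"r0"}, {"r0", "r3"}),
    ("CMP   r0, #255", {"flags"}, {"r0"}),
    ("MOVHI r0, #255", {"r0"}, {"flags"}),
    ("SUBS  r2, r2, #1", {"r2", "flags"}, {"r2"}),
    ("BNE   loop", set(), {"flags"}),
]

print("longest dependent run:", longest_dependent_run(loop))
```

In this made-up loop the first four instructions form one chain, so on a pipelined or multiple issue core each would have to wait on the one before it. Moving the SUBS up between the LDR and the ADD is the kind of reordering I would do by hand without thinking about it.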
The Lesson Learned:
Just because everyone says a compiler produces extremely optimal code does not make it true.
Believing that a compiler produces optimal code without testing it for yourself is asking to be wrong.
Jan 7th 2019: Apologies:
I have removed yesterday's blog post. I should not have directed my personal frustration in a way that uses another person as an example; that was my mistake.
I hereby apologize to any and all who may have been offended by yesterday's post.
Jan 3rd 2019: A new start:
I realize that when we shared our programs by taking copies with us to the RISC OS users' get-together for others to copy, at least here in the States, it was easy to get myself to give away copies of just about any little thing I had written. Now that the internet is the means of sharing programs, and there are not many US based RISC OS users, I seem to be a bit shy about posting code.
Well, it is time for a new start. It is time to bring back "just write it for fun" and put it out there; maybe someone will have a use for it, maybe not.
I am starting with a series of tutorials on writing multitasking WIMP programs in ARM Assembly using the BASIC V assembler. This is a restart of a series I began writing back in 2012 and had posted on a forum. I ended up not finishing those posts because I was hesitant to share my code, for fear of errors (even though I tested every bit of it to an extreme).
Now I have decided to rewrite it from scratch, and not worry about the style or quality so much as getting the point across. I am writing the series in my spare time, so a little bit here and there. I have coded the first few, and written the documentation for the first one, so I am off to a decent start with the beginning of the year 2019, it is indeed a new year.