From: Stewart Brodie <stewart.brodie@ntlworld.com>
Newsgroups: comp.sys.acorn.programmer
Subject: Re: Compiling modules with -zM
Date: Thu, 01 Mar 2007 12:31:01 GMT

Rik Griffin wrote:

> Am I right in thinking that the only effects of the "-zM" compiler switch
> (for Norcroft) is to allow a module to be multiply instantiated? Or are
> there other effects that are vital for module code?
>
> I'm trying to track down an obscure low level bug and noticed that a
> couple of my modules aren't compiled with this switch - but then again
> none of the modules use instances. Is this likely to cause problems?

You are correct. Now for the detailed explanation of what that option actually does :-)

What this switch does is alter the code generated for finding the address of non-automatic variables (i.e. static variables, file scope variables, global variables). The compiler knows that the address of the variable is at a specific offset from C$$Data$$Base. This offset is stored in the binary along with a relocation, so that the constant gets C$$Data$$Base added to it. The code generated to load this address into a register is a PC-relative LDR.
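To make that arithmetic concrete, here is a minimal Python sketch of what the relocation does to one stored offset. Everything in it is made up for illustration: `link_adcon`, `DATA_BASE`, the variable names and the offsets are hypothetical and do not correspond to anything in the Norcroft toolchain or the linker.

```python
# Illustrative model only: all names and values here are hypothetical.

def link_adcon(offset, data_base):
    """Apply the relocation: the stored offset gets C$$Data$$Base added to it."""
    return (offset + data_base) & 0xFFFFFFFF  # addresses are 32 bits wide


# Assumed value of C$$Data$$Base for some application (arbitrary).
DATA_BASE = 0x0001A000

# Hypothetical variables and their offsets from C$$Data$$Base.
OFFSETS = {"counter": 0x10, "table": 0x40}

# After relocation, each entry is the word that the PC-relative LDR picks up.
adcons = {name: link_adcon(off, DATA_BASE) for name, off in OFFSETS.items()}
```

With these assumed values, `adcons["counter"]` is &1A010 and `adcons["table"]` is &1A040.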
Link-time relocation alone is sufficient for application code, because the linker knows what C$$Data$$Base is, so it can apply the relocations, and you end up with an address constant (adcon) in your binary.

For modules, this is not sufficient. Note that modules are linked with C$$Data$$Base fixed at the value &8000, IIRC, so you get constants that are relative to &8000 (I am assuming that this constant was chosen to avoid zero page disasters during development! It would make much more sense to make it zero). In relocatable code (e.g. a module), each adcon is then further relocated at load-time by the difference between where C$$Data$$Base ended up in RAM and &8000. So now, all your adcons point to the initial copy of the data within the module.

With -zM, the compiler knows that it must add an additional constant to the address after loading it. This is the "static data relocation offset". The digit following -zM chooses which constant is added. Your code is compiled -zM (equivalent to -zM1); the C library itself is compiled -zM0 so that its offsets are relative to a different slot.

I changed the way this was implemented when I was working on the compiler many years ago (5.1x? 5.2x?). I'll mention both behaviours here, because you'll see the old behaviour in old compiled modules.

Old behaviour: this was ugly, but it worked, except in one case that turned up in the late 1990s! Load the ADCON with "LDR v0, [pc, #addrconstant]". Stick in "LDR r12, [r10, #-0]" with a relocation with respect to the correct magic static data relocation offset variable, whose name I can't remember (note the sign bit was encoded in the instruction, not in the constant). Then stick in "ADD v0, r12, v0". The vN are just virtual registers, which are converted to real register numbers later on. This code works OK, adding 8 bytes every time you want an address constant, provided that v0 doesn't get assigned to R12 ...
which was the bizarrely rare case that turned up :-) The problem here was that the additional two instructions were just injected directly into the output without the compiler really "knowing" what they were doing. That's how the register allocator could make this critical mistake.

New behaviour: this was much neater, and let the compiler see what was going on. Again, you start with the "LDR v0, [pc, #addrconstant]". But now, you insert a new opcode to load the static data relocation offset into a virtual register, "LDR v1, [r10, #-0]", plus the relocation, then add the two together into another virtual register with "ADD v2, v1, v0". v2 is now the result that you wanted, and we also declare that the value in v1 is a constant, and that v0 and v1 are no longer required.

There are several side-effects to this approach, notably in the area of optimisation, but mainly that the register allocator can't use the same register by mistake! By marking the value in 'v1' as a constant, the compiler can remove any future loads of that value and just continue to use the original one. This would happen if you accessed multiple global variables in the same function. Furthermore, because the ADD instruction is now just a normal arithmetic operation that is known to the compiler, it can optimise it away completely in some cases.

Example: loading two integers:

    ; old
    LDR v0, int1location
    LDR r12, [r10, #-0]
    ADD v0, r12, v0
    LDR v3, [v0, #0]
    LDR v4, int2location
    LDR r12, [r10, #-0]
    ADD v4, r12, v4
    LDR v7, [v4, #0]

    ; new
    LDR v0, int1location
    LDR v1, [r10, #-0]
    ADD v2, v1, v0
    LDR v3, [v2, #0]
    LDR v4, int2location
    LDR v5, [r10, #-0]    ; constant, so gets converted to "MOV v5, v1"
    ADD v6, v5, v4
    LDR v7, [v6, #0]

The old behaviour would always generate 8 instructions because the compiler didn't know what the ADD instruction or the immediately preceding LDR instructions were doing. It cannot be optimised any further.
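The two listings above can be generated mechanically, which makes it easy to count how often the static data relocation offset is fetched from [r10, #-0]. The following Python sketch is only a toy model of the emitted sequences, not the Norcroft code generator; the function names `old_scheme` and `new_scheme` and the register numbering rule are assumptions chosen to reproduce the listings.

```python
# Toy model of the two instruction sequences; not the real code generator.

OFFSET_LOAD = "[r10, #-0]"  # where the static data relocation offset lives


def old_scheme(n_globals):
    """Old behaviour: the r12 load and the ADD are pasted in for every access."""
    code = []
    for i in range(n_globals):
        addr = "v%d" % (4 * i)       # holds the adcon, then the final address
        dest = "v%d" % (4 * i + 3)   # receives the loaded integer
        code += [
            "LDR %s, int%dlocation" % (addr, i + 1),
            "LDR r12, %s" % OFFSET_LOAD,
            "ADD %s, r12, %s" % (addr, addr),
            "LDR %s, [%s, #0]" % (dest, addr),
        ]
    return code


def new_scheme(n_globals):
    """New behaviour: v1 is known to be constant, so later loads become MOVs."""
    code = []
    for i in range(n_globals):
        addr, off, total, dest = ("v%d" % (4 * i + k) for k in range(4))
        if i == 0:
            fetch = "LDR %s, %s" % (off, OFFSET_LOAD)
        else:
            fetch = "MOV %s, v1" % off   # reuse the value already in v1
        code += [
            "LDR %s, int%dlocation" % (addr, i + 1),
            fetch,
            "ADD %s, %s, %s" % (total, off, addr),
            "LDR %s, [%s, #0]" % (dest, total),
        ]
    return code


def offset_loads(code):
    """Count the instructions that read the relocation offset from memory."""
    return sum(1 for ins in code if OFFSET_LOAD in ins)
```

For two integers, `offset_loads(old_scheme(2))` is 2 while `offset_loads(new_scheme(2))` is 1: both sequences are still 8 instructions long at this stage, but the new one reads the offset from memory only once.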
The new version can be optimised much more aggressively, effectively to this:

    LDR v0, int1location
    LDR v1, [r10, #-0]
    LDR v3, [v0, v1]!
    LDR v4, int2location
    LDR v7, [v4, v1]!

Note how we need not only fewer instructions, but also fewer memory accesses and, in the end, fewer registers (we use more virtual registers, but they get eliminated later). The compiler will always use writeback, because it makes v1 discardable after the LDR instructions - so if you were updating the integers and writing them back, it'd just use v0/v4.

This change shrank the RISC OS ROM images by quite a bit too, so we could cram more into our 4MB.

-- 
Stewart Brodie