From stewart.brodie@ntlworld.com Thu Mar  1 19:11:02 2007
From: Stewart Brodie <stewart.brodie@ntlworld.com>
Newsgroups: comp.sys.acorn.programmer
Subject: Re: Compiling modules with -zM
Message-ID: <gemini.je85i90043qc203ew.stewart.brodie@ntlworld.com>
References: <88d1adbc4e.rik-news@iyonix.elements>
User-Agent: Gemini/2.28m (Qt/3.3.7) (Windows-XP)
MIME-Version: 1.0
Lines: 117
Content-Type: text/plain; charset=us-ascii
Date: Thu, 01 Mar 2007 12:31:01 GMT
Organization: NTL
Xref: news.chiark.greenend.org.uk comp.sys.acorn.programmer:21694

Rik Griffin <nospam@denbridgemarine.com> wrote:

> Am I right in thinking that the only effects of the "-zM" compiler switch
> (for Norcroft) is to allow a module to be multiply instantiated? Or are
> there other effects that are vital for module code?
> 
> I'm trying to track down an obscure low level bug and noticed that a
> couple of my modules aren't compiled with this switch - but then again
> none of the modules use instances. Is this likely to cause problems?

You are correct.


Now for the detailed explanation of what that option actually does :-)


What this switch actually does is alter the code generated for finding the
address of a non-automatic variable (i.e. static variables, file scope
variables, global variables).

The compiler knows that the address of the variable is at a specific offset
from C$$Data$$Base.  This offset is stored in the binary along with a
relocation, so that the constant gets C$$Data$$Base added to it.  The code
generated to load this address into a register is a PC-relative LDR.  That
is sufficient for application code, because the linker knows what
C$$Data$$Base is, so it can apply the relocations, and you end up with an
address constant in your binary.  For modules, this is not sufficient.
Modules are linked with C$$Data$$Base fixed at the value &8000, IIRC, so you
get constants that are relative to &8000.  (I am assuming that this constant
was chosen to avoid zero page disasters during development!  It would make
much more sense to make it zero.)  In relocatable code (e.g. a module), each
adcon (address constant) is then further relocated at load-time by the
difference between where C$$Data$$Base ended up in RAM and &8000.  So now,
all your adcons point to the initial copy of the data within the module.
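
To make the two relocation steps concrete, here is a rough sketch in C of
the arithmetic only.  The function names are made up for illustration, and
the &8000 value is the one I quoted above from memory, so treat it as an
assumption rather than gospel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed link-time value of C$$Data$$Base for modules (&8000, IIRC). */
#define LINK_DATA_BASE 0x8000u

/* Link time: the stored offset gets C$$Data$$Base added to it.
 * (Hypothetical helper; it only models the arithmetic.) */
static inline uint32_t link_adcon(uint32_t offset_in_data)
{
    return LINK_DATA_BASE + offset_in_data;
}

/* Load time: each adcon is moved by the difference between where the
 * data really ended up and the link-time base. */
static inline uint32_t load_relocate(uint32_t adcon, uint32_t actual_data_base)
{
    assert(adcon >= LINK_DATA_BASE);  /* must be a linked data adcon */
    return adcon + (actual_data_base - LINK_DATA_BASE);
}
```

So a variable at offset &10 links to &8010, and if the module's data lands
at some address B in RAM, the relocated adcon is B + &10, i.e. it points
into the initial copy of the data inside the module.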

With -zM<N>, the compiler knows that it must add an additional constant to
the address after loading it.  This is the "static data relocation offset".
The <N> chooses which constant is added.  Your code is compiled -zM
(equivalent to -zM1); the C library itself is compiled -zM0 so that its
offsets are relative to a different slot.  I changed the way this was
implemented when I was working on the compiler many years ago (5.1x? 5.2x?).
I'll mention both behaviours here, because you'll see the old behaviour in
old compiled modules.
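
As a rough model of why the extra constant gives you multiple
instantiation, here is a sketch in C.  Everything in it (the Machine and
Instance types, the function names, using array indices as addresses) is
invented for illustration; it is not how the module loader or the compiler
is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DATA_WORDS = 2, RAM_WORDS = 16 };

/* Hypothetical simulated memory: word indices stand in for addresses. */
typedef struct {
    int ram[RAM_WORDS];
} Machine;

/* Hypothetical per-instance record holding the static data relocation
 * offset: private copy base minus initial image base. */
typedef struct {
    ptrdiff_t reloc_offset;
} Instance;

/* Copy the initial data image to a private area and record the offset. */
static inline Instance make_instance(Machine *m, size_t image_base,
                                     size_t copy_base)
{
    assert(image_base + DATA_WORDS <= RAM_WORDS);
    assert(copy_base + DATA_WORDS <= RAM_WORDS);
    memmove(&m->ram[copy_base], &m->ram[image_base],
            DATA_WORDS * sizeof(int));
    Instance inst = { (ptrdiff_t)copy_base - (ptrdiff_t)image_base };
    return inst;
}

/* Without -zM: the adcon is used as-is, so it hits the initial image. */
static inline int *addr_plain(Machine *m, size_t adcon)
{
    return &m->ram[adcon];
}

/* With -zM: the static data relocation offset is added to the adcon. */
static inline int *addr_zm(Machine *m, const Instance *inst, size_t adcon)
{
    return &m->ram[(ptrdiff_t)adcon + inst->reloc_offset];
}
```

Two instances made from the same image get different offsets, so a write
through addr_zm() for one instance leaves both the other instance's copy
and the initial image untouched.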

Old behaviour: this was ugly, but it worked, except in one case that turned
up in the late 1990s!  Load the ADCON with "LDR v0, [pc, #addrconstant]".
Stick in "LDR r12, [r10, #-0]" with a relocation with respect to the correct
magic static data relocation offset variable, whose name I can't remember
(note the sign bit was encoded in the instruction, not in the constant).
Then stick in "ADD v0, r12, v0".  vN are just the virtual registers used
that are converted to real register numbers later on.  This code works OK,
adding 8 bytes every time you want an address constant, provided that v0
doesn't get assigned to R12 ... which was the bizarrely rare case that
turned up :-)   The problem here was that the additional two instructions
were just injected directly into the output without the compiler really
"knowing" what they were doing.  That's how the register allocator could
make this critical mistake.
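
The failure is easy to see if you model the three instructions on a toy
register file.  This is only a sketch in C; old_sequence() and its
parameters are hypothetical names, and the values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the old three-instruction sequence.  'v0' is the real register
 * number that the allocator picked for the virtual register v0. */
static inline uint32_t old_sequence(uint32_t adcon, uint32_t reloc_offset,
                                    int v0)
{
    uint32_t reg[16] = {0};

    assert(v0 >= 0 && v0 < 16);
    reg[v0] = adcon;              /* LDR v0, [pc, #addrconstant] */
    reg[12] = reloc_offset;       /* LDR r12, [r10, #-0]         */
    reg[v0] = reg[12] + reg[v0];  /* ADD v0, r12, v0             */
    return reg[v0];
}
```

With v0 allocated to, say, R4 you get adcon + offset as intended.  With v0
allocated to R12, the second load overwrites the adcon and the result is
offset + offset, which points nowhere useful.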

New behaviour: this was much neater, and let the compiler see what was going
on.  Again, you start with the "LDR v0, [pc, #addrconstant]".  But now, you
insert a new opcode to load the static data relocation offset into a virtual
register, "LDR v1, [r10, #-0]", plus the relocation, then add the two
together into another virtual register with "ADD v2, v1, v0".  v2 is now the
result that you wanted, and we also declare that the value in v1 is a
constant, and that v0 and v1 are no longer required.  There are several
side-effects to this approach, notably in the area of optimisation, but
mainly that the register allocator can't use the same register by mistake!
By marking the value in 'v1' as a constant, the compiler can remove any
future loads of that value and just continue to use the original one.  This
would happen if you accessed multiple global variables in the same function.
Furthermore, because the ADD instruction is now just a normal arithmetic
operation that is known to the compiler, it can optimise it away completely
in some cases. Example: loading two integers:

; old
  LDR v0, int1location
  LDR r12, [r10, #-0]
  ADD v0, r12, v0
  LDR v3, [v0, #0]
  LDR v4, int2location
  LDR r12, [r10, #-0]
  ADD v4, r12, v4
  LDR v7, [v4, #0]

; new
  LDR v0, int1location
  LDR v1, [r10, #-0]
  ADD v2, v1, v0
  LDR v3, [v2, #0]
  LDR v4, int2location
  LDR v5, [r10, #-0]   ; constant, so gets converted to "MOV v5, v1"
  ADD v6, v5, v4
  LDR v7, [v6, #0]

The old behaviour would always generate 8 instructions because the compiler
didn't know what the ADD instructions or the immediately preceding LDR
instructions were doing, so the sequence could not be optimised any further.
The new version can be optimised much more aggressively, effectively to this:

  LDR v0, int1location
  LDR v1, [r10, #-0]
  LDR v3, [v0, v1]!
  LDR v4, int2location
  LDR v7, [v4, v1]!

Note how not only do we need fewer instructions, but fewer memory accesses
and fewer registers in the end too (we use more virtual registers, but they
get eliminated later).  The compiler will always use writeback, because it
makes v1 discardable after the LDR instructions - so if you were updating the
integers and writing them back, it'd just use v0/v4.
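
In C terms, the saving in memory accesses looks roughly like the sketch
below.  The helper names, the 'mem' array standing in for memory and the
explicit load counter are all invented for illustration; the functions sum
the two integers only so that there is a result to compare.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "LDR vN, [r10, #-0]": fetch the static data
 * relocation offset and count the memory access. */
static inline ptrdiff_t load_reloc_offset(const ptrdiff_t *slot, int *loads)
{
    assert(slot != NULL && loads != NULL);
    ++*loads;
    return *slot;
}

/* Old behaviour: the offset is reloaded for every global accessed. */
static inline int sum_old(const int *mem, size_t adcon1, size_t adcon2,
                          const ptrdiff_t *slot, int *loads)
{
    int a = mem[(ptrdiff_t)adcon1 + load_reloc_offset(slot, loads)];
    int b = mem[(ptrdiff_t)adcon2 + load_reloc_offset(slot, loads)];
    return a + b;
}

/* New behaviour: the offset is known to be constant, so one load is
 * reused for both accesses. */
static inline int sum_new(const int *mem, size_t adcon1, size_t adcon2,
                          const ptrdiff_t *slot, int *loads)
{
    const ptrdiff_t off = load_reloc_offset(slot, loads);
    return mem[(ptrdiff_t)adcon1 + off] + mem[(ptrdiff_t)adcon2 + off];
}
```

Both versions read the same two integers, but the second one fetches the
offset once per function instead of once per variable.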

This change shrank the RISC OS ROM images by quite a bit too, so we could
cram more into our 4MB.

-- 
Stewart Brodie