[parisc-linux] Heavy Iron Reference Docs
John David Anglin
dave at hiauly1.hia.nrc.ca
Tue May 2 09:10:51 MDT 2006
> > I also suggest using an ordered store for the unlock operation. This
> > doesn't cost anything and may help to ensure that memory accesses,
> > as seen by another processor, occur in the expected sequence.
>
> I've had it drilled into my head that all parisc implementations
> have strongly ordered memory subsystems. John Marvin (jsm) has
> stated that more than a few times on this list. So "ordered store"
> is the same as a regular store.
Yes, I know this has been said and I've seen it in documents.
However, HP-UX 11i uses lots of "ordered" stores and one is never
quite sure about the currency of information in documentation.
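To make the suggestion concrete, here is a minimal sketch of a spinlock
whose unlock is an ordered (release) store. It uses C11 atomics as a
portable stand-in, not the PA-RISC ordered-store instruction itself, and
the names (sketch_lock_t and friends) are hypothetical. It also does not
model the ldcw convention, where the lock word is nonzero when free.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not the parisc-linux spinlock. */
typedef struct {
    atomic_flag held;
} sketch_lock_t;

#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if the lock was taken, 0 if it was already held. */
static int sketch_trylock(sketch_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is taken. */
static void sketch_lock(sketch_lock_t *l)
{
    while (!sketch_trylock(l))
        ;
}

/*
 * Unlock with a release store: every access made inside the critical
 * section is ordered before the store that frees the lock.
 */
static void sketch_unlock(sketch_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

If the memory subsystem really is strongly ordered, the release ordering
should cost nothing extra at the hardware level, which is the point of
the suggestion.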
> > It all comes down to this crucial bit of code in the ldcw description:
>
> I'd rather have someone like Jerry Huck or someone with his experience
> comment on this before we go down this path. I'll try to find
> someone to consult with this week.
That would be great, and I think it would help to clear up our questions.
> > It may be this is only reliable on fully coherent systems. While
> > the N-class is classified as a UMA machine, it has two system buses
> > separated by a memory controller. Each system bus can handle four
> > processors with L1 and L2 cache.
>
> AFAIK, N-class has no L2 cache.
However, the rp7400 and rp7420 certainly do. See
<http://www.ccns.pl/zasoby/13/rp7400techwp5.pdf>. The rp7420
appears to have 32MB or 64MB of L2 per processor module.
> But it's worse than you think.
> N-class has two _Merced_ busses connected to the memory controller.
> Each Processor is connected via a double pumped Runway Bus to "Dew"
> which acts as a "bridge" to one of the Merced Busses.
>
> [ Digression - certain document says:
> In PA-RISC, code fetches are non-coherent, such that PCX-W doesn't
> even supply Vindex bits that would allow the code fetches to be
> coherent.
> ]
Because of this, if you want to execute instructions on the stack,
you have to flush the lines from both the instruction and data cache,
and do a sync before transferring to the code on the stack.
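As a rough sketch of that sequence from C: the GCC builtin
__builtin___clear_cache is meant to make freshly written instruction
bytes visible to instruction fetch. I am assuming, without having
checked the generated code on hppa, that it expands to the appropriate
flush and sync sequence there. The helper name publish_code is
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy instruction bytes into buf (for example a
 * buffer on the stack) and make them safe to fetch as code.
 * Assumption: the builtin performs the D-cache and I-cache flushes and
 * the sync that the target requires.
 */
static void *publish_code(void *buf, const void *insns, size_t len)
{
    memcpy(buf, insns, len);
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return buf;
}
```

Only after publish_code returns would it be safe to branch to buf,
assuming the memory is also mapped executable.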
I just think that in machines with multiple data caches, we have to
be careful about the coherence of the data in these caches. You have
pointed out that I-cache code fetches are non-coherent. The L2 caches
in the rp7420 are combined. The logic could provide different behavior
for I and D accesses; then again, it might not.
If the hardware guarantees full coherence of the D-cache on all
PA-RISC machines except the V class, then my concern is misplaced.
In that case, coherent_system should be 1 and the
",co" completer should have no effect on the semaphore operation,
other than to reduce the alignment requirement on PA 2.0 machines.
This would also imply that the store byte operation shown in the
semaphore paper doesn't improve spinlock performance.
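To illustrate the alignment point, here is a sketch of how a lock word
can be found when the stricter alignment applies. It assumes, from my
reading of the architecture, that plain ldcw needs a 16-byte aligned
word. The type and function names are hypothetical. A lock padded to
four words always contains exactly one word at a 16-byte boundary, so
the structure itself only needs ordinary word alignment.

```c
#include <stdint.h>

#define SKETCH_LDCW_ALIGNMENT 16 /* assumed requirement for plain ldcw */

/* Hypothetical lock layout: four words, one of which is 16-byte aligned. */
typedef struct {
    volatile unsigned int slot[4];
} sketch_ldcw_lock_t;

/* Return the one word inside the lock that sits on a 16-byte boundary. */
static volatile unsigned int *sketch_ldcw_align(sketch_ldcw_lock_t *l)
{
    uintptr_t p = (uintptr_t)&l->slot[0];

    /* Round up to the next multiple of the alignment. */
    p = (p + SKETCH_LDCW_ALIGNMENT - 1)
        & ~(uintptr_t)(SKETCH_LDCW_ALIGNMENT - 1);
    return (volatile unsigned int *)p;
}
```

With the reduced alignment requirement, the padding and the rounding
step would be unnecessary and the lock could shrink to a single word.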
Are there any PA 2.0 machines for which the value of coherent_system
is 0? If so, which ones?
Dave
--
J. David Anglin dave.anglin at nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)