Jim Blandy wrote:
> Michael Eager <eager eagercon.com> writes:
>> For example, the SPEs in a Cell processor could be configured
>> to distribute pieces of an array over different SPEs.
>>
>>> How do you declare such an array? How do you index it? What code is
>>> generated for an array access? How does it relate to C's rules for
>>> pointer arithmetic?
>> In UPC (a parallel extension to C) there is a new attribute "shared"
>> which says that data is (potentially) distributed across multiple processors.
>>
>> In UPC, pointer arithmetic works exactly the same as in C: you can
>> compare pointers, subtract them to get a difference, and add integers.
>> The compiler generates code which does the correct computation.
>
> All right. Certainly pointer arithmetic and array indexing need to be
> fixed to handle such arrays. Support for such a system will entail
> having the compiler describe to GDB how to index these things, and
> having GDB understand those descriptions.

This may be something that is better described in an ABI than in
DWARF. The compiler may not know how to translate a pointer into
a physical address. UPC, for example, allows you to specify the number
of threads at runtime.

The compiler certainly can identify that an array or other data
is shared, to use UPC's terminology. From there, the target code
would need to perform some magic to figure out where the address
actually pointed to.

> If those were fixed, how do the other CORE_ADDR uses look to you?
> Say, in the frame code? Or the symtab code?

There are uses of CORE_ADDR values which assume that arithmetic
operations are valid, such as testing whether a PC address is
within a stepping range. These are not likely to cause problems,
because code space generally does conform to the linear space
assumptions that GDB makes.

There are other places where an address is incremented, such as
in displaying memory contents. I doubt that the code knows
what it is displaying; it only displays n words starting at
address x in format z. This would probably give incorrect
results if the data spanned from one processor/thread to another.
(At least at a first approximation, this may well be an acceptable
restriction.)
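
A small simulation shows how a linear walk goes wrong. It assumes 2
threads and a blocking factor of 2, values chosen only for illustration;
`local_mem`, `dump_linear`, and `dump_shared` are hypothetical stand-ins,
not GDB code.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREADS 2  /* assumed thread count for this illustration */
#define BLOCK 2     /* assumed UPC blocking factor */
#define LOCAL_N 4   /* elements per thread (8 total, divisible by BLOCK * NTHREADS) */

/* Simulated per-thread memories: local_mem[t][k] is element k of
   thread t's portion of a shared array of NTHREADS * LOCAL_N ints. */
static int local_mem[NTHREADS][LOCAL_N];

/* Fill the simulated memories so that global element i holds value i,
   using the block-cyclic layout. */
static void
init_shared (void)
{
  for (size_t i = 0; i < NTHREADS * LOCAL_N; i++)
    {
      size_t blockno = i / BLOCK;
      local_mem[blockno % NTHREADS][(blockno / NTHREADS) * BLOCK + i % BLOCK]
        = (int) i;
    }
}

/* Naive dump: treat the address as linear within one thread's memory,
   the way a plain "x/Nw ADDR" walk just increments a CORE_ADDR. */
static void
dump_linear (unsigned thread, size_t start_local, size_t n, int *out)
{
  assert (thread < NTHREADS && start_local + n <= LOCAL_N);
  for (size_t k = 0; k < n; k++)
    out[k] = local_mem[thread][start_local + k];
}

/* Layout-aware dump: map each global element index to its owning
   thread and local slot before reading it. */
static void
dump_shared (size_t start, size_t n, int *out)
{
  assert (start + n <= NTHREADS * LOCAL_N);
  for (size_t k = 0; k < n; k++)
    {
      size_t i = start + k;
      size_t blockno = i / BLOCK;
      out[k] = local_mem[blockno % NTHREADS]
                        [(blockno / NTHREADS) * BLOCK + i % BLOCK];
    }
}
```

After `init_shared ()`, `dump_shared (0, 4, buf)` yields 0, 1, 2, 3, while
`dump_linear (0, 0, 4, buf)` yields 0, 1, 4, 5: the linear walk runs off
the end of thread 0's first block and into its next one, silently
skipping the elements held by thread 1.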

Symtab code would need a hook which converted the ELF
<section,offset> into a <processor,thread,offset> for shared
objects. Again, that would require target-dependent magic.
I think that there's some similarity with TLS handling, but
I haven't looked at this closely. This sounds pretty
straightforward.
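
Roughly, the hook might have the shape below. Everything here is
hypothetical: the type and function names, and the toy layout in which
section 5 holds shared data placed at offset 0x1000 in each thread's
local store, with two threads per processor. A real target would supply
its own mapping, perhaps much as it supplies TLS lookups given a thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input: a symbol's ELF section index and offset. */
struct section_offset
{
  int section;
  uint64_t offset;
};

/* Hypothetical output: where the object lives at run time. */
struct numa_location
{
  unsigned processor;
  unsigned thread;
  uint64_t offset;  /* offset within that thread's local store */
};

/* Hypothetical hook type.  Returns 0 if SO is not in a shared
   section, in which case the ordinary linear address stands. */
typedef int (*shared_symbol_to_numa_ftype) (struct section_offset so,
                                            unsigned thread,
                                            struct numa_location *out);

/* Toy layout, assumed for illustration only. */
#define SHARED_SECTION 5
#define LOCAL_BASE 0x1000u
#define THREADS_PER_PROC 2u

static int
toy_shared_symbol_to_numa (struct section_offset so, unsigned thread,
                           struct numa_location *out)
{
  assert (out != NULL);
  if (so.section != SHARED_SECTION)
    return 0;  /* not shared: keep the linear address */
  out->processor = thread / THREADS_PER_PROC;
  out->thread = thread;
  out->offset = LOCAL_BASE + so.offset;
  return 1;
}
```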

One problem is that it may not be clear whether one has a
pointer to a linear code space or to a distributed NUMA data space.

It might be reasonable to model the linear code space as a 64-bit
CORE_ADDR with the top half zero, while a NUMA address has non-zero
values in the top half. (I don't know whether there might be alias
problems, where zero might be a valid top half for a NUMA address.)

I'd be very happy to work out where to put a hook which would allow me
to translate a NUMA CORE_ADDR into a physical address, setting the
thread appropriately. A bit of a kludge, but probably workable.
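
A sketch of that encoding and of the translation hook, assuming CORE_ADDR
is a 64-bit unsigned integer (stood in for here by `core_addr`). One
possible answer to the aliasing question is to store thread + 1 in the top
half, so that thread 0 never looks like code space; that bias is an
assumption for illustration. `make_numa_addr`, `is_numa_addr`, and
`numa_to_physical` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t core_addr;  /* stand-in for CORE_ADDR, assumed 64-bit */

/* Top 32 bits zero: linear code-space address.  Otherwise the top
   half holds thread + 1 (biased so thread 0 is not mistaken for code
   space) and the bottom half an offset in that thread's local store. */
static core_addr
make_numa_addr (uint32_t thread, uint32_t offset)
{
  assert (thread < UINT32_MAX);
  return (((core_addr) thread + 1) << 32) | offset;
}

static int
is_numa_addr (core_addr a)
{
  return (a >> 32) != 0;
}

/* Hypothetical hook: translate A into an address the target can read,
   setting *CURRENT_THREAD to the owning thread when A is a NUMA
   address.  Code-space addresses pass through unchanged. */
static core_addr
numa_to_physical (core_addr a, uint32_t *current_thread)
{
  if (!is_numa_addr (a))
    return a;
  *current_thread = (uint32_t) (a >> 32) - 1;
  return a & 0xffffffffu;
}
```

With this, a code address such as 0x400000 comes back unchanged and the
current thread is left alone, while `make_numa_addr (7, 0xfff0)` comes
back as 0xfff0 with the current thread set to 7. A real hook would also
have to ask the target to switch to that thread before reading memory.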
--
Michael Eager eager eagercon.com
1960 Park Blvd., Palo Alto, CA 94306 650-325-8077