|
List Info
Thread: Coroutines without explicit support in the VM
|
|
| Coroutines without explicit support in
the VM |
  United States |
2007-08-21 11:15:08 |
It occurred to me that a Neko-based language might be able
to implement
coroutines without adding support for coroutines to the VM.
After all,
it's been done with JavaScript; see JavaScript Strands
(http://ww
w.xucia.com/strands-doc/index.html). So what would be
the
limitations of not having explicit coroutine support in the
VM?
Matt
--
Neko : One VM to run them all
(http://nekovm.org)
|
|
| Re: Coroutines without explicit support
in the VM |
  France |
2007-08-22 12:27:44 |
Matt Campbell a écrit :
> It occurred to me that a Neko-based language might be
able to implement
> coroutines without adding support for coroutines to the
VM. After all,
> it's been done with JavaScript; see JavaScript Strands
> (http://ww
w.xucia.com/strands-doc/index.html). So what would be
the
> limitations of not having explicit coroutine support in
the VM?
Yes, that would be possible.
Nicolas
--
Neko : One VM to run them all
(http://nekovm.org)
|
|
| Re: Coroutines without explicit support
in the VM |
  United States |
2007-08-22 13:36:24 |
Nicolas Cannasse wrote:
> Yes, that would be possible.
So do you think it would be desirable to have coroutines at
hte VM
level? I see this dilemma with VM-level coroutines: If you
implement
them as Lua does, with only a VM stack per coroutine, then
the
implementation is portable and has low memory usage per
coroutine but
can't be used with JIT; but if you implement them as LuaJIT
and the Io
language do, with a CPU-level stack per coroutine, then this
requires
platform-specific magic and (I think) has higher memory
usage per
coroutine. By implementing coroutines in a higher-level
compiler and a
runtime library, as JavaScript Strands does, this dilemma
would be
avoided. Maybe haXe could have optional coroutines; I would
make this
an option configurable at compile time, because coroutines
as
implemented by JS Strands would probably introduce
unacceptable overhead
in some cases.
Matt
--
Neko : One VM to run them all
(http://nekovm.org)
|
|
| Re: Coroutines without explicit support
in the VM |
  Australia |
2007-08-22 21:01:01 |
On Wed, 2007-08-22 at 13:36 -0500, Matt Campbell wrote:
> Nicolas Cannasse wrote:
> > Yes, that would be possible.
>
> So do you think it would be desirable to have
coroutines at hte VM
> level? I see this dilemma with VM-level coroutines:
If you implement
> them as Lua does, with only a VM stack per coroutine,
then the
> implementation is portable and has low memory usage per
coroutine but
> can't be used with JIT; but if you implement them as
LuaJIT and the Io
> language do, with a CPU-level stack per coroutine, then
this requires
> platform-specific magic and (I think) has higher memory
usage per
> coroutine. By implementing coroutines in a
higher-level compiler and a
> runtime library, as JavaScript Strands does, this
dilemma would be
> avoided. Maybe haXe could have optional coroutines; I
would make this
> an option configurable at compile time, because
coroutines as
> implemented by JS Strands would probably introduce
unacceptable overhead
> in some cases.
Felix uses the higher level compiler trick, and generates
high
performance machine binaries. There's no reason for
coroutines
to impose any particularly onerous overhead.
A JIT for a VM would not be using the machine stack for
user data anyhow: it would be using the machine stack only
for transient temporary values. The JIT has to respect the
data structures used by VM-emulation (otherwise JIT
generated
machine code won't interoperate with interpreted code).
Coroutines use hardly any store. In Felix:
channel: one bit flag + linked list of threads
= one word cost
thread: pointer to current continuation:
= one word cost
linked list of threads: two words per node
scheduler_queue: linked_list of threads
continuation:
caller return continuation: one word
current frame address: one word
service request pointer: one word
There is extra store in the continuation object for every
piece of data it needs to access (i.e. local variables).
Threads communicate on channels, using global data,
or via pointers passed as arguments when they're
constructed.
Each continuation consists of flat code with only gotos,
PLUS two special instructions:
CALL: returns a new continuation to execute
RETURN: returns the callers continuation
which the driver code executes by simply resuming
the returned object (until NULL is returned).
Although Felix generates C++ classes to do all this, and
has a lot of sophisticated optimisations to eliminate most
of the CALL/RETURN overhead (eg inlining), a VM could use
a simple mechanical layout, and the JIT would respect this
too: the JIT doesn't handle thread interleaving, it just
generates the flat code of the continuation body.
In particular note that CALL and RETURN work by *returning*
control, and so can ONLY be used when the machine stack is
empty. Instead, the stack consists of heap allocated
objects
linked together, which the GC cleans up.
This mechanism has been tested on my AMD64 3200 1G box,
and can handle in excess of a million threads with a
transaction
rate over 500K transactions per second.
A VM implementation should be just as fast IMHO .. since
the
Felix machine code generated is just machine code following
an abstract protocol anyhow. The only difference for Neko
would
be you'd be emulating the C++ with C eg the 'resume()'
method
of each continuation would be a pointer (instead of using
C++
generated virtual table dispatch).
In the JIT this pointer would be to generated machine code.
In the VM emulation, it would be a pointer to a thunk that
invokes the VM on the remaining data which would be
bytecode.
(or something similar).
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
--
Neko : One VM to run them all
(http://nekovm.org)
|
|
| Re: Coroutines without explicit support
in the VM |
  France |
2007-08-24 02:44:31 |
> Felix uses the higher level compiler trick, and
generates high
> performance machine binaries. There's no reason for
coroutines
> to impose any particularly onerous overhead.
I haven't been studying the pro or cons of high and low
level approachs
so I'm not qualified to compared how the two would apply to
Neko.
However, I tend to favor highlevel approaches when they are
possible
since they are less tricky to implement and more
crossplatform.
> A JIT for a VM would not be using the machine stack
for
> user data anyhow: it would be using the machine stack
only
> for transient temporary values. The JIT has to respect
the
> data structures used by VM-emulation (otherwise JIT
generated
> machine code won't interoperate with interpreted
code).
Well, this can be discussed.
Currently the Neko JIT is using the VM stack, but it could
also use the
C stack instead, and only push back values on the VM stack
when calling
a bytecode primitive.
This is possible because JIT is based on a per-function
basis, and
functions can't access outside of their local stack in
Neko.
Nicolas
--
Neko : One VM to run them all
(http://nekovm.org)
|
|
| Re: Coroutines without explicit support
in the VM |
  Australia |
2007-08-24 03:40:20 |
On Fri, 2007-08-24 at 09:44 +0200, Nicolas Cannasse wrote:
> > Felix uses the higher level compiler trick, and
generates high
> > performance machine binaries. There's no reason
for coroutines
> > to impose any particularly onerous overhead.
>
> I haven't been studying the pro or cons of high and low
level approachs
> so I'm not qualified to compared how the two would
apply to Neko.
> However, I tend to favor highlevel approaches when they
are possible
> since they are less tricky to implement and more
crossplatform.
I haven't studied Neko so I can only guess what it does
> > A JIT for a VM would not be using the machine
stack for
> > user data anyhow: it would be using the machine
stack only
> > for transient temporary values. The JIT has to
respect the
> > data structures used by VM-emulation (otherwise
JIT generated
> > machine code won't interoperate with interpreted
code).
>
> Well, this can be discussed.
>
> Currently the Neko JIT is using the VM stack, but it
could also use the
> C stack instead, and only push back values on the VM
stack when calling
> a bytecode primitive.
>
> This is possible because JIT is based on a per-function
basis, and
> functions can't access outside of their local stack in
Neko.
Where does the callers "return address" go when
you call a function?
Felix does this in effect:
struct con_t {
int pc;
con_t *caller;
arg_t arg;
con_t *call(con_t *caller_a, args) :
pc(0), caller(caller_a), arg(arg_a) { return this;}
con_t *resume() {
switch (pc) {
case 0:
initially ..
// SUBROUTINE CALL
pc = 1; // return address
return new subroutine()-> call (this,args);
case 1:
...
// RETURN
return caller;
}
}
}
and the driver looks like:
con_t *p = new root_proc()->call(NULL);
while(p) p = p->resume();
What this does is maintain the call stack on the heap.
Each object is a continuation, pointing to its caller stack
frame.
With gcc, a computed goto on a machine address is used
instead
of the switch on an integer case number.
So, switching context is a matter of switching the pointer
p
between all the current continuations (threads =
coroutines).
Felix uses 'service calls' and a service call flag in
addition
to the above to escape the while loop above, but I won't
describe that. Suffice it to say, exchange of control is
managed
by the client issuing service calls to read and write
channels.
[Note .. a service "call" is of course actually a
yield .. :]
An unbalanced read or write on a channel pops the
requesting
thread off the scheduler list, whereas a balanced one
pushes
both the reader(writer) and matching writer(reader) onto
the scheduler list).
Channels are created by the application and owned by
whoever has a pointer to them. If all the owners of a
channel
are suspended waiting for service, then none will ever
come.
And, there are no pointers to these routines (threads).
So the garbage collector just disposes of them.. it's very
cool.
The system cannot deadlock in the sense that necessarily
locked
up threads (hanging on channels that cannot ever be serviced
because
no one else knows them) are unreachable.
Of course starvation is possible, there is absolutely no
fairness, there is little need for locking, but the
cooperating
threads do have to run on a single processor.
The same mechanism can (and often is) used for any VM which
has
a VM stack: just swap the stack. This would work for
machine
stacks too, except:
(a) swapping machine stacks is unreliable, except
by using native pre-emptive threading
(b) machine stacks on OS like Unix have bounded
pre-allocated address space (runs out
fast on 32 bit machine)
As you can see, the principal constraint for the emulated
stack is simply that the machine stack is empty at the
point
control is yielded to the driver. This is almost invariably
the case in a bytecode interpreter anyhow .. unless
primitives
invoke it directly and recursively. In that case, you need
to
arrange for a service call to spawn the invocation instead.
BTW: the swapping of 'call' with 'yield' is known as
'control
inversion' because the master/slave caller/callee
relationship
is inverted. The driver loop 'reinverts' control to effect
the
intended semantics.
Almost all (bytecode) interpreters control invert already.
The magic of Felix is that it makes high performance
machine
code do it transparently.
But the bottom line for Neko is that it shouldn't be all
that hard to implement for the bytecode interpreter/VM,
and the JIT shouldn't be any major hassle. The big PLUS for
bytecode/VM compared with native code is that there is no
*extra* overhead in effecting the control inversion,
since bytecode interpretation already requires primitives
to return on a piecemeal basis.
A VM -- and Ocaml bytecode interpreter does this -- can
also context switch and yield points
"pre-emptively" either
by an emulated timer (in Python, every 3 bytecode
instructions),
or driven by actual OS pre-emptions (as in Ocaml VM) if
they're
available.
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
--
Neko : One VM to run them all
(http://nekovm.org)
|
|
[1-6]
|
|