List Info

Thread: Coroutines without explicit support in the VM




Coroutines without explicit support in the VM
country flaguser name
United States
2007-08-21 11:15:08
It occurred to me that a Neko-based language might be able
to implement 
coroutines without adding support for coroutines to the VM. 
After all, 
it's been done with JavaScript; see JavaScript Strands 
(http://ww
w.xucia.com/strands-doc/index.html).  So what would be
the 
limitations of not having explicit coroutine support in the
VM?

Matt

-- 
Neko : One VM to run them all
(http://nekovm.org)

Re: Coroutines without explicit support in the VM
country flaguser name
France
2007-08-22 12:27:44
Matt Campbell a écrit :
> It occurred to me that a Neko-based language might be
able to implement
> coroutines without adding support for coroutines to the
VM.  After all,
> it's been done with JavaScript; see JavaScript Strands
> (http://ww
w.xucia.com/strands-doc/index.html).  So what would be
the
> limitations of not having explicit coroutine support in
the VM?

Yes, that would be possible.

Nicolas

-- 
Neko : One VM to run them all
(http://nekovm.org)

Re: Coroutines without explicit support in the VM
country flaguser name
United States
2007-08-22 13:36:24
Nicolas Cannasse wrote:
> Yes, that would be possible.

So do you think it would be desirable to have coroutines at
hte VM 
level?  I see this dilemma with VM-level coroutines:  If you
implement 
them as Lua does, with only a VM stack per coroutine, then
the 
implementation is portable and has low memory usage per
coroutine but 
can't be used with JIT; but if you implement them as LuaJIT
and the Io 
language do, with a CPU-level stack per coroutine, then this
requires 
platform-specific magic and (I think) has higher memory
usage per 
coroutine.  By implementing coroutines in a higher-level
compiler and a 
runtime library, as JavaScript Strands does, this dilemma
would be 
avoided.  Maybe haXe could have optional coroutines; I would
make this 
an option configurable at compile time, because coroutines
as 
implemented by JS Strands would probably introduce
unacceptable overhead 
in some cases.

Matt

-- 
Neko : One VM to run them all
(http://nekovm.org)

Re: Coroutines without explicit support in the VM
country flaguser name
Australia
2007-08-22 21:01:01
On Wed, 2007-08-22 at 13:36 -0500, Matt Campbell wrote:
> Nicolas Cannasse wrote:
> > Yes, that would be possible.
> 
> So do you think it would be desirable to have
coroutines at hte VM 
> level?  I see this dilemma with VM-level coroutines: 
If you implement 
> them as Lua does, with only a VM stack per coroutine,
then the 
> implementation is portable and has low memory usage per
coroutine but 
> can't be used with JIT; but if you implement them as
LuaJIT and the Io 
> language do, with a CPU-level stack per coroutine, then
this requires 
> platform-specific magic and (I think) has higher memory
usage per 
> coroutine.  By implementing coroutines in a
higher-level compiler and a 
> runtime library, as JavaScript Strands does, this
dilemma would be 
> avoided.  Maybe haXe could have optional coroutines; I
would make this 
> an option configurable at compile time, because
coroutines as 
> implemented by JS Strands would probably introduce
unacceptable overhead 
> in some cases.

Felix uses the higher level compiler trick, and generates
high
performance machine binaries. There's no reason for
coroutines
to impose any particularly onerous overhead.

A JIT for a VM would not be using the machine stack for
user data anyhow: it would be using the machine stack only
for transient temporary values. The JIT has to respect the
data structures used by VM-emulation (otherwise JIT
generated
machine code won't interoperate with interpreted code).

Coroutines use hardly any store. In Felix:

	channel: one bit flag + linked list of threads
		= one word cost

	thread: pointer to current continuation:
		= one word cost

	linked list of threads: two words per node

	scheduler_queue: linked_list of threads

	continuation: 
		caller return continuation: one word
		current frame address: one word
		service request pointer: one word

There is extra store in the continuation object for every
piece of data it needs to access (i.e. local variables).
	
Threads communicate on channels, using global data,
or via pointers passed as arguments when they're
constructed.

Each continuation consists of flat code with only gotos,
PLUS two special instructions:

	CALL: returns a new continuation to execute
	RETURN: returns the callers continuation

which the driver code executes by simply resuming
the returned object (until NULL is returned).

Although Felix generates C++ classes to do all this, and
has a lot of sophisticated optimisations to eliminate most
of the CALL/RETURN overhead (eg inlining), a VM could use
a simple mechanical layout, and the JIT would respect this
too: the JIT doesn't handle thread interleaving, it just
generates the flat code of the continuation body.

In particular note that CALL and RETURN work by *returning*
control, and so can ONLY be used when the machine stack is
empty. Instead, the stack consists of heap allocated
objects
linked together, which the GC cleans up.

This mechanism has been tested on my AMD64 3200 1G box,
and can handle in excess of a million threads with a
transaction
rate over 500K transactions per second.

A VM implementation should be just as fast IMHO .. since
the
Felix machine code generated is just machine code following
an abstract protocol anyhow. The only difference for Neko
would
be you'd be emulating the C++ with C eg the 'resume()'
method
of each continuation would be a pointer (instead of using
C++
generated virtual table dispatch).

In the JIT this pointer would be to generated machine code.
In the VM emulation, it would be a pointer to a thunk that
invokes the VM on the remaining data which would be
bytecode.
(or something similar).


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

-- 
Neko : One VM to run them all
(http://nekovm.org)

Re: Coroutines without explicit support in the VM
country flaguser name
France
2007-08-24 02:44:31
> Felix uses the higher level compiler trick, and
generates high
> performance machine binaries. There's no reason for
coroutines
> to impose any particularly onerous overhead.

I haven't been studying the pro or cons of high and low
level approachs
so I'm not qualified to compared how the two would apply to
Neko.
However, I tend to favor highlevel approaches when they are
possible
since they are less tricky to implement and more
crossplatform.

> A JIT for a VM would not be using the machine stack
for
> user data anyhow: it would be using the machine stack
only
> for transient temporary values. The JIT has to respect
the
> data structures used by VM-emulation (otherwise JIT
generated
> machine code won't interoperate with interpreted
code).

Well, this can be discussed.

Currently the Neko JIT is using the VM stack, but it could
also use the
C stack instead, and only push back values on the VM stack
when calling
a bytecode primitive.

This is possible because JIT is based on a per-function
basis, and
functions can't access outside of their local stack in
Neko.

Nicolas

-- 
Neko : One VM to run them all
(http://nekovm.org)

Re: Coroutines without explicit support in the VM
country flaguser name
Australia
2007-08-24 03:40:20
On Fri, 2007-08-24 at 09:44 +0200, Nicolas Cannasse wrote:
> > Felix uses the higher level compiler trick, and
generates high
> > performance machine binaries. There's no reason
for coroutines
> > to impose any particularly onerous overhead.
> 
> I haven't been studying the pro or cons of high and low
level approachs
> so I'm not qualified to compared how the two would
apply to Neko.
> However, I tend to favor highlevel approaches when they
are possible
> since they are less tricky to implement and more
crossplatform.

I haven't studied Neko so I can only guess what it does 

> > A JIT for a VM would not be using the machine
stack for
> > user data anyhow: it would be using the machine
stack only
> > for transient temporary values. The JIT has to
respect the
> > data structures used by VM-emulation (otherwise
JIT generated
> > machine code won't interoperate with interpreted
code).
> 
> Well, this can be discussed.
> 
> Currently the Neko JIT is using the VM stack, but it
could also use the
> C stack instead, and only push back values on the VM
stack when calling
> a bytecode primitive.
> 
> This is possible because JIT is based on a per-function
basis, and
> functions can't access outside of their local stack in
Neko.

Where does the callers "return address" go when
you call a function?

Felix does this in effect:

struct con_t { 
	int pc; 
	con_t *caller; 
	arg_t arg;

	con_t *call(con_t *caller_a, args) : 
		pc(0), caller(caller_a), arg(arg_a) { return this;}

	con_t *resume() {	
		switch (pc) {
		case 0:
			initially ..
			// SUBROUTINE CALL 
			pc = 1; // return address
			return new subroutine()-> call (this,args);
		case 1:
			...
			// RETURN
			return caller;
		}
	}
}

and the driver looks like:

	con_t *p = new root_proc()->call(NULL);
	while(p) p = p->resume();

What this does is maintain the call stack on the heap.
Each object is a continuation, pointing to its caller stack
frame.

With gcc, a computed goto on a machine address is used
instead 
of the switch on an integer case number.

So, switching context is a matter of switching the pointer
p
between all the current continuations (threads =
coroutines).

Felix uses 'service calls' and a service call flag in
addition
to the above to escape the while loop above, but I won't
describe that. Suffice it to say, exchange of control is
managed
by the client issuing service calls to read and write
channels.
[Note .. a service "call" is of course actually a
yield .. :]

An unbalanced read or write on a channel pops the
requesting
thread off the scheduler list, whereas a balanced one
pushes
both the reader(writer) and matching writer(reader) onto
the scheduler list).

Channels are created by the application and owned by 
whoever has a pointer to them. If all the owners of a
channel
are suspended waiting for service, then none will ever
come.
And, there are no pointers to these routines (threads).

So the garbage collector just disposes of them.. it's very
cool.

The system cannot deadlock in the sense that necessarily
locked
up threads (hanging on channels that cannot ever be serviced
because
no one else knows them) are unreachable.

Of course starvation is possible, there is absolutely no
fairness, there is little need for locking, but the
cooperating
threads do have to run on a single processor.

The same mechanism can (and often is) used for any VM which
has
a VM stack: just swap the stack. This would work for
machine
stacks too, except:

	(a) swapping machine stacks is unreliable, except
		by using native pre-emptive threading

	(b) machine stacks on OS like Unix have bounded
		pre-allocated address space (runs out
		fast on 32 bit machine)

As you can see, the principal constraint for the emulated
stack is simply that the machine stack is empty at the
point
control is yielded to the driver. This is almost invariably
the case in a bytecode interpreter anyhow .. unless
primitives
invoke it directly and recursively. In that case, you need
to
arrange for a service call to spawn the invocation instead.

BTW: the swapping of 'call' with 'yield' is known as
'control
inversion' because the master/slave caller/callee
relationship
is inverted. The driver loop 'reinverts' control to effect
the
intended semantics.

Almost all (bytecode) interpreters control invert already.
The magic of Felix is that it makes high performance
machine
code do it transparently.

But the bottom line for Neko is that it shouldn't be all
that hard to implement for the bytecode interpreter/VM,
and the JIT shouldn't be any major hassle. The big PLUS for
bytecode/VM compared with native code is that there is no
*extra* overhead in effecting the control inversion,
since bytecode interpretation already requires primitives
to return on a piecemeal basis.

A VM -- and Ocaml bytecode interpreter does this -- can
also context switch and yield points
"pre-emptively" either
by an emulated timer (in Python, every 3 bytecode
instructions),
or driven by actual OS pre-emptions (as in Ocaml VM) if
they're
available.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

-- 
Neko : One VM to run them all
(http://nekovm.org)

[1-6]

about | contact  Other archives ( Real Estate discussion Medical topics )