Dear all!
We have deployed a SIP server application on a multi-core SPARC
machine. The number of application processes matches the number of
CPUs. All processes wait on the same socket and do the same thing in a
loop: call a blocking recvfrom, then process the received message. The
transport we are using is UDP.
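For readers who want to reproduce the setup, the worker model described
above can be sketched roughly as follows. This is only an illustration,
not our actual server code: open_udp_socket, receive_one,
process_message, spawn_workers and MAX_MSG are hypothetical names, and
the socket is bound to the loopback address to keep the sketch
self-contained.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_MSG 2048 /* hypothetical upper bound on one SIP message */

/* Create a UDP socket bound to 127.0.0.1:port (port 0 lets the kernel
 * pick a free port). Returns the descriptor, or -1 on failure. */
int open_udp_socket(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block in recvfrom until one datagram arrives; returns its length. */
ssize_t receive_one(int fd, char *buf, size_t cap)
{
    struct sockaddr_storage peer;
    socklen_t peer_len = sizeof peer;
    return recvfrom(fd, buf, cap, 0, (struct sockaddr *)&peer, &peer_len);
}

/* Placeholder for the SIP processing step (assumption: does nothing). */
void process_message(const char *msg, size_t len)
{
    (void)msg;
    (void)len;
}

/* Fork nworkers processes that all loop on the same inherited socket. */
int spawn_workers(int fd, int nworkers)
{
    assert(nworkers > 0);
    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            char buf[MAX_MSG];
            for (;;) { /* child: receive, process, repeat */
                ssize_t n = receive_one(fd, buf, sizeof buf);
                if (n >= 0)
                    process_message(buf, (size_t)n);
            }
        }
    }
    return 0;
}
```

Calling spawn_workers(fd, 16) after open_udp_socket would give 16
processes blocked in recvfrom on one socket, which is the situation in
which we see the contention. On Solaris the program has to be linked
with -lsocket -lnsl.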
What we observed is that when we set the number of CPUs to more than
16 (with the number of user application processes set to match),
performance degrades very severely.
We used lockstat to profile the kernel and its lock contention, and
consulted the OpenSolaris source code. What we found is the following:
1) The function "mutex_vector_enter" accounts for 50% of the CPU time.
2) Most calls to mutex_vector_enter come from "cv_wait_sig", and the
call graph is:
recvfrom -> syscall_trap32 -> recvfrom -> recvit -> sotpi_recvmsg
-> so_lock_read_intr -> cv_wait_sig
3) so_lock_read_intr appears to serialize calls to "kstrgetmsg", which
copies data from kernel space to the user buffer. Thus, only one caller
can do this copy at a time. This seems unnecessary for simple UDP
processing: the kernel could hold the lock only while dequeuing the
packet from the socket queue and release it before copying the data
from kernel to user space, as Linux does.
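To make point 3 concrete, here is a user-space sketch of the pattern we
have in mind: the lock is held only while a packet is unlinked from the
queue, and the copy into the caller's buffer happens after the lock is
released. It is an analogy built on pthread mutexes, not Solaris or
Linux kernel code, and struct packet, struct pkt_queue, queue_init,
queue_put and queue_get are all hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet and queue types, for illustration only. */
struct packet {
    struct packet *next;
    size_t len;
    char data[2048];
};

struct pkt_queue {
    pthread_mutex_t lock;
    struct packet *head, *tail;
};

void queue_init(struct pkt_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Producer side: append a copy of the payload. Returns 0 on success. */
int queue_put(struct pkt_queue *q, const void *payload, size_t len)
{
    struct packet *p = malloc(sizeof *p);
    if (p == NULL || len > sizeof p->data) {
        free(p);
        return -1;
    }
    memcpy(p->data, payload, len);
    p->len = len;
    p->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Consumer side: returns the number of bytes copied, or -1 if empty. */
long queue_get(struct pkt_queue *q, void *buf, size_t cap)
{
    /* Critical section: only the pointer manipulation is serialized. */
    pthread_mutex_lock(&q->lock);
    struct packet *p = q->head;
    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);

    if (p == NULL)
        return -1;

    /* The packet is now private to this caller, so the (potentially
     * slow) copy runs without the lock; excess bytes are dropped, as
     * with a datagram read into a short buffer. */
    size_t n = p->len < cap ? p->len : cap;
    memcpy(buf, p->data, n);
    free(p);
    return (long)n;
}
```

With this structure several consumers can be copying different packets
at the same time; they contend only for the short unlink step.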
Our question is: is there an alternative path for "recvfrom" that is
simpler than the current sotpi_recvmsg implementation and does not
hold a lock while copying data from kernel to user space?
We believe there exists a more direct channel between the user
application and the network stack for UDP. Can anyone who knows this
area give us a hand? Thanks!
Cheers
Yours
Jia