
Thread: Codecs implementation




Codecs implementation
user name
2006-12-12 11:25:47
Hi devs,

Currently I'm working on the Speex implementation. It's almost done, but I
have some problems computing the duration of the media from the input data.

So, as I'm nearly done with the Speex codec, I need some help.

Emil sent me an iLBC implementation in Java written by Jean Lorchat, and I
started looking at it and at how it can be plugged into JMF. But I have
some difficulties; maybe Jean can help me :)
The encoder and decoder require byte arrays as input and output. As I'm
not very familiar with this codec, I don't know, for example, how to
compute the length of the output array for the IlbcDecoder from the
length of the input one. There is also the same problem as with Speex:
how to compute the duration of the media in milliseconds from the given
length of the data.

damencho


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

Codecs implementation
user name
2006-12-12 12:34:21
Hi damencho,

I hear someone is talking about me 

> Currently I'm working on the Speex implementation. It's almost done, but I
> have some problems computing the duration of the media from the input
> data.

Shouldn't be hard to fix. Let's discuss iLBC, shall we.

> The encoder and decoder require byte arrays as input and output. As I'm
> not very familiar with this codec, I don't know, for example, how to
> compute the length of the output array for the IlbcDecoder from the
> length of the input one.

As per the RFC specification of the iLBC codec (RFC 3951), the input data
MUST be 16-bit samples at 8000 Hz. To put it in friendlier words, the data
comes as 16-bit elements (a short int), and you need 8000 such elements
(samples) to represent one second.

Now, we still have to talk about how to feed it to the codec. Once again
we refer to the RFC, and it says that iLBC can operate in two modes. It
always has to handle data in blocks, but it can do so with 20 ms or 30 ms
blocks. Since we have 8000 samples per second, an input block is exactly
160 samples (in 20 ms mode) or 240 samples (in 30 ms mode).
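The arithmetic in the two paragraphs above fits in a small helper. This is a minimal sketch that assumes mono 16-bit PCM at 8000 Hz; the class and method names are illustrative and are not part of the iLBC code Jean sent.

```java
/**
 * Arithmetic for 8000 Hz, 16-bit, mono linear PCM (the iLBC input format).
 * Assumption: mono audio; stereo would double every byte count.
 */
final class PcmMath {
    static final int SAMPLE_RATE_HZ = 8000;
    static final int BYTES_PER_SAMPLE = 2; // one 16-bit short per sample

    private PcmMath() {}

    /** Duration in milliseconds of a PCM buffer of the given length in bytes. */
    static double durationMs(int pcmBytes) {
        int samples = pcmBytes / BYTES_PER_SAMPLE;
        return samples * 1000.0 / SAMPLE_RATE_HZ;
    }

    /** Samples in one codec block; iLBC only allows 20 ms or 30 ms blocks. */
    static int samplesPerBlock(int blockMs) {
        if (blockMs != 20 && blockMs != 30) {
            throw new IllegalArgumentException("iLBC blocks are 20 or 30 ms, got " + blockMs);
        }
        return SAMPLE_RATE_HZ * blockMs / 1000;
    }

    /** Size in bytes of one PCM block of the given duration. */
    static int pcmBytesPerBlock(int blockMs) {
        return samplesPerBlock(blockMs) * BYTES_PER_SAMPLE;
    }
}
```

For example, durationMs(320) is 20.0, samplesPerBlock(30) is 240 and pcmBytesPerBlock(30) is 480; durationMs(2000) gives 125.0, the figure Jean works out later in the thread for a 2000-byte JMF buffer.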

I'll have to look at the code again, but I think there is a version of
the coding/decoding functions that works with short[]. Otherwise, you have
to split each 16-bit value into two bytes of the byte array. This means
that if you use byte arrays, they are 320 bytes long (20 ms mode) or 480
bytes long (30 ms mode).
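If the codec wrapper only accepts byte arrays, the short[]/byte[] split can be done with java.nio.ByteBuffer. A sketch, assuming the caller knows the byte order; little-endian matches the raw 16-bit LE test files mentioned later in the thread, but the byte order of the actual JMF AudioFormat should be checked rather than assumed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Converts between 16-bit samples and their byte[] layout.
 * The byte order is a parameter because it depends on the audio format in use.
 */
final class PcmBytes {
    private PcmBytes() {}

    /** Two bytes per sample; rejects buffers with a dangling half sample. */
    static short[] toShorts(byte[] pcm, ByteOrder order) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of bytes: " + pcm.length);
        }
        short[] samples = new short[pcm.length / 2];
        // The short view inherits the byte order set on the ByteBuffer.
        ByteBuffer.wrap(pcm).order(order).asShortBuffer().get(samples);
        return samples;
    }

    /** Writes each sample as two bytes in the requested order. */
    static byte[] toBytes(short[] samples, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(samples.length * 2).order(order);
        buf.asShortBuffer().put(samples); // writes through to buf's backing array
        return buf.array();
    }
}
```

A 160-sample block (20 ms mode) becomes 320 bytes and a 240-sample block becomes 480 bytes, matching the sizes above.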

Of course, the size of the compressed data is also defined in the RFC.
In the 20 ms mode, each compressed block is 304 bits (38 bytes), and in
the 30 ms mode it is 400 bits (50 bytes). The bitrates are therefore
15.2 kbps and about 13.33 kbps respectively.
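The bitrates follow directly from the frame sizes: bits per frame divided by frame duration. A one-method sketch (the class name is illustrative):

```java
/** Bitrate of a frame-based codec: bits per frame divided by frame duration. */
final class Bitrate {
    private Bitrate() {}

    static double bitsPerSecond(int encodedBytesPerFrame, int frameMs) {
        return encodedBytesPerFrame * 8 * 1000.0 / frameMs;
    }
}
```

bitsPerSecond(38, 20) returns 15200.0 and bitsPerSecond(50, 30) about 13333.3, i.e. 15.2 kbps and roughly 13.33 kbps.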

Symmetrically, if you decode 50 bytes of data, you will get 480 bytes of
sound (240 samples, or 30 ms), and if you decode 38 bytes of data you'll
get 320 bytes of sound (160 samples, or 20 ms).

As a side note, the 20 ms/30 ms mode must be configurable: although iLBC
is interoperable, a 20 ms stream is no good when decoded in 30 ms mode, and
vice versa.
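Putting the decoder-side sizes and the configurable mode together, a sketch could look like the enum below. The enum, its configuration strings and decodedLength() are hypothetical names; only the 20/30 ms, 38/50-byte and 320/480-byte figures come from the text. Keeping the mode explicit matters because a length alone can be ambiguous: 950 bytes is both 25 frames of 38 bytes and 19 frames of 50 bytes.

```java
/** The two iLBC modes with their per-frame sizes; the mode comes from configuration. */
enum IlbcMode {
    MODE_20MS(20, 38),
    MODE_30MS(30, 50);

    final int frameMs;
    final int encodedBytesPerFrame;

    IlbcMode(int frameMs, int encodedBytesPerFrame) {
        this.frameMs = frameMs;
        this.encodedBytesPerFrame = encodedBytesPerFrame;
    }

    /** 16-bit samples at 8000 Hz: 8 samples per ms, 2 bytes per sample. */
    int pcmBytesPerFrame() {
        return frameMs * 8 * 2;
    }

    /** Parses a configured frame length such as "20" or "30" (hypothetical format). */
    static IlbcMode fromConfig(String frameMs) {
        switch (frameMs.trim()) {
            case "20": return MODE_20MS;
            case "30": return MODE_30MS;
            default: throw new IllegalArgumentException("unsupported iLBC mode: " + frameMs);
        }
    }

    /** Size of the decoder output for an encoded input of the given length. */
    int decodedLength(int encodedLength) {
        if (encodedLength % encodedBytesPerFrame != 0) {
            throw new IllegalArgumentException(encodedLength
                + " bytes is not a whole number of " + frameMs + " ms frames");
        }
        return encodedLength / encodedBytesPerFrame * pcmBytesPerFrame();
    }
}
```

With this, IlbcMode.fromConfig("30").decodedLength(50) is 480 and IlbcMode.MODE_20MS.decodedLength(76) is 640 (two 20 ms frames).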

> There is also the same problem as with Speex: how to compute the
> duration of the media in milliseconds from the given length of the data.

As you can see from the iLBC example, it all depends on the
specification. If you have some document about Speex at hand, I'll check
right away. Otherwise I'll dig around the web to find the answers =)

Cheers,
Jean


Codecs implementation
user name
2006-12-12 12:59:43
Hi Jean,
thanks for the quick answer. I think it will be very helpful, and it
answers my questions :)
Actually, the problem with Speex is not the duration I mentioned.
I'm testing with Asterisk, and it seems that when encoding speech,
Asterisk sends RTP packets with more than one frame.
The RFC says the decoder must detect and handle such data when it is
passed to it, but it seems JSpeex doesn't.
I cannot get the number of frames in a packet. The RFC says there is no
point in putting this information in the RTP packet, as the decoder must
detect it.
Anyway, thanks again. I will struggle a little more with Speex and then
start on iLBC. I will write about my progress.

damencho



Codecs implementation
user name
2006-12-12 14:24:01
Hi again,

> I'm testing with Asterisk, and it seems that when encoding speech,
> Asterisk sends RTP packets with more than one frame.

That makes sense if you want to use bandwidth more efficiently, although
you raise the latency at the same time.
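To make that trade-off concrete, the sketch below computes the packetization delay and the on-the-wire bitrate for a given number of frames per packet. It assumes IPv4, UDP and RTP headers of 20, 8 and 12 bytes, no header compression, no RTP header extensions and no link-layer overhead; the numbers are illustrative, not measurements.

```java
/** Bandwidth vs. latency when packing several codec frames into one RTP packet. */
final class Packetization {
    // Assumption: IPv4 (20) + UDP (8) + RTP (12) bytes per packet, nothing else.
    static final int IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12;

    private Packetization() {}

    /** Audio that must be collected before the packet can be sent, in ms. */
    static int packetizationDelayMs(int frameMs, int framesPerPacket) {
        return frameMs * framesPerPacket;
    }

    /** Bits per second on the wire, headers included. */
    static double wireBitrate(int frameMs, int encodedBytesPerFrame, int framesPerPacket) {
        double packetsPerSecond = 1000.0 / (frameMs * framesPerPacket);
        int packetBytes = IP_UDP_RTP_HEADER_BYTES + encodedBytesPerFrame * framesPerPacket;
        return packetsPerSecond * packetBytes * 8;
    }
}
```

Using iLBC's 20 ms mode (38-byte frames) as an example, one frame per packet gives 20 ms of delay and 31200 bit/s on the wire; two frames per packet give 40 ms and 23200 bit/s.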

> The RFC says the decoder must detect and handle such data when it is
> passed to it, but it seems JSpeex doesn't.

:(

> I cannot get the number of frames in a packet. The RFC says there is no
> point in putting this information in the RTP packet, as the decoder must
> detect it.

Actually, this makes sense too. I can imagine that the size of a frame
must be fixed, and that the number of frames can therefore be guessed from
the payload size. Then again, I might have missed something. Let me think
about it with the RFCs in hand. I'll be back =)
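Under Jean's guess that every frame in a payload has the same size, counting frames is a division with a remainder check. This is a sketch for a fixed-size codec: it holds for iLBC within one mode (38 or 50 bytes per frame), whereas for Speex it is only an assumption, since the number of bits per frame depends on the encoder mode and need not be a whole number of bytes.

```java
/** Counts frames in an RTP payload, assuming every frame has the same size in bytes. */
final class PayloadFrames {
    private PayloadFrames() {}

    static int frameCount(int payloadLength, int frameBytes) {
        if (frameBytes <= 0) {
            throw new IllegalArgumentException("frame size must be positive");
        }
        if (payloadLength % frameBytes != 0) {
            // A remainder means the fixed-size assumption (or the configured mode) is wrong.
            throw new IllegalArgumentException("payload of " + payloadLength
                + " bytes is not a multiple of " + frameBytes);
        }
        return payloadLength / frameBytes;
    }
}
```

For instance, a 76-byte payload in 20 ms mode holds two frames and a 150-byte payload in 30 ms mode holds three.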

Jean


Codecs implementation
user name
2006-12-13 09:59:36
Hi,

Here is my progress on the Speex codec implementation.
It is recommended to put one frame of Speex data in each RTP packet.
1 frame = 160 samples = 20 ms.
I found some code which counts the samples in given encoded data.
There are two situations when receiving media:
1. The received data is 160 samples and is decoded fine.
2. The received data has a varying number of samples (160, 320, 480).
   It is processed by the decoder, but the sound is garbage. As I read in
   various documents, the decoder must handle this, but it seems it does not.
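The sample counts in case 2 correspond to one, two or three 20 ms frames per packet. The sketch below only does that bookkeeping (it does not fix the garbled decoding); the class name is illustrative, and narrowband Speex at 8000 Hz with 160 samples per frame is assumed, as stated above.

```java
/** Narrowband Speex bookkeeping: 8000 Hz, 160 samples (20 ms) per frame. */
final class SpeexNbFrames {
    static final int SAMPLES_PER_FRAME = 160;
    static final int SAMPLE_RATE_HZ = 8000;

    private SpeexNbFrames() {}

    /** Frames represented by the samples decoded from one packet. */
    static int frames(int decodedSamples) {
        if (decodedSamples % SAMPLES_PER_FRAME != 0) {
            throw new IllegalArgumentException(decodedSamples + " samples is not a whole number of frames");
        }
        return decodedSamples / SAMPLES_PER_FRAME;
    }

    /** Duration of the decoded samples in milliseconds. */
    static int durationMs(int decodedSamples) {
        return decodedSamples * 1000 / SAMPLE_RATE_HZ;
    }
}
```

So 160, 320 and 480 samples map to 1, 2 and 3 frames, i.e. 20, 40 and 60 ms of audio per packet.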
I have written to the JSpeex forum and I'm waiting for a response.
That's all for now. I'm getting on with iLBC right now; I hope it will be
OK :)

damencho


Codecs implementation
user name
2006-12-13 13:06:06
Hi Jean,

The iLBC decoder works fine; now I'm struggling with the encoder. JMF
passes byte buffers with a length of about 2000. I'm trying to process
them in portions. As I understood from the previous mail, the encoder
takes 480 bytes (30 ms mode) and returns 50 bytes of encoded data. Am I
right?
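Processing the buffer in portions can be sketched as a loop over complete 480-byte blocks. The encoder is passed in as a function from one 480-byte PCM block to one 50-byte frame; that function type is an assumption standing in for whatever the IlbcEncoder class actually exposes.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.UnaryOperator;

/** Encodes every complete 480-byte (30 ms) block of a buffer; trailing bytes are ignored. */
final class BlockEncoder {
    static final int PCM_BLOCK = 480;    // 240 samples x 2 bytes, 30 ms mode
    static final int ENCODED_BLOCK = 50; // 400 bits per frame

    private BlockEncoder() {}

    static byte[] encodeWholeBlocks(byte[] data, int offset, int length,
                                    UnaryOperator<byte[]> encodeBlock) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int blocks = length / PCM_BLOCK;
        for (int i = 0; i < blocks; i++) {
            byte[] block = new byte[PCM_BLOCK];
            System.arraycopy(data, offset + i * PCM_BLOCK, block, 0, PCM_BLOCK);
            byte[] encoded = encodeBlock.apply(block); // hypothetical encoder call
            if (encoded.length != ENCODED_BLOCK) {
                throw new IllegalStateException("expected " + ENCODED_BLOCK + " bytes, got " + encoded.length);
            }
            out.write(encoded, 0, encoded.length);
        }
        return out.toByteArray();
    }
}
```

For a 2000-byte buffer this encodes four blocks (1920 bytes) into 200 bytes and leaves the last 80 bytes unprocessed.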

damencho


Codecs implementation
user name
2006-12-14 04:26:13
Hi damencho,

> The iLBC decoder works fine; now I'm struggling with the encoder.

Glad to see it doesn't *only work for me* :-D

> JMF passes byte buffers with a length of about 2000.

Wow... can't we lower this a bit? At 8 kHz, 2000 bytes make 1000 samples
(i.e. 125 ms), which means almost as much latency. Anyway, let's get it
working first.

> I'm trying to process them in portions. As I understood from the
> previous mail, the encoder takes 480 bytes (30 ms mode) and

yes

> returns 50 bytes of encoded data. Am I right?

yes !

A small problem you might have is when sizeof(jmf_byte_buffer) % 480 != 0.
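One common way to handle that case is to keep the leftover bytes and prepend them to the next JMF buffer, so the encoder only ever sees complete blocks. Below is a sketch of such an accumulator; it is a hypothetical helper, not part of JMF or the iLBC code, and it is one approach rather than what the thread settled on.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Collects PCM bytes from buffers of arbitrary length and returns complete blocks,
 * carrying leftover bytes over to the next call. Block size is a parameter:
 * 480 for iLBC 30 ms mode, 320 for 20 ms mode.
 */
final class BlockAccumulator {
    private final int blockSize;
    private final byte[] pending;
    private int pendingLength;

    BlockAccumulator(int blockSize) {
        this.blockSize = blockSize;
        this.pending = new byte[blockSize];
    }

    /** Appends data and returns every block that became complete. */
    List<byte[]> push(byte[] data, int offset, int length) {
        List<byte[]> blocks = new ArrayList<>();
        int pos = offset;
        int end = offset + length;
        while (pos < end) {
            // Fill the pending block as far as the input allows.
            int n = Math.min(blockSize - pendingLength, end - pos);
            System.arraycopy(data, pos, pending, pendingLength, n);
            pendingLength += n;
            pos += n;
            if (pendingLength == blockSize) {
                blocks.add(pending.clone());
                pendingLength = 0;
            }
        }
        return blocks;
    }

    /** Bytes waiting for the next call (always fewer than one block). */
    int pendingBytes() {
        return pendingLength;
    }
}
```

Pushing two 2000-byte buffers into new BlockAccumulator(480) yields four blocks each time, with 80 and then 160 bytes left pending; the fifth block starts with the 80 bytes carried over from the first buffer.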

To all other people (if you read this far... Emil? You there?), I'd also
like to have a small discussion about latency, and I don't know when the
right time to start that thread is. Low latency is a very important
feature, and it's easier to implement it properly from the start, even if
this means some overhead right now. Of course, I'll have more information
to provide on this issue when the native ALSA source is finished... All
right, I'm late. Feel free to hit me (that's why I'm living so far away)...

Cheers,
Jean



Codecs implementation
user name
2006-12-14 06:15:10
Jean Lorchat wrote:
> A small problem you might have is when sizeof(jmf_byte_buffer) % 480 != 0.

I've tried to handle this, but every time I tried, the sound was
garbage. So I tried another example: instead of processing the bytes from
the buffer, I processed bytes that came in portions of 480 bytes and had
already been processed by the decoder (like an echo application), but
this way the sound is also not OK. Could this be something with the
encoder?

damencho


Codecs implementation
user name
2006-12-14 07:55:52
Hi again,

> I've tried to handle this, but every time I tried, the sound was
> garbage. So I tried another example: instead of processing the bytes
> from the buffer, I processed bytes that came in portions of 480 bytes
> and had already been processed by the decoder (like an echo
> application), but this way the sound is also not OK. Could this be
> something with the encoder?

First of all, did you try the code as a standalone application? I mean
based on local files. Since the code is based on the RFC's reference code,
it can work standalone. This is how I tested it back then, and it sounded
fine. HOWEVER, I might have sent you the wrong version.

Steps you can try:

1/ Convert some audio file to raw 16-bit (LE) 8000 Hz.
2/ Feed it to the standalone encoder/decoder application.
   It is going to produce two more files: one compressed stream
   and one decoded stream based on the compressed data.
3/ Listen to the decoded file. If it sounds like the source file,
   there you are. Otherwise, blame me.
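Steps 1 to 3 can also be driven from Java with a small file-based harness. In this sketch the encode and decode functions are hypothetical stand-ins for the real iLBC calls, the block size is a parameter (480 bytes for 30 ms mode, 320 for 20 ms mode), and a trailing partial block is simply dropped.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.UnaryOperator;

/** Reads raw PCM, encodes and decodes it block by block, writes both streams. */
final class RoundTrip {
    private RoundTrip() {}

    static void run(Path rawIn, Path compressedOut, Path decodedOut, int pcmBlock,
                    UnaryOperator<byte[]> encode, UnaryOperator<byte[]> decode)
            throws IOException {
        byte[] pcm = Files.readAllBytes(rawIn);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        // Only complete blocks are processed; a real harness might zero-pad the tail instead.
        for (int pos = 0; pos + pcmBlock <= pcm.length; pos += pcmBlock) {
            byte[] frame = encode.apply(Arrays.copyOfRange(pcm, pos, pos + pcmBlock));
            compressed.write(frame, 0, frame.length);
            byte[] audio = decode.apply(frame);
            decoded.write(audio, 0, audio.length);
        }
        Files.write(compressedOut, compressed.toByteArray());
        Files.write(decodedOut, decoded.toByteArray());
    }
}
```

If the decoded file sounds like the source when played back as 16-bit LE 8000 Hz mono, the codec itself is fine and the problem lies in the JMF plumbing.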

Then, if it works this way, we will make it work the other way. Could you
please describe in more detail what is not working:
. what is the input stream (JMF from a capture device, an audio file, JMF
  from the network)?
. what is the format (compressed iLBC, or 8000 Hz audio)?
. what do you get (obviously awful noise)?

If you want, we can discuss this further on ICQ/IRC.

jean

Codecs implementation
user name
2006-12-14 09:50:22
Hi Jean,

Jean Lorchat wrote:
> To all other people (if you read this far... Emil? You there?), I'd also
> like to have a small discussion about latency, and I don't know when the
> right time to start that thread is. Low latency is a very important
> feature, and it's easier to implement it properly from the start, even if
> this means some overhead right now. Of course, I'll have more information
> to provide on this issue when the native ALSA source is finished... All
> right, I'm late. Feel free to hit me (that's why I'm living so far away)...

I couldn't agree more. Latency is crucial, even more so for us, given all
the java-is-too-slow-for-voip comments we're bound to be getting.

Latency could be coming from any of the following: capture,
encoding/decoding, network streaming and playback. I have never seen an
official study of the impact of each of these in JMF, and therefore in SIP
Communicator. From my experience, however, capture (and possibly playback)
seems to be causing the most trouble, especially on Linux.

JMF's Windows performance pack includes a DirectSound data source, so
things aren't that bad (though they could be better). Linux's performance
pack has no native data source and uses JavaSound, which I believe is the
cause of much of the latency there.

Encoding and decoding are more or less OK even when not implemented
natively (though, once again, I don't have any official figures on how
long they take).

To summarize, I believe that a good study of the various parts of our
audio system (enc/dec, capture, playback and streaming) and their impact
on latency would be very useful and could give us many pointers on how
best to optimize it. If I had to take a stab, however, I'd go for capture
first.
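As a starting point for such a study, the sketch below accumulates per-stage processing times with System.nanoTime(). Stage names are arbitrary labels chosen by the caller, and it only measures how long each call takes (for example one encode or one decode); the buffering delay added by capture and playback does not show up this way and would have to be estimated from buffer sizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Accumulates processing time per named stage and reports averages in milliseconds. */
final class StageTimer {
    // Per stage: {total nanoseconds, number of samples}, kept in insertion order.
    private final Map<String, long[]> totals = new LinkedHashMap<>();

    void record(String stage, long nanos) {
        long[] t = totals.computeIfAbsent(stage, k -> new long[2]);
        t[0] += nanos;
        t[1]++;
    }

    /** Runs the task and records its elapsed wall-clock time under the given stage. */
    void time(String stage, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            record(stage, System.nanoTime() - start);
        }
    }

    /** Average time per call in ms, or 0.0 if the stage was never recorded. */
    double averageMillis(String stage) {
        long[] t = totals.get(stage);
        return t == null || t[1] == 0 ? 0.0 : t[0] / (double) t[1] / 1_000_000.0;
    }
}
```

A codec wrapper could, for instance, call timer.time("encode", ...) around each block and print averageMillis("encode") periodically.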

WDYT?

Cheers
Emil


