I was asked by Dean to provide a SIP Expert Review of this draft, with
special attention to section 2.3. In general I like this draft. It begins
to exploit the power of SIP, and to unify it with other protocols. I did
find some issues, all of which should be fairly straightforward to address.
I've broken my comments down by section. In many cases I have included
snippets from the draft for reference and followed those with pertinent
comments.
Thanks,
Paul
Section 2.1:
The ABNF of the URI seems excessively rigid in the ordering of
parameters - first the init-parameters, then the vxml-parameters, then
any other uri-parameters. It also doesn't note that 'uri-parameters' is
defined in RFC 3261.
I would suggest something like:
   dialog-parameters = *(";" dialog-parameter)
   dialog-parameter  = init-param /
                       vxml-param /
                       uri-parameter   ; defined in RFC 3261
   init-param        = (dialog-param /
                       maxage-param /
                       maxstale-param /
                       method-param /
                       postbody-param)
   vxml-param        = vxml-keyword "=" vxml-value
That achieves the same effect but allows all the params in
any order.
This is almost a nit, but the rigid ordering is likely to be
a source of
interop problems.
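As a rough illustration of why order-independence is cheap for a receiver, here is a minimal sketch of a parser that groups parameters by name rather than by position. The parameter names in INIT_PARAM_NAMES are assumptions inferred from the ABNF rule names above, and classify_params is a hypothetical helper, not something defined by the draft.

```python
from urllib.parse import unquote

# Assumed parameter names, inferred from the ABNF rule names; not taken
# from the draft text itself.
INIT_PARAM_NAMES = {"voicexml", "maxage", "maxstale", "method", "postbody"}


def classify_params(param_string):
    """Group ';name=value' URI parameters by name, ignoring their order."""
    groups = {"init": {}, "other": {}}
    for item in param_string.split(";"):
        if not item:
            continue  # skip the empty piece before the leading ';'
        name, _, value = item.partition("=")
        kind = "init" if name.lower() in INIT_PARAM_NAMES else "other"
        groups[kind][name.lower()] = unquote(value)  # unescape exactly once
    return groups


a = classify_params(";maxage=60;voicexml=http%3A//example.com/a.vxml;transport=tcp")
b = classify_params(";transport=tcp;voicexml=http%3A//example.com/a.vxml;maxage=60")
```

Both orderings yield the same grouping, which is the behavior the relaxed ABNF would permit.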
I'm confused about postbody:
postbody: Used to set the
application/x-www-form-urlencoded encoded
[HTML4] HTTP body for "post" requests (or
is otherwise ignored).
The postbody value is the prepared application/
x-www-form-urlencoded content, subsequently
URL-encoded (see note
below).
...
Note: Special characters in Request-URI parameter values
need to be
URL-encoded as required by the SIP URI syntax, for
example '?' (%3f),
'=' (%3d), and ';' (%3b). The VoiceXML Media Server
MUST therefore
unescape Request-URI parameter values before making use
of them or
exposing them to running VoiceXML applications. It is
important that
the VoiceXML Media Server only unescape the parameter
values once
since the desired VoiceXML URI value could itself be URL
encoded, for
example. When a postbody is included, its entire
content including
any line breaks (represented by a CR LF pair) is encoded
as a single
parameter value following the above rules (such that the
line breaks
would be replaced by '%0D%0A', for example).
[HTML4] says:
application/x-www-form-urlencoded
This is the default content type. Forms submitted with
this content
type must be encoded as follows:
1. Control names and values are escaped. Space
characters are
replaced by `+', and then reserved characters are
escaped as
described in [RFC1738], section 2.2: Non-alphanumeric
characters
are replaced by `%HH', a percent sign and two
hexadecimal digits
representing the ASCII code of the character. Line
breaks are
represented as "CR LF" pairs (i.e.,
`%0D%0A').
2. The control names/values are listed in the order they
appear in
the document. The name is separated from the value by
`=' and
name/value pairs are separated from each other by
`&'.
The interaction between these two escaping rules seems
potentially
confusing. I *think* when this is all put together it means
that the
body must first be encoded according to the [HTML4] section
above. At
that point it will already be almost conformant to the 3261
syntax of a
token, except for the use of '&'. Then it needs to be
encoded again,
which will take care of the ampersands, but which will
re-encode the '%'
characters of the first encoding.
Exactly what, if anything, I would recommend changing
depends on whether
I understood what is expected. I guess I might just
recommend that you
clarify further. (Perhaps I'm just being dense. If so please
just tell
me so.)
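To make the two layers concrete, here is a minimal sketch of my reading of the double encoding, using Python's standard urllib.parse. The form fields are invented for illustration, and quote() with safe="" escapes more characters than the SIP URI grammar strictly requires, so treat it as a conservative approximation rather than the draft's exact rule.

```python
from urllib.parse import urlencode, quote, unquote, parse_qs

# Hypothetical form fields, chosen so the body contains ' ' and '&'.
fields = {"name": "John Doe", "note": "a&b"}

# Step 1: [HTML4] application/x-www-form-urlencoded encoding.
body = urlencode(fields)

# Step 2: escape the whole body again so it can travel as a single
# SIP URI parameter value. Note that the '%' from step 1 becomes '%25'.
postbody = quote(body, safe="")

# Receiver: unescape exactly once to recover the form-encoded body.
recovered = unquote(postbody)
decoded_once = parse_qs(recovered)

# Unescaping a second time corrupts values that contained '&'.
decoded_twice = parse_qs(unquote(recovered))
```

Here body is name=John+Doe&note=a%26b and postbody is name%3DJohn%2BDoe%26note%3Da%2526b. Decoding once gives back the original fields; decoding twice splits the note value at the ampersand, which is why the "unescape only once" rule matters.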
Section 2.2:
The Application Server SHOULD insert its own URI in the
Record-Route
header so that it remains in the signaling path for
subsequent
signaling related to the session. This is of particular
importance
for call transfers so that upstream Application Servers
or proxy
servers see signaling originating from the Application
Server and not
the VoiceXML Media Server itself.
I don't understand the purpose of the above. The SHOULD
strength of this
requirement suggests to me an assumption of a particular
operating
environment. In the general case, why should this be more
than MAY strength?
Section 2.3:
IMO the use of a media-less session is an entirely valid SIP usage. The
only concern I might potentially have is if it were to catch some UA
unaware, because some UAs just aren't prepared to handle this case. But
the recommended usage here always puts the choice of doing this in the
hands of the *other* UA, not the media server. So I see no problem.
I do find it disconcerting that the initial INVITE and subsequent
re-INVITEs are handled in different ways. For the initial one you stall
the VXML awaiting a media stream, but on subsequent ones you assume the
absence of a stream is equivalent to having a stream that doesn't send
anything. Why isn't the behavior consistent in these cases? If you need
to support both behaviors, then it might be better to use explicit and
unique signaling for each. For instance, you might define
that a media
stream with a=inactive or a=sendonly (from the client's
perspective -
client putting media server "on hold") could be
treated as the absence
of input but the absence of the stream means you should wait
for a
stream to be negotiated.
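A minimal sketch of the distinction suggested above, assuming the media server inspects the client's SDP as plain text. The function name and the three state labels are hypothetical, and the sketch does not distinguish session-level from media-level direction attributes.

```python
def input_state(sdp):
    """Classify the client's SDP from the media server's point of view."""
    lines = [line.strip() for line in sdp.splitlines()]
    # An m-line with port 0 is a rejected or disabled stream.
    active_media = [
        line for line in lines
        if line.startswith("m=") and line.split(" ")[1] != "0"
    ]
    if not active_media:
        return "wait-for-stream"   # no stream: wait for one to be negotiated
    if any(line in ("a=inactive", "a=sendonly") for line in lines):
        return "no-input"          # stream exists, but client sends nothing
    return "input"


on_hold = "v=0\r\nm=audio 49170 RTP/AVP 0\r\na=sendonly\r\n"
no_media = "v=0\r\n"
active = "v=0\r\nm=audio 49170 RTP/AVP 0\r\n"
```

With explicit signaling like this, the same rule could apply to the initial INVITE and to re-INVITEs alike.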
Section 2.4:
I don't feel qualified to comment on the validity of the
mappings of
History-Info. It might be good to get somebody else who
knows it well to
comment on that.
In addition, the array's toString() function returns the full SIP
Request-URI. For example, assuming a Request-URI of
sip:dialog@example.com;voicexml=http://example.com;obj={"x":1,"y":true} then
I don't believe the above URI is valid. The ',' '{' and '}'
aren't
syntactically correct in a URI pvalue. You would need to
escape them.
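For illustration, here is one way the obj value could be escaped, sketched with Python's urllib.parse.quote. Using safe="" is a conservative assumption: it also escapes the double quotes and colons, which goes beyond the three characters named above.

```python
from urllib.parse import quote, unquote

obj = '{"x":1,"y":true}'

# Percent-encode everything outside the unreserved set.
escaped = quote(obj, safe="")

# A single unescape on receipt restores the original value.
restored = unquote(escaped)
```

This produces obj=%7B%22x%22%3A1%2C%22y%22%3Atrue%7D as the parameter.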
Section 2.6.2:
IIUC, message (1) contains no offer, (5) contains an offer
with media,
and (6) accepts the call but rejects the media. If so, then
(9) will
most likely be invalid. To make it valid, the o-line from (8) needs to
be replaced with one consistent with that used in (6), with the version
number incremented. And if (5) has more m-lines than (8), then (9) needs
to be padded with extra (rejected) m-lines.
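The two adjustments can be sketched as simple text operations on SDP lines. Both helper names and all SDP values below are invented for illustration; a real implementation would work on a parsed session description.

```python
def bump_origin_version(o_line):
    """Return the o= line with its version field incremented by one."""
    # o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address>
    fields = o_line.split(" ")
    fields[2] = str(int(fields[2]) + 1)
    return " ".join(fields)


def pad_rejected_media(new_m_lines, earlier_m_lines):
    """Append rejected copies of any earlier m-lines missing from the new SDP."""
    padded = list(new_m_lines)
    for m_line in earlier_m_lines[len(new_m_lines):]:
        fields = m_line.split(" ")
        fields[1] = "0"  # port 0 marks the stream as rejected
        padded.append(" ".join(fields))
    return padded


new_origin = bump_origin_version("o=- 3000 1 IN IP4 ms.example.com")
new_media = pad_rejected_media(
    ["m=audio 5004 RTP/AVP 0"],
    ["m=audio 49170 RTP/AVP 0", "m=video 51372 RTP/AVP 31"],
)
```

In this example the origin version goes from 1 to 2, and the video stream from the earlier offer is carried forward with port 0.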
Section 5.2:
On receipt of the REFER request, the VoiceXML Media
Server MUST issue
a provisional response, 100 Trying. The 202 Accepted
response
indicates that the VoiceXML document has been fetched
and parsed
correctly. The VoiceXML Media Server proceeds to place
the outbound
INVITE and will execute the application after the ACK is
sent.
The rules of RFC 4320 need to be followed here. REFER is a non-INVITE
transaction and so the timing of the 100 must be as specified in
RFC 4320.
In the call flow, the sending of the initial NOTIFY before
the 202 for
the REFER, and especially before determining if REFER is
going to
succeed or fail, is at best unusual and almost certainly
incorrect.
Sending the NOTIFY and then sending a failure response would
certainly
be incorrect.
I think you have two choices:
- wait until the GET is complete before sending the NOTIFY, and probably
  send it after the 202.
- send a 202 for the REFER before doing the GET. Inform of a GET failure
  via a NOTIFY.
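A minimal sketch of the second choice, to show the message ordering only. The function, its two callback parameters, and the message labels are all hypothetical placeholders, not real SIP stack calls.

```python
def handle_refer(fetch_document, send):
    """Accept the REFER first, then report the fetch outcome via NOTIFY."""
    send("202 Accepted")             # final response to the REFER
    send("NOTIFY (in progress)")     # initial NOTIFY, sent after the 202
    try:
        fetch_document()             # the HTTP GET of the VoiceXML document
    except OSError:
        send("NOTIFY (fetch failed)")  # failure reported in-subscription
        return False
    return True


def failing_fetch():
    raise OSError("fetch failed")


ok_log, fail_log = [], []
ok_result = handle_refer(lambda: None, ok_log.append)
fail_result = handle_refer(failing_fetch, fail_log.append)
```

Because the 202 is sent before the fetch, a NOTIFY is never followed by a failure response to the REFER itself.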
Section 6.3:
In the call flow I think you probably need another NOTIFY between
messages (6) and (7). It's potentially too long until (13).
_______________________________________________
Sip mailing list https://www1.ietf.org/mailman/listinfo/sip