|
List Info
Thread: Critical CacheMarshaller issue
|
|
| Critical CacheMarshaller issue |

|
2007-11-07 18:31:40 |
A nasty bug, spotted by someone in the user forum (initially
as a CCE)
http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
Copying from the JIRA:
"This is a nasty. What started life as an optimisation
for certain
types of objects in a marshalled stream (Fqn,
GlobalTransactio, String
and Serializable) has become a major limitation in that a
single
stream can only hold up to 32767 different (not equal())
instances of
such objects.
Basically the optimisation was, for example, instead of
writing
"hello" to a stream twice, just write it once and
use a reference for
all subsequent times. Unfortunately this reference was
encoded as a
short, hence the limitation of 32767.
Fixing this will definitely break wire compatibility with
JBoss Cache
2.0.0, although JBC does allow backward compatibility by
specifying
replication version in your configuration, thanks to the
VersionAwareMarshaller. "
So I guess this mandates the need for a CacheMarshaller210.
The
question is how do we fix this. The obvious thing is to
replace the
short references with integers. The 2 ^ 31 - 1 number of
references
this would allow should be plenty! The drawback though, is
larger
streams. 4-byte refs instead of 2-byte refs can be an
unnecessary
overhead especially if objects aren't repeated much.
One thing we could do is just limit ref counting to Fqn and
GlobalTransaction types, of which we know can often be
repeated in a
modification list, and where the optimisation can benefit,
and assume
it isn't worth optimising in the same way for user data
(Strings and
Serializables).
Or maybe we should drop the ref counting altogether, save a
few more
bytes' worth of payload.
Thoughts?
--
Manik Surtani
Lead, JBoss Cache
manik jboss.org
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-07 20:37:44 |
IIRC, this use of references was not an optimization, it was
a
requirement to ensure we retain object identity when we
deserialize. So
I don't think we can drop it for some types.
Manik Surtani wrote:
> A nasty bug, spotted by someone in the user forum
(initially as a CCE)
>
> http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>
> Copying from the JIRA:
>
> "This is a nasty. What started life as an
optimisation for certain types
> of objects in a marshalled stream (Fqn,
GlobalTransactio, String and
> Serializable) has become a major limitation in that a
single stream can
> only hold up to 32767 different (not equal()) instances
of such objects.
>
> Basically the optimisation was, for example, instead of
writing "hello"
> to a stream twice, just write it once and use a
reference for all
> subsequent times. Unfortunately this reference was
encoded as a short,
> hence the limitation of 32767.
>
> Fixing this will definitely break wire compatibility
with JBoss Cache
> 2.0.0, although JBC does allow backward compatibility
by specifying
> replication version in your configuration, thanks to
the
> VersionAwareMarshaller. "
>
> So I guess this mandates the need for a
CacheMarshaller210. The
> question is how do we fix this. The obvious thing is
to replace the
> short references with integers. The 2 ^ 31 - 1 number
of references
> this would allow should be plenty! The drawback
though, is larger
> streams. 4-byte refs instead of 2-byte refs can be an
unnecessary
> overhead especially if objects aren't repeated much.
>
> One thing we could do is just limit ref counting to Fqn
and
> GlobalTransaction types, of which we know can often be
repeated in a
> modification list, and where the optimisation can
benefit, and assume it
> isn't worth optimising in the same way for user data
(Strings and
> Serializables).
>
> Or maybe we should drop the ref counting altogether,
save a few more
> bytes' worth of payload.
>
> Thoughts?
> --
> Manik Surtani
> Lead, JBoss Cache
> manik jboss.org
>
>
>
>
>
>
> _______________________________________________
> jbosscache-dev mailing list
> jbosscache-dev lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
--
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberry redhat.com
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-07 21:00:14 |
Manik Surtani wrote:
> A nasty bug, spotted by someone in the user forum
(initially as a CCE)
>
> http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>
> Copying from the JIRA:
>
> "This is a nasty. What started life as an
optimisation for certain types
> of objects in a marshalled stream (Fqn,
GlobalTransactio, String and
> Serializable) has become a major limitation in that a
single stream can
> only hold up to 32767 different (not equal()) instances
of such objects.
>
> Basically the optimisation was, for example, instead of
writing "hello"
> to a stream twice, just write it once and use a
reference for all
> subsequent times. Unfortunately this reference was
encoded as a short,
> hence the limitation of 32767.
>
> Fixing this will definitely break wire compatibility
with JBoss Cache
> 2.0.0, although JBC does allow backward compatibility
by specifying
> replication version in your configuration, thanks to
the
> VersionAwareMarshaller. "
>
> So I guess this mandates the need for a
CacheMarshaller210. The
> question is how do we fix this. The obvious thing is
to replace the
> short references with integers. The 2 ^ 31 - 1 number
of references
> this would allow should be plenty! The drawback
though, is larger
> streams. 4-byte refs instead of 2-byte refs can be an
unnecessary
> overhead especially if objects aren't repeated much.
I wouldn't worry too much about the extra bytes. However,
you could
maintain backwards compatibility, and save the 2 bytes, by
stealing the
sign bit on the short. If byte1 & 0x80 then read 3 more
bytes, else read
only 1 more.
--
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-12 08:10:59 |
On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
> Manik Surtani wrote:
>> A nasty bug, spotted by someone in the user forum
(initially as a
>> CCE)
>> http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>> Copying from the JIRA:
>> "This is a nasty. What started life as an
optimisation for certain
>> types of objects in a marshalled stream (Fqn,
GlobalTransactio,
>> String and Serializable) has become a major
limitation in that a
>> single stream can only hold up to 32767 different
(not equal())
>> instances of such objects.
>> Basically the optimisation was, for example,
instead of writing
>> "hello" to a stream twice, just write it
once and use a reference
>> for all subsequent times. Unfortunately this
reference was encoded
>> as a short, hence the limitation of 32767.
>> Fixing this will definitely break wire
compatibility with JBoss
>> Cache 2.0.0, although JBC does allow backward
compatibility by
>> specifying replication version in your
configuration, thanks to the
>> VersionAwareMarshaller. "
>> So I guess this mandates the need for a
CacheMarshaller210. The
>> question is how do we fix this. The obvious thing
is to replace
>> the short references with integers. The 2 ^ 31 - 1
number of
>> references this would allow should be plenty! The
drawback though,
>> is larger streams. 4-byte refs instead of 2-byte
refs can be an
>> unnecessary overhead especially if objects aren't
repeated much.
>
> I wouldn't worry too much about the extra bytes.
However, you could
> maintain backwards compatibility, and save the 2 bytes,
by stealing
> the sign bit on the short. If byte1 & 0x80 then
read 3 more bytes,
> else read only 1 more.
>
Still wouldn't help if you needed a million Strings in a
collection. Since we
have the VersionAwareMarshaller that is
able to switch between marshallers, I'm not so worried about
backward
compatibility.
Cheers
--
Manik Surtani
Lead, JBoss Cache
manik jboss.org
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-12 08:11:44 |
On 8 Nov 2007, at 02:37, Brian Stansberry wrote:
> IIRC, this use of references was not an optimization,
it was a
> requirement to ensure we retain object identity when we
deserialize.
> So I don't think we can drop it for some types.
>
I've implemented this now using variable length ints - uses
anything
between 1 and 5 bytes to represent an int. Thanks to genman
for this
suggestion.
http://lucene.apache.org/java/docs/fileformats.html#VInt
This does mean a new CacheMarshaller for 2.1.0 though.
Cheers,
--
Manik Surtani
Lead, JBoss Cache
manik jboss.org
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-12 11:18:15 |
Manik Surtani wrote:
>
> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>
>> Manik Surtani wrote:
>>> A nasty bug, spotted by someone in the user
forum (initially as a CCE)
>>> http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>> Copying from the JIRA:
>>> "This is a nasty. What started life as an
optimisation for certain
>>> types of objects in a marshalled stream (Fqn,
GlobalTransactio,
>>> String and Serializable) has become a major
limitation in that a
>>> single stream can only hold up to 32767
different (not equal())
>>> instances of such objects.
>>> Basically the optimisation was, for example,
instead of writing
>>> "hello" to a stream twice, just write
it once and use a reference for
>>> all subsequent times. Unfortunately this
reference was encoded as a
>>> short, hence the limitation of 32767.
>>> Fixing this will definitely break wire
compatibility with JBoss Cache
>>> 2.0.0, although JBC does allow backward
compatibility by specifying
>>> replication version in your configuration,
thanks to the
>>> VersionAwareMarshaller. "
>>> So I guess this mandates the need for a
CacheMarshaller210. The
>>> question is how do we fix this. The obvious
thing is to replace the
>>> short references with integers. The 2 ^ 31 - 1
number of references
>>> this would allow should be plenty! The
drawback though, is larger
>>> streams. 4-byte refs instead of 2-byte refs
can be an unnecessary
>>> overhead especially if objects aren't repeated
much.
>>
>> I wouldn't worry too much about the extra bytes.
However, you could
>> maintain backwards compatibility, and save the 2
bytes, by stealing
>> the sign bit on the short. If byte1 & 0x80 then
read 3 more bytes,
>> else read only 1 more.
>>
>
> Still wouldn't help if you needed a million Strings in
a collection.
>
Sure it would, since you get the full positive rang of a
signed int
(2^31 - 1). The only difference is that if its <= 32767
you write only
two bytes, and when it's greater you write an encoded int
that can be
detected (only 4 bytes).
// Writing code
if (n > 32767) {
n = n | 0x80000000;
writeInt(n);
} else {
writeShort(n);
}
// Reading code
int n = readShort();
if ((n & 0x80000000) != 0) {
// sign bit rolls off
n = n << 16 & readShort();
}
--
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-12 11:21:18 |
Jason T. Greene wrote:
> Manik Surtani wrote:
>>
>> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>>
>>> Manik Surtani wrote:
>>>> A nasty bug, spotted by someone in the user
forum (initially as a CCE)
>>>> http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>>> Copying from the JIRA:
>>>> "This is a nasty. What started life as
an optimisation for certain
>>>> types of objects in a marshalled stream
(Fqn, GlobalTransactio,
>>>> String and Serializable) has become a major
limitation in that a
>>>> single stream can only hold up to 32767
different (not equal())
>>>> instances of such objects.
>>>> Basically the optimisation was, for
example, instead of writing
>>>> "hello" to a stream twice, just
write it once and use a reference
>>>> for all subsequent times. Unfortunately
this reference was encoded
>>>> as a short, hence the limitation of 32767.
>>>> Fixing this will definitely break wire
compatibility with JBoss
>>>> Cache 2.0.0, although JBC does allow
backward compatibility by
>>>> specifying replication version in your
configuration, thanks to the
>>>> VersionAwareMarshaller. "
>>>> So I guess this mandates the need for a
CacheMarshaller210. The
>>>> question is how do we fix this. The
obvious thing is to replace the
>>>> short references with integers. The 2 ^ 31
- 1 number of references
>>>> this would allow should be plenty! The
drawback though, is larger
>>>> streams. 4-byte refs instead of 2-byte
refs can be an unnecessary
>>>> overhead especially if objects aren't
repeated much.
>>>
>>> I wouldn't worry too much about the extra
bytes. However, you could
>>> maintain backwards compatibility, and save the
2 bytes, by stealing
>>> the sign bit on the short. If byte1 & 0x80
then read 3 more bytes,
>>> else read only 1 more.
>>>
>>
>> Still wouldn't help if you needed a million Strings
in a collection.
>>
>
> Sure it would, since you get the full positive rang of
a signed int
> (2^31 - 1). The only difference is that if its <=
32767 you write only
> two bytes, and when it's greater you write an encoded
int that can be
> detected (only 4 bytes).
>
> // Writing code
> if (n > 32767) {
> n = n | 0x80000000;
> writeInt(n);
> } else {
> writeShort(n);
> }
>
> // Reading code
> int n = readShort();
> if ((n & 0x80000000) != 0) {
> // sign bit rolls off
> n = n << 16 & readShort();
> }
>
Correction (Forgot to account for java's retarded auto
integer upconvert):
short s = in.readShort();
int n = s;
if ((s & 0x8000) != 0) {
n = (s & 0x7FFF) << 16 | (in.readShort() &
0xFFFF);
}
--
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-12 12:17:41 |
On 12 Nov 2007, at 17:18, Jason T. Greene wrote:
> Manik Surtani wrote:
>> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>>> Manik Surtani wrote:
>>>> A nasty bug, spotted by someone in the user
forum (initially as a
>>>> CCE)
>>>> http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>>> Copying from the JIRA:
>>>> "This is a nasty. What started life as
an optimisation for
>>>> certain types of objects in a marshalled
stream (Fqn,
>>>> GlobalTransactio, String and Serializable)
has become a major
>>>> limitation in that a single stream can only
hold up to 32767
>>>> different (not equal()) instances of such
objects.
>>>> Basically the optimisation was, for
example, instead of writing
>>>> "hello" to a stream twice, just
write it once and use a reference
>>>> for all subsequent times. Unfortunately
this reference was
>>>> encoded as a short, hence the limitation of
32767.
>>>> Fixing this will definitely break wire
compatibility with JBoss
>>>> Cache 2.0.0, although JBC does allow
backward compatibility by
>>>> specifying replication version in your
configuration, thanks to
>>>> the VersionAwareMarshaller. "
>>>> So I guess this mandates the need for a
CacheMarshaller210. The
>>>> question is how do we fix this. The
obvious thing is to replace
>>>> the short references with integers. The 2
^ 31 - 1 number of
>>>> references this would allow should be
plenty! The drawback
>>>> though, is larger streams. 4-byte refs
instead of 2-byte refs
>>>> can be an unnecessary overhead especially
if objects aren't
>>>> repeated much.
>>>
>>> I wouldn't worry too much about the extra
bytes. However, you
>>> could maintain backwards compatibility, and
save the 2 bytes, by
>>> stealing the sign bit on the short. If byte1
& 0x80 then read 3
>>> more bytes, else read only 1 more.
>>>
>> Still wouldn't help if you needed a million Strings
in a
>> collection.
>
> Sure it would, since you get the full positive rang of
a signed int
> (2^31 - 1). The only difference is that if its <=
32767 you write
> only two bytes, and when it's greater you write an
encoded int that
> can be detected (only 4 bytes).
Of course, yeah, you'd read 3 more bytes. But that would
mean (with
the adding of more bytes) this would break backward
compatibility for
> 32767 refs anyway. Existing code wouldn't be able to
deserialize
such a stream. Then again, for such cases, it is currently
*broken*
and even existing code wouldn't be able to deserialize such
a stream
anyway!!
Still I'd prefer to make the change to the stream explicit
though, as
a separate marshaller for 2.1.0 - I do like the variable int
approach
since for a small number of refs (< 128, which is
probably the
majority of use cases) I'd just encode a single byte.
Cheers,
--
Manik Surtani
Lead, JBoss Cache
manik jboss.org
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
| Re: Critical CacheMarshaller issue |

|
2007-11-12 13:12:13 |
Manik Surtani wrote:
>
> On 12 Nov 2007, at 17:18, Jason T. Greene wrote:
>
>> Manik Surtani wrote:
>>> On 8 Nov 2007, at 03:00, Jason T. Greene
wrote:
>>>> Manik Surtani wrote:
>>>>> A nasty bug, spotted by someone in the
user forum (initially as a CCE)
>>>>> http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>>>> Copying from the JIRA:
>>>>> "This is a nasty. What started
life as an optimisation for certain
>>>>> types of objects in a marshalled stream
(Fqn, GlobalTransactio,
>>>>> String and Serializable) has become a
major limitation in that a
>>>>> single stream can only hold up to 32767
different (not equal())
>>>>> instances of such objects.
>>>>> Basically the optimisation was, for
example, instead of writing
>>>>> "hello" to a stream twice,
just write it once and use a reference
>>>>> for all subsequent times. Unfortunately
this reference was encoded
>>>>> as a short, hence the limitation of
32767.
>>>>> Fixing this will definitely break wire
compatibility with JBoss
>>>>> Cache 2.0.0, although JBC does allow
backward compatibility by
>>>>> specifying replication version in your
configuration, thanks to the
>>>>> VersionAwareMarshaller. "
>>>>> So I guess this mandates the need for a
CacheMarshaller210. The
>>>>> question is how do we fix this. The
obvious thing is to replace
>>>>> the short references with integers.
The 2 ^ 31 - 1 number of
>>>>> references this would allow should be
plenty! The drawback though,
>>>>> is larger streams. 4-byte refs instead
of 2-byte refs can be an
>>>>> unnecessary overhead especially if
objects aren't repeated much.
>>>>
>>>> I wouldn't worry too much about the extra
bytes. However, you could
>>>> maintain backwards compatibility, and save
the 2 bytes, by stealing
>>>> the sign bit on the short. If byte1 &
0x80 then read 3 more bytes,
>>>> else read only 1 more.
>>>>
>>> Still wouldn't help if you needed a million
Strings in a collection.
>>>
>>
>> Sure it would, since you get the full positive rang
of a signed int
>> (2^31 - 1). The only difference is that if its
<= 32767 you write only
>> two bytes, and when it's greater you write an
encoded int that can be
>> detected (only 4 bytes).
>
> Of course, yeah, you'd read 3 more bytes. But that
would mean (with the
> adding of more bytes) this would break backward
compatibility for >
> 32767 refs anyway. Existing code wouldn't be able to
deserialize such a
> stream. Then again, for such cases, it is currently
*broken* and even
> existing code wouldn't be able to deserialize such a
stream anyway!!
Right, there is no way to fix the limitation on old code.
The only
advantage is that you only need 1 marshaller.
> Still I'd prefer to make the change to the stream
explicit though, as a
> separate marshaller for 2.1.0 - I do like the variable
int approach
> since for a small number of refs (< 128, which is
probably the majority
> of use cases) I'd just encode a single byte.
Probably the most efficient change, would be to introduce a
new magic
number for each size:
MAGICNUMBER_REF
MAGICNUMBER_REF16
MAGICNUMBER_REF32
MAGICNUMBER_REF64
This would give you the full unsigned size of each type
(provided you
changed the reference generation logic to support it).
0 - 255 refs : 1 byte
256 - 65535 refs : 2 bytes
65536 - 4294967295 : 4 bytes
4294967296 - 18446744073709551616 : 8 bytes
-Jason
--
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-dev lists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev
a>
|
|
[1-9]
|
|