List Info

Thread: Critical CacheMarshaller issue




Critical CacheMarshaller issue
user name
2007-11-07 18:31:40
A nasty bug, spotted by someone in the user forum (initially
as a CCE)

	http:/
/jira.jboss.org/jira/browse/JBCACHE-1211

Copying from the JIRA:

"This is a nasty. What started life as an optimisation
for certain  
types of objects in a marshalled stream (Fqn,
GlobalTransactio, String  
and Serializable) has become a major limitation in that a
single  
stream can only hold up to 32767 different (not equal())
instances of  
such objects.

Basically the optimisation was, for example, instead of
writing  
"hello" to a stream twice, just write it once and
use a reference for  
all subsequent times. Unfortunately this reference was
encoded as a  
short, hence the limitation of 32767.

Fixing this will definitely break wire compatibility with
JBoss Cache  
2.0.0, although JBC does allow backward compatibility by
specifying  
replication version in your configuration, thanks to the  
VersionAwareMarshaller. "

So I guess this mandates the need for a CacheMarshaller210. 
The  
question is how do we fix this.  The obvious thing is to
replace the  
short references with integers.  The 2 ^ 31 - 1 number of
references  
this would allow should be plenty!  The drawback though, is
larger  
streams.  4-byte refs instead of 2-byte refs can be an
unnecessary  
overhead especially if objects aren't repeated much.

One thing we could do is just limit ref counting to Fqn and 

GlobalTransaction types, of which we know can often be
repeated in a  
modification list, and where the optimisation can benefit,
and assume  
it isn't worth optimising in the same way for user data
(Strings and  
Serializables).

Or maybe we should drop the ref counting altogether, save a
few more  
bytes' worth of payload.

Thoughts?
--
Manik Surtani
Lead, JBoss Cache
manikjboss.org






_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-07 20:37:44
IIRC, this use of references was not an optimization, it was
a 
requirement to ensure we retain object identity when we
deserialize. So 
I don't think we can drop it for some types.

Manik Surtani wrote:
> A nasty bug, spotted by someone in the user forum
(initially as a CCE)
> 
>     http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
> 
> Copying from the JIRA:
> 
> "This is a nasty. What started life as an
optimisation for certain types 
> of objects in a marshalled stream (Fqn,
GlobalTransactio, String and 
> Serializable) has become a major limitation in that a
single stream can 
> only hold up to 32767 different (not equal()) instances
of such objects.
> 
> Basically the optimisation was, for example, instead of
writing "hello" 
> to a stream twice, just write it once and use a
reference for all 
> subsequent times. Unfortunately this reference was
encoded as a short, 
> hence the limitation of 32767.
> 
> Fixing this will definitely break wire compatibility
with JBoss Cache 
> 2.0.0, although JBC does allow backward compatibility
by specifying 
> replication version in your configuration, thanks to
the 
> VersionAwareMarshaller. "
> 
> So I guess this mandates the need for a
CacheMarshaller210.  The 
> question is how do we fix this.  The obvious thing is
to replace the 
> short references with integers.  The 2 ^ 31 - 1 number
of references 
> this would allow should be plenty!  The drawback
though, is larger 
> streams.  4-byte refs instead of 2-byte refs can be an
unnecessary 
> overhead especially if objects aren't repeated much.
> 
> One thing we could do is just limit ref counting to Fqn
and 
> GlobalTransaction types, of which we know can often be
repeated in a 
> modification list, and where the optimisation can
benefit, and assume it 
> isn't worth optimising in the same way for user data
(Strings and 
> Serializables).
> 
> Or maybe we should drop the ref counting altogether,
save a few more 
> bytes' worth of payload.
> 
> Thoughts?
> -- 
> Manik Surtani
> Lead, JBoss Cache
> manikjboss.org
> 
> 
> 
> 
> 
> 
> _______________________________________________
> jbosscache-dev mailing list
> jbosscache-devlists.jboss.org
> https://lists.jboss.org/mailman/listinfo/jbosscache-dev

-- 
Brian Stansberry
Lead, AS Clustering
JBoss, a division of Red Hat
brian.stansberryredhat.com
_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-07 21:00:14
Manik Surtani wrote:
> A nasty bug, spotted by someone in the user forum
(initially as a CCE)
> 
>     http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
> 
> Copying from the JIRA:
> 
> "This is a nasty. What started life as an
optimisation for certain types 
> of objects in a marshalled stream (Fqn,
GlobalTransactio, String and 
> Serializable) has become a major limitation in that a
single stream can 
> only hold up to 32767 different (not equal()) instances
of such objects.
> 
> Basically the optimisation was, for example, instead of
writing "hello" 
> to a stream twice, just write it once and use a
reference for all 
> subsequent times. Unfortunately this reference was
encoded as a short, 
> hence the limitation of 32767.
> 
> Fixing this will definitely break wire compatibility
with JBoss Cache 
> 2.0.0, although JBC does allow backward compatibility
by specifying 
> replication version in your configuration, thanks to
the 
> VersionAwareMarshaller. "
> 
> So I guess this mandates the need for a
CacheMarshaller210.  The 
> question is how do we fix this.  The obvious thing is
to replace the 
> short references with integers.  The 2 ^ 31 - 1 number
of references 
> this would allow should be plenty!  The drawback
though, is larger 
> streams.  4-byte refs instead of 2-byte refs can be an
unnecessary 
> overhead especially if objects aren't repeated much.

I wouldn't worry too much about the extra bytes. However,
you could 
maintain backwards compatibility, and save the 2 bytes, by
stealing the 
sign bit on the short. If byte1 & 0x80 then read 3 more
bytes, else read 
only 1 more.

-- 
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-12 08:10:59
On 8 Nov 2007, at 03:00, Jason T. Greene wrote:

> Manik Surtani wrote:
>> A nasty bug, spotted by someone in the user forum
(initially as a  
>> CCE)
>>    http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>> Copying from the JIRA:
>> "This is a nasty. What started life as an
optimisation for certain  
>> types of objects in a marshalled stream (Fqn,
GlobalTransactio,  
>> String and Serializable) has become a major
limitation in that a  
>> single stream can only hold up to 32767 different
(not equal())  
>> instances of such objects.
>> Basically the optimisation was, for example,
instead of writing  
>> "hello" to a stream twice, just write it
once and use a reference  
>> for all subsequent times. Unfortunately this
reference was encoded  
>> as a short, hence the limitation of 32767.
>> Fixing this will definitely break wire
compatibility with JBoss  
>> Cache 2.0.0, although JBC does allow backward
compatibility by  
>> specifying replication version in your
configuration, thanks to the  
>> VersionAwareMarshaller. "
>> So I guess this mandates the need for a
CacheMarshaller210.  The  
>> question is how do we fix this.  The obvious thing
is to replace  
>> the short references with integers.  The 2 ^ 31 - 1
number of  
>> references this would allow should be plenty!  The
drawback though,  
>> is larger streams.  4-byte refs instead of 2-byte
refs can be an  
>> unnecessary overhead especially if objects aren't
repeated much.
>
> I wouldn't worry too much about the extra bytes.
However, you could  
> maintain backwards compatibility, and save the 2 bytes,
by stealing  
> the sign bit on the short. If byte1 & 0x80 then
read 3 more bytes,  
> else read only 1 more.
>

Still wouldn't help if you needed a million Strings in a  
collection.    Since we
have the VersionAwareMarshaller that is  
able to switch between marshallers, I'm not so worried about
backward  
compatibility.

Cheers
--
Manik Surtani
Lead, JBoss Cache
manikjboss.org






_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-12 08:11:44
On 8 Nov 2007, at 02:37, Brian Stansberry wrote:

> IIRC, this use of references was not an optimization,
it was a  
> requirement to ensure we retain object identity when we
deserialize.  
> So I don't think we can drop it for some types.
>

I've implemented this now using variable length ints - uses
anything  
between 1 and 5 bytes to represent an int.  Thanks to genman
for this  
suggestion.

http://lucene.apache.org/java/docs/fileformats.html#VInt


This does mean a new CacheMarshaller for 2.1.0 though.

Cheers,
--
Manik Surtani
Lead, JBoss Cache
manikjboss.org






_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-12 11:18:15
Manik Surtani wrote:
> 
> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
> 
>> Manik Surtani wrote:
>>> A nasty bug, spotted by someone in the user
forum (initially as a CCE)
>>>    http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>> Copying from the JIRA:
>>> "This is a nasty. What started life as an
optimisation for certain 
>>> types of objects in a marshalled stream (Fqn,
GlobalTransactio, 
>>> String and Serializable) has become a major
limitation in that a 
>>> single stream can only hold up to 32767
different (not equal()) 
>>> instances of such objects.
>>> Basically the optimisation was, for example,
instead of writing 
>>> "hello" to a stream twice, just write
it once and use a reference for 
>>> all subsequent times. Unfortunately this
reference was encoded as a 
>>> short, hence the limitation of 32767.
>>> Fixing this will definitely break wire
compatibility with JBoss Cache 
>>> 2.0.0, although JBC does allow backward
compatibility by specifying 
>>> replication version in your configuration,
thanks to the 
>>> VersionAwareMarshaller. "
>>> So I guess this mandates the need for a
CacheMarshaller210.  The 
>>> question is how do we fix this.  The obvious
thing is to replace the 
>>> short references with integers.  The 2 ^ 31 - 1
number of references 
>>> this would allow should be plenty!  The
drawback though, is larger 
>>> streams.  4-byte refs instead of 2-byte refs
can be an unnecessary 
>>> overhead especially if objects aren't repeated
much.
>>
>> I wouldn't worry too much about the extra bytes.
However, you could 
>> maintain backwards compatibility, and save the 2
bytes, by stealing 
>> the sign bit on the short. If byte1 & 0x80 then
read 3 more bytes, 
>> else read only 1 more.
>>
> 
> Still wouldn't help if you needed a million Strings in
a collection.  
>  


Sure it would, since you get the full positive rang of a
signed int 
(2^31 - 1). The only difference is that if its <= 32767
you write only 
two bytes, and when it's greater you write an encoded int
that can be 
detected (only 4 bytes).

// Writing code
if (n > 32767) {
    n = n | 0x80000000;
    writeInt(n);
} else {
    writeShort(n);
}

// Reading code
int n = readShort();
if ((n & 0x80000000) != 0) {
   // sign bit rolls off
   n = n << 16  & readShort();
}

-- 
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-12 11:21:18
Jason T. Greene wrote:
> Manik Surtani wrote:
>>
>> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>>
>>> Manik Surtani wrote:
>>>> A nasty bug, spotted by someone in the user
forum (initially as a CCE)
>>>>    http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>>> Copying from the JIRA:
>>>> "This is a nasty. What started life as
an optimisation for certain 
>>>> types of objects in a marshalled stream
(Fqn, GlobalTransactio, 
>>>> String and Serializable) has become a major
limitation in that a 
>>>> single stream can only hold up to 32767
different (not equal()) 
>>>> instances of such objects.
>>>> Basically the optimisation was, for
example, instead of writing 
>>>> "hello" to a stream twice, just
write it once and use a reference 
>>>> for all subsequent times. Unfortunately
this reference was encoded 
>>>> as a short, hence the limitation of 32767.
>>>> Fixing this will definitely break wire
compatibility with JBoss 
>>>> Cache 2.0.0, although JBC does allow
backward compatibility by 
>>>> specifying replication version in your
configuration, thanks to the 
>>>> VersionAwareMarshaller. "
>>>> So I guess this mandates the need for a
CacheMarshaller210.  The 
>>>> question is how do we fix this.  The
obvious thing is to replace the 
>>>> short references with integers.  The 2 ^ 31
- 1 number of references 
>>>> this would allow should be plenty!  The
drawback though, is larger 
>>>> streams.  4-byte refs instead of 2-byte
refs can be an unnecessary 
>>>> overhead especially if objects aren't
repeated much.
>>>
>>> I wouldn't worry too much about the extra
bytes. However, you could 
>>> maintain backwards compatibility, and save the
2 bytes, by stealing 
>>> the sign bit on the short. If byte1 & 0x80
then read 3 more bytes, 
>>> else read only 1 more.
>>>
>>
>> Still wouldn't help if you needed a million Strings
in a collection.  
>>   
> 
> Sure it would, since you get the full positive rang of
a signed int 
> (2^31 - 1). The only difference is that if its <=
32767 you write only 
> two bytes, and when it's greater you write an encoded
int that can be 
> detected (only 4 bytes).
> 
> // Writing code
> if (n > 32767) {
>    n = n | 0x80000000;
>    writeInt(n);
> } else {
>    writeShort(n);
> }
> 
> // Reading code
> int n = readShort();
> if ((n & 0x80000000) != 0) {
>   // sign bit rolls off
>   n = n << 16  & readShort();
> }
> 

Correction (Forgot to account for java's retarded auto
integer upconvert):

short s = in.readShort();
int n = s;
if ((s & 0x8000) != 0) {
    n = (s & 0x7FFF) << 16 | (in.readShort() &
0xFFFF);
}


-- 
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-12 12:17:41
On 12 Nov 2007, at 17:18, Jason T. Greene wrote:

> Manik Surtani wrote:
>> On 8 Nov 2007, at 03:00, Jason T. Greene wrote:
>>> Manik Surtani wrote:
>>>> A nasty bug, spotted by someone in the user
forum (initially as a  
>>>> CCE)
>>>>   http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>>> Copying from the JIRA:
>>>> "This is a nasty. What started life as
an optimisation for  
>>>> certain types of objects in a marshalled
stream (Fqn,  
>>>> GlobalTransactio, String and Serializable)
has become a major  
>>>> limitation in that a single stream can only
hold up to 32767  
>>>> different (not equal()) instances of such
objects.
>>>> Basically the optimisation was, for
example, instead of writing  
>>>> "hello" to a stream twice, just
write it once and use a reference  
>>>> for all subsequent times. Unfortunately
this reference was  
>>>> encoded as a short, hence the limitation of
32767.
>>>> Fixing this will definitely break wire
compatibility with JBoss  
>>>> Cache 2.0.0, although JBC does allow
backward compatibility by  
>>>> specifying replication version in your
configuration, thanks to  
>>>> the VersionAwareMarshaller. "
>>>> So I guess this mandates the need for a
CacheMarshaller210.  The  
>>>> question is how do we fix this.  The
obvious thing is to replace  
>>>> the short references with integers.  The 2
^ 31 - 1 number of  
>>>> references this would allow should be
plenty!  The drawback  
>>>> though, is larger streams.  4-byte refs
instead of 2-byte refs  
>>>> can be an unnecessary overhead especially
if objects aren't  
>>>> repeated much.
>>>
>>> I wouldn't worry too much about the extra
bytes. However, you  
>>> could maintain backwards compatibility, and
save the 2 bytes, by  
>>> stealing the sign bit on the short. If byte1
& 0x80 then read 3  
>>> more bytes, else read only 1 more.
>>>
>> Still wouldn't help if you needed a million Strings
in a  
>> collection.  
>
> Sure it would, since you get the full positive rang of
a signed int  
> (2^31 - 1). The only difference is that if its <=
32767 you write  
> only two bytes, and when it's greater you write an
encoded int that  
> can be detected (only 4 bytes).

Of course, yeah, you'd read 3 more bytes.  But that would
mean (with  
the adding of more bytes) this would break backward
compatibility for  
 > 32767 refs anyway.  Existing code wouldn't be able to
deserialize  
such a stream.  Then again, for such cases, it is currently
*broken*  
and even existing code wouldn't be able to deserialize such
a stream  
anyway!!

Still I'd prefer to make the change to the stream explicit
though, as  
a separate marshaller for 2.1.0 - I do like the variable int
approach  
since for a small number of refs (< 128, which is
probably the  
majority of use cases) I'd just encode a single byte.

Cheers,
--
Manik Surtani
Lead, JBoss Cache
manikjboss.org






_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

Re: Critical CacheMarshaller issue
user name
2007-11-12 13:12:13
Manik Surtani wrote:
> 
> On 12 Nov 2007, at 17:18, Jason T. Greene wrote:
> 
>> Manik Surtani wrote:
>>> On 8 Nov 2007, at 03:00, Jason T. Greene
wrote:
>>>> Manik Surtani wrote:
>>>>> A nasty bug, spotted by someone in the
user forum (initially as a CCE)
>>>>>   http:/
/jira.jboss.org/jira/browse/JBCACHE-1211
>>>>> Copying from the JIRA:
>>>>> "This is a nasty. What started
life as an optimisation for certain 
>>>>> types of objects in a marshalled stream
(Fqn, GlobalTransactio, 
>>>>> String and Serializable) has become a
major limitation in that a 
>>>>> single stream can only hold up to 32767
different (not equal()) 
>>>>> instances of such objects.
>>>>> Basically the optimisation was, for
example, instead of writing 
>>>>> "hello" to a stream twice,
just write it once and use a reference 
>>>>> for all subsequent times. Unfortunately
this reference was encoded 
>>>>> as a short, hence the limitation of
32767.
>>>>> Fixing this will definitely break wire
compatibility with JBoss 
>>>>> Cache 2.0.0, although JBC does allow
backward compatibility by 
>>>>> specifying replication version in your
configuration, thanks to the 
>>>>> VersionAwareMarshaller. "
>>>>> So I guess this mandates the need for a
CacheMarshaller210.  The 
>>>>> question is how do we fix this.  The
obvious thing is to replace 
>>>>> the short references with integers. 
The 2 ^ 31 - 1 number of 
>>>>> references this would allow should be
plenty!  The drawback though, 
>>>>> is larger streams.  4-byte refs instead
of 2-byte refs can be an 
>>>>> unnecessary overhead especially if
objects aren't repeated much.
>>>>
>>>> I wouldn't worry too much about the extra
bytes. However, you could 
>>>> maintain backwards compatibility, and save
the 2 bytes, by stealing 
>>>> the sign bit on the short. If byte1 &
0x80 then read 3 more bytes, 
>>>> else read only 1 more.
>>>>
>>> Still wouldn't help if you needed a million
Strings in a collection.  
>>> 
>>
>> Sure it would, since you get the full positive rang
of a signed int 
>> (2^31 - 1). The only difference is that if its
<= 32767 you write only 
>> two bytes, and when it's greater you write an
encoded int that can be 
>> detected (only 4 bytes).
> 
> Of course, yeah, you'd read 3 more bytes.  But that
would mean (with the 
> adding of more bytes) this would break backward
compatibility for > 
> 32767 refs anyway.  Existing code wouldn't be able to
deserialize such a 
> stream.  Then again, for such cases, it is currently
*broken* and even 
> existing code wouldn't be able to deserialize such a
stream anyway!!

Right, there is no way to fix the limitation on old code.
The only 
advantage is that you only need 1 marshaller.

> Still I'd prefer to make the change to the stream
explicit though, as a 
> separate marshaller for 2.1.0 - I do like the variable
int approach 
> since for a small number of refs (< 128, which is
probably the majority 
> of use cases) I'd just encode a single byte.

Probably the most efficient change, would be to introduce a
new magic 
number for each size:
MAGICNUMBER_REF
MAGICNUMBER_REF16
MAGICNUMBER_REF32
MAGICNUMBER_REF64

This would give you the full unsigned size of each type
(provided you 
changed the reference generation logic to support it).
0 - 255 refs : 1 byte
256 - 65535 refs : 2 bytes
65536 - 4294967295 : 4 bytes
4294967296 - 18446744073709551616 : 8 bytes

-Jason

-- 
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
jbosscache-dev mailing list
jbosscache-devlists.jboss.org
https://lists.jboss.org/mailman/listinfo/jbosscache-dev

[1-9]

about | contact  Other archives ( Real Estate discussion Medical topics )