
Thread: re: SLES 10 - Kernel 2.6.16.27-0.9 locks up

re: SLES 10 - Kernel 2.6.16.27-0.9 locks up
Peter Santos
2007-04-30 14:15:38

Folks,
	I'm trying to find out how to investigate an issue where our test
	server running Oracle 10.2.0.3 (x86_64) locks up when we run a few
	dd commands sequentially (dd if=/dev/zero of=/z0/test/testthere2
	bs=4k count=5000000), where /z0 is just some local storage.
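For reference, bs=4k count=5000000 means each run writes 4096 × 5,000,000 = 20,480,000,000 bytes (about 19 GiB) of zeros. A minimal sketch of the sequential test loop, assuming a hypothetical helper name (run_sequential_dd) and illustrative output file names rather than the exact ones used on this server:

```shell
# Sketch: run several dd writes one after another (sequentially) into a
# target directory, mirroring "dd if=/dev/zero of=... bs=4k count=N".
# run_sequential_dd is a hypothetical helper, not part of any package.
#
# Usage: run_sequential_dd DIR RUNS COUNT
#   DIR   - directory on the local storage under test (e.g. /z0/test)
#   RUNS  - how many dd commands to run back to back
#   COUNT - number of 4 KiB blocks per dd (the original test used 5000000)
run_sequential_dd() {
    dir=$1 runs=$2 count=$3
    i=1
    while [ "$i" -le "$runs" ]; do
        # Each pass writes 4096 * COUNT bytes of zeros to its own file.
        dd if=/dev/zero of="$dir/testfile$i" bs=4k count="$count" 2>/dev/null || return 1
        i=$((i + 1))
    done
}
```

For example, run_sequential_dd /z0/test 3 5000000 would repeat the original write three times, about 57 GiB in total, so the target file system needs that much free space.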

	A kernel upgrade to version 2.6.16.27-0.9 was done a couple of weeks
	ago. We then installed the following ASM packages on top of that.

		oracleasmlib-2.0.2-1.x86_64.rpm
		oracleasm-support-2.0.3-1.x86_64.rpm
		oracleasm-2.6.16.27-0.9-smp-2.0.3-1.x86_64.rpm

	We are using SLES 10 + Oracle 10.2.0.3 + ASM via ASMLib.

	At random intervals the machine would crash with no information in
	/var/log/messages. We ran a memory test on it and it was fine.
	Finally, our SA compiled the latest kernel from source (2.6.21-smp),
	and after a number of dd tests the machine did NOT crash. With that
	self-built kernel, however, ASM did not start because of a version
	mismatch!
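The mismatch follows from how the ASMLib kernel driver is packaged: each oracleasm RPM is built for one specific kernel release, and that release string is part of the package name (oracleasm-2.6.16.27-0.9-smp-2.0.3-1.x86_64.rpm above targets 2.6.16.27-0.9-smp). A small sketch that derives the matching package name from a kernel release, assuming the naming pattern of the package listed above holds in general; expected_asm_rpm is a hypothetical helper:

```shell
# Sketch: build the oracleasm kernel-driver RPM name that would match a given
# kernel release, following the pattern of the package installed above:
#   oracleasm-<kernel release>-<driver version>.<arch>.rpm
# expected_asm_rpm is a hypothetical helper, not part of oracleasm-support.
expected_asm_rpm() {
    kernel_release=$1   # e.g. the output of `uname -r`
    driver_version=$2   # e.g. 2.0.3-1
    arch=$3             # e.g. x86_64
    printf 'oracleasm-%s-%s.%s.rpm\n' "$kernel_release" "$driver_version" "$arch"
}

# On the running system, something like:
#   expected_asm_rpm "$(uname -r)" 2.0.3-1 x86_64
```

For the self-built 2.6.21-smp kernel it yields oracleasm-2.6.21-smp-2.0.3-1.x86_64.rpm, a driver package that would have to be built specifically for that kernel.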

	ASM may or may not be the problem, but what is the best way to
	troubleshoot this?

	The machine has the following spec:
	- Dell 6800 with 4 dual-core CPUs (Intel(R) Xeon(TM) CPU 2.60GHz)
	- Storage: DS4400
	- Storage driver: Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)


-peter

	

-- 
To unsubscribe, email: suse-oracle-unsubscribe@suse.com
For additional commands, email: suse-oracle-help@suse.com
Please see http://www.suse.com/oracle/ before posting


Re: re: SLES 10 - Kernel 2.6.16.27-0.9 locks up
Alexei Roudnev
2007-04-30 14:45:06
Advice #1: drop ASMLib and never use it. It is a useless piece of
software. Linux has 'raw', which does the same job but is a standard
component, not home-made like ASMLib.

Then repeat the tests.

----- Original Message ----- 
From: "Peter Santos" <psantos@cheetahmail.com>
To: <suse-oracle@suse.com>
Sent: Monday, April 30, 2007 12:15 PM
Subject: [suse-oracle] re: SEL 10 - Kernel 2.6.16.27.0.9 locks up




Re: re: SLES 10 - Kernel 2.6.16.27-0.9 locks up - Again.
Peter Santos
2007-05-02 07:36:36

Alexei,
	the reason we are using ASMLib is that our experience with managing
	raw devices is limited, and we don't want to run into additional
	trouble down the road.
	
	We've tried these tests over and over, and it seems that the machine
	just locks up when we run consecutive dd commands: after about an
	hour, the machine locks up. When oracleasm is down we can't reproduce
	this, but when the service is up, we get the lock-up. The only thing
	I'm uncertain about is that when the raw service starts up, the raw
	devices are bound, but the permissions on those devices were still
	root:root when oracleasm started; I only changed the permissions
	afterwards. I'm going to try this test one more time in this
	sequence:
		1. Bind the raw devices.
		2. Set the proper permissions on those devices.
		3. Start the oracleasm service.
		4. Run /etc/init.d/oracleasm status and /etc/init.d/oracleasm
		   listdisks to make sure that everything looks correct.
		5. Run a number of dd commands to some local storage and see if
		   the machine locks up:
		   prompt>  dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000
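The five steps above can be written down as a dry run that prints each command in order, which makes it easier to repeat the test the same way every time. This is a sketch under stated assumptions: the raw device mapping (/dev/raw/raw1 to /dev/sdb1), the oracle:dba owner and mode 660 are hypothetical, and only the dd line is taken from the original test. With bs=4k, count=22000000 writes 90,112,000,000 bytes (about 84 GiB).

```shell
# Dry-run sketch of the five-step test order. Hypothetical values (not from
# the original setup): /dev/raw/raw1 -> /dev/sdb1, owner oracle:dba, mode 660.
# Commands are only printed unless DRY_RUN=0 is set (real runs need root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

test_sequence() {
    # 1. Bind the raw device(s) with the raw utility.
    run raw /dev/raw/raw1 /dev/sdb1
    # 2. Fix ownership and permissions before oracleasm starts.
    run chown oracle:dba /dev/raw/raw1
    run chmod 660 /dev/raw/raw1
    # 3. Start the oracleasm service.
    run /etc/init.d/oracleasm start
    # 4. Check that the driver is loaded and the ASM disks are visible.
    run /etc/init.d/oracleasm status
    run /etc/init.d/oracleasm listdisks
    # 5. Generate the write load on local storage (same dd as the original test).
    run dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000
}

test_sequence
```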

	The frustrating thing is that the machine just locks up and nothing
	is logged. Also, it requires that we go to the data center to
	physically restart the machine.

	The other thing is that our hardware is certified on SLES 9 (SP3),
	but not on SLES 10. Again, I'm not sure how important this is, but
	we can/might try SLES 9 if we can't get this resolved. The
	certification bulletin for our hardware on SLES 9 is 83873.
	
	Here is the module information for ASM.

	dbt1:~ # modinfo oracleasm
	filename:       /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/addon/oracleasm/oracleasm.ko
	license:        GPL
	version:        2.0.3
	author:         Joel Becker <joel.becker@oracle.com>
	description:    Kernel driver backing the Generic Linux ASM Library.
	vermagic:       2.6.16.27-0.9-smp SMP gcc-4.1
	depends:
	srcversion:     B35F9F20EF40931C318A5EA
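The vermagic field above records the kernel release the module was built for (2.6.16.27-0.9-smp). The kernel normally refuses to load a module whose vermagic does not match the running kernel, which fits with ASM not starting on the self-built 2.6.21-smp kernel. A minimal sketch of that comparison, with module_matches_kernel as a hypothetical helper name:

```shell
# Sketch: compare the kernel release a module was built for (the first word
# of its vermagic string, as printed by modinfo) with a kernel release.
# module_matches_kernel is a hypothetical helper; exit status 0 means match.
module_matches_kernel() {
    vermagic=$1          # e.g. "2.6.16.27-0.9-smp SMP gcc-4.1"
    kernel_release=$2    # e.g. the output of `uname -r`
    build_release=${vermagic%% *}   # strip everything after the first space
    [ "$build_release" = "$kernel_release" ]
}

# On the server itself, roughly:
#   module_matches_kernel "$(modinfo -F vermagic oracleasm)" "$(uname -r)" \
#       && echo match || echo mismatch
```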

	Any ideas on how to troubleshoot this would be great!


-peter




Re: re: SLES 10 - Kernel 2.6.16.27-0.9 locks up - Again.
Peter Santos
2007-05-02 11:39:19

Alexei,

So I decided to turn off everything that was Oracle-related, and by
running a couple of dd commands in parallel, I got the machine to lock
up again.
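Unlike the earlier sequential runs, this test started the writers concurrently. A minimal sketch of that load pattern, assuming a hypothetical helper run_parallel_dd and illustrative file names; each writer runs in the background and the shell then waits for all of them:

```shell
# Sketch: start several dd writers at once (in the background) and wait for
# all of them, instead of running them one after another.
# run_parallel_dd is a hypothetical helper; arguments mirror the dd test.
#
# Usage: run_parallel_dd DIR WRITERS COUNT
#   COUNT is the number of 4 KiB blocks each writer produces.
run_parallel_dd() {
    dir=$1 writers=$2 count=$3
    i=1
    while [ "$i" -le "$writers" ]; do
        dd if=/dev/zero of="$dir/parallel$i" bs=4k count="$count" 2>/dev/null &
        i=$((i + 1))
    done
    wait    # block until every background job of this shell has exited
}
```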

I know that you mentioned in a previous posting that SLES 10 is just
not production ready, and I'm wondering if I'm just hitting some sort
of hardware issue.

One thing I did notice was the following in /var/log/messages, which
points to some sort of incompatibility with the DVD-ROM drive, but
from my research I couldn't tell whether this could cause the machine
to lock up.


May  2 11:37:53 s_dgramdbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May  2 11:48:09 s_dgramdbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May  2 11:53:56 s_dgramdbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May  2 11:54:02 s_dgramdbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May  2 11:54:29 s_dgramdbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May  2 12:03:04 s_dgramdbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May  2 12:06:32 s_dgramdbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
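One way to see whether these messages line up with the lock-ups is to count them and compare their first and last timestamps with the times of the hangs. A minimal sketch, assuming the standard syslog line format shown above (cdrom_confused_summary is a hypothetical helper name); it only summarizes the messages and says nothing about whether the DVD-ROM drive is actually the cause:

```shell
# Sketch: count the "drive appears confused" messages from the IDE CD-ROM
# driver in a syslog file and print the first and last timestamps.
# cdrom_confused_summary is a hypothetical helper; the log path is an argument.
cdrom_confused_summary() {
    log=$1   # e.g. /var/log/messages
    grep 'cdrom_pc_intr: The drive appears confused' "$log" |
        awk '{ n++; ts = $1 " " $2 " " $3; if (n == 1) first = ts; last = ts }
             END { printf "%d messages, first %s, last %s\n", n, first, last }'
}
```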

We have another 3-node RAC cluster on SLES 9 (SP3), so we might just
go back to that...

-peter




Re: re: SLES 10 - Kernel 2.6.16.27-0.9 locks up - Again.
Arun
2007-05-02 12:01:33
Peter,

As you wrote, this box is SLES 9 (SP3) certified:
http://developer.novell.com/yes/83873.htm. I suggest opening a support
request with either Novell or Dell to find out if there are any known
issues with SLES 10, and working with them to figure out why it locks
up with a simple dd.

Another option is to try the latest build (RC3) of the upcoming
SLES 10 SP1. You can request it from Novell support.

-Arun 




Re: re: SLES 10 - Kernel 2.6.16.27-0.9 locks up - Again.
Alexei Roudnev
2007-05-02 12:50:37
Try SP1 (ask for RC3) first of all. It has significant improvements
over the SLES 10 release.

(

As I said before:
the SLES 10 release is in reality the SLES 10 open beta;
SLES 10 SP1 will in reality be the first real SLES 10 release.

It was the same with SLES 9:

the SLES 9 release had beta quality (it was not stable, had critical
VM bugs, and had compatibility problems);
SLES 9 SP1 became the first production-ready version.

Why should we expect anything different with SLES 10? We are in an
unofficial beta stage until the middle of May (when SP1 will be
released). My experiments with both SLES 10 and SLES 10 SP1 confirmed
this for me.

).

----- Original Message ----- 
From: "Peter Santos" <psantos@cheetahmail.com>
To: "Alexei_Roudnev" <Alexei_Roudnev@exigengroup.com>; <oracleasm-users@oss.oracle.com>
Cc: <suse-oracle@suse.com>
Sent: Wednesday, May 02, 2007 9:39 AM
Subject: Re: [suse-oracle] re: SELS 10 - Kernel 2.6.16.27.0.9 locks up - Again.



