Thread: re: SEL 10 - Kernel 2.6.16.27.0.9 locks up
|
|
| re: SEL 10 - Kernel 2.6.16.27.0.9 locks up |

|
2007-04-30 14:15:38 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Folks,
I'm trying to find out how to investigate an issue where our test server
running Oracle 10.2.0.3 (x86_64) locks up when we run a few dd commands
sequentially (dd if=/dev/zero of=/z0/test/testthere2 bs=4k count=5000000),
where /z0 is just some local storage.
The kernel was upgraded to version 2.6.16.27.0.9 a couple of weeks ago. We
then installed the following ASM packages on top of that.
oracleasmlib-2.0.2-1.x86_64.rpm
oracleasm-support-2.0.3-1.x86_64.rpm
oracleasm-2.6.16.27-0.9-smp-2.0.3-1.x86_64.rpm
We are using SLES 10 + Oracle 10.2.0.3 + ASM via ASMLib.
At random intervals the machine would crash with no information in
/var/log/messages. We ran a memory test on it and it was fine. Finally, our SA
built the latest kernel from source (2.6.21-smp), and after a number of "dd"
tests the machine did NOT crash. With that kernel, however, ASM was not
started because of a version mismatch!
ASM may or may not be the problem, but what is the best way to troubleshoot
this?
The machine has the following spec:
- Dell 6800 with 4 dual core CPUs (Intel(R) Xeon(TM) CPU 2.60GHz)
- Storage is DS4400
- Storage Driver: Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
- -peter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGNkBaoyy5QBCjoT0RAlE7AJ41HhYsiyUpY6GN+8gFoPfif+YNnwCf
anAc
Y9MmZW4vEL4nTTLihLflJzI=
=LO3p
-----END PGP SIGNATURE-----
--
To unsubscribe, email: suse-oracle-unsubscribe suse.com
For additional commands, email: suse-oracle-help suse.com
Please see http://www.suse.com/oracle/ before posting
|
|
| Re: re: SEL 10 - Kernel 2.6.16.27.0.9 locks up |

|
2007-04-30 14:45:06 |
Advice #1: drop asmlib and never use it. It is a useless piece of software.
Linux has 'raw', which does the same thing but is a standard component, not
home-made like asmlib.
Then repeat the tests.
|
|
| Re: re: SELS 10 - Kernel 2.6.16.27.0.9 locks up - Again. |

|
2007-05-02 07:36:36 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Alexei,
the reason we are using asmlib is that our experience with managing raw
devices is limited and we don't want to run into additional trouble down the
road.
We've tried these tests over and over, and the machine locks up after about
an hour of running consecutive "dd" commands. When oracleasm is down we can't
reproduce this, but when the service is up we get the lockup. The only thing
I'm uncertain about is that the raw devices are bound when the raw service
starts up, but the permissions on those devices were still root:root when
oracleasm started; I changed the permissions only afterwards. I'm going to
try this test one more time in this sequence.
1. bind the raw devices.
2. set the proper permissions on those devices.
3. start the oracleasm service.
4. run "/etc/init.d/oracleasm status" and "/etc/init.d/oracleasm listdisks"
to make sure that everything looks correct.
5. run a number of "dd" commands to some local storage and see if the
machine locks up.
prompt> dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000
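A minimal sketch of the test loop in step 5, not the exact commands that were run. The function name `run_sequential_dd`, its arguments, and the per-run file names are hypothetical; the defaults suggested here are tiny on purpose, whereas the thread used bs=4k with counts in the millions against /z0/test.

```shell
#!/bin/sh
# Hypothetical helper: run several dd writes one after another.
# The dd invocation has the same shape as the one quoted in the thread.
run_sequential_dd() {
    target_dir=$1   # directory to write into (assumed to exist)
    runs=$2         # how many dd commands to run back to back
    count=$3        # number of 4k blocks per file
    i=1
    while [ "$i" -le "$runs" ]; do
        # Each run writes its own file so earlier output is not overwritten.
        dd if=/dev/zero of="$target_dir/testthere$i" bs=4k count="$count" 2>/dev/null || return 1
        echo "run $i finished: $target_dir/testthere$i"
        i=$((i + 1))
    done
}
```

For example, `run_sequential_dd /tmp 3 1000` writes three files of about 4 MB each; printing a line after every run gives a rough idea of how far the test got before a hang.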
The frustrating thing is that the machine just locks up and there is no
logging. Also, it requires that we go to the data center to physically
restart the machine.
The other thing is that our hardware is certified on SLES 9 (SP3), but not on
SLES 10. Again, I'm not sure how important this is, but we might try SLES 9
if we can't get this resolved. The certification bulletin for our hardware on
SLES 9 is 83873.
Here is the module information for ASM.
dbt1:~ # modinfo oracleasm
filename:       /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/addon/oracleasm/oracleasm.ko
license:        GPL
version:        2.0.3
author:         Joel Becker <joel.becker oracle.com>
description:    Kernel driver backing the Generic Linux ASM Library.
vermagic:       2.6.16.27-0.9-smp SMP gcc-4.1
depends:
srcversion:     B35F9F20EF40931C318A5EA
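The vermagic line above is what ties this module to one kernel build, which is probably why ASM would not start under the 2.6.21 kernel built from source. A rough sketch of that check follows; the function name `vermagic_matches` is hypothetical, and it compares only the kernel release field, a simplification of what the kernel checks when loading a module.

```shell
#!/bin/sh
# Hypothetical helper: does a module's vermagic string match a kernel release?
# The first whitespace-separated field of vermagic is the kernel release the
# module was built for; the remaining fields (SMP, compiler) are ignored here.
vermagic_matches() {
    vermagic=$1   # e.g. the output of: modinfo -F vermagic oracleasm
    release=$2    # e.g. the output of: uname -r
    built_for=${vermagic%% *}
    [ "$built_for" = "$release" ]
}
```

On a live system this could be called as `vermagic_matches "$(modinfo -F vermagic oracleasm)" "$(uname -r)"`, which returns non-zero when the running kernel is not the one the module was built for.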
Any ideas on how to troubleshoot this would be great!
- -peter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGOIXUoyy5QBCjoT0RAtF2AKCGy+d6+p/C88fQ2pbEYOOjmKIWZQCe
InqA
nhkQebGQE+Dz3tC3EpzhC/U=
=o4fN
-----END PGP SIGNATURE-----
|
|
| Re: re: SELS 10 - Kernel 2.6.16.27.0.9 locks up - Again. |

|
2007-05-02 11:39:19 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Alexei,
so I decided to turn off everything that was "oracle" related, and by running
a couple of "dd" commands in parallel I got the machine to lock up again.
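A minimal sketch of running the "dd" writes in parallel rather than one after another. The function name `run_parallel_dd`, its arguments, and the output file names are hypothetical; the block size matches the commands quoted earlier in the thread, and the count should be kept small when trying it out.

```shell
#!/bin/sh
# Hypothetical helper: start several dd writers at once and wait for all of them.
run_parallel_dd() {
    target_dir=$1   # directory to write into (assumed to exist)
    jobs=$2         # number of dd processes to run concurrently
    count=$3        # number of 4k blocks per file
    i=1
    pids=""
    while [ "$i" -le "$jobs" ]; do
        # Background each dd and remember its PID.
        dd if=/dev/zero of="$target_dir/parallel$i" bs=4k count="$count" 2>/dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        # Collect every exit status so one failed writer is not hidden.
        wait "$pid" || status=1
    done
    return "$status"
}
```

For example, `run_parallel_dd /tmp 2 1000` starts two writers at the same time and returns once both have finished.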
I know that you mentioned in a previous posting that SLES 10 is just not
production ready, and I'm wondering if I'm just hitting some sort of hardware
issue.
One thing I did notice was the following in /var/log/messages, which looks
like some sort of incompatibility with the DVD-ROM drive, but from my
research I couldn't tell if this could cause the machine to lock up.
May 2 11:37:53 s_dgram dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:48:09 s_dgram dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:53:56 s_dgram dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:54:02 s_dgram dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 11:54:29 s_dgram dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 12:03:04 s_dgram dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
May 2 12:06:32 s_dgram dbt1 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x01). Trying to recover by ending request.
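To see whether these warnings line up with the lockups, it may help to pull just their count and timestamps out of the log. This is a small sketch using grep and awk; the function names are hypothetical, and it assumes each warning sits on a single line in the log file, as it normally does in /var/log/messages.

```shell
#!/bin/sh
# Hypothetical helpers for summarising the cdrom_pc_intr warnings in a syslog file.

# Print how many warning lines the log contains.
count_cdrom_warnings() {
    grep -c 'cdrom_pc_intr: The drive appears confused' "$1"
}

# Print the month, day and time of each warning, one per line.
cdrom_warning_times() {
    awk '/cdrom_pc_intr/ { print $1, $2, $3 }' "$1"
}
```

For example, `cdrom_warning_times /var/log/messages` lists the times of the warnings so they can be compared with the time of the last lockup.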
We have another 3 node RAC cluster on SLES 9 (SP3), so we just might go back
to that ...
- -peter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGOL63oyy5QBCjoT0RAsxaAJwLJsVT/W08N2l/C/gqRqUv/qONtQCe
PYqx
uqmvU6kXkneqzsF08gFSbUk=
=ZfIh
-----END PGP SIGNATURE-----
|
|
| Re: re: SELS 10 - Kernel 2.6.16.27.0.9 locks up - Again. |

|
2007-05-02 12:01:33 |
Peter,
As you wrote, this box is SLES 9 (SP3) certified:
http://developer.novell.com/yes/83873.htm. I suggest opening a support
request with either Novell or Dell to find out if there is any known issue
with SLES 10, and working with them to figure out why it locks up with a
simple dd.
Another option is to try the latest build (RC3) of the upcoming SLES 10 SP1.
You can request this from Novell support.
-Arun
|
|
| Re: re: SELS 10 - Kernel 2.6.16.27.0.9 locks up - Again. |

|
2007-05-02 12:50:37 |
First of all, try SP1 (ask for RC3). It has significant improvements over the
SLES10 release.
(
As I said before:
the SLES10 release is in reality a SLES10 open beta;
SLES10 SP1 will in reality be the first real SLES10 release.
It was the same with SLES9:
the SLES9 release had beta quality (it was not stable, had critical VM bugs,
and had compatibility problems);
SLES9 SP1 became the first production-ready version.
Why should we expect anything different with SLES10? We are in an unofficial
beta stage now until the middle of May, when SP1 will be released.
My experiments with both SLES10 and SLES10 SP1 proved it for me.
)
|