Thread: Re: Re: Difference between OCF_ERR_CONFIGURED and OCF_ERR_INSTALLED ?

2008-07-08 08:09:37
On Tue, Jul 8, 2008 at 15:03, Andrew Beekhof <beekhof gmail.com> wrote:
> On Fri, Jul 4, 2008 at 16:52, Joe Bill <pica1dilly yahoo.com> wrote:
>>
>>>--- On Fri, 7/4/08, Andrew Beekhof <beekhof gmail.com> wrote:
>>>> Exactly how does heartbeat handle OCF_ERR_CONFIGURED and
>>>> OCF_ERR_INSTALLED differently?
>>>
>>> From some badly formatted and not-quite-finished documentation:
>>>
>>>soft = stop and retry
>>>hard = stop and retry - current node is excluded
>>>fatal = stop - all nodes are excluded
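The three classes above can be read as a rule for which nodes remain eligible to run a resource after a failure. The following is a minimal Python sketch under that reading, assuming a flat list of node names; `Severity` and `eligible_nodes` are hypothetical names, not heartbeat's API, and a real cluster manager tracks this through per-node failure state rather than a list.

```python
from enum import Enum


class Severity(Enum):
    SOFT = "soft"    # stop and retry
    HARD = "hard"    # stop and retry, current node is excluded
    FATAL = "fatal"  # stop, all nodes are excluded


def eligible_nodes(severity, failed_node, nodes):
    """Return the nodes still allowed to run the resource after a failure on failed_node."""
    if severity is Severity.SOFT:
        # Retry anywhere, including the node that just failed
        return list(nodes)
    if severity is Severity.HARD:
        # Keep every node except the one that reported the error
        return [n for n in nodes if n != failed_node]
    # FATAL: the resource may not run on any node
    return []


print(eligible_nodes(Severity.HARD, "node2", ["node1", "node2", "node3"]))
# prints ['node1', 'node3']
```

Seen this way, "hard" and "fatal" differ only in the scope of the exclusion (one node versus the whole cluster), which is the distinction the renaming proposals below try to make explicit.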
>>
>> Taking the opportunity then that the documentation is not yet finished, I would like to make the following suggestions:
>>
>> - "soft" be changed to "error, unexpected"
>>
>> - "hard" be changed to "fatal, local" or "critical, local", or "fatal, node" or "critical, node" because we have diagnosed that the resource at fault is local to the node where it has been detected
>>
>> - "fatal" be changed to "fatal, common" or "critical, common" or "fatal, cluster" or "critical, cluster" because we have diagnosed that the resource at fault is common to all nodes in the cluster.
>>
>>>5 The requested agent or tool required by the agent is not installed. hard
>>
>> I believe "resource configuration" to be more appropriate here. HA shouldn't care at this point whether it's a piece of software or a local configuration file that is missing or screwed.
>>
>> add:
>>
>> - or the resource's local configuration,
>> - or the node's specific configuration ... are invalid.
>>
>>>6 The resource's configuration is invalid. fatal
>>
>> I believe "instance configuration" to be more appropriate here,
>>
>> replace with:
>>
>> - the instance's configuration (common, shared, clusterwide resource configuration) is invalid,
>> - or the resource agent has detected a severe internal (programming, code) error.
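The quoted documentation rows pair each return code with a recovery class. A minimal Python sketch of that table follows. The numeric values are the exit codes defined by the OCF resource agent specification; "5 -> hard" and "6 -> fatal" come from the draft documentation quoted above; "OCF_ERR_ARGS -> hard" reflects the change announced later in this thread; the classes for OCF_ERR_GENERIC, OCF_ERR_UNIMPLEMENTED and OCF_ERR_PERM are assumptions taken from later Pacemaker documentation. `severity_of` is a hypothetical helper.

```python
from enum import IntEnum


class OcfRc(IntEnum):
    # Exit codes defined by the OCF resource agent specification
    OCF_SUCCESS = 0
    OCF_ERR_GENERIC = 1
    OCF_ERR_ARGS = 2
    OCF_ERR_UNIMPLEMENTED = 3
    OCF_ERR_PERM = 4
    OCF_ERR_INSTALLED = 5
    OCF_ERR_CONFIGURED = 6
    OCF_NOT_RUNNING = 7


# Recovery class per failing exit code. 5 and 6 are from the quoted draft
# documentation, ARGS follows the later change in this thread, and the
# remaining entries are assumptions based on later Pacemaker documentation.
SEVERITY = {
    OcfRc.OCF_ERR_GENERIC: "soft",
    OcfRc.OCF_ERR_ARGS: "hard",
    OcfRc.OCF_ERR_UNIMPLEMENTED: "hard",
    OcfRc.OCF_ERR_PERM: "hard",
    OcfRc.OCF_ERR_INSTALLED: "hard",
    OcfRc.OCF_ERR_CONFIGURED: "fatal",
}


def severity_of(rc):
    """Return the recovery class for an exit code, or None if it is not a classified error."""
    # IntEnum members hash and compare like plain ints, so a bare int works as a key
    return SEVERITY.get(rc)
```

Under this table OCF_ERR_UNIMPLEMENTED, OCF_ERR_PERM and OCF_ERR_INSTALLED all fall into the same class, which bears directly on Joe's question further down about whether the supervisor does anything different for them.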
>
> makes sense
>
>>
>>
>> Regarding the mnemonics of the return codes...
>>
>> From your notes above, the status definitions seem to be more related to the restart and blocking effect the HA supervisor has on resources than to the situation the current mnemonics attempt to describe.
>>
>> I am not sure it is such a good idea to combine a condition with the condition's handling action when defining the states that are to be reported to the supervisor.
>
> Not sure I follow this...
>
>>
>> From the description you provided, is it even the supervisor's concern, and will the supervisor attempt anything to address the cause, or for that matter do anything different, if it receives any of the following statuses: OCF_ERR_UNIMPLEMENTED, OCF_ERR_PERM, OCF_ERR_INSTALLED?
>>
>> Same question for OCF_ERR_ARGS and OCF_ERR_CONFIGURED?
>>
>> Now the problem starts when I want to describe a condition where a resource needs an internal (fixed name, not specified as a resource parameter) file but the file is missing on one host and not on others. Which condition would you choose?
>
> OCF_ERR_ARGS I guess - since that would exclude the failed node but not the others.
oops, args doesn't do this.
probably OCF_ERR_INSTALLED then. or maybe one of OCF_ERR_ARGS and OCF_ERR_CONFIGURED needs to be made fatal.
> if the file isn't available anywhere, then the resource will be tried
> once on each node and give up.
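Andrew's point, that a node-excluding ("hard") error on every node means one attempt per node before giving up, can be illustrated with a small simulation. This is a Python sketch under simplifying assumptions: the function name is hypothetical, nodes are tried in list order, and the file's presence is given as a set, whereas a real cluster manager picks placement by scores.

```python
def place_with_hard_failures(nodes, file_present):
    """Simulate placement when the agent returns a hard error on nodes lacking a required file.

    nodes: candidate node names, tried in list order (a simplification)
    file_present: set of nodes where the required file exists
    Returns (node the resource ends up on, or None; list of nodes attempted).
    """
    eligible = list(nodes)
    attempts = []
    while eligible:
        node = eligible[0]
        attempts.append(node)
        if node in file_present:
            return node, attempts
        # Hard error: exclude only this node and keep trying the others
        eligible.remove(node)
    # Tried once on every node, then gave up
    return None, attempts
```

If the file exists somewhere, the resource lands there after at most one failed attempt per node; if it exists nowhere, every node is tried exactly once.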
>
>> Then there is the situation where a filename is specified as a resource parameter but that file does not exist on one host. Is it an OCF_ERR_INSTALLED error or an OCF_ERR_CONFIGURED error, and why not an OCF_ERR_ARGS? Can I even diagnose an OCF_ERR_ARGS when running the resource agent on only one node if that file DOES exist on other nodes? How is that resource agent going to check the other nodes and see that the file does exist there?
>
> why would you try and do this? just let it fail once on each node.
> OCF_ERR_CONFIGURED should only be used when the inputs are so bad that
> the resource won't be able to run anywhere (e.g. "file" is mandatory but
> no value was specified)
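The rule of thumb above can be sketched as the validation step of a hypothetical agent with a mandatory `file` parameter. Real OCF agents are usually shell scripts; this Python version only mirrors the decision logic, and `validate` is an illustrative name. The numeric values are those of the OCF specification. For a file missing only on the local node it returns OCF_ERR_INSTALLED, following Andrew's suggestion for the fixed-name case; once OCF_ERR_ARGS becomes a hard error, as announced in the follow-up, either code would exclude only the current node.

```python
import os

# Exit codes from the OCF resource agent specification
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6


def validate(params):
    """Hypothetical validate step for an agent whose 'file' parameter is mandatory."""
    path = params.get("file")
    if not path:
        # Mandatory input missing: the resource can't run anywhere, so report the fatal code
        return OCF_ERR_CONFIGURED
    if not os.path.exists(path):
        # Missing on this node; other nodes may have it, so report a node-local (hard) error
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS
```

The agent never tries to inspect other nodes: it reports what it sees locally and lets the cluster retry elsewhere.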
>
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

2008-07-08 08:19:04
On Tue, Jul 8, 2008 at 15:09, Andrew Beekhof <beekhof gmail.com> wrote:
> oops, args doesn't do this.
> probably OCF_ERR_INSTALLED then. or maybe one of OCF_ERR_ARGS and
> OCF_ERR_CONFIGURED needs to be made fatal.
Brain not working today... of course I meant "hard". And having looked at everything again, I think this is the right approach.
So from now on OCF_ERR_ARGS will be a "hard" error instead of a "fatal" one.