Thread: PHP binding for Lucy?

PHP binding for Lucy?
howard chen
2007-02-27 07:44:36
Hello,

I know that Lucy plans to support Perl & Ruby, but it is still at a
very early stage.

Anyway, if I am going to implement a binding for PHP, what should I
know or need to prepare?

Is it still early to ask this question?


Thanks.

Re: PHP binding for Lucy?
Marvin Humphrey
2007-02-27 16:46:25
On Feb 27, 2007, at 5:44 AM, howard chen wrote:

> Hello,
>
> I know that Lucy plans to support Perl & Ruby, but it is still at a
> very early stage.
>
> Anyway, if I am going to implement a binding for PHP, what should I
> know or need to prepare?
>
> Is it still early to ask this question?

It's still a bit early, but stick around.

A couple of days ago, I managed to get the first dev release of
KinoSearch 0.20 (0.20_01) out the door.  A lot of the work that has
gone into it was informed by discussions here; the new version is
considerably closer to Ferret and to the theoretical "Lucy" than the
current KS release, 0.15.

One thing we're going to need is a way of communicating to bindings
authors what is and isn't public API.  I'm thinking we need shared
documentation.  XML, maybe?  Then each binding would require an
appropriate XML-to-whatever translation utility.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



Re: PHP binding for Lucy?
Dave Balmain
2007-02-27 20:24:33
On 2/28/07, Marvin Humphrey wrote:
>
> On Feb 27, 2007, at 5:44 AM, howard chen wrote:
>
> > Hello,
> >
> > I know that Lucy plans to support Perl & Ruby, but it is still at a
> > very early stage.
> >
> > Anyway, if I am going to implement a binding for PHP, what should I
> > know or need to prepare?
> >
> > Is it still early to ask this question?
>
> It's still a bit early, but stick around.
>
> A couple of days ago, I managed to get the first dev release of
> KinoSearch 0.20 (0.20_01) out the door.  A lot of the work that has
> gone into it was informed by discussions here; the new version is
> considerably closer to Ferret and to the theoretical "Lucy" than the
> current KS release, 0.15.
>
> One thing we're going to need is a way of communicating to bindings
> authors what is and isn't public API.

How about just using Doxygen? I don't have much experience with it, but
I'm pretty sure there would be a way to tag particular functions that
are public so that when you generate the documentation you can
generate only the public methods.

Of course you could also have public and private include files.

> I'm thinking we need shared documentation.  XML, maybe?  Then each
> binding would require an appropriate XML-to-whatever translation
> utility.

I'm not entirely sure I'm on the same wavelength as you today. By
'whatever' do you mean the specific language's documentation format? If
that is the case then I don't see this working, as the Ruby API for
Lucy will probably be quite different to the PHP API. But maybe
you're talking about something completely different.

Cheers,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/

Re: PHP binding for Lucy?
Marvin Humphrey
2007-02-27 22:30:19
On Feb 27, 2007, at 6:24 PM, David Balmain wrote:

> How about just using Doxygen? I don't have much experience with it,
> but I'm pretty sure there would be a way to tag particular functions
> that are public so that when you generate the documentation you can
> generate only the public methods.

I don't know it well either, but I'm sure you're right and it will
allow us to put in a public/non-public tag.

It would be even better if we could export at least some of the
documentation -- particularly method descriptions.  I'd really like
to be able to synch up the Perl binding docs by running a script
rather than via copy-and-paste.

> Of course you could also have public and private include files.

Hmm, can you elaborate?  I'd basically given up hope that we'd be
able to maintain tight control over symbol export, and was expecting
to define the API via documentation only.

>> I'm thinking we need shared documentation.  XML, maybe?  Then each
>> binding would require an appropriate XML-to-whatever translation
>> utility.
>
> I'm not entirely sure I'm on the same wavelength as you today. By
> 'whatever' do you mean the specific language's documentation format?

Yes, that was what I was thinking.  But perhaps not quite so
ambitious as may have come across.

> If that is the case then I don't see this working, as the Ruby API
> for Lucy will probably be quite different to the PHP API.

If we're reasonably careful about how we word things, many method
descriptions could be reused across all bindings.  And one of the
things about the naming convention we've settled on for method
invocations is that you can derive either lowerCamelCase or
separated_by_underscores method names with a simple transform:

    Sim_Length_Norm => lengthNorm
    Sim_Length_Norm => length_norm
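
A minimal sketch of that transform, written in Python purely for
illustration. The function names are hypothetical and not part of any
Lucy or KinoSearch API, and the sketch assumes every method symbol has
the form ClassNick_Word_Word:

```python
def split_symbol(symbol):
    """Split 'Sim_Length_Norm' into the class nick and the method words."""
    nick, *words = symbol.split("_")
    if not words:
        raise ValueError(f"no method name in {symbol!r}")
    return nick, words


def to_lower_camel(symbol):
    """Derive a lowerCamelCase method name, dropping the class nick."""
    _, words = split_symbol(symbol)
    first, rest = words[0], words[1:]
    return first.lower() + "".join(w[:1].upper() + w[1:] for w in rest)


def to_snake(symbol):
    """Derive a separated_by_underscores method name, dropping the class nick."""
    _, words = split_symbol(symbol)
    return "_".join(w.lower() for w in words)


if __name__ == "__main__":
    print(to_lower_camel("Sim_Length_Norm"))
    print(to_snake("Sim_Length_Norm"))
```

Each binding could apply whichever variant matches its own naming
idiom.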

If we tag every last thing, enough so that we could actually generate,
say, both POD and javadoc without intervention, then sure, XML is
wayyyy too verbose.  Anything would be, really, because language
syntaxes are too distinct.  But if we set our sights a little lower,
and just try to share method names, method descriptions, and
public/non-public access control, that's doable -- and it's a whole
lot of savings.  (Maybe parameter lists and return values, too, but
that's a little harder.)

   <method>
     <name>Sim_Length_Norm</name>
     <acl>public</acl>
     <description>
       Computes the normalization value for a field given the total
       number of terms contained in a field. These values, together
       with field boosts, are stored in an index and multiplied into
       scores for hits on each field by the search code.

       Matches in longer fields are less precise, so implementations
       of this method usually return smaller values when numTokens is
       large, and larger values when numTokens is small.

       Note that these values are computed under IxWriter_Add_Document
       and then stored using Sim_Encode_Norm. Thus they have limited
       precision, and documents must be re-indexed if this method is
       altered.
     </description>
   </method>

Note the use of "IxWriter_Add_Document" and "Sim_Encode_Norm" within
the description.  Those method names are identifiable patterns,
matchable with this regex:

   # $1 is class nick, $2 is short method name
   /([A-Z][A-Za-z]+)_([A-Z]\w+)/

It's easy to sub out IxWriter_Add_Document for this, which will
generate a nicely formatted link...

   L<IndexWriter::add_document|Lucy::Index::IndexWriter/"add_document">
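
As a rough illustration of that substitution, here is a Python sketch
that applies the regex above (assuming `\w` was intended in the second
group, with a leading `\b` added so matches start at a word boundary)
and rewrites known method symbols as POD links. The NICK_TO_CLASS table
and the function names are hypothetical; only the IxWriter entry is
taken from the example link:

```python
import re

# Hypothetical lookup from class nick to full class name. Only this one
# entry is taken from the example link; a real table would need one
# entry per class.
NICK_TO_CLASS = {
    "IxWriter": "Lucy::Index::IndexWriter",
}

# group 1 is the class nick, group 2 is the short method name
METHOD_SYMBOL = re.compile(r"\b([A-Z][A-Za-z]+)_([A-Z]\w+)")


def pod_link(match):
    """Build a POD L<> link for one matched method symbol."""
    nick, short = match.group(1), match.group(2)
    full_class = NICK_TO_CLASS.get(nick)
    if full_class is None:
        return match.group(0)  # unknown nick: leave the text untouched
    method = short.lower()
    label = full_class.rsplit("::", 1)[-1]
    return f'L<{label}::{method}|{full_class}/"{method}">'


def link_methods(text):
    """Replace every recognized method symbol in text with a POD link."""
    return METHOD_SYMBOL.sub(pod_link, text)


if __name__ == "__main__":
    print(link_methods("computed under IxWriter_Add_Document and stored"))
```

Symbols whose nick is missing from the table pass through unchanged, so
a partial table degrades to plain text rather than broken links.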

Now, returning to your point about Doxygen... With XML, we'd have to
maintain separate files for the documentation, which would suck.  So
I'm all for using Doxygen, especially if we can rig things up so that
the description can be isolated and parsed out reliably.

I might go write an extractor tool which parses our header files and
generates intermediate XML.  Then bindings authors could write their
own final translation utilities in their language of choice, and use
as much or as little as they wish.
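
To make the idea concrete, here is a sketch of what one such final
translation utility might look like, in Python for illustration only.
It assumes the intermediate XML wraps the <method> elements in a
<methods> root (an assumption; no such format exists yet), keeps only
the entries whose <acl> is public, and emits POD-style headings. The
sample data and function names are hypothetical:

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical intermediate XML; the <methods> wrapper is an assumption.
SAMPLE = """\
<methods>
  <method>
    <name>Sim_Length_Norm</name>
    <acl>public</acl>
    <description>
      Computes the normalization value for a field.

      Matches in longer fields are less precise.
    </description>
  </method>
  <method>
    <name>Sim_Internal_Helper</name>
    <acl>private</acl>
    <description>Not part of the public API.</description>
  </method>
</methods>
"""


def snake_name(symbol):
    """Drop the class nick and lowercase the remaining words."""
    return "_".join(w.lower() for w in symbol.split("_")[1:])


def to_pod(xml_text):
    """Emit one POD section per public method, skipping everything else."""
    chunks = []
    for method in ET.fromstring(xml_text).iter("method"):
        if method.findtext("acl", "").strip() != "public":
            continue
        name = snake_name(method.findtext("name", "").strip())
        desc = textwrap.dedent(method.findtext("description", "")).strip()
        chunks.append(f"=head2 {name}\n\n{desc}\n")
    return "\n".join(chunks)


if __name__ == "__main__":
    print(to_pod(SAMPLE))
```

A binding that wanted different output would only need to swap the
heading template and the name transform.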

Hopefully they'd use more rather than less.  It's to the user's
benefit for various bindings to present reasonably consistent APIs
while still being idiomatic, because it makes it easier to apply what
you learned about one of them to another.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



Re: PHP binding for Lucy?
Dave Balmain
2007-02-28 00:19:56
On 2/28/07, Marvin Humphrey wrote:
> > Of course you could also have public and private include files.
>
> Hmm, can you elaborate?  I'd basically given up hope that we'd be
> able to maintain tight control over symbol export, and was expecting
> to define the API via documentation only.

I haven't tried this before and I'm not sure if other large C projects
do this, but I was thinking you could have two sets of include files:
one that contains the public API and another that contains methods
used internally within Lucy. For example, an IndexWriter.h and an
IndexWriter_private.h.

One way I am planning on cutting down on public methods is by keeping
the unit tests in the same file as the source code, surrounding them
with an #ifdef UNIT_TESTS. A lot of the methods in Ferret would be
static except that I needed to make them available for unit testing.

> <snip/>
> Now, returning to your point about Doxygen... With XML, we'd have to
> maintain separate files for the documentation, which would suck.  So
> I'm all for using Doxygen, especially if we can rig things up so that
> the description can be isolated and parsed out reliably.
>
> I might go write an extractor tool which parses our header files and
> generates intermediate XML.

Well, I definitely think it might be a good idea to look at Doxygen
and see if you can hook into its parser. I'm pretty sure it parses the
C code as well as the comments so it might do most of the work for
you.

> Then bindings authors could write their own final translation
> utilities in their language of choice, and use as much or as little
> as they wish.
>
> Hopefully they'd use more rather than less.  It's to the user's
> benefit for various bindings to present reasonably consistent APIs
> while still being idiomatic, because it makes it easier to apply what
> you learned about one of them to another.

I agree to a certain extent. However, when I released the first
version of Ferret, a lot of people complained the interface was too
Java-like. It can be difficult to find the happy medium between making
the interface easy for people who used Lucy in a different language
and people who want their Ruby/Perl/PHP library to work in a certain
way.

But if you wanted to take this route (making it possible for binding
authors to generate some/most of the binding from some XML files), how
about using SWIG?

-- 
Dave Balmain
http://www.davebalmain.com/

Re: PHP binding for Lucy?
Dave Balmain
2007-02-28 05:18:24
On 2/28/07, Marvin Humphrey wrote:
>
> On Feb 27, 2007, at 10:19 PM, David Balmain wrote:
>
> > On 2/28/07, Marvin Humphrey wrote:
> >> > Of course you could also have public and private include files.
> >>
> >> Hmm, can you elaborate?  I'd basically given up hope that we'd be
> >> able to maintain tight control over symbol export, and was
> >> expecting to define the API via documentation only.
> >
> > I haven't tried this before and I'm not sure if other large C
> > projects do this, but I was thinking you could have two sets of
> > include files: one that contains the public API and another that
> > contains methods used internally within Lucy. For example, an
> > IndexWriter.h and an IndexWriter_private.h.
> >
> > One way I am planning on cutting down on public methods is by
> > keeping the unit tests in the same file as the source code,
> > surrounding them with an #ifdef UNIT_TESTS. A lot of the methods in
> > Ferret would be static except that I needed to make them available
> > for unit testing.
>
> I guess I just don't care a whole lot.  Words to live by, courtesy of
> Larry Wall: "Perl doesn't have an infatuation with enforced privacy.
> It would prefer that you stayed out of its living room because you
> weren't invited, not because it has a shotgun."

Couldn't agree more. My motivation for doing this would be keeping the
public interface as minimal and simple as possible to avoid confusion,
rather than to act as access control. It also allows me to keep my
static method names short, but this would make no difference with
Lucy's coding standards, so I guess it is a moot point.

> > I agree to a certain extent. However, when I released the first
> > version of Ferret, a lot of people complained the interface was too
> > Java-like. It can be difficult to find the happy medium between
> > making the interface easy for people who used Lucy in a different
> > language and people who want their Ruby/Perl/PHP library to work in
> > a certain way.
>
> Absolutely.  What I'm suggesting wouldn't stand in the way of that.
> It would even lend superficial similarity to bindings which are
> otherwise implemented very differently.

Well, as long as what you are talking about doesn't get in the way of
writing idiomatic bindings then it all sounds good to me. I look
forward to seeing what you come up with.

Cheers,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/

Re: PHP binding for Lucy?
Marvin Humphrey
2007-03-02 21:15:46
On Feb 27, 2007, at 6:24 PM, David Balmain wrote:

> How about just using Doxygen? I don't have much experience with it,
> but I'm pretty sure there would be a way to tag particular functions
> that are public so that when you generate the documentation you can
> generate only the public methods.

From the Doxygen manual:

   21.124 PHP only commands

   For PHP files there are a number of additional commands, that can be
   used inside classes to make members public, private, or protected
   even though the language itself doesn’t support this notion.

   To mark a single item use one of \private, \protected, \public. For
   starting a section with a certain protection level use one of:
   \privatesection, \protectedsection, \publicsection. The latter
   commands are similar to "private:", "protected:", and "public:" in
   C++.

Too bad "package" isn't in that list.  KS has three access levels:
public, private, and "distro", which I made up and means any file in
the KinoSearch distribution can use it -- the equivalent of Java
package if the whole distro was in one package.  (I would have called
it "package" but that has a different meaning in Perl.)

I'll bet Doxygen allows you to define your own keywords... just
haven't found that yet.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


