Thread: NLP was Apple Data Detectors




2008-02-08 12:54:04
2008/2/8, Guillaume Lebleu <guillaumelebleu.org>:
> I understand the challenge of disambiguation and the value microformats
> bring in terms of easier parser implementation and more reliable
> information consumption experience.

Without the explicit additional mark-up declaring something to be of a
certain type, we are left with just Natural Language Processing
(NLP, http://en.wikipedia.org/wiki/Natural_language_processing).

This is a dangerous and slippery slope. NLP sounds like a great idea
but has never lived up to the hype preceding it. NLP is language
specific, so while it might be great that Apple Data Detectors work in
English, doing NLP for every language makes the code explode quickly.
Microformats, while requiring extra mark-up, can accommodate ISO dates
in any language.
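
To illustrate the point about ISO dates, here is a minimal sketch in
Python. It assumes the hCalendar pattern of an abbr element with class
"dtstart" carrying the ISO date in its title attribute; the names
DtstartParser and extract_dtstart and the two sample snippets are made
up for this example and are not taken from any existing parser.

```python
from datetime import datetime
from html.parser import HTMLParser


class DtstartParser(HTMLParser):
    """Collect the ISO date from every element marked class="dtstart"."""

    def __init__(self):
        super().__init__()
        self.dates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # The machine-readable date lives in the title attribute,
        # so the visible text is never inspected.
        if "dtstart" in classes and attrs.get("title"):
            parsed = datetime.strptime(attrs["title"], "%Y-%m-%d")
            self.dates.append(parsed.date())


def extract_dtstart(html):
    """Return a list of datetime.date objects found in the markup."""
    parser = DtstartParser()
    parser.feed(html)
    parser.close()
    return parser.dates


# Hypothetical sample markup: same date, different human languages.
english = '<abbr class="dtstart" title="2008-02-08">February 8th, 2008</abbr>'
french = '<abbr class="dtstart" title="2008-02-08">le 8 février 2008</abbr>'

print(extract_dtstart(english) == extract_dtstart(french))
```

The parser never looks at the human-readable text, which is why the
same few lines cover both snippets without any language-specific rules.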

Geonames has a service that attempts to find places and give them geo
coordinates. You can judge for yourself how well NLP can or cannot
correctly extract data.

http://www.geonames.org/rss-to-georss-converter.html

I Want Sandy (http://iwantsandy.com/) attempts to parse dates and
times, but usually needs some help or a well-structured format. While
it is not impossible to do, you end up writing in a way that isn't
natural.

> The challenge for average people
> writing microformats can't be underestimated though. I strongly believe
> that the time where disambiguation costs are the lowest are at
> publishing time, but this is also the time where you are focused on the
> English content, not the microformats.

The danger here is that you are attempting to "have your cake and eat
it too". There will always be an effort on someone's part to explain
this data. AI is not there yet, and probably won't get there any time
soon, so microformats are the lightest-weight way to add the
information needed to help machines without overburdening the
publisher.

The ideal solution would be some sort of plugin in the CMS, so you can
simply highlight areas, push a button, and have it add the
microformatted information, or (like Microsoft Writer) an hCalendar
plugin, so you fill out the forms and it puts it all inline with the
mark-up for you. Both of these efforts lighten the load on the
publisher while keeping the mark-up that removes ambiguities.
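
As a rough sketch of what the form-driven variant might do behind the
button, the following Python function turns three form fields into an
hCalendar snippet. The function name hcalendar_event and its parameters
are hypothetical, not the API of any real CMS or editor plugin, and
only the summary and dtstart properties are covered.

```python
from datetime import date
from html import escape


def hcalendar_event(summary, start, display_text):
    """Build an hCalendar vevent snippet from form input.

    summary      -- event name typed by the publisher
    start        -- datetime.date chosen in the form
    display_text -- the date as it should read in the prose
    """
    return (
        '<span class="vevent">'
        f'<span class="summary">{escape(summary)}</span>, '
        # The ISO date goes in title; the visible text stays natural.
        f'<abbr class="dtstart" title="{start.isoformat()}">'
        f'{escape(display_text)}</abbr>'
        '</span>'
    )


# Hypothetical form values.
snippet = hcalendar_event("Q&A session", date(2008, 2, 8), "February 8th")
print(snippet)
```

The publisher only supplies the plain values; the escaping and the
class names are handled by the tool, which is where the load is taken
off the person writing the content.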

-brian

-- 
brian suda
http://suda.co.uk
_______________________________________________
microformats-discuss mailing list
microformats-discussmicroformats.org
http://microformats.org/mailman/listinfo/microformats-discuss

Re: Editor integration (was: NLP was Apple Data Detectors)
2008-02-08 19:04:50
Brian Suda wrote:
> The ideal solution would be for some sort of plugin in the CMS so you
> can simply highlight areas and push a button and it will add the
> microformatted information

Do you or anyone know of any microformats integration work with TinyMCE,
or have any insight into why it hasn't happened yet? It seems there was
some talk about this on this list back in 2006:
http://microformats.org/discuss/mail/microformats-discuss/2006-March/003298.html
TIA,
Guillaume