Suzan Verberne wrote:
> I had a look at this version too, but I cannot find the information I
> need in here either.
> It gives one example, but does not completely describe the format.
> E.g.
> - what does Type="Lookup" mean?
> - what is the order of the annotation in the AnnotationSet?
> - why can one node occur 3 times in the AnnotationSet? (Type="Lookup",
> Type="Token", Type="FirstPerson")

The format is quite simple when you break it down. First there is the
GateDocumentFeatures section, which simply contains the features
(possibly none) of the document as a whole, as name/value pairs. Next,
the TextWithNodes section contains the document text, interspersed with
<Node/> elements. Then there are one or more AnnotationSet sections:
always at least the one with no name, which corresponds to the default
annotation set, and possibly some named sets as well.
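
To make the layout concrete, here is a minimal sketch in Python (standard
library only) that parses a small hand-written GATE-format document and
reports its top-level sections. The sample text and the helper name
outline() are illustrative, not part of GATE. The Name/Value child
elements of Feature and the Name attribute on named AnnotationSets are
assumptions not spelled out above, so check them against a file saved by
your own GATE installation.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only (assumed element/attribute names).
SAMPLE = """<GateDocument>
<GateDocumentFeatures>
  <Feature><Name>MimeType</Name><Value>text/plain</Value></Feature>
</GateDocumentFeatures>
<TextWithNodes><Node id="0"/>I<Node id="1"/> <Node id="2"/>agree<Node id="7"/></TextWithNodes>
<AnnotationSet>
  <Annotation Id="1" Type="Token" StartNode="0" EndNode="1"/>
</AnnotationSet>
<AnnotationSet Name="Original markups">
</AnnotationSet>
</GateDocument>"""


def outline(xml_text):
    """Summarise the top-level sections of a GATE-format XML document."""
    root = ET.fromstring(xml_text)
    doc_features = root.find("GateDocumentFeatures")
    text_with_nodes = root.find("TextWithNodes")
    # The default annotation set has no Name attribute, so it shows up as None.
    set_names = [s.get("Name") for s in root.findall("AnnotationSet")]
    return {
        "num_doc_features": len(doc_features.findall("Feature")),
        "num_nodes": len(text_with_nodes.findall("Node")),
        "annotation_sets": set_names,
    }


print(outline(SAMPLE))
# {'num_doc_features': 1, 'num_nodes': 4, 'annotation_sets': [None, 'Original markups']}
```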

Within the AnnotationSets there are Annotation elements, whose start and
end node attributes refer to the IDs of the <Node/> elements in the
TextWithNodes. There's no significance to the order in which the
Annotation elements appear; they are simply tied to the text via the
Node IDs. If two annotations start or end at the same place, they will
use the same node ID.
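
As a sketch of how the node IDs tie annotations to the text, the Python
below walks TextWithNodes to map each node ID to a character offset, then
resolves every annotation in one set to the span it covers. The attribute
names StartNode and EndNode, the Node "id" attribute and the helper names
node_offsets() and annotations() are assumptions for illustration. Offsets
are computed from the text itself rather than read from the IDs, so the
sketch does not depend on how GATE numbers its nodes.

```python
import xml.etree.ElementTree as ET

# Hand-written sample: the Lookup and the second Token share nodes 2 and 7,
# and the Annotation elements are deliberately out of order.
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>I<Node id="1"/> <Node id="2"/>agree<Node id="7"/></TextWithNodes>
<AnnotationSet>
  <Annotation Id="3" Type="Lookup" StartNode="2" EndNode="7"/>
  <Annotation Id="1" Type="Token" StartNode="0" EndNode="1"/>
  <Annotation Id="2" Type="Token" StartNode="2" EndNode="7"/>
</AnnotationSet>
</GateDocument>"""


def node_offsets(text_with_nodes):
    """Return (plain text, {node id: character offset}) for a TextWithNodes element."""
    pieces = [text_with_nodes.text or ""]  # any text before the first <Node/>
    offset = len(pieces[0])
    offsets = {}
    for node in text_with_nodes.findall("Node"):
        offsets[node.get("id")] = offset
        tail = node.tail or ""  # text between this <Node/> and the next one
        pieces.append(tail)
        offset += len(tail)
    return "".join(pieces), offsets


def annotations(xml_text, set_name=None):
    """List (start, end, type, covered text) for one annotation set, sorted by position."""
    root = ET.fromstring(xml_text)
    text, offsets = node_offsets(root.find("TextWithNodes"))
    result = []
    for aset in root.findall("AnnotationSet"):
        if aset.get("Name") != set_name:  # None selects the default (unnamed) set
            continue
        for ann in aset.findall("Annotation"):
            start = offsets[ann.get("StartNode")]
            end = offsets[ann.get("EndNode")]
            result.append((start, end, ann.get("Type"), text[start:end]))
    # File order carries no meaning, so impose one by position.
    return sorted(result)


for row in annotations(SAMPLE):
    print(row)
# (0, 1, 'Token', 'I')
# (2, 7, 'Lookup', 'agree')
# (2, 7, 'Token', 'agree')
```

This is why one node can appear several times in an AnnotationSet: every
annotation that starts or ends at that point refers to the same ID.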

Finally, an Annotation element can contain Feature elements in the same
format as in the GateDocumentFeatures section, representing the features
of the annotation.
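
Because both places use the same Feature layout, one helper can read
either. Here is a minimal Python sketch, assuming each Feature holds a Name
and a Value child element; the helper read_features() is hypothetical, it
keeps every value as a plain string, and it ignores any extra attributes a
real file may carry on those elements.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<GateDocument>
<GateDocumentFeatures>
  <Feature><Name>MimeType</Name><Value>text/plain</Value></Feature>
</GateDocumentFeatures>
<TextWithNodes><Node id="0"/>I<Node id="1"/></TextWithNodes>
<AnnotationSet>
  <Annotation Id="1" Type="Token" StartNode="0" EndNode="1">
    <Feature><Name>string</Name><Value>I</Value></Feature>
    <Feature><Name>kind</Name><Value>word</Value></Feature>
  </Annotation>
</AnnotationSet>
</GateDocument>"""


def read_features(element):
    """Collect the Feature children of an element into a {name: value} dict.

    Works for GateDocumentFeatures and for Annotation alike, since both use
    the same Feature/Name/Value layout (as assumed in this sketch).
    """
    features = {}
    for feat in element.findall("Feature"):
        name = feat.findtext("Name", default="")
        features[name] = feat.findtext("Value", default="")
    return features


root = ET.fromstring(SAMPLE)
print(read_features(root.find("GateDocumentFeatures")))  # {'MimeType': 'text/plain'}
print(read_features(root.find("AnnotationSet/Annotation")))  # {'string': 'I', 'kind': 'word'}
```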

> And what is the function of the concise GateDocument.dtd?

If you wished, you could validate a GATE-format XML document against it,
but GATE doesn't use the DTD internally; it's just there for reference.

Ian
--
Ian Roberts | Department of Computer Science
i.roberts dcs.shef.ac.uk | University of Sheffield, UK