lapps.github.io

LAPPS Vocabulary - Current Issues

Last update: November 20th, 2014.

There are now four places where the syntax and semantics of the LAPPS data structures are defined: the source code (mainly the wrappers), the LAPPS Web Services Exchange Vocabulary, the LAPPS JSON schema and the LAPPS Interchange Format (LIF) specifications. None of these are final and all of them have open issues. In addition, these four sources are not in sync. This page lists open issues for the LAPPS vocabulary: ambiguities, omissions, and inconsistencies relative to the LIF specifications (for those cases where the latter are ahead of the vocabulary).

Required versus optional attributes

The vocabulary does not make a distinction between required and optional attributes. We do not yet have a clear idea of which attributes should be required. Take for example the Annotation element. We should probably require the id attribute. But we cannot require start and end, since under Annotation we may have things like Dependency, for which character offsets are not meaningful.
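To make the distinction concrete, here is a minimal Python sketch of an annotation record in which id is required while start and end are optional. The class and its validation rule are assumptions for illustration only; they are not taken from the LAPPS source code or the vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical annotation record; not the LAPPS implementation."""
    id: str                       # assumed required
    start: Optional[int] = None   # optional: character offset, if meaningful
    end: Optional[int] = None     # optional: character offset, if meaningful

    def __post_init__(self) -> None:
        # Enforce the one attribute that is assumed to be required.
        if not self.id:
            raise ValueError("Annotation requires a non-empty id")


token = Annotation(id="tok0", start=0, end=3)   # offsets make sense here
dependency = Annotation(id="dep0")              # no offsets for a Dependency
```

Under this reading, a subtype such as Dependency simply leaves the offsets unset instead of supplying dummy values.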

Lists versus sets

When the vocabulary says "List of URIs", does this imply that the list is ordered? Shall we use List and Ordered List, or List and Set?
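The practical difference shows up when two values are compared. The sketch below uses placeholder URIs (they are not vocabulary entries) and Python's built-in list and set types to stand in for "ordered list" and "set".

```python
# Placeholder URIs, used only to illustrate ordering.
uris_a = ["http://example.org/a", "http://example.org/b"]
uris_b = ["http://example.org/b", "http://example.org/a"]

# As ordered lists the two values differ, because order is significant.
equal_as_lists = uris_a == uris_b

# As sets they are the same, because only membership is significant.
equal_as_sets = set(uris_a) == set(uris_b)
```

If the vocabulary means the second interpretation, tools should not rely on the position of a URI in the serialized array.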

Adding vocabulary pages for coreference

See the [Coreference in LIF](http://lapps.github.io/interchange/coref-v3.html) page. We need to add two elements.

Coreference

Stores all information on coreference. It has two features:

mentions. A list of identifiers. Each identifier points at an object of type Annotation or a subtype thereof.
representative. An identifier that points to the full form of the elements in the coreference chain, that is, one of the elements of the mentions list.

An alternative here is to use the objects themselves rather than their identifiers. Using the identifiers seems the better choice, since it lines up better with the representation in LIF.
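A minimal Python sketch of the identifier-based option follows. The dictionary layout (the id, type and features keys), the example text and the helper check_coreference are assumptions for illustration, not the LIF schema.

```python
# Hypothetical annotations for the text "Sue said she left", keyed by id.
annotations = {
    "tok0": {"id": "tok0", "type": "Token", "start": 0, "end": 3},    # "Sue"
    "tok2": {"id": "tok2", "type": "Token", "start": 9, "end": 12},   # "she"
}

coreference = {
    "id": "coref0",
    "type": "Coreference",
    "features": {
        "mentions": ["tok0", "tok2"],   # identifiers, not embedded objects
        "representative": "tok0",       # must be one of the mentions
    },
}


def check_coreference(coref, annotations):
    """Return a list of problems found in one Coreference object."""
    features = coref["features"]
    problems = []
    for mention_id in features["mentions"]:
        if mention_id not in annotations:
            problems.append(f"unknown mention: {mention_id}")
    if features["representative"] not in features["mentions"]:
        problems.append("representative is not in mentions")
    return problems


problems = check_coreference(coreference, annotations)
```

Because the mentions are identifiers, the same Token can take part in several structures without being copied; the cost is that consumers have to resolve and validate the references themselves.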

Markable

This is needed in LIF in case we do not have annotation objects that the mentions list can point to. An object then needs to be created from the offsets alone and we called this a Markable object. It has an extra feature named targets that allows it to point to other annotation objects. It may also need features like ENTITY_MENTION_TYPE in order to store what is available in the output of typical coreference services.

**Note**. The Markable element may be used for other, non-coreference, purposes as well. How do we deal with this when it comes to features from a wide variety of views? One answer would be to stipulate that Markable elements are only used for coreference and that similar elements for other purposes need other names.
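As an illustration of a Markable built from offsets alone, consider the following Python sketch. The helper make_markable, the dictionary layout, the target identifier tok2 and the value given to ENTITY_MENTION_TYPE are all hypothetical.

```python
text = "Sue said she left"


def make_markable(markable_id, start, end, targets=(), **features):
    """Build a hypothetical Markable from character offsets."""
    return {
        "id": markable_id,
        "type": "Markable",
        "start": start,
        "end": end,
        # targets optionally points at other annotation objects by id
        "features": {"targets": list(targets), **features},
    }


she = make_markable("m0", 9, 12, targets=["tok2"],
                    ENTITY_MENTION_TYPE="PRONOUN")   # illustrative value
covered_text = text[she["start"]:she["end"]]
```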

Adding vocabulary pages for phrase structure

See the [Phrase structure in LIF](http://lapps.github.io/interchange/phrase_structure-v1.html) page. We need to add two elements.

PhraseStructure

The container with all phrase structure information. Possibly an immediate subtype of Annotation. It has two features:

categorySet. A meta data feature containing a URI for a particular category set. If defined in the LAPPS vocabulary, this URI would be inside http://vocab.lappsgrid.org/ns/types.
Question: should this be a list of URIs?
constituents. A set of annotation objects of type Constituent.

Constituent

The list of constituents defines the tree structure of the parse tree. Each constituent has two features.

label. A category label, defined in the URI that is the value of PhraseStructure#categorySet.
children. An ordered list of identifiers. Each identifier points to an annotation object of type Constituent.
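The following Python sketch shows how a flat collection of Constituent objects with ordered children identifiers determines a tree. The dictionary layout, the category labels and both helper functions are assumptions for illustration; leaf constituents are modelled here with an empty children list.

```python
# Hypothetical constituents keyed by id; children are ordered identifiers.
constituents = {
    "c0": {"label": "S", "children": ["c1", "c2"]},
    "c1": {"label": "NP", "children": []},
    "c2": {"label": "VP", "children": []},
}


def find_root(constituents):
    """Return the id of the single constituent that is nobody's child."""
    child_ids = {cid for node in constituents.values()
                 for cid in node["children"]}
    roots = [cid for cid in constituents if cid not in child_ids]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def bracket(cid, constituents):
    """Render the subtree under cid in bracketed notation."""
    node = constituents[cid]
    kids = " ".join(bracket(k, constituents) for k in node["children"])
    return f"({node['label']} {kids})" if kids else f"({node['label']})"


root_id = find_root(constituents)
tree = bracket(root_id, constituents)
```

The order of the children list is what fixes the left-to-right order of the rendered tree, which is why children is an ordered list while constituents can be an unordered set.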

Adding vocabulary pages for dependency structure

See the [Dependency structure in LIF](http://lapps.github.io/interchange/dependencies-v1.html) page. Two new elements:

DependencyStructure

The container with all dependency structure information. Possibly an immediate subtype of Annotation. It has three features:

dependencySet. A URI for a particular set of dependency labels. If defined in the LAPPS vocabulary, the URI would be inside http://vocab.lappsgrid.org/ns/types. This is a meta data feature.
type. The type of dependencies: basic-dependencies, collapsed-dependencies, etcetera. Given Steve Cassidy's insistence on having @type as well as type in his JSON-LD, and the possibility that we go along with this, maybe this feature should be called dependencyType.
dependencies. A set of annotation objects of type Dependency.

Dependency

The list of dependencies defines the dependency structure. Each dependency has three features:

label. A dependency label, defined in the URI that is the value of DependencyStructure#dependencySet.
governor. An identifier pointing at an object of type Annotation or a subtype thereof. Can be null for the root dependency.
dependent. An identifier pointing at an object of type Annotation or a subtype thereof.
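A small Python sketch of the three Dependency features, including a null governor for the root, is given below. The dictionary layout, the token identifiers, the labels root and nsubj, and the helper functions are illustrative assumptions.

```python
# Hypothetical dependencies for the text "Sue left":
# tok0 = "Sue", tok1 = "left".
dependencies = [
    {"id": "dep0", "label": "root", "governor": None, "dependent": "tok1"},
    {"id": "dep1", "label": "nsubj", "governor": "tok1", "dependent": "tok0"},
]


def root_dependents(deps):
    """Ids of dependents whose governor is null, i.e. the roots."""
    return [d["dependent"] for d in deps if d["governor"] is None]


def dependents_of(governor_id, deps):
    """(label, dependent id) pairs governed by the given annotation."""
    return [(d["label"], d["dependent"])
            for d in deps if d["governor"] == governor_id]


roots = root_dependents(dependencies)
left_dependents = dependents_of("tok1", dependencies)
```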

The Date object

Dates have a dateType feature with values like date, datetime and time. With features like this, should we add a meta data feature like dateTypeSet, which has a URI containing type definitions? This question is relevant for other object types as well. Also, we should add a value feature to store the normalized value, as well as other Timex2 and Timex3 features.
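As a sketch of what a Date object with both features could look like, the Python below restricts dateType to the three values mentioned and stores the normalized value as an ISO 8601 string. The helper make_date, the dictionary layout and the choice of ISO 8601 are assumptions, not part of the vocabulary.

```python
from datetime import date

# The dateType values named in the text; a dateTypeSet URI could define these.
DATE_TYPES = {"date", "datetime", "time"}


def make_date(annotation_id, date_type, value):
    """Build a hypothetical Date annotation with a normalized value."""
    if date_type not in DATE_TYPES:
        raise ValueError(f"unknown dateType: {date_type}")
    return {
        "id": annotation_id,
        "type": "Date",
        "features": {"dateType": date_type, "value": value},
    }


# Normalized value for "November 20th, 2014", assuming ISO 8601.
last_update = make_date("date0", "date", date(2014, 11, 20).isoformat())
```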

Other Issues

Thing	layout of the headers is different from the other pages
Date	URLs in sameAs and similarTo are not links
Person	URLs in sameAs and similarTo are not links
Document	The isocat reference URL in sameAs is not a link. Also, Document is where the language property is defined, but in LIF we use @language as a key associated with a text string. We should probably add a Text element.
Sentence	sentenceType is in red
Location	locType is in red
TextDocument	has a different table layout than other pages
AudioDocument	has a different table layout than other pages
Token	pos needs to be changed into posTag

The Language Applications Grid

An open framework for interoperable NLP web services
