lapps.github.io

LAPPS Vocabulary - Current Issues

Last update: November 20th, 2014.

The syntax and semantics of the LAPPS data structures are currently defined in four places: the source code (mainly the wrappers), the LAPPS Web Services Exchange Vocabulary, the LAPPS JSON schema, and the LAPPS Interchange Format (LIF) specifications. None of these is final, all of them have open issues, and the four sources are not in sync. This page lists open issues for the LAPPS vocabulary: unclear points, omissions, and inconsistencies relative to the LIF specifications (for those cases where the latter are ahead of the vocabulary).

Required versus optional attributes
The vocabulary does not distinguish between required and optional attributes, and we do not yet have a clear idea which attributes should be required. Take for example the Annotation element. We should probably require the id attribute. But we cannot require start and end, since under Annotation we may have things like Dependency, for which character offsets are not meaningful.
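To make the question concrete, here is a minimal sketch of how required attributes could be declared per type and inherited from supertypes. The `PARENT` and `REQUIRED` tables are hypothetical: only the id requirement on Annotation is suggested above, and the start and end requirement on Token is an assumption made purely for illustration.

```python
# Hypothetical type hierarchy and required-attribute table.
# The vocabulary does not define either of these yet.
PARENT = {"Annotation": None, "Token": "Annotation", "Dependency": "Annotation"}
REQUIRED = {
    "Annotation": {"id"},        # suggested in the text above
    "Token": {"start", "end"},   # assumption: offsets are meaningful for tokens
    "Dependency": set(),         # character offsets are not meaningful here
}


def required_attributes(type_name):
    """Collect the required attributes of a type and all of its supertypes."""
    required = set()
    while type_name is not None:
        required |= REQUIRED.get(type_name, set())
        type_name = PARENT.get(type_name)
    return required


def missing_attributes(type_name, obj):
    """Return the required attributes that are absent from obj (a dict)."""
    return required_attributes(type_name) - set(obj)
```

With this layout, a Dependency that only carries an id passes the check, while a Token without end does not. Requiring start and end on Annotation itself would reject every Dependency.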
Lists versus sets
When the vocabulary says "List of URIs", does this imply that the list is ordered? Shall we use List and Ordered List, or List and Set?
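The practical difference shows up when two values are compared. The sketch below is only an illustration; the URIs are placeholders and the function names are not part of any LAPPS code.

```python
# Placeholder URIs, used only to illustrate the two comparison semantics.
first = ["http://example.org/a", "http://example.org/b"]
second = ["http://example.org/b", "http://example.org/a"]


def equal_as_ordered_list(x, y):
    """Order matters: the same URIs in a different order are not equal."""
    return list(x) == list(y)


def equal_as_set(x, y):
    """Order and duplicates are ignored."""
    return set(x) == set(y)
```

Here `equal_as_ordered_list(first, second)` is false while `equal_as_set(first, second)` is true, so the vocabulary needs to say which of the two readings "List of URIs" has.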
Adding vocabulary pages for coreference
See the [Coreference in LIF](http://lapps.github.io/interchange/coref-v3.html) page. We need to add two elements.
Coreference. Stores all information on coreference. It has two features:
  • mentions. A list of identifiers. Each identifier points at an object of type Annotation or a subtype thereof.
  • representative. An identifier that points to the full form of the elements in the coreference chain, that is, one of the elements of the mentions list.
An alternative here is to not use the identifiers but the objects themselves. Using the identifiers seems the better choice because it lines up better with the representation in LIF.
Markable. This is needed in LIF in case we do not have annotation objects that the mentions list can point to. An object then needs to be created from the offsets alone; we call this a Markable object. It has an extra feature named targets that allows it to point to other annotation objects. It may also need features like ENTITY_MENTION_TYPE in order to store what is available in the output of typical coreference services.
**Note**. The Markable element may be used for other, non-coreference purposes as well. How do we deal with this when it comes to features from a wide variety of views? One answer would be to stipulate that Markable elements are only used for coreference and that similar elements for other purposes need other names.
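As a rough illustration of the identifier-based option, the sketch below represents one Coreference object and two Markable objects as plain dictionaries and checks the two constraints stated above. The flat dictionary layout, the `@type` key, the example offsets and the helper name `check_coreference` are assumptions for this example and are not taken from the LIF specification.

```python
def check_coreference(coref, annotations_by_id):
    """Return a list of problems found in one Coreference object."""
    problems = []
    mentions = coref.get("mentions", [])
    # Every mention identifier must resolve to a known annotation object.
    for mention_id in mentions:
        if mention_id not in annotations_by_id:
            problems.append(f"unknown mention: {mention_id}")
    # The representative must be one of the elements of the mentions list.
    representative = coref.get("representative")
    if representative not in mentions:
        problems.append(f"representative not in mentions: {representative}")
    return problems


# Hypothetical objects; the offsets are made up.
annotations = {
    "m1": {"@type": "Markable", "id": "m1", "start": 0, "end": 10},
    "m2": {"@type": "Markable", "id": "m2", "start": 25, "end": 27},
}
coref = {
    "@type": "Coreference",
    "id": "c1",
    "mentions": ["m1", "m2"],
    "representative": "m1",
}
```

Calling `check_coreference(coref, annotations)` returns an empty list for this input. Storing the objects themselves instead of identifiers would make the first check unnecessary, but would duplicate annotation data inside each chain.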
Adding vocabulary pages for phrase structure
See the [Phrase structure in LIF](http://lapps.github.io/interchange/phrase_structure-v1.html) page. We need to add two elements.
PhraseStructure. The container with all phrase structure information. Possibly an immediate subtype of Annotation. It has two features:
  • categorySet. A metadata feature containing a URI for a particular category set. If defined in the LAPPS vocabulary, this URI would be inside http://vocab.lappsgrid.org/ns/types.
    Question: should this be a list of URIs?
  • constituents. A set of annotation objects of type Constituent.
Constituent. The list of constituents defines the structure of the parse tree. Each constituent has two features:
  • label. A category label, defined in the URI that is the value of PhraseStructure#categorySet.
  • children. An ordered list of identifiers. Each identifier points to an annotation object of type Constituent.
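The following sketch shows how a tree can be rebuilt from a flat collection of constituents whose children features hold identifiers. The dictionary layout, the identifiers and the helper name `bracketed` are assumptions for illustration; leaves are modelled here as constituents with an empty children list.

```python
def bracketed(constituent_id, constituents):
    """Render the subtree rooted at constituent_id in bracket notation."""
    node = constituents[constituent_id]
    children = node.get("children", [])
    if not children:
        return node["label"]
    # The order of the children list determines the left-to-right order.
    inner = " ".join(bracketed(child, constituents) for child in children)
    return f"({node['label']} {inner})"


# Hypothetical constituents, keyed by identifier.
constituents = {
    "c0": {"label": "S", "children": ["c1", "c2"]},
    "c1": {"label": "NP", "children": []},
    "c2": {"label": "VP", "children": []},
}
```

For this input, `bracketed("c0", constituents)` gives `(S NP VP)`. Reversing the children of c0 changes the result, which is why children has to be an ordered list even if constituents itself is a set.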
Adding vocabulary pages for dependency structure
See the [Dependency structure in LIF](http://lapps.github.io/interchange/dependencies-v1.html) page. Two new elements:
DependencyStructure. The container with all dependency structure information. Possibly an immediate subtype of Annotation. It has three features:
  • dependencySet. A URI for a particular set of dependency labels. If defined in the LAPPS vocabulary, the URI would be inside http://vocab.lappsgrid.org/ns/types. This is a metadata feature.
  • type. The type of dependencies: basic-dependencies, collapsed-dependencies, etcetera. Given Steve Cassidy's insistence on having @type as well as type in his JSON-LD, and the possibility that we go along with this, maybe this feature should be called dependencyType.
  • dependencies. A set of annotation objects of type Dependency.
Dependency. The list of dependencies defines the dependency structure. Each dependency has three features:
  • label. A dependency label, defined in the URI that is the value of DependencyStructure#dependencySet.
  • governor. An identifier pointing at an object of type Annotation or a subtype thereof. Can be null for the root dependency.
  • dependent. An identifier pointing at an object of type Annotation or a subtype thereof.
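A small sketch of how these features fit together, including the null governor for the root dependency. The dictionary layout, the token identifiers, the labels root and nsubj, and the helper names are illustrative assumptions, not part of the vocabulary.

```python
def find_roots(dependencies):
    """Return the dependents of all dependencies that have no governor."""
    return [d["dependent"] for d in dependencies if d.get("governor") is None]


def unresolved_identifiers(dependencies, annotation_ids):
    """Return governor or dependent identifiers that point at no known annotation."""
    missing = []
    for dep in dependencies:
        for role in ("governor", "dependent"):
            target = dep.get(role)
            # A null governor is allowed for the root dependency.
            if target is None and role == "governor":
                continue
            if target not in annotation_ids:
                missing.append(target)
    return missing


# Hypothetical annotations for a two-token sentence such as "Dogs bark".
token_ids = {"t1", "t2"}
dependencies = [
    {"id": "d1", "label": "root", "governor": None, "dependent": "t2"},
    {"id": "d2", "label": "nsubj", "governor": "t2", "dependent": "t1"},
]
```

In this example `find_roots(dependencies)` returns `["t2"]` and `unresolved_identifiers(dependencies, token_ids)` returns an empty list. A dependent of None would be reported as unresolved, matching the rule that only the governor may be null.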
The Date object
Dates have a dateType feature with values like date, datetime and time. With features like this, should we add a metadata feature like dateTypeSet, whose value is a URI containing the type definitions? This question is relevant for other object types as well. Also, we should add a value feature to store the normalized value, as well as other Timex2 and Timex3 features.
Other Issues
Thing. Layout of the headers is different from the other pages.
Date. URLs in sameAs and similarTo are not links.
Person. URLs in sameAs and similarTo are not links.
Document. The URL in the sameAs isocat reference is not a link. Also, Document is where the language property is defined, but in LIF we use @language as a key associated with a text string. We should probably add a Text element.
Sentence. sentenceType is in red.
Location. locType is in red.
TextDocument. Has a different table layout than other pages.
AudioDocument. Has a different table layout than other pages.
Token. pos needs to be changed into posTag.