EcoLexicon Semantic Sketch Grammar

How to cite
León Araúz, Pilar, Antonio San Martín and Pamela Faber. 2016. Pattern-based Word Sketches for the Extraction of Semantic Relations. In Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016), 73-82. Osaka, Japan: COLING 2016.

Introduction

One of the most common approaches to extracting information efficiently from a corpus is to search for knowledge-rich contexts (KRCs). A KRC is “a context indicating at least one item of domain knowledge that could be useful for conceptual analysis” (Meyer 2001). KRCs are found in corpora by means of knowledge patterns (KPs), the linguistic and paralinguistic patterns that convey a specific semantic relation (Meyer 2001).
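As a rough illustration of how a KP can surface a KRC, the following Python sketch matches two simplified generic-specific patterns ("X is a type of Y" and "Y such as X") with regular expressions. The patterns, the function name find_generic_specific and the example sentence are assumptions made for this illustration only; they are not the rules contained in the ESSG, which operate on tagged corpus data inside Sketch Engine rather than on raw strings.

```python
import re

# Hypothetical knowledge patterns for the generic-specific relation.
# Each pattern captures a specific term (hyponym) and a generic term (hypernym).
# These are illustrative assumptions, not the actual ESSG rules.
KNOWLEDGE_PATTERNS = [
    # "a groyne is a type of structure"
    re.compile(
        r"\b(?:an?|the)\s+(?P<hyponym>\w+)\s+is\s+(?:an?|the)\s+"
        r"(?:type|kind)\s+of\s+(?P<hypernym>\w+)",
        re.IGNORECASE,
    ),
    # "structures such as breakwaters"
    re.compile(
        r"\b(?P<hypernym>\w+)\s+such\s+as\s+(?P<hyponym>\w+)",
        re.IGNORECASE,
    ),
]


def find_generic_specific(sentence):
    """Return (hyponym, hypernym) pairs matched by any pattern in the sentence."""
    pairs = []
    for pattern in KNOWLEDGE_PATTERNS:
        for match in pattern.finditer(sentence):
            pairs.append(
                (match.group("hyponym").lower(), match.group("hypernym").lower())
            )
    return pairs


if __name__ == "__main__":
    print(find_generic_specific("A groyne is a type of structure built to limit erosion."))
```

Single-word captures keep the sketch short; a realistic pattern would also need to handle multiword terms, which is one reason for writing KPs over part-of-speech-tagged text instead of plain strings.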

KPs have been successfully applied in many terminology-related projects that have led to the creation of knowledge extraction tools, such as Caméléon (Aussenac-Gilles and Jacques 2008) and TerminoWeb (Barrière and Agbago 2006). However, to the best of our knowledge, there are currently no user-friendly, publicly available applications that allow terminologists to find KRCs in their own corpora with ready-made KPs. For this reason, terminologists still tend to rely on manual work to extract the semantic information that they need for the description of specialized concepts.

To fill this void, we have created the EcoLexicon Semantic Sketch Grammar (ESSG), a KP-based sketch grammar for the well-known corpus query system Sketch Engine (Kilgarriff et al. 2004). The ESSG allows users to generate new word sketches that can be exploited by any terminologist, lexicographer or translator interested in the extraction of semantic relations.

Word sketches are automatic corpus-derived summaries of a word’s grammatical and collocational behavior (Kilgarriff et al. 2004). Rather than looking at an arbitrary window of text around the headword – as occurs in previous corpus tools – Sketch Engine is able to look for each grammatical relation that the word participates in (Kilgarriff et al. 2004). The default word sketches provided by Sketch Engine represent different relations, such as verb-object, modifiers or prepositional phrases.
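The idea of a word sketch as a per-relation summary can be approximated with a minimal Python sketch. It assumes that (headword, relation, collocate) triples have already been extracted from a corpus; the function name build_word_sketch, the relation labels and the sample triples are hypothetical, and the code only counts occurrences, whereas Sketch Engine derives its relations from the sketch grammar and ranks collocates with its own statistics.

```python
from collections import Counter, defaultdict


def build_word_sketch(triples, headword):
    """Group the collocates of `headword` by relation, most frequent first.

    `triples` is an iterable of (word, relation, collocate) tuples.
    """
    counts = defaultdict(Counter)
    for word, relation, collocate in triples:
        if word == headword:
            counts[relation][collocate] += 1
    # most_common() sorts each relation's collocates by descending frequency.
    return {relation: counter.most_common() for relation, counter in counts.items()}


if __name__ == "__main__":
    # Hypothetical triples, invented for illustration only.
    sample = [
        ("erosion", "modifier", "coastal"),
        ("erosion", "modifier", "coastal"),
        ("erosion", "modifier", "soil"),
        ("erosion", "object_of", "prevent"),
        ("groyne", "modifier", "wooden"),
    ]
    for relation, collocates in build_word_sketch(sample, "erosion").items():
        print(relation, collocates)
```

A semantic sketch grammar such as the ESSG extends the same idea by adding relations like generic-specific or partitive to the grammatical ones.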

The ESSG is currently available in English and French. The English version (ESSG-en) includes generic-specific, partitive, cause, function and location relations. The French version (ESSG-fr) so far includes only generic-specific relations.

How to use the ESSG-en with the EcoLexicon English Corpus

The ESSG comes by default with the EcoLexicon English Corpus in Sketch Engine. It is also available on Sketch Engine’s Open Corpora, which does not require registration. To learn how to use word sketches in Sketch Engine, visit this page.

How to use the ESSG with your own existing corpus (Sketch Engine account needed)

  1. Select the desired corpus and click on the "Manage corpus" button, located at the top of the Sketch Engine Home Page.
  2. Click on the "Compile" button.
  3. Under "Expert settings", find the option "Sketch grammar". Click on the plus sign button to add a grammar.
  4. Specify a name for the grammar. Paste the contents of the ESSG into the "Content" field.
  5. Click on "Save and compile".

How to use the ESSG with a new corpus (Sketch Engine account needed)

  1. Click on the "New corpus" button at the top right of the Sketch Engine Home Page.
  2. Specify a name, type and language for the corpus. Click on "Next".
  3. Build the corpus using documents from the web or a local file. Click on "Next".
  4. Open the "Expert settings" drop-down menu to locate the "Sketch grammar" option. Click on the plus sign button to add a grammar.
  5. Specify a name for the grammar. Paste the contents of the ESSG into the "Content" field.
  6. Click on "Save and compile" to complete the creation of the corpus.