The GEOLSemantics offer is:
- Multisource: the information to be analyzed can come from highly varied types of content and from diverse sources such as the Web, Word documents, PDF files, transcriptions of speeches, e-mails, etc.
- Multilingual: the level of processing is similar in each of the supported languages
- Interlanguage: GEOLSemantics technologies make it possible to present, in the user’s language, the results of analyzing documents written in various languages
The basic offer of GEOLSemantics is made up of two modules, namely:
GEOL Linguistic Analyzer
The objective of this module is to identify and normalize the relevant information. It does so by performing a morphosyntactic analysis followed by a deep syntactic analysis. These analyses make it possible, in particular, to resolve ambiguities, to treat synonyms as equivalent, etc. GEOL Linguistic Analyzer also identifies named entities at a fine level of detail: names of people, organizations or companies, places, quantities, distances, values and dates. It can also extract named entities specific to certain domains, for example the medical or legal domains. Because it draws on the power and precision of the linguistic analyzers, the named-entity recognition performed by GEOLSemantics produces very precise results: for example, in the sentence “Washington worries about aims of Beijing”, Washington and Beijing are recognized as “organizations” (the governments) and as neither places nor people.
GEOL Knowledge is a knowledge extractor whose purpose is:
• to resolve the ambiguities left unresolved by GEOL Linguistic Analyzer, in particular by recognizing the role of the named entities. For example, in the sentence “Washington worries”, Washington can only be a person or an organization, since a place does not worry;
• to structure the relevant information produced by GEOL Linguistic Analyzer. This structuring organizes the information by taking into account roles, relations between pieces of information, time, etc., so as to answer the questions: WHO? WHEN? WHAT? WHERE? HOW? HOW MUCH?
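The role-based reasoning in the “Washington worries” example can be illustrated with a minimal Python sketch. It is not GEOLSemantics code: the two lexicons and the function name `disambiguate_subject` are hypothetical, hand-written stand-ins for what a real linguistic analysis would supply.

```python
# Hypothetical toy lexicon: the entity types each name could denote.
CANDIDATE_TYPES = {
    "Washington": {"person", "organization", "place"},
    "Beijing": {"organization", "place"},
}

# Hypothetical toy lexicon: entity types that can be the subject of a verb.
# A place does not worry, so "place" is absent for "worries".
SUBJECT_TYPES = {
    "worries": {"person", "organization"},
}


def disambiguate_subject(entity, verb):
    """Keep only the candidate types compatible with the subject role of the verb."""
    candidates = CANDIDATE_TYPES.get(entity, set())
    allowed = SUBJECT_TYPES.get(verb)
    if allowed is None:
        # Unknown verb: the role gives no constraint, so nothing is ruled out.
        return set(candidates)
    return candidates & allowed


print(sorted(disambiguate_subject("Washington", "worries")))
# prints ['organization', 'person']
```

The role of the entity narrows the candidates but does not always settle the question: here "place" is eliminated, while choosing between a person and an organization needs further context.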
The basic offer is enriched by two complementary modules, namely:
GEOL Transliterator: its objective is to reduce the various spelling or phonetic forms of a name to a normalized form. GEOL Transliterator is made up of two engines:
- the first one deals with the variability of names of foreign origin that are written in a different character set and whose romanization is not uniform (Russian, Arabic and Chinese). It makes it possible to gather information about a person for whom only a single spelling is known, by searching for several spellings.
- the second one deals with the phonetic variability of proper nouns whose spelling has never been seen. The tool proposes phonetic variants according to several European languages: French, English, German and Spanish.
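As an illustration of reducing several spellings to one normalized form, here is a minimal Python sketch. It is an assumption-laden toy, not the GEOL Transliterator algorithm: the rewrite rules in `RULES` are hand-picked for a single Russian name, and the names `normalize` and `group_variants` are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Hypothetical rewrite rules, applied in order. They only cover a few
# French, English and German romanization habits for one example name.
RULES = [
    ("tsch", "ch"),  # German-style romanization
    ("tch", "ch"),   # French-style romanization
    ("w", "v"),
    ("y", "i"),
]


def normalize(name):
    """Return a crude normalized key: no diacritics, lowercase, rules applied."""
    decomposed = unicodedata.normalize("NFKD", name)
    key = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    for source, target in RULES:
        key = key.replace(source, target)
    return key


def group_variants(names):
    """Group spellings that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return dict(groups)


spellings = ["Tchaïkovski", "Tchaikovsky", "Chaikovsky", "Tschaikowski", "Tolstoy"]
for key, variants in group_variants(spellings).items():
    print(key, variants)
```

With these rules, the four spellings of the composer's name fall under one key and "Tolstoy" under another, so a search on any one spelling can retrieve documents that use the others. Rules this coarse would over-merge on real data; they only show the principle.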
GEOL Terminology Extractor: its objective is to produce terminological lexicons from a corpus of texts such as legal, technical or economic texts. The Terminology Extractor consists of two main engines:
- the first one is intended for customers who have never drawn up a list of themes but who hold a large amount of textual data. The extractor establishes an authority control list of themes from automatic extractions that are checked manually by a specialist.
- the second one is intended for customers who have already established an authority control list. This extraction by learning is applied to new documents, but is based on the past work of information officers who assigned keywords to describe the contents of every document.
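The first engine's workflow (automatic extraction, then manual checking) can be sketched in Python as follows. This is only an illustrative assumption about what an automatic extraction step might look like: it counts recurring two-word sequences, which is a much simpler technique than a linguistic extractor, and the names `STOPWORDS` and `candidate_terms` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "by", "on", "or"}


def candidate_terms(documents, min_count=2):
    """Return two-word candidate terms seen at least min_count times.

    The output is a list of (term, count) pairs meant to be reviewed by a
    specialist before entering an authority control list.
    """
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        for first, second in zip(words, words[1:]):
            # Skip pairs containing a stopword: they are rarely terms.
            if first in STOPWORDS or second in STOPWORDS:
                continue
            counts[f"{first} {second}"] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


docs = [
    "The supply contract binds the buyer.",
    "A supply contract may be terminated.",
    "The buyer signs the supply contract.",
]
print(candidate_terms(docs))
# prints [('supply contract', 3)]
```

The `min_count` threshold keeps the list short enough for manual checking; lowering it surfaces more candidates at the cost of more noise for the specialist to reject.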