Published papers
GEOLSemantics develops its multilingual semantic extraction technology by participating in collaborative research and development projects at the national and European levels.
SAIMSI Project:
The SAIMSI project aims to build a prototype system that accumulates structured information about the actions of people suspected of illicit activities.
This information is automatically extracted from internet sources in several languages (French, English, Arabic, and Mandarin Chinese), in different media (text and speech), and from different types of sources (web pages, press releases, social networks…). Within the framework of the project, we limited ourselves to open sources.
The information extracted from the various languages is represented using Semantic Web standards (RDF), independently of the source language and in compliance with a security ontology developed within the framework of the project. English was chosen to label concepts and relations.
The collected information is managed in two databases: a knowledge base containing the structured information extracted from the documents, and a textual database, searchable in interlingua, containing the source documents. While viewing a text in the textual database, you can request the structured information that the knowledge base holds on any entity mentioned in it (person, location, company…). Conversely, for every piece of information in the knowledge base, you can retrieve all the original documents from the textual database.
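The two-way link between the knowledge base and the textual database can be illustrated with a minimal Python sketch. It is not the SAIMSI implementation: the identifiers, the relation names (`met`, `locatedIn`, `travelledTo`), the sample documents, and the helper functions `facts_about` and `sources_of` are all hypothetical. The triples only mimic the subject-predicate-object shape of RDF, with English labels, and each one carries the id of the document it was extracted from.

```python
# Hypothetical textual database: document id -> (language, source text).
text_db = {
    "doc1": ("fr", "Personne A a rencontré Personne B à Lyon."),
    "doc2": ("en", "Person C travelled to Lyon."),
}

# Hypothetical knowledge base: language-independent triples with English
# labels, each tagged with the id of the document it was extracted from.
knowledge_base = [
    ("person:A", "met", "person:B", "doc1"),
    ("person:A", "locatedIn", "place:Lyon", "doc1"),
    ("person:C", "travelledTo", "place:Lyon", "doc2"),
]


def facts_about(entity):
    """Text to knowledge: every triple in which the entity appears."""
    return [(s, p, o) for s, p, o, _ in knowledge_base if entity in (s, o)]


def sources_of(subject, predicate, obj):
    """Knowledge to text: ids of the documents that support a triple."""
    return sorted(
        {doc for s, p, o, doc in knowledge_base if (s, p, o) == (subject, predicate, obj)}
    )


# From an entity mentioned in a text to the structured facts about it.
lyon_facts = facts_about("place:Lyon")

# From a fact back to the original documents in the textual database.
for doc_id in sources_of("person:A", "met", "person:B"):
    language, text = text_db[doc_id]
    print(doc_id, language, text)
```

Because the triples use the same English labels whatever the language of the source, a fact extracted from a French document and one extracted from an English document can be queried in the same way.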
ORELO Project:
ORELO aims to develop techniques for identifying the Arabic dialect of origin of a text, or of a quotation, written in Arabic or Latin characters. The dialects considered by the project are the main dialects of the Maghreb (Moroccan, Algerian, Tunisian) and Egyptian.
The Maghreb dialects have received little attention from the point of view of computer processing. Including Egyptian will allow comparisons with previous work on Egyptian and on the languages of the Mashriq. This preliminary work is essential for Vocapia to extend its automatic transcription systems from spoken Standard Arabic to the various dialects. It is also a prerequisite for GEOLSemantics to make its knowledge extraction from Standard Arabic robust to the presence of dialectal words. The approach proposed by GEOLSemantics for identifying written dialects, based on the use of dialect dictionaries, already provides the resources needed for this follow-on work.
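As a rough illustration of what dictionary-based identification can look like, the Python sketch below counts how many tokens of a text appear in each dialect's word list and returns the dialect with the most hits. It is a simplified assumption, not the ORELO method: the function `identify_dialect`, the tiny Latin-script word lists, and the majority-count scoring are all hypothetical, and real dialect dictionaries would be far larger, would cover Arabic script, and would have to handle words shared between dialects and varying romanizations.

```python
import re

# Toy dialect dictionaries (illustrative Latin-script entries only;
# these are NOT the project's resources).
DIALECT_LEXICONS = {
    "moroccan": {"bzaf", "wakha", "daba"},
    "algerian": {"bezzaf", "wesh", "dorka"},
    "tunisian": {"barcha", "behi", "tawa"},
    "egyptian": {"awi", "delwa2ti", "ezzayak"},
}


def identify_dialect(text):
    """Return the dialect whose dictionary matches the most tokens,
    or None when no token is found in any dictionary."""
    # \w+ keeps digits, which romanized Arabic uses for some sounds.
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        dialect: sum(1 for token in tokens if token in lexicon)
        for dialect, lexicon in DIALECT_LEXICONS.items()
    }
    # On a tie, max() keeps the first dialect in dictionary order.
    best, hits = max(scores.items(), key=lambda item: item[1])
    return best if hits > 0 else None


print(identify_dialect("barcha behi"))
print(identify_dialect("hello world"))
```

A production system would more likely return a score per dialect than a single label, since short texts often contain only words common to several dialects.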
DRIRS Project:
Working from texts obtained from websites, newspapers, letters, tweets, and audiovisual sources, the DRIRS project targets the early detection of jihadist propaganda activities on social networks and of their impact on people who may be easily influenced, with each result accompanied by its justification. The ultimate goal is to capture initial dialogues before they move to encrypted applications.
On other topics:
NLP Applied to Online Suicide Intention Detection, HealTAC 2020, April 2020