Published papers

GEOLSemantics develops its multilingual semantic extraction technology by participating in collaborative research and development projects at the national and European levels.

SAIMSI Project:

The SAIMSI project aims to build a prototype system that accumulates structured information about the actions of people suspected of illicit activities.

This information is automatically extracted from internet sources in several languages (French, English, Arabic and Mandarin Chinese), in several media (text and speech) and from several types of sources (web pages, press releases, social networks, etc.). The project was limited to open sources.

The information extracted from the various languages is represented independently of the source language using Semantic Web standards (RDF), in compliance with a security ontology developed within the project. English was chosen to represent concepts and relations.

The collected information is managed in two databases: a knowledge base containing the structured information extracted from the documents, and a textual database containing the source documents, which can be searched across languages. While viewing a text in the textual database, the user can request structured information from the knowledge base about any entity mentioned in it (person, location, company, etc.). Conversely, for every piece of information in the knowledge base, the user can retrieve all the original documents in the textual database.

Une approche linguistique pour l’extraction des connaissances dans un texte arabe [A linguistic approach to knowledge extraction from Arabic text], TALN-RÉCITAL conference, Les Sables d’Olonne, June 2013

Une approche mixte morpho-syntaxique et statistique pour la reconnaissance d’entités nommées en langue chinoise [A hybrid morphosyntactic and statistical approach to named entity recognition in Chinese], TALN-RÉCITAL conference, Les Sables d’Olonne, June 2013

SAIMSI, Suivi Adaptatif Interlingue et MultiSources des Informations [Adaptive Interlingual and Multi-Source Information Monitoring], WISG2013 conference, Troyes, January 2013

Using Arabic Transliteration to Improve Word Alignment from French–Arabic Parallel Corpora, The Fourth Workshop on Computational Approaches to Arabic Script-based Languages, San Diego, November 2012

Extraction of information on activities of persons suspected of illegal activities from web open sources, Language Resources for Public Security Applications, Istanbul, May 2012

Transcription of Arabic Names into Latin, Sciences of Electronics, Technologies of Information and Telecommunications conference, Sousse, March 2012

Extraction system for Personal Attributes Extraction of CLP2014, The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing, Wuhan, China, October 2014

ORELO Project:

ORELO aims to develop techniques for identifying the Arabic dialectal origin of a text written in Arabic or Latin characters, or of a quotation. The dialects considered by the project are the main Maghreb dialects (Moroccan, Algerian, Tunisian) and Egyptian.

The Maghreb dialects have received little attention in computational language processing. Including Egyptian will allow comparison with previous work on Egyptian and on the languages of the Mashriq. This preliminary work is essential for Vocapia to extend its automatic speech transcription systems for Standard Arabic to the various dialects. It is also a prerequisite for GEOLSemantics to make its knowledge extraction for Standard Arabic robust to the presence of dialectal words. The approach proposed by GEOLSemantics for identifying written dialects, based on dialect dictionaries, already provides the resources needed to continue this work.

CODA: A conventional orthography for Algerian Arabic, Arabic Natural Language Processing Workshop, Beijing, July 2015

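La reconnaissance automatique des dialectes arabes à l’écrit [Automatic recognition of written Arabic dialects], international conference "Traduction et champs connexes, quelle place pour la langue arabe aujourd’hui ?" [Translation and related fields: what place for the Arabic language today?], Algiers, December 2013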

Une approche linguistique pour la détection des dialectes arabes [A linguistic approach to detecting Arabic dialects], Proceedings of TALN 2017, Orléans, France, 2017

DRIRS Project:

The DRIRS project aims at the early detection of jihadist propaganda activities on social networks and of their impact on easily influenced people, with justification of the results, based on texts obtained from websites, newspapers, letters, tweets and audiovisual sources. The ultimate goal is to capture initial conversations before they continue on encrypted applications.

Une approche fondée sur les lexiques d’analyse de sentiments du dialecte algérien [A lexicon-based approach to sentiment analysis of the Algerian dialect], Traitement Automatique des Langues (TAL) journal, Volume 58, Number 3, October 2018

Approche Hybride pour la translitération de l’arabizi algérien : une étude préliminaire [A hybrid approach to transliterating Algerian Arabizi: a preliminary study], 25th Conference on Natural Language Processing (TALN), Rennes, France, May 2018

Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach, Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018

On other topics:

Synthèse de concepts formels par réécriture à partir d’une ontologie client [Synthesis of formal concepts by rewriting from a client ontology], 13th French-language Conference on Knowledge Extraction and Management (EGC 2013), Toulouse, January 2013

Gestion de l’incertitude dans le cadre d’une extraction des connaissances à partir de texte [Managing uncertainty in knowledge extraction from text], 12th Workshop on Complex Data Mining (FDC) at the Knowledge Extraction and Management conference (EGC 2015), Luxembourg, January 2015

RDF Knowledge Graph Visualization From a Knowledge Extraction System, Summarizing and Presenting Entities and Ontologies (SumPre), ESWC 2015 workshop, Portorož, Slovenia, May 2015

Uncertainty Evaluation in Textual Document, 11th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW), ISWC 2015 workshop, Bethlehem, Pennsylvania, October 2015