GEOLSemantics develops its multilingual semantic extraction technology by participating in collaborative research and development projects at the national and European levels.
The SAIMSI project aims to build a prototype system that accumulates structured information about the actions of people suspected of illicit activities.
This information is extracted automatically from internet sources in different languages (French, English, Arabic and Mandarin Chinese), in different media (text and speech) and from different types of sources (web pages, press releases, social networks…). Within the framework of the project, we limited ourselves to open sources.
The information extracted from the various languages is represented using Semantic Web standards (RDF), independently of the source language and in compliance with a security ontology developed within the framework of the project. English was chosen to name concepts and relations.
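As a rough illustration of such a language-independent representation, the sketch below maps a French phrase and an English phrase onto the same English-named relation and writes the result as N-Triples, one of the standard RDF serializations. The namespaces, the relation name travelledTo, the phrase lexicon and the entity identifiers are all hypothetical; the project's actual ontology is not described here.

```python
# Hypothetical namespaces: the project's real ontology IRIs are not given in the text.
ONTOLOGY = "http://example.org/security-ontology#"
ENTITY = "http://example.org/entity/"

# Toy lexicon (assumption): surface phrases in each language map to one
# English-named relation of the ontology.
RELATION_LEXICON = {
    ("fr", "s'est rendu à"): "travelledTo",
    ("en", "travelled to"): "travelledTo",
}


def to_ntriple(lang, subject_id, phrase, object_id):
    """Return one N-Triples statement for an extracted (subject, phrase, object)."""
    relation = RELATION_LEXICON[(lang, phrase)]
    return f"<{ENTITY}{subject_id}> <{ONTOLOGY}{relation}> <{ENTITY}{object_id}> ."


# The same fact extracted from a French and from an English sentence
# yields an identical triple, independent of the source language.
from_french = to_ntriple("fr", "person_1", "s'est rendu à", "place_1")
from_english = to_ntriple("en", "person_1", "travelled to", "place_1")
print(from_french)
print(from_english == from_french)
```

Because the relation is named once, in English, facts coming from different languages can be merged and queried without knowing which language they were extracted from.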
The collected information is managed in two databases: a knowledge base containing the structured information from the different documents, and a textual database, searchable in interlingua, containing the source documents. While viewing a text in the textual database, you may request the structured information held in the knowledge base about any entity mentioned (person, location, company…). Conversely, for every item of information in the knowledge base, you may retrieve all the original documents from the textual database.
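The two-way link between the knowledge base and the textual database can be sketched as follows. This is a minimal in-memory illustration, not the project's storage design: the class names, field names and identifiers are assumptions, and each fact simply records the identifier of the document it was extracted from.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One structured statement plus the identifier of its source document."""
    subject: str
    relation: str
    obj: str
    doc_id: str


class LinkedStores:
    """Toy stand-in for the knowledge base and the textual database."""

    def __init__(self):
        self.documents = {}                        # textual DB: doc_id -> text
        self.facts_by_entity = defaultdict(list)   # knowledge base index

    def add_document(self, doc_id, text):
        self.documents[doc_id] = text

    def add_fact(self, fact):
        # Index the fact under both entities it mentions.
        self.facts_by_entity[fact.subject].append(fact)
        self.facts_by_entity[fact.obj].append(fact)

    def facts_about(self, entity):
        """Text view -> knowledge base: structured facts on a mentioned entity."""
        return list(self.facts_by_entity.get(entity, []))

    def sources_of(self, subject, relation, obj):
        """Knowledge base -> text view: ids of all documents supporting a statement."""
        return sorted({
            f.doc_id for f in self.facts_by_entity.get(subject, [])
            if (f.subject, f.relation, f.obj) == (subject, relation, obj)
        })


stores = LinkedStores()
stores.add_document("doc_fr", "texte source en français")
stores.add_document("doc_en", "source text in English")
stores.add_fact(Fact("person_1", "travelledTo", "place_1", "doc_fr"))
stores.add_fact(Fact("person_1", "travelledTo", "place_1", "doc_en"))
print(stores.sources_of("person_1", "travelledTo", "place_1"))
```

Keeping the document identifier on every fact is what makes both directions of navigation possible: from a text to the facts about an entity, and from a fact back to every document that supports it.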
Using Arabic Transliteration to Improve Word Alignment from French-Arabic Parallel Corpora, The Fourth Workshop on Computational Approaches to Arabic Script-based Languages, San Diego, November 2012
ORELO aims to develop techniques for identifying the Arabic dialect of origin of a text, or of a quotation, written in Arabic or Latin characters. The dialects considered by the project are the main dialects of the Maghreb (Moroccan, Algerian, Tunisian) and Egyptian.
The Maghreb dialects have received little study from the standpoint of computer processing. Including Egyptian will allow comparisons with earlier work on Egyptian and the languages of the Mashriq. This preliminary work is essential for Vocapia to extend its automatic transcription systems for standard Arabic speech to the various dialects. It is also a prerequisite for GEOLSemantics to make its knowledge extraction for standard Arabic robust to the presence of dialectal words. The approach proposed by GEOLSemantics for identifying written dialects, based on dialect dictionaries, already supplies the resources needed for the follow-up work.
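A dictionary-based identification of a written dialect can be sketched as below. This is only a minimal illustration under stated assumptions, not the GEOLSemantics method: the dictionaries hold a handful of illustrative Latin-script spellings (real spellings vary widely and real dictionaries would be far larger), the scoring is a plain count of matching tokens, and the function name is hypothetical.

```python
import re
from collections import Counter

# Toy dictionaries (assumption): a few illustrative Latin-script spellings
# per dialect; they are not the project's resources.
DIALECT_WORDS = {
    "moroccan": {"daba", "bzaf", "wakha"},
    "algerian": {"dork", "wesh", "bezzaf"},
    "tunisian": {"tawa", "barcha", "chnowa"},
    "egyptian": {"awi", "ezzay", "delwa2ti"},
}


def identify_dialect(text):
    """Return the dialect whose dictionary matches the most tokens, or None."""
    # \w+ keeps digits inside tokens, which Latin-script Arabic often uses.
    tokens = re.findall(r"\w+", text.lower())
    hits = Counter()
    for dialect, words in DIALECT_WORDS.items():
        hits[dialect] = sum(1 for token in tokens if token in words)
    # On a tie, the dialect listed first in DIALECT_WORDS wins.
    best, count = max(hits.items(), key=lambda item: item[1])
    return best if count > 0 else None


print(identify_dialect("daba ana mzyan bzaf"))
print(identify_dialect("bonjour tout le monde"))
```

Returning None when no dictionary word is found lets a caller fall back to standard Arabic processing rather than forcing a dialect label onto the text.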
The aim is the early detection of jihadist propaganda activities on social networks, and of their impact on people who may be easily influenced, based on texts obtained from websites, newspapers, letters, tweets and audiovisual sources, with justification of the results. The ultimate goal is to capture initial dialogues before they move to encrypted applications.
Approche Hybride pour la translitération de l’arabizi algérien : une étude préliminaire [A hybrid approach to the transliteration of Algerian Arabizi: a preliminary study], 25e conférence sur le Traitement Automatique des Langues Naturelles (TALN), Rennes, France, May 2018
On other topics:
Gestion de l’incertitude dans le cadre d’une extraction des connaissances à partir de texte [Managing uncertainty in knowledge extraction from text], 12ème atelier sur la Fouille de Données Complexes (FDC), Extraction et Gestion des Connaissances (EGC 2015), Luxembourg, January 2015