Bridging the gap between academic literature and business means taking on challenges, and ambition is our driving force. We replicate and improve on what is currently in the literature, making it usable in business scenarios where efficiency and quality have to be maximized. In our data science laboratory, we create the building blocks that power the noura.ai platform.
Named Entity Recognition
By exhaustively extracting Arabic morphological word features, we have trained a fast sequential model that can detect and recognize named entities in text, such as persons, organizations, and locations. The model can be extended to other types of entities, such as product categories and business industries.
Identifying the entities mentioned in unstructured text and classifying them into their relevant categories is one of the most valuable information extraction techniques for businesses. Notable applications include efficient document search, document prioritization, and employee assignment based on the products or locations mentioned in complaints, letters, messages, or emails. Entity detection can also power content-based websites by extracting entity-based metadata from their content.
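To give a feel for what word-level features for a sequential tagger can look like, here is a minimal sketch. It is not the production feature set: the function names `token_features` and `sentence_features`, the choice of affix lengths, and the definite-article check are all assumptions made for illustration, and real Arabic morphological analysis is far richer than surface affixes.

```python
# Illustrative only: simple surface features per token, of the kind a
# sequential tagger (for example a CRF) could consume. All names are hypothetical.

DEFINITE_ARTICLE = "\u0627\u0644"  # the Arabic prefix "al-" (alef + lam)


def token_features(tokens, i):
    """Return a dict of surface features for tokens[i] and its neighbours."""
    word = tokens[i]
    return {
        "word": word,
        "length": len(word),
        "prefix1": word[:1],
        "prefix2": word[:2],
        "suffix1": word[-1:],
        "suffix2": word[-2:],
        # Crude stand-in for a morphological cue: does the word carry "al-"?
        "has_definite_article": word.startswith(DEFINITE_ARTICLE) and len(word) > 2,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
        # Context words, with boundary markers at the sentence edges.
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Build one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["زار", "محمد", "الرياض"]):
        print(feats)
```

Each feature dict would be paired with a label such as person, organization, or location during training. Supporting a new entity type then mostly means adding labeled examples, not changing the features.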
Sentiment Analysis
We have trained a deep learning transformer architecture on more than 100,000 Arabic samples covering different dialects. It achieves state-of-the-art prediction of the sentiment that readers perceive in a text (positive, negative, or neutral) with an accuracy of 94%.
Opinion mining has reached its peak with the introduction of tools that facilitate sharing ideas and thoughts with the public. Although the subjectivity of opinions affects how factual the information is, sentiment analysis plays a huge role in studying a targeted group's reaction towards a certain entity or event. To mention a few applications of sentiment analysis: gauging public reaction to an event, improving the customer satisfaction process, and studying the reputation of a brand or entity.
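The final step of a three-class sentiment classifier can be sketched as follows. This is a generic illustration, not the trained model: it assumes the network emits one raw score (logit) per class, and the label order in `LABELS` and the function name `classify` are hypothetical.

```python
import math

# Assumed label order; a real model defines its own mapping.
LABELS = ("positive", "negative", "neutral")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable label and the full probability map."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], dict(zip(LABELS, probs))


if __name__ == "__main__":
    label, probs = classify([2.0, 0.1, -1.0])
    print(label, probs)
```

Keeping the full probability map, not only the top label, makes it possible to flag low-confidence predictions for review.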
Text Summarization
By evaluating multiple techniques in the literature, we have developed a sentence ranking algorithm that uses the most up-to-date text representation techniques to summarize documents with a massive number of sentences into a form that can be read in seconds.
In text understanding and knowledge representation, text summarization is one of the techniques that save readers time in their day-to-day tasks. Summarizing a massive amount of social, political, or business content into a relatively small number of sentences provides a concrete and fast overview of any topic that stakeholders want to know more about.
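The general idea of extractive summarization by sentence ranking can be sketched in a few lines. This is a simplified illustration and not the algorithm described above: it swaps in plain word-count vectors for modern text representations, scores each sentence by its total cosine similarity to the others, and uses the hypothetical function name `summarize`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, k=2):
    """Keep the k most central sentences, in their original order."""
    # Stand-in representation: bag-of-words counts (an assumption here).
    vectors = [Counter(s.lower().split()) for s in sentences]
    # A sentence is central if it is similar to many other sentences.
    scores = [
        sum(cosine(vi, vj) for j, vj in enumerate(vectors) if j != i)
        for i, vi in enumerate(vectors)
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "the bank raised interest rates",
        "interest rates affect the housing market",
        "the housing market slowed as rates rose",
        "my cat sleeps all day",
    ]
    print(summarize(doc, k=2))
```

Returning the selected sentences in document order keeps the summary readable. Replacing the count vectors with sentence embeddings changes only the representation step, not the ranking logic.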
Keyword Extraction
Using the latest semantic contextualized embeddings, coupled with rule-based morphological selection of keyword and key-phrase candidates, we have developed an unsupervised technique that can extract the most informative keywords and key-phrases in a document.
Reducing a text document to its most informative keywords creates a compact form of the document: a reader can tell, for example, what a 1000-word document is about by reading only 10 words. The model can help automate real-time analytics for massive datasets. It can be applied to obtain descriptive keywords for a consumer-oriented brand or entity, to learn what your audience is talking about, or to gauge fine-grained reactions to a new product or policy.
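The two-stage shape of such an approach (rule-based candidate selection, then ranking candidates by similarity to the whole document) can be sketched as below. This is an assumption-laden illustration, not the technique itself: `bow_embed` is a bag-of-words stand-in for contextual embeddings, the stopword list is a tiny English placeholder, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "is", "in", "to", "for"}  # tiny placeholder list


def candidates(text, max_len=2):
    """Rule-based step: n-grams inside runs unbroken by stopwords or punctuation."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    found, run = set(), []
    for tok in tokens + [None]:  # None flushes the final run
        if tok is None or tok in STOPWORDS or not re.match(r"\w", tok):
            for n in range(1, max_len + 1):
                for i in range(len(run) - n + 1):
                    found.add(" ".join(run[i:i + n]))
            run = []
        else:
            run.append(tok)
    return found


def bow_embed(text):
    """Stand-in embedding: word counts (a real system would use a neural encoder)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_keywords(text, embed=bow_embed, top_n=3):
    """Rank candidates by similarity to the document; ties break alphabetically."""
    doc_vec = embed(text)
    scored = [(cosine(embed(c), doc_vec), c) for c in candidates(text)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for _, c in scored[:top_n]]


if __name__ == "__main__":
    text = ("Solar panels convert sunlight to power. "
            "Solar panels need sunlight, and solar power is cheap.")
    print(extract_keywords(text, top_n=3))
```

Because `embed` is a parameter, the ranking step stays the same if the word-count stand-in is replaced with a function that returns embeddings from a language model, provided `cosine` is adapted to that vector type.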