lingventa

OUR PROJECTS

THE DEVELOPMENT OF THE TECHNOLOGY AIMING TO REDUCE FALSE-POSITIVE ERRORS DETECTED BY TEXT-CORRECTION ALGORITHMS ALLOWING AN IMPLEMENTATION OF A PLATFORM FOR AUTOMATIC CONTEXTUAL CORRECTION AND AUTOMATIC PARTIAL EDITING OF DOCUMENTS EMPLOYING MACHINE-LEARNING METHODS AND LANGUAGE MODELS

Project description

Lingventa (a limited liability company) is implementing the project "The Development of the Technology Aiming to Reduce False-Positive Errors Detected by Text-Correction Algorithms Allowing an Implementation of a Platform for Automatic Contextual Correction and Automatic Partial Editing of Documents Employing Machine-Learning Methods and Language Models".

The aim of the project is to develop a globally innovative technology that reduces the false-positive errors reported by text-correction algorithms in Polish texts. The product is intended for publishing houses, web portals, public offices, law firms and other organisations that process large numbers of documents (e.g. correspondence with clients), as well as for individual users (students, journalists, researchers, translators and bloggers). The technology will allow us to build a new product for automatic contextual correction of any text in Polish.

Intermediary institution:
National Centre for Research and Development, www.ncbir.gov.pl.

As part of Measure 1.1:
R&D projects of enterprises, Sub-measure 1.1.1: Industrial research and development works carried out by enterprises, under the Smart Growth Operational Programme 2014-2020.

Priority axis:
Support for conducting R&D works by enterprises.

Project implementation period:
from 2021-03-01 to 2023-12-31

Data:
Contract No.: POIR.01.01.01-00-1790/20

Total value:
PLN 4,188,970.59

EU contribution:
PLN 3,221,823.53

  • SYNAMET

    Microcorpus of synesthetic metaphors. Formalization of description and development of effective methods of metaphor analysis in discourse

    Project description

    The first objective of the project was to investigate how various sensory impressions (smell, taste, hearing) are described in texts through figurative language. A corpus of synesthetic metaphors was then created. Finally, the project tested whether the synesthesia model proposed by S. Ullmann also applies to Polish.
    The Polish Corpus of Synesthesia Metaphors SYNAMET is the first Polish corpus of metaphors and the first corpus of synesthetic metaphors in the world. It is a new tool that can be used in linguistic, literary or cultural studies.

    Lingventa was responsible for preparing the software for building a corpus of synesthetic metaphors:

    • an application for extracting texts from the Internet,
    • an application for annotating texts,
    • an application for presenting the corpus on the website, including a search engine.

    Website:
    synamet.uw.edu.pl

    The corpus is available on the website:
    synamet.polon.uw.edu.pl

    Financed by:
    National Science Centre (project no. 2014/15/B/HS2/00182)

    Implementation period:
    November 2015 – October 2019

    Contractor:
    Institute of Polish Language, University of Warsaw

    Project manager:
    Magdalena Zawisławska, dr hab.

  • KORBA

    Electronic corpus of Polish texts from the seventeenth and eighteenth centuries (until 1772)

    Project description

    The aim of the project was to create a corpus of old Polish texts. The corpus has 13,453,367 segments. It consists of 718 text files (each file contains either a full text written in the years 1601–1772 or an extensive fragment of such a text). It is available at: www.korba.edu.pl.

    The corpus extends the National Corpus of the Polish Language (NKJP) to include old texts, and thus allows students to learn about the evolution of their mother tongue. It is a new research tool useful for analyses in linguistics, literary studies, cultural studies, history, and sociology.

    Lingventa’s task was to morphosyntactically annotate 551 samples and super-annotate 470 samples of texts from the Baroque corpus, and to compile a frequency list of the lexemes occurring in the corpus together with comparative indexes against the SXVII content index.

    Website:
    korba.edu.pl

    Project financed by:
    Ministry of Science and Higher Education (project no. 0036/NPRH2/H11/81/2012)

    Investigators:
    Laboratory of the History of the Polish Language of the 17th and 18th centuries of the Institute of Computer Science of the Polish Academy of Sciences and the Linguistic Engineering Group of the Institute of Computer Science of the Polish Academy of Sciences

    Task implementation period:
    January – March 2018

    Project manager:
    Prof. Włodzimierz Gruszczyński


Contact us!

    Lingventa sp. z o.o.
    ul. Rodziny Połanieckich 29/67

    01-924 Warszawa
    NIP 118-20-81-854

    REGON 145918004

    KRS 0000404830