Jussi Karlgren

Jussi Karlgren is a Swedish computational linguist, research scientist at Spotify, and co-founder of the text analytics company Gavagai AB. [1] He holds a PhD in computational linguistics from Stockholm University, [2] and the title of docent (adjunct professor) of language technology at Helsinki University. [3]

Jussi Karlgren is known for having pioneered the application of computational linguistics to stylometry, [4] for having first formulated the notion of a recommender system, [5] [6] [7] and for his continued work in bringing non-topical features of text to the attention of the information access research field.

Karlgren's research is focused on questions relating to information access, genre and stylistics, distributional pragmatics, and evaluation of information access applications and distributional models.

Karlgren is of half Finnish descent and is fluent in Finnish. [8]

Related Research Articles

Bernhard Karlgren: Swedish sinologist and linguist (1889–1978)

Klas Bernhard Johannes Karlgren was a Swedish sinologist and linguist who pioneered the study of Chinese historical phonology using modern comparative methods. In the early 20th century, Karlgren conducted large surveys of the varieties of Chinese and studied historical information on rhyming in ancient Chinese poetry, then used them to create the first ever complete reconstructions of what are now called Middle Chinese and Old Chinese.

Computational semiotics is an interdisciplinary field that applies, conducts, and draws on research in logic, mathematics, the theory and practice of computation, formal and natural language studies, the cognitive sciences generally, and semiotics proper. The term encompasses both the application of semiotics to computer hardware and software design and, conversely, the use of computation for performing semiotic analysis. The former focuses on what semiotics can bring to computation; the latter on what computation can bring to semiotics.

A recommender system (RecSys), or a recommendation system (sometimes replacing system with terms such as platform, engine, or algorithm), is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.

A paraphrase or rephrase is the rendering of the same text in different words without losing the meaning of the text itself. More often than not, a paraphrased text can convey its meaning better than the original words. In other words, it reproduces the meaning of the text while differing from the original in wording. For example, when someone retells in their own words a story they heard, they paraphrase it, and the meaning stays the same. The term itself is derived via Latin paraphrasis, from Ancient Greek παράφρασις (paráphrasis) 'additional manner of expression'. The act of paraphrasing is also called paraphrasis.

Podcast: Type of audio digital media

A podcast is a program made available in digital format for download over the Internet. Typically, a podcast is an episodic series of digital audio files that users can download to a personal device or stream to listen to at a time of their choosing. Podcasts are primarily an audio medium, but some are distributed in video, either as their primary content or as a supplement to audio, a format popularised in recent years by the video platform YouTube.

Stylometry is the application of the study of linguistic style, usually to written language. It has also been applied successfully to music, paintings, and chess.

Fred Karlsson: Linguistics professor at the University of Helsinki (born 1946)

Fred Göran Karlsson is a professor emeritus of general linguistics at the University of Helsinki.

Kimmo Koskenniemi

Kimmo Matti Koskenniemi is the inventor of finite-state two-level models for computational phonology and morphology. He was a professor of Computational Linguistics at the University of Helsinki, Finland. In the early 1980s Koskenniemi's work became accessible to early adopters such as Lauri Karttunen, Ronald M. Kaplan and Martin Kay, first at the University of Texas at Austin, later at the Xerox Palo Alto Research Center.

Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts where authors typically express their opinion or sentiment less explicitly.

In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.

Distributional semantics: Field of linguistics

Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.

Doug Cutting: American information theorist

Douglass Read Cutting is a software designer, advocate for, and creator of open-source search technology. He founded two technology projects, Lucene and Nutch, with Mike Cafarella. The Apache Software Foundation now manages both projects. Cutting and Cafarella were also co-founders of Apache Hadoop.

Linguistic categories include

National Archives of Finland

The National Archives of Finland is a Finnish government agency under the Ministry of Education and Culture. It is responsible for archiving official documents of the Finnish state and municipalities. It consists of three locations in the capital Helsinki and seven former regional archives, which were incorporated into the National Archives in 2017 and have since been its branches.

The International Committee on Computational Linguistics (ICCL) was founded by Dr. David Hays of the RAND Corporation in 1965 to promote the biennial International Conference on Computational Linguistics, which since the third conference in Stockholm is known by the acronym COLING after the Swedish fictional character Kolingen by Albert Engström. The current President of ICCL is Professor Jun-Ichi Tsujii of the AIRC and membership of the committee is permanent.

In natural language processing, a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that the words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

Mans Hulden is a researcher and associate professor in computational linguistics. Before moving to the New College of Florida in 2024, he taught courses in computational linguistics, phonetics, and phonology at the University of Colorado Boulder. He is the creator and maintainer of the free and open source finite-state toolkit Foma.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is being maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, but has since been a focal point of activity for several W3C community groups, research projects, and infrastructure efforts.

Arvi Hurskainen

Arvi Johannes Hurskainen is a Finnish scholar of language technology and linguistics. Since 1985, he has developed rule-based language technology mainly for Swahili, but also for other languages, including machine translation from English to Finnish. He has created a development environment called SALAMA, which is suited to any language. The major applications developed so far include a spell checker for Swahili, an annotator of corpus texts, an advanced dictionary between Swahili and English, and translators from Swahili to English, from English to Swahili, and from English to Finnish. He has also developed an advanced learning system for Swahili and a system for producing targeted vocabularies for language learners. Hurskainen has compiled two annotated corpora, Helsinki Corpus of Swahili 1.0 and Helsinki Corpus of Swahili 2.0.

The Scandinavian Logic Society, abbreviated as SLS, is a not-for-profit organization whose objective is to organize, promote, and support logic-related events and other activities of relevance for the development of logic-related research and education in the Nordic Region of Europe.

References

  1. "About Us". Gavagai. Retrieved 6 February 2022.
  2. "Institutionen för lingvistik: avhandlingar". Stockholms universitet. Retrieved 6 February 2022.
  3. "Language Technology". Helsinki University. Retrieved 6 February 2022.
  4. Karlgren, Jussi; Cutting, Douglass (1994). "Recognizing Text Genres with Simple Metrics Using Discriminant Analysis". Proceedings of the International Conference on Computational Linguistics. 2: 1071. arXiv:cmp-lg/9410008. Bibcode:1994cmp.lg...10008K. doi:10.3115/991250.991324. S2CID 1297432.
  5. Karlgren, Jussi (1990). "An Algebra for Recommendations". Syslab Working Paper 179.
  6. Karlgren, Jussi (1994). "Newsgroup Clustering Based On User Behavior-A Recommendation Algebra". SICS Research Report. Archived 2021-02-27 at the Wayback Machine.
  7. Karlgren, Jussi (October 2017). "A digital bookshelf: original work on recommender systems". Retrieved 27 October 2017.
  8. "Ihan varmasti on vaikutusta". sverigesradio.se (in Finnish). 23 October 2013. Retrieved 7 March 2018. Jussi Karlgren: "meitä oli aika monta, jotka – niin kuin minäkin – että yksi vanhemmista oli suomalainen" (c. 6:19–6:25 in the radio interview)