Jussi Karlgren is a Swedish computational linguist, research scientist at Spotify, and co-founder of the text analytics company Gavagai AB. [1] He holds a PhD in computational linguistics from Stockholm University [2] and the title of docent (adjunct professor) of language technology at the University of Helsinki. [3]
Jussi Karlgren is known for having pioneered the application of computational linguistics to stylometry, [4] for having first formulated the notion of a recommender system, [5] [6] [7] and for his continued work in bringing non-topical features of text to the attention of the information access research field.
Karlgren's research is focused on questions relating to information access, genre and stylistics, distributional pragmatics, and evaluation of information access applications and distributional models.
Karlgren is of half Finnish descent and is fluent in Finnish. [8]
Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.
Klas Bernhard Johannes Karlgren was a Swedish sinologist and linguist who pioneered the study of Chinese historical phonology using modern comparative methods. In the early 20th century, Karlgren conducted large surveys of the varieties of Chinese and studied historical information on rhyming in ancient Chinese poetry, then used them to create the first ever complete reconstructions of what are now called Middle Chinese and Old Chinese.
Computational semiotics is an interdisciplinary field that applies, conducts, and draws on research in logic, mathematics, the theory and practice of computation, formal and natural language studies, the cognitive sciences generally, and semiotics proper. The term encompasses both the application of semiotics to computer hardware and software design and, conversely, the use of computation for performing semiotic analysis. The former focuses on what semiotics can bring to computation; the latter on what computation can bring to semiotics.
A recommender system, or a recommendation system, is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Typically, the suggestions refer to various decision-making processes, such as what product to purchase, what music to listen to, or what online news to read. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.
A podcast is a program made available in digital format for download over the Internet, for example an episodic series of digital audio or video files that a user can download to a personal device and listen to at a time of their choosing. Streaming applications and podcasting services provide a convenient and integrated way to manage a personal consumption queue across many podcast sources and playback devices. There are also podcast search engines, which help users find and share podcast episodes.
A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n=2. The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including in computational linguistics, cryptography, speech recognition, and so on.
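The bigram frequency distribution described above can be illustrated with a minimal Python sketch. The function name `bigram_counts` and the sample sentence are illustrative assumptions, not taken from any particular library or corpus.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count every pair of adjacent elements in a token sequence."""
    # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, tokenized into words by whitespace.
words = "the cat sat on the mat the cat slept".split()
counts = bigram_counts(words)

# The same function works on letters, since a string is a sequence too.
letter_counts = bigram_counts("banana")

print(counts.most_common(3))
print(letter_counts.most_common(3))
```

A sequence of n tokens yields n - 1 bigrams, so the nine-word sample produces eight pairs in total.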
Stylometry is the application of the study of linguistic style, usually to written language. It has also been applied successfully to music and to fine-art paintings. Another conceptualization defines it as the linguistic discipline that evaluates an author's style through the application of statistical analysis to a body of their work.
Fred Göran Karlsson is a professor emeritus of general linguistics at the University of Helsinki.
Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts, where authors typically express their opinions or sentiment less explicitly.
In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.
Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.
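The distributional hypothesis can be illustrated with a small Python sketch that represents each word by the counts of the words occurring near it and compares those count vectors by cosine similarity. This is one simple way to operationalize the idea, not the definitive method; the function names, the window size of 2, and the four-sentence toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "cat" and "dog" occur in similar contexts.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases mice".split(),
    "the dog chases cats".split(),
]
vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_milk = cosine(vecs["cat"], vecs["milk"])
print(sim_cat_dog, sim_cat_milk)
```

In this toy corpus "cat" and "dog" share most of their contexts ("the", "drinks", "chases"), so their similarity comes out higher than that of "cat" and "milk", which merely co-occur.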
Nils Göran David Malmqvist was a Swedish linguist, literary historian, sinologist and translator. He was also a member of the Swedish Academy between 1985 and 2019.
Douglass Read Cutting is a software designer, advocate, and creator of open-source search technology. He founded two technology projects, Lucene, and Nutch, with Mike Cafarella. Both projects are now managed through the Apache Software Foundation. Cutting and Cafarella are also the co-founders of Apache Hadoop.
Linguistic categories include
Computational humor is a branch of computational linguistics and artificial intelligence which uses computers in humor research. It is a relatively new area, with the first dedicated conference organized in 1996.
The National Archives of Finland is a Finnish government agency under the Ministry of Education and Culture. It is responsible for archiving official documents of the Finnish state and municipalities. It consists of three locations in the capital Helsinki and seven former regional archives, which were incorporated into the National Archives in 2017 and have since been its branches.
The International Committee on Computational Linguistics (ICCL) was founded by Dr. David Hays of the RAND Corporation in 1965 to promote the biennial International Conference on Computational Linguistics, which since the third conference in Stockholm has been known by the acronym COLING, after the Swedish fictional character Kolingen created by Albert Engström. The current President of ICCL is Professor Jun-Ichi Tsujii of the AIRC, and membership of the committee is permanent.
In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques where words or phrases from the vocabulary are mapped to vectors of real numbers.
Mans Hulden is a researcher in computational linguistics currently holding the title of Assistant Professor at the Department of Linguistics of the University of Colorado Boulder. He teaches courses in computational linguistics, phonology, and phonetics, and is the creator and maintainer of the free and open source finite-state toolkit Foma.
The Scandinavian Logic Society, abbreviated as SLS, is a not-for-profit organization whose objective is to organize, promote, and support logic-related events and other activities of relevance for the development of logic-related research and education in the Nordic Region of Europe.
Jussi Karlgren: "meitä oli aika monta, jotka – niin kuin minäkin – että yksi vanhemmista oli suomalainen" ("there were quite a few of us who, like me, had one Finnish parent") (c. 6:19–6:25 in the radio interview)