My research interests are in knowledge representation of language. For coming generations of information and knowledge systems, a closer understanding of the communication process between author and reader will be necessary. My aim is to improve information access by designing systems based on an informed but practical analysis of usage, context, situation, and domain, and by formulating and implementing a flexible and scalable knowledge representation of what is essential in realistic volumes of linguistic data.
This includes understanding how language usage changes over time and across new modes of communication, such as new text genres or new modalities. I am currently (2021) very interested in understanding how new genres will emerge in the convergence of broadcasting, video clip sharing, and podcast publication, how they can be related to previous media, and how they can be made easily accessible.
Most of what I have written can be found in various repositories on the net. My ORCID.
As my first research effort I attempted to formulate an algebra for recommendations: An Algebra for Recommendations (1990). This is what is now known as recommender systems, and I probably should have continued along this path in spite of an initial reviewer-number-two setback, Newsgroup Clustering Based On User Behavior - A Recommendation Algebra (1994), concerning the ethical issues involved in clustering .newsrc files.
Stylostatistics and Studies of Genre
Since 1993 I have worked on computational stylistic analysis of text. Previous work on style and genre has been motivated from a primarily philological standpoint, even if sometimes computationally oriented. The 1994 Coling publication Recognizing Text Genres with Simple Metrics Using Discriminant Analysis (1994) by myself and Douglass Cutting marked the first language technology attempt in this direction. I have held numerous talks, seminars, and international symposia on the topic, covering methodology, results, and applications: New Text (2006); Textual Stylistic Variation: Choices, Genres and Individuals (2008); Conventions and Mutual Expectations - understanding sources for web genres (2008); The Relation Between Author Mood and Affect to Sentiment in Text and Text Genre (2011).
Currently I am working on extending this work to the emerging landscape of podcasts and other audio material.
Scalable, realistic, and useful semantic models
Since 1998 I have participated in work on scalable, behaviouristically and neurophysiologically plausible computational models for processing large amounts of text efficiently and usefully, building semantic spaces based on distributional analysis of linguistic items using the random indexing processing and memory model: From Words to Understanding (2001); Meaningful Models for Information Access Systems (2005); Filaments of Meaning in Word Space (2008). This work is continuing, and I am currently interested in exploring the interface between geometric high-dimensional models on the one hand and graph and topological models on the other: Semantic Topology (2014).
Parts of this work became the text analysis company Gavagai, which I co-founded in 2008 and where I worked until 2019. We started by building a lexical learning model, The Gavagai Living Lexicon (2016), which we used for sentiment analysis in media monitoring, Usefulness of Sentiment Analysis (2014), and for analysis of customer feedback and questionnaires, Analysis of Open Answers to Survey Questions through Interactive Clustering and Theme Extraction (2018).
We even toyed with applying our model to non-human communication, but while the approach still seems reasonable we never managed to get the project properly afloat or in the air as it were. A proposal to use distributional models to analyse dolphin vocalisation (2017)
In 2017-2018 I visited Stanford where I worked on an application of random indexing for construction grammar. High-dimensional distributed semantic spaces for utterances (2019)
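For readers unfamiliar with random indexing, the following is a minimal sketch of the general technique, not the implementation used in the work cited above. Each word is given a sparse random index vector, and a word's context vector is the sum of the index vectors of the words occurring near it. The dimensionality, sparsity, window size, function names, and toy corpus are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict

DIM = 512      # dimensionality of the space (assumed value)
NONZERO = 8    # number of non-zero entries per index vector (assumed value)
WINDOW = 2     # context window on each side of the focus word (assumed value)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector, stored as {position: +1 or -1}."""
    rng = random.Random(word)  # seeded by the word, so the vector is reproducible
    positions = rng.sample(range(dim), nonzero)
    # Half of the chosen positions get +1, the other half -1.
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def train(sentences, dim=DIM, window=WINDOW):
    """Accumulate a context vector for every word in the tokenised sentences."""
    context = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # Add the neighbour's index vector to the focus word's context vector.
                for pos, sign in index_vector(tokens[j], dim).items():
                    context[word][pos] += sign
    return context


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases mice".split(),
    "the dog chases cats".split(),
]
space = train(corpus)
print("cat ~ dog:  ", round(cosine(space["cat"], space["dog"]), 3))
print("cat ~ water:", round(cosine(space["cat"], space["water"]), 3))
```

Because the index vectors are sparse and nearly orthogonal, words that share many neighbours ("cat" and "dog" here) end up with similar context vectors, and the space can be updated incrementally as new text arrives without recomputing anything already seen.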
Interacting with information
At SICS I worked on various aspects of interaction with information systems in numerous projects. The resulting papers range from understanding how to interact with virtual worlds, Interaction Models, Reference, and Interactivity in Speech Interfaces to Virtual Environments (1995), to how to model and manage user expectations in human-computer dialogue: The Interaction of Discourse Modality and User Expectations in Human-Computer Dialog (1992); Inferring Complex Plans (1993); Transparent Natural Language Interaction through Multimodality (1993); A Glass Box Approach to Adaptive Hypermedia (1995). A very interesting application was helping home owners better understand their energy usage: Socially Intelligent Interfaces for Increased Energy Awareness in the Home (2008).
Evaluating information systems
That work led me to think more about models for evaluating the quality of information retrieval. This interest has largely been channeled into my participation in the CLEF series of workshops and conferences, where I usually go to talk with colleagues about shared tasks and innovative evaluation schemes. Especially interesting to me is how an intrinsic evaluation scheme could be built for learning models, Evaluating Learning Language Representations (2015), and how evaluation schemes could be moved from laboratories to operational settings, Adopting Systematic Evaluation Benchmarks in Operational Settings (2019).
One of my favourite gripes concerns making the distinction between benchmarking and validation clear: how to figure out whether a component that is solidly reliable on a laboratory bench will be useful in practical application, how useful a laboratory benchmark actually is in real life, and how it might affect the activities the model is used for. How Lexical Gold Standards Have Effects On The Usefulness Of Text Analysis Tools For Digital Scholarship (2019)
(Also, research needs to be exciting and fun.) From Boxes and Arrows to Conversation and Negotiation or how Research should be Amusing, Awful, and Artificial (2006)