My research interests are in knowledge representation of language. For coming generations of information and knowledge systems, a closer understanding of the communication process between author and reader will be necessary. My aim is to improve information access by designing systems based on an informed but practical analysis of usage, context, situation, and domain, and by formulating and implementing a flexible and scalable knowledge representation of what is essential in realistic volumes of linguistic data.
This includes understanding how language usage changes over time and across new modes of communication, such as new text genres or new modalities. I am currently (2021) very interested in understanding how new genres will emerge in the convergence of broadcasting, video clip sharing, and pod publication, how they can be related to previous media, and how they can be made easily and handily accessible.
Most of what I have written can be found in various repositories on the net. My ORCID.
As my first research effort I attempted to formulate an algebra for recommendations. An Algebra for Recommendations (1990). This is what is now known as recommender systems, and I probably should have continued along this path in spite of an initial reviewer-number-two setback over the ethical issues involved in clustering .newsrc files. Newsgroup Clustering Based On User Behavior — A Recommendation Algebra (1994).
Stylostatistics and Studies of Genre
Since 1993 I have worked on computational stylistic analysis of text. Previous work on style and genre has been motivated from a primarily philological standpoint, even if sometimes computationally oriented. The 1994 Coling publication Recognizing Text Genres with Simple Metrics Using Discriminant Analysis (1994) by myself and Douglass Cutting marked the first language technology attempt in this direction. I have held numerous talks, seminars, and international symposia on the topic, covering methodology, results, and applications. New Text (2006); Textual Stylistic Variation: Choices, Genres and Individuals (2008); Conventions and Mutual Expectations — understanding sources for web genres (2008). Currently I am working on extending this work to the emerging landscape of podcasts and other audio material.
Scalable, realistic, and useful semantic models
Since 1998 I have participated in work on scalable, behaviouristically and neurophysiologically plausible computational models for processing large amounts of text efficiently and usefully to build semantic spaces based on distributional analysis of linguistic items using the random indexing processing model and memory model. From Words to Understanding (2001); Meaningful Models for Information Access Systems (2005); Filaments of Meaning in Word Space (2008). This work is continuing, and I am currently interested in exploring the interface between geometric high-dimensional models on the one hand and graph and topological models on the other. Semantic Topology (2014). Parts of this work became the text analysis company Gavagai, which I co-founded in 2008 and where I worked until 2019. We started by building a lexical learning model, The Gavagai Living Lexicon (2016), which we used for sentiment analysis in media monitoring, Usefulness of Sentiment Analysis (2014), and for analysis of customer feedback and questionnaires, Analysis of Open Answers to Survey Questions through Interactive Clustering and Theme Extraction (2018). We even toyed with applying our model to non-human communication, but while the approach still seems reasonable we never managed to get the project properly afloat or in the air, as it were. A proposal to use distributional models to analyse dolphin vocalisation (2017). I worked on an application of random indexing for construction grammar during my visit to Stanford in 2017-2018. High-dimensional distributed semantic spaces for utterances (2019).
Interacting with information
Since 1990, I have worked on various aspects of interaction with information systems in numerous projects. The Interaction of Discourse Modality and User Expectations in Human-Computer Dialog (1992); Inferring Complex Plans (1993); Transparent Natural Language Interaction through Multimodality (1993); Interaction Models, Reference, and Interactivity in Speech Interfaces to Virtual Environments (1995); Socially Intelligent Interfaces for Increased Energy Awareness in the Home (2008); A Glass Box Approach to Adaptive Hypermedia (1995).
Evaluating information systems
That led me to think more about models for evaluating the quality of information retrieval. This has largely been channeled into my participation in the CLEF series of workshops and conferences, where I usually go to talk with colleagues about shared tasks and innovative evaluation schemes. Especially interesting to me is how an intrinsic evaluation scheme could be built for learning models, Evaluating Learning Language Representations (2015), and how evaluation schemes could be moved from laboratories to operational settings, Adopting Systematic Evaluation Benchmarks in Operational Settings (2019). One of my favourite gripes is the need to make the distinction between benchmarking and validation clear: how to figure out if a solidly reliable component on a laboratory bench will be useful for practical application, how a laboratory benchmark can be variously useful in real life, and how it might impact the activities the model is used for. How Lexical Gold Standards Have Effects On The Usefulness Of Text Analysis Tools For Digital Scholarship (2019).
(Also, research needs to be exciting and fun.) From Boxes and Arrows to Conversation and Negotiation or how Research should be Amusing, Awful, and Artificial (2006)