
I believe that it is possible to predict people’s interests by looking at their written language. This can be done by identifying a few psychological traits that shine through in relaxed, personal texts such as Tweets or blogs. To accomplish this, we need a large amount of data on each individual and a large number of individuals in a database. Here is an outline showing how I believe it can be done and tested.

Firstly, a general model of psychological traits is needed. The human mind is complex, but there are patterns on a fundamental level. There are a gazillion ways to express personality, yet most of us recognize the paradox that personality, in the words of Leary (1957), must be considered somewhat unified yet expressed in a variety of ways.
A useful personality model must therefore be very abstract and yet allow for a myriad of different expressions in a coherent way on the more practical levels.
In my view, such a meta-model exists in the form of Ken Wilber’s Integral Theory. According to this model, there are only four fundamental ways to approach the world cognitively (see quadrants), and that makes a good starting point. In general, people are unaware that they are inclined to choose one quadrant more often than the others when they express something about their experience of the world. Whether they report on a visit to the supermarket or write about political philosophy, some aspects or phenomena are systematically more dominant than others. Why and how this works is a question that I leave for later inquiry. The important thing is that it is possible to count the number of words and expressions related to each quadrant using very simple computational linguistics techniques.
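To make the counting idea concrete, here is a minimal sketch in Python. The quadrant labels ("I", "We", "It", "Its") follow Wilber's usual naming, but the word lists below are invented placeholders for illustration only, not the lists I actually use, and the function names are hypothetical. A real version would also need to handle multi-word expressions, which this sketch ignores.

```python
import re
from collections import Counter

# Placeholder word lists, invented for illustration only.
QUADRANT_WORDS = {
    "I": {"feel", "think", "believe", "hope"},            # interior, individual
    "We": {"we", "together", "culture", "shared"},        # interior, collective
    "It": {"measure", "body", "data", "test"},            # exterior, individual
    "Its": {"system", "market", "network", "structure"},  # exterior, collective
}


def quadrant_counts(text):
    """Count how many tokens in the text belong to each quadrant's word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for quadrant, words in QUADRANT_WORDS.items():
            if token in words:
                counts[quadrant] += 1
    return counts


def quadrant_shares(text):
    """Return each quadrant's share of all matched words (0.0 if nothing matched)."""
    counts = quadrant_counts(text)
    total = sum(counts.values())
    return {q: (counts[q] / total if total else 0.0) for q in QUADRANT_WORDS}
```

Calling `quadrant_shares` on all the texts from one person gives a rough profile of which quadrant that person leans toward. Using shares rather than raw counts keeps prolific and occasional writers comparable.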

Memes and personal values are especially interesting from the perspective of predicting interest and understanding how ideas spread. If I value knowledge over belonging, I am likely to choose certain types of books and hobbies, and I will find one line of work more rewarding than another. (update: This post today about the company Hunch is based on the same idea, but the results come from people answering questionnaires instead.) The most comprehensive and “low-level” model of values I have come across so far, during my years studying different psychological models, is The Emergent Cyclical Levels of Existence Theory developed by Clare W Graves and later popularized as Spiral Dynamics by Don Beck and Chris Cowan. The theory is related to, and strongly resembles, Maslow’s hierarchy of needs. Clare W Graves believed that an individual will recognize some words faster than others depending on what level he or she operates on. The underlying idea is that certain words are typical for certain levels, where they become more meaningful or evoke more interest. I have found that it is possible to expand Graves’ basic word list and add additional words and expressions according to these levels of existence, or value memes. This makes it possible to segment individuals psychologically with improved precision by looking at their use of language. Thus, I believe, it is possible to lay the foundation for the practice of predicting people’s interests using a model of how ideas spread between minds.

In today’s world, the Internet provides an abundance of data about people. Among those data, my main interest is personal communication, and I want to study it under the assumption that language is a product of the mind and therefore has to bear traces of the mind that created it. Mind and personality are closely related, so I have spent a great deal of time studying personality type theories. Timothy Leary, one of the early personality researchers whom I studied, actually predicted our modern times and the opportunities that the Internet has now brought to us. Leary was concerned with the narrower field of interpersonal behaviour, but I believe the same problems and solutions are applicable to the general study of the human mind. He and his team of researchers also had the ambition to collect and classify the behavioural and psychological cues that personal language provides. However, at the time when he was active the world had not yet seen the technology and the abundance of data that we are blessed with today:

How can we measure these written, oral and physical expressions in such a way as to provide comparative conceptual material? It is possible, but rarely feasible, to capture these events by sound and movie equipment. Even then we must decide what to do with these unwieldy materials when we get them.

/Timothy Leary in Interpersonal Diagnosis of Personality first published in 1957

For the collection and analysis of written texts, the necessary technologies are now widely and often freely available. However, being a non-technician, I have not yet found a collection of tools that suits my needs. I am looking for a system that allows me to collect data, code the collected data for analysis, perform the analysis and visualise it in an easily understood way, preferably in a motion chart like Gapminder. Without funding, I still have to rely on help from voluntary hands that take an interest in exploring the human mind online. I am pretty sure that funding will be available, either in the form of commercial investments in applied psychographics or as grants or donations, once I have had the opportunity to test the theory from beginning to end.
To begin with, I believe that the following steps will be necessary to perform this experiment:
1. Collect a large number of standardized personal texts in a database. Tweets or blog posts are probably the easiest for this purpose.

2. Segment the texts psychographically based on the word lists I have created.

3a. – interest prediction
Search for statistical correlations between the psychographic segments and mentions of named entities such as people, brands, locations, books, movies etc.

3b. – a theory of influence
Visualize how the individuals link to each other and media sources. This would make it possible to search for psychographic clusters in further detail and analyze how ideas spread between people and media.
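As a rough illustration of step 3a, the sketch below assumes that steps 1 and 2 are already done, so that each author has a segment label and a set of extracted entity mentions. It uses lift, P(entity | segment) / P(entity), as one simple measure of association; this is my choice for the example, not a settled part of the method. The function name `entity_lift`, the record layout and the segment names are all hypothetical.

```python
from collections import Counter


def entity_lift(records):
    """Compute lift for every (segment, entity) pair that occurs.

    records: iterable of (author, segment, entities) tuples, where entities
    is the set of named entities that author mentioned (assumed to be
    extracted in an earlier step).
    Lift above 1 means the entity is mentioned more often in that segment
    than in the population as a whole.
    """
    records = list(records)
    n_authors = len(records)
    segment_sizes = Counter(segment for _, segment, _ in records)
    entity_totals = Counter()  # number of authors mentioning each entity
    joint = Counter()          # authors per (segment, entity) pair
    for _, segment, entities in records:
        for entity in set(entities):
            entity_totals[entity] += 1
            joint[(segment, entity)] += 1
    return {
        (segment, entity): (k / segment_sizes[segment])
        / (entity_totals[entity] / n_authors)
        for (segment, entity), k in joint.items()
    }


# Toy data, invented for illustration only.
records = [
    ("a", "knowledge", {"Darwin"}),
    ("b", "knowledge", {"Darwin", "Ikea"}),
    ("c", "belonging", {"Ikea"}),
    ("d", "belonging", {"Ikea"}),
]
lift = entity_lift(records)
print(lift[("knowledge", "Darwin")])  # 2.0
```

With real data, pairs backed by only a handful of authors would need to be filtered out or tested for significance before reading anything into them.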

At the moment, the core building blocks for a robust and scalable system that allows all of the above are being built slowly, oh so slowly, in the precious spare time of an awesome technical architect (who prefers to remain anonymous at this time) with the aim of laying a good foundation for future development. If you find this project interesting, send me a mail or follow the project via me on Twitter or via the RSS feed of this blog.

(Thank you Johan Lundgren for helping me with the English language!)