The results
We've now given 94 people the tea taste test and can announce some provisional results.
On average, it seems that people could not tell the difference between tea made with the milk first and tea made with the tea first. 42 people out of our sample got it right, and 52 got it wrong. This is not different from what we would expect if people were just guessing!
Interestingly, people who believed that they would be able to tell the difference were no better than those who believed they would not be able to or were unsure. Both groups performed at chance levels.
Sometimes in psychology, people are able to make decisions with information they do not know they have. Was this the case in our taste test? Did people with strong preferences prefer their traditional cuppa even though they didn't believe they could consciously tell the difference?
No.
People's stated preferences were not reflected in the cup of tea that they said they preferred during the taste test.
Overall, our findings suggest that not only can most people not tell the difference, but even those who have strong beliefs or preferences are not able to tell the difference. This is very interesting because it suggests that people have developed beliefs about how they like tea which are out of step with reality and that these beliefs persist because people systematically fail to test them.
The methods we've used in the tea taste test can be used in other domains to reveal similar biases and unfounded beliefs.
Background information
The results were analysed using SPSS - a program that we use to run statistical tests on our experimental data. In our case we were looking for a non-effect; that is we thought that participants would not be able to tell the difference between a cup of tea made with the milk first and a cup of tea made with the tea first. A 'non-significant result' would indicate that this was the case.
In psychology, we refer to a result as significant when the probability of obtaining a result at least that extreme by chance alone is less than .05 or 5%. If the probability (p) is greater than .05 then the statistic is said to be 'non-significant'. That is, the results observed could plausibly have occurred by chance and, more importantly, they provide no good evidence of a relationship between the variables of interest.
This principle of 95% confidence is the foundation of most modern statistics, especially in psychological research. However, there is little justification for the threshold being set at this level, beyond the fact that Fisher (the original tea taste tester) decided upon it. He considered it the smallest probability whereby one could be confident that the variables of interest were related in some way, because the likelihood of a relationship occurring by chance alone was so small. Even so, psychologists are careful not to make what is termed a ‘Type 1’ error: finding a relationship when in fact none exists. This is why replication and peer reviewing are such important components of psychological research.
Which test did we use?
The statistical test we used to analyse the data is called a chi square test. It was chosen because it allows for the comparison of categorical variables. But what is a categorical variable? Categorical variables sort people or items into groups, where each one belongs to one group or another, but not both. Eye colour is a good example of this.
There is another type of variable which is known as continuous. These are items that can be measured on a scale and so have an infinite number of possible values. Some good examples of this are height and weight.
As mentioned previously, the chi square test examines the relationship between two categorical variables. It calculates the expected number of people for each condition, were the results to occur by chance, and compares this to what was actually observed. If the observed statistics are significantly different from the expected ones then the experimental question is supported and a relationship proposed.
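The expected-versus-observed comparison described above can be sketched for the simplest case: one variable with two outcomes (right or wrong) and equal expected counts under guessing. This is only an illustration in Python, not the SPSS analysis itself. It uses the 42 correct and 52 incorrect responses quoted in the results summary, which total 94, whereas the reported statistics use N = 95, so the output need not match the reported values exactly. The function names are made up for this example.

```python
import math


def chi_square_goodness_of_fit(observed):
    """Chi-square statistic against equal expected counts in every category."""
    total = sum(observed)
    expected = total / len(observed)  # what pure guessing would give on average
    return sum((count - expected) ** 2 / expected for count in observed)


def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With 1 degree of freedom the statistic is a squared standard normal,
    so the tail probability reduces to the complementary error function.
    """
    return math.erfc(math.sqrt(statistic / 2))


# Counts taken from the results summary: 42 right, 52 wrong.
observed = [42, 52]
statistic = chi_square_goodness_of_fit(observed)
p_value = p_value_df1(statistic)

print(f"chi-square(1, N={sum(observed)}) = {statistic:.2f}, p = {p_value:.3f}")
print("significant at .05" if p_value < 0.05 else "non-significant at .05")
```

The closed form used for the p-value only holds for one degree of freedom; a general-purpose statistics package would be needed for larger tables.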
Why did we use a forced-choice paradigm?
Requiring people to make a forced-choice decision, even though we thought there would be no actual difference between the tea made with the milk first and the tea made with the tea first, was an important component of our experimental design. This was because we wanted a strong test of people's explicit and implicit abilities.
Were we not to use a forced-choice paradigm then there may be some who did guess the correct answer but, because of their lack of confidence in that answer, would have answered that they were unsure. By including a forced choice we could fully investigate whether people were truly able to tell the difference between the two methods of tea making, whether this was an implicit or an explicit skill. Explicit skills are skills or knowledge which one is consciously aware of having, whereas implicit skills are skills or knowledge which one possesses but is not consciously aware of having.
1. Can people actually tell the difference?
Whilst the graph may suggest that more people were wrong than right, there is actually no significant difference between the two groups [χ²(1, N=95) = .93, p = .334].
In other words, people were as likely to correctly identify the cup of tea made with the milk first as they were to incorrectly choose the cup of tea made with the tea first, meaning that correct responses occurred on a purely chance basis. This suggests that people were unable to tell the difference between the two cups of tea and responded at a rate we would expect if they were simply guessing!
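The idea of responding at the rate expected from guessing can also be checked with an exact binomial test, which asks how surprising 42 correct answers out of 94 would be if every participant were tossing a fair coin. This is a different technique from the chi square test we report, shown here only as a rough cross-check. The counts come from the results summary, and the function name is invented for the example.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided p-value under a fair-coin (p = 0.5) null hypothesis."""
    # With p = 0.5 the distribution is symmetric, so we can take the
    # smaller tail and double it, capping the result at 1.
    k = min(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# 42 correct identifications out of 94 participants (results summary).
p_value = binomial_two_sided_p(42, 94)

print(f"exact binomial p = {p_value:.3f}")
print("consistent with guessing" if p_value >= 0.05 else "differs from guessing")
```

Doubling the smaller tail is only valid here because the guessing rate is exactly one half; for any other null rate the two tails would have to be summed separately.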
2. Is there a gender difference?
Often psychologists are interested in whether one gender outperforms the other on particular tasks or skills. Whilst the graph below may appear to suggest that females made more incorrect responses than males, this mainly reflects the larger number of female participants we had in our sample.
In fact, there was no effect of gender on ability to identify the cup of tea made with the milk first [χ²(1, N=95) = .97, p = .325]. The non-significant p-value suggests that neither gender was better or worse than the other, indicating that tea-tasting ability may not be gender specific.
3. Does belief affect ability?
One question we were particularly interested in was whether those who were more sure of the superiority of their tea tasting abilities would be better at identifying the cup of tea made with the milk first.
As demonstrated by the graph below, belief had no effect on discrimination [χ²(2, N=95) = .228, p = .892]. Those who considered themselves able to tell the difference between the two cups of tea still performed at chance level, which was similar to the performance of those who thought they wouldn't be able to tell the difference and those who were unsure.
4. Do previous preferences matter?
Another point of interest was whether people’s previously stated preferences actually translated into the same preference choice in our experimental situation. Were this to be the case then it would suggest that the way the two cups of tea were made mattered.
However, as demonstrated by the graph below, there appears to be little relationship between people’s previous preferences and the cup of tea they chose as tasting better as part of the experiment [χ²(2, N=95) = .63, p = .728]. In line with our previous findings, this suggests that people were unable to tell the difference between the cup of tea made with the milk first and the cup of tea made with the tea first. This is important because it implies that people have developed beliefs that are not in line with reality and that these persist because they fail to test them systematically.
5. Did participants make a decision based on other factors?
As we thought that, on average, people would be unable to taste the difference between the two cups of tea, there were a number of variables we controlled for in order to ensure that they did not affect the results observed.
The main biases we could construe as problematic were that people would implicitly prefer the cup they drank from first, because the first sip is always the most refreshing, or that they would exhibit a left/right bias, as has been shown in other studies.
In light of this we used random number lists to ensure that the cup of tea made with the milk first was placed on the left hand side for 50% of the participants and on the right hand side for the other 50% of the participants. We also instructed each individual which cup to drink from first, such that 50% of our sample drank from the left hand cup first and the other 50% drank from the right hand cup first. This is also known as counterbalancing.
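One way such a counterbalanced schedule could be generated is sketched below. It is an illustration rather than the procedure we actually followed: it assumes the two factors (where the milk-first cup sits, and which cup is drunk first) are fully crossed, and it shuffles a balanced list instead of reading from a printed random number list. The function name and the seed are hypothetical.

```python
import itertools
import random

SIDES = ("left", "right")


def counterbalanced_schedule(n_participants, seed=None):
    """Return one (milk_first_side, drink_first_side) pair per participant."""
    rng = random.Random(seed)
    # The four possible combinations of the two left/right factors.
    combos = list(itertools.product(SIDES, SIDES))
    # Use every combination equally often, then top up any remainder
    # with distinct combinations chosen at random.
    full_blocks, remainder = divmod(n_participants, len(combos))
    schedule = combos * full_blocks + rng.sample(combos, remainder)
    # Shuffle so that the order of conditions is unpredictable.
    rng.shuffle(schedule)
    return schedule


schedule = counterbalanced_schedule(96, seed=1)
milk_left = sum(1 for milk_side, _ in schedule if milk_side == "left")
drink_left = sum(1 for _, first_cup in schedule if first_cup == "left")

print(f"milk-first cup on the left: {milk_left} of {len(schedule)}")
print(f"left cup drunk first: {drink_left} of {len(schedule)}")
```

When the sample size is a multiple of four, both factors split exactly 50/50; otherwise the split is as even as the remainder allows.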
As demonstrated by the graph above, there was no relationship between the first cup that participants drank from and the subsequent preference they then stated [χ²(1, N = 95) = .55, p = .457]. This suggests that participants did not make their subsequent decision based upon which cup they drank from first.
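As a cross-check on the statistics reported in sections 1 to 5, each p-value can be recomputed from its chi square statistic and degrees of freedom. The sketch below uses closed forms that hold only for one and two degrees of freedom, which happen to be the only cases reported here. The statistics are copied from the text above; the function name, the labels and the tolerance of .005 (chosen to allow for the statistics being rounded) are assumptions made for this example.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail p-value of a chi-square statistic for df = 1 or df = 2."""
    if df == 1:
        # A chi-square with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")


# (label, degrees of freedom, reported statistic, reported p-value)
reported = [
    ("1. tell the difference", 1, 0.93, 0.334),
    ("2. gender", 1, 0.97, 0.325),
    ("3. belief", 2, 0.228, 0.892),
    ("4. previous preference", 2, 0.63, 0.728),
    ("5. first cup", 1, 0.55, 0.457),
]

TOLERANCE = 0.005  # assumed allowance for rounding of the statistics

for label, df, statistic, reported_p in reported:
    recomputed = chi_square_p_value(statistic, df)
    matches = abs(recomputed - reported_p) < TOLERANCE
    print(f"{label}: reported p = {reported_p:.3f}, "
          f"recomputed p = {recomputed:.3f}, "
          f"{'agrees' if matches else 'differs'}")
```

Every recomputed value is above .05, in line with the non-significant results described in each section.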
The questions
Q1) If someone made you a cup of tea, how would you prefer it to be made?
Milk first
Tea first
No preference
Q2) How strongly do you feel about this?
Not strongly at all
It matters a bit
It matters a lot
Q3) Do you think you would be able to tell the difference between tea made with the milk first and tea made with the milk second?
Yes
No
Don’t know
Q4) Which tea did you prefer?
Left
Right
Q5) Which tea do you think was made with milk added first?
Left
Right