**It was Friday, the 16th day of April 1943.**

In the middle of WWII, Dr Hofmann, a Swiss chemist, was experimenting with the synthesis of a new substance. It was dark outside as another busy day of work passed by in his laboratory.

Flasks, dishes and tubes were neatly organised on the bench, with a bright light of the lamp illuminating the journal.

As time went on, the impeccable handwriting of a rigid scientist turned into clumsy, illegible scribbles, as Dr Hofmann couldn’t hold the pen still. The world began to spin in a mad race as his head grew lighter and lighter.

Suddenly, a sharp thought slid through his mind, penetrating the brain like a shaft of an arrow:

“**I must secure my lab immediately**,” he said to himself, scraping the bottom of the barrel of his consciousness.

He shut the door and went home to lie down, as his mental powers subsided. He sank in, going deeper and deeper into a pleasant intoxication. In a dreamlike state, he saw fantastic pictures and extraordinary shapes, with an intense, kaleidoscopic play of colours.

“Fantastic images surged in on me, alternating, variegated, opening and closing themselves in circles and spirals, exploding in coloured fountains … Every sound generated a vividly changing image, with its own consistent form and colour,”

**– he then noted in his journal.**

This entry is a vivid description of the effects of LSD. Dr Hofmann was the first to synthesise the substance in a lab, and to this day *lysergic acid diethylamide* is considered one of the most potent hallucinogenic compounds known.

Because of its peculiar attributes, many suggested it should be used for the treatment of anxiety, alcohol and tobacco dependence, treatment-resistant depression and many other psychiatric diseases.

- Mean
- Median
- Range
- Standard Deviation
- Prevalence
- Incidence

## PROBLEM 1 – LSD and Psychiatric wellbeing

*Adapted from Gasser et al. 2014*

This study compared people who took LSD with people who were given a placebo (a dummy pill with no active effect). The authors measured the psychiatric outcome using a test called the STAI scale (State-Trait Anxiety Inventory).

*NB in epidemiological terms: the LSD (Experimental Dose) was the EXPOSURE and the STAI scale was the OUTCOME.*

Let’s have a look at the results: at the **baseline** (at the beginning of the experiment) Gasser and colleagues had eight participants (**n=8**) in the experimental-dose group and three (**n=3**) in the placebo group.

What do you think about these numbers?


You should always be cautious when considering experiments on a small sample. It does not, however, mean that you should discard it: sometimes a small experiment is the only source of any data on the subject. It can give you some indication of the effect or give reasons to design a larger study.

**Now, consider the two values: mean and SD (standard deviation)**

**MEAN –** also known as “average”. It’s the sum of all values divided by the number of values in that set. It is a very basic, yet quite useful statistical method to compile many observations into one understandable number.
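As a minimal sketch of that definition in Python: the numbers below are made up for illustration and are not data from the study, and the function name `mean_of` is just a label chosen here.

```python
def mean_of(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Illustrative numbers only (not taken from Gasser et al.)
scores = [3, 3, 3, 5, 6, 6, 7]

print(round(mean_of(scores), 2))  # 4.71  (33 divided by 7)
```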

**STANDARD DEVIATION –** is the measure of SPREAD of the data: it is based on how values are different from the mean. So, if the SD is low, the values are quite close to the mean. If it’s large, it means that they are farther apart.

Consider the figure below. As you can see, although the average (mean) is 100 for both the red and blue samples, the red sample has a very low SD of 10 (the graph is quite narrow), while in the blue sample the results are more spread out (the graph is wider and flatter), with an SD of 50.
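If you would like to see the same idea in numbers, here is a rough sketch in Python. The two tiny samples are invented for illustration (they are not the data behind the figure), and the sketch assumes the "sample" version of the SD, which divides by n − 1; the lesson itself does not say which version the figure uses.

```python
import math

def sample_sd(values):
    """Sample standard deviation: how far values sit from their mean."""
    m = sum(values) / len(values)
    squared_diffs = [(v - m) ** 2 for v in values]
    # Assumption: divide by n - 1 (sample SD), not n (population SD)
    return math.sqrt(sum(squared_diffs) / (len(values) - 1))

# Two made-up samples with the same mean (100) but different spread
narrow = [90, 95, 100, 105, 110]
wide = [50, 75, 100, 125, 150]

print(sum(narrow) / len(narrow), round(sample_sd(narrow), 2))  # 100.0 7.91
print(sum(wide) / len(wide), round(sample_sd(wide), 2))        # 100.0 39.53
```

Both samples have the same mean, yet the second SD is five times larger, which is exactly the "narrow versus wide" difference described above.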

As you can see, it is not difficult at all. I know you may not yet be practised in working with numbers, but stick to it and you’ll see this as your second nature in no time.

We’ve now learned how to use two statistical tools in our armoury: **MEAN** and **SD**. Let’s add a couple more:

**MEDIAN** – it’s the middle value of a set, *when results are arranged in order.*

**How to work out the median of a set of values?**

Let’s see how it works in an example set of *ordered* numbers: [3,3,3,5,6,6,7]. The trick here is to cross out the most distant numbers from both sides.

- The first and the last number of an ordered set are 3 and 7. We can eliminate them now.
- Now, the set looks like this: [3,3,5,6,6]. Let’s cross the next ones out: 3 and 6. This leaves [3,5,6].
- We do that again, excluding 3 and 6: [5].
- The last one standing is 5, so the median of that set is 5.
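The crossing-out steps above can be sketched in Python as follows. The function name `median_by_crossing_out` is made up for this example, and it assumes a non-empty list. For a set with an even number of values, two numbers are left at the end; the sketch averages them, which is the usual convention but is not covered in this lesson.

```python
def median_by_crossing_out(values):
    """Find the median by repeatedly removing the smallest and largest value."""
    ordered = sorted(values)          # the values must be in order first
    while len(ordered) > 2:
        ordered = ordered[1:-1]       # cross out one value from each end
    # One value left (odd-sized set) or two values left (even-sized set)
    return sum(ordered) / len(ordered)

print(median_by_crossing_out([3, 3, 3, 5, 6, 6, 7]))  # 5.0
```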

The advantage of a median over an average is that the median is more resistant to the OUTLIERS (e.g. very small or very large numbers in the set).

**When could median be better than average?**

Let’s go back to our example set: [3,3,3,5,6,6,7]. The median of that is 5, and the average is 4.71. We then collect two more observations: 22 and 1. Now the set looks like this: [1,3,3,3,5,6,6,7,22]. The median will still be 5 – it resisted adding very extreme numbers. However, the average now is 6.22, which is higher than previously.
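You can check these numbers yourself. The short sketch below uses Python's built-in `statistics` module; the variable names are just labels chosen for this example.

```python
from statistics import mean, median

before = [3, 3, 3, 5, 6, 6, 7]
after = sorted(before + [22, 1])   # add the two extreme observations

print(median(before), round(mean(before), 2))  # 5 4.71
print(median(after), round(mean(after), 2))    # 5 6.22
```

The median stays at 5, while the mean is pulled upwards by the single very large value.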

Here is another example: assume you have five (n=5) people earning £20,000 a year in a company. Stats are easy here: the average is £20k and the median is £20k. After long consideration, the director decides to increase the average pay to boost his stats. So, he gives a raise to his secretary (for reasons beyond the scope of this investigation – *learn this phrase, it’s extremely useful*).

Now she’s earning £100k a year, and the rest stays the same.

The average pay would therefore be £36,000 a year, a whopping £16k increase!

**But did it really change the situation of an average worker? No.**

The median, however, did not change. It’s still £20k.

For the median to change, the director would have to make more changes in salary levels.
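Here is the salary example as a short Python sketch. The first two lists come straight from the story above; the third list is a hypothetical extra scenario, added here only to show what it would take to move the median.

```python
from statistics import mean, median

before = [20_000] * 5                 # five people on £20k
after = [20_000] * 4 + [100_000]      # the secretary now earns £100k

print(f"{mean(before):,.0f} {median(before):,.0f}")  # 20,000 20,000
print(f"{mean(after):,.0f} {median(after):,.0f}")    # 36,000 20,000

# Hypothetical: the median only moves once most of the staff get a raise
most_raised = [20_000, 20_000, 30_000, 30_000, 30_000]
print(f"{median(most_raised):,.0f}")                 # 30,000
```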

I hope this example gives you an idea of where a median could be useful.

**Another measure of variability**

We’ve learned about one measure of variability: the STANDARD DEVIATION. Now, let’s consider another one: **RANGE**.

RANGE – is simply the difference between the largest and smallest values. It shows you the overall spread of different values in a set. So in this set: [3,3,3,5,6,6,7], the RANGE would be MAX − MIN = 7 − 3 = 4 (you will also see it reported as the interval 3–7).
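In Python, a minimal sketch of the range needs only the built-in `min` and `max` functions (the variable names are just labels for this example):

```python
values = [3, 3, 3, 5, 6, 6, 7]

smallest = min(values)
largest = max(values)
value_range = largest - smallest   # largest minus smallest

print(smallest, largest, value_range)  # 3 7 4
```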

Now you’re equipped with good tools to analyse more data!

**PLEASE NOTE**: I appreciate you’ll have normally been taught about the mathematical formulae to work out these values. I decided not to dive into that, as I do not think this knowledge is required at this point. If you want, please look the formulae up and play with them. It’s good to be aware of them, although Excel now can easily work these out for you.

If you’re interested in Excel statistics, you can download this example sheet here and play with the numbers to see how stats will change

## PROBLEM 2 – Illegal drug abuse and self-esteem

Another study measured the mean Self-Esteem Score Based on Illegal Drug Abuse. Let’s start by defining the EXPOSURE (E) and OUTCOME (O): here, illegal drug abuse is the exposure and the self-esteem score is the outcome.


Well done! You can read the data now. Again, if you made a couple of mistakes, don’t worry: you’ll be doing lots of other examples later on.

**INCIDENCE AND PREVALENCE**

INCIDENCE is how many NEW cases of the disease are recorded in a given period of time (for example, in one year).

PREVALENCE is how many cases of the disease there are IN TOTAL at a given point in time, old and new together.

You may see this reported in the news, like this:

Both values are often presented per 100,000 people, which makes them easier to compare between groups of different sizes (e.g. between the UK, with 64M people, and Iceland, with a population of 323k).
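To see why the "per 100,000" step matters, here is a small Python sketch. The population sizes are the ones quoted above, but the case counts are invented purely for illustration, and `rate_per_100k` is a name made up for this example.

```python
def rate_per_100k(cases, population):
    """Number of cases for every 100,000 people in the population."""
    return cases * 100_000 / population

# Hypothetical numbers of new cases in one year (NOT real data)
uk_rate = rate_per_100k(cases=6_400, population=64_000_000)
iceland_rate = rate_per_100k(cases=50, population=323_000)

print(round(uk_rate, 1))       # 10.0
print(round(iceland_rate, 1))  # 15.5
```

In this made-up example the UK has far more cases in absolute terms, yet the rate is higher in Iceland, which you would miss by comparing raw counts.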

**What influences the incidence and prevalence values?**

Incidence will largely depend on risk factors, trends in population and (very often!) diagnostic criteria.

Prevalence will largely depend on the SURVIVAL RATES of the disease. If the prognosis of a disease is very poor, then the overall number of people living with that disease will be low (people will die very soon after diagnosis). However, if the disease has a very high survival rate, the prevalence can be high even though the incidence is low (few new cases are diagnosed, but people live with the disease for a long time).
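A very rough way to picture this in Python is shown below. It is a toy model rather than a real epidemiological method: it assumes the same number of new cases every year and that every patient lives exactly the same number of years after diagnosis. All numbers and the function name are invented for illustration.

```python
def people_living_with_disease(new_cases_per_year, years_lived_after_diagnosis):
    """Toy steady-state count of living patients (see assumptions above)."""
    # Everyone diagnosed within the last `years_lived_after_diagnosis` years
    # is still alive, so the total is new cases per year times years lived.
    return new_cases_per_year * years_lived_after_diagnosis

# Hypothetical disease A: many new cases, very poor prognosis
poor_prognosis = people_living_with_disease(1_000, 1)
# Hypothetical disease B: few new cases, long survival
good_prognosis = people_living_with_disease(100, 30)

print(poor_prognosis)  # 1000
print(good_prognosis)  # 3000
```

Disease B has ten times fewer new cases each year, yet three times as many people are living with it.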

**Congratulations, you have reached the end of another lesson.**

If you feel very confident about the terms introduced in this part, you are on your way to becoming a proficient data user. If, however, you think it’s a bit too complicated, please do not feel discouraged. Some of the concepts will make more sense in the real-life examples considered in the next part.

PART 3 – MANAGING THE UNCERTAINTY

*IS HYPNOTHERAPY EFFECTIVE AND WHY CHARCOT WAS WRONG ABOUT HYSTERIA?*

I never would’ve thought I can learn the stats so easily. Makes perfect sense now

Thanks a lot for the comment, I hope to put more lessons up soon!

I really liked that one especially the beginning, really interesting! Where is that from?

Adapted (and slightly fictionalised, I admit) from the book “One Man’s Medicine” by Archie Cochrane. It really is worth the read, one of the best books (both medical and autobiographical) I’ve ever read. My version was Max Blythe’s edition, which you can hardly get on the market, but Cardiff University is always happy to send you a copy of their modern edition if you ask!

Good explanation, but it’s very basic. Are there people who really dont know that?

It’s called “Basic” for a reason! Thanks for the comment anyway!

I’m still confused about incidence and prevalence, which one is most important?

There’s no “more important” measurement. If you want to measure how many people actually have the disease, the prevalence would be most useful. If you want to measure how many NEW people have been diagnosed with the disease then incidence may be more useful.

If a disease is characterised by a very short lifespan after diagnosis, you can have a massive incidence rate relative to prevalence, as the number of LIVING people who HAVE BEEN diagnosed will decrease rapidly.

On the flipside, if you invented an ultimate prevention of a disease, you’ll have a 0 incidence rate, but the prevalence rate will still be high, as there will be people who have already been diagnosed, living with the disease.

It simply depends on what you want to measure.

Does it make sense?

Government departments often play with the range and average to mislead the public about the figures. You should never trust anything that is written and said without checking it first or asking for the full figures.

Always #AskForEvidence!

The story from the beginning is very wise. This is what happens if you take drugs. They are devils force trying to enslave you. The course is very interesting, I just can’t wait for the next part

Well, I don’t know about the devils, Ahmed, but side effects of various drugs can indeed be nasty!

Yes! This is why one must always read labels to get away from devils forces!!!

On that graph, is that the same sample? How can it be the SD is equal in both cases?

Hi Deb, what I was trying to show with that diagram is that the average can be the same for two distinct samples, with very distinct SDs. Basically, it shows you that the SD measures the variability inside the population (you can see that on the graph: red is quite narrow and blue is very wide, yet they both have the same average).

This is also showing that you may need more than one measurement to see a bigger picture of the population you’re studying.

Thanks for your input anyway! Hope you enjoy the course!!!

Very interesting story indeed… I like how these lessons are introduced, it makes it a pleasure to read, just like a good novel

Thanks Adam, I like these stories, too! Happy to hear your thoughts – keep them coming about the other parts of the course, too. Your help is much appreciated.

And here we go again seriously do you think that just because your an oxford graduate you can school us about what to use and what not and that hypnotherapy is bad? My mum had a hypnotherapy last year she was diagnosed with cancer and it helped her a lot you cant just say you dont like hypnotherapy because it doesnt suit your scientific needs and your feeling of being better than us

yes