## Relative Age Effect

## Does your child have a better chance of getting into the top ability set for Mathematics if they were born nearer the beginning of the academic year?

Five years ago I read *Outliers* by Malcolm Gladwell. The blurb states that “if we want to understand how some people thrive, we should spend more time looking *around* them – at such things as their family, their birthplace, or even their birth date.” He argues that the Relative Age Effect gives an advantage to those born nearer the beginning of the year (academic year or sports season). Since reading the book I have wondered whether this phenomenon affects the ability setting of Mathematics pupils in their lower secondary years, i.e. at the age of 12. And now we have the data to answer this question…

### Relative Age Bias

In any random sample of the population, you can expect to find a roughly uniform distribution of birth months.

But when you look at the birth month distribution of members of, for example, professional sports teams, you get something quite different:

Evidently, people born nearer the beginning of the year, or season in this case, are far more heavily represented in these professional sports teams.

### What’s going on?

At my school, in the first week in September, all of the new Year 7 boys play in mandatory football trials. Those with good skills are immediately recognised and picked for an initial A Team, B Team and C Team. And then they begin coaching in earnest. And therein the damage is done.

What has happened here? Simply, we took the better players in Year 7 and immediately widened the gap between them and the rest. The A Team will go off within a couple of weeks and play A Teams from other schools and get lots of hardcore training. The B team might get the same level of training, but their matches won’t be as challenging. The C Team will get some training but probably from a lesser coach, and will play fewer matches because a lot of other schools don’t even have a C Team. And then there’s everyone else, who just have a casual kick about in P.E. lessons until December, when the focus shifts to sports other than football. The A and B Teams will be largely fixed now for the next 5 years, with occasional tweaks due to injury or replacements if team members can’t keep up.

But who replaces them? Games captains find it notoriously difficult to replace Team players for the simple reason that only the Team players have had any decent coaching. Children develop at different times, and whilst it’s possible that someone who didn’t make the Team in the September trials will suddenly emerge as a great player, and be lucky enough to get spotted by a coach, this doesn’t happen often, again simply because they are already many coaching sessions and matches behind the others.

But who are those children lucky enough to get into the team in those initial September trials? On average, the oldest ones are the most likely. Children are growing quickly at the age of 11. Six months or a year can make a huge physical difference at that age – several centimetres in height and breadth, and several kilograms in mass, not to mention mental development. This imbalance in probability is what is known as Relative Age Bias, or the ‘Relative Age Effect’.

## Looking for the Relative Age Effect in an Academic Setting

At a particular UK school, which we will call “Z School”, pupils are set according to their *pace of learning* and the *amount of individual support they require in the classroom* to learn effectively. Pupils are taught in mixed ability classes in Year 7 (age 11-12), then placed at the beginning of Year 8 in one of 4 linear ability sets of decreasing size. Set 1 is the ‘top set’ – the largest and highest ability; Set 4 the smallest and lowest ability.

The academic year ‘cut-off’ date is September 1st. So a pupil born on 31st August 2002 would currently be in Year 9, but a pupil born on 1st September 2002 would currently be in Year 8 – with an entirely different set of friends, teachers, and future, and perhaps chances of success in life. We will label September 1st ‘day 0’, September 2nd ‘day 1’ and so on throughout the academic year, with August 31st being ‘day 364’, or ‘day 365’ when the academic year includes 29th February.
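As a minimal sketch of this day numbering (written in Python for illustration; the project's own script is in R, and the function name `day_of_academic_year` is an assumption, not taken from it):

```python
from datetime import date


def day_of_academic_year(dob: date) -> int:
    """Days between the most recent 1 September (day 0) and the date of birth."""
    # Births from September to December fall in the academic year starting in
    # the same calendar year; births from January to August fall in the
    # academic year that started in the previous calendar year.
    start_year = dob.year if dob.month >= 9 else dob.year - 1
    return (dob - date(start_year, 9, 1)).days


print(day_of_academic_year(date(2002, 9, 1)))   # 0
print(day_of_academic_year(date(2002, 8, 31)))  # 364
print(day_of_academic_year(date(2004, 8, 31)))  # 365 (2003-04 includes 29 February)
```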

We are looking for significant correlation between a pupil’s birth day and the set into which he or she is placed at the beginning of Year 8 at “Z School”.

### The Data

Setting data for each Year 8 cohort from 2004-05 to the present year were analysed. Data were available going back to 2000-01, but a different setting structure was used before 2004-05, so those earlier years could not feasibly be included. For the same reason, boys in 2006-07 and 2013-14 and girls in 2004-05 were excluded: their unusual cohorts were set using a different structure.

From archived setting spreadsheets, each pupil’s name, their beginning-of-Year-8 Maths set and, where available, their UPN (Unique Pupil Number, a database index) were loaded into memory. Present and archived pupil records were then searched to find each pupil’s date of birth, with a UPN match taking precedence where available and a surname-firstname match used otherwise. In total 1114 pupils were matched, giving 1114 pairs of values: the number of days into the academic year on which the pupil was born, and their Maths set at the beginning of Year 8.
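The matching rule can be sketched roughly as follows. This is an illustrative Python sketch, not the project's R code: the field names (`upn`, `surname`, `firstname`, `dob`, `set`), the sample pupils and the choice to reject ambiguous name matches are all assumptions made for the example.

```python
from datetime import date


def name_key(row):
    """Normalised (surname, firstname) pair used for the fallback match."""
    return (row["surname"].strip().lower(), row["firstname"].strip().lower())


def match_dob(setting_rows, pupil_records):
    """Attach a date of birth to each setting row: UPN first, then name."""
    by_upn = {r["upn"]: r["dob"] for r in pupil_records if r.get("upn")}
    by_name = {}
    for r in pupil_records:
        # Keep every date of birth seen for a name so that ambiguous
        # names can be left unmatched (an assumption of this sketch).
        by_name.setdefault(name_key(r), set()).add(r["dob"])

    matched, unmatched = [], []
    for row in setting_rows:
        dob = by_upn.get(row.get("upn"))       # UPN takes precedence
        if dob is None:
            candidates = by_name.get(name_key(row), set())
            if len(candidates) == 1:           # unique name match only
                dob = next(iter(candidates))
        if dob is None:
            unmatched.append(row)
        else:
            matched.append({**row, "dob": dob})
    return matched, unmatched


# Hypothetical example data.
setting_rows = [
    {"surname": "Archer", "firstname": "Sam", "upn": "A100", "set": 1},
    {"surname": "Baker", "firstname": "Jo", "upn": None, "set": 2},
    {"surname": "Clarke", "firstname": "Al", "upn": None, "set": 3},
]
pupil_records = [
    {"surname": "Archer", "firstname": "Samuel", "upn": "A100", "dob": date(2003, 9, 14)},
    {"surname": "Baker", "firstname": "Jo", "upn": "B200", "dob": date(2004, 3, 2)},
]
matched, unmatched = match_dob(setting_rows, pupil_records)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

In this toy data the first pupil is matched by UPN even though the first names differ, the second by name, and the third is left unmatched.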

### Exploratory Data Analysis

There is evidently some weak positive correlation:

To check we’re not in Simpson’s Paradox territory, we must check that this correlation still exists when the gender variable is ignored. It does:

### Hypothesis Test (5% one-tailed)

Due to the large number of tied ranks in the ‘set’ variable, Spearman’s Rank Correlation Coefficient ρ is the statistical test tool of choice.
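For reference, when there are ties the coefficient is usually computed as the Pearson correlation of the ranks, with tied observations each given the average of the ranks they span. This is the standard textbook definition rather than something taken from the project's script; the symbols $R_i$ and $S_i$ are introduced here for illustration and denote the ranks of pupil $i$'s birth day and Maths set:

$$
\rho = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \, \sum_{i=1}^{n} (S_i - \bar{S})^2}}, \qquad \bar{R} = \bar{S} = \frac{n+1}{2}
$$

The familiar shortcut $1 - 6\sum d_i^2 / \big(n(n^2 - 1)\big)$ is exact only when there are no ties, so it is not suitable when the ‘set’ variable takes just four distinct values.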

[The n = 1114 pupils in this test are treated as a sample of the wider population of pupils placed into ability sets for Mathematics in Year 8 at “Z School” (i.e. including those in future cohorts, assuming no changes are made to the setting structure or policy at the school). ρ is Spearman’s rank correlation coefficient for this population.]

Null Hypothesis H₀: ρ = 0. There is no correlation between the number of days into the academic year that a pupil at “Z School” is born and their Maths set at the beginning of Year 8.

Alternative Hypothesis H₁: ρ > 0. There is positive correlation between the number of days into the academic year that a pupil at “Z School” is born and their Maths set at the beginning of Year 8.

### Results, Simulation and Significance

Spearman’s correlation coefficient for these data was found to be 0.0535, thus quantifying the weak positive correlation. But is it significant?

Because of the ‘large’ n and the large number of tied ranks, it was not possible to calculate an *exact* p-value to determine whether this correlation is significant. Instead, a simulation was carried out in which 100,000 similarly structured datasets (with the correct totals of pupils in each set) were randomly generated, assuming a uniform distribution of birth dates across the year. Just 3.709% of these datasets had a correlation coefficient greater than or equal to 0.0535, giving a simulated p-value of about 0.037. It is therefore concluded that the weak positive correlation in our real dataset *is significant* at the 5% level: there is evidence of positive correlation between the number of days into the academic year that a pupil at “Z School” is born and their Maths set at the beginning of Year 8.
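A simulation of this kind can be sketched in a few lines. The sketch below is in Python rather than the R used for the project, and it is not the author's code: the function names, the toy dataset, the reduced number of simulations and the decision to ignore leap days (birth days drawn uniformly from 0 to 364) are all assumptions made to keep the example small.

```python
import random
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulated_p_value(days, sets, n_sim=1000, seed=1):
    """One-tailed p-value: share of simulated datasets with rho >= observed."""
    rng = random.Random(seed)
    set_ranks = average_ranks(sets)  # set sizes stay fixed in every simulation
    observed = pearson(average_ranks(days), set_ranks)
    hits = 0
    for _ in range(n_sim):
        # Under the null hypothesis, birth days are uniform and unrelated to set.
        fake_days = [rng.randrange(365) for _ in sets]
        if pearson(average_ranks(fake_days), set_ranks) >= observed:
            hits += 1
    return observed, hits / n_sim


# Hypothetical toy data: 60 pupils in four sets of decreasing size, with
# birth days deliberately arranged so that later births sit in lower sets.
toy_sets = [1] * 24 + [2] * 18 + [3] * 12 + [4] * 6
toy_days = list(range(0, 360, 6))
rho, p = simulated_p_value(toy_days, toy_sets)
print(f"rho = {rho:.3f}, simulated p = {p:.3f}")
```

Because the toy data are constructed to be strongly correlated, the simulated p-value here comes out very small; with real data of the kind described above, both the coefficient and the margin would be far more modest.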

### Gender as a factor

The correlation coefficient for Males only was found to be 0.0461, and for Females only it was found to be 0.0597. Interestingly, neither of these results is significant at the 5% level. The most likely explanation is that, with only one gender considered, there simply wasn’t enough data to detect such a weak correlation.

### Conclusion

The Relative Age Effect is observed at “Z School”: **a pupil with a birthday nearer the beginning of the academic year is more likely to get into a higher Maths set** in Year 8 at “Z School” than a pupil with a birthday nearer the end of the academic year.

You can access the sanitised data and the R script used in this project at this GitHub repo.