Why Is the Median Resistant, but the Mean Is Not?

Statistics is an essential part of understanding any data, and it includes measures of central tendency such as the mean, median, and mode. These measures help us summarize and analyze data in a meaningful way. The two most commonly used are the mean and the median. While both describe the center of the data, they behave differently when outliers are present in the data set. In this article, we will explore why the median is resistant to outliers, but the mean is not.

First, let's understand what the mean and the median are. The mean is the average value of a data set, calculated by adding up all the values and dividing the sum by the number of observations. It is a useful measure of central tendency when the data set has no outliers or is symmetrically distributed. The median, on the other hand, is the middle value of a data set when the values are arranged in order. If the data set has an odd number of observations, the median is the single middle value. If the data set has an even number of observations, the median is the average of the two middle values.
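
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names mean and median and the small sample lists are chosen here for the example and are not part of the salary data discussed later.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)      # the median is defined on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: one middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean([2, 4, 9]))        # 5.0
print(median([2, 4, 9]))      # 4
print(median([9, 2, 4, 11]))  # 6.5
```

Python's standard library offers the same calculations as statistics.mean and statistics.median, so writing them by hand is only worthwhile for seeing how each one works.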

Now let's consider an example to understand how mean and median behave differently in the presence of outliers. Suppose we have a data set that represents the salaries of ten employees in a company, as shown below:

$50,000, $55,000, $60,000, $65,000, $70,000, $75,000, $80,000, $85,000, $90,000, $10,000,000

In this data set, the last value, $10,000,000, is an outlier. Outliers are extreme values that differ significantly from the other values in the data set. Outliers can be due to measurement errors, data entry errors, or genuinely extreme values.

The mean of this data set is calculated by adding up all the values and dividing the sum by the number of observations, which is 10. So, the mean salary in this data set is:

Mean = ($50,000 + $55,000 + $60,000 + $65,000 + $70,000 + $75,000 + $80,000 + $85,000 + $90,000 + $10,000,000)/10 = $10,630,000/10 = $1,063,000

As we can see, the mean salary is heavily affected by the outlier value of $10,000,000. The outlier is far higher than the other values in the data set and pulls the mean above nine of the ten salaries, so the mean no longer describes a typical employee.
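
One way to check how much a single value pulls the mean is to compute it with and without that value. The sketch below does this with Python's standard statistics module. The variable names are chosen for illustration, and it assumes the outlier is the last entry in the list, as it is in the data above.

```python
from statistics import mean

# The ten salaries from the example; the last one is the outlier.
salaries = [50_000, 55_000, 60_000, 65_000, 70_000,
            75_000, 80_000, 85_000, 90_000, 10_000_000]

without_outlier = salaries[:-1]   # drop the last (largest) value

mean_all = mean(salaries)
mean_without = mean(without_outlier)

# How many employees earn less than the "average" salary?
below_mean = sum(s < mean_all for s in salaries)

print(f"Mean with the outlier:    ${mean_all:,.0f}")
print(f"Mean without the outlier: ${mean_without:,.0f}")
print(f"Salaries below the mean:  {below_mean} of {len(salaries)}")
```

Comparing the two printed means shows how far one extreme salary moves the average away from the other nine.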

On the other hand, the median of this data set can be calculated by arranging the values in order and finding the middle value. As there are 10 observations in this data set, the median will be the average of the fifth and sixth values. So, the median salary in this data set is:

Median = ($70,000 + $75,000)/2 = $72,500

As we can see, the median salary is barely affected by the outlier value of $10,000,000: it would be the same $72,500 if the highest salary were $95,000 instead. The median is therefore a more robust measure of central tendency in the presence of outliers.
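
The same with-and-without comparison can be made for the median. The sketch below again uses the standard statistics module. The $95,000 replacement salary is a hypothetical value introduced only for this comparison.

```python
from statistics import median

# The ten salaries from the example; the last one is the outlier.
salaries = [50_000, 55_000, 60_000, 65_000, 70_000,
            75_000, 80_000, 85_000, 90_000, 10_000_000]

median_all = median(salaries)            # average of the 5th and 6th values
median_without = median(salaries[:-1])   # nine values: the 5th is the middle

# Hypothetical: replace the outlier with an unremarkable top salary.
ordinary_top = salaries[:-1] + [95_000]
median_ordinary = median(ordinary_top)

print(f"Median with the outlier:        ${median_all:,.0f}")
print(f"Median without the outlier:     ${median_without:,.0f}")
print(f"Median with $95,000 at the top: ${median_ordinary:,.0f}")
```

Dropping the outlier changes the number of observations, so the median moves only from $72,500 to $70,000. Replacing the outlier with an ordinary value leaves the median at $72,500.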

The median is resistant to outliers because it depends only on the middle value(s) of the ordered data set. Making the largest value even larger does not change which values sit in the middle, so the median stays the same. The mean, on the other hand, takes every value in the data set into account and gives each one equal weight. A value that lies far from the rest therefore shifts the sum, and with it the average, in proportion to how far away it is.
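
This reasoning can be checked numerically by keeping the nine ordinary salaries fixed and making the tenth one progressively more extreme. The sketch below is a small experiment under that assumption. The list of top salaries to try is made up for illustration.

```python
from statistics import mean, median

# The nine ordinary salaries from the example.
base = [50_000, 55_000, 60_000, 65_000, 70_000,
        75_000, 80_000, 85_000, 90_000]

# Hypothetical values for the tenth salary, from ordinary to extreme.
top_salaries = [95_000, 1_000_000, 10_000_000, 1_000_000_000]

results = []
for top in top_salaries:
    data = base + [top]
    results.append((top, mean(data), median(data)))

for top, avg, mid in results:
    print(f"top = {top:>13,}   mean = {avg:>13,.0f}   median = {mid:>7,.0f}")
```

In every row the two middle values are still $70,000 and $75,000, so the median column does not change, while the mean column grows with the size of the top salary.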

In conclusion, the mean and the median are both essential measures of central tendency, but they behave differently in the presence of outliers. The median is a more robust measure of central tendency in the presence of outliers.