

Truncated mean


From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Truncated_mean 

A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It is the mean computed after discarding given parts of a probability distribution or sample at the high and low ends, typically an equal amount from each. The number of points to discard is usually given as a percentage of the total number of points, but may also be given as a fixed number of points.

For most statistical applications, 5 to 25 percent of each end is discarded; the 25% trimmed mean (when the lowest 25% and the highest 25% are discarded) is known as the interquartile mean. For example, given a set of 8 points, trimming by 12.5% would discard the smallest and largest values in the sample and compute the mean of the remaining 6 points.
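The 8-point example above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `trimmed_mean`, the sample values, and the choice to round the number of discarded points down are all assumptions made here.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) points from each end.

    Assumption: a fractional number of points is rounded down, so no
    interpolation is performed.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))   # points discarded at each end
    kept = data[k:len(data) - k]      # middle part of the sorted sample
    return sum(kept) / len(kept)


sample = [3, 7, 8, 9, 10, 11, 12, 40]   # hypothetical 8-point sample
plain = sum(sample) / len(sample)        # ordinary mean, pulled up by 40
trimmed = trimmed_mean(sample, 0.125)    # drops 3 and 40, averages the other 6
print(plain, trimmed)
```

Where SciPy is available, `scipy.stats.trim_mean(a, proportiontocut)` performs a comparable computation and is likely the better choice in practice.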

The median can be regarded as a fully truncated mean and is the most robust. As with other trimmed estimators, the main advantage of the trimmed mean is robustness and higher efficiency for mixed distributions and heavy-tailed distributions (like the Cauchy distribution), at the cost of lower efficiency for some less heavy-tailed distributions (such as the normal distribution). For intermediate distributions the difference in efficiency between the mean and the median is small; e.g., for the Student's t-distribution with 2 degrees of freedom, the variances of the mean and median are nearly equal.

 

 

Terminology

In some regions of Central Europe it is also known as a Windsor mean,[citation needed] but this name should not be confused with the Winsorized mean: in the latter, the observations that the trimmed mean would discard are instead replaced by the largest/smallest of the remaining values.

Discarding only the maximum and minimum is known as the modified mean, particularly in management statistics.[1] This is also known as the Olympic average (for example in US agriculture, like the Average Crop Revenue Election), due to its use in Olympic events, such as the ISU Judging System in figure skating, to make the score robust to a single outlier judge.[2]
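The difference between the trimmed mean and the Winsorized mean described above can be made concrete with a short sketch. The helper names `trim_k` and `winsorize_k` and the judge scores are hypothetical, chosen only for illustration; with `k = 1` the trimmed version corresponds to the modified mean (Olympic average).

```python
def trim_k(values, k):
    """Trimmed mean: discard the k smallest and k largest values."""
    data = sorted(values)
    if 2 * k >= len(data):
        raise ValueError("k is too large for this sample")
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def winsorize_k(values, k):
    """Winsorized mean: replace the k smallest and k largest values by the
    nearest remaining value, then average all n points."""
    data = sorted(values)
    n = len(data)
    if 2 * k >= n:
        raise ValueError("k is too large for this sample")
    low, high = data[k], data[n - k - 1]
    clamped = [min(max(x, low), high) for x in data]
    return sum(clamped) / n


scores = [5.0, 5.5, 5.5, 6.0, 9.5]   # hypothetical judge scores, one outlier
olympic = trim_k(scores, 1)          # mean of 5.5, 5.5, 6.0
winsorized = winsorize_k(scores, 1)  # mean of 5.5, 5.5, 5.5, 6.0, 6.0
print(olympic, winsorized)
```

Both estimators remove the influence of the outlying 9.5, but the Winsorized mean still divides by the full sample size.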

Interpolation

When the percentage of points to discard does not yield a whole number, the trimmed mean may be defined by interpolation, generally linear interpolation, between the nearest whole numbers. For example, the 15% trimmed mean of a sample containing 10 entries would, taken strictly, discard 1 point from each end (equivalent to the 10% trimmed mean). If interpolating, one would instead compute the 10% trimmed mean (discarding 1 point from each end) and the 20% trimmed mean (discarding 2 points from each end), and then interpolate between them, in this case by averaging the two values. Similarly, to interpolate the 12% trimmed mean, one would take the weighted average: weight the 10% trimmed mean by 0.8 and the 20% trimmed mean by 0.2.
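The interpolation rule described above can be sketched as follows. The function name `interpolated_trimmed_mean` and the 10-entry sample are assumptions for illustration; the weights follow the linear interpolation given in the text.

```python
import math


def interpolated_trimmed_mean(values, proportion):
    """Trimmed mean with linear interpolation between the two nearest
    whole numbers of discarded points (per end)."""
    data = sorted(values)
    n = len(data)
    exact = proportion * n        # e.g. 1.5 points for 15% of 10 entries
    k_low = math.floor(exact)
    frac = exact - k_low          # weight given to the heavier trim

    def mean_dropping(k):
        if 2 * k >= n:
            raise ValueError("proportion is too large for this sample")
        kept = data[k:n - k]
        return sum(kept) / len(kept)

    lighter = mean_dropping(k_low)
    if frac == 0:
        return lighter
    heavier = mean_dropping(k_low + 1)
    return (1 - frac) * lighter + frac * heavier


entries = [2, 4, 5, 6, 7, 8, 9, 10, 12, 50]        # hypothetical sample, n = 10
t10 = interpolated_trimmed_mean(entries, 0.10)     # drops 1 point per end
t20 = interpolated_trimmed_mean(entries, 0.20)     # drops 2 points per end
t15 = interpolated_trimmed_mean(entries, 0.15)     # average of t10 and t20
t12 = interpolated_trimmed_mean(entries, 0.12)     # 0.8 * t10 + 0.2 * t20
print(t10, t20, t15, t12)
```

Because `proportion * n` is computed in floating point, the fractional weight can differ from the ideal value by a rounding error; for most uses this is negligible.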

Advantages

The truncated mean is a useful estimator because it is less sensitive to outliers than the mean but will still give a reasonable estimate of central tendency or mean for many statistical models. In this regard it is referred to as a robust estimator. For example, in its use in Olympic judging, truncating the maximum and minimum prevents a single judge from increasing or lowering the overall score by giving an exceptionally high or low score.

One situation in which it can be advantageous to use a truncated mean is when estimating the location parameter of a Cauchy distribution, a bell-shaped probability distribution with (much) fatter tails than a normal distribution. It can be shown that the truncated mean of the middle 24% of the sample order statistics (i.e., truncating the sample by 38% at each end) produces an estimate of the population location parameter that is more efficient than either the sample median or the full sample mean.[3][4] However, due to the fat tails of the Cauchy distribution, the efficiency of the estimator decreases as more of the sample is used in the estimate.[3][4] Note that for the Cauchy distribution, none of the truncated mean, the full sample mean, or the sample median is a maximum likelihood estimator, nor is any of them as asymptotically efficient as the maximum likelihood estimator; however, the maximum likelihood estimate is more difficult to compute, leaving the truncated mean as a useful alternative.[4][5]
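A small Monte Carlo experiment can make the Cauchy comparison tangible. The sketch below is an illustration under stated assumptions, not a reproduction of the cited papers: the sample size, number of replications, seed, and helper names are all chosen here, and the number of trimmed points is simply rounded down.

```python
import math
import random
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) points from each end."""
    data = sorted(values)
    k = int(proportion * len(data))
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def cauchy_sample(rng, n):
    """Draw n standard Cauchy values (location 0) by inverse-CDF sampling."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def simulate(n=101, reps=2000, seed=0):
    """Mean squared error of three location estimators around the true value 0."""
    rng = random.Random(seed)
    squared_errors = {"mean": 0.0, "median": 0.0, "trimmed 38%": 0.0}
    for _ in range(reps):
        x = cauchy_sample(rng, n)
        squared_errors["mean"] += (sum(x) / n) ** 2
        squared_errors["median"] += statistics.median(x) ** 2
        squared_errors["trimmed 38%"] += trimmed_mean(x, 0.38) ** 2
    return {name: total / reps for name, total in squared_errors.items()}


results = simulate()
for name, mse in results.items():
    print(f"{name:12s} MSE = {mse:.4f}")
```

The full sample mean should show a far larger error than the other two estimators. The gap between the median and the 38% trimmed mean is small, so a single run of this size may not separate them reliably.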

Drawbacks

The truncated mean uses more information from the distribution or sample than the median, but unless the underlying distribution is symmetric, the truncated mean of a sample is unlikely to be an unbiased estimator of either the mean or the median.

Examples

The scoring method used in many sports that are evaluated by a panel of judges is a truncated mean: discard the lowest and the highest scores; calculate the mean value of the remaining scores.[6]

The Libor benchmark interest rate is calculated as a trimmed mean: given 18 responses, the top 4 and bottom 4 are discarded, and the remaining 10 are averaged (yielding a trim factor of 4/18 ≈ 22%).[7]


Consider the data set consisting of:

 

$$\{92, 19, \mathbf{101}, 58, \mathbf{1053}, 91, 26, 78, 10, 13, \mathbf{-40}, \mathbf{101}, 86, 85, 15, 89, 89, 28, \mathbf{-5}, 41\} \qquad (N = 20,\ \text{mean} = 101.5)$$

 

The 5th percentile (−6.75) lies between −40 and −5, while the 95th percentile (148.6) lies between 101 and 1053 (values shown in bold). A 5% trimmed mean therefore discards −40 and 1053, leaving the following:

 

$$\{92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, -5, 41\} \qquad (N = 18,\ \text{mean} = 56.5)$$

 

This example can be compared with the one using the Winsorising procedure.
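The figures in the worked example above can be checked with a short script. The source does not say which percentile definition it uses; the linear-interpolation convention assumed here (position p·(N − 1) in the sorted sample) happens to reproduce the quoted values. The helper name `percentile` is hypothetical.

```python
data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]


def percentile(sorted_data, p):
    """Percentile by linear interpolation at position p * (n - 1).

    Assumption: this is one of several common percentile conventions.
    """
    pos = p * (len(sorted_data) - 1)
    lower = int(pos)
    frac = pos - lower
    if lower + 1 >= len(sorted_data):
        return sorted_data[-1]
    step = sorted_data[lower + 1] - sorted_data[lower]
    return sorted_data[lower] + frac * step


ordered = sorted(data)
p05 = percentile(ordered, 0.05)    # falls between -40 and -5
p95 = percentile(ordered, 0.95)    # falls between 101 and 1053

full_mean = sum(data) / len(data)
kept = [x for x in data if p05 <= x <= p95]   # drops -40 and 1053
trimmed = sum(kept) / len(kept)

print(p05, p95, full_mean, len(kept), trimmed)
```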

References

  1. Arulmozhi, G.; Statistics For Management, 2nd Edition, Tata McGraw-Hill Education, 2009, p. 458
  2. Paul E. Peterson (August 3, 2012). "Lessons from LIBOR". Once the quotes are compiled, LIBOR uses a trimmed mean process, in which the highest and lowest values are thrown out and the remaining values are averaged. This is sometimes called an "Olympic average" from its use in the Olympics to eliminate the impact of a biased judge on an athlete's final score.
  3. Rothenberg, Thomas J.; Fisher, Franklin, M.; Tilanus, C.B. (1964). "A note on estimation from a Cauchy sample". Journal of the American Statistical Association. 59 (306): 460–463. doi:10.1080/01621459.1964.10482170.
  4. Bloch, Daniel (1966). "A note on the estimation of the location parameters of the Cauchy distribution". Journal of the American Statistical Association. 61 (316): 852–855. doi:10.1080/01621459.1966.10480912. JSTOR 2282794.
  5. Ferguson, Thomas S. (1978). "Maximum Likelihood Estimates of the Parameters of the Cauchy Distribution for Samples of Size 3 and 4". Journal of the American Statistical Association. 73 (361): 211. doi:10.1080/01621459.1978.10480031. JSTOR 2286549.
  6. Bialik, Carl (27 July 2012). "Removing Judges' Bias Is Olympic-Size Challenge". The Wall Street Journal. Retrieved 7 September 2014.
  7. "bbalibor: The Basics". The British Bankers' Association.

 
