When might the reported median of data be more appropriate than the mean of data? What is your biggest challenge in deciding which is more appropriate?

What is the purpose of finding the mode of data? Provide an example of when the mode is more appropriate than the mean or median of data.

Response 1

One reason we would want to use the median over the mean is if the data are heavily skewed toward the low end or the high end. For example, suppose the engineers at a company all make between $50k and $60k, but a couple of them make over $100k. Those high salaries would pull the mean up, so the average engineer at the company would appear to earn far more than most of them actually make. The median would be lower and give a better picture of the salary a typical engineer at the company makes. The biggest challenge in deciding to use the median over the mean is deciding whether the two values differ enough to declare the median more accurate. I think it is only appropriate to use the median if there are large skews on the lower or higher end of the data that may affect the integrity of the information.

 

Response 2

Finding the mode identifies the value that occurs most often in the data. If more than one value ties for the highest number of occurrences, then the data have multiple modes. One reason you would want to know the mode is if the data are nominal and a number only stands for a category. In that case the mean and the median are not meaningful, but the mode is the information needed. An example would be testing soil for toxins. Sample sites are set up across an area to decide whether the soil is usable. Red, which is toxic, is coded as 1; yellow, which is in between, as 2; and green, which is healthy, as 3. After taking 10 samples, red appears 6 times, yellow 2 times, and green 2 times. Since the mode is red (code 1), which appears 6 times, the soil is not usable. The mean and median would have no significance here.
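A minimal Python sketch of the soil example, assuming the ten samples are stored as color labels (the codes 1, 2, 3 would behave the same way). It shows that the mode is the category red, while 6 is only how often red occurs.

```python
from collections import Counter
import statistics

# Ten soil samples from the example: red 6 times, yellow 2 times, green 2 times.
samples = ["red"] * 6 + ["yellow"] * 2 + ["green"] * 2

counts = Counter(samples)          # frequency of each category
mode = statistics.mode(samples)    # most common category

print(counts.most_common())  # [('red', 6), ('yellow', 2), ('green', 2)]
print(mode)                  # red
print(counts[mode])          # 6, the frequency of the mode, not the mode itself
```

Because the data are nominal, only counting makes sense here; averaging the codes 1, 2, 3 would produce a number with no meaning.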

Response 3

The purpose of finding the mode of data is to find the value that is most common. For example, take the number of points scored in a series of football games (7, 13, 18, 24, 9, 3, 18). Ordering the scores from least to greatest gives (3, 7, 9, 13, 18, 18, 24), and the score that occurs most often is 18. In general, the mode is not used very often. It is not a measure of the center of the data in the same way that the median and mean are. However, the mode is the only measure of center appropriate for nominal data. For example, if we were looking at the most frequently purchased food item at a certain snack bar in 1999, it does not make sense to talk about the median food or the mean food, but it does make sense to say that the most frequently purchased food was mini chocolate bars. Sometimes a set of data has more than one mode; for example, the most popular food items purchased might have been both lamingtons and mini chocolate bars.
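A short Python sketch of the same idea using the standard `statistics` module. The football scores come from the example above; the snack bar purchase list is made up purely to illustrate a tie between two modes.

```python
import statistics

# Points scored in the series of football games from the example.
scores = [7, 13, 18, 24, 9, 3, 18]

print(sorted(scores))           # [3, 7, 9, 13, 18, 18, 24]
print(statistics.mode(scores))  # 18

# multimode() (Python 3.8+) returns every value tied for most frequent,
# which covers data sets with more than one mode.
# Hypothetical snack bar purchases, invented for illustration only:
purchases = ["mini chocolate bar", "lamington", "mini chocolate bar",
             "lamington", "crisps"]
print(statistics.multimode(purchases))  # ['mini chocolate bar', 'lamington']
```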

 

Response 4

One example of when the median is more appropriate than the mean is a football player's stats. For this example I will use a quarterback. Over seven seasons, a quarterback threw for 3,596 yds, 3,348 yds, 105 yds, 3,624 yds, 4,060 yds, 4,128 yds, and 396 yds. Over all seven years, the quarterback threw for an average of 2,751 yds per season. This may not be a true picture of how good the quarterback is, because in one season he threw for only 105 yds and in another only 396 yds. He may have been injured in those seasons, and these numbers skew the overall picture of how good the quarterback is. The median may show his potential more accurately: the median yards thrown across all seven seasons is 3,596. You can apply this to any sports stats you can think of. Seasons cut short by injuries or other factors produce low numbers and lower the mean.
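The quarterback numbers can be checked with Python's `statistics` module. This sketch also shows, for comparison, what happens if the two short seasons are thrown out; the 1,000-yard cutoff used to drop them is an arbitrary choice for illustration, not a standard rule.

```python
import statistics

# Passing yards for the seven seasons in the example.
yards = [3596, 3348, 105, 3624, 4060, 4128, 396]

mean_all = statistics.mean(yards)      # 2751, pulled down by the two short seasons
median_all = statistics.median(yards)  # 3596, the middle of the sorted values

# For comparison only: drop the two short seasons (105 and 396 yds),
# the "throw out the outliers" option. The 1000-yd cutoff is arbitrary.
full_seasons = [y for y in yards if y >= 1000]
mean_full = statistics.mean(full_seasons)      # 3751.2
median_full = statistics.median(full_seasons)  # 3624

print(mean_all, median_all)
print(mean_full, median_full)
```

The median barely moves when the short seasons are removed (3,596 to 3,624), while the mean jumps by about 1,000 yards, which is why the median is the more stable summary here.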

 

One of the major problems that statisticians face is whether to keep the outliers in an analysis or throw them out. If you worked as a statistician for this football team and you were rating this player based on his yards, would you include the outliers in your analysis/calculation or would you opt to throw them out? Explain.

 

Response 1

As a statistician for this football team, if I were rating this player based on his yards, I would be inclined to include the outliers in my analysis. I understand that some of the stats may have been compiled from a shortened season (due to injury, being cut from the team, or being benched for a more talented player). However, all of these factors should be taken into account when judging a player's talent level. If someone got injured, or was cut from a team due to an arrest off the field, scouts and team owners would want to know about it. It may not be fair, but it is correct.

 

Response 2

Sports statistics are a good example of when the median is used over the mean to analyze a player. Sometimes athletes have extremely good or bad games, producing skewed statistics that make the mean suggest the player is better or worse than they really are. The median can give a better picture of what the player would normally do in a typical game, so someone like a scout could use it to determine whether that player is good enough for a team.

 

Another question that we will be discussing this week is calculating the variance and standard deviation. When is it more appropriate to use the standard deviation of data rather than the variance of data? Is one a better measure of dispersion than the other? Explain.

 

Response 1

It is more appropriate to use the standard deviation of data rather than the variance when you want to compare individual results with the mean of the data, because the standard deviation is in the same units as the data. For example, if you took the heights of different breeds of dogs, it would be useful to look at the standard deviation. You can calculate the variance and standard deviation and then compare the heights of the breeds to see which fall within one standard deviation of the mean. The standard deviation gives us a “standard” way to tell what is normal and which breeds are extra large or extra small.
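A minimal Python sketch of the dog-height comparison. The breed heights below are invented for illustration, not real breed data; the code measures each breed's distance from the mean in standard deviations and flags anything more than one standard deviation away.

```python
import statistics

# Hypothetical shoulder heights in inches; made-up numbers for illustration.
heights = {
    "Chihuahua": 7,
    "Beagle": 15,
    "Border Collie": 21,
    "Labrador": 23,
    "Boxer": 24,
    "Great Dane": 32,
}

values = list(heights.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation, in inches like the data

labels = {}
for breed, h in heights.items():
    z = (h - mean) / sd  # distance from the mean, in standard deviations
    if abs(z) <= 1:
        labels[breed] = "within one SD"
    elif z > 0:
        labels[breed] = "extra large"
    else:
        labels[breed] = "extra small"
    print(f"{breed}: {h} in, z = {z:+.2f}, {labels[breed]}")
```

Doing the same comparison with the variance would not work directly, because the variance is in square inches while the heights are in inches.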

 

Response 2

In response to the teacher's question of when it would be more appropriate to use the standard deviation of data rather than the variance: after reading and learning more about each, I determined that when we report the mean, it is more appropriate to report the standard deviation, because it is expressed in the same units as the data. Moreover, it is easier for the reader to work out confidence intervals if the standard deviation is provided rather than the variance. However, you may consider reporting the variance if you are interested in comparing variance and bias, or in giving “different variance components”, since the total variance is the sum of the within-group (intra) and between-group (inter) variances, while the standard deviations do not add up this way.
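A small Python check of the claim that variances add while standard deviations do not. The two groups are made-up numbers; they have equal sizes so that the within-group part is a plain average of the group variances, and population variances (dividing by N) are used so the split is exact. This decomposition is the standard law of total variance.

```python
import math
import statistics

# Two hypothetical groups of equal size (values invented for illustration).
group_a = [1, 2, 3]
group_b = [7, 8, 9]
combined = group_a + group_b

# Population variances (divide by N) so the decomposition is exact.
total_var = statistics.pvariance(combined)
# Within-group part: average of the group variances (valid for equal group sizes).
within_var = statistics.mean([statistics.pvariance(group_a),
                              statistics.pvariance(group_b)])
# Between-group part: variance of the group means.
between_var = statistics.pvariance([statistics.mean(group_a),
                                    statistics.mean(group_b)])

# Variances add up: total = within + between.
print(math.isclose(total_var, within_var + between_var))  # True

# Standard deviations do not add up the same way.
print(math.isclose(math.sqrt(total_var),
                   math.sqrt(within_var) + math.sqrt(between_var)))  # False
```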

 

Response 3

The standard deviation is best described as a measure of how spread out the numbers are, whereas the variance is the average of the squared differences from the mean; the standard deviation is the square root of the variance. The difference that matters here is the source of your data. If the data come from a whole population, we divide the sum of squared differences by N, but if you are using a sample from a population, we divide by n - 1 instead. This, in essence, corrects for not having the data from the whole population, and it applies to both the variance and the standard deviation.
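A minimal Python sketch of the population and sample formulas, using a small made-up data set. It computes both by hand and compares them with the standard library's `pvariance` (divide by N) and `variance` (divide by n - 1).

```python
import math
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical values

n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared differences from the mean

pop_var = ss / n         # whole population: divide by N
samp_var = ss / (n - 1)  # sample: divide by n - 1 (Bessel's correction)

# The standard library agrees, and each SD is the square root of its variance.
print(math.isclose(pop_var, statistics.pvariance(data)))          # True
print(math.isclose(samp_var, statistics.variance(data)))          # True
print(math.isclose(math.sqrt(samp_var), statistics.stdev(data)))  # True

print(pop_var, samp_var)  # 2.0 2.5
```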

 

Response 4

There are certain times to use the mean and certain times to use the median. The median represents the center of the data regardless of the shape of the distribution. The mean is only representative if the distribution of the data is roughly symmetric; otherwise it may be heavily influenced by outlying measurements. The median is resistant to change from a few extreme values. For example, a distribution center that produces boxes may make 5,000 boxes per day, and that is typically what its output is set at; that typical day is where the median falls. The mean, on the other hand, is sensitive to change. An example of where this matters is the number of customers a restaurant brings in during the week: Monday may be your slowest day with only 10 people, whereas on Friday you may bring in close to 60 people. A change in the number of customers on a single day would change the mean. So there are different situations where the mean or the median should be used; it all depends on the data being used.

 

Response 5

A distribution center is a good example. Another good example that I know, along the same lines, is a company that stocks candy at gas stations. These companies need to know which candy sells the most and also what time of year people want specific candy. For example, at this time of year, around Easter, people want chocolate bunnies and Peeps. Geographical location also matters when deciding what kind of candy to stock up on. For example, in the Aspen area of Colorado, you need more candy on hand in winter than in the summer months. A station may sell an average of one box of a certain candy every day throughout the year, but only 1/4 box per day in the summer months. Knowing the median is better than knowing the average, just to make sure they don’t run out.

 

Response 6

Finding the mode is very important for certain kinds of data. The mode is the value that appears most often in a set of data. Like the mean and median, the mode is a way of expressing, in a single number, important information about a random variable or a population. In a normal distribution, the mode has the same numerical value as the mean and median, but in highly skewed distributions it may be very different. A good example of when to use the mode is when you are trying to find the value that shows up most often. Suppose that in a lab you are testing colors such as blue, red, and green: blue shows up 4 times, red shows up 7 times, and green shows up 10 times. The mode is green, since it shows up the most times.

 

How meaningful is the range of a data set? If you had to choose a way to analyze the variation in the data, would you choose the range or the standard deviation? Why?

Response 1

Range is the difference between the largest value and the smallest value in the data set. While simple to compute, the range is often unreliable as a measure of dispersion since it is based on only two values in the set. For example, for the data 12, 25, 27, 29, 36, 38, 40, 43, 50, 54, 62, the range is 62 - 12 = 50. A range of 50 tells us very little about how the values are dispersed. Mean absolute deviation, variance, and standard deviation are ways to describe the differences between the values in the data set and the mean without worrying about the signs of those differences.
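A short Python sketch computing the range alongside the three measures named above for the same data set. It assumes the data are a sample, so `statistics.variance` and `statistics.stdev` (divide by n - 1) are used; swap in `pvariance` and `pstdev` for a whole population.

```python
import statistics

# Data set from the example above.
data = [12, 25, 27, 29, 36, 38, 40, 43, 50, 54, 62]

data_range = max(data) - min(data)  # 62 - 12 = 50, based on only two values

mean = statistics.mean(data)
# Mean absolute deviation: average distance from the mean, signs ignored.
mad = sum(abs(x - mean) for x in data) / len(data)
variance = statistics.variance(data)  # sample variance, in squared units
sd = statistics.stdev(data)           # sample standard deviation, in original units

print(f"range={data_range}, MAD={mad:.2f}, variance={variance:.2f}, SD={sd:.2f}")
```

Unlike the range, the last three values change if any one of the eleven values changes, which is why they say more about how the data are dispersed.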

 

The range is the difference between the highest and lowest scores in a data set and is the simplest measure of spread.  Even though using the range as a measure of spread is limited, it does set the boundaries of the scores.  This can be useful if you are measuring a variable that has either a critical low or high threshold that should not be crossed.  The range will instantly inform you whether at least one value broke these critical thresholds.  Also, the range can be used to detect any errors when entering data.  For example, if in your study you recorded the age of school children and your range is 7 to 123 years old, you would know that you made a mistake.
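A minimal Python sketch of the two practical uses of the range described above: checking thresholds and catching data-entry errors. The list of ages and the 4 to 19 bounds are hypothetical; only the 7 and 123 come from the example.

```python
# Hypothetical recorded ages of schoolchildren; 123 is a data-entry error.
ages = [7, 9, 8, 11, 123, 10, 12]

low, high = min(ages), max(ages)
print(f"range: {low} to {high} (spread {high - low})")

# Assumed plausible bounds for schoolchildren; choose limits that suit your study.
MIN_AGE, MAX_AGE = 4, 19

# The two endpoints of the range are enough to tell whether any value
# crossed a threshold; the list below just identifies which entries did.
suspicious = [a for a in ages if not MIN_AGE <= a <= MAX_AGE]
if low < MIN_AGE or high > MAX_AGE:
    print("check these entries:", suspicious)
```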

 

Response 2

The range of a data set is the difference between the highest and lowest scores and is the simplest measure of spread. For example, suppose you have a set of test scores: 56, 98, 95, 75, and 40. The range would be 98 minus 40, which comes to 58. Sometimes the range is a good choice, but in other situations it can be a poor one. If the data contain an unusually high or low value, the range can throw your answers off and make the spread look unrealistic. So you have to watch where you are using the range.

 

 

Response 3

A range is generic: it only measures the highest and lowest scores, and this could be misleading in that an individual may assume they will get either the lowest or the highest score. The standard deviation takes all scores into account, so it is more accurate because each number counts. For instance, I have seen a credit card application advising that applicants with credit scores from 560 to 800 are likely to be approved; however, there are other factors that must be considered. It would be better to provide the sample standard deviation of the credit scores of applicants who have been approved, using their scores to offer a proper range, if that breakdown were necessary.

The importance and usefulness of looking at statistics about your data set is that you can determine whether or not the results are really representative of the data. The range tells you how spread apart your data are, so we would need to look at the range. Let’s say our range is 150. This means that the difference between the child with the most sets and the one with the fewest is 150. Another thing we could look at is the standard deviation. It summarizes how far our data typically are from the mean. If we have a large standard deviation, our data are farther from the mean; if we have a small standard deviation, our data are closer to the mean.

 

Response 4

Standard deviation is more specific. The range only uses the lowest and highest numbers of a set, and this can be misleading if there are outliers present. A standard deviation can fall within the range but is more reliable because it factors in the mean. I would trust a standard deviation more than a range. The range is too broad.

 

Response 5

A good example of when to report the median over the mean would be national home prices. Given the diverse areas, income levels, available housing, market demand, and other factors, the mean of the data would not be as representative as the median, because the median would “find the middle” value. For example, you could have two homes, one valued at $1M and one valued at $100,000. These would obviously be homes in different areas and markets, and of disparate quality. Their average would be $550,000; however, depending upon the number of homes at the higher or lower end of the market, this may not reflect the true middle. Looking at the complete population and finding the median would give a more representative value for the population. This is why, in news reports, most reporting agencies use “median home price” when discussing the current market.

 

A good example of where the mode might be a better approach than the median or mean is tallying the reasons why students decide to drop out of college and how often each reason occurs. You may find that financial hardship or difficulty managing time comes up most frequently, whereas if you simply took an average (mean) or a median of the information, the conclusions would tell us less about WHY the students actually dropped out. We could know how many drop out on average, but we could draw a more specific conclusion by looking deeper into the data to understand the cause, effect, and frequency.

 

Response 6

The students have provided great examples of when to use the median as opposed to the mean. The mean has one major disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set because they are especially small or large. When the data are perfectly normally distributed, the mean, median, and mode are identical. However, this is rarely the case. Whenever outliers cause the data to be skewed, the median is a better measure of center than the mean.

 

Response 7

StatCrunch can do many of the calculations needed for this week. StatCrunch is a statistical calculator that can compute the Measures of Variation and the Measures of Center for you. All you need to do is log in; the StatCrunch link is on the right of the screen. Here is what you want to do:

 

To use StatCrunch to find the Mean, Median, Mode, Range, Variance and Standard Deviation, you want to follow these steps:

 

•  Log into MSL and click the StatCrunch button on the right of the screen.

•  Enter the data vertically under var1.

•  Select Stat > Summary Stats > Columns.

•  Under Select column(s), click var1 to transfer it to the adjacent box. This tells StatCrunch to perform the calculations on this variable.

•  Select Mean, Median, Mode, Range, Variance and Standard Deviation. You can select only these values by holding CTRL and clicking the statistics you want.

•  Finally, click Compute, and your statistics will appear in a new dialog box.
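If StatCrunch is not available, the same six statistics can be computed with Python's standard `statistics` module. This is a minimal sketch, not part of the course tools; the data are the football scores from an earlier response, and it assumes the sample (n - 1) versions of the variance and standard deviation, which is the usual choice for summary statistics of a sample.

```python
import statistics

# Example data: the football scores from an earlier response.
var1 = [7, 13, 18, 24, 9, 3, 18]

summary = {
    "Mean": statistics.mean(var1),
    "Median": statistics.median(var1),
    "Mode": statistics.multimode(var1),     # a list, in case of ties
    "Range": max(var1) - min(var1),
    "Variance": statistics.variance(var1),  # sample variance (n - 1)
    "Std. dev.": statistics.stdev(var1),    # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:10s} {value}")
```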

 

Response 8

The mean of a data sample is what most of us think of as the average. It is the sum of all the numbers in the sample divided by the number of values in the data set. The median, on the other hand, is the middle value when the data are arranged in order.

 

Because of how the mean is figured, high occurrences of values at the extremes of the spectrum can skew the mean of the data one way or the other. In these cases, it may be more appropriate to utilize the median value.

 

For example, looking at test scores in a class, if a few people scored very low on a test, the class mean would be pulled down. The median score, however, might better reflect what a typical student scored on the test.