In this post I am going to talk briefly about outliers and the effect they might have on your data, with an example of course. Let's start by defining the word "outlier":

*What is an outlier in math/statistics?*

An outlier is basically a number (or data point) in a set of data that is either much smaller or much bigger than most of the other data points.

Let's go through a practical example in order to understand **the implications of having an outlier within your data set**.

Say we have a sample data set like the following:

For this data set I can easily calculate the mean, which is 4.3.

I can also find the median which represents the middle value of the distribution. In our case, since there are two middle values I can average them and get a median of 4.5.

And I can also figure out the mode, which is 5, since this is the most frequent value in the distribution.

Finally, let's calculate the standard deviation, which shows how much my data are spread out around the mean (remember that the standard deviation is the square root of the variance).
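Written out, the relationship between standard deviation and variance looks roughly like this. This is a sketch using the population form, where $x_1, \dots, x_n$ are the data points and $\mu$ is their mean (symbols introduced here for illustration, not taken from the post); the sample version divides by $n - 1$ instead of $n$.

$$
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2}
$$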

Cool, we now know the mean, median, mode and standard deviation for our sample data set.
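If you want to reproduce these four numbers yourself, here is a minimal Python sketch. The values in `data` are an assumption: a hypothetical ten-point data set picked only because it matches the mean (4.3), median (4.5) and mode (5) quoted above. The choice of the sample standard deviation is also an assumption, since the post does not say which version it used.

```python
import statistics

# Hypothetical data set (an assumption, not the post's original values),
# chosen so that mean = 4.3, median = 4.5 and mode = 5.
data = [2, 3, 3, 4, 4, 5, 5, 5, 6, 6]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # average of the two middle values (even count)
mode = statistics.mode(data)      # most frequent value

# Sample standard deviation (divides by n - 1).
# statistics.pstdev would give the population version (divides by n).
std_dev = statistics.stdev(data)

print(f"mean={mean}, median={median}, mode={mode}, std_dev={std_dev:.2f}")
```

Python's built-in `statistics` module is enough for a small example like this, so no third-party library is needed.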

All right, let's now make a change to our data set. Imagine removing the last data point, 6, and replacing it with a much bigger value like 600... yep, **an outlier**.

See now what happens when we calculate the mean, median, mode and standard deviation again.

**The new mean is much higher**, 63.7! As expected, the standard deviation is much higher too. On the other hand, **median and mode remain exactly the same**.
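The before/after comparison can be sketched in a few lines of Python. As an assumption, `original` is a hypothetical ten-point data set consistent with the figures in this post (mean 4.3, median 4.5, mode 5, last value 6), and `summarize` is a helper name made up for this example.

```python
import statistics

def summarize(values):
    """Return mean, median, mode and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),
    }

# Hypothetical data set (an assumption, not the post's original values).
original = [2, 3, 3, 4, 4, 5, 5, 5, 6, 6]

# Replace the last data point (6) with the outlier 600.
with_outlier = original[:-1] + [600]

before = summarize(original)
after = summarize(with_outlier)

# Print each statistic side by side: without outlier -> with outlier.
for name in before:
    print(f"{name:>7}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

The mean moves because every value contributes to the sum, so one extreme value drags it along. The median and mode only depend on which values sit in the middle or occur most often, and neither changes here.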

So, this is what happens if you have outliers.

**Outliers skew the data when you are trying to do any type of average**.

*What can you do then if you need to get a measure of central tendency?*

How to deal with outliers really depends on each specific situation. What is sure, anyway, is that most statistical measures, like means, standard deviations, correlations, etc., can be strongly influenced by outliers, and you might end up with an incorrect analysis. Generally you can follow two different strategies:

- **Remove the outliers** and analyse your data set without them. In this case the mean is no longer affected, and you can use it as a measure of central tendency.
- **Do not use the mean**. In this case you keep the outliers, but since they would change the mean a lot, you might instead use other measures of central tendency like the median or the mode.
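Both strategies can be sketched in Python. Note the assumptions: the post does not say how to decide which points count as outliers, so this example swaps in the common 1.5 × IQR rule of thumb as one possible criterion. The function name `drop_outliers_iqr` and the values in `data` are hypothetical too.

```python
import statistics

def drop_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the quartiles.

    The 1.5 * IQR fence is a common rule of thumb, used here as an
    assumption; it is not a method prescribed by the post.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical data set containing the outlier 600.
data = [2, 3, 3, 4, 4, 5, 5, 5, 6, 600]

# Strategy 1: remove the outliers, then use the mean.
kept = drop_outliers_iqr(data)
mean_without_outliers = statistics.mean(kept)

# Strategy 2: keep the outliers, but use the median instead of the mean.
median_with_outliers = statistics.median(data)

print("kept:", kept)
print("mean without outliers:", round(mean_without_outliers, 2))
print("median with outliers:", median_with_outliers)
```

Whatever rule you pick for flagging outliers, it is worth writing it down, because a different threshold can keep or drop different points.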

Either way, I think it's important to **report in your analysis** that you identified outliers and what you decided to do with them. *Why did you drop them? Why did those values end up out there? Was it likely a data entry mistake? What were your assumptions?*