Should we drop or embrace the outliers in our data?

Post by **answerhappygod** » Sun Oct 03, 2021 12:23 pm

Should we drop or embrace the outliers in our data?
Outliers are extreme values that are abnormally high or low
relative to the majority of the observations in the data.
The presence of outliers in a dataset can significantly affect the
estimation of population parameters, and failing to control for
them may lead to biased estimates of those parameters.
There are established methods to detect outliers in a dataset
(box plots, standardized values, regression methods, etc.) and to
treat them (dropping, trimming, re-weighting, imputing, etc.).
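
For concreteness, here is a minimal sketch of two of the detection
methods named above (the box-plot rule and standardized values) and two
of the treatments (dropping, and clipping to the box-plot fences as one
form of trimming). The function names, the 1.5 multiplier, the z-score
threshold, and the sample numbers are illustrative assumptions, not
something taken from the Shapiro reading.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (low, high) box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Detect outliers with the box-plot rule (values outside the fences)."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Detect outliers whose standardized value exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []  # all values identical, nothing can be extreme
    return [v for v in values if abs(v - mean) / sd > threshold]


def drop_outliers(values, k=1.5):
    """Treatment 1: drop every value outside the box-plot fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if low <= v <= high]


def clip_outliers(values, k=1.5):
    """Treatment 2: keep every observation but clip it to the fences."""
    low, high = iqr_fences(values, k)
    return [min(max(v, low), high) for v in values]


# Hypothetical sample: nine ordinary readings and one extreme value.
sample = [12, 13, 12, 14, 13, 15, 14, 13, 12, 95]
print("box-plot rule flags:", iqr_outliers(sample))
print("z-score (threshold 3) flags:", zscore_outliers(sample))
print("after dropping:", drop_outliers(sample))
print("after clipping:", clip_outliers(sample))
```

Note that in a sample this small the extreme value inflates the
standard deviation enough that the usual threshold of 3 does not flag
it, while the box-plot rule does. The choice of detection method
therefore already decides which observations are treated as outliers.
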
The reading by Shapiro emphasizes learning from outliers
instead of ignoring them in consumer experience analysis. Can we
apply the same line of reasoning to AI-powered automation (with
minimal human intervention or supervision), where data patterns are
discovered to drive automated decisions and where outlier detection
and removal methods are applied during data preprocessing? By the
same analogy, can we say that AI-powered automation is like talking
with an annoying chatbot on the AT&T website, whereas data analysis
done by professionals is like talking with a real AT&T service
representative?