# Data visualization and descriptive statistics

STAM4000
Quantitative Methods
Week 2
Data visualization and
descriptive statistics
COMMONWEALTH OF AUSTRALIA
WARNING
This material has been reproduced and communicated to you by or on behalf of Kaplan
Business School pursuant to Part VB of the
Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act. Any further
reproduction or communication of this material by you may be the subject of copyright
protection under the Act.
Do not remove this notice.

Week 2: Data visualisation and descriptive statistics
Learning Outcomes
#1 Create data visualisations
#2 Distinguish between measures of central tendency
#3 Distinguish between measures of dispersion

#1 Create data visualisations

Are you more of a VISUAL thinker or a VERBAL thinker?
Count how many times you reply “yes” to the following quiz questions:
• Do you like to draw diagrams when explaining something?
• When using Google Maps for directions, do you prefer to watch the map and mute the audio?
• When meeting new people, do you find it easier to remember faces than names?
• Do you use a mind map diagram with links and words to organize and remember things?
Why does this matter? A picture tells a thousand words.

#1 Example of visualisations
Example of charts
Categorical:
• Pie chart
• Bar chart and Pareto chart
Quantitative:
• Pie chart
• Histogram
• Frequency polygon
• Frequency curve
• Stem and leaf plot
• Ogive
• Box plot
Charts for categorical data
Pie chart:
• Label segments or use a legend.
• Check segment size.
• Check segment values.
• Check categories are mutually exclusive and collectively exhaustive.
• Check total value of pie chart:
o If frequencies, check that they total the sample size.
o If relative frequencies, check that they total 1 or 100%.
o Note: If the pie chart is for quantitative data and displays numerical values, check that the segments total the sum of the values.
Pie chart for top 10 ASX companies in Australia (%):
• Biotechnology: 10%
• Capital Markets: 10%
• Diversified Banks: 40%
• Grocery Stores: 10%
• Home Improvement Retail: 10%
• Metals & Mining: 20%
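The "check total" rule for a pie chart can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pie_total_ok` and the tolerance are assumptions, and the percentages are the segment values from the ASX pie chart.

```python
import math

# Segment values (%) taken from the ASX pie chart.
sector_share = {
    "Biotechnology": 10,
    "Capital Markets": 10,
    "Diversified Banks": 40,
    "Grocery Stores": 10,
    "Home Improvement Retail": 10,
    "Metals & Mining": 20,
}

def pie_total_ok(values, expected_total, tol=1e-9):
    """Return True if the segment values add up to the expected total.

    expected_total is the sample size for frequencies,
    or 1 (or 100) for relative frequencies.
    """
    return math.isclose(sum(values), expected_total, abs_tol=tol)

# Relative frequencies in percent should total 100.
print(pie_total_ok(sector_share.values(), 100))
```

The same function covers the frequency case by passing the sample size as `expected_total`.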
More charts for categorical data
Bar chart:
• One bar per category
• Bar height reflects frequency
• Equal bar width
• Gaps between bars
Pareto bar chart:
• Sorted bars in ascending or descending order
[Figure: Bar chart of top 10 companies from Australia (%); vertical axis from 0% to 45%]
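As a rough sketch of how bars might be ordered for a Pareto bar chart, the snippet below sorts the sector percentages from the pie chart in descending order. The running cumulative percentage is a common companion to Pareto charts and is an addition here, not something stated on the slide; the function name `pareto_order` is hypothetical.

```python
from itertools import accumulate

# Sector percentages from the pie chart of top ASX companies.
sector_share = {
    "Biotechnology": 10,
    "Capital Markets": 10,
    "Diversified Banks": 40,
    "Grocery Stores": 10,
    "Home Improvement Retail": 10,
    "Metals & Mining": 20,
}

def pareto_order(shares):
    """Sort categories from largest to smallest share (ties by name)
    and attach the running cumulative share to each row."""
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    running = accumulate(value for _, value in ranked)
    return [(name, value, total) for (name, value), total in zip(ranked, running)]

for name, value, total in pareto_order(sector_share):
    print(f"{name:<25} {value:>3}%  cumulative {total:>3}%")
```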

Common chart for quantitative data
Histogram
• One bar per class
• Bar height may reflect frequency or relative frequency
• Equal class widths
• NO gaps between bars. Why? The bars follow a continuous number scale.
Description of histogram:
• General shape:
o symmetric (evenly balanced) OR
o skewed (tail on either end)
• Peaks: number and position
• Unusual features: gaps, multiple peaks, no peak etc.
Example: The call centre of an electricity provider has received a number of complaints from customers that the call wait time is too long. The manager of the call centre claims that most wait times are 15 minutes or less. To investigate the complaints, a consumer group telephoned the electricity provider 25 times and recorded the call wait times. This histogram displays the data collected by the consumer group.
[Figure: Histogram of call wait times. Horizontal axis: Class (minutes), with classes (0, 5], (5, 10], (10, 15], (15, 20], (20, 25], (25, 30]. Vertical axis: Frequency.]
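To illustrate how class frequencies for a histogram with right-closed classes such as (0, 5] and (5, 10] could be tallied, here is a minimal Python sketch. The wait times below are hypothetical values invented for illustration; they are not the consumer group's data, and the function name `class_frequencies` is an assumption.

```python
import math
from collections import Counter

def class_frequencies(values, width, n_classes):
    """Count values in right-closed classes (0, w], (w, 2w], ..., one per class."""
    counts = Counter()
    for v in values:
        k = math.ceil(v / width)  # class k covers (width*(k-1), width*k]
        if not 1 <= k <= n_classes:
            raise ValueError(f"value {v} falls outside the classes")
        counts[k] += 1
    return [((k - 1) * width, k * width, counts[k]) for k in range(1, n_classes + 1)]

# Hypothetical wait times in minutes (illustrative only).
wait_times = [2, 5, 7, 12, 15, 16, 22, 30]

table = class_frequencies(wait_times, width=5, n_classes=6)
for lower, upper, freq in table:
    print(f"({lower}, {upper}]: {freq}")

# Share of calls answered within 15 minutes (the first three classes).
within_15 = sum(freq for _, upper, freq in table if upper <= 15) / len(wait_times)
print(within_15)
```

Note that a value exactly on a boundary, such as 5, is counted in the lower class (0, 5], which matches the interval notation on the histogram axis.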

Understanding the importance of shape
• Bimodal
• Multimodal
• Uniform
• Unimodal and symmetric
• Positively skewed (or skewed to the right)
• Negatively skewed (or skewed to the left)
How would you describe the shape of the histogram for the call waiting example?
How do we describe a data set?
We use descriptive statistics.
Shape
• For a histogram or frequency curve:
o Is there a single peak or several peaks?
o Is it symmetrical or skewed?
Centre
• If you had to pick a single number to describe all the data, what would you choose?
Spread
• Since statistics is about variation, how dispersed is our data?
Unusual features
• Are there any gaps in the data set?
• Is there more than one mode? If so, is there a lurking variable?
Compare datasets with visualizations
© 2010 Pearson Education
Example: These histograms compare the daily volume (number) of shares traded on the New York Stock Exchange (NYSE) in one year, split into two groups: January to June and July to December. Histograms are OK for comparing two groups; box and whisker plots (or boxplots) are better when comparing several groups. See the next slide.
Example
This chart of box and whisker plots compares the daily volume (number) of shares traded by month on the New York Stock Exchange (NYSE) in one year. The months follow a calendar year and are denoted by numbers, e.g., 1 = January.
#1 Example continued
From this visualization, we can ascertain the following:
• March had the least variation overall; June and December had the greatest variation overall.
• May and November had the highest median volume traded; August had the lowest median.
• March had the smallest interquartile range; December had the largest interquartile range.
• March, May, June, July, September and November each had trading days with extreme values.
• All months had skewed distributions.
Box and whisker plot (boxplot)
• Displays a five-number summary:
o minimum
o first quartile, Q1
o median, Q2
o third quartile, Q3
o maximum
• Median shown inside box
• Length of box displays interquartile range (IQR = Q3 - Q1)
• Whiskers show data values considered usual
• Shapes, e.g., dot or asterisk, represent unusual data values (outliers):
o dot to represent values more than 1.5 × IQR from the nearest quartile
o asterisk to represent values more than 3 × IQR from the nearest quartile
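The five-number summary and the 1.5 × IQR and 3 × IQR outlier rules can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample values are hypothetical, the function names are invented here, and the quartiles use the "inclusive" method of `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and software, so other tools may report slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    # Assumption: 'inclusive' quartiles; other conventions exist.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

def classify_outliers(values):
    """Split unusual values into (dot, asterisk) lists:
    dot      -> more than 1.5 x IQR from the nearest quartile
    asterisk -> more than 3 x IQR from the nearest quartile
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    dots, asterisks = [], []
    for v in sorted(values):
        distance = max(q1 - v, v - q3)  # how far v lies beyond the box
        if distance > 3 * iqr:
            asterisks.append(v)
        elif distance > 1.5 * iqr:
            dots.append(v)
    return dots, asterisks

# Hypothetical sample (illustrative only).
sample = [1, 5, 6, 7, 8, 9, 10, 18, 25]
print(five_number_summary(sample))
print(classify_outliers(sample))
```

For this sample, Q1 = 6 and Q3 = 10, so the IQR is 4. The value 18 lies 8 above Q3 (more than 6 but not more than 12) and would be drawn as a dot, while 25 lies 15 above Q3 and would be drawn as an asterisk.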
Boxplot
https://lsc.deployopex.com/box-plot-with-jmp/
General shapes of frequency curves and boxplots
• Negatively skewed
• Unimodal and symmetric
• Positively skewed
#2 Distinguish between measures of central tendency
http://methods.sagepub.com/book/testing-and-measurement/n4.xml

Population parameters and sample statistics
Population parameters
• Measurements based on the entire data set.
Sample statistics
• Measurements based on a sample of data.
Notation
• Greek letters for population parameters.
• English letters for sample statistics.
https://www.causeweb.org/cause/resources/fun/cartoons/parameter-notation

#2 What is the typical value for a data set?
https://nebusresearch.wordpress.com/tag/statistics/
#2 Mode, Mo
Modal value: the most frequently occurring value.
Modal class: the class(es) with the highest frequency, or tallest peak(s) in a bar chart or histogram.
Advantage of the mode:
• It can be found for both categorical and quantitative data.
Disadvantages of the mode:
• Its use is limited to descriptive statistics.
• It does not use all the values in a data set.
#2 Number of modes
A dataset with one mode is unimodal.
E.g., a sample of latte prices (\$): 5, 3, 6, 5, 4, 6, 5
Mo = \$5
A dataset with two modes is bimodal.
E.g., a sample of espresso prices (\$): 4, 5, 6, 3, 6, 5, 6, 4, 5
Mo = \$5 and \$6
A dataset with three or more modes is multimodal.
E.g., a sample of ice-coffee prices (\$): 5, 8, 7, 6, 5, 9, 7, 6
Mo = \$5, \$6 and \$7
A dataset with no mode is uniform.
E.g., a sample of cappuccino prices (\$): 5, 3, 4, 6
No mode
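The four coffee-price examples can be checked with a short Python sketch. The function name `modes` is hypothetical, and treating a dataset in which no value repeats as having no mode is an assumption that follows the cappuccino example.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modal values.

    Assumption: if no value occurs more than once,
    the dataset is treated as having no mode.
    """
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

latte = [5, 3, 6, 5, 4, 6, 5]
espresso = [4, 5, 6, 3, 6, 5, 6, 4, 5]
ice_coffee = [5, 8, 7, 6, 5, 9, 7, 6]
cappuccino = [5, 3, 4, 6]

for name, prices in [("latte", latte), ("espresso", espresso),
                     ("ice-coffee", ice_coffee), ("cappuccino", cappuccino)]:
    print(name, modes(prices))
```

The length of the returned list gives the number of modes: one for unimodal, two for bimodal, three or more for multimodal, and zero for no mode.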
#2 Median, Me
Median: the middle value (midpoint) in an ordered set of numbers.
Advantage of the median:
• It is not influenced by extremely high or extremely low values. Hence, when we have a skewed data set, the median is usually the best measure of central tendency.
Disadvantages of the median:
• It does not use all the values in a data set.
• Only used in descriptive statistics.
• It is tedious to calculate manually.
• Cannot find the median for categorical data.

#2 Median, Me
If n is ODD, the median is the middle value in a sorted dataset.
E.g., sample of customer sales (\$): 8, 12, 4, 10, 7
Sorted: 4, 7, 8, 10, 12
n = 5
Median = \$8
If n is EVEN, the median is the average of the two middle values in a sorted dataset.
E.g., sample of customer sales (\$): 8, 12, 4, 10, 7, 13
Sorted: 4, 7, 8, 10, 12, 13
n = 6
Median = (8 + 10)/2 = \$9
HINT: Sort data first
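The odd and even cases can be written out as a small Python sketch. The function name `median_manual` is hypothetical; the standard library's `statistics.median` is shown alongside it as a cross-check.

```python
import statistics

def median_manual(values):
    """Median by the sort-then-pick-the-middle rule."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)          # HINT from the slide: sort data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n is odd: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # n is even: average the two middle values

sales_odd = [8, 12, 4, 10, 7]
sales_even = [8, 12, 4, 10, 7, 13]

print(median_manual(sales_odd), statistics.median(sales_odd))
print(median_manual(sales_even), statistics.median(sales_even))
```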
#2 Mean, μ or