Tuesday, August 31, 2010

Example 7.28: Bubble plots

A bubble plot displays three variables in a scatterplot. The third (z) dimension is represented by the size of the plot symbol, typically a circle, with either the area or the radius of the circle proportional to the value of the third variable. This can be a very effective data presentation method. For example, consider Andrew Gelman's recent re-presentation of data on health expenditure, survival, and annual number of doctor visits per person. On the other hand, Edward Tufte suggests that such representations are ambiguous, in that it is often unclear whether the area, radius, or height reflects the third variable. In addition, he reports that humans tend not to be good judges of relative area.

However, other means of presenting three dimensions on a flat screen or piece of paper often rely on visual cues regarding perspective, which some find difficult to judge.
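To make the area-versus-radius ambiguity concrete, here is a minimal R sketch using made-up values of a third variable (the vector z is hypothetical, not taken from the HELP data). It compares how much larger each circle looks, by area, under the two scalings.

```r
# Hypothetical values of a third variable; not taken from the HELP data
z <- c(1, 4, 9)

# Scaling 1: radius proportional to z
radius_by_radius <- z
area_by_radius   <- pi * radius_by_radius^2

# Scaling 2: area proportional to z, so the radius grows as sqrt(z)
radius_by_area <- sqrt(z / pi)
area_by_area   <- pi * radius_by_area^2

# Area of each circle relative to the smallest circle
ratio_radius <- area_by_radius / area_by_radius[1]
ratio_area   <- area_by_area / area_by_area[1]

print(rbind(z, ratio_radius, ratio_area))
```

With radius scaling, a value 9 times larger is drawn with 81 times the area; with area scaling, the area grows in step with the value. A reader who is not told which scaling was used can easily misjudge the third variable.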

Here we demonstrate SAS and R bubble plots using the HELP data set used in our book. We show a plot of depression by age, with bubble size proportional to the average number of drinks per day. To make the plot a little easier to read, we show this only for female alcohol abusers.

SAS

In SAS, we can use the bubble statement in proc gplot. We demonstrate here the use of the where data set option (section 1.5.1) for subsetting, which allows us to avoid using any data steps. SAS allows the circle area or radius to be proportional to the third variable; we choose the radius (bscale = radius) for compatibility with R. We alter the size of the circles (bsize = 60) for the same reason. We also demonstrate options for coloring in the filled circles.


libname k "c:\book";

proc gplot data = k.help (where=((female eq 1)
and (substance eq "alcohol")));
bubble cesd*age=i1 / bscale = radius bsize=60
bcolor=blue bfill=solid;
run;



R

In R, we can use the symbols() function for the plot. Here we also demonstrate reading in data previously saved in native R format (section 1.1.1), as well as the subset() function and the with() function (the latter appears in section 1.3.1). The inches option is a scale factor: the symbols are rescaled so that the largest one has the specified size in inches, so the value chosen is arbitrary. We note that the symbols() function has a great deal of additional capability: it can substitute squares for circles for plotting the third variable, and add additional dimensions with rectangles or stars. Proportions can be displayed with thermometers, and boxplots can also be displayed.


load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
femalealc = subset(ds, female==1 & substance=="alcohol")
with(femalealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="blue"))
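Because the circles argument of symbols() is documented as giving radii, the call above makes the radius proportional to i1. If area scaling is preferred, one option is to pass the square root of the variable instead. The sketch below illustrates this with a small synthetic data frame named toy; its column names mirror those used in this post, but the values are invented so that the example runs without downloading the HELP data.

```r
# Synthetic stand-in for the HELP data; all values are invented
n <- 40
toy <- data.frame(
  age       = 20 + (seq_len(n) * 7) %% 36,
  cesd      = (seq_len(n) * 11) %% 61,
  i1        = 1 + (seq_len(n) * 3) %% 40,
  female    = rep(c(0, 1), length.out = n),
  substance = rep(c("alcohol", "cocaine", "heroin"), length.out = n)
)

# Same subsetting idiom as in the post
toyalc <- subset(toy, female == 1 & substance == "alcohol")

# circles= gives radii, so sqrt(i1) makes the area proportional to i1
with(toyalc, symbols(age, cesd, circles = sqrt(i1),
                     inches = 1/5, bg = "blue"))
```

The same change applies to the real data: replacing circles=i1 with circles=sqrt(i1) switches from radius scaling to area scaling, at the cost of no longer matching the SAS plot produced with bscale = radius.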


The results are shown below. It appears that younger women with more depressive symptoms tend to report more drinking.



scale factor examples
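The other symbol types that symbols() supports, mentioned in the R section, can be tried in the same way. This is a minimal sketch with made-up vectors (x, y, z, and p are hypothetical, not HELP variables): squares take a vector of side lengths, and thermometers take a matrix whose columns give width, height, and the proportion filled.

```r
# Made-up data for illustration only
x <- 1:5
y <- c(2, 4, 3, 5, 1)
z <- c(1, 2, 3, 4, 5)
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)   # proportions between 0 and 1

op <- par(mfrow = c(1, 2))        # two panels side by side

# Squares: z gives the side lengths
symbols(x, y, squares = z, inches = 1/4, bg = "blue", main = "squares")

# Thermometers: columns are width, height, and fill proportion
therm <- cbind(width = 1, height = 3, proportion = p)
symbols(x, y, thermometers = therm, inches = 1/3, fg = "blue",
        main = "thermometers")

par(op)                           # restore the previous layout
```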
