Saturday 18 July 2015

GoogleVis package in R

Introduction:

In 2006 Hans Rosling gave an inspiring talk at TED about social and economic developments in the world over the last 50 years, which challenged the views and perceptions of many listeners. Rosling had used extensive data analysis to reach his conclusions. To visualize his talk, he and his team at Gapminder had developed animated bubble charts.


Rosling's presentation popularized the idea and use of interactive charts, and as a result the software 
behind Gapminder was bought by Google and integrated as motion charts into their Visualisation APIone year later.
"googleVis is an R package providing an interface between R and Google Charts. The functions of 
the package allow the user to visualize data with the Google Chart Tools without uploading their 
data to Google.
The output of googleVis functions is html code that contains the data and references to JavaScript 
functions hosted by Google. To view the output a browser with Flash and Internet connection is 
required, the actual chart is rendered in the browser.
Source: google
Data visualizations are an integral part of the Analytics process as it helps to convey important information from a multitude of data easily and attractively.The output of the various functions in the package can be embedded into a web page.In this article we will look at a few basic functions in the googleVis package for visualizing data. 
1.TreeMap:

A tree map is a visual representation of a data tree, where each node can have zero or more children,and one parent (except for the root, which has no parents). Each node is displayed as a rectangle,sized and colored according to assigned values .Sizes and colors are valued relative to all other nodes in thegraph. The default behavior is to move down the tree when a user left-clicks a node, and to move back up the tree when a user right-clicks the graph.We will study the package googleVis and it’s dependencies,the packages it imports and enhances via the gvisTreeMap function.For this lets see the dependencies graph depicting the relationships.

library(“sqldf”)
library("googleVis")
tag <- "googleVis"
plot(makeDepGraph(tag, includeBasePkgs=FALSE, suggests=TRUE, enhances=TRUE), 
     legendPosEdge = c(-1, 1), legendPosVertex = c(1, 1), vertex.size=20)


As you can see in the above graph,googleVis is the parent for knitr and knitr imports 7 other packages.googleVis has no parent and has 23 linkages.Similarly the values for all the packages like stable,R6 etc are calculated and arranged in the below format:

Each record is a node and can have only one parent.The values in the column Val are incoming links + 1.So for the packages like RCCP,R6 etc. it is 1. Knitr has 7 imports and hence Val is 7+1. Fac is a column containing random numbers to generate colors for the tree.
Tree <- gvisTreeMap(pkg,idvar = "Package",parentvar = "Parent",
                    sizevar = "Val",color = “Fac”)
plot(Tree)

Attributes of the function:

data-The dataframe(pkg) used as an input to the function needs to have atleast 4 columns.Each record represents one node and each node has one or more parent nodes.
idvar- column name of data describing the ID for each node.This value is displayed as the node header.
parentvar-column name of data that match to entries in idvar. If this is a root node, its value is NA. Only one root is allowed per treemap.
sizevar -column name of data with positive values to define the size of maps. Any positive value is allowed.This value determines the size of the node, computed relative to all other nodes.

colorvar- column name of data with values to define range of color.The value is used to calculate a color for this node. The color value is first recomputed on a scale from minColorValue to maxColorValue,and then the node is assigned a color from the gradient between minColor and maxColor.


The Output:

TreeMapID1024753d785a
Data: pkg • Chart ID: TreeMapID1024753d785agoogleVis-0.5.8
R version 3.2.0 (2015-04-16) • Google Terms of UseDocumentation and Data Policy


For the charts 2,3 & 4 we will be using the below dataset os

2.MotionChart:
A motion chart is a dynamic chart to explore several indicators over time.Here we will explore the variation of Sales,Profit and expenses of the three items across the years.
M <- gvisMotionChart(os,idvar="Item",timevar="Year")
plot(M)
Attributes of the function gvisMotionChart:
data- a data.frame. The data has to have at least four columns with subject name (idvar), time (timevar) and two columns of numeric values. Further columns,numeric and character/factor are optional. The combination of idvar and timevar has to describe a unique row. The column names of the idvar and timevar have to be specified.
idvar- column name of data with the subject to be analysed.

timevar - column name of data which shows the time dimension. The information has to be either numeric, of class Date or a character .

Output:

MotionChartID102473f95523
Data: os • Chart ID: MotionChartID102473f95523googleVis-0.5.8
R version 3.2.0 (2015-04-16) • Google Terms of UseDocumentation and Data Policy


3.Bar chart:

For the bar chart we combine the expenses,sales and profit for the y-axis and plot its variation              across locations.When we hover above each bar,the corresponding values are displayed.


Attributes of the function gvisBarChart:
xvar- name of the character column which contains the category labels for the x-axes.
yvar-a vector of column names of the numerical variables to be plotted.Each column is                        displayed as a separate bar/column.      

Output:

BarChartID102478ce3fa2
Data: data • Chart ID: BarChartID102478ce3fa2googleVis-0.5.8
R version 3.2.0 (2015-04-16) • Google Terms of UseDocumentation and Data Policy

4.Pie Chart:
An interactive pie chart which shows the values of the variable which is specified while creating the chart.
pie_data <- sqldf("select Item,sum(Sales) as sales from os group by Item")
pie <- gvisPieChart(pie_data,labelvar = 'Item',numvar = 'Sales')
plot(pie)
Attributes of the function gvisPieChart:
labelvar -Name of the character column which contains the category labels for the slice labels.
numvara vector of column names of the numerical variables of the slice values.The output of the above code:

PieChartID10247222555f
Data: pie_data • Chart ID: PieChartID10247222555fgoogleVis-0.5.8
R version 3.2.0 (2015-04-16) • Google Terms of UseDocumentation and Data Policy

5.Geo Map(World):
A geo map is a map of a country, continent, or region map, with colours and values assigned to specific regions. Values are displayed as a colour scale, and optional hover-ext for regions can be specified.
For our understanding we will be using the distribution of Mac Donalds stores across the world.There are 113 locations(Country/Territory) where the stores are located,but for our example here will consider only 21 such locations.



map_world <- gvisGeoMap(Mac_Donalds, locationvar = "Country", 
                   numvar="Numberofoutlets",
                   options=list(dataMode="regions"))                  

plot(map_world)

Attributes of the function gvisGeoMap:
data -data.frame. The data has to have at least two columns with location name (locationvar), value to be mapped to location (numvar) and an optional variable to display any text while the mouse hovers over the location (hovervar).
locationvar- column name of data with the geo locations to be analysed. The locations can be provide in two formats:
Format 1 latitude:longitude.
Format 2 Address, country name, region name locations, or US metropolitan area codes.
If we use regions other than US we need to specify the region in options.
numvar -column name of data with the numeric value displayed when the user hovers
over this region.
hovervar- column name of data with the additional string text displayed when the user
hovers over this region.
options -list of configuration options. 
The Output:

GeoMapID102430ee7e50
Data: Mac_Donalds • Chart ID: GeoMapID102430ee7e50googleVis-0.5.8
R version 3.2.0 (2015-04-16) • Google Terms of UseDocumentation and Data Policy



7.Geo Map(For a specific country):
For this part we will use the distribution of screens of PVR cinemas across the country.



The output :

PVR_SCREENS
Data: pvr • Chart ID: PVR_SCREENSgoogleVis-0.5.8
R version 3.2.0 (2015-04-16) • Google Terms of UseDocumentation and Data Policy




Notice the differences between this graph and the preceding one:
A region is specified here.This has to be done when the states are in a country other than the US.
A hovervar variable is specified here.This will show us the name of the state when we place the point above the markers.
The datamode here is markers.The "markers" style displays a circle, sized and colored to indicate a value,over the regions.
Conclusion:
There are many packages in R which can be used to create interactive plots,but displaying them over the browser is what makes GoogleVis special.Have you used GoogleVis in any other ways,please feel free to make suggestions.
Reference: official google vis vignette

No comments:

Post a Comment