The Jupyter Notebook 'Sunburst.ipynb', available on GitHub, contains the code used in this article.
What is Plotly Express
Plotly Express is built on top of the Plotly library and can create figures with just a single line of code. Plotly is free and open source.
Plotly is a popular visualization library because it offers interactivity, attractive default styling and a wide range of chart types.
What is a Sunburst Chart
Sunburst Charts let you visualize hierarchical data as concentric rings divided into slices. The innermost ring is the top level of the hierarchy, and each ring around it represents the next level down. The size of each slice is proportional to a value. You can click a slice to drill down into its branch, and click the center to roll back up through the hierarchy levels you specify.
Sunburst Charts can be used within a limited space to display a large number of items simultaneously.
Installation
Plotly Express is a module within the Plotly library, so to use Plotly Express you need to install Plotly.
To install Plotly with pip, type the following in your terminal or command prompt:
pip install plotly
If you are using the Anaconda environment, you can instead type:
conda install plotly
You can import Plotly Express using the following line of code:
import plotly.express as px
The syntax for any Plotly Express chart is:
px.chart(data_frame, parameters)
For more detail, refer to Plotly's official installation guidance: https://plotly.com/python/getting-started/#installation
Note: I am using Jupyter Notebook as my preferred environment, but you can use others such as Google Colaboratory, Visual Studio Code or even the Python shell. Some environments need a compatible renderer to display your charts; refer to Plotly's guidance on renderers if your charts do not appear.
Dataset Overview
As with other libraries in Python like Seaborn, Plotly has a number of built-in sample datasets.
I’ve selected the gapminder dataset which returns a pandas DataFrame consisting of Country names, Continents and key indicators such as GDP per Capita and Life Expectancy across various years.
Before loading the dataset, import the pandas library so that it is available for any data cleansing.
import pandas as pd
You can load these sample datasets from the px.data module. The syntax is as follows:
px.data.gapminder()
I’ll give the DataFrame a name by assigning it to the variable df, while also filtering it to include only records for the year 2007:
df = px.data.gapminder().query('year==2007')
The filtered DataFrame has one row per country and eight columns: country, continent, year, lifeExp, pop, gdpPercap, iso_alpha and iso_num.
General Syntax for Plotly Express Sunburst Charts
px.sunburst(data_frame, path, values, parameters)
Essential parameters include:
data_frame: the DataFrame that holds the data
path: list of column(s) that define the hierarchy of the slices, ordered from the root (innermost ring) to the leaves (outermost ring)
values: column name or array-like whose values set the size of each sector
There are other useful parameters that we will also cover in this article. For the full list, see the official documentation for plotly.express.sunburst linked under Sources.
So let's create a Sunburst Chart from the df DataFrame, with one slice per country, where the size of each slice is determined by the country's population.
To do this we pass the column 'country' to the path argument and the column 'pop' to the values argument, as follows:
px.sunburst(data_frame=df, path=['country'], values='pop')
Because there is only one level of hierarchy, we have essentially created a pie chart.
Adding more columns to your hierarchy
You can add more columns to the hierarchy through the path argument; each column becomes its own ring. Let's add 'continent' to the list passed to the path argument, placing it before 'country'.
px.sunburst(data_frame=df, path=['continent','country'], values='pop')
Now the innermost ring represents continents and the outermost ring represents countries. Each country slice sits directly outside the continent it belongs to.
The chart is interactive: try selecting a continent and then a country.
Coloring each slice
You can change the color of each slice by using the color argument. Let's color the slices based on the population value by passing 'pop' to the color argument.
px.sunburst(data_frame=df, path=['continent','country'], values='pop', color='pop')
Now there is an additional color bar that acts as the legend for the colors. The brighter yellow slices represent the highest populations and the purple slices represent the lowest.
You can also use the color_continuous_scale argument to apply a specific color scale; the list of options is on the built-in color scales page linked under Sources. Let's change the color scale to 'orrd':
px.sunburst(data_frame=df, path=['continent','country'], values='pop', color='pop', color_continuous_scale='orrd')
Let's change it once more to 'rdbu' and, this time, color the chart based on life expectancy, which is the lifeExp column in the DataFrame.
px.sunburst(data_frame=df, path=['continent','country'], values='pop', color='lifeExp', color_continuous_scale='rdbu')
You can see the slices are now colored by Life Expectancy.
Conclusion
Sunburst charts are great for visualizing hierarchical data and spotting trends.
However, they do have certain limitations: they cannot display negative values, and they are less useful when values differ widely, because very small slices become hard to read.
When used with the right dataset, they can be a valuable tool in your arsenal.
Sources
https://plotly.com/python/sunburst/
https://plotly.com/python-api-reference/generated/plotly.express.sunburst.html
https://plotly.com/python/builtin-colorscales/