Visualizing Data Using Python Plotting Libraries
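You can render visualizations from Python on the Spark driver by using the display(<dataframe-name>) function. As the examples in this section show, display() also accepts plot objects, such as a plotly figure or a pygal chart.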
The following Python libraries are supported:
plotly
matplotlib
seaborn
altair
pygal
leather
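Before running the examples in this section, it can help to confirm which of these libraries are importable in the current environment. The following is a minimal sketch that uses only the Python standard library; the helper name `available_plot_libraries` is hypothetical and is not part of any notebook API.

```python
import importlib.util

# Libraries listed above as supported by display().
PLOT_LIBRARIES = ["plotly", "matplotlib", "seaborn", "altair", "pygal", "leather"]


def available_plot_libraries(names=PLOT_LIBRARIES):
    """Map each library name to True if it can be located in this environment.

    Hypothetical helper: find_spec() only locates a top-level package,
    it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_plot_libraries().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

A library reported as missing would need to be installed on the cluster before its example can run.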
Note
The display() function is supported only on PySpark kernels.
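Because display() exists only on PySpark kernels, code that is shared with other environments may want to check for it before calling it. The following is a minimal sketch, assuming that the kernel exposes display() as a global or builtin name in the notebook session; the `show` helper and its `fallback` parameter are hypothetical.

```python
import builtins


def show(obj, fallback=print):
    """Render obj with display() when the kernel provides it, else use fallback."""
    # Assumption: the kernel injects display() into the notebook's globals or builtins.
    display_fn = globals().get("display", getattr(builtins, "display", None))
    if callable(display_fn):
        return display_fn(obj)
    return fallback(obj)
```

For the lookup to see the notebook's names, the helper would have to be defined in the notebook itself rather than in an imported module.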
Using plotly
import plotly.express as px
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
display(fig)
The following image shows the visualization of the plotly plot.
[Image: plotly.png]
Using matplotlib
import pandas as pd
import matplotlib.pyplot as plt
plt.switch_backend('agg')
sdf = spark.sql("select * from default_qubole_airline_origin_destination limit 10")
data = sdf.toPandas()
data['distance'] = pd.to_numeric(data['distance'], errors='coerce')
data.plot(kind='bar', x='dest', y='distance', color='blue')
display(plt)
The following image shows the visualization of the matplotlib plot.
[Image: matplotlib.png]
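The errors='coerce' argument in the matplotlib example matters because values that cannot be parsed as numbers are converted to NaN instead of raising an error. The following small sketch illustrates this with made-up values; the rows shown are assumptions and are not taken from the airline table.

```python
import pandas as pd

# Made-up sample values; the real table's contents are not shown in this document.
raw = pd.DataFrame({
    "dest": ["SFO", "JFK", "ORD"],
    "distance": ["2586", "n/a", "1846.5"],
})

# Unparseable strings become NaN rather than raising a ValueError.
raw["distance"] = pd.to_numeric(raw["distance"], errors="coerce")

# Drop rows without a usable distance before plotting.
clean = raw.dropna(subset=["distance"])
print(clean)
```

Whether to drop such rows or keep them as gaps in the plot is a choice that depends on the data.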
Using seaborn
import numpy as np
import matplotlib.pyplot as plt
plt.switch_backend('agg')
import seaborn as sns
print(sns)  # confirms that seaborn was imported
data = np.random.normal(0, 1, 3)  # three samples from a standard normal distribution
plt.figure(figsize=(9, 2))
sns.boxplot(x=data)
display(plt)
The following image shows the visualization of the seaborn plot.
[Image: seaborn.png]
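A box plot summarizes the quartiles of the data, so with only three samples the box is coarse. The following sketch computes the quartiles that a box plot is built around; the fixed seed and the sample size of 100 are assumptions chosen here for illustration.

```python
import numpy as np

# Assumption: seed and sample size are illustrative, not from the original example.
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=0, scale=1, size=100)

# First quartile, median, and third quartile: the edges and center line of the box.
q1, median, q3 = np.percentile(data, [25, 50, 75])
print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
```

Passing this larger array to sns.boxplot(x=data) in place of the three-sample array should give a more representative box.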
Using altair
import altair as alt
import pandas as pd
source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
# Named chart rather than plt to avoid confusion with the matplotlib alias.
chart = alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
)
display(chart)
The following image shows the visualization of the altair plot.
[Image: altair.png]
Using pygal
import pygal
bar_chart = pygal.Bar()
bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8])
display(bar_chart)
The following image shows the visualization of the pygal plot.
[Image: pygal.png]
Using leather
import random
import leather
dot_data = [(random.randint(0, 250), random.randint(0, 250)) for i in range(100)]
def colorizer(d):
    return 'rgb(%i, %i, %i)' % (d.x, d.y, 150)
chart = leather.Chart('Colorized dots')
chart.add_dots(dot_data, fill_color=colorizer)
display(chart)
The following image shows the visualization of the leather plot.
[Image: leather.png]
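In the leather example, the colorizer returns a CSS-style color string built from a data point's coordinates. The following sketch exercises that function without leather, using a hypothetical `Point` namedtuple as a stand-in for the object that leather passes in; only its `x` and `y` attributes are assumed.

```python
from collections import namedtuple

# Hypothetical stand-in for the object passed to the colorizer.
Point = namedtuple("Point", ["x", "y"])


def colorizer(d):
    # Red follows x, green follows y, blue is fixed at 150.
    return 'rgb(%i, %i, %i)' % (d.x, d.y, 150)


print(colorizer(Point(x=0, y=250)))    # rgb(0, 250, 150)
print(colorizer(Point(x=125, y=40)))   # rgb(125, 40, 150)
```

Generating the coordinates with random.randint(0, 250) keeps every channel inside the 0 to 255 range that an rgb() color accepts.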
For other plot types, see the PlotExamplesPySpark.ipynb notebook, which is available in the Example Notebooks of the Jupyter notebooks.