Exploring echarts4r – Paul Simmering

As web-oriented presentation in R Markdown and Shiny becomes more and more popular, there is increasing demand for interactive graphics with R. Whereas ggplot2 and its vast extension ecosystem is clearly leading in static graphics, there is no one go-to package for interactivity. This article is a tour of echarts4r, an interface with the echarts.js JavaScript library.

There are numerous options for interactive graphics with R:

plotly, an interface to Plotly.js that also works with ggplot2
ggiraph, an extension for ggplot2 that implements interactive geoms
highcharter, an interface to Highcharts
r2d3, an interface to custom graphics with D3
googleVis, an interface to the Google Visualization API

In addition, there are many packages specializing in a type of graph, such as dygraphs (time series) and visNetwork (network graphs).

echarts4r is a relatively new addition (CRAN release was 2018-09-17). It is an R interface with echarts.js, a free JavaScript charting library developed by Baidu and now part of the Apache Foundation.

Why echarts?

Charts look great out of the box, especially the opening animations, tooltips and hover highlighting look great and work on mobile too
While Plotly is optimized for exploratory data visualization by experts, echarts provides simpler interactions for a general audience, similar to Highcharts
It covers almost all chart types imaginable, so there’s no need to switch between packages and have inconsistent styling
echarts.js is highly customizable and it thoroughly documented (see documentation and cheat sheet cheat sheet). There is also a giant library of examples, all with source code
It’s free to use commercially, unlike Highcharts, which otherwise ticks the same boxes
In the development version, echarts4r offers proxies for interaction with Shiny (see https://echarts4r.john-coene.com/articles/shiny.html)

In addition to these advantages, it also offers features not seen in other packages (or at least not in this specific form): Geospatial 3D maps and timelines

In terms of ease of use, I’d put echarts4r in the middle of the pack. The R documentation is easy to follow and has good examples, but it cannot cover every detail, so one has to consult the official echarts documentation frequently. However, in contrast with learning D3 from scratch, this doesn’t take much time, an advantage that GitLab also noted in their comparison. As a long time ggplot2 user, I do miss the in-depth aesthetic mappings of ggplot2, faceting and ease of use when customizing axes. One last thing to keep in mind is that as a recent package and a larger userbase in China than in the West, StackOverflow doesn’t yet have many questions and answers on echarts.

Let’s get started

The remainder of this article is a tour of echarts4r’s features using the nycflights13 dataset. The official package website shows many more types of graphs, including maps, which are not covered here.

library(echarts4r)
library(dplyr)
library(lubridate)
library(nycflights13)
library(stringr)

Set a theme

Like ggplot’s theme_set(), e_common() let’s us set a theme for all plots to come.

e_common(font_family = "helvetica", theme = "westeros")

Bar charts

We start with a classic bar chart. Composition is similar to ggplot’s geom_col() and even e_flip_coords() sounds suspiciously like ggplot’s coord_flip().

A little quirk of the package is that its Chinese origins at Baidu sometimes show through. Here, I gave the save as image button a new English title instead of the Chinese tooltip.

Horizontal bar chart

top_destinations <- flights %>%
    count(dest) %>%
    top_n(15, n) %>%
    arrange(n)

top_destinations %>%
    e_charts(x = dest) %>%
    e_bar(n, legend = FALSE, name = "Flights") %>%
    e_labels(position = "right") %>%
    e_tooltip() %>%
    e_title("Flights by destination", "Top 15 destinations") %>%
    e_flip_coords() %>%
    e_y_axis(splitLine = list(show = FALSE)) %>%
    e_x_axis(show = FALSE) %>%
    e_toolbox_feature(
        feature = "saveAsImage",
        title = "Save as image"
    )

Stacked bar chart

echarts uses grouping with dplyr::group_by() instead of an aes() function like ggplot2 and highcharter.

flights_daytime <- flights %>%
    transmute(origin, daytime = case_when(
        hour >= 22 & hour < 6 ~ "Night",
        hour >= 6 & hour < 12 ~ "Morning",
        hour >= 12 & hour < 18 ~ "Afternoon",
        TRUE ~ "Evening"
    )) %>%
    count(origin, daytime) %>%
    group_by(daytime)
flights_daytime %>%
    e_charts(origin, stack = "grp") %>%
    e_bar(n) %>%
    e_tooltip(
        trigger = "axis",
        axisPointer = list(
            type = "shadow"
        )
    ) %>%
    e_title(
        text = "Outgoing flights by time of day",
        subtext = "There are no night flights"
    ) %>%
    e_y_axis(
        splitArea = list(show = FALSE),
        splitLine = list(show = FALSE)
    )

Scatter plot

I plot arrival delay and departure delay. The original dataset has 336776 rows, which is too much to plot. I simply draw a sample of 1000 rows for the scatterplot and later show the full data in a heatmap.

For 1000 closely clustered points, it doesn’t make much sense to have a tooltip for each of them, so I used spike lines instead (called axis pointers in echarts).

A linear regression model fits the relationship between arrival and departure delay well, so I added a regression line with the convenient e_lm() function.

set.seed(123)
flights_sm <- flights %>%
    filter(complete.cases(.)) %>%
    sample_n(1000)
flights_sm %>%
    e_charts(x = dep_delay) %>%
    e_scatter(arr_delay, name = "Flight") %>%
    e_lm(arr_delay ~ dep_delay, name = "Linear model") %>%
    e_axis_labels(x = "Departure delay", y = "Arrival delay") %>%
    e_title(
        text = "Arrival delay vs. departure delay",
        subtext = "The later you start, the later you finish"
    ) %>%
    e_x_axis(
        nameLocation = "center",
        splitArea = list(show = FALSE),
        axisLabel = list(margin = 3),
        axisPointer = list(
            show = TRUE,
            lineStyle = list(
                color = "#999999",
                width = 0.75,
                type = "dotted"
            )
        )
    ) %>%
    e_y_axis(
        nameLocation = "center",
        splitArea = list(show = FALSE),
        axisLabel = list(margin = 0),
        axisPointer = list(
            show = TRUE,
            lineStyle = list(
                color = "#999999",
                width = 0.75,
                type = "dotted"
            )
        )
    )

n_bins <- 100 # binning
flights %>%
    filter(complete.cases(.)) %>%
    mutate(
        arr_delay = cut(arr_delay, n_bins),
        dep_delay = cut(dep_delay, n_bins)
    ) %>%
    count(arr_delay, dep_delay) %>%
    e_charts(dep_delay) %>%
    e_heatmap(arr_delay, n) %>%
    e_visual_map(n) %>%
    e_title("Arrival delay vs. departure delay") %>%
    e_axis_labels("Departure delay", "Arrival delay")

Pie chart

Pie charts tend to be a bad choice for accurate visualization, but they look nice. Here, the plot shows about even numbers of flights for the three origin airports, but it’s near impossible to tell that EWR has the most flights. Creating pie charts is surprisingly hard in ggplot2, especially when it comes to labeling them. In echarts it’s very easy.

pie <- count(flights, origin) %>%
    e_charts(x = origin) %>%
    e_pie(n, legend = FALSE, name = "Flights") %>%
    e_tooltip() %>%
    e_title("Flights by origin", "This is really hard with ggplot2")
pie

Time series

I need time series graphs for an upcoming project, so that’ll be a focus of this article. Let’s analyze departure delays from all three origin airports.

flights_ts <- flights %>%
    transmute(week = as.Date(cut(time_hour, "week")), dep_delay, origin) %>%
    group_by(origin, week) %>% # works with echarts
    summarise(dep_delay = sum(dep_delay, na.rm = TRUE), .groups = "drop_last")

Regular time series

After much testing, I found that the way Highcharts does time series is the most intuitive and easy to use. The graph has a slider on the bottom for zooming, tooltips for multiple series are collected in one box and points grow when brushing over them.

ts_base <- flights_ts %>%
    e_charts(x = week) %>%
    e_datazoom(
        type = "slider",
        toolbox = FALSE,
        bottom = -5
    ) %>%
    e_tooltip() %>%
    e_title("Departure delays by airport") %>%
    e_x_axis(week, axisPointer = list(show = TRUE))
ts_base %>% e_line(dep_delay)

Stacked area

Switching from line to area graphs is done in one line of code. It’s also possible to reuse chart elements.

area <- ts_base %>% e_area(dep_delay, stack = "grp")
area

Timeline

A standout feature of echarts is the timeline visualization. It’s somewhat similar to gganimate, but instead of videos it outputs an HTMLwidget with controls. Here, I used it to show weekly aggregated departure delays for JFK airport. One thing to look out for is that the timeline view doesn’t provide as much context as the classic time series graphs shown before. However, this focus on a single time period at a time also lends itself to data storytelling.

flights_ts %>%
    filter(origin == "JFK") %>%
    group_by(month = month(week, label = TRUE)) %>%
    e_charts(x = week, timeline = TRUE) %>%
    e_bar(
        dep_delay,
        name = "Departure Delay",
        symbol = "none",
        legend = FALSE
    )

Synchronized plots

Similar to the crosstalk package, echarts allows linking of plots. They share legend, sliders and data zooms. From my point of view, it’s easier to use but not as flexible. Crosstalk allows linking of any compatible HTMLWidgets like Leaflet maps and DT tables, while echarts is limited to echarts itself. When used in Shiny applications, the complex interactions can be handled by the server functions, so echarts can also be linked without limitations.

Grab bag

This final section is a collection of various specialized graphs.

Correlation matrix

The convenient e_correlations() function combines e_heatmap() with corrMatOrder() from the corrplot package. As a specialized function, the original corrplot() function has many more options though, such as only displaying the upper of lower triangle and visualizing the results of statistical significance tests.

cor_data <- flights %>%
    select(arr_delay, dep_delay, air_time, distance, hour) %>%
    filter(complete.cases(.)) %>%
    magrittr::set_colnames(colnames(.) %>% str_replace("_", " ") %>% str_to_title()) %>%
    cor()
cor_data %>%
    e_charts() %>%
    e_correlations(
        visual_map = TRUE,
        order = "hclust",
        inRange = list(color = c("#edafda", "#eeeeee", "#59c4e6")), # scale colors
        itemStyle = list(
            borderWidth = 2,
            borderColor = "#fff"
        )
    ) %>%
    e_title("Correlation")

Wordcloud

Wordclouds fall in the same category as pie charts - a pretty, but imprecise display. A hover effect can add more detail by showing the precise word frequency. Here, I show the 50 top destinations by number of flights.

tf <- flights %>%
    count(dest, sort = TRUE) %>%
    head(50)
tf %>%
    e_color_range(n, color, colors = c("#59c4e6", "#edafda")) %>%
    e_charts() %>%
    e_cloud(
        word = dest,
        freq = n,
        color = color,
        shape = "circle",
        rotationRange = c(0, 0),
        sizeRange = c(8, 100)
    ) %>%
    e_tooltip() %>%
    e_title("Flight destinations")

Wrapup

The charts covered in this article are just a sample of the large variety of capabilities of echarts. There are many more examples on the official site and the echarts4r website. Personally, echarts4r will become my go-to for interactive HTML publications in R Markdown and Shiny. For static graphs I’ll stick with ggplot2 and vast ecosystem of extension packages, and for quick exploratory data analysis plotly’s ggplotly() is by far the easiest tool.