Webscraping and Visualizing the Top CryptoCurrencies

Top Cryptocurrencies According to Price and Market Cap

Webscraping
Data Visualization
Published: July 4, 2024

Cryptocurrencies have captivated the financial world, bringing immense joy to some and heartache to many. One thing is certain: when you get it right and strike gold with cryptocurrencies, it can set you up for life (depending on your frugality and investment amount). As a data scientist, I know the thrill of diving into this dynamic market.

This post, however, isn’t about trading strategies or price forecasting. Instead, it explores the capabilities of R in web scraping using the rvest package. Since I like spending time on good visuals, I will also build a couple of visualizations from the scraped data.

Getting Our Data

First, I scraped data from CoinMarketCap using the URL https://coinmarketcap.com/all/views/all/. The code reads every table on the page and keeps the third one, which holds the listing. The relevant columns (name, symbol, market cap, and price) are selected in the next step.

library(pacman)
p_load(rvest, tidyverse, magick, ggimage)

url <- "https://coinmarketcap.com/all/views/all/"

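# Read the page and parse every <table> element into a tibble
# (html_elements() is the current name for the superseded html_nodes())
crypto <- read_html(url) |> 
  html_elements("table") |> 
  html_table()

# The third table on the page holds the cryptocurrency listing
crypto <- crypto[[3]]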

head(crypto)
# A tibble: 6 × 1,001
   Rank Name        Symbol `Market Cap` Price `Circulating Supply` `Volume(24h)`
  <int> <chr>       <chr>  <chr>        <chr> <chr>                <chr>        
1     1 BTCBitcoin  BTC    $1.34T$1,34… $67,… 19,769,253 BTC       $36,029,579,…
2     2 ETHEthereum ETH    $317.47B$31… $2,6… 120,388,000 ETH *    $16,245,728,…
3     3 USDTTether… USDT   $119.96B$11… $0.9… 120,010,326,614 USD… $59,088,076,…
4     4 BNBBNB      BNB    $87.18B$87,… $597… 145,931,653 BNB *    $1,682,348,8…
5     5 SOLSolana   SOL    $72.15B$72,… $153… 469,830,827 SOL *    $2,087,634,9…
6     6 USDCUSDC    USDC   $34.73B$34,… $0.9… 34,737,755,734 USDC… $5,662,491,0…
# ℹ 994 more variables: `% 1h` <chr>, `% 24h` <chr>, `% 7d` <chr>, `` <lgl>,
#   `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
#   `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
#   `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
#   `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
#   `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
#   `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, …
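
The table-extraction step can be tried without hitting the network. The following is a minimal sketch on a made-up two-table page built with rvest's minimal_html(); the HTML and the idea of picking the table by a required column name (rather than by position, as the post does) are illustrative assumptions, not part of the original workflow.

```r
library(rvest)

# Hypothetical page with two tables, standing in for the real site
page <- minimal_html('
  <table><tr><th>x</th></tr><tr><td>1</td></tr></table>
  <table>
    <tr><th>Name</th><th>Market Cap</th></tr>
    <tr><td>BTCBitcoin</td><td>$1.34T$1,341,571,887,495</td></tr>
    <tr><td>ETHEthereum</td><td>$317.47B$317,468,775,558</td></tr>
  </table>
')

# html_table() on a set of <table> nodes returns a list of tibbles
tables <- page |>
  html_elements("table") |>
  html_table()

# Select the table by a column it must contain instead of a fixed index
has_cap <- vapply(tables, function(t) "Market Cap" %in% names(t), logical(1))
listing <- tables[[which(has_cap)[1]]]

print(listing)
```

Selecting by column name is slightly more robust than a hard-coded index if the page layout changes, at the cost of assuming the header text stays the same.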

Data Exploration and Cleaning

I started by cleaning the column names using janitor::clean_names() and selecting the columns I needed. Then, I ensured data types were appropriate by converting market_cap and price to numeric values.

crypto <- crypto |> 
  janitor::clean_names() |> 
  select(name, symbol, market_cap, price)

glimpse(crypto)
Rows: 200
Columns: 4
$ name       <chr> "BTCBitcoin", "ETHEthereum", "USDTTether USDt", "BNBBNB", "…
$ symbol     <chr> "BTC", "ETH", "USDT", "BNB", "SOL", "USDC", "XRP", "DOGE", …
$ market_cap <chr> "$1.34T$1,341,571,887,495", "$317.47B$317,468,775,558", "$1…
$ price      <chr> "$67,861.54", "$2,637.05", "$0.9996", "$597.39", "$153.56",…

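Is the data structure as expected? Not quite: market_cap and price are stored as character and should be numeric (double). The market_cap strings also carry each value twice, an abbreviated form followed by the full figure (for example “$1.34T$1,341,571,887,495”), so the code strips the abbreviated prefix with a regular expression before parsing what remains as a number.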

crypto <- crypto |> 
  mutate(
    market_cap = str_remove_all(market_cap, r"--[\$[\d.]+[TB]]--"),
    market_cap = parse_number(market_cap),
    price = parse_number(price)
  )

str(crypto)
tibble [200 × 4] (S3: tbl_df/tbl/data.frame)
 $ name      : chr [1:200] "BTCBitcoin" "ETHEthereum" "USDTTether USDt" "BNBBNB" ...
 $ symbol    : chr [1:200] "BTC" "ETH" "USDT" "BNB" ...
 $ market_cap: num [1:200] 1.34e+12 3.17e+11 1.20e+11 8.72e+10 7.21e+10 ...
 $ price     : num [1:200] 67862 2637 1 597 154 ...
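
To see what the regular expression does, here is a small sketch on two market cap strings copied from the glimpse() output above. It applies the same pattern and the same stringr and readr functions as the post; only the variable names (raw_cap, stripped, cap) are made up for the illustration.

```r
library(stringr)
library(readr)

# Two raw values as they come out of the scraped table
raw_cap <- c("$1.34T$1,341,571,887,495", "$317.47B$317,468,775,558")

# Same pattern as in the post: a dollar sign, digits or dots, then T or B.
# The r"--[ ... ]--" raw-string syntax avoids doubling the backslashes.
pattern <- r"--[\$[\d.]+[TB]]--"

# Only the abbreviated prefix matches; the full figure has commas, not T/B
stripped <- str_remove_all(raw_cap, pattern)
print(stripped)

# parse_number() then drops the dollar sign and the grouping commas
cap <- parse_number(stripped)
print(cap)
```

Note that the pattern only covers the T and B suffixes seen in this data; values abbreviated with another suffix would pass through unchanged and need the character class extended.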

Handling Missing Data

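I used the skimr package to identify missing data. n_missing() reports 360 missing values, which skim() breaks down as 180 rows missing both market_cap and price; the same 180 rows also have an empty symbol. The code then filters the crypto data frame to keep only complete rows, leaving the 20 rows that have values in all columns.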

skimr::n_missing(crypto)
[1] 360

skimr::skim(crypto)
Data summary
Name crypto
Number of rows 200
Number of columns 4
_______________________
Column type frequency:
character 2
numeric 2
________________________
Group variables None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
name                   0              1    3   37      0       200           0
symbol                 0              1    0    4    180        21           0

Variable type: numeric

skim_variable  n_missing  complete_rate          mean            sd          p0           p25           p50           p75          p100  hist
market_cap           180            0.1  1.064536e+11  2.995398e+11  5413880128  6.976564e+09  1.268934e+10  4.408795e+10  1.341572e+12  ▇▁▁▁▁
price                180            0.1  3.587950e+03  1.514005e+04           0  8.900000e-01  5.010000e+00  9.246000e+01  6.786154e+04  ▇▁▁▁▁

crypto <- crypto[complete.cases(crypto), ]
skimr::skim_without_charts(crypto)
Data summary
Name crypto
Number of rows 20
Number of columns 4
_______________________
Column type frequency:
character 2
numeric 2
________________________
Group variables None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
name                   0              1    6   17      0        20           0
symbol                 0              1    3    4      0        20           0

Variable type: numeric

skim_variable  n_missing  complete_rate          mean            sd          p0           p25           p50           p75          p100
market_cap             0              1  1.064536e+11  2.995398e+11  5413880128  6.976564e+09  1.268934e+10  4.408795e+10  1.341572e+12
price                  0              1  3.587950e+03  1.514005e+04           0  8.900000e-01  5.010000e+00  9.246000e+01  6.786154e+04
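
The filtering step is easy to reproduce on a toy data frame. The sketch below uses base R only, with made-up rows that mimic the pattern in the skim output: an empty symbol alongside NA in the numeric columns. It illustrates that complete.cases() reacts to NA values, not to empty strings, so in this data the rows are dropped because of their missing numeric values.

```r
# Toy data mimicking the scraped table: two filled rows, two empty ones
toy <- data.frame(
  symbol     = c("BTC", "ETH", "", ""),
  market_cap = c(1.34e12, 3.17e11, NA, NA),
  price      = c(67861.54, 2637.05, NA, NA)
)

# NA count per column, and the overall total across the data frame
na_per_column <- colSums(is.na(toy))
na_total <- sum(is.na(toy))
print(na_per_column)
print(na_total)

# Empty strings are not NA, so symbol contributes nothing to the counts
empty_symbols <- sum(toy$symbol == "")
print(empty_symbols)

# complete.cases() keeps the rows without any NA
kept <- toy[complete.cases(toy), ]
print(kept)
```

If a column could be empty without the others being NA, the empty strings would have to be converted to NA first for complete.cases() to catch them.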

Data Visualization

Now, let’s visualize the data! I created a donut chart to represent the market cap distribution of the top six cryptocurrencies. The remaining currencies are grouped into an “Other” category.

The code calculates the market cap share for each currency, along with cumulative values and labels for the chart.

crypto <- crypto |> 
  mutate(
    new_sym = fct_lump(
      symbol, n = 6, w = market_cap
    )
  )

crypto |> 
  summarize(
    .by = new_sym,
    market_cap = sum(market_cap),
    count = n()
  )
Table 1: Top Six Cryptocurrencies according to Market Capitalization
# A tibble: 7 × 3
  new_sym    market_cap count
  <fct>           <dbl> <int>
1 BTC     1341571887495     1
2 ETH      317468775558     1
3 USDT     119962189339     1
4 BNB       87178499276     1
5 SOL       72148313314     1
6 USDC      34734495731     1
7 Other    156008218228    14

Table 1 shows that the 20 assets have been collapsed into seven groups: the six largest cryptocurrencies by market cap, each in its own group, and the remaining 14 lumped together into a new category, Other.
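
Weighted lumping is easier to follow on a tiny example. The sketch below uses forcats::fct_lump_n(), the more specific function that forcats now recommends over the general fct_lump(); the five symbols and their weights are invented for the illustration.

```r
library(forcats)

# Hypothetical assets and market caps (arbitrary units)
symbol <- c("A", "B", "C", "D", "E")
cap    <- c(50, 30, 10, 6, 4)

# Keep the 2 levels with the largest total weight, lump the rest
lumped <- fct_lump_n(symbol, n = 2, w = cap)
print(lumped)

# Total weight per resulting group, in level order
group_cap <- tapply(cap, lumped, sum)
print(group_cap)
```

Without the w argument the function would rank levels by how often they occur, which is useless here because every symbol appears exactly once.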

crypto_summary <- crypto |> 
  summarize(
    .by = new_sym,
    market_cap = sum(market_cap)
  ) |> 
  mutate(
    prop = market_cap/sum(market_cap) * 100,
    market_cap = round(market_cap/1e9, 2),
    market_cap = paste0(market_cap, " B"),
    ymax = cumsum(prop),
    ymin = c(0, head(ymax, n = -1)),
    lab_pos = (ymax + ymin)/2,
    label = paste0(new_sym, "\nValue: ",round(prop, 2), "%")
  )
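
The donut geometry is plain cumulative arithmetic, and it can be checked in base R with the market caps from Table 1. In this sketch the vector name cap and the data frame donut are made up; the formulas are the ones used in the mutate() call above. Each group gets a band from ymin to ymax on a 0 to 100 scale, and its label sits at the midpoint of the band.

```r
# Market caps from Table 1, in the same order
cap <- c(
  BTC   = 1341571887495,
  ETH   = 317468775558,
  USDT  = 119962189339,
  BNB   = 87178499276,
  SOL   = 72148313314,
  USDC  = 34734495731,
  Other = 156008218228
)

prop    <- cap / sum(cap) * 100      # share of the total, in percent
ymax    <- cumsum(prop)              # upper edge of each band
ymin    <- c(0, head(ymax, n = -1))  # lower edge = previous upper edge
lab_pos <- (ymax + ymin) / 2         # label at the middle of the band

donut <- data.frame(
  group   = names(cap),
  prop    = round(prop, 2),
  ymin    = round(ymin, 2),
  ymax    = round(ymax, 2),
  lab_pos = round(lab_pos, 2)
)
print(donut, row.names = FALSE)
```

The last ymax is 100 by construction, and each band's height equals the group's share, which is what lets coord_polar() turn the stacked rectangles into ring segments of the right size.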

Cryptocurrencies by Market Capitalization

ggplot(
  crypto_summary,
  aes(xmin = 3, xmax = 4, ymin = ymin, ymax = ymax, fill = new_sym)
) +
  # One rectangle per group; polar coordinates bend the stack into a ring
  geom_rect() +
  # Extending x below xmin leaves the hole in the middle of the donut
  expand_limits(x = c(1.5, 4)) +
  coord_polar(theta = "y", start = 1) +
  scale_fill_brewer(palette = "YlOrRd") +
  ggrepel::geom_label_repel(
    x = 3,
    aes(y = lab_pos, label = label),
    size = 2,
    col = "gray3"
  ) +
  theme_void() +
  ggtitle("Market Cap of Top Cryptocurrencies") +
  theme(
    legend.position = "none"
  )
Figure 1: Crypto Asset Market Cap

As shown in Figure 1, Bitcoin and Ethereum clearly dominate the crypto space in market capitalization, with roughly 63% and 15% of the combined market cap of these 20 assets. USDT, BNB, and SOL follow at a distance, each holding between 3% and 6%.

Top 20 Cryptocurrencies Price

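I downloaded logos for the top 20 cryptocurrencies and added them as an “images” column to the data frame. The logos are paired with the rows by position: the data is sorted by symbol and list.files() returns paths in alphabetical order, so this step assumes the files in the images folder sort in the same order as the symbols.

The code then creates a bar chart of individual cryptocurrency prices on a log scale, with the logos as axis labels and a label showing the price of each currency.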

images <- list.files(path = "images", full.names = TRUE)

crypto <- crypto |> 
  arrange(symbol) |> 
  bind_cols("images" = images)

crypto_img <- crypto |> 
  mutate(
    images = paste0("<img src='", images, "' width='15'/>")
  )
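
Pairing logos with rows by position works as long as the file order and the symbol order agree. As a possible safeguard, the sketch below matches files to symbols by name instead, using base R only. It assumes each logo is named after its symbol (for example images/BTC.png); that naming scheme and the three paths are hypothetical, not taken from the original project.

```r
# Symbols in whatever order the data frame happens to be in
symbols <- c("ETH", "BTC", "SOL")

# Hypothetical logo paths, as list.files(full.names = TRUE) might return them
files <- c("images/BTC.png", "images/ETH.png", "images/SOL.png")

# File name without folder and extension, used as the lookup key
key <- tools::file_path_sans_ext(basename(files))

# For each symbol, find the file whose key matches it
matched <- files[match(symbols, key)]

# Fail early if any symbol has no logo
stopifnot(!anyNA(matched))

# Same markdown image tag as in the post, built from the matched paths
labels <- paste0("<img src='", matched, "' width='15'/>")
print(labels)
```

With this approach a missing or extra file raises an error instead of silently shifting every logo by one row.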

Now we can visualize the prices of each asset.

crypto_img |> 
  ggplot(aes(price, fct_reorder(images, price))) +
  geom_col(
    width = .1,
    fill = "#FBD25B"
  ) +
  geom_label(
    aes(label = round(price, 2)),
    col = "white",
    fill = "#AE1D0E",
    size = 2.5
  ) +
  labs(
    title = "Price of the Top 20 Cryptocurrencies"
  ) +
  scale_x_log10(labels = scales::label_number()) +
  theme_minimal() +
  theme(
    axis.text.y = ggtext::element_markdown(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    plot.title = element_text(hjust = .5, color = "#AE1D0E")
  )

Conclusion

In this project, I successfully scraped cryptocurrency data, cleaned it for analysis, and created visualizations to explore market cap distribution and individual cryptocurrency prices. This process demonstrates the power of web scraping and data visualization in R.