Cryptocurrencies have captivated the financial world, bringing immense joy to some and heartache to many. One thing is certain: when you get it right and strike gold with cryptocurrencies, it can set you up for life (depending on your frugality and investment amount). As a data scientist, I know the thrill of diving into this dynamic market.
This post, however, isn’t about trading strategies or price forecasting. Instead, it explores the capabilities of R for web scraping using the rvest package. Since I like to spend time creating good visuals, I will also build a few visualizations along the way.
Getting Our Data
First, I scraped data from CoinMarketCap using the URL https://coinmarketcap.com/all/views/all/. The code reads every table on the page and keeps the third one, which holds the coin listing with columns such as name, symbol, market cap, and price.
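library(pacman)
# p_load() installs any missing package, then attaches it
p_load(rvest, tidyverse, magick, ggimage)
url <- "https://coinmarketcap.com/all/views/all/"
# Parse every <table> on the page into a list of tibbles
crypto <- read_html(url) |>
  html_nodes("table") |>
  html_table()
# The coin listing is the third table on the page
crypto <- crypto[[3]]
head(crypto)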
# A tibble: 6 × 1,001
Rank Name Symbol `Market Cap` Price `Circulating Supply` `Volume(24h)`
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 BTCBitcoin BTC $2.02T$2,01… $101… 19,811,250 BTC $59,951,582,…
2 2 ETHEthereum ETH $408.67B$40… $3,3… 120,500,593 ETH * $27,730,173,…
3 3 XRPXRP XRP $186.99B$18… $3.25 57,564,441,898 XRP * $24,012,515,…
4 4 USDTTether… USDT $137.43B$13… $0.9… 137,460,714,171 USD… $131,028,237…
5 5 SOLSolana SOL $104.45B$10… $214… 486,611,359 SOL * $6,401,898,0…
6 6 BNBBNB BNB $103.46B$10… $718… 144,005,881 BNB * $1,945,057,1…
# ℹ 994 more variables: `% 1h` <chr>, `% 24h` <chr>, `% 7d` <chr>, `` <lgl>,
# `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
# `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
# `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
# `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
# `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
# `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, …
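Picking the table by position (`[[3]]`) depends on the page layout at the time of scraping. As a hedged alternative, the sketch below selects a table by its column names instead. The inline HTML is made up for illustration and `wanted` is a hypothetical choice of columns; it is not the actual CoinMarketCap markup.

```r
library(rvest)

# Hypothetical page with two tables; only the second is the coin listing.
page <- minimal_html('
  <table><tr><th>Metric</th><th>Value</th></tr>
         <tr><td>Coins</td><td>2</td></tr></table>
  <table><tr><th>Rank</th><th>Name</th><th>Symbol</th><th>Price</th></tr>
         <tr><td>1</td><td>Bitcoin</td><td>BTC</td><td>$101,752.81</td></tr>
         <tr><td>2</td><td>Ethereum</td><td>ETH</td><td>$3,386.22</td></tr></table>
')

# Parse every table into a list of tibbles.
tables <- page |>
  html_elements("table") |>
  html_table()

# Keep the first table that contains every column we need.
wanted <- c("Name", "Symbol", "Price")
has_cols <- vapply(tables, function(tbl) all(wanted %in% names(tbl)), logical(1))
listing <- tables[[which(has_cols)[1]]]
```

If the site adds or reorders tables, this lookup keeps working as long as the column headers stay the same.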
Data Exploration and Cleaning
I started by cleaning the column names using janitor::clean_names() and selecting the columns I needed. Then, I ensured data types were appropriate by converting market_cap and price to numeric values.
crypto <- crypto |>
  janitor::clean_names() |>
  select(name, symbol, market_cap, price)
glimpse(crypto)
Rows: 200
Columns: 4
$ name <chr> "BTCBitcoin", "ETHEthereum", "XRPXRP", "USDTTether USDt", "…
$ symbol <chr> "BTC", "ETH", "XRP", "USDT", "SOL", "BNB", "DOGE", "USDC", …
$ market_cap <chr> "$2.02T$2,019,406,074,818", "$408.67B$408,666,757,899", "$1…
$ price <chr> "$101,752.81", "$3,386.22", "$3.25", "$0.9998", "$214.65", …
Is the data structure as expected? Not quite. The output above shows that market_cap and price are stored as character, whereas both should be numeric (double).
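crypto <- crypto |>
  mutate(
    # Drop the abbreviated prefix (e.g. "$2.02T"), keeping the full figure
    market_cap = str_remove_all(market_cap, r"--[\$[\d.]+[TB]]--"),
    # parse_number() strips "$" and "," before converting to double
    market_cap = parse_number(market_cap),
    price = parse_number(price)
  )
str(crypto)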
tibble [200 × 4] (S3: tbl_df/tbl/data.frame)
$ name : chr [1:200] "BTCBitcoin" "ETHEthereum" "XRPXRP" "USDTTether USDt" ...
$ symbol : chr [1:200] "BTC" "ETH" "XRP" "USDT" ...
$ market_cap: num [1:200] 2.02e+12 4.09e+11 1.87e+11 1.37e+11 1.04e+11 ...
$ price : num [1:200] 1.02e+05 3.39e+03 3.25 1.00 2.15e+02 ...
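To make the two cleaning steps easier to follow, here is a small sketch that applies them to two raw market_cap strings copied from the glimpse() output earlier in this section. The intermediate names `raw_cap`, `full_cap`, and `cap` are only for illustration.

```r
library(stringr)
library(readr)

# Two raw values as they appear in the scraped market_cap column.
raw_cap <- c("$2.02T$2,019,406,074,818", "$408.67B$408,666,757,899")

# Step 1: remove the abbreviated prefix ("$2.02T", "$408.67B").
# The pattern needs a trailing T or B, so the full figure is left alone.
full_cap <- str_remove_all(raw_cap, r"--[\$[\d.]+[TB]]--")

# Step 2: strip "$" and "," and convert to double.
cap <- parse_number(full_cap)
```

Note that the pattern only covers the T and B suffixes, which is enough for the largest coins; a value abbreviated with another suffix would pass through step 1 unchanged.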
Handling Missing Data
I used the skimr package to identify missing data. The code then filtered the crypto data frame to keep only complete rows with values in all columns.
skimr::n_missing(crypto)
[1] 360
skimr::skim(crypto)
Name | crypto |
Number of rows | 200 |
Number of columns | 4 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 3 | 37 | 0 | 200 | 0 |
symbol | 0 | 1 | 0 | 4 | 180 | 21 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
market_cap | 180 | 0.1 | 1.631606e+11 | 4.470852e+11 | 9476350139 | 1.421979e+10 | 1.898691e+10 | 1.03712e+11 | 2.019406e+12 | ▇▁▁▁▁ |
price | 180 | 0.1 | 5.338840e+03 | 2.270619e+04 | 0 | 8.700000e-01 | 5.180000e+00 | 1.55360e+02 | 1.017528e+05 | ▇▁▁▁▁ |
crypto <- crypto[complete.cases(crypto), ]
skimr::skim_without_charts(crypto)
Name | crypto |
Number of rows | 20 |
Number of columns | 4 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1 | 6 | 15 | 0 | 20 | 0 |
symbol | 0 | 1 | 3 | 4 | 0 | 20 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
market_cap | 0 | 1 | 1.631606e+11 | 4.470852e+11 | 9476350139 | 1.421979e+10 | 1.898691e+10 | 1.03712e+11 | 2.019406e+12 |
price | 0 | 1 | 5.338840e+03 | 2.270619e+04 | 0 | 8.700000e-01 | 5.180000e+00 | 1.55360e+02 | 1.017528e+05 |
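One detail in the skim output is worth spelling out: the symbol column had 180 empty strings, not 180 NA values, and complete.cases() only looks at NA. In the scraped table the empty symbols fall on the same rows as the missing numeric values, so the filter removes them anyway. The sketch below uses a made-up toy data frame to show the difference.

```r
# Toy data (made up): one NA price and one empty-string symbol on the same row.
toy <- data.frame(
  symbol = c("BTC", "", "ETH"),
  price  = c(101752.81, NA, 3386.22)
)

n_na    <- sum(is.na(toy))        # counts NA only; the empty string is ignored
n_empty <- sum(toy$symbol == "")  # empty strings need their own check
kept    <- toy[complete.cases(toy), ]  # drops the row because price is NA
```

If empty strings could appear on rows with no NA, converting them first (for example with dplyr::na_if(symbol, "")) would make the filter catch them as well.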
Data Visualization
Now, let’s visualize the data! I created a donut chart to represent the market cap distribution of the top six cryptocurrencies. The remaining currencies are grouped into an “Other” category.
The code calculates the market cap share for each currency, along with cumulative values and labels for the chart.
crypto <- crypto |>
  mutate(
    new_sym = fct_lump(
      symbol,
      n = 6, w = market_cap
    )
  )
crypto |>
  summarize(
    .by = new_sym,
    market_cap = sum(market_cap),
    count = n()
  )
Table 1 shows that the market cap has been compressed into seven groups: one for each of the top six cryptocurrencies, and a new category, Other, into which the remaining 14 are lumped.
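For readers new to weighted lumping, here is a minimal sketch of how fct_lump() behaves with the w argument. The symbols and weights are made up; only the function call mirrors the one used on the real data.

```r
library(forcats)

# Made-up symbols and weights, purely for illustration.
sym <- c("AAA", "BBB", "CCC", "DDD", "EEE")
cap <- c(50, 30, 10, 6, 4)

# Keep the 2 levels with the largest total weight; lump the rest into "Other".
lumped <- fct_lump(sym, n = 2, w = cap)

levels(lumped)
```

Without w, levels are ranked by how often they occur. Since every symbol appears exactly once in the crypto data, the weight is what makes the ranking follow market cap.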
crypto_summary <- crypto |>
  summarize(
    .by = new_sym,
    market_cap = sum(market_cap)
  ) |>
  mutate(
    # Share of total market cap, in percent
    prop = market_cap / sum(market_cap) * 100,
    market_cap = round(market_cap / 1e9, 2),
    market_cap = paste0(market_cap, " B"),
    # Each slice spans ymin..ymax on the cumulative percentage scale
    ymax = cumsum(prop),
    ymin = c(0, head(ymax, n = -1)),
    lab_pos = (ymax + ymin) / 2,
    label = paste0(new_sym, "\nValue: ", round(prop, 2), "%")
  )
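The ymin, ymax, and lab_pos columns define where each slice of the donut starts, ends, and is labelled. A small worked example with made-up shares shows the arithmetic in isolation.

```r
# Made-up shares that sum to 100.
prop <- c(50, 30, 20)

ymax    <- cumsum(prop)              # upper edge of each slice: 50 80 100
ymin    <- c(0, head(ymax, n = -1))  # lower edge = previous upper edge: 0 50 80
lab_pos <- (ymax + ymin) / 2         # midpoint of each slice: 25 65 90
```

Each slice is then a rectangle from ymin to ymax, which coord_polar(theta = "y") bends into an arc.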
Cryptocurrencies by Market Capitalization
ggplot(
  crypto_summary,
  aes(xmin = 3, xmax = 4, ymin = ymin, ymax = ymax, fill = new_sym)
) +
  geom_rect() +
  expand_limits(x = c(1.5, 4)) +
  coord_polar(theta = "y", start = 1) +
  scale_fill_brewer(palette = "YlOrRd") +
  theme_void() +
  ggrepel::geom_label_repel(
    x = 3,
    aes(y = lab_pos, label = label),
    size = 2,
    col = "gray3"
  ) +
  theme_void() +
  ggtitle("Market Cap of Top Cryptocurrencies") +
  theme(
    legend.position = "none"
  )
As shown in Figure 1, BTC and ETH clearly dominate the crypto space by market capitalization. Assets such as BNB, SOL, and USDT are slowly increasing their dominance, with shares ranging from 3% to 6%.
Prices of the Top 20 Cryptocurrencies
I downloaded logos for the top 20 cryptocurrencies and added them as an “images” column to the data frame.
The code then creates a bar chart to visualize individual cryptocurrency prices, with labels indicating the price for each currency.
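images <- list.files(path = "images", full.names = TRUE)
# bind_cols() pairs rows by position, so the logo files must sort
# in the same order as the symbols
crypto <- crypto |>
  arrange(symbol) |>
  bind_cols("images" = images)
# Wrap each path in an <img> tag so ggtext can render it as an axis label
crypto_img <- crypto |>
  mutate(
    images = paste0("<img src='", images, "' width='15'/>")
  )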
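Pairing logos with symbols by position works only while the sorted file names line up with the sorted symbols. A hedged alternative is to match on a key derived from the file name. The paths below are hypothetical and assume each logo is named after its symbol, which may not hold for the actual images folder.

```r
# Hypothetical logo paths, deliberately not in symbol order.
logo_files <- c("images/eth.png", "images/btc.png", "images/xrp.png")
symbols    <- c("BTC", "ETH", "XRP")

# Derive a lookup key from each file name: "images/btc.png" -> "BTC".
file_key <- toupper(tools::file_path_sans_ext(basename(logo_files)))

# match() gives, for each symbol, the position of its logo file.
logos <- logo_files[match(symbols, file_key)]
```

A symbol with no matching file gets NA here, which makes a missing logo easy to spot before plotting.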
Now we can visualize the prices of each asset.
crypto_img |>
  ggplot(aes(price, fct_reorder(images, price))) +
  geom_col(
    width = .1,
    fill = "#FBD25B"
  ) +
  geom_label(
    aes(label = round(price, 2)),
    col = "white",
    fill = "#AE1D0E",
    size = 2.5
  ) +
  labs(
    title = "Price of the Top 20 Cryptocurrencies"
  ) +
  scale_x_log10(labels = scales::label_number()) +
  theme_minimal() +
  theme(
    axis.text.y = ggtext::element_markdown(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    plot.title = element_text(hjust = .5, color = "#AE1D0E")
  )
Conclusion
In this project, I successfully scraped cryptocurrency data, cleaned it for analysis, and created visualizations to explore market cap distribution and individual cryptocurrency prices. This process demonstrates the power of web scraping and data visualization in R.