Exploring Google Scholar coauthorship


I woke up today to read Maëlle Salmon’s latest blog entry, in which she scraped her own mathematical tree. While running through her code, I had the idea of scraping the list of coauthors shown on a Google Scholar profile. With this, I could visualize the coauthorship network of important scientists and explore whether they have closed or open collaborations.

I sat down this morning and created the coauthornetwork package that allows you to do just that! It’s actually very simple. First, install it with the usual:


There are two functions: grab_network and plot_coauthors. The first scrapes a Google Scholar profile and returns a data frame with the profile’s author, their coauthors and the coauthors of their coauthors (what?). More simply, by default, the data frame covers this chain:

Google Scholar Profile –> Coauthors –> Coauthors

It’s not that hard after all. The only thing you need to provide is the end of the URL of a Google Scholar profile. For example, a typical URL looks like this: https://scholar.google.com/citations?user=F0kCgy8AAAAJ&hl=en. grab_network will accept the latter part of the URL, namely: citations?user=F0kCgy8AAAAJ&hl=en. Let’s test it:


library(coauthornetwork)

# Scrape the profile, its coauthors and their coauthors
network <- grab_network("citations?user=F0kCgy8AAAAJ&hl=en")
network
## # A tibble: 21 x 4
##    author       href                 coauthors     coauthors_href          
##    <fct>        <fct>                <fct>         <fct>                   
##  1 Hans-Peter ~ citations?user=F0kC~ Melinda Mills /citations?user=HX9KQ5M~
##  2 Hans-Peter ~ citations?user=F0kC~ Karl Ulrich ~ /citations?user=iuzu9xw~
##  3 Hans-Peter ~ citations?user=F0kC~ Florian Schu~ /citations?user=MWCt6hQ~
##  4 Hans-Peter ~ citations?user=F0kC~ Yossi Shavit  /citations?user=brfWXKM~
##  5 Hans-Peter ~ citations?user=F0kC~ Jan Skopek    /citations?user=Mmo1hFk~
##  6 Melinda Mil~ /citations?user=HX9~ Hans-Peter B~ /citations?user=F0kCgy8~
##  7 Melinda Mil~ /citations?user=HX9~ Tanturri Mar~ /citations?user=xN3XevQ~
##  8 Melinda Mil~ /citations?user=HX9~ René Veenstra /citations?user=_9OVrqM~
##  9 Melinda Mil~ /citations?user=HX9~ Francesco C.~ /citations?user=-JR6yo4~
## 10 Karl Ulrich~ /citations?user=iuz~ Paul B. Balt~ /citations?user=vcOZeDg~
## # ... with 11 more rows
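
If you have the full profile URL copied from the browser, trimming it by hand gets tedious. A minimal sketch of a helper that keeps only the part grab_network expects; scholar_id_from_url is a hypothetical name of my own, not a function from the package, and it assumes the URL starts with a scholar.google.* host:

```r
# Hypothetical helper (not part of coauthornetwork): drop the scheme and
# host so that only the "citations?user=..." tail remains.
scholar_id_from_url <- function(url) {
  sub("^https?://scholar\\.google\\.[a-z.]+/", "", url)
}

scholar_id <- scholar_id_from_url(
  "https://scholar.google.com/citations?user=F0kCgy8AAAAJ&hl=en"
)
scholar_id
```

The result can then be passed straight to grab_network. A string that is already just the tail is returned unchanged.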

In the output above, the main author is Hans-Peter Blossfeld, a well-known sociologist. Melinda Mills is one of his coauthors, so her coauthors are listed right after his. grab_network also has the n_coauthors argument to control how many coauthors are extracted (limited to 20 by Google Scholar). You’ll notice that once you go over 10 coauthors, the visualization starts to get very messy.

plot_coauthors(network, size_labels = 3)

Cool, eh? We can play around with more coauthors as well.

plot_coauthors(grab_network("citations?user=F0kCgy8AAAAJ&hl=en", n_coauthors = 7), size_labels = 3)
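
Besides eyeballing the plot, the data frame itself can give a rough answer to the closed-versus-open question. The sketch below is only one possible measure, and it runs on a toy data frame with placeholder names (Main, A, B, C, X) rather than real scraped results. It assumes the same author and coauthors columns that grab_network returns, and that the first row belongs to the profile owner:

```r
# Toy data with the same author/coauthors columns as the grab_network output.
# The names are placeholders, not real scraped results.
toy <- data.frame(
  author    = c("Main", "Main", "Main", "A", "A", "B", "B", "C", "C"),
  coauthors = c("A", "B", "C", "Main", "B", "Main", "A", "Main", "X"),
  stringsAsFactors = FALSE
)

# Assumption: the first row holds the profile owner.
# as.character() keeps this working if the columns are factors.
main <- as.character(toy$author[1])

# The owner's circle: the owner plus their direct coauthors
circle <- c(main, as.character(toy$coauthors[toy$author == main]))

# Rows that belong to the first-level coauthors
second <- toy[toy$author != main, ]

# Share of each coauthor's own coauthors who are already inside the circle
closure <- tapply(as.character(second$coauthors) %in% circle,
                  as.character(second$author), mean)
closure
```

In this toy example A and B score 1 (everyone they write with is already in the circle) while C scores 0.5. Values close to 1 hint at a closed group, lower values at a more open one.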

Hope you enjoy it!
