Quantifying Cultural Histories via Person Networks in
Wikipedia
Doron Goldfarb1,2,3, Dieter Merkl3, Maximilian Schich1,2
arXiv:1506.06580v1 [cs.SI] 22 Jun 2015
1 School of Arts, Technology, and Emerging Communication
The University of Texas at Dallas, TX, USA
2 Edith O’Donnell Institute of Art History
The University of Texas at Dallas, TX, USA
3 Institute of Software Technology and Interactive Systems
Vienna University of Technology, Austria
doron.goldfarb@gmail.com
dieter.merkl@ec.tuwien.ac.at
maximilian.schich@utdallas.edu
1 Introduction
At least since Priestley’s 1765 Chart of Biography [1], large numbers of individual person records
have been used to illustrate aggregate patterns of cultural history. Wikidata [2], the structured
database sister of Wikipedia, currently contains about 2.7 million explicit person records, across
all language versions of the encyclopedia. These individuals, notable according to Wikipedia
editing criteria, are connected via millions of hyperlinks between their respective Wikipedia articles. This situation provides us with the chance to go beyond the illustration of an idiosyncratic
subset of individuals, as in the case of Priestley.
In this work we summarize the overlap of nationalities and occupations, based on their co-occurrence in Wikidata individuals. We construct networks of co-occurring nationalities and
occupations, provide insights into their respective community structure, and apply the results
to select and color chronologically structured subsets of a large network of individuals, connected by Wikipedia hyperlinks. While the imagined communities [3] of nationality are much
more discrete in terms of co-occurrence than occupations, our quantifications reveal the existing overlap of nationality to be much less clear-cut than in the case of occupational domains. Our
work contributes to a growing body of research using biographies of notable persons to analyze
cultural processes [4]-[9].
2 Method
In our processing pipeline (cf. Figure 1), we use the Wikidata Toolkit [10] to extract 2.7 million
records about humans (instances of class Q5), in the form of person – property – value triples,
from a downloaded Wikidata JSON dump (09/02/2015). We focus on the properties country of citizenship (P27) and occupation (P106) (for counts, see Table 1A), restricting our analysis to nationalities with at least 10 and occupations with at least 100 occurrences. We construct and project
the bipartite person-value affiliation matrices to uni-partite matrices of value-co-occurrence. To
identify relevant co-occurrences, of nationalities or occupations respectively, the projected matrices are compared against a null model. Applying an established approach [11] [12], we derive
expected co-occurrence weights from an ensemble of 10,000 degree-preserving random affiliation
matrices. Co-occurrences with positive Pearson residuals are considered for further analysis
(for counts, see Table 1B).
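The projection and null-model comparison described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the edge-swap randomization, the small default ensemble size, and the textbook Pearson residual (observed - expected) / sqrt(expected) are all choices made for this example, whereas the paper follows [11] [12] with 10,000 random matrices.

```python
import itertools
import math
import random
from collections import Counter


def project(affiliations):
    """Count, for every pair of values, the persons affiliated with both."""
    counts = Counter()
    for values in affiliations.values():
        for pair in itertools.combinations(sorted(values), 2):
            counts[pair] += 1
    return counts


def randomize(affiliations, n_swaps, rng):
    """Shuffle person-value links by edge swaps that keep every degree fixed."""
    edges = [(p, v) for p, vals in affiliations.items() for v in sorted(vals)]
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (p1, v1), (p2, v2) = edges[i], edges[j]
        if p1 == p2 or v1 == v2:
            continue  # swap would change nothing
        if (p1, v2) in present or (p2, v1) in present:
            continue  # swap would create a duplicate link
        present -= {(p1, v1), (p2, v2)}
        present |= {(p1, v2), (p2, v1)}
        edges[i], edges[j] = (p1, v2), (p2, v1)
    shuffled = {}
    for p, v in edges:
        shuffled.setdefault(p, set()).add(v)
    return shuffled


def pearson_residuals(affiliations, n_samples=200, swaps_per_edge=10, seed=0):
    """Residual of each observed co-occurrence against the random ensemble."""
    rng = random.Random(seed)
    observed = project(affiliations)
    n_edges = sum(len(vals) for vals in affiliations.values())
    totals = Counter()
    current = affiliations
    for _ in range(n_samples):
        current = randomize(current, swaps_per_edge * n_edges, rng)
        totals.update(project(current))
    residuals = {}
    for pair, obs in observed.items():
        expected = totals[pair] / n_samples
        # Assumption: pairs never seen in the ensemble get an infinite residual.
        residuals[pair] = (
            (obs - expected) / math.sqrt(expected) if expected > 0 else math.inf
        )
    return residuals
```

Only pairs with a positive residual would be kept as edges of the co-occurrence network; the residual itself serves as the edge weight.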
The resulting co-occurrence networks, with residuals as edge weights, are subsequently examined
for community structure using the Louvain method [13] [14]. Detecting communities at different
granularities, we perform modularity optimization at different resolutions [15], resulting in multiple partitions with varying numbers of communities. Using these partitions we can replace the
plain co-occurrence weights in the original value-matrices with the probabilities of two values
mutually co-occurring in the same community. The resulting mutual community matrix (Figures 2 and 3) is then hierarchically clustered, with the resulting tree cut into a preset number
of clusters (Figures 4 and 5). The preset number – 28 for nationalities, and 24 for occupations
– is based on visual inspection of repeated clusterings.
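The step from multiple partitions to a mutual community matrix, and from there to a fixed number of clusters, can be illustrated as follows. This is a hedged sketch: the partitions are assumed to be given (in the paper they come from Louvain runs at different resolutions), the average-linkage rule and the distance 1 - probability are assumptions because the text does not name a linkage method, and the toy community labels are invented for illustration.

```python
from itertools import combinations


def mutual_community_matrix(partitions):
    """Fraction of partitions in which each pair of nodes shares a community."""
    nodes = sorted(partitions[0])
    share = {}
    for a, b in combinations(nodes, 2):
        together = sum(1 for part in partitions if part[a] == part[b])
        share[(a, b)] = together / len(partitions)
    return nodes, share


def average_linkage_clusters(nodes, share, n_clusters):
    """Agglomerative clustering on distance 1 - share, cut at n_clusters."""

    def dist(a, b):
        return 1.0 - share[(a, b) if a < b else (b, a)]

    clusters = [[n] for n in nodes]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # j > i, so deleting j is safe
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy partitions at three hypothetical resolutions (labels are made up).
partitions = [
    {"Austria": 0, "Hungary": 0, "Austria-Hungary": 0, "Japan": 1, "Empire of Japan": 1},
    {"Austria": 0, "Hungary": 1, "Austria-Hungary": 0, "Japan": 2, "Empire of Japan": 2},
    {"Austria": 0, "Hungary": 1, "Austria-Hungary": 1, "Japan": 2, "Empire of Japan": 2},
]
nodes, share = mutual_community_matrix(partitions)
print(average_linkage_clusters(nodes, share, 2))
```

In this toy case the three Central European labels end up in one cluster and the two Japanese labels in the other, even though no single partition groups the first three in exactly the same way each time.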
Visualizations of the backbones of the co-occurrence networks (Figures 6 and 7) show the resulting community structure in context. The network backbones are created by iteratively adding
edges with the largest residuals until the maximal giant connected components (GCC) of the
original networks are restored. For comparison, we plot the occurrence of nationalities and
occupations over time, ordered by their first occurrence while disregarding outliers in terms of
ordering (Figures 8 and 9).
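The backbone construction described above can be read as a greedy procedure, sketched below. This is one possible interpretation, not the authors' code: edges are added in order of decreasing residual until the largest connected component is as large as the giant connected component of the full network, and the helper names and the union-find bookkeeping are assumptions of this example.

```python
def _union_until(nodes, ordered_edges, stop_at=None):
    """Add edges in order; return (edges added, size of largest component)."""
    parent = {n: n for n in nodes}
    size = {n: 1 for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest = 1 if nodes else 0
    kept = []
    for a, b in ordered_edges:
        kept.append((a, b))
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            largest = max(largest, size[ra])
        if stop_at is not None and largest >= stop_at:
            break
    return kept, largest


def backbone(residuals):
    """residuals: dict mapping (node_a, node_b) to a positive residual."""
    nodes = {n for edge in residuals for n in edge}
    # Size of the giant connected component when all edges are present.
    _, gcc_size = _union_until(nodes, residuals.keys())
    # Strongest edges first; stop once the giant component is restored.
    ordered = sorted(residuals, key=residuals.get, reverse=True)
    kept, _ = _union_until(nodes, ordered, stop_at=gcc_size)
    return kept


toy = {("A", "B"): 5.0, ("B", "C"): 3.0, ("A", "C"): 1.0, ("C", "D"): 2.0}
print(backbone(toy))  # the weakest edge ("A", "C") is not needed
```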
Next, the clusters of nationalities and occupations are used to partition Wikipedia biographies
into national-community and domain-specific subsets. Hyperlinks connecting Wikipedia articles about individuals are obtained from DBpedia [16] and filtered to approximate contemporary
relationships by excluding links between individuals with birth dates more than 75 years apart.
Using hyperlinks from the English Wikipedia, we visualize the giant connected component of
the partition of individuals connected to occupations in the community of “arts, architecture,
crafts, and design” (Figure 10). Colored by nationality cluster (cf. Figures 2, 4, 6), the visualization connects 22,825 nodes with 78,447 edges. We also visualize the giant connected component
of the partition of individuals connected to nationalities in the community of “predominantly
English speaking countries” (Figure 11). Colored by occupational domain (cf. Figures 3, 5, 7),
the visualization connects 160,913 nodes with 1,004,415 edges. While the arts domain (Figure
10) seems to reflect the established narrative of art history where a sequence of nationalities
dominates at different points of time, the predominantly English speaking partition (Figure 11)
is clearly characterized by a more complex structure that precludes the construction of a simple
narrative.
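The 75-year filter on hyperlinks mentioned in this section amounts to a simple predicate on pairs of birth dates. The sketch below is an illustration under assumptions: it works on birth years rather than full dates, it drops links where either birth year is unknown (the text does not say how such cases are handled), and the identifiers are placeholders.

```python
def contemporary_links(links, birth_year, max_gap=75):
    """Keep links whose endpoints were born at most max_gap years apart."""
    kept = []
    for source, target in links:
        if source not in birth_year or target not in birth_year:
            continue  # assumption: links with unknown birth years are dropped
        if abs(birth_year[source] - birth_year[target]) <= max_gap:
            kept.append((source, target))
    return kept


# Placeholder identifiers and years, for illustration only.
birth_year = {"a": 1452, "b": 1475, "c": 1881, "e": 1527}
links = [("a", "b"), ("a", "c"), ("b", "d"), ("a", "e")]
print(contemporary_links(links, birth_year))
```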
3 Conclusion
In sum, we characterize networks of co-occurring nationalities and occupations related to Wikidata individuals. Our quantifications indicate that communities of nations derived from co-occurrence are much more complex than the rather clear-cut communities of occupational domain. This may be due to substantially more complex social processes leading to co-citizenship,
as we observe in (post)colonial ties, due to the potentially vague concept of citizenship/nationality
itself [3], as found in references to bygone and transient national constructs, or due to the considerable difference in the amount of available data (93,661 citizenship vs. 585,407 occupation
co-references). Our approach can be used to group synonyms and attributions of differing granularity, occurring due to the free nature of Wikidata.
Algorithmically mining occupational domains from a large set of individuals, we create an alternative to manually curated meta-domains of occupation, as used in multiple strains of recent
research [5] [8]. Deriving domain specific groups of individuals directly from a crowd-sourced
ecosystem, such as Wikipedia, we also provide a useful alternative (Figure 10) to using expert
curated datasets, such as the Getty Union List of Artist Names [17] as used to analyze the
domain of art history in previous work [9]. Visualizing the Wikipedia hyperlink sub-networks
of such domain specific groups of individuals reveals network patterns that would be obscured
when using the network as a whole.
References
[1] J. A. Priestley: Chart of Biography. (London: J. Johnson, 1765)
[2] D. Vrandečić, M. Krötzsch: Wikidata: A Free Collaborative Knowledgebase. Communications of the ACM 57,10 (2014) 78-85
[3] B. R. O. G. Anderson: Imagined communities: Reflections on the origin and spread of
nationalism. (London: Verso, 1991)
[4] S. Ronen, B. Gonçalves, K.Z. Hu, A. Vespignani, S. Pinker, C.A. Hidalgo: Links that speak:
the global language network and its association with global fame. Proc. Natl. Acad. Sci.
111,52 (2014) E5616-E5622
[5] A. Z. Yu, S. Ronen, K. Hu, T. Lu, C. A. Hidalgo: Pantheon: A Dataset for the Study of
Global Cultural Production. ArXiv preprint, arXiv:1502.07310v1 (2015)
[6] M. Klein, P. Konieczny: Gender Gap Through Time and Space: A Journey Through
Wikipedia Biographies and the “WIGI” Index. ArXiv preprint, arXiv:1502.03086v1 (2015)
[7] Y.-H. Eom, P. Aragon, D. Laniado, A. Kaltenbrunner, S. Vigna, D. L. Shepelyansky:
Interactions of culture and top people of wikipedia from ranking 24 language editions. PLoS
ONE (2014)
[8] M. Schich, C. Song, Y.-Y. Ahn, A. Mirsky, M. Martino, A.-L. Barabási, D. Helbing: A
network framework of cultural history. Science 345, 6196 (2014) 558-562
[9] D. Goldfarb, M. Arends, J. Froschauer, D. Merkl: Art History on Wikipedia, a Macroscopic
Observation, Proceedings of the 3rd Annual ACM Web Science Conference, (2012) 163-168
[10] M. Krötzsch, F. Erxleben, M. Günther, J. Mendez: Wikidata Toolkit: A Java library
for working with Wikidata. http://korrekt.org/talks/2014/wikimania-wikidata-toolkit.pdf,
accessed May 20th, 2015
[11] K. A. Zweig, M. Kaufmann: A systematic approach to the one-mode projection of bipartite
graphs. Social Network Analysis and Mining 1,3 (2011) 187-218
[12] Y. Gu: Ein neues Empfehlungssystem mit FDSM-basierter einseitiger Projektion und Link
Community Clustering. (Thesis: Heidelberg University, 2013)
[13] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre: Fast unfolding of communities
in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008,10 (2008)
10008
[14] V.A. Traag: Implementation of the Louvain algorithm for various methods for use with
igraph in python. https://github.com/vtraag/louvain-igraph, accessed May 20th, 2015
[15] J. Reichardt, S. Bornholdt: Partitioning and modularity of graphs with arbitrary degree
distribution. Physical Review E 76, 1 (2007) 015102
[16] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives: DBpedia: A nucleus for
a web of open data. Proceedings of the 6th International Semantic Web Conference (2007)
722-735
[17] Getty Vocabulary Program: Union List of Artist Names (The J. Paul Getty Trust, Los Angeles, 2010) http://www.getty.edu/research/tools/vocabularies/ulan/, accessed May 20th,
2015
[Figure 1 diagram: flowchart. Recoverable stage labels: Wikidata / DBpedia JSON dump; extraction of person properties; triples; triples to tables; table to bipartite matrix; affiliation matrix (person to occupation/citizenship, illustrated as a 0/1 matrix of Person 1-6 by Attr. value 1-5); Fixed Degree Sequence Model comparison; residual matrix; repeated community detection at different resolutions; mutual community matrix; hierarchical clustering; community structure; network backbone; visualization of semantic co-occurrences; ID mapping; basic node information / hyperlinks; Wikidata / DBpedia hybrid attribute / hyperlink dataset; visualization of Wikipedia hyperlink network.]
Figure 1: Data processing pipeline
Table 1: Person links to Nation/Occupation (A) and Nation/Occupation Co-occurrences (B)

Nationalities
A                   #persons    #P27 links    #nationalities
Raw data           1,318,484     1,366,777               833
Reduced data       1,317,676     1,365,600               282
1:1 References     1,271,939     1,271,939               282
1:n References        45,737        93,661               282

B                   #co-occurrences    #nodes
All                           2,100       282
Positive                      1,565       282
Backbone                        996       282

Occupations
A                   #persons    #P106 links   #occupations
Raw data           1,363,032     1,706,766           3,419
Reduced data       1,352,909     1,685,000             431
1:1 References     1,099,593     1,099,593             431
1:n References       253,316       585,407             430

B                   #co-occurrences    #nodes
All                          13,846       430
Positive                      7,641       430
Backbone                      2,964       430
Figure 2: Louvain communities of co-occurring nationalities at different resolutions
Figure 3: Louvain communities of co-occurring occupations at different resolutions
[Figure: dendrogram with nationality names as leaf labels.]
Figure 4: Hierarchical clustering of communities of co-occurring nationalities
[Figure: dendrogram with occupation names as leaf labels.]
Figure 5: Hierarchical clustering of communities of co-occurring occupations
[Figure: network diagram with nationality names as node labels.]
Figure 6: Network of national overlap through co-occurrence, colored by community
[Figure: network diagram with occupation names as node labels.]
Figure 7: Network of co-occurring occupations, colored by community
Figure 8: Nationalities over time based on person life-spans
Figure 9: Occupations over time based on person life-spans
Figure 10: Hyperlink network of English Wikipedia biographies having occupations in “arts,
architecture, crafts and design”, colored by nationality community corresponding to the colors
in Figures 2, 4, 6
Figure 11: Hyperlink network of English Wikipedia biographies with a nationality in the “predominantly English speaking” community, colored by occupation community corresponding to
the colors in Figures 3, 5, 7