Quantifying Cultural Histories via Person Networks in Wikipedia

Doron Goldfarb (1,2,3), Dieter Merkl (3), Maximilian Schich (1,2)

(1) School of Arts, Technology, and Emerging Communication, The University of Texas at Dallas, TX, USA
(2) Edith O’Donnell Institute of Art History, The University of Texas at Dallas, TX, USA
(3) Institute of Software Technology and Interactive Systems, Vienna University of Technology, Austria

doron.goldfarb@gmail.com, dieter.merkl@ec.tuwien.ac.at, maximilian.schich@utdallas.edu

arXiv:1506.06580v1 [cs.SI] 22 Jun 2015

1 Introduction

At least since Priestley’s 1765 Chart of Biography [1], large numbers of individual person records have been used to illustrate aggregate patterns of cultural history. Wikidata [2], the structured database sister of Wikipedia, currently contains about 2.7 million explicit person records across all language versions of the encyclopedia. These individuals, notable according to Wikipedia editing criteria, are connected via millions of hyperlinks between their respective Wikipedia articles. This situation gives us the chance to go beyond the illustration of an idiosyncratic subset of individuals, as in the case of Priestley.

In this work we summarize the overlap of nationalities and occupations, based on their co-occurrence in Wikidata individuals. We construct networks of co-occurring nationalities and occupations, provide insights into their respective community structure, and apply the results to select and color chronologically structured subsets of a large network of individuals connected by Wikipedia hyperlinks. While the imagined communities [3] of nationality are much more discrete in terms of co-occurrence than occupations, our quantifications reveal the existing overlap of nationality as much less clear-cut than in the case of occupational domains. Our work contributes to a growing body of research using biographies of notable persons to analyze cultural processes [4]-[9].

2 Method

In our processing pipeline (cf.
Figure 1), we use the Wikidata Toolkit [10] to extract 2.7 million records about humans (instances of class Q5), in the form of person – property – value triples, from a downloaded Wikidata JSON dump (09/02/2015). We focus on the properties country of citizenship (P27) and occupation (P106) (for numbers, see Table 1A), restricting our analysis to nationalities with at least 10 occurrences and occupations with at least 100 occurrences. We construct the bipartite person-value affiliation matrices and project them to unipartite matrices of value co-occurrence.

To identify relevant co-occurrences of nationalities or occupations, respectively, we compare the projected matrices against a null model. Applying an established approach [11] [12], we derive expected co-occurrence weights from an ensemble of 10,000 degree-preserving random affiliation matrices. Co-occurrences with positive Pearson residuals are retained for further analysis (for numbers, see Table 1B).

The resulting co-occurrence networks, with residuals as edge weights, are then examined for community structure using the Louvain method [13] [14]. To detect communities at different granularities, we perform modularity optimization at different resolutions [15], which yields multiple partitions with varying numbers of communities. Using these partitions, we replace the plain co-occurrence weights in the original value matrices with the probabilities of two values occurring together in the same community. The resulting mutual community matrix (Figures 2 and 3) is then hierarchically clustered, and the resulting tree is cut into a preset number of clusters (Figures 4 and 5). The preset number, 28 for nationalities and 24 for occupations, is based on visual inspection of repeated clusterings. Visualizations of the backbones of the co-occurrence networks (Figures 6 and 7) show the resulting community structure in context.
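
The projection and null-model comparison described above can be illustrated with a minimal Python sketch that uses only the standard library. It rests on several assumptions that are not taken from the paper: the Pearson residual is computed in its common form, (observed - expected) / sqrt(expected); the degree-preserving random matrices are sampled by repeated edge swaps, which is one possible way to do this and may differ from the procedure in [11] [12]; and the function names, toy data, sample count, number of swaps, and the floor on expected values are purely illustrative.

```python
import random
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(affiliations):
    """Project person -> values onto value pairs: count persons holding both."""
    counts = Counter()
    for values in affiliations.values():
        for pair in combinations(sorted(values), 2):
            counts[pair] += 1
    return counts


def rewire(affiliations, n_swaps, rng):
    """Return a randomized copy that keeps every person's and value's degree."""
    edges = [(p, v) for p, values in affiliations.items() for v in sorted(values)]
    present = set(edges)
    for _ in range(n_swaps if edges else 0):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (p1, v1), (p2, v2) = edges[i], edges[j]
        # Skip swaps that would create a duplicate person-value link.
        if (p1, v2) in present or (p2, v1) in present:
            continue
        present -= {(p1, v1), (p2, v2)}
        present |= {(p1, v2), (p2, v1)}
        edges[i], edges[j] = (p1, v2), (p2, v1)
    shuffled = {p: set() for p in affiliations}
    for p, v in edges:
        shuffled[p].add(v)
    return shuffled


def positive_residuals(affiliations, n_samples=200, seed=0):
    """Keep value pairs that co-occur more often than the null model expects."""
    rng = random.Random(seed)
    observed = cooccurrence(affiliations)
    n_edges = sum(len(values) for values in affiliations.values())
    totals = Counter()
    current = affiliations
    for _ in range(n_samples):
        # Continue the swap chain from the previous sample
        # (10 swap attempts per link is an assumed, illustrative setting).
        current = rewire(current, 10 * n_edges, rng)
        totals.update(cooccurrence(current))
    residuals = {}
    for pair, obs in observed.items():
        # Floor the expectation so the residual stays finite (illustrative choice).
        expected = max(totals[pair] / n_samples, 1.0 / n_samples)
        residual = (obs - expected) / sqrt(expected)
        if residual > 0:
            residuals[pair] = residual
    return residuals


# Toy data: hypothetical persons with invented attribute values.
people = {
    "p1": {"A", "B"}, "p2": {"A", "B"}, "p3": {"A", "B"}, "p4": {"A", "B"},
    "p5": {"C", "D"}, "p6": {"C", "D"}, "p7": {"C", "D"}, "p8": {"C", "D"},
}
for pair, residual in sorted(positive_residuals(people).items()):
    print(pair, round(residual, 2))
```

In the toy data, the pairs (A, B) and (C, D) always occur together, so they appear more often than in the randomized ensembles and are the only pairs retained. The retained residuals would then serve as edge weights of the co-occurrence network.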
The network backbones are created by iteratively adding edges with the largest residuals until the maximal giant connected components (GCC) of the original networks are restored. For comparison, we plot the occurrence of nationalities and occupations over time, ordered by their first occurrence while disregarding outliers in terms of ordering (Figures 8 and 9).

Next, the clusters of nationalities and occupations are used to partition Wikipedia biographies into national-community and domain-specific subsets. Hyperlinks connecting Wikipedia articles about individuals are obtained from DBpedia [16] and filtered to approximate contemporary relationships by excluding links between individuals whose birth dates are more than 75 years apart. Using hyperlinks from the English Wikipedia, we visualize the giant connected component of the partition of individuals connected to occupations in the community of “arts, architecture, crafts, and design” (Figure 10). Colored by nationality cluster (cf. Figures 2, 4, 6), the visualization connects 22,825 nodes with 78,447 edges. We also visualize the giant connected component of the partition of individuals connected to nationalities in the community of “predominantly English speaking countries” (Figure 11). Colored by occupational domain (cf. Figures 3, 5, 7), the visualization connects 160,913 nodes with 1,004,415 edges. While the arts domain (Figure 10) seems to reflect the established narrative of art history, where a sequence of nationalities dominates at different points in time, the predominantly English speaking partition (Figure 11) is characterized by a more complex structure that precludes the construction of a simple narrative.

3 Conclusion

In sum, we characterize networks of co-occurring nationalities and occupations related to Wikidata individuals. Our quantifications indicate that communities of nations derived from co-occurrence are much more complex than the rather clear-cut communities of occupational domain.
This may be due to substantially more complex social processes leading to co-citizenship, as we observe in (post)colonial ties; to the potentially vague concept of citizenship/nationality itself [3], as found in references to bygone and transient national constructs; or to the considerable difference in the amount of available data (93,661 citizenship vs. 585,407 occupation co-references). Our approach can be used to group synonyms and attributions of differing granularity, which occur due to the free nature of Wikidata. By algorithmically mining occupational domains from a large set of individuals, we create an alternative to manually curated meta-domains of occupation, as used in multiple strains of recent research [5] [8]. By deriving domain-specific groups of individuals directly from a crowd-sourced ecosystem such as Wikipedia, we also provide a useful alternative (Figure 10) to expert-curated datasets, such as the Getty Union List of Artist Names [17], which was used to analyze the domain of art history in previous work [9]. Visualizing the Wikipedia hyperlink sub-networks of such domain-specific groups of individuals reveals network patterns that would be obscured when using the network as a whole.

References

[1] J. A. Priestley: Chart of Biography. (London: J. Johnson, 1765)
[2] D. Vrandečić, M. Krötzsch: Wikidata: A Free Collaborative Knowledgebase. Communications of the ACM 57,10 (2014) 78-85
[3] B. R. O. G. Anderson: Imagined communities: Reflections on the origin and spread of nationalism. (London: Verso, 1991)
[4] S. Ronen, B. Gonçalves, K.Z. Hu, A. Vespignani, S. Pinker, C.A. Hidalgo: Links that speak: the global language network and its association with global fame. Proc. Natl. Acad. Sci. 111,52 (2014) E5616-E5622
[5] A. Z. Yu, S. Ronen, K. Hu, T. Lu, C. A. Hidalgo: Pantheon: A Dataset for the Study of Global Cultural Production. ArXiv preprint, arXiv:1502.07310v1 (2015)
[6] M. Klein, P.
Konieczny: Gender Gap Through Time and Space: A Journey Through Wikipedia Biographies and the “WIGI” Index. ArXiv preprint, arXiv:1502.03086v1 (2015)
[7] Y.-H. Eom, P. Aragon, D. Laniado, A. Kaltenbrunner, S. Vigna, D. L. Shepelyansky: Interactions of culture and top people of Wikipedia from ranking 24 language editions. PLoS ONE (2014)
[8] M. Schich, C. Song, Y.-Y. Ahn, A. Mirsky, M. Martino, A.-L. Barabási, D. Helbing: A network framework of cultural history. Science 345, 6196 (2014) 558-562
[9] D. Goldfarb, M. Arends, J. Froschauer, D. Merkl: Art History on Wikipedia, a Macroscopic Observation. Proceedings of the 3rd Annual ACM Web Science Conference (2012) 163-168
[10] M. Krötzsch, F. Erxleben, M. Günther, J. Mendez: Wikidata Toolkit: A Java library for working with Wikidata. http://korrekt.org/talks/2014/wikimania-wikidata-toolkit.pdf, accessed May 20th, 2015
[11] K. A. Zweig, M. Kaufmann: A systematic approach to the one-mode projection of bipartite graphs. Social Network Analysis and Mining 1,3 (2011) 187-218
[12] Y. Gu: Ein neues Empfehlungssystem mit FDSM-basierter einseitiger Projektion und Link Community Clustering. (Thesis: Heidelberg University, 2013)
[13] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008,10 (2008) 10008
[14] V.A. Traag: Implementation of the Louvain algorithm for various methods for use with igraph in python. https://github.com/vtraag/louvain-igraph, accessed May 20th, 2015
[15] J. Reichardt, S. Bornholdt: Partitioning and modularity of graphs with arbitrary degree distribution. Physical Review E 76, 1 (2007) 015102
[16] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives: DBpedia: A nucleus for a web of open data. Proceedings of the 6th International Semantic Web Conference (2007) 722-735
[17] Getty Vocabulary Program: Union List of Artist Names (The J.
Paul Getty Trust, Los Angeles, 2010) http://www.getty.edu/research/tools/vocabularies/ulan/, accessed May 20th, 2015

Figure 1: Data processing pipeline
[Diagram labels: Wikidata / DBpedia; JSON dump; extraction of person properties; triples; triples to tables; table to bipartite matrix; affiliation matrix (person to occupation/citizenship); Fixed Degree Sequence Model comparison; residual matrix; repeated community detection at different resolutions; mutual community matrix; hierarchical clustering; community structure; network backbone; visualization of semantic co-occurrences; ID mapping; basic node information / hyperlinks; Wikidata / DBpedia hybrid attribute / hyperlink dataset; visualization of Wikipedia hyperlink network]

Table 1: Person links to Nation/Occupation (A) and Nation/Occupation Co-occurrences (B)

A                 Nationalities                              Occupations
                  #persons   #P27 links  #nationalities      #persons   #P106 links  #occupations
Raw data          1,318,484  1,366,777   833                 1,363,032  1,706,766    3,419
Reduced data      1,317,676  1,365,600   282                 1,352,909  1,685,000    431
1:1 References    1,271,939  1,271,939   282                 1,099,593  1,099,593    431
1:n References    45,737     93,661      282                 253,316    585,407      430

B                 Nationalities                              Occupations
                  #co-occurrences  #nodes                    #co-occurrences  #nodes
All               2,100            282                       13,846           430
Positive          1,565            282                       7,641            430
Backbone          996              282                       2,964            430

Figure 2: Louvain communities of co-occurring nationalities at different resolutions

Figure 3: Louvain communities of co-occurring occupations at different resolutions

Figure 4: Hierarchical clustering of communities of co-occurring nationalities

Figure 5: Hierarchical clustering of communities of co-occurring occupations

Figure 6: Network of national overlap through co-occurrence, colored by community

Figure 7: Network of co-occurring occupations, colored by community

Figure 8: Nationalities over time based on person life-spans

Figure 9: Occupations over time based on person life-spans

Figure 10: Hyperlink network of English Wikipedia biographies having occupations in “arts, architecture, crafts and design”, colored by nationality community corresponding to the colors in Figures 2, 4, 6

Figure 11: Hyperlink network of English Wikipedia biographies with a nationality in the “predominantly English speaking” community, colored by occupation community corresponding to the colors in Figures 3, 5, 7