Modules

name

The Name module handles all functionality related to an author name, including formatting, gender, and affiliated bib information

class name.Name(name, title=None, gender=None, **kwargs)

Bases: object

The Name object holds all information related to an author of a publication, such as full name, title, gender, affiliation, and identifications for bib databases such as Scopus and Google Scholar

Variables
  • fullname (str) – Full name

  • firstname (str) – First name

  • middlename (str) – Middle name

  • surname (str) – Last name

  • originalname (str) – Original name as passed to the object

  • title (str) – Title associated with name (e.g. researcher, technician)

  • gender (str) – Gender associated with name (female, male, or non-binary)

  • orcid (str) – ORCID identification

  • hindex_scopus (int) – H-index from Scopus

  • hindex_scholar (int) – H-index from Scholar

  • scopusid (str) – Scopus author identification

  • scholarid (str) – Scholar author identification

  • affiliation (str) – Affiliation institute or university

getAllNameFormats()

Return all name formats - fullname, all initials, single initials, and name and middle name initials

Returns

All versions of name [full name, all initials, single initials, first name and initials]

Return type

list

getFullInitials()

Get name with all initials formatting e.g. “Jane Emily Doe” >> “J. E. Doe”

Returns

Full initials version of name

Return type

str

getGender()

Return gender attribute. If not given, guess gender based on first name

Returns

Gender attribute of object

Return type

str

getNameAndInitials()

Get full first name and initialled middle names formatting e.g. “Jane Emily Doe” >> “Jane E. Doe”

Returns

First name and initials version of name

Return type

str

getSingleInitials()

Get name with single initials formatting e.g. “Jane Emily Doe” >> “J. Doe”

Returns

Single initials version of name

Return type

str

getSingleName()

Get name with single first name formatting e.g. “Jane Emily Doe” >> “Jane Doe”

Returns

Single first name version of name

Return type

str
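The four formatting methods above can be illustrated with a minimal standalone sketch. It is not the package's implementation: the function names `get_initial` and `name_formats` are hypothetical, and the sketch assumes the name has already been split into first, middle, and last parts. Unlike the documented getInitial example, the helper here returns the initial without a trailing space.

```python
def get_initial(name):
    """Return the initial of a single name, e.g. "Jane" -> "J." (no trailing space)."""
    return name[0].upper() + "."


def name_formats(first, middle, last):
    """Return [full name, all initials, single initial, first name and initials].

    Hypothetical helper: `middle` may hold zero, one, or several middle names.
    """
    middles = middle.split()
    middle_initials = [get_initial(m) for m in middles]
    full = " ".join([first, *middles, last])
    all_initials = " ".join([get_initial(first), *middle_initials, last])
    single_initial = f"{get_initial(first)} {last}"
    name_and_initials = " ".join([first, *middle_initials, last])
    return [full, all_initials, single_initial, name_and_initials]


print(name_formats("Jane", "Emily", "Doe"))
# ['Jane Emily Doe', 'J. E. Doe', 'J. Doe', 'Jane E. Doe']
```

The ordering of the returned list follows the order documented for getAllNameFormats.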

getTitle()

Return title associated with name

Returns

Title attribute of object

Return type

str

matchName(n)

Check if name matches formatted names, with a boolean output

Parameters

n (str) – Name to match with

Returns

Flag denoting whether name matches (True) or not (False)

Return type

bool

populateFromScholar()

Populate Name attributes using Scholar search

populateFromScopus()

Populate Name attributes using Scopus AuthorSearch

name.defineGender(fullname)

User-define gender of fullname with prompted input

Parameters

fullname (str) – Full name

Returns

gname – “male”/“female”/“non-binary”

Return type

str

name.fetchScholarAuthor(firstname, lastname)

Fetch Scholar author object using name search with Google Scholar API

Parameters
  • firstname (str) – First (and middle) name string

  • lastname (str) – Last name string

Returns

scholar_author – Google Scholar author attributes

Return type

dict

name.fetchScopusAuthor(firstname, lastname)

Fetch Scopus author object using name search with Scopus API

Parameters
  • firstname (str) – First (and middle) name string

  • lastname (str) – Last name string

Returns

scopus_author – Scopus author retrieval object (scopus.author_retrieval.AuthorRetrieval)

Return type

AuthorRetrieval

name.getInitial(name)

Get initial from name e.g. “Jane” >> “J. “

Parameters

name (str) – Name to retrieve initial from

Returns

Initialised name

Return type

str

name.getKeyValue(kwarg, key)

Return key value from dictionary if present

Parameters
  • kwarg (dict) – Dict object to find key from

  • key (str) – Key to retrieve value from

Returns

Keyword value, None if key is invalid

Return type

str or int or None
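A lookup of this kind can be sketched in one line with `dict.get`. The name `get_key_value` and the example values are illustrative assumptions, not taken from the package.

```python
def get_key_value(kwarg, key):
    """Return kwarg[key] if the key is present, otherwise None."""
    return kwarg.get(key)


# Placeholder attribute values, for illustration only
attrs = {"orcid": "0000-0000-0000-0000", "hindex_scopus": 12}

print(get_key_value(attrs, "hindex_scopus"))  # 12
print(get_key_value(attrs, "scopusid"))       # None
```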

name.guessGender(guesser, fullname, country=None)

Guess gender from name using the gender_guesser package

Parameters
  • guesser (detector.Detector) – Gender guesser object

  • fullname (str) – Full name

Returns

gname – Guessed gender of name

Return type

str

name.splitFullName(fullname)

Split full name into first, middle and last name e.g. “Jane Emily Doe” >> [“Jane”, “Emily”, “Doe”]

Parameters

fullname (str) – Full name string

Returns

  • first (str) – First name string

  • middle (str) – Middle name string

  • last (str) – Last name string
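As a rough illustration of the split described above, the sketch below separates a name on whitespace. It assumes the first token is the first name, the last token is the surname, and everything in between is the middle name; how the package itself treats single-word or multi-part names is not stated here, so that handling is a guess. The name `split_full_name` is hypothetical.

```python
def split_full_name(fullname):
    """Split a full name into (first, middle, last) on whitespace."""
    parts = fullname.split()
    if not parts:
        return "", "", ""
    if len(parts) == 1:
        # Assumption: a single-word name is treated as a first name only
        return parts[0], "", ""
    return parts[0], " ".join(parts[1:-1]), parts[-1]


print(split_full_name("Jane Emily Doe"))  # ('Jane', 'Emily', 'Doe')
```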

organisation

The Organisation module handles all functionality related to a collection of author names

class organisation.Organisation(names, titles=None, genders=None, **kwargs)

Bases: object

The Organisation object holds a collection of Name objects, representing an institution or department

Variables

names (list) – List of Name objects

addName(n, t=None, g=None, **kwargs)

Add name to Organisation

Parameters
  • n (Name or str or list) – Name to add, given as either Name, fullname string, or fullname list [firstname, middlename, lastname]

  • t (str, optional) – Title

  • g (str, optional) – Gender

  • **kwargs (dict) – Keyword arguments (valid keywords: orcid, scholarid, scopusid, hindex_scopus, hindex_scholar, affiliation)

asDataFrame()

Export Organisation as dataframe

Returns

df – Organisation attributes as dataframe

Return type

pandas.DataFrame

checkNames()

Checker and user editor for names and genders in Organisation

checkOrgName(n)

Check if name is in Organisation

Parameters

n (str) – Name to check

Returns

check – Name string that input name matches with, or None if there is no match

Return type

str or None

getAllNames(all_formats=True)

Retrieve all names in Organisation

Parameters

all_formats (bool, default True) – Flag to signify if all name formats should be returned (True), or full names only (False)

Returns

all_names – List of all organisation names

Return type

list

populateOrg(scopus=True, scholar=True)

Populate Organisation with additional information gathered from Scopus and/or Scholar

Parameters
  • scopus (bool, default True) – Flag to denote if Scopus authors should be used to populate object

  • scholar (bool, default True) – Flag to denote if Scholar authors should be used to populate object

organisation.checkAffiliation(name, organisation)

Check if name appears in Organisation and if so, return affiliation

Parameters
  • name (str) – Name to check

  • organisation (Organisation) – Organisation object to check if name appears in

Returns

aff – Affiliation of name, or None if name does not appear in Organisation object

Return type

str or None

organisation.checkGender(name, organisation)

Check if name appears in Organisation and if so, return gender

Parameters
  • name (str) – Name to check

  • organisation (Organisation) – Organisation object to check if name and gender appear in

Returns

gender – Gender of name, or None if name does not appear in Organisation object

Return type

str or None

organisation.lookupName(n, names)

Check if name is in list of names

Parameters
  • n (str) – Name to check

  • names (list) – List of names to check in

Returns

Flag denoting if name has been found in list (True) or not (False)

Return type

bool
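A name lookup of this shape might look like the sketch below. The matching rule is an assumption: the sketch compares names exactly after trimming whitespace and ignoring case, which may be stricter or looser than what the package does. The name `lookup_name` is hypothetical.

```python
def lookup_name(n, names):
    """Return True if n matches any entry in names (case-insensitive, trimmed)."""
    target = n.strip().casefold()
    return any(target == candidate.strip().casefold() for candidate in names)


org_names = ["Jane Emily Doe", "J. E. Doe", "J. Doe", "Jane E. Doe"]
print(lookup_name("j. doe", org_names))    # True
print(lookup_name("John Doe", org_names))  # False
```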

organisation.orgFromCSV(csv_file)

Import organisation from csv file

Parameters

csv_file (str) – Filepath to csv organisation

Returns

org – Organisation object

Return type

Organisation

bib

The Bib module handles all functionality related to a publication, or bib item, such as information retrieval from a bib database and authorship analysis

class bib.Bib(**kwargs)

Bases: object

The Bib object holds all attributes associated with a publication, such as publication and journal information, citations and altmetrics. Each co-author is linked to the publication as a Name object

Variables
  • doi (str) – DOI identification of publication

  • title (str) – Publication title

  • authors (list) – List of authors (given as Name objects)

  • date (datetime) – Date of publication

  • ptype (str) – Publication type

  • journal (str) – Name of journal published in

  • citations (int) – Number of citations

  • altmetrics (int) – Altmetric score

  • genders (list) – List of author genders

  • aff_institutes (list) – List of author institutes

  • aff_countries (list) – List of author countries

checkBibDate(dt)

Return whether bib was published after a given date

Parameters

dt (datetime) – Given date which bib publication date will be compared to

Returns

Flag denoting if bib was published before (False) or after (True) given date

Return type

bool
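The date check amounts to a comparison between two datetime objects. The sketch below assumes a strict "later than" comparison; whether a publication dated exactly on the given date counts as "after" is not stated above. The name `published_after` is hypothetical.

```python
from datetime import datetime


def published_after(pub_date, dt):
    """Return True if pub_date is strictly later than dt, otherwise False."""
    return pub_date > dt


cutoff = datetime(2020, 1, 1)
print(published_after(datetime(2021, 6, 1), cutoff))  # True
print(published_after(datetime(2018, 3, 15), cutoff))  # False
```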

checkOrgFirstAuthor(organisation)

Return flag denoting whether the first author is within organisation

Parameters

organisation (Organisation) – Organisation object

Returns

out – Flag denoting if first author is within organisation (True) or external to organisation (False)

Return type

bool

checkOrgLastAuthor(organisation)

Return flag denoting whether the last author is within organisation

Parameters

organisation (Organisation) – Organisation object

Returns

out – Flag denoting if last author is within organisation (True) or external to organisation (False)

Return type

bool

getFirstAuthor()

Return first author

getLastAuthor()

Return last author

getOrgAuthors(organisation)

Return bib authors within organisation

Parameters

organisation (Organisation) – Organisation object to compare Bib authors to

Returns

org_names – List of bib authors that are within Organisation

Return type

list

getOrgGender(org_names)

Return genders of organisation authors

Parameters

org_names (list) – Name objects within organisation

Returns

org_gender – List of organisation author genders

Return type

list

getStrAuthors()

Return all author full names as a comma-delimited string

populateBib(search)

Populate Bib attributes from search hit

Parameters

search (list) – List of bib information to populate Bib object with [doi, title, authors, journal, ptype, date, citations]

retrieveDOIFromTitle()

Retrieve DOI using a CrossRef search of the Bib title, and append to Bib attributes

retrieveFromAMetric()

Get Altmetrics of Bib object. If Altmetrics are not already a Bib attribute, they are retrieved using a DOI search from the Altmetrics API

Returns

Altmetric score

Return type

int

bib.checkBibAuthors(authors, organisation_names)

Check if author name/s appear in organisation names

Parameters
  • authors (list) – List of str author names

  • organisation_names (list) – List of str organisation names

Returns

check – Flag denoting if name/s are found

Return type

bool

bib.countGenders(genders)

Count genders in list

Parameters

genders (list) – Gender list to count from

Returns

  • female (int) – Female count

  • male (int) – Male count

  • nb (int) – Non-binary count
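Counting of this kind can be sketched with `collections.Counter`. The label strings ("female", "male", "non-binary") follow the gender values listed for the Name object; the normalisation to lower case and the skipping of non-string entries are assumptions of this sketch. The name `count_genders` is hypothetical.

```python
from collections import Counter


def count_genders(genders):
    """Return (female, male, nb) counts from a list of gender labels."""
    # Non-string entries (e.g. None for unknown gender) are ignored
    counts = Counter(g.strip().lower() for g in genders if isinstance(g, str))
    return counts["female"], counts["male"], counts["non-binary"]


female, male, nb = count_genders(["female", "male", "Female", None, "non-binary", "female"])
print(female, male, nb)  # 3 1 1
```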

bib.extractFromCRitem(item)

Get bib information from CrossRef search item

Parameters

item (dict) – CrossRef search hit

Returns

List containing doi, title, authors, journal name, publication type, date, citation count

Return type

list
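To make the shape of a CrossRef search hit concrete, the sketch below pulls the seven listed fields out of a dictionary that uses CrossRef's work-record keys (`DOI`, `title`, `author`, `container-title`, `type`, `issued`, `is-referenced-by-count`). It is an illustration rather than the package's code: the name `extract_from_cr_item` is hypothetical, the example record is made up, and filling a missing month or day with 1 is an assumption.

```python
from datetime import datetime


def extract_from_cr_item(item):
    """Return [doi, title, authors, journal, ptype, date, citations] from a CrossRef item."""

    def first_or_none(values):
        # CrossRef stores titles and journal names as lists of strings
        return values[0] if values else None

    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in item.get("author", [])
    ]

    # "issued" holds e.g. {"date-parts": [[2020, 5, 17]]}; month and day may be absent
    date_parts = item.get("issued", {}).get("date-parts", [[None]])[0]
    date = None
    if date_parts and date_parts[0] is not None:
        year, month, day = (list(date_parts) + [1, 1])[:3]
        date = datetime(year, month, day)

    return [
        item.get("DOI"),
        first_or_none(item.get("title", [])),
        authors,
        first_or_none(item.get("container-title", [])),
        item.get("type"),
        date,
        item.get("is-referenced-by-count"),
    ]


# Made-up record for illustration; the DOI does not refer to a real publication
example_item = {
    "DOI": "10.1234/example",
    "title": ["An example title"],
    "author": [{"given": "Jane", "family": "Doe"}, {"family": "Consortium"}],
    "container-title": ["Journal of Examples"],
    "type": "journal-article",
    "issued": {"date-parts": [[2020, 5]]},
    "is-referenced-by-count": 7,
}
print(extract_from_cr_item(example_item))
```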

bib.extractScholarItem(pub, item)

Extract item from Scholar bib object

Parameters
  • pub (dict) – Scholar bib dictionary

  • item (str) – Keyword to obtain value from

Returns

out – Keyword value

Return type

str or int

bib.fetchAltmetrics(doi)

Fetch altmetrics from DOI

Parameters

doi (str) – DOI string to search with

Returns

result – Altmetrics result

Return type

dict

bib.fromCrossRef(author)

Get bib records using CrossRef based on inputted author search

Parameters

author (str) – Author name

Returns

out – List containing search hit information [doi, title, authors, journal name, publication type, date, citation count]

Return type

list

bib.fromScholar(scholar_author)

Fetch all publications associated with Scholar author

Parameters

scholar_author (dict) – Scholar author dictionary

Returns

bibs – List of Scholar bibs

Return type

list

bib.fromScopus(scopus_author)

Fetch all publications associated with Scopus author

Parameters

scopus_author (AuthorRetrieval) – Scopus author retrieval object (scopus.author_retrieval.AuthorRetrieval)

Returns

bibs – List of Scopus search publications (scopus.scopus_search.Document)

Return type

list

bib.listToStr(in_list)

Return comma-separated string from list

Parameters

in_list (list) – List to merge into a comma-delimited string

Returns

Comma-delimited string, or None if input is invalid

Return type

str or None
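A conversion like this can be sketched with `str.join`. What counts as "invalid" input is not specified above; the sketch assumes it means anything that is not a list or tuple, and it uses ", " as the separator. The name `list_to_str` is hypothetical.

```python
def list_to_str(in_list):
    """Join list items into a comma-separated string, or return None for non-list input."""
    if not isinstance(in_list, (list, tuple)):
        return None
    return ", ".join(str(item) for item in in_list)


print(list_to_str(["Jane Doe", "John Smith"]))  # Jane Doe, John Smith
print(list_to_str(None))                        # None
```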

bibcollection

The BibCollection module handles all functionality related to a collection of publications associated with an author or group of authors

class bibcollection.BibCollection(args)

Bases: object

A collection of Bib objects, representing a database of publications

Variables
  • organisation (Organisation) – Organisation associated with BibCollection

  • bibs (list) – List of Bib objects

addBibs(bibs_list)

Add list of Bib objects to BibCollection object

Parameters

bibs_list (list) – List of Bib objects to add to BibCollection object

addOrganisation(org)

Add affiliated Organisation object to BibCollection

Parameters

org (Organisation) – Organisation to add to object

asDataFrame()

Retrieve BibCollection attributes as dataframe

Returns

df – Dataframe containing all attributes of BibCollection object

Return type

pandas.DataFrame

checkBibs()

Checker for Bibs in BibCollection

getAllGenders(database)

Fetch genders of all co-authors in BibCollection, using database to retain user defined co-author genders

Parameters

database (Organisation or str) – Database of names and genders, either as an Organisation object or as a .csv filepath to an Organisation dataframe

getCRBibs()

Retrieve CrossRef bibs associated with authors in organisation

getOrganisation()

Return organisation

Returns

Organisation – Organisation object affiliated with BibCollection object

Return type

Organisation

getRecent(dt)

Return all bibs in BibCollection that were published after a certain date (given as a datetime object)

Parameters

dt (datetime) – Datetime to filter bibs

Returns

new_bibs – BibCollection with only recent Bibs

Return type

BibCollection

getScholarBibs()

Retrieve all Scholar bibs associated with authors in organisation

getScopusBibs()

Retrieve all Scopus bibs associated with authors in organisation

removeAbstracts()

Remove conference abstract bibs from BibCollection based on journal title containing the word “abstract” (case-insensitive)

removeBib(idx)

Remove bib from BibCollection based on index position

Parameters

idx (int) – Index position of bib to delete

removeDiscussions()

Remove discussion paper bibs from BibCollection based on journal title containing the word “discussion” (case-insensitive)

removeDuplicates()

Remove duplicate bib objects from BibCollection based on doi and title

removeFromKeyword(bib_att, keyword)

Remove bibs from BibCollection based on a keyword (case-insensitive) found in the specified bib attribute, e.g. journal

Parameters
  • bib_att (list) – Bib attribute list to base removal on

  • keyword (str) – Word to classify removal

bibcollection.bibsFromCSV(csv_file)

Import BibCollection from csv file

Parameters

csv_file (str) – Filepath to csv bibcollection

Returns

bibs – BibCollection object

Return type

BibCollection

bibcollection.calcDivIdx(name, years, scopus=True, scholar=False, crossref=False, check=True)

Determine the diversity index of an individual

Parameters
  • name (str) – Full name of individual to determine diversity index for

  • years (int) – Number of years (from current date) to determine diversity index from

  • scopus (bool, optional) – Flag to signify if Scopus database should be used. The default is True

  • scholar (bool, optional) – Flag to signify if Scholar database should be used. The default is False

  • crossref (bool, optional) – Flag to signify if Crossref database should be used. The default is False

  • check (bool, optional) – Flag to signify if name and bib results should be checked by users

bibcollection.countByYear(df)

Count publications in dataframe by year

Parameters

df (pandas.DataFrame) – A dataframe representing an exported BibCollection

Returns

A dataframe of only co-authors publication entries

Return type

pandas.DataFrame

bibcollection.findDuplicates(l)

Find index of duplicates in list, disregarding nan values

Parameters

l (list) – List to find duplicates in

Returns

idx – List of duplicate indices

Return type

list
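One way to find duplicate positions while skipping missing values is sketched below. It assumes that the first occurrence of a value is kept and that only the second and later occurrences are reported, which may differ from the package's behaviour. The name `find_duplicates` is hypothetical.

```python
import math


def find_duplicates(values):
    """Return indices of repeated values, ignoring None and NaN entries."""
    seen = set()
    duplicate_idx = []
    for idx, value in enumerate(values):
        # Missing values never count as duplicates of each other
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        if value in seen:
            duplicate_idx.append(idx)
        else:
            seen.add(value)
    return duplicate_idx


dois = ["a", float("nan"), "b", "a", float("nan"), "b", "a"]
print(find_duplicates(dois))  # [3, 5, 6]
```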

bibcollection.firstFromDF(df, first=True)

Get either Organisation-led (i.e. first author) or co-author publication entries from dataframe

Parameters
  • df (pandas.DataFrame) – A dataframe representing an exported BibCollection

  • first (bool) – Flag to denote if first author or co-author publications should be retrieved

Returns

df1 – A dataframe of only Organisation-led publication entries (first=True) or only co-author publication entries (first=False)

Return type

pandas.DataFrame

bibcollection.getGenderDistrib(df, first=True)

Get gender distribution of women, men and non-binary authors as a percentage. This is derived from the gender count columns from a given dataframe

Parameters
  • df (pandas.DataFrame) – A dataframe representing an exported BibCollection

  • first (bool) – Flag to denote if first author gender should be included or not

Returns

  • f (list) – List of percentage of female authors for each publication

  • m (list) – List of percentage of male authors for each publication

  • nb (list) – List of percentage of non-binary authors for each publication
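The percentage calculation can be illustrated without pandas by treating each publication as a row of gender counts. This is a simplified sketch: the row keys `female`, `male`, and `nb` are hypothetical column names, the `first` flag is not modelled, and publications with no counted authors are reported as None rather than a percentage.

```python
def gender_distribution(rows):
    """Return (f, m, nb) lists of per-publication percentages from gender counts."""
    f, m, nb = [], [], []
    for row in rows:
        total = row["female"] + row["male"] + row["nb"]
        if total == 0:
            # No authors with a known gender: a percentage is undefined
            f.append(None)
            m.append(None)
            nb.append(None)
            continue
        f.append(100 * row["female"] / total)
        m.append(100 * row["male"] / total)
        nb.append(100 * row["nb"] / total)
    return f, m, nb


rows = [
    {"female": 2, "male": 1, "nb": 1},
    {"female": 0, "male": 0, "nb": 0},
]
print(gender_distribution(rows))
# ([50.0, None], [25.0, None], [25.0, None])
```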