Modules
name
The Name module handles all functionality related to an author name, including formatting, gender, and affiliated bib information
- class name.Name(name, title=None, gender=None, **kwargs)
Bases:
object
The Name object holds all information related to an author of a publication, such as full name, title, gender, affiliation, and identifications for bib databases such as Scopus and Google Scholar
- Variables
fullname (str) – Full name
firstname (str) – First name
middlename (str) – Middle name
surname (str) – Last name
originalname (str) – Original inputted name to object
title (str) – Title associated with name (e.g. researcher, technician)
gender (str) – Gender associated with name (female, male, or non-binary)
orcid (str) – ORCID identification
hindex_scopus (int) – H-index from Scopus
hindex_scholar (int) – H-index from Scholar
scopusid (str) – Scopus author identification
scholarid (str) – Scholar author identification
affiliation (str) – Affiliation institute or university
- getAllNameFormats()
Return all name formats: full name, all initials, single initials, and first name with middle name initials
- Returns
All versions of name [full name, all initials, single initials, first name and initials]
- Return type
list
- getFullInitials()
Get name with all initials formatting e.g. “Jane Emily Doe” >> “J. E. Doe”
- Returns
Full initials version of name
- Return type
str
- getGender()
Return gender attribute. If not given, guess gender based on first name
- Returns
Gender attribute of object
- Return type
str
- getNameAndInitials()
Get full first name and initialled middle names formatting e.g. “Jane Emily Doe” >> “Jane E. Doe”
- Returns
First name and initials version of name
- Return type
str
- getSingleInitials()
Get name with single initials formatting e.g. “Jane Emily Doe” >> “J. Doe”
- Returns
Single initials version of name
- Return type
str
- getSingleName()
Get name with single first name formatting e.g. “Jane Emily Doe” >> “Jane Doe”
- Returns
Single first name version of name
- Return type
str
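The formatting methods above can be illustrated with a minimal sketch. This is not the library's implementation: `initial` and `name_formats` are hypothetical helpers, and the sketch assumes the name has already been split into first, middle and last parts.

```python
def initial(name):
    """Return the initial of a single name part, e.g. "Jane" -> "J."."""
    return name[0].upper() + "."


def name_formats(first, middle, last):
    """Build the name formats described above from already-split parts.

    Hypothetical helper; `middle` may be an empty string or hold
    several space-separated middle names.
    """
    middles = (middle or "").split()
    middle_initials = [initial(m) for m in middles]
    return {
        "full_name": " ".join([first, *middles, last]),
        "full_initials": " ".join([initial(first), *middle_initials, last]),
        "single_initials": f"{initial(first)} {last}",
        "name_and_initials": " ".join([first, *middle_initials, last]),
        "single_name": f"{first} {last}",
    }


formats = name_formats("Jane", "Emily", "Doe")
print(formats["full_initials"])      # J. E. Doe
print(formats["name_and_initials"])  # Jane E. Doe
```

Keeping every variant in one place makes it straightforward to compare an incoming author string against all of them, which is the role matchName plays.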
- getTitle()
Return title associated with name
- Returns
Title attribute of object
- Return type
str
- matchName(n)
Check if name matches formatted names, with a boolean output
- Parameters
n (str) – Name to match with
- Returns
Flag denoting whether name matches (True) or not (False)
- Return type
bool
- populateFromScholar()
Populate Name attributes using Scholar search
- populateFromScopus()
Populate Name attributes using Scopus AuthorSearch
- name.defineGender(fullname)
User-define gender of fullname with prompted input
- Parameters
fullname (str) – Full name
- Returns
gname – “male”, “female”, or “non-binary”
- Return type
str
- name.fetchScholarAuthor(firstname, lastname)
Fetch Scholar author object using name search with Google Scholar API
- Parameters
firstname (str) – First (and middle) name string
lastname (str) – Last name string
- Returns
scholar_author – Google Scholar author attributes
- Return type
dict
- name.fetchScopusAuthor(firstname, lastname)
Fetch Scopus author object using name search with Scopus API
- Parameters
firstname (str) – First (and middle) name string
lastname (str) – Last name string
- Returns
scopus_author – Scopus author retrieval object (scopus.author_retrieval.AuthorRetrieval)
- Return type
AuthorRetrieval
- name.getInitial(name)
Get initial from name e.g. “Jane” >> “J. ”
- Parameters
name (str) – Name to retrieve initial from
- Returns
Initialised name
- Return type
str
- name.getKeyValue(kwarg, key)
Return key value from dictionary if present
- Parameters
kwarg (dict) – Dict object to find key from
key (str) – Key to retrieve value from
- Returns
Keyword value, None if key is invalid
- Return type
str or int or None
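The behaviour described for getKeyValue matches a plain dictionary lookup with a None fallback. A minimal sketch, where `get_key_value` is a hypothetical stand-in and not the library's code:

```python
def get_key_value(kwarg, key):
    """Return kwarg[key] if the key is present, otherwise None."""
    # dict.get returns None by default when the key is missing
    return kwarg.get(key)


attributes = {"orcid": "0000-0000-0000-0000", "hindex_scopus": 12}
print(get_key_value(attributes, "orcid"))     # 0000-0000-0000-0000
print(get_key_value(attributes, "scopusid"))  # None
```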
- name.guessGender(guesser, fullname, country=None)
Guess gender from name using the gender_guesser package
- Parameters
guesser (detector.Detector) – Gender guesser object
fullname (str) – Full name
- Returns
gname – Guessed gender of name
- Return type
str
- name.splitFullName(fullname)
Split full name into first, middle and last name e.g. “Jane Emily Doe” >> [“Jane”, “Emily”, “Doe”]
- Parameters
fullname (str) – Full name string
- Returns
first (str) – First name string
middle (str) – Middle name string
last (str) – Last name string
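One way such a split could work is sketched below. It assumes names are whitespace-separated, that the first token is the first name and the last token is the surname, and that everything in between is the middle name; the library may handle edge cases (prefixes, hyphenation, single-word names) differently. `split_full_name` is a hypothetical name.

```python
def split_full_name(fullname):
    """Split a full name into (first, middle, last) on whitespace.

    Assumption: first token is the first name, last token is the
    surname, and any tokens in between form the middle name.
    """
    parts = fullname.split()
    if not parts:
        return "", "", ""
    if len(parts) == 1:
        return parts[0], "", ""
    return parts[0], " ".join(parts[1:-1]), parts[-1]


print(split_full_name("Jane Emily Doe"))  # ('Jane', 'Emily', 'Doe')
```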
organisation
The Organisation module handles all functionality related to a collection of author names
- class organisation.Organisation(names, titles=None, genders=None, **kwargs)
Bases:
object
The Organisation object holds a collection of Name objects, representing an institution or department
- Variables
names (list) – List of Name objects
- addName(n, t=None, g=None, **kwargs)
Add name to Organisation
- Parameters
n (Name or str or list) – Name to add, given as either Name, fullname string, or fullname list [firstname, middlename, lastname]
t (str, optional) – Title
g (str, optional) – Gender
**kwargs (dict) – Keyword arguments (valid keywords: orcid, scholarid, scopusid, hindex_scopus, hindex_scholar, affiliation)
- asDataFrame()
Export Organisation as dataframe
- Returns
df – Organisation attributes as dataframe
- Return type
pandas.DataFrame
- checkNames()
Checker and user editor for names and genders in Organisation
- checkOrgName(n)
Check if name is in Organisation
- Parameters
n (str) – Name to check
- Returns
check – Name string that input name matches with, or None if there is no match
- Return type
str or None
- getAllNames(all_formats=True)
Retrieve all names in Organisation
- Parameters
all_formats (bool, default True) – Flag to signify if all name formats should be returned (True), or full names only (False)
- Returns
all_names – List of all organisation names
- Return type
list
- populateOrg(scopus=True, scholar=True)
Populate Organisation with additional information gathered from Scopus and/or Scholar
- Parameters
scopus (bool, default True) – Flag to denote if Scopus authors should be used to populate object
scholar (bool, default True) – Flag to denote if Scholar authors should be used to populate object
- organisation.checkAffiliation(name, organisation)
Check if name appears in Organisation and if so, return affiliation
- Parameters
name (str) – Name to check
organisation (Organisation) – Organisation object to check if name appears in
- Returns
aff – Affiliation of name, or None if name does not appear in Organisation object
- Return type
str or None
- organisation.checkGender(name, organisation)
Check if name appears in Organisation and if so, return gender
- Parameters
name (str) – Name to check
organisation (Organisation) – Organisation object to check if name and gender appear in
- Returns
gender – Gender of name, or None if name does not appear in Organisation object
- Return type
str or None
- organisation.lookupName(n, names)
Check if name is in list of names
- Parameters
n (str) – Name to check
names (list) – List of names to check in
- Returns
Flag denoting if name has been found in list (True) or not (False)
- Return type
bool
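A membership check of this kind can be sketched as follows. The case-insensitive comparison and whitespace normalisation are assumptions made for illustration; the documentation does not say how strictly the library compares names. `lookup_name` is a hypothetical name.

```python
def lookup_name(n, names):
    """Return True if name n appears in names, else False.

    Assumption: comparison ignores case and repeated whitespace.
    """
    def normalise(s):
        return " ".join(s.split()).lower()

    target = normalise(n)
    return any(normalise(candidate) == target for candidate in names)


org_names = ["Jane Emily Doe", "J. E. Doe", "J. Doe", "Jane E. Doe"]
print(lookup_name("j. doe", org_names))    # True
print(lookup_name("John Doe", org_names))  # False
```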
- organisation.orgFromCSV(csv_file)
Import organisation from csv file
- Parameters
csv_file (str) – Filepath to csv organisation
- Returns
org – Organisation object
- Return type
Organisation
bib
The Bib module handles all functionality related to a publication, or bib item, such as information retrieval from a bib database and authorship analysis
- class bib.Bib(**kwargs)
Bases:
object
The Bib object holds all attributes associated with a publication, such as publication and journal information, citations and altmetrics. Each co-author is linked to the publication as a Name object
- Variables
doi (str) – DOI identification of publication
title (str) – Publication title
authors (list) – List of authors (given as Name objects)
date (datetime) – Date of publication
ptype (str) – Publication type
journal (str) – Name of journal published in
citations (int) – Number of citations
altmetrics (int) – Altmetric score
genders (list) – List of author genders
aff_institutes (list) – List of author institutes
aff_countries (list) – List of author countries
- checkBibDate(dt)
Return if bib was published after a given date
- Parameters
dt (datetime) – Given date which bib publication date will be compared to
- Returns
Flag denoting if bib was published before (False) or after (True) given date
- Return type
bool
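The date check reduces to a datetime comparison. A minimal sketch, assuming a strict "after" comparison (whether the library treats an equal date as after is not stated); `published_after` is a hypothetical name.

```python
from datetime import datetime


def published_after(pub_date, dt):
    """Return True if pub_date falls strictly after dt, else False."""
    return pub_date > dt


cutoff = datetime(2020, 1, 1)
print(published_after(datetime(2021, 6, 15), cutoff))  # True
print(published_after(datetime(2018, 3, 2), cutoff))   # False
```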
- checkOrgFirstAuthor(organisation)
Return flag for if first author is within organisation
- Parameters
organisation (Organisation) – Organisation object
- Returns
out – Flag denoting if first author is within organisation (True) or external to organisation (False)
- Return type
bool
- checkOrgLastAuthor(organisation)
Return flag for if last author is within organisation
- Parameters
organisation (Organisation) – Organisation object
- Returns
out – Flag denoting if last author is within organisation (True) or external to organisation (False)
- Return type
bool
- getFirstAuthor()
Return first author
- getLastAuthor()
Return last author
- getOrgAuthors(organisation)
Return bib authors within organisation
- Parameters
organisation (Organisation) – Organisation object to compare Bib authors to
- Returns
org_names – List of bib authors that are within Organisation
- Return type
list
- getOrgGender(org_names)
Return genders of organisation authors
- Parameters
org_names (list) – Name objects within organisation
- Returns
org_gender – List of organisation author genders
- Return type
list
- getStrAuthors()
Return all author full names as a comma-delimited string
- populateBib(search)
Populate Bib attributes from search hit
- Parameters
search (list) – List of bib information to populate Bib object with [doi, title, authors, journal, ptype, date, citations]
- retrieveDOIFromTitle()
Retrieve DOI using a CrossRef search of the Bib title, and append to Bib attributes
- retrieveFromAMetric()
Get Altmetrics of Bib object. If Altmetrics are not already a Bib attribute, they are retrieved using a DOI search with the Altmetrics API
- Returns
Altmetric score
- Return type
int
- bib.checkBibAuthors(authors, organisation_names)
Check if author name/s appear in organisation names
- Parameters
authors (list) – List of str author names
organisation_names (list) – List of str organisation names
- Returns
check – Flag denoting if name/s are found
- Return type
bool
- bib.countGenders(genders)
Count genders in list
- Parameters
genders (list) – Gender list to count from
- Returns
female (int) – Female count
male (int) – Male count
nb (int) – Non-binary count
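Counting the three categories can be sketched as below. The labels "female", "male" and "non-binary" follow the values listed for the gender attribute; ignoring case and skipping non-string entries (such as None) are assumptions. `count_genders` is a hypothetical name.

```python
def count_genders(genders):
    """Return (female, male, non-binary) counts from a list of labels.

    Assumption: labels are compared case-insensitively, and entries
    that are not strings (e.g. None) are skipped.
    """
    labels = [g.lower() for g in genders if isinstance(g, str)]
    return labels.count("female"), labels.count("male"), labels.count("non-binary")


female, male, nb = count_genders(["female", "male", "Female", None, "non-binary"])
print(female, male, nb)  # 2 1 1
```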
- bib.extractFromCRitem(item)
Get bib information from CrossRef search item
- Parameters
item (dict) – CrossRef search hit
- Returns
List containing doi, title, authors, journal name, publication type, date, citation count
- Return type
list
- bib.extractScholarItem(pub, item)
Extract item from Scholar bib object
- Parameters
pub (dict) – Scholar bib dictionary
item (str) – Keyword to obtain value from
- Returns
out – Keyword value
- Return type
str or int
- bib.fetchAltmetrics(doi)
Fetch altmetrics from DOI
- Parameters
doi (str) – DOI string to search with
- Returns
result – Altmetrics result
- Return type
dict
- bib.fromCrossRef(author)
Get bib records using CrossRef based on inputted author search
- Parameters
author (str) – Author name
- Returns
out – List containing search hit information [doi, title, authors, journal name, publication type, date, citation count]
- Return type
list
- bib.fromScholar(scholar_author)
Fetch all publications associated with Scholar author
- Parameters
scholar_author (dict) – Scholar author dictionary
- Returns
bibs – List of Scholar bibs
- Return type
list
- bib.fromScopus(scopus_author)
Fetch all publications associated with Scopus author
- Parameters
scopus_author (AuthorRetrieval) – Scopus author retrieval object (scopus.author_retrieval.AuthorRetrieval)
- Returns
bibs – List of Scopus search publications (scopus.scopus_search.Document)
- Return type
list
- bib.listToStr(in_list)
Return string with comma separation from list
- Parameters
in_list (list) – List to merge and comma-delimit
- Returns
Comma-delimited string or None if input is invalid
- Return type
str or None
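A sketch of the list-to-string conversion follows. What counts as invalid input is not specified, so this version assumes that anything other than a non-empty list or tuple is invalid, and that items are joined with a comma and a space. `list_to_str` is a hypothetical name.

```python
def list_to_str(in_list):
    """Join list items into a comma-separated string.

    Assumption: returns None when the input is not a list/tuple or
    is empty.
    """
    if not isinstance(in_list, (list, tuple)) or not in_list:
        return None
    return ", ".join(str(item) for item in in_list)


print(list_to_str(["Jane Doe", "John Smith"]))  # Jane Doe, John Smith
print(list_to_str(None))                        # None
```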
bibcollection
The BibCollection module handles all functionality related to a collection of publications associated with an author or group of authors
- class bibcollection.BibCollection(args)
Bases:
object
A collection of Bib objects, representing a database of publications
- Variables
organisation (Organisation) – Organisation associated with BibCollection
bibs (list) – List of Bib objects
- addBibs(bibs_list)
Add list of Bib objects to BibCollection object
- Parameters
bibs_list (list) – List of Bib objects to add to BibCollection object
- addOrganisation(org)
Add affiliated Organisation object to BibCollection
- Parameters
org (Organisation) – Organisation to add to object
- asDataFrame()
Retrieve BibCollection attributes as dataframe
- Returns
df – Dataframe containing all attributes of BibCollection object
- Return type
pandas.DataFrame
- checkBibs()
Checker for Bibs in BibCollection
- getAllGenders(database)
Fetch genders of all co-authors in BibCollection, using database to retain user defined co-author genders
- Parameters
database (Organisation or str) – Database of names and genders, either as an Organisation object or as a .csv filepath to an Organisation dataframe
- getCRBibs()
Retrieve CrossRef bibs associated with authors in organisation
- getOrganisation()
Return organisation
- Returns
Organisation – Organisation object affiliated with BibCollection object
- Return type
Organisation
- getRecent(dt)
Return all bibs in BibCollection that were published after a certain date (given as a datetime object)
- Parameters
dt (datetime) – Datetime to filter bibs
- Returns
new_bibs – BibCollection with only recent Bibs
- Return type
BibCollection
- getScholarBibs()
Retrieve all Scholar bibs associated with authors in organisation
- getScopusBibs()
Retrieve all Scopus bibs associated with authors in organisation
- removeAbstracts()
Remove conference abstract bibs from BibCollection based on journal title containing the word “abstract” (case-insensitive)
- removeBib(idx)
Remove bib from BibCollection based on index position
- Parameters
idx (int) – Index position of bib to delete
- removeDiscussions()
Remove discussion paper bibs from BibCollection based on journal title containing the word “discussion” (case-insensitive)
- removeDuplicates()
Remove duplicate bib objects from BibCollection based on doi and title
- removeFromKeyword(bib_att, keyword)
Remove bibs from BibCollection based on keyword (case-insensitive) in specified bib attribute
- Parameters
bib_att (list) – Bib attribute list to base removal on
keyword (str) – Word to classify removal
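Keyword-based removal, which removeAbstracts and removeDiscussions appear to rely on, can be sketched as a case-insensitive filter. This is an illustration under assumptions: `remove_from_keyword` here is a hypothetical standalone function that takes the bibs and a parallel list of attribute values and returns the kept bibs, whereas the library method operates on the BibCollection itself.

```python
def remove_from_keyword(bibs, bib_att, keyword):
    """Return the bibs whose attribute value does not contain keyword.

    bibs and bib_att are parallel lists: bib_att[i] is the attribute
    value (e.g. journal title) of bibs[i]. Matching ignores case, and
    non-string values (e.g. None) never match.
    """
    key = keyword.lower()
    return [
        bib
        for bib, att in zip(bibs, bib_att)
        if not (isinstance(att, str) and key in att.lower())
    ]


bibs = ["bib A", "bib B", "bib C"]
journals = ["EGU General Assembly Abstracts", "Journal of Glaciology", None]
print(remove_from_keyword(bibs, journals, "abstract"))  # ['bib B', 'bib C']
```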
- bibcollection.bibsFromCSV(csv_file)
Import BibCollection from csv file
- Parameters
csv_file (str) – Filepath to csv bibcollection
- Returns
bibs – BibCollection object
- Return type
BibCollection
- bibcollection.calcDivIdx(name, years, scopus=True, scholar=False, crossref=False, check=True)
Determine the diversity index of an individual
- Parameters
name (str) – Full name of individual to determine diversity index for
years (int) – Number of years (from current date) to determine diversity index from
scopus (bool, optional) – Flag to signify if Scopus database should be used. The default is True
scholar (bool, optional) – Flag to signify if Scholar database should be used. The default is False
crossref (bool, optional) – Flag to signify if Crossref database should be used. The default is False
check (bool, optional) – Flag to signify if name and bib results should be checked by users. The default is True
- bibcollection.countByYear(df)
Count publications in dataframe by year
- Parameters
df (pandas.DataFrame) – A dataframe representing an exported BibCollection
- Returns
A dataframe of only co-authors publication entries
- Return type
pandas.DataFrame
- bibcollection.findDuplicates(l)
Find index of duplicates in list, disregarding nan values
- Parameters
l (list) – List to find duplicates in
- Returns
idx – List of duplicate indices
- Return type
list
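Duplicate detection that skips nan values can be sketched as follows. The sketch assumes the returned indices point at the second and later occurrences of each value (so the first occurrence is kept), which the documentation does not confirm. `find_duplicates` is a hypothetical name.

```python
import math


def find_duplicates(items):
    """Return indices of repeated values, ignoring None and nan.

    Assumption: the first occurrence of a value is not reported;
    only later repeats are.
    """
    seen = set()
    idx = []
    for i, item in enumerate(items):
        # Missing values are never treated as duplicates of each other
        if item is None or (isinstance(item, float) and math.isnan(item)):
            continue
        if item in seen:
            idx.append(i)
        else:
            seen.add(item)
    return idx


dois = ["10.1/a", "10.1/b", "10.1/a", float("nan"), float("nan"), "10.1/b"]
print(find_duplicates(dois))  # [2, 5]
```

Skipping missing values matters when deduplicating on DOI, since publications without a DOI would otherwise all be flagged as duplicates of one another.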
- bibcollection.firstFromDF(df, first=True)
Get either Organisation-led (i.e. first author) or co-author publication entries from dataframe
- Parameters
df (pandas.DataFrame) – A dataframe representing an exported BibCollection
first (bool) – Flag to denote if first author or co-author publications should be retrieved
- Returns
df1 – A dataframe of only Organisation-led publication entries
- Return type
pandas.DataFrame
- bibcollection.getGenderDistrib(df, first=True)
Get gender distribution of women, men and non-binary authors as a percentage. This is derived from the gender count columns from a given dataframe
- Parameters
df (pandas.DataFrame) – A dataframe representing an exported BibCollection
first (bool) – Flag to denote if first author gender should be included or not
- Returns
f (list) – List of percentage of female authors for each publication
m (list) – List of percentage of male authors for each publication
nb (list) – List of percentage of non-binary authors for each publication
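The percentage calculation can be sketched without a dataframe. The sketch assumes each publication is represented by a (female, male, non-binary) count tuple, which stands in for the gender count columns of the exported dataframe, and that a publication with no counted authors yields 0.0 for all three. `gender_percentages` is a hypothetical name, and the `first` flag is not modelled here.

```python
def gender_percentages(counts):
    """Convert per-publication gender counts into percentage lists.

    counts is a list of (female, male, non_binary) tuples, one per
    publication. Returns three lists (f, m, nb) of percentages.
    """
    f, m, nb = [], [], []
    for female, male, non_binary in counts:
        total = female + male + non_binary
        if total == 0:
            # Assumption: no counted authors gives 0.0 for every group
            f.append(0.0)
            m.append(0.0)
            nb.append(0.0)
            continue
        f.append(100 * female / total)
        m.append(100 * male / total)
        nb.append(100 * non_binary / total)
    return f, m, nb


f, m, nb = gender_percentages([(1, 1, 0), (3, 0, 1)])
print(f, m, nb)  # [50.0, 75.0] [50.0, 0.0] [0.0, 25.0]
```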