Search term selection with litsearchr v1.0.0 for an example systematic review of the effects of fire on black-backed woodpeckers
Eliza M. Grames and Emily A. Hennessy
The litsearchr package for R is designed to partially automate search term selection and the writing of search strategies for systematic reviews. This vignette demonstrates its utility through a mock review examining the effects of fire on black-backed woodpeckers, showing how the package: (1) identifies potential keywords from the results of a naive search, (2) builds a keyword co-occurrence network to assist with building a more precise search strategy, (3) uses a cutoff function to identify important changes in keyword importance, (4) assists with grouping terms into concepts, and (5) writes a Boolean search from the output of the four previous steps.
Write and conduct naive search
In our empirical example, we begin with a naive search intended to capture a set of relevant articles. Naive search terms: ((“black-backed woodpecker” OR “picoides arcticus” OR “picoides tridactylus”) AND (burn* OR fire*)). We ran the search in Scopus and Zoological Record (Web of Science), exporting results in .ris and .txt format, respectively. These exported search results are then imported into litsearchr using the import_results function and deduplicated using the remove_duplicates function. In some cases, it is best to run remove_duplicates two or more times, for example starting with exact matches and then moving on to fuzzy matching.
# Note: system.file() is only used to identify where the example datasets are stored
# If litsearchr and its dependencies were successfully installed, this directory exists on your computer
# If you are using your own bibliographic files, you should not use system.file
# You should instead give it the full path (or relative path from your current working directory) to the directory where your files are stored
search_directory <- system.file("extdata", package="litsearchr")
naiveimport <-
  litsearchr::import_results(directory = search_directory, verbose = TRUE)
## Reading file C:/Users/mlolita/AppData/Local/R/win-library/4.4/litsearchr/extdata/scopus.ris ... done
## Reading file C:/Users/mlolita/AppData/Local/R/win-library/4.4/litsearchr/extdata/zoorec.txt ... done
naiveresults <-
  litsearchr::remove_duplicates(naiveimport, field = "title", method = "string_osa")
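The two-pass idea described above (exact matches first, then fuzzy matching) can be sketched in base R. This is an illustration only and is not how remove_duplicates works internally; the `records` data frame, its titles, and the edit-distance threshold of 2 are all made up for this example.

```r
# Hypothetical records: rows 1 and 2 are exact duplicates,
# row 3 differs from them only in capitalization and punctuation.
records <- data.frame(
  title = c(
    "Nest survival of Black-backed Woodpeckers after wildfire",
    "Nest survival of Black-backed Woodpeckers after wildfire",
    "Nest survival of black-backed woodpeckers after wildfire.",
    "Fire severity and habitat selection in boreal forest"
  ),
  stringsAsFactors = FALSE
)

# Pass 1: drop exact duplicate titles.
pass1 <- records[!duplicated(records$title), , drop = FALSE]

# Pass 2: drop near-duplicates, comparing lowercased titles without punctuation.
normalized <- tolower(gsub("[[:punct:]]", "", pass1$title))
distances <- utils::adist(normalized)  # pairwise edit distances
max_distance <- 2                      # assumed threshold, tune for real data

keep <- rep(TRUE, nrow(pass1))
for (i in seq_len(nrow(pass1))) {
  if (!keep[i]) next
  later <- seq_len(nrow(pass1)) > i
  # Flag later records that are within the threshold of record i.
  keep[later & distances[i, ] <= max_distance] <- FALSE
}
pass2 <- pass1[keep, , drop = FALSE]

print(pass2$title)
```

Running the cheap exact pass first reduces the number of pairwise comparisons the fuzzy pass has to make, which matters for large exports.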
rakedkeywords <-
  litsearchr::extract_terms(
    text = paste(naiveresults$title, naiveresults$abstract),
    method = "fakerake",
    min_freq = 2,
    ngrams = TRUE,
    min_n = 2,
    language = "English"
  )
## Loading required namespace: stopwords
taggedkeywords <-
  litsearchr::extract_terms(
    keywords = naiveresults$keywords,
    method = "tagged",
    min_freq = 2,
    ngrams = TRUE,
    min_n = 2,
    language = "English"
  )
all_keywords <- unique(append(taggedkeywords, rakedkeywords))
naivedfm <-
  litsearchr::create_dfm(
    elements = paste(naiveresults$title, naiveresults$abstract),
    features = all_keywords
  )
naivegraph <-
  litsearchr::create_network(
    search_dfm = naivedfm,
    min_studies = 2,
    min_occ = 2
  )
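To make the idea of a keyword co-occurrence network more concrete, the following base R sketch builds a tiny document-feature matrix and derives co-occurrence counts and node strength from it. It is a simplified illustration under our own assumptions, not the internals of create_dfm or create_network; the `docs` and `terms` vectors are invented.

```r
# Hypothetical titles/abstracts and candidate terms.
docs <- c(
  "black-backed woodpecker nests in burned forest",
  "fire severity shapes burned forest habitat",
  "black-backed woodpecker response to fire severity",
  "burned forest and fire severity after wildfire"
)
terms <- c("black-backed woodpecker", "burned forest", "fire severity")

# Document-feature matrix: 1 if the term appears in the document, else 0.
dfm <- sapply(terms, function(term) as.integer(grepl(term, docs, fixed = TRUE)))
rownames(dfm) <- paste0("doc", seq_along(docs))

# Co-occurrence matrix: number of documents in which two terms both appear.
cooccurrence <- crossprod(dfm)
diag(cooccurrence) <- 0  # ignore a term's co-occurrence with itself

# Node strength: sum of a term's co-occurrence weights with all other terms.
strength <- rowSums(cooccurrence)

print(cooccurrence)
print(strength)
```

Terms with high strength co-occur often with other candidate terms, which is the kind of importance measure the cutoff step then works with.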
cutoff <-
  litsearchr::find_cutoff(
    naivegraph,
    method = "cumulative",
    percent = .80,
    imp_method = "strength"
  )
reducedgraph <-
  litsearchr::reduce_graph(naivegraph, cutoff_strength = cutoff[1])
searchterms <- litsearchr::get_keywords(reducedgraph)
head(searchterms, 20)
## [1] "black-backed woodpecker" "black hill"
## [3] "black hills" "boreal forest"
## [5] "breeding season" "burned forest"
## [7] "cavity-nesting birds" "cavity nesters"
## [9] "certhia americana" "colaptes auratus"
## [11] "conifer forests" "coniferous forest"
## [13] "coniferous forests" "dendroctonus ponderosae"
## [15] "dryocopus pileatus" "fire severity"
## [17] "food availability" "forest management"
## [19] "habitat selection" "habitat suitability"
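One plausible reading of a cumulative cutoff at 80% is: rank terms by strength and keep the strongest terms until they account for 80% of the total strength. The base R sketch below illustrates that reading with invented strength values; it is an assumption about the general approach, not a reproduction of find_cutoff.

```r
# Hypothetical node strengths for seven candidate terms (sum = 100).
strength <- c(fire = 40, woodpecker = 30, forest = 15, nest = 8,
              beetle = 4, snag = 2, bark = 1)

percent <- 0.80

# Rank terms from strongest to weakest and compute the cumulative share.
ranked <- sort(strength, decreasing = TRUE)
cumulative_share <- cumsum(ranked) / sum(ranked)

# The cutoff is the strength of the first term at which the share reaches 80%.
cut_index <- which(cumulative_share >= percent)[1]
cutoff_strength <- ranked[[cut_index]]

# Keep every term at least as strong as the cutoff.
kept_terms <- names(ranked)[ranked >= cutoff_strength]

print(cutoff_strength)
print(kept_terms)
```

With these made-up values, the first two terms cover 70% of the total strength and the third brings the share to 85%, so three terms are kept.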
In our example, all keywords that relate to woodpeckers would be in the same concept group (e.g., “three-toed woodpeckers”, “cavity-nesting birds”), while terms relating to fire (e.g., “post-fire”, “burned forest”) would be in their own concept group.
Terms that fit multiple concept groups can be added to both without changing the logic of the Boolean connections. For example, a term like “post-fire woodpecker ecology” would be added to both the woodpecker and fire concept groups by labeling its group “woodpecker, fire”. We recommend saving the search terms to a .csv file, adding a new column called “group”, and entering the group names in it, then reading in the .csv file. Although this can be done in R, adding tags to 300+ suggested search terms is generally quicker in a .csv file. Example code for this is commented out below as it cannot be run without the .csv file.
# write.csv(searchterms, "./search_terms.csv")
# manually group terms in the csv
# grouped_terms <- read.csv("./search_terms_grouped.csv")
# extract the woodpecker terms from the csv
# woodpecker_terms <- grouped_terms$term[grep("woodpecker", grouped_terms$group)]
# join together a list of manually generated woodpecker terms with the ones from the csv
# woodpeckers <- unique(append(c("woodpecker"), woodpecker_terms))
# repeat this for all concept groups
# then merge them into a list, using the code below as an example
# mysearchterms <- list(woodpeckers, fire)
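Because the commented code above depends on a .csv file, here is a self-contained sketch of the same grouping logic using an in-memory data frame. The `grouped_terms` contents and the column names `term` and `group` are assumptions made for illustration; it shows how a term labeled “woodpecker, fire” ends up in both concept groups.

```r
# Hypothetical stand-in for the manually grouped .csv file.
grouped_terms <- data.frame(
  term = c("three-toed woodpecker", "burned forest",
           "post-fire woodpecker ecology", "forest management"),
  group = c("woodpecker", "fire", "woodpecker, fire", ""),
  stringsAsFactors = FALSE
)

# A term belongs to a concept group if the group name appears in its label,
# so multi-label terms are picked up by every group they mention.
woodpecker_terms <- grouped_terms$term[grepl("woodpecker", grouped_terms$group, fixed = TRUE)]
fire_terms <- grouped_terms$term[grepl("fire", grouped_terms$group, fixed = TRUE)]

# Combine with manually generated terms and drop repeats.
concept_groups <- list(
  woodpeckers = unique(c("woodpecker", woodpecker_terms)),
  fire = unique(c("wildfire", fire_terms))
)

print(concept_groups)
```

Terms left without a group label, such as “forest management” here, are simply excluded from every concept group.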
# Note: these search terms are a shortened example of a full search for illustration purposes only
mysearchterms <-
  list(
    c(
      "picoides arcticus",
      "black-backed woodpecker",
      "cavity-nesting birds",
      "picoides tridactylus",
      "three-toed woodpecker"
    ),
    c(
      "wildfire",
      "burned forest",
      "post-fire",
      "postfire salvage logging",
      "fire severity",
      "recently burned"
    )
  )
my_search <-
  litsearchr::write_search(
    groupdata = mysearchterms,
    languages = "English",
    stemming = TRUE,
    closure = "none",
    exactphrase = TRUE,
    writesearch = FALSE,
    verbose = TRUE
  )
## [1] "English is written"
# when writing to a plain text file, the extra \ are required to render the * and " properly
# if copying straight from the console, simply find and replace them in a text editor
my_search
## [1] "((\"picoid* arcticus*\" OR \"black-back* woodpeck*\" OR \"cavity-nest* bird*\" OR \"picoid* tridactylus*\" OR \"three-to* woodpeck*\") AND (wildfir* OR \"burn* forest*\" OR post-fir* OR \"postfir* salvag* logging\" OR \"fire* sever*\" OR \"recent* burn*\"))"
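The structure of the search string above (terms joined by OR within a concept group, groups joined by AND) can be sketched in a few lines of base R. The helper `build_boolean` is hypothetical and deliberately omits stemming and wildcards; it is not the implementation of write_search.

```r
# Hypothetical helper: OR within each concept group, AND between groups.
build_boolean <- function(groups) {
  # Wrap multi-word terms in double quotes so they are searched as phrases.
  quote_phrase <- function(term) {
    if (grepl(" ", term, fixed = TRUE)) paste0('"', term, '"') else term
  }
  blocks <- vapply(groups, function(terms) {
    quoted <- vapply(terms, quote_phrase, character(1))
    paste0("(", paste(quoted, collapse = " OR "), ")")
  }, character(1))
  paste0("(", paste(blocks, collapse = " AND "), ")")
}

example_groups <- list(
  c("black-backed woodpecker", "picoides arcticus"),
  c("wildfire", "burned forest")
)
boolean_search <- build_boolean(example_groups)

# cat() writes the raw string, so the quotes appear without escaping backslashes.
cat(boolean_search, "\n")
```

Printing a character string at the console shows embedded quotes as `\"`, whereas `cat()` writes the string as it would appear in a database search box.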
gold_standard <-
  c(
    "Black-backed woodpecker occupancy in burned and beetle killed forests: disturbance agent matters",
    "Nest site selection and nest survival of Black-backed Woodpeckers after wildfire",
    "Cross scale occupancy dynamics of a postfire specialist in response to variation across a fire regime"
  )
title_search <- litsearchr::write_title_search(titles=gold_standard)
We then read in our full search results and compare them to our gold standard to determine which gold standard articles we retrieved. Note: because this is a demonstration rather than a real systematic review, the full searches were not run, and the naive search results from earlier are reused here. In a real review, you will want to do this with your actual full search results.
results_directory <- system.file("extdata", package="litsearchr")
retrieved_articles <-
  litsearchr::import_results(directory = results_directory, verbose = TRUE)
## Reading file C:/Users/mlolita/AppData/Local/R/win-library/4.4/litsearchr/extdata/scopus.ris ... done
## Reading file C:/Users/mlolita/AppData/Local/R/win-library/4.4/litsearchr/extdata/zoorec.txt ... done
retrieved_articles <- litsearchr::remove_duplicates(retrieved_articles, field="title", method="string_osa")
articles_found <- litsearchr::check_recall(true_hits = gold_standard,
                                           retrieved = retrieved_articles$title)
articles_found
## Title
## [1,] "Black-backed woodpecker occupancy in burned and beetle killed forests: disturbance agent matters"
## [2,] "Nest site selection and nest survival of Black-backed Woodpeckers after wildfire"
## [3,] "Cross scale occupancy dynamics of a postfire specialist in response to variation across a fire regime"
## Best_Match
## [1,] "Black-backed woodpecker occupancy in burned and beetle-killed forests: Disturbance agent matters"
## [2,] "Nest site selection and nest survival of Black-backed Woodpeckers after wildfire"
## [3,] "The ecological importance of severe wildfires: Some like it hot"
## Similarity
## [1,] "0.588235294117647"
## [2,] "1"
## [3,] "0.134969325153374"
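The check indicates that two of our three gold standard articles were retrieved: the second title matches exactly, and the first matches apart from hyphenation and capitalization (similarity 0.59). The best match for the third title is a different article (similarity 0.13), so that article was not retrieved. In a real review, we would revise the search so that it captures the missing article before going ahead with our final search.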
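When reading similarity scores like those in the output above, it can help to normalize titles before comparing them, since the first gold standard title and its best match differ only in hyphenation and capitalization yet score 0.59. The base R sketch below uses an edit-distance similarity and a 0.9 threshold, both of which are our own assumptions; it is not the measure check_recall uses.

```r
# Lowercase, replace punctuation with spaces, and collapse repeated spaces.
normalize_title <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  trimws(gsub("\\s+", " ", x))
}

gold <- c(
  "Black-backed woodpecker occupancy in burned and beetle killed forests: disturbance agent matters",
  "Nest site selection and nest survival of Black-backed Woodpeckers after wildfire",
  "Cross scale occupancy dynamics of a postfire specialist in response to variation across a fire regime"
)
# The three best-match titles reported in the output above.
retrieved <- c(
  "Black-backed woodpecker occupancy in burned and beetle-killed forests: Disturbance agent matters",
  "Nest site selection and nest survival of Black-backed Woodpeckers after wildfire",
  "The ecological importance of severe wildfires: Some like it hot"
)

gold_norm <- normalize_title(gold)
retrieved_norm <- normalize_title(retrieved)

# Similarity = 1 - edit distance / length of the longer title.
distances <- utils::adist(gold_norm, retrieved_norm)
longest <- outer(nchar(gold_norm), nchar(retrieved_norm), pmax)
similarity <- 1 - distances / longest

# Best match and its similarity for each gold standard title.
best <- apply(similarity, 1, which.max)
best_similarity <- similarity[cbind(seq_along(gold), best)]

# Assumed threshold: treat a title as retrieved only if it matches very closely.
found <- best_similarity >= 0.9
print(data.frame(best_similarity = round(best_similarity, 2), found = found))
```

After normalization the first two titles match their counterparts exactly, while the third has no close counterpart among these retrieved titles.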