Title: | Literature Matrix Synthesis Tools for Epidemiology and Health Science Research |
---|---|
Description: | An easy-to-use workflow that provides tools to create, update and fill literature matrices commonly used in research, specifically epidemiology and health sciences research. The project was born out of the need for an easy-to-use tool for my research methods classes. |
Authors: | JP Monteagudo [aut, cre, cph] |
Maintainer: | JP Monteagudo <[email protected]> |
License: | AGPL (>= 3) |
Version: | 1.0.1 |
Built: | 2025-03-09 05:57:07 UTC |
Source: | https://github.com/jpmonteagudo28/matriz |
Adds one or more records to a literature matrix at a specified position. Records can be provided as lists or data frames, and can be inserted before or after specific rows.
add_batch_record(.data, ..., .before = NULL, .after = NULL)
.data |
A data frame to which records will be added |
... |
One or more records to add. Each record can be either a list or a data frame. |
.before |
Row number before which to insert the new records. If NULL (default), and '.after' is also NULL, records are appended to the end. |
.after |
Row number after which to insert the new records. If NULL (default), and '.before' is also NULL, records are appended to the end. |
A data frame with the new records added at the specified position
    # Create sample data frame
    df <- data.frame(
      name = c("John", "Jane"),
      age = c(25, 30)
    )

    # Add a single record as a list
    df <- add_batch_record(df, list(name = "Bob", age = 35))

    # Add multiple records as data frames
    new_records <- data.frame(
      name = c("Alice", "Charlie"),
      age = c(28, 40)
    )
    df <- add_batch_record(df, new_records, .before = 2)
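The positional insert described above can be pictured with base R alone. The following is a minimal sketch, not the matriz implementation: `insert_rows()` is a hypothetical helper that splits the data frame at the requested row and binds the new records in between.

```r
# Hypothetical helper (not part of matriz): insert `new` before row `before`,
# or append to the end when `before` is NULL.
insert_rows <- function(data, new, before = NULL) {
  if (is.null(before)) {
    return(rbind(data, new))
  }
  stopifnot(before >= 1, before <= nrow(data))
  top <- data[seq_len(before - 1), , drop = FALSE]       # rows above the insert point
  bottom <- data[seq(before, nrow(data)), , drop = FALSE] # rows from the insert point on
  out <- rbind(top, new, bottom)
  rownames(out) <- NULL                                    # renumber rows 1..n
  out
}

df <- data.frame(name = c("John", "Jane"), age = c(25, 30))
new_records <- data.frame(name = c("Alice", "Charlie"), age = c(28, 40))
result <- insert_rows(df, new_records, before = 2)
print(result)
```

Here the two new records land between the original first and second rows, which is the behavior the '.before' argument is documented to provide.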
Adds a single row of NA values to a data frame
add_empty_row(.data)
.data |
A data frame to which an empty row will be added |
Modified data frame with an additional empty row
Adds a new row to a data frame at a specified position
add_record(.data, ..., .before = NULL, .after = NULL)
.data |
A data frame to which a record will be added |
... |
New record to be added (vector, list, or data frame) |
.before |
Optional. Row number before which to insert the new record |
.after |
Optional. Row number after which to insert the new record |
Modified data frame with the new record inserted
    df <- data.frame(x = 1:3, y = 4:6)
    add_record(df, c(4, 7))
    add_record(df, c(4, 7), .before = 2)
Deletes specific rows from a data frame or clears the entire data frame by leveraging the 'truncate' function. If no position is provided, it will issue a message and either return the unchanged data or use 'truncate' to empty the data frame, depending on additional arguments.
delete_record(.data, position = NULL, ...)
.data |
A data frame from which records will be deleted. |
position |
A numeric vector specifying the row positions to be deleted. If 'NULL', behavior is determined by the number of rows in the data frame and additional arguments passed to the 'truncate' function. |
... |
Additional arguments passed to the 'truncate' function. Specifically, the 'keep_rows' argument can be used to decide whether non-NA cells in the data frame are cleared when truncating. |
- If 'position' is 'NULL' and the data frame has more than one row, a message is issued, and no records are deleted.
- If 'position' is a numeric vector, the specified rows are deleted using 'dplyr::slice()'.
- If 'position' is empty or invalid (e.g., not numeric), the function stops with an appropriate error message.
- When no rows remain after deletion, the function calls 'truncate' to handle the data frame, with behavior controlled by the 'keep_rows' argument passed through '...'.
A modified data frame with the specified rows removed. If 'position' is 'NULL', the function either returns the original data frame or an empty data frame, based on the 'keep_rows' argument in the 'truncate' function.
    df <- data.frame(A = 1:5, B = letters[1:5])

    # Delete a specific row
    delete_record(df, position = 2)

    # Delete multiple rows
    delete_record(df, position = c(2, 4))

    # Use truncate to clear the data frame
    delete_record(df, position = NULL, keep_rows = FALSE)

    # Keep the rows but replace non-NA cells with NA
    delete_record(df, position = NULL, keep_rows = TRUE)
This function exports a data frame to a specified file format, including CSV, TSV, RDS, XLSX, and TXT. If the format is not provided, it is inferred from the file extension.
export_matrix(.data, file, format = NULL, drop_extra = FALSE, extra_columns = NULL, silent = FALSE, ...)
.data |
A data frame or tibble to be exported. |
file |
A character string specifying the file name and path. |
format |
A character string specifying the file format. If 'NULL', the format is inferred from the file extension. Supported formats: '"csv"', '"tsv"', '"rds"', '"xlsx"', '"txt"'. |
drop_extra |
Logical. If 'TRUE', removes columns not listed in 'extra_columns' before exporting. Default is 'FALSE'. |
extra_columns |
A character vector specifying additional columns to retain if 'drop_extra = TRUE'. Default is 'NULL'. |
silent |
Logical. If 'TRUE', suppresses messages. Default is 'FALSE'. |
... |
Additional arguments passed to the underlying export functions ('write.csv', 'writexl::write_xlsx', etc.). |
Exports the data to a file and returns 'NULL' invisibly.
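As an illustration of how a format might be inferred from the file extension, here is a minimal base R sketch. It is an assumption about the general approach, not the matriz source: `infer_format()` is a hypothetical helper, and only `tools::file_ext()` is a real function.

```r
# Hypothetical helper (not part of matriz): resolve the export format,
# falling back to the file extension when `format` is NULL.
infer_format <- function(file, format = NULL) {
  supported <- c("csv", "tsv", "rds", "xlsx", "txt")
  if (is.null(format)) {
    format <- tolower(tools::file_ext(file))
  }
  if (!format %in% supported) {
    stop("Unsupported format: '", format, "'", call. = FALSE)
  }
  format
}

inferred <- infer_format("lit_matrix.csv")                  # taken from the extension
explicit <- infer_format("lit_matrix.out", format = "tsv")  # explicit value wins
print(c(inferred, explicit))
```

An explicit 'format' takes precedence over the extension, so a file with an unusual extension can still be written in a supported format.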
This function imports a matrix (data frame) from various file formats (CSV, TSV, RDS, XLSX, XLS, TXT) and ensures it contains the required columns. It also allows the user to control whether extra columns should be dropped or kept.
import_matrix(path, format = NULL, drop_extra = FALSE, extra_columns = NULL, remove_dups = TRUE, silent = FALSE, ...)
path |
A character string specifying the path to the file to be imported. |
format |
A character string specifying the file format. If not provided, the format is automatically detected based on the file extension. Supported formats: "csv", "tsv", "rds", "xlsx", "xls", "txt". |
drop_extra |
A logical value indicating whether extra columns (not in the list of required columns) should be dropped. Default is 'FALSE'. |
extra_columns |
A character vector of column names that are allowed in addition to the required columns. By default, no extra columns are allowed. |
remove_dups |
A logical value indicating whether to remove duplicate columns before merging. Default is 'TRUE'. |
silent |
A logical value indicating whether to suppress messages. Default is 'FALSE'. |
... |
Additional arguments passed to the specific file-reading functions (e.g., 'read.csv', 'read.delim', 'readRDS', 'readxl::read_xlsx', 'readxl::read_xls', 'read.table'). Refer to the documentation of the corresponding read function for the list of valid arguments. |
The matrix includes the following predefined columns:
- 'year': Numeric. Year of publication.
- 'citation': Character. Citation or reference details.
- 'keywords': Character. Keywords or tags for the study.
- 'profession': Character. Profession of the study participants or target audience.
- 'electronic': Logical. Indicates whether the study is available electronically.
- 'purpose': Character. Purpose or objective of the study.
- 'study_design': Character. Study design or methodology.
- 'outcome_var': Character. Outcome variables measured in the study.
- 'predictor_var': Character. Predictor variables considered in the study.
- 'sample': Numeric. Sample size.
- 'dropout_rate': Numeric. Dropout or attrition rate.
- 'setting': Character. Study setting (e.g., clinical, educational).
- 'inclusion_criteria': Character. Inclusion criteria for participants.
- 'ethnicity': Character. Ethnic background of participants.
- 'age': Numeric. Age of participants.
- 'sex': Factor. Sex of participants.
- 'income': Factor. Income level of participants.
- 'education': Character. Educational background of participants.
- 'measures': Character. Measures or instruments used for data collection.
- 'analysis': Character. Analytical methods used.
- 'results': Character. Summary of results or findings.
- 'limitations': Character. Limitations of the study.
- 'implications': Character. Implications or recommendations from the study.
- 'ethical_concerns': Character. Ethical concerns addressed in the study.
- 'biases': Character. Potential biases in the study.
- 'notes': Character. Additional notes or observations.
Extra columns beyond the required ones are handled via the 'extra_columns' argument. If the 'drop_extra' argument is set to 'TRUE', extra columns will be removed. If 'drop_extra' is 'FALSE', extra columns will remain in the imported data, and a message will be shown.
The '...' argument allows you to pass additional parameters directly to the read functions. For instance:
- For 'read.csv', '...' could include 'header = TRUE', 'sep = ","', or 'stringsAsFactors = FALSE'.
- For 'read.delim', '...' could include 'header = TRUE', 'sep', or 'stringsAsFactors = FALSE'.
- For 'readRDS', '...' could include 'refhook = NULL'.
- For 'readxl::read_xlsx', '...' could include 'sheet = 1' or 'col_names = TRUE'.
- For 'readxl::read_xls', '...' could include 'sheet = 1' or 'col_names = TRUE'.
- For 'read.table', '...' could include 'header = TRUE', 'sep', or 'stringsAsFactors = FALSE'.
A data frame containing the imported matrix, with the required columns and any allowed extra columns.
Creates a standardized data frame for systematic literature review with predefined columns, allowing the addition of custom columns if needed.
init_matrix(...)
... |
Optional. Additional column names (as character strings) to be appended to the matrix. |
The matrix includes the following predefined columns:
- 'year': Numeric. Year of publication.
- 'citation': Character. Citation or reference details.
- 'keywords': Character. Keywords or tags for the study.
- 'profession': Character. Profession of the study participants or target audience.
- 'electronic': Logical. Indicates whether the study is available electronically.
- 'purpose': Character. Purpose or objective of the study.
- 'study_design': Character. Study design or methodology.
- 'outcome_var': Character. Outcome variables measured in the study.
- 'predictor_var': Character. Predictor variables considered in the study.
- 'sample': Numeric. Sample size.
- 'dropout_rate': Numeric. Dropout or attrition rate.
- 'setting': Character. Study setting (e.g., clinical, educational).
- 'inclusion_criteria': Character. Inclusion criteria for participants.
- 'ethnicity': Character. Ethnic background of participants.
- 'age': Numeric. Age of participants.
- 'sex': Factor. Sex of participants.
- 'income': Factor. Income level of participants.
- 'education': Character. Educational background of participants.
- 'measures': Character. Measures or instruments used for data collection.
- 'analysis': Character. Analytical methods used.
- 'results': Character. Summary of results or findings.
- 'limitations': Character. Limitations of the study.
- 'implications': Character. Implications or recommendations from the study.
- 'ethical_concerns': Character. Ethical concerns addressed in the study.
- 'biases': Character. Potential biases in the study.
- 'notes': Character. Additional notes or observations.
Custom columns can also be added by passing their names via the '...' argument.
A data frame with predefined columns for literature review analysis.
    # Create a basic literature review matrix
    lit_matrix <- init_matrix()
matriz_message() produces a message about the package version and the version of R making use of this package.
matriz_message()
matriz_message() returns a message about the installed version of matriz.
JP Monteagudo
matriz_message()
This function calls init_matrix() to obtain a matrix or data frame, then extracts the class of each column. It returns a data frame containing the class information for each column.
matriz_names(...)
... |
extra arguments to pass as column names for the literature matrix |
The purpose of this function is to provide the user with a quick way to check the default names and classes as the matrix is being filled instead of having to type 'str(init_matrix())' every time the user forgets a category in the default matrix.
A data frame with one column named class that lists the class of each column from the matrix or data frame returned by init_matrix().
matriz_names()
This function merges two literature matrices based on specified key columns, with options for full or inner joins and duplicate column removal.
merge_matrix(.data, .data2, by = NULL, all = FALSE, remove_dups = TRUE, suffixes = c(".x", ".y"), silent = FALSE)
.data |
A data frame to be merged. |
.data2 |
A second data frame to be merged with '.data'. |
by |
A character vector specifying the column(s) to merge by. Must exist in both data frames. |
all |
A logical value indicating whether to perform a full join ('TRUE') or an inner join ('FALSE', default). |
remove_dups |
A logical value indicating whether to remove duplicate columns before merging. Default is 'TRUE'. |
suffixes |
A character vector of length 2 specifying suffixes to apply to overlapping column names from '.data' and '.data2', respectively. Default is 'c(".x", ".y")'. |
silent |
A logical value indicating whether to suppress messages about duplicate column removal. Default is 'FALSE'. |
The function first ensures that '.data' and '.data2' are valid data frames and checks that the 'by' columns exist in both. If 'remove_dups = TRUE', duplicate columns are removed before merging. The function then performs either a full or inner join using 'dplyr::full_join()' or 'dplyr::inner_join()', respectively.
A merged data frame with specified join conditions applied.
    df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
    df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

    # Inner join (default)
    merge_matrix(df1, df2, by = "id")

    # Full join
    merge_matrix(df1, df2, by = "id", all = TRUE)

    # Remove duplicate columns before merging
    df3 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"), extra = c(1, 2, 3))
    df4 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"), extra = c(4, 5, 6))
    merge_matrix(df3, df4, by = "id", remove_dups = TRUE)
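To make the difference between the inner and full joins concrete without installing the package, the same join semantics can be reproduced with base R's 'merge()'. This is only an analogy for the 'all' argument; it does not show the duplicate-column handling that merge_matrix adds, and it does not use dplyr.

```r
df1 <- data.frame(id = c(1, 2, 3), value1 = c("A", "B", "C"))
df2 <- data.frame(id = c(2, 3, 4), value2 = c("X", "Y", "Z"))

# Inner join: only ids present in both data frames survive
inner <- merge(df1, df2, by = "id")

# Full join: every id from either data frame, with NA where a side is missing
full <- merge(df1, df2, by = "id", all = TRUE)

print(inner)
print(full)
```

The inner join keeps ids 2 and 3, while the full join keeps ids 1 through 4 and fills the unmatched cells with NA.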
Reads multiple BibTeX citations from files and updates the corresponding rows in a literature matrix with formatted citations, keywords, and years.
process_batch_citation(.data, citations, where = NULL)
.data |
A data frame containing at least three columns: 'citation', 'keywords', and 'year'. |
citations |
Character vector of file paths to BibTeX citation files |
where |
Numeric vector indicating which rows to update. If NULL (default), all rows will be updated. |
A data frame with updated citation information in the specified rows
format_batch_ama_citation, parse_batch_citation
Takes a record list and a citation string, processes the citation into AMA format, and updates the record with the formatted citation, keywords, and year.
process_citation(.record, citation)
.record |
A list containing the record to be updated |
citation |
A character string containing a BibTeX citation |
An updated list containing the original record with added fields:
citation |
The formatted AMA citation |
keywords |
A vector of keywords from the citation |
year |
The publication year |
Filters a literature matrix based on a specified condition, with the option to restrict the search to a specific column. The function supports both column names and numeric indices for column selection.
search_record(.data, column = NULL, where = NULL)
.data |
A data frame to search within. |
column |
Optional. The column to search in, specified either by name or numeric index. If NULL (default), the search is performed across all columns. |
where |
A logical expression that defines the search condition. Must evaluate to a logical vector of the same length as the number of rows in '.data'. |
A filtered data frame containing only the rows that match the search condition. If a specific column was selected, only that column is returned.
    df <- data.frame(
      id = 1:5,
      name = c("John", "Jane", "Bob", "Alice", "John"),
      age = c(25, 30, 35, 28, 40)
    )

    # Search across all columns where age > 30
    search_record(df, where = age > 30)

    # Search only in the name column for "John"
    search_record(df, column = "name", where = name == "John")

    # Search using column index
    search_record(df, column = 2, where = name == "Jane")
Removes all rows from a literature matrix while preserving its column structure, mimicking SQL's TRUNCATE operation.
truncate(.data, keep_rows = FALSE)
.data |
A data frame or matrix to be truncated |
keep_rows |
Logical. If TRUE, replaces non-NA values with NA instead of removing all data |
An empty data frame or matrix with the same structure as the input
    # Completely empty a data frame
    df <- data.frame(x = 1:3, y = 4:6)
    truncate(df)

    # Replace non-NA values with NA while keeping structure
    truncate(df, keep_rows = TRUE)
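The two modes can be pictured with plain base R. The sketch below is one plausible reading of the documented behavior, not the package's code: dropping all rows corresponds to 'keep_rows = FALSE', and blanking every cell corresponds to 'keep_rows = TRUE'.

```r
df <- data.frame(x = 1:3, y = 4:6)

# Analogue of keep_rows = FALSE: zero rows, same columns and column types
emptied <- df[0, , drop = FALSE]

# Analogue of keep_rows = TRUE: same dimensions, every cell set to NA
blanked <- df
blanked[] <- lapply(blanked, function(col) {
  col[] <- NA  # assigning into col[] preserves the column's type
  col
})

str(emptied)
str(blanked)
```

In both cases the column names and types are retained, so new records can be added afterwards without rebuilding the matrix.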
Modifies the values in a specified column of a data frame for rows that meet a given condition.
update_record(.data, column = NULL, where = NULL, set_to = NULL, ...)
.data |
A data frame. The dataset to modify. |
column |
A column in the data frame to update. Can be specified as a column name, index, or unquoted column symbol. |
where |
A condition that determines which rows to update. Must evaluate to a logical vector of the same length as the number of rows in '.data'. |
set_to |
The value to assign to the rows in the specified column where the 'where' condition is 'TRUE'. |
... |
Additional arguments (currently unused, reserved for future use). |
This function updates values in a specified column of a data frame for rows that satisfy the given condition. The 'column' parameter can be provided as:
- A numeric column index (e.g., '2').
- A column name (e.g., '"value"').
- An unquoted column symbol (e.g., 'value').
The modified data frame with updated values.
    # Example data frame
    df <- data.frame(
      id = 1:5,
      value = c(10, 20, 30, 40, 50)
    )

    # Update rows where id > 3
    updated_df <- update_record(df, column = value, where = id > 3, set_to = 100)
    print(updated_df)

    # Using column as a string
    updated_df <- update_record(df, column = "value", where = id == 2, set_to = 99)
    print(updated_df)
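For readers who want to see what a conditional update amounts to, the first example above corresponds to a logical-index assignment in base R. This is an equivalent written by hand for illustration, not how update_record is implemented.

```r
df <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

# Logical vector with one entry per row, as the 'where' argument requires
rows <- df$id > 3

# Assign the new value only where the condition is TRUE
updated <- df
updated[rows, "value"] <- 100

print(updated)
```

The original 'df' is left untouched and only the rows with id 4 and 5 change in 'updated'.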
This function ensures that the imported data contains all required columns, optionally removes unwanted extra columns, and provides informative messages about the dataset's structure.
validate_columns(data, extra_columns = NULL, drop_extra = FALSE, silent = FALSE)
data |
A data frame containing the imported matrix. |
extra_columns |
A character vector of allowed additional columns beyond the required ones. Defaults to NULL. |
drop_extra |
A logical value indicating whether to remove extra columns that are not in 'extra_columns'. Defaults to FALSE. |
silent |
A logical value indicating whether to suppress messages. Defaults to FALSE. |
The function checks whether all required columns are present in the data. If any required columns are missing, it stops execution and informs the user.
It also identifies extra columns beyond the required set and compares them against the allowed 'extra_columns'. If 'drop_extra = TRUE', it removes any extra columns not listed in 'extra_columns'. If 'drop_extra = FALSE', it retains the extra columns but issues a message unless 'silent = TRUE'.
A cleaned data frame with required columns intact and, optionally, extra columns removed.
The function assumes that column names in 'data' are correctly formatted and case-sensitive.
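The checks described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions: `check_columns()` is a hypothetical helper, and the required set is passed in explicitly rather than taken from init_matrix().

```r
# Hypothetical helper (not part of matriz): enforce required columns and
# optionally drop extras that are not explicitly allowed.
check_columns <- function(data, required, extra_columns = NULL,
                          drop_extra = FALSE, silent = FALSE) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  # Extras are columns that are neither required nor explicitly allowed
  extras <- setdiff(names(data), c(required, extra_columns))
  if (length(extras) > 0) {
    if (drop_extra) {
      data <- data[, !names(data) %in% extras, drop = FALSE]
    } else if (!silent) {
      message("Keeping extra columns: ", paste(extras, collapse = ", "))
    }
  }
  data
}

toy <- data.frame(year = 2020, citation = "Doe J.", doi = "x", reviewer = "A")
kept <- check_columns(toy, required = c("year", "citation"),
                      extra_columns = "doi", drop_extra = TRUE)
print(names(kept))
```

With 'drop_extra = TRUE', the allowed 'doi' column is retained while the unlisted 'reviewer' column is removed. Because the comparison uses exact names, a column called 'Year' would not satisfy a requirement for 'year'.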