creation_df.Rd
Prepares a cleaned and processed dataset ready for Bayesian analysis of a specified entomological variable. It applies geographic and temporal filters, computes proportions, cleans insecticide data, applies study-specific filters, harmonizes taxonomy, and computes the final outcome variable.
creation_df(
varname,
geo = c("Africa-E", "Africa-W", "Americas", "Asia-Pacific", NA),
year_min = -Inf,
year_max = +Inf,
extern_data = NULL
)
`varname`: Character string. The variable of interest. Supported values: `"parous_rate"`, `"endophagy"`, `"endophily"`, `"indoor_HBI"`, `"outdoor_HBI"`, `"sac_rate"`, `"HBI"`.
`geo`: Character vector. Geographic regions or continents to include in the analysis. Defaults to WHO major regions: `c("Africa-E", "Africa-W", "Americas", "Asia-Pacific", NA)`.
`year_min`: Integer or numeric. Minimum year of data collection (inclusive). Default is `-Inf` (no limit).
`year_max`: Integer or numeric. Maximum year of data collection (inclusive). Default is `+Inf` (no limit).
`extern_data`: Data frame. Optional user-supplied dataset to append to the internal repository. Must follow the same structure as the internal dataset used in `create_repo()`. Default is `NULL`.
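The requirement that `extern_data` share the internal dataset's structure can be illustrated with a minimal base R sketch. It is not the package's `adding_data_extern()`: the helper `append_external()` and the columns `species` and `year` are hypothetical, chosen only to show one way a structure check before appending might look.

```r
# Hypothetical helper (not part of the package): append user-supplied rows
# only when their columns match those of the internal dataset.
append_external <- function(internal, external = NULL) {
  if (is.null(external)) return(internal)
  stopifnot(is.data.frame(external))
  missing_cols <- setdiff(names(internal), names(external))
  extra_cols   <- setdiff(names(external), names(internal))
  if (length(missing_cols) > 0 || length(extra_cols) > 0) {
    stop("extern_data columns differ from the internal dataset: ",
         paste(c(missing_cols, extra_cols), collapse = ", "), call. = FALSE)
  }
  # Reorder the external columns to the internal order before binding.
  rbind(internal, external[names(internal)])
}

# Toy data with assumed column names.
internal <- data.frame(species = c("A", "B"), year = c(2001, 2005))
external <- data.frame(year = 2015, species = "C")
combined <- append_external(internal, external)
```

In this sketch a column mismatch raises an error instead of silently filling in `NA`, which keeps structural problems visible before any filtering runs.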
A named list containing:
The fully prepared and filtered dataset.
The variable name used in the processing.
Human-readable label of the variable of interest.
Reference table mapping species and complexes to numeric identifiers.
This function orchestrates multiple preprocessing steps to transform raw entomological survey data into a format suitable for Bayesian analysis. If the user provides an external dataset via `extern_data`, it is appended to the internal dataset before filtering and processing.
The steps are as follows:
Loads raw data using `create_repo()`. If `extern_data` is provided, it is appended using `adding_data_extern()`.
Sets variable-specific parameters via `set_var_params()`.
Filters the dataset by the specified geographic regions and years using `region_period_filter()`.
Computes numerators and denominators for proportions with `augment_withProportion_modif()`.
Removes or keeps rows whose insecticide intervention status is unknown, using `create_datareq()`.
Applies variable-specific epidemiological filters using `filter_bz_studies()`.
Harmonizes taxonomy and generates numeric IDs using `augment_with_taxonomy()`.
Calculates the outcome variable as the ratio of numerator to denominator.
Internal checks stop execution early if filtering steps remove all observations.
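The filtering, early-stop, and outcome steps described above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: the helper `filter_region_period()` and the columns `region`, `year`, `numerator`, and `denominator` are hypothetical, and it assumes that an `NA` in `geo` means rows with a missing region are kept.

```r
# Toy data; all column names are hypothetical, not the package's schema.
surveys <- data.frame(
  region      = c("Africa-E", "Africa-W", "Americas", NA, "Asia-Pacific"),
  year        = c(1995, 2004, 2010, 2012, 2021),
  numerator   = c(12, 30, 8, 5, 40),
  denominator = c(40, 60, 32, 20, 50)
)

# Hypothetical helper: keep rows in the requested regions and year window
# (both bounds inclusive), then stop early if nothing is left.
filter_region_period <- function(df, geo, year_min = -Inf, year_max = +Inf) {
  # %in% matches NA against NA, so rows with a missing region are kept
  # whenever NA is listed in geo.
  keep <- df$region %in% geo & df$year >= year_min & df$year <= year_max
  out <- df[keep, , drop = FALSE]
  if (nrow(out) == 0) {
    stop("No observations left after region/period filtering.", call. = FALSE)
  }
  out
}

kept <- filter_region_period(surveys, geo = c("Africa-E", "Africa-W", NA),
                             year_min = 2000, year_max = 2012)

# Outcome as the ratio of numerator to denominator.
kept$outcome <- kept$numerator / kept$denominator
```

With these toy values, the 2004 and 2012 rows are kept. The 2012 row passes because the upper bound is inclusive and `NA` is listed in `geo`. Requesting a region that is absent from the data raises an error instead of returning an empty data frame.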