Prepares a cleaned, processed dataset for Bayesian analysis of a specified entomological variable. The function applies geographic and temporal filters, computes the numerators and denominators of proportions, cleans insecticide intervention data, applies variable-specific study filters, harmonizes taxonomy, and computes the final outcome variable.

creation_df(
  varname,
  geo = c("Africa-E", "Africa-W", "Americas", "Asia-Pacific", NA),
  year_min = -Inf,
  year_max = +Inf,
  extern_data = NULL
)

Arguments

varname

Character string. The variable of interest. Supported values: `"parous_rate"`, `"endophagy"`, `"endophily"`, `"indoor_HBI"`, `"outdoor_HBI"`, `"sac_rate"`, `"HBI"`.

geo

Character vector. Geographic regions or continents to include in the analysis. Defaults to WHO major regions: `c("Africa-E", "Africa-W", "Americas", "Asia-Pacific", NA)`.

year_min

Integer or numeric. Minimum year of data collection (inclusive). Default is `-Inf` (no limit).

year_max

Integer or numeric. Maximum year of data collection (inclusive). Default is `+Inf` (no limit).
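
The inclusive bounds and their infinite defaults can be illustrated with a small sketch. It is not the package's implementation: the helper `filter_years()` and the `year` column name are hypothetical and exist only to show how `-Inf` and `+Inf` leave the data unrestricted.

```r
# Toy survey records; the `year` column name is an assumption for illustration.
surveys <- data.frame(
  id   = 1:5,
  year = c(1995, 2000, 2005, 2010, 2015)
)

# Hypothetical helper: keep rows whose year lies in [year_min, year_max].
# Both bounds are inclusive, as in the argument descriptions above.
filter_years <- function(df, year_min = -Inf, year_max = +Inf) {
  df[df$year >= year_min & df$year <= year_max, , drop = FALSE]
}

# With the defaults, every finite year passes both comparisons.
all_rows <- filter_years(surveys)

# With finite bounds, the boundary years 2000 and 2010 are retained.
windowed <- filter_years(surveys, year_min = 2000, year_max = 2010)
```

In this toy example `all_rows` keeps all five records, while `windowed` keeps the records from 2000, 2005, and 2010.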

extern_data

Data frame. Optional user-supplied dataset to append to the internal repository. Must follow the same structure as the internal dataset used in `create_repo()`. Default is `NULL`.
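
The "same structure" requirement can be pictured as a column check followed by a row bind. The sketch below is an assumption about what such an append might look like, not the behavior of `adding_data_extern()`; the helper `append_external()` and the toy columns `species` and `year` are hypothetical.

```r
# Toy stand-ins for the internal repository and a user-supplied dataset.
internal <- data.frame(
  species = c("An. gambiae", "An. funestus"),
  year    = c(2001, 2004),
  stringsAsFactors = FALSE
)
external <- data.frame(
  year    = 2012,
  species = "An. arabiensis",
  stringsAsFactors = FALSE
)

# Hypothetical helper: append `external` only if its columns match `internal`.
append_external <- function(internal, external = NULL) {
  if (is.null(external)) {
    return(internal)  # nothing to append, mirroring the NULL default
  }
  missing_cols <- setdiff(names(internal), names(external))
  extra_cols   <- setdiff(names(external), names(internal))
  if (length(missing_cols) > 0 || length(extra_cols) > 0) {
    stop("External data must have the same columns as the internal dataset.",
         call. = FALSE)
  }
  # Reorder the external columns so both data frames line up before binding.
  rbind(internal, external[, names(internal), drop = FALSE])
}

combined <- append_external(internal, external)
```

Checking the columns before binding means a malformed external dataset fails immediately rather than producing misaligned rows further down the pipeline.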

Value

A named list containing:

`data.req`

The fully prepared and filtered dataset.

`varname`

The variable name used in the processing.

`nice_varname`

Human-readable label of the variable of interest.

`species_complex`

Reference table mapping species and complexes to numeric identifiers.
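
As a rough illustration of what a reference table of numeric identifiers could look like, the sketch below builds one from toy records. The column names (`species`, `complex`, `species_id`, `complex_id`) and the ID scheme are assumptions; the actual layout of `species_complex` may differ.

```r
# Toy records; species and complex labels are illustrative only.
records <- data.frame(
  species = c("An. gambiae s.s.", "An. arabiensis",
              "An. funestus s.s.", "An. arabiensis"),
  complex = c("Gambiae", "Gambiae", "Funestus", "Gambiae"),
  stringsAsFactors = FALSE
)

# One row per distinct species/complex pair.
ref <- unique(records[, c("species", "complex")])

# Assign integer IDs in order of first appearance (an assumed convention).
ref$species_id <- as.integer(factor(ref$species, levels = unique(ref$species)))
ref$complex_id <- as.integer(factor(ref$complex, levels = unique(ref$complex)))
rownames(ref) <- NULL
```

Here the duplicated species collapses to a single row, and both species in the same complex share one `complex_id`.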

Details

This function orchestrates multiple preprocessing steps to transform raw entomological survey data into a format suitable for Bayesian analysis. If the user provides an external dataset via `extern_data`, it is appended to the internal dataset before filtering and processing.

The steps are as follows:

  1. Loads raw data using `create_repo()`. If `extern_data` is provided, it is appended using `adding_data_extern()`.

  2. Sets variable-specific parameters via `set_var_params()`.

  3. Filters the dataset by the specified geographic regions and years using `region_period_filter()`.

  4. Computes numerators and denominators for proportions with `augment_withProportion_modif()`.

  5. Removes or keeps rows with an unknown insecticide intervention using `create_datareq()`.

  6. Applies variable-specific epidemiological filters using `filter_bz_studies()`.

  7. Harmonizes taxonomy and generates numeric IDs using `augment_with_taxonomy()`.

  8. Calculates the outcome variable as the ratio of numerator to denominator.

Internal checks stop execution early if filtering steps remove all observations.
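
The early-stop check and the final ratio can be sketched on toy data. This is a minimal illustration under stated assumptions, not the package's code: the helper `check_not_empty()` and the column names `region`, `numerator`, `denominator`, and `value` are hypothetical.

```r
# Hypothetical guard: fail fast when a filtering step leaves no rows.
check_not_empty <- function(df, step) {
  if (nrow(df) == 0) {
    stop("No observations left after step: ", step, call. = FALSE)
  }
  df
}

# Toy data with assumed column names.
toy <- data.frame(
  region      = c("Africa-E", "Africa-W", "Americas"),
  year        = c(2003, 2011, 2018),
  numerator   = c(30, 12, 45),
  denominator = c(60, 48, 50),
  stringsAsFactors = FALSE
)

# Region filter followed by the guard.
kept <- toy[toy$region %in% c("Africa-E", "Africa-W"), , drop = FALSE]
kept <- check_not_empty(kept, "region filter")

# Outcome variable as the ratio of numerator to denominator.
kept$value <- kept$numerator / kept$denominator
```

Filtering the same toy data for a region that is absent would return zero rows, and `check_not_empty()` would raise an error naming the step instead of passing an empty data frame on to later steps.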