augment_with_taxonomy.Rd
Enriches a dataset with harmonized taxonomic information by joining a compatibility table and generating synthetic entries where needed. Every observation is thereby mapped to the species level, even when only complex-level data are available, which prevents imbalances in downstream statistical models.
augment_with_taxonomy(data, compat)
A named list with two elements:
The augmented dataset, with harmonized species/complex names and corresponding numeric IDs.
A reference tibble mapping unique species and complex combinations to numeric codes.
The following steps are applied to ensure taxonomic consistency:
Taxonomic Join: The `compat` table is joined to the input dataset via the `survey` column.
Synthetic Complex Names: If `complex` is missing (`""`) for a given species, a synthetic complex is created as `"[species]_complex"`.
Synthetic Species Names: If an entry is missing `species` but has a valid `complex`, a pseudo-species is generated with the label `"unlabel_[complex]"` to ensure all rows are treated as species-level data.
Numeric Identifiers:
`speciesNb`: numeric ID for each (real or synthetic) species.
`complexNb`: numeric ID for each complex, offset by 1 to avoid overlap.
These steps normalize taxonomic granularity across the dataset, so that records are neither over- nor under-represented during model estimation (e.g., when estimating intercepts such as `beta0`).
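The steps above can be illustrated with a minimal base-R sketch. It is not the package's implementation. The helper name `sketch_augment`, the toy inputs, the `count` column, and the list element names `data` and `reference` are all hypothetical. The sketch also assumes that `compat` carries `survey`, `species`, and `complex` columns, that IDs follow factor-level order, and that the offset for `complexNb` is simply added to that code. The source does not state what the offset is relative to.

```r
# Hypothetical toy inputs; only `survey`, `species`, and `complex`
# are column names taken from the documentation above.
data <- data.frame(
  survey = c("s1", "s2", "s3"),
  count  = c(10, 4, 7),
  stringsAsFactors = FALSE
)
compat <- data.frame(
  survey  = c("s1", "s2", "s3"),
  species = c("gambiae", "funestus", ""),
  complex = c("gambiae_cx", "", "gambiae_cx"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the real augment_with_taxonomy()
sketch_augment <- function(data, compat) {
  # Taxonomic join: left join on `survey`, keeping every row of `data`
  out <- merge(data, compat, by = "survey", all.x = TRUE, sort = FALSE)

  # Assumption: surveys without a match are treated like "" (missing)
  out$species[is.na(out$species)] <- ""
  out$complex[is.na(out$complex)] <- ""

  # Decide which rows need synthetic names before modifying anything
  no_complex <- out$complex == "" & out$species != ""
  no_species <- out$species == "" & out$complex != ""

  # Synthetic complex names: "[species]_complex"
  out$complex[no_complex] <- paste0(out$species[no_complex], "_complex")

  # Synthetic species names: "unlabel_[complex]"
  out$species[no_species] <- paste0("unlabel_", out$complex[no_species])

  # Numeric identifiers; the "+ 1L" is one possible reading of the offset
  out$speciesNb <- as.integer(factor(out$species))
  out$complexNb <- as.integer(factor(out$complex)) + 1L

  # Reference table: one row per unique species/complex combination
  reference <- unique(out[, c("species", "complex", "speciesNb", "complexNb")])

  list(data = out, reference = reference)
}

res <- sketch_augment(data, compat)
res$data
res$reference
```

In this toy input, survey `s2` has a species but no complex and survey `s3` has a complex but no species, so both synthetic-name rules are exercised. After the call, every row carries a non-empty `species` and `complex`.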