Changelog • plotsdatabase

plotsdatabase 1.5

Breaking Changes

query_plots() now returns a list by default instead of a flat data frame
- Output is automatically structured based on inventory method using the new output styles system
- Different styles organize data into separate tables: metadata, individuals, censuses, height-diameter, etc.
- Action required: To preserve old behavior (flat data frame), use output_style = "full"
- Rationale: Structured output makes it easier to work with complex plot data without overwhelming column counts
- See ?query_plots for details on available output styles
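
During migration, downstream code may need to accept both return shapes. The following is a minimal sketch under stated assumptions: the two objects are made-up stand-ins (real results come from query_plots() and their columns will differ), and get_plot_metadata() is a hypothetical helper that is not part of the package.

```r
# Hypothetical stand-ins for the two return shapes.
old_result <- data.frame(plot_name = c("P1", "P2"), latitude = c(0.5, 1.2))
new_result <- list(
  metadata    = data.frame(plot_name = c("P1", "P2"), latitude = c(0.5, 1.2)),
  individuals = data.frame(plot_name = c("P1", "P1", "P2"), tag = 1:3)
)

# Hypothetical helper: return plot-level rows from either shape.
get_plot_metadata <- function(res) {
  # Test for a data frame first, because a data frame is also a list.
  if (is.data.frame(res)) {
    return(res)            # flat data frame (old behavior)
  }
  if (is.list(res) && !is.null(res$metadata)) {
    return(res$metadata)   # structured list (new default)
  }
  stop("Unrecognised query result structure")
}

meta_old <- get_plot_metadata(old_result)
meta_new <- get_plot_metadata(new_result)
```

Passing output_style = "full" to query_plots() remains the direct way to get the flat data frame back.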

New Features

Configurable output styles system for query_plots()
- 6 predefined output styles: minimal, standard, permanent_plot, permanent_plot_multi_census, transect, full
- Auto-detection of appropriate style based on method field (e.g., “1 ha plot” → permanent_plot)
- Manual style selection via output_style parameter
- Each style returns a structured list with relevant tables (e.g., $metadata, $individuals, $censuses)
- Column renaming from database names to user-friendly names (e.g., ddlat → latitude, tax_sp_level → species)
- New configuration files: R/output_styles_config.R, R/output_styles_helpers.R
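
The auto-detection step can be pictured as a lookup from the method field to a style name. This sketch is illustrative only: method_to_style and detect_style_sketch() are hypothetical names, the "standard" fallback is an assumption, and only the "1 ha plot" to permanent_plot pair comes from the notes above. The package's own .method_to_style_map and .detect_style_from_method() may work differently.

```r
# Hypothetical mapping from method labels to output style names.
method_to_style <- c(
  "1 ha plot" = "permanent_plot",  # pair mentioned in the changelog
  "transect"  = "transect"         # assumed label, for illustration
)

# Hypothetical detector: look up each method, fall back to a default style
# when the method is unknown or missing.
detect_style_sketch <- function(method, default = "standard") {
  style <- unname(method_to_style[method])
  ifelse(is.na(style), default, style)
}

styles <- detect_style_sketch(c("1 ha plot", "unknown method", NA))
```
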
Specialized output tables for permanent plots
- $censuses table: plot_name, census_number, census_date, team_leader, principal_investigator
- $height_diameter table: Paired height-diameter measurements (id_n, D, H, POM) with issue filtering
- Handles multiple censuses with automatic pivoting from wide to long format
- Census-specific column renaming (e.g., stem_diameter_census_1 → dbh_census_1)
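
For readers unfamiliar with the wide-to-long pivot, here is a small base R illustration using stats::reshape(). The data are invented, and the changelog does not say which columns the package pivots or which function it uses, so treat this as a sketch of the general technique rather than the package's implementation.

```r
# Invented wide table: one row per stem, one diameter column per census.
wide <- data.frame(
  id_n = c(1, 2),
  stem_diameter_census_1 = c(10.2, 25.0),
  stem_diameter_census_2 = c(11.0, 25.9)
)

# Census-specific renaming, as in stem_diameter_census_1 -> dbh_census_1.
names(wide) <- sub("^stem_diameter_census_", "dbh_census_", names(wide))

# Pivot to long format: one row per stem per census.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("dbh_census_1", "dbh_census_2"),
  v.names   = "dbh",
  timevar   = "census_number",
  times     = 1:2,
  idvar     = "id_n"
)

# Order by stem, then census, and drop the composite row names.
long <- long[order(long$id_n, long$census_number), ]
rownames(long) <- NULL
```
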
Custom print method for query results
- New S3 class plot_query_list with informative print method
- Shows table dimensions, column names, and geometry type for sf objects
- Makes it easy to understand query result structure
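
A rough sketch of how an S3 print method of this kind can be written. The class name plot_query_list comes from the notes above; the constructor new_plot_query_list() and the exact output layout are assumptions, and the sf geometry reporting is left out.

```r
# Hypothetical constructor: tag a named list of tables with the S3 class.
new_plot_query_list <- function(tables) {
  structure(tables, class = c("plot_query_list", "list"))
}

# Print method: one summary line per table plus its column names.
print.plot_query_list <- function(x, ...) {
  cat("Plot query result with ", length(x), " table(s)\n", sep = "")
  for (nm in names(x)) {
    tbl <- x[[nm]]
    cat(sprintf("$%s: %d rows x %d cols\n", nm, nrow(tbl), ncol(tbl)))
    cat("  columns: ", paste(names(tbl), collapse = ", "), "\n", sep = "")
  }
  invisible(x)  # print methods conventionally return their input invisibly
}

res <- new_plot_query_list(list(
  metadata    = data.frame(plot_name = c("P1", "P2"), latitude = c(0.5, 1.2)),
  individuals = data.frame(plot_name = c("P1", "P1", "P2"), tag = 1:3)
))
print(res)
```
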
Preservation of spatial data
- coordinates_sf table automatically included when show_all_coordinates = TRUE
- Print method detects and displays sf geometry information

Code Refactoring

Modular output style configuration
- Centralized style definitions in .plot_output_styles list
- Method-to-style mapping in .method_to_style_map
- Style auto-detection function .detect_style_from_method()
- Easy to add new output styles by extending configuration
Improved metadata extraction
- Uses the res_meta_data table (created before individual extraction) as the metadata source
- Ensures all plot-level columns are available even when extract_individuals = TRUE
- Consistent variable naming and error handling

Bug Fixes

Fixed @export tag on a commented-out function causing roxygen2 errors
- Removed @export from commented-out subplot_list() function in R/subsplots_features_function.R
- Prevents documentation build failures

plotsdatabase 1.4 (development version)

New Features

Traits enrichment module in taxonomic matching Shiny app
- New tab “Enrich with Traits” allows enriching matched taxonomic names with trait data from the taxa database
- Aggregates multiple input names that match to the same taxon into a single row
- Concatenates all input names matched to the same taxon (e.g., “cola edulis | coula edrulis” for the taxon “Coula edulis”)
- Configurable options for categorical trait aggregation (mode vs concatenation)
- User can select which columns to include (original names, corrected names, IDs, metadata)
- Downloads enriched data as Excel file
- Filters out id_trait_measures columns for cleaner output
- Module: mod_traits_enrichment_ui() and mod_traits_enrichment_server()
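
The aggregation of input names can be sketched with base R's aggregate(). The match table below is invented for illustration (the first two rows mirror the example above, the third is an arbitrary extra taxon), and the module itself may aggregate differently.

```r
# Invented match table: two input spellings resolve to the same taxon.
matches <- data.frame(
  input_name = c("cola edulis", "coula edrulis", "Garcinia kola"),
  taxon      = c("Coula edulis", "Coula edulis", "Garcinia kola"),
  stringsAsFactors = FALSE
)

# One row per taxon, with all distinct input names joined by " | ".
enriched <- aggregate(
  input_name ~ taxon,
  data = matches,
  FUN  = function(x) paste(unique(x), collapse = " | ")
)
names(enriched)[names(enriched) == "input_name"] <- "input_names"
```
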
Enhanced file upload in taxonomic matching Shiny app
- CSV file support added (in addition to Excel .xlsx and .xls)
- Excel sheet selector allows choosing which sheet to import from multi-sheet workbooks
- Sheet selector appears dynamically after Excel file upload
- Default sheet selection is the first sheet
- CSV files are loaded directly without sheet selection
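
The upload logic described above amounts to dispatching on the file extension. A hedged sketch follows: read_upload_sketch() is a hypothetical function, and the use of the readxl package for Excel files is an assumption, since the changelog does not name the reader the app uses.

```r
# Hypothetical reader: CSV is loaded directly, Excel defaults to the first sheet.
read_upload_sketch <- function(path, sheet = NULL) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "csv") {
    return(utils::read.csv(path, stringsAsFactors = FALSE))
  }
  if (ext %in% c("xlsx", "xls")) {
    # Assumption: readxl is used for Excel input.
    if (!requireNamespace("readxl", quietly = TRUE)) {
      stop("Package 'readxl' is required to read Excel files")
    }
    sheets <- readxl::excel_sheets(path)   # names offered by a sheet selector
    if (is.null(sheet)) sheet <- sheets[1] # default: first sheet
    return(as.data.frame(readxl::read_excel(path, sheet = sheet)))
  }
  stop("Unsupported file type: ", ext)
}

# Usage with a temporary CSV file.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(name = c("Coula edulis", "Cola acuminata")),
          tmp, row.names = FALSE)
uploaded <- read_upload_sketch(tmp)
```
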

Bug Fixes

Fixed NA input names appearing in trait enrichment
- Enrichment module now filters out rows where the input taxonomic name is NA or empty
- Prevents invalid NA entries from being matched to taxa or included in enriched output
- Applied in both trait fetching and result aggregation steps
Fixed incorrect input names in enrichment output
- Enrichment now correctly uses the user-selected taxonomic name column (not first column of dataset)
- column_name parameter now passed from main app to enrichment module
- Ensures input_names column shows actual taxonomic names from the selected column

Code Refactoring

Optimized taxonomic name cleaning for faster matching
- Name cleaning (removing “sp.”, “cf.”, “aff.”, etc.) now happens before batch exact matching
- Previously, cleaning only occurred during slow fuzzy matching phase
- Names like “Coula edulis sp.” now match exactly to “Coula edulis” in fast batch step
- Reduces the number of names sent to the slower fuzzy matching step
- Cleaning happens once at beginning, benefiting all matching steps (species, genus, family)
- Both original and cleaned names preserved in matching pipeline
- Added underscore replacement in clean_taxonomic_name() (e.g., “Coula_edulis” → “Coula edulis”)
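
To make the cleaning step concrete, here is a minimal sketch. clean_name_sketch() is a hypothetical stand-in, not the package's clean_taxonomic_name(); it handles only the three qualifiers named above and underscore replacement, and the real function may cover more cases or apply them in a different order.

```r
# Hypothetical cleaner: underscores to spaces, drop "sp.", "cf.", "aff.",
# then normalise whitespace.
clean_name_sketch <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)
  x <- gsub("\\b(sp|cf|aff)\\.", " ", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)
  trimws(x)
}

cleaned <- clean_name_sketch(c(
  "Coula edulis sp.",
  "Coula_edulis",
  "Coula cf. edulis",
  "Coula aff. edulis"
))
```

After cleaning, all four inputs are identical strings, so a plain equality test or an exact join can resolve them without fuzzy matching.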

Breaking Changes

query_taxa() default behavior changed: exact_match parameter now defaults to TRUE (was FALSE)
- Exact matching is now the default for family/genus/order queries to prevent unexpected fuzzy matching results
- For species queries, if exact match fails, the function automatically falls back to intelligent fuzzy matching
- Action required: Code relying on fuzzy matching by default should explicitly set exact_match = FALSE
- Rationale: Higher taxonomic ranks are standardized names where fuzzy matching rarely helps and can introduce errors

New Features

Intelligent taxonomic name matching with genus-constrained fuzzy search
- New match_taxonomic_names() function implements hierarchical matching strategy:
  1. Exact matching (fastest)
  2. Genus-constrained fuzzy matching (searches species only within matched genus)
  3. Full fuzzy matching (last resort)
- Improves match quality by restricting the fuzzy search space to the matched genus
- Includes synonym detection and resolution
- Supports scoring and ranking of multiple matches
- New helper functions: parse_taxonomic_name(), .match_exact_sql(), .match_genus_constrained_sql(), .match_fuzzy_sql()
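
The three-step strategy can be sketched in plain R. This is not the package's implementation: the changelog indicates the real helpers run SQL against the taxa database, whereas this sketch swaps in an in-memory reference vector and edit distance from utils::adist(). The function names, the max_dist threshold, and the exact-genus test in step 2 are all assumptions.

```r
# Invented reference list standing in for the taxa table.
reference <- c("Coula edulis", "Cola acuminata", "Garcinia kola")

# Closest candidate by edit distance, or NA if none is within max_dist.
closest_sketch <- function(name, candidates, max_dist) {
  if (length(candidates) == 0) return(NA_character_)
  d <- utils::adist(name, candidates)[1, ]
  if (min(d) > max_dist) return(NA_character_)
  candidates[which.min(d)]
}

match_name_sketch <- function(name, reference, max_dist = 3) {
  # Step 1: exact matching.
  if (name %in% reference) {
    return(list(match = name, method = "exact"))
  }
  # Step 2: fuzzy matching restricted to species of the same genus.
  genus     <- sub(" .*$", "", name)
  ref_genus <- sub(" .*$", "", reference)
  best <- closest_sketch(name, reference[ref_genus == genus], max_dist)
  if (!is.na(best)) {
    return(list(match = best, method = "genus_constrained"))
  }
  # Step 3: full fuzzy matching against every reference name.
  best <- closest_sketch(name, reference, max_dist)
  if (!is.na(best)) {
    return(list(match = best, method = "fuzzy"))
  }
  list(match = NA_character_, method = "none")
}

m_exact  <- match_name_sketch("Coula edulis", reference)
m_genus  <- match_name_sketch("Coula edrulis", reference)
m_fuzzy  <- match_name_sketch("Cuola edulis", reference)
m_nohit  <- match_name_sketch("Xxxxxxxx yyyyyyyy", reference)
```

Restricting step 2 to one genus keeps the candidate set small, so a typo in the epithet is compared only against plausible species before the full list is searched.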
Auto fuzzy fallback for species queries
- query_taxa() automatically retries with fuzzy matching when exact species match fails
- Transparent user feedback shows match quality (similarity score)
- Handles typos and spelling variations automatically
- Only applies to species queries; family/genus/order use exact matching only
Database enhancement: tax_level field added to table_taxa
- New column explicitly indicates taxonomic level: “species”, “genus”, “family”, “order”, “infraspecific”, “higher”
- Indexed for query performance
- Eliminates ambiguity between missing data and genus/family-level taxa
- Script provided: add_tax_level_field.R for database migration
- All query functions updated to use new field for cleaner, more reliable filtering

Code Refactoring

Complete rewrite of query_taxa() to use new intelligent matching functions
- Eliminated redundancy with helpers.R functions
- 8 new modular helper functions replace complex inline logic
- Cleaner separation of concerns: matching, filtering, synonym resolution, formatting, trait addition
- ~160 lines of code removed through better abstraction
- Better maintainability and extensibility
- Deprecated query_fuzzy_match() and query_exact_match() in favor of match_taxonomic_names()
Simplified taxonomic level filtering using tax_level field
- Replaced complex multi-column checks (e.g., is.na(tax_esp) & is.na(tax_gen)) with simple tax_level == "family"
- Applied in query_taxa() for clearer intent and better performance via index usage
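
A toy comparison of the two filters on an invented table. The column names tax_gen, tax_esp and tax_level come from the notes above; the rows are made up.

```r
# Invented taxa rows: one family-level, one genus-level, one species-level.
taxa <- data.frame(
  id        = 1:3,
  tax_gen   = c(NA, "Coula", "Coula"),
  tax_esp   = c(NA, NA, "edulis"),
  tax_level = c("family", "genus", "species"),
  stringsAsFactors = FALSE
)

# Old style: infer the level from which name columns are missing.
family_old <- taxa[is.na(taxa$tax_esp) & is.na(taxa$tax_gen), ]

# New style: read the level directly from tax_level.
family_new <- taxa[taxa$tax_level == "family", ]
```
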

Bug Fixes

Fixed query_taxa() empty results with only_family = TRUE
- Previously, the fuzzy-matching default caused empty results when filtering for family-level taxa
- Now uses exact matching by default for higher taxonomic ranks

Dependencies

Added new package dependencies to DESCRIPTION:
- cli - User-friendly command line interfaces (moved from Suggests to Imports)
- lifecycle - Manage function lifecycle (deprecation warnings)
- data.table - High-performance data manipulation
- glue - String interpolation for SQL queries
- RecordLinkage - String similarity calculations

plotsdatabase 1.0

Breaking Changes

Database schema change: Renamed column ind_num_sous_plot to tag in data_individuals and followup_updates_individuals tables
- All R package functions updated to use new column name
- Action required: External scripts accessing ind_num_sous_plot must be updated to use tag
- Updated files: R/functions_manip_db.R, R/individual_features_function.R, R/functions_divid_plot.R, R/generate_plot_summary.Rmd, structure.yml
- Default parameter in approximate_isolated_xy() changed from tag = "ind_num_sous_plot" to tag = "tag"
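
External scripts that work on exported data frames can be adapted with a one-line rename. The helper below is hypothetical and only illustrates the idea; it leaves tables that already use tag untouched.

```r
# Hypothetical helper: rename the old column if present, otherwise do nothing.
rename_tag_column <- function(df) {
  names(df)[names(df) == "ind_num_sous_plot"] <- "tag"
  df
}

old_export <- data.frame(ind_num_sous_plot = c(101, 102), dbh = c(12.5, 30.1))
new_export <- rename_tag_column(old_export)
```
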

New Features

Initial release of package structure with comprehensive database query functions
Enhanced update_ident_specimens(): Now shows summary of linked individuals before updating specimen identification
- Displays which plots and how many individuals will inherit the new identification
- Shows current taxonomic identification of linked individuals
- Provides better context for informed decision-making before confirmation
- New helper function .get_linked_individuals_summary() queries and summarizes impact

Bug Fixes

Connection error with complex home paths: Fixed create_db_config() function that failed when home directory path contained spaces or special characters (e.g., OneDrive paths like C:/Users/NOBUS CAPITAL/OneDrive/Documents/)
- Added proper error handling with tryCatch() for file creation
- Creates parent directories if they don’t exist
- Falls back to in-memory configuration if file cannot be written
- Users now get informative warnings instead of connection failures
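
The pattern behind this fix (create the parent directory, attempt the write inside tryCatch(), fall back to an in-memory configuration with a warning) can be sketched as follows. write_config_sketch() is a hypothetical function, and saving with saveRDS() is an assumption; the changelog does not describe the file format create_db_config() uses.

```r
# Hypothetical writer: returns the config plus a flag saying whether it was
# written to disk or kept in memory only.
write_config_sketch <- function(config, path) {
  tryCatch(
    {
      # Create missing parent directories; paths with spaces are fine here.
      dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
      saveRDS(config, path)
      list(config = config, path = path, in_memory = FALSE)
    },
    error = function(e) {
      # Informative warning instead of a hard failure.
      warning("Could not write config file, keeping configuration in memory: ",
              conditionMessage(e), call. = FALSE)
      list(config = config, path = NA_character_, in_memory = TRUE)
    }
  )
}

cfg <- list(host = "localhost", dbname = "plots")  # invented settings

# A path with spaces under the session's temporary directory.
ok_path <- file.path(tempdir(), "My Documents", "One Drive", "config.rds")
written <- write_config_sketch(cfg, ok_path)
```
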

Documentation

Added comprehensive README.md with package overview, quick start guide, and function reference
README includes prominent link to NEWS.md for tracking updates

Infrastructure

Added NEWS.md to track package changes and updates
Established git branching workflow for all code modifications

Code Refactoring

Major refactoring: Reorganized R/functions_manip_db.R (previously 10,528 lines) into modular, domain-specific files
- Created R/growth_census_functions.R (556 lines) - Growth computation and census analysis functions
- Created R/specimen_linking_functions.R (406 lines) - Herbarium specimen linking and querying functions
- Created R/taxonomic_query_functions.R (944 lines) - Taxonomic query functions with synonym resolution
- Created R/taxonomic_update_functions.R (838 lines) - Taxonomic data update and entry functions
- Expanded R/connections_db.R with database query utilities (func_try_fetch, try_open_postgres_table)
- Removed ~6,800 lines from R/functions_manip_db.R through extraction to specialized modules
- All functions verified as moved (not duplicated) to new locations
- Improved code maintainability and discoverability