BioMetaHarmonizer Documentation
BioMetaHarmonizer is a Python library and command-line tool for fetching, harmonizing, and standardizing NCBI BioSample metadata at scale. It accepts both BioSample accessions (SAMN/SAME/SAMD) and assembly accessions (GCF_/GCA_) and maps each record to a fixed 51-column output schema. In doing so, it parses collection dates to ISO 8601, resolves geographic locations to ISO 3166-1 country codes, classifies isolation sources into One Health categories (Human, Animal, Food, Environmental, etc.), and extracts antibiogram (drug-susceptibility) tables from NCBI Pathogen records. The tool is aimed at computational biologists and epidemiologists working with large-scale genomic metadata from public repositories.
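To make the two accepted input types concrete, the sketch below shows one way to tell a BioSample accession from an assembly accession by its prefix. It is an illustration only and does not use BioMetaHarmonizer's own API: the function name `classify_accession` and both regular expressions are assumptions introduced here, based on the general shape of public accessions (for example `SAMN12345678` or `GCF_000005845.2`).

```python
import re

# Hypothetical patterns written for this illustration; they are not taken
# from BioMetaHarmonizer and may be stricter or looser than its own checks.
# BioSample: SAMN (NCBI), SAME/SAMEA (EBI), SAMD (DDBJ), followed by digits.
BIOSAMPLE_RE = re.compile(r"SAM[NED][A-Z]?\d+")
# Assembly: GCF_ (RefSeq) or GCA_ (GenBank), nine digits, optional ".version".
ASSEMBLY_RE = re.compile(r"GC[FA]_\d{9}(\.\d+)?")


def classify_accession(accession: str) -> str:
    """Return 'biosample', 'assembly', or 'unknown' for an accession string."""
    acc = accession.strip()
    if BIOSAMPLE_RE.fullmatch(acc):
        return "biosample"
    if ASSEMBLY_RE.fullmatch(acc):
        return "assembly"
    return "unknown"


if __name__ == "__main__":
    for acc in ["SAMN12345678", "SAMEA104228123", "GCF_000005845.2", "PRJNA000000"]:
        print(acc, classify_accession(acc))
```

A check of this kind is useful before submitting a large batch, since identifiers that match neither pattern (such as BioProject IDs) can be set aside early.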
Getting Started
User Guide
Reference