BRONX: Barcode Recognition Obtained with Nucleotide eXposés version 2.0

Damon P. Little¹

¹Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, NY, USA

description

This pair of scripts transforms a set of FASTA formatted sequences into a queryable BRONX reference database for DNA barcoding. This updated version of BRONX is noticeably faster (particularly for database creation) and more portable (it is a pure Perl implementation). The increased speed comes at the cost of increased memory usage, but this should not be an issue for most users. BRONX2 retains, and in some cases improves upon, the performance of the original BRONX. The original version of BRONX remains available.

BRONX2 features new output options (in html and plain text format) that better explain the barcode identification and the reference sequences used to make it. There is also an output format that is more amenable to use in pipelines and similar automated workflows.

script usage

(1) fasta2bdb.pl -i in-file.fasta -o out-file-stem

The “in-file.fasta” is assumed to be DNA sequences in FASTA format. Labels in the .fasta file must begin with a unique identifier (alphanumeric characters only; uniqueness is enforced, and only the last occurrence is retained) followed by an underscore and two or more taxonomic descriptors (ordered from the most inclusive to the least inclusive) separated by underscores. The last two descriptors should form the species name. If the specific epithet is unknown, use “sp.” or the like. Unique identifiers can be automatically assigned using “NO-UNIQUE-IDENTIFIER” as the identifier string. The user is responsible for maintaining a consistent and balanced taxonomy.
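The label rules above can be checked before building a database. The following is a minimal Python sketch, not part of BRONX2: the function names and the example labels are hypothetical, and the exact way fasta2bdb.pl treats whitespace or other edge cases in labels is an assumption.

```python
import re

# Literal placeholder string named in the label rules above.
PLACEHOLDER = "NO-UNIQUE-IDENTIFIER"


def parse_label(header):
    """Split a FASTA label into (identifier, descriptors).

    Raises ValueError if the label does not follow the rules described
    above. This is an illustrative check, not the fasta2bdb.pl parser.
    """
    label = header.lstrip(">").strip()
    identifier, _, rest = label.partition("_")
    descriptors = rest.split("_") if rest else []
    if identifier != PLACEHOLDER and not re.fullmatch(r"[A-Za-z0-9]+", identifier):
        raise ValueError(f"identifier is not alphanumeric: {identifier!r}")
    if len(descriptors) < 2 or any(d == "" for d in descriptors):
        raise ValueError(f"need two or more non-empty descriptors: {label!r}")
    return identifier, descriptors


def species_name(descriptors):
    """The last two descriptors form the species name."""
    return " ".join(descriptors[-2:])


def last_occurrence_wins(headers):
    """Map identifier -> descriptors, keeping only the last duplicate."""
    kept = {}
    for header in headers:
        identifier, descriptors = parse_label(header)
        kept[identifier] = descriptors  # later labels overwrite earlier ones
    return kept
```

For example, the hypothetical label “>AB123_Pinaceae_Pinus_strobus” yields the identifier “AB123” and the species name “Pinus strobus”.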

Pretext and postext are fixed at six bases. Text lengths of one through six bases are stored in the BRONX database. Queries first attempt to use six bases of text and automatically decrement the text length as needed.

To use multiple markers, concatenate the reference sequences with a padding of at least 18 “Ns” between markers. No function to update the database exists; simply update the original .fasta file and create a new database. Indel characters (“-”) may be included in the .fasta file, but they are ignored completely.
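The multi-marker padding described above can be sketched in Python as follows. This helper is hypothetical and not part of BRONX2; it assumes the marker sequences for one specimen are supplied in the same marker order for every reference.

```python
MIN_PAD = 18  # minimum number of Ns between markers, as stated above


def concatenate_markers(marker_seqs, pad_length=MIN_PAD):
    """Join one specimen's marker sequences with a run of Ns.

    marker_seqs: list of sequence strings, one per marker, in a fixed
    marker order (an assumption of this sketch).
    """
    if pad_length < MIN_PAD:
        raise ValueError(f"padding must be at least {MIN_PAD} Ns")
    return ("N" * pad_length).join(marker_seqs)
```

For example, joining “ACGT” and “TTGA” gives “ACGT”, then 18 Ns, then “TTGA”. The concatenated sequence is written to the .fasta file under a single label.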

The file “sample.fasta” contains reference sequences that can be processed into a BRONX database. The files “sample.bdb”, “sample.dbi”, “sample.id”, “sample.seq”, and “sample.term” collectively are a complete processed database.

(2) bronx.pl -d bronx-database [-o x] -q query.fasta

The database file (“-d”) stem name should be the same as that given to fasta2bdb.pl with the -o option. Run fasta2bdb.pl to create a set of five BRONX database files.

Optionally specify an output format: “-o 0” produces traditional BRONX 1.0 output; “-o 1” produces comma-separated output; “-o 2” produces detailed html output (the additional detail requires approximately twice the processing time); and “-o 3” produces detailed plain text output similar to the html output (with the correspondingly longer processing time).

The query (“-q”) is assumed to be DNA sequences in FASTA format.
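When bronx.pl is called from a pipeline, the argument list can be assembled programmatically. The Python sketch below only builds the command from the options documented above; the function name is hypothetical, and invoking the script through “perl” from the current directory is an assumption about the local setup.

```python
VALID_FORMATS = (0, 1, 2, 3)  # output formats documented above


def bronx_command(database_stem, query_fasta, output_format=None):
    """Build the argument list for a bronx.pl run.

    database_stem: the stem given to fasta2bdb.pl with -o.
    output_format: optional integer 0-3; omitted from the command if None.
    """
    cmd = ["perl", "bronx.pl", "-d", database_stem]
    if output_format is not None:
        if output_format not in VALID_FORMATS:
            raise ValueError(f"output format must be one of {VALID_FORMATS}")
        cmd += ["-o", str(output_format)]
    cmd += ["-q", query_fasta]
    return cmd
```

The resulting list can be passed to Python's subprocess.run, for example subprocess.run(bronx_command("sample", "query.fasta", 1), check=True).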

requirement

Perl interpreter

citation

Little, D. P. 2012. BRONX2: Barcode Recognition Obtained with Nucleotide eXposés. Program distributed by the author.

Little, D. P. 2011. DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability. PLoS ONE. 6 (8): e20552.

download

BRONX2