degenbar: a simple SIDE (sequence identification engine)
Damon P. Little¹
¹Lewis B. and Dorothy Cullman Program for Molecular Systematics Studies, The New York Botanical Garden, Bronx, NY, USA
This pair of scripts is designed to transform a set of FASTA-formatted sequences into a queryable DNA–BAR reference database for DNA barcoding. These scripts were first used by Little and Stevenson (2007). The script “degenbar-in.pl” generates an input file from FASTA-formatted sequences for the “degenbar” executable*, which implements the DNA–BAR method of DasGupta et al. (2005). The resulting matrix of distinguishers is then queried by the “degenbar” script. The query sequence is scored for the presence or absence of each distinguisher (10–50 nucleotides in length). The reference sequence or sequences with the greatest number of matching presence/absence scores are taken to be the identification.
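The scoring rule described above can be illustrated with a minimal Python sketch. It is not the code distributed with degenbar: the function names, the in-memory layout of the distinguisher matrix, and the toy data are all assumptions made for illustration, and the toy distinguishers are far shorter than the 10–50 nucleotides used in practice.

```python
# Sketch only: names and data layout are hypothetical, not degenbar's file format.

def presence_profile(sequence, distinguishers):
    """Return a tuple of booleans: True where a distinguisher occurs in the sequence."""
    seq = sequence.upper()
    return tuple(d.upper() in seq for d in distinguishers)


def identify(query, distinguishers, reference_profiles):
    """Return (best_names, best_score): the references whose presence/absence
    profiles agree with the query at the greatest number of distinguishers."""
    query_profile = presence_profile(query, distinguishers)
    scores = {
        name: sum(q == r for q, r in zip(query_profile, profile))
        for name, profile in reference_profiles.items()
    }
    best = max(scores.values())
    # Ties are all reported, mirroring "sequence or sequences" in the text.
    return sorted(n for n, s in scores.items() if s == best), best


# Toy data (assumed): three short distinguishers and three reference profiles.
distinguishers = ["ACGT", "GGCC", "TTAA"]
reference_profiles = {
    "species_A": (True, False, True),
    "species_B": (False, True, True),
    "species_C": (True, True, False),
}
names, score = identify("ccACGTggTTAAcc", distinguishers, reference_profiles)
print(names, score)
```

Note that a shared absence counts as a match just as a shared presence does, which is how the description above reads; whether the distributed script weights the two equally is not stated here.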
(1) degenbar-in.pl in_file
The “in_file” is assumed to contain DNA sequences in FASTA format. The parameters used by the degenbar executable can be changed for a single run by editing the output of degenbar-in.pl, or permanently by editing the script itself. Note that the script prints to standard output, so a redirect operator (e.g. '>') should be used to capture the output in a file.
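For readers unfamiliar with the expected input, the following Python sketch shows what a FASTA file looks like and how its records can be read. It is an illustration only and is not part of degenbar-in.pl; the function name and the two example records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record input; each record is a '>' header plus sequence lines.
example = """>sample_1 hypothetical record
ACGTACGTAC
GTACGT
>sample_2 hypothetical record
TTGACCA
"""
records = list(read_fasta(example.splitlines()))
print(records)
```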
(2) degenbar in_file out_file
This step uses the compiled C executable* and the file produced in the previous step as “in_file”.
(3) degenbar degenbar.out query.fasta
This step uses the Perl script, the file produced in the previous step as “degenbar.out”, and a query file containing a single FASTA-formatted sequence.
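Because the query file is expected to hold exactly one sequence, it can be worth checking it before running this step. The Python sketch below is one way to do so, assuming that each record begins with a line starting with '>'; the helper names are hypothetical and the check is not part of the distributed scripts.

```python
def count_fasta_records(text):
    """Count FASTA records by counting lines that begin with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def is_single_sequence(text):
    """Return True when the text holds exactly one FASTA record."""
    return count_fasta_records(text) == 1


# Hypothetical query contents: one record is accepted, two are rejected.
good_query = ">query_1\nACGTACGT\nACGT\n"
bad_query = ">query_1\nACGT\n>query_2\nTTGA\n"
print(is_single_sequence(good_query), is_single_sequence(bad_query))
```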
Little, D. P. 2007. degenbar: a simple SIDE (sequence identification engine). Program distributed by the author.
*The line “#define MAX_NUM_TARGETS 100” in “degenbar.c” may need to be modified before compiling to accept a greater number of sequences.