ATIM (Alignment-free Tree-based Identification Method): a simple SIDE (sequence identification engine)



Damon P. Little1

1Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, NY, USA


description

This set of scripts transforms a set of FASTA-formatted sequences into a queryable DNA barcoding reference database. The scripts were first used by Little and Stevenson (2007).

script usage

(1) Create a MySQL database: “mysql -u root -p barcode < db-tables.sql”.

(2) Create a table of motifs: “patterns.pl size motifs”.
Motifs of 8-10 bp are recommended.

(3) Import FASTA-formatted sequences: “fst2mysql.pl sequences.fasta division locus”.
The file “sequences.fasta” is assumed to contain DNA sequences in FASTA format (either GenBank FASTA, or FASTA with id_genus_species).

(4) Barcode sequences: “count.pl locus”.

(5) Export matrix: “mysql2ss.pl locus”.
Note that the script prints to standard output, so a redirect operator (e.g. “>”) should be used to capture the output in a file.

(6) Analyze the resulting matrix using TNT.
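
The internals of the scripts are not described here. As a rough illustration of the general idea behind steps (2), (4), and (5), the sketch below enumerates every DNA motif of a given size, counts overlapping occurrences of each motif in each sequence, and arranges the counts as a sequence-by-motif matrix. It is written in Python rather than Perl, and every function name, the handling of ambiguity codes, and the use of raw counts are assumptions made for illustration, not the behavior of patterns.pl, count.pl, or mysql2ss.pl.

```python
from itertools import product


def all_motifs(size):
    """Enumerate every DNA motif of the given length (4**size strings)."""
    return ["".join(p) for p in product("ACGT", repeat=size)]


def count_motifs(sequence, size):
    """Count overlapping occurrences of each motif in one sequence."""
    counts = {}
    seq = sequence.upper()
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        # Assumption: windows containing ambiguity codes (e.g. N) are skipped.
        if set(window) <= set("ACGT"):
            counts[window] = counts.get(window, 0) + 1
    return counts


def motif_matrix(sequences, size):
    """Build a sequence-by-motif count matrix (one row of integers per sequence)."""
    motifs = all_motifs(size)
    rows = {}
    for name, seq in sequences.items():
        counts = count_motifs(seq, size)
        rows[name] = [counts.get(m, 0) for m in motifs]
    return motifs, rows


if __name__ == "__main__":
    # Hypothetical sequence names in the id_genus_species style.
    demo = {"id1_Genus_species": "ACGTACGT", "id2_Genus_other": "ACGTTTTT"}
    motifs, rows = motif_matrix(demo, 2)
    for name, row in rows.items():
        print(name, row)
```

The demo uses motifs of size 2 only to keep the output small. At the recommended 8-10 bp, the number of possible motifs grows to 4^8 through 4^10, which is presumably one reason to keep the data in a database rather than in memory.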

requirements

Perl interpreter
MySQL
TNT

citation

Little, D. P. 2007. ATIM (Alignment-free Tree-based Identification Method): a simple SIDE (sequence identification engine). Program distributed by the author.

download

ATIM