ATIM (Alignment-free Tree-based Identification Method): a simple SIDE (sequence identification engine)
Damon P. Little¹
¹Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, NY, USA
This set of scripts is designed to transform a set of FASTA-formatted sequences into a queryable DNA barcoding reference database. These scripts were first used by Little and Stevenson (2007).
(1) Create a MySQL database: “mysql -u root -p barcode < db-tables.sql”.
(2) Create a table of motifs: “patterns.pl size motifs”.
8-10 bp motifs are recommended.
(3) Import FASTA-formatted sequences: “fst2mysql.pl sequences.fasta division locus”.
The file “sequences.fasta” is assumed to contain DNA sequences in FASTA format, with headers in either GenBank style or the form id_genus_species.
(4) Barcode sequences: “count.pl locus”.
(5) Export matrix: “mysql2ss.pl locus”.
Note that mysql2ss.pl prints to standard output, so a redirect operator (e.g. '>') should be used to capture the matrix in a file.
(6) Analyze the resulting matrix using TNT.
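The idea behind steps (2) to (5) can be illustrated with a minimal Python sketch. It is not the code of patterns.pl, count.pl, or mysql2ss.pl, and it makes several assumptions: that the motif table holds every possible DNA motif of the chosen size, that each sequence is scored by counting overlapping occurrences of each motif, and that windows containing ambiguity codes are skipped. The function names and the toy sequences are hypothetical, and no database is involved. Under the first assumption, 8 bp motifs give 4^8 = 65,536 columns and 10 bp motifs give 4^10 = 1,048,576.

```python
from itertools import product


def all_motifs(size):
    """Enumerate every DNA motif of the given length in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=size)]


def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def count_motifs(sequence, motifs):
    """Count overlapping occurrences of each motif in one sequence."""
    size = len(motifs[0])
    counts = dict.fromkeys(motifs, 0)
    for i in range(len(sequence) - size + 1):
        window = sequence[i:i + size]
        if window in counts:  # assumption: windows with N etc. are skipped
            counts[window] += 1
    return [counts[m] for m in motifs]


def motif_matrix(fasta_text, size):
    """Build one row of motif counts per sequence (alignment-free)."""
    motifs = all_motifs(size)
    rows = [(h, count_motifs(s, motifs)) for h, s in parse_fasta(fasta_text)]
    return motifs, rows


if __name__ == "__main__":
    # Made-up toy input with id_genus_species headers; motif size 2 for brevity.
    example = ">1_Genus_alpha\nACGTAC\nGT\n>2_Genus_beta\nAAAANAAA\n"
    motifs, rows = motif_matrix(example, 2)
    for header, counts in rows:
        print(header, *counts)
```

Each printed row is one taxon followed by its motif counts, which corresponds to the kind of matrix that is exported in step (5) and analyzed in step (6). Whether the actual scripts record counts or only presence/absence is not stated here.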
Little, D. P. 2007. ATIM (Alignment-free Tree-based Identification Method): a simple SIDE (sequence identification engine). Program distributed by the author.