ATIM (Alignment-free Tree-based Identification Method): a simple SIDE (sequence identification engine)

Damon P. Little¹

¹Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, NY, USA

description

This set of scripts transforms a set of FASTA-formatted sequences into a queryable DNA barcoding reference database. These scripts were first used by Little and Stevenson (2007).

script usage

(1) Create a MySQL database: “mysql -u root -p barcode < db-tables.sql”.

(2) Create a table of motifs: “patterns.pl size motifs”.
Motifs of 8-10 bp are recommended.
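
To give a sense of how large a motif table of a given size can become, the following Python sketch enumerates every DNA string of a chosen length. It assumes that the motif table holds the complete set of possible motifs, which this document does not state; the function name enumerate_motifs is hypothetical and the code is not taken from patterns.pl.

```python
from itertools import product


def enumerate_motifs(size, alphabet="ACGT"):
    """Return every string of length `size` over `alphabet`.

    Assumption: a motif table contains all possible motifs of one size.
    The order follows the order of characters in `alphabet`.
    """
    return ["".join(letters) for letters in product(alphabet, repeat=size)]


motifs = enumerate_motifs(2)
print(len(motifs))   # 16
print(motifs[:4])    # ['AA', 'AC', 'AG', 'AT']
```

Under that assumption the table grows as 4^size: 65,536 motifs at 8 bp and 1,048,576 at 10 bp.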

(3) Import FASTA-formatted sequences: “fst2mysql.pl sequences.fasta division locus”.
The file “sequences.fasta” is assumed to contain DNA sequences in FASTA format (either GenBank FASTA, or FASTA with headers of the form id_genus_species).
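
The two header styles can be told apart mechanically. The Python sketch below is one possible way to split a header into identifier, genus, and species. The exact layouts it expects are assumptions (a pipe-delimited GenBank-style header with the accession in the fourth field, or an underscore-delimited header), the example headers are invented, and parse_header is a hypothetical helper rather than part of fst2mysql.pl.

```python
def parse_header(header):
    """Split a FASTA header line into (identifier, genus, species).

    Assumed layouts (verify against your own files):
      >ID_Genus_species
      >gi|NUMBER|gb|ACCESSION| Genus species further description
    """
    text = header.lstrip(">").strip()
    if "|" in text:
        # GenBank-style: accession in the fourth field, description last.
        fields = text.split("|")
        identifier = fields[3] if len(fields) > 3 else fields[0]
        words = fields[-1].split()
        genus = words[0] if len(words) > 0 else ""
        species = words[1] if len(words) > 1 else ""
        return identifier, genus, species
    # Underscore style: split on the first two underscores only.
    parts = text.split("_", 2)
    parts += [""] * (3 - len(parts))
    return tuple(parts)


# Invented headers, for illustration only.
print(parse_header(">gi|123|gb|AF000001.1| Zamia pumila rbcL gene"))
print(parse_header(">X1_Zamia_pumila"))
```

In this sketch an identifier that itself contains an underscore would be split incorrectly, so identifiers in the id_genus_species style are best kept free of underscores.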

(4) Barcode sequences: “count.pl locus”.
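
The general idea behind an alignment-free comparison is to describe each sequence by the motifs it contains rather than by aligned positions. The Python sketch below counts overlapping motifs of one size in a single sequence. Whether count.pl records counts or only presence and absence, and how it treats ambiguous bases, is not stated here; skipping windows with non-ACGT characters is an assumption, and count_motifs is a hypothetical name.

```python
from collections import Counter


def count_motifs(sequence, size):
    """Count every overlapping substring of length `size` in `sequence`.

    Assumption: windows containing characters other than A, C, G, T
    (for example N) are skipped.
    """
    seq = sequence.upper()
    counts = Counter()
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts


profile = count_motifs("ACGTACGTNACG", 3)
print(profile["ACG"])  # 3
print(profile["CGT"])  # 2
```

A presence/absence character for a motif could be derived from such a profile by testing whether its count is greater than zero.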

(5) Export matrix: “mysql2ss.pl locus”.
Note that the script prints to standard output, so a redirect operator (e.g., “>”) should be used to capture the output in a file.

(6) Analyze the resulting matrix using TNT.

requirements

Perl interpreter
MySQL
TNT

citation

Little, D. P. 2007. ATIM (Alignment-free Tree-based Identification Method): a simple SIDE (sequence identification engine). Program distributed by the author.

download

ATIM