How to use the Seeker webtool

Step 1: Input DNA sequences

UPLOAD FILE - Upload a FASTA file of DNA sequences. The server processes the input and returns results for each sequence in it.
PASTE SEQUENCES - Paste FASTA-formatted DNA sequences directly into the text box. PLEASE NOTE: the server is limited to 25 sequences per upload. For larger files, please download the Seeker package.


or RUN AN EXAMPLE - Select the "sample input" checkbox.
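
Because the server accepts at most 25 sequences per upload, a larger FASTA file can be split into smaller batches before uploading or pasting. The sketch below is one way to do that using only the Python standard library. The function names and the batching approach are illustrative assumptions and are not part of the Seeker package.

```python
from typing import Iterator, List

MAX_SEQUENCES_PER_UPLOAD = 25  # limit stated for the web server


def read_fasta_records(text: str) -> List[str]:
    """Split FASTA text into records; each record starts with a '>' header line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip() and current:
            # Sequence line belonging to the current record
            current.append(line.strip())
    if current:
        records.append("\n".join(current))
    return records


def batch_records(records: List[str],
                  size: int = MAX_SEQUENCES_PER_UPLOAD) -> Iterator[str]:
    """Yield FASTA text chunks holding at most `size` records each."""
    for start in range(0, len(records), size):
        yield "\n".join(records[start:start + size]) + "\n"
```

Each yielded chunk can be saved to its own file and uploaded separately, or pasted into the text box one chunk at a time.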


Step 2: Press the "Submit" button

Step 3: Results
Once Seeker has been applied to your sequences, the following results pages will be shown.
A. Stats: General statistics of the input sequences


Upper left and upper middle panels – histograms of Seeker scores assigned to all 1K segments in the input data vs. normal and log-normal distributions, respectively.
Upper right panel – overlaid histogram of the Seeker scores for each sequence in the input FASTA.
Middle panel – barplots showing the mean, median, maximum and minimum Seeker scores assigned to each sequence in the input FASTA.
Bottom panel – dotplots showing Seeker scores assigned to each sequence in the input FASTA.
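
The per-sequence statistics shown in the bar plots (mean, median, maximum and minimum) can be reproduced offline from a list of per-segment scores. The sketch below assumes the scores are already available as a dictionary mapping sequence names to lists of floats. That data structure and the function name are hypothetical and do not reflect the format the webtool exports.

```python
from statistics import mean, median
from typing import Dict, List


def summarize_scores(
    scores_by_sequence: Dict[str, List[float]]
) -> Dict[str, Dict[str, float]]:
    """Compute mean, median, max and min of the scores for each sequence."""
    summary: Dict[str, Dict[str, float]] = {}
    for name, scores in scores_by_sequence.items():
        if not scores:
            continue  # skip sequences with no scored segments
        summary[name] = {
            "mean": mean(scores),
            "median": median(scores),
            "max": max(scores),
            "min": min(scores),
        }
    return summary
```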

B. Circular: A circular plot showing the mean Seeker score assigned to each sequence in the input FASTA
(each wedge in the circle is a single sequence), with a color scale presented to the right of the plot. The highlighted white line marks a Seeker score of 0.5,
the threshold used to distinguish phage (Seeker > 0.5) from bacteria (Seeker < 0.5). The grey bars show the minimum
(light grey) and maximum (dark grey) Seeker scores assigned to each sequence in the input FASTA.
The table on the right lists the mean Seeker score assigned to each sequence in the input FASTA
and can be downloaded using the 'Download Table' button.
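
The 0.5 threshold can be applied to the mean scores as a simple rule. The sketch below is only an illustration: the function name is hypothetical, applying the threshold to the mean score of a sequence is an assumption, and the handling of a score exactly equal to 0.5 is not specified in this guide, so it is labeled "undetermined" here.

```python
def classify(mean_score: float, threshold: float = 0.5) -> str:
    """Label a sequence from its mean Seeker score using the 0.5 threshold."""
    if mean_score > threshold:
        return "phage"
    if mean_score < threshold:
        return "bacteria"
    # A score exactly at the threshold is not covered by the guide (assumption)
    return "undetermined"
```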
C. Boxplots: Upper panel - Boxplots showing the distribution of the Seeker scores assigned to
each sequence in the input FASTA. Box center lines indicate medians, box edges represent the interquartile range,
whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.
Bottom panel (shown only if the input has 10 or fewer sequences) - an interactive visualization of the Seeker scores assigned to each
sequence in the input FASTA by location in the genome.
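
The boxplot elements described above can be computed as follows. This guide does not state which rule the webtool uses to define outliers, so the sketch assumes the common convention that points more than 1.5 times the interquartile range beyond the box edges are outliers. The function name and the `whisker_factor` parameter are illustrative.

```python
from statistics import quantiles
from typing import Dict, List, Union


def boxplot_stats(
    scores: List[float], whisker_factor: float = 1.5
) -> Dict[str, Union[float, List[float]]]:
    """Return median, box edges, whisker ends and outliers for one sequence.

    Requires at least two scores. The 1.5 x IQR outlier rule is an assumption.
    """
    q1, med, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inliers = [s for s in scores if low_fence <= s <= high_fence]
    outliers = [s for s in scores if s < low_fence or s > high_fence]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme points that are not outliers
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }
```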