Sequence Alignment
Enter your DNA sequences to perform global and local sequence alignment.
DNA Sequence Alignment Algorithms
Overview
This application implements two fundamental algorithms for DNA sequence alignment:
- Needleman-Wunsch Algorithm - For global alignment
- Smith-Waterman Algorithm - For local alignment
These algorithms enable biologists to score the similarity between DNA sequences and view the aligned portions.
Needleman-Wunsch Algorithm (Global Alignment)
The Needleman-Wunsch algorithm aligns two DNA sequences over their entire length. Here's how it works:
- Create a matrix where nucleotides of sequence 1 are along the vertical axis and sequence 2 along the horizontal axis
- Initialize the first row and column with gap penalties (default: -2)
- Fill each cell with the maximum score from:
- Cell to the left + gap penalty (horizontal movement)
- Cell above + gap penalty (vertical movement)
- Diagonal cell + match reward (+1) or mismatch penalty (-1)
- Traceback from the bottom-right cell to top-left, following the path that generated each score
- The alignment score is the value in the bottom-right cell
Use case: When you need to align entire sequences to compare overall similarity.
Smith-Waterman Algorithm (Local Alignment)
The Smith-Waterman algorithm finds the most similar subsequences within two DNA sequences:
- Create a matrix similar to Needleman-Wunsch
- Fill each cell with the maximum score from:
- Cell to the left + gap penalty
- Cell above + gap penalty
- Diagonal cell + match reward or mismatch penalty
- Zero (0) - allows alignment to start fresh
- Traceback from the highest scoring cell until reaching zero or a cell with no direction
- The alignment score is the highest value in the entire matrix
Use case: When you need to find conserved regions or similar subsequences within larger DNA sequences.
Scoring Parameters
| Parameter | Value | Description |
|---|---|---|
| Match Score | +1 | Reward for matching nucleotides |
| Mismatch Penalty | -1 | Penalty for non-matching nucleotides |
| Gap Penalty | -2 | Penalty for inserting a gap in alignment |
Input Format
DNA sequences should be provided as strings containing only the nucleotide letters:
- A - Adenine
- T - Thymine
- C - Cytosine
- G - Guanine
The input is case-insensitive. Sequences can be entered manually (separated by newlines) or uploaded as a CSV file.
Output Explanation
For each pair of sequences, the application provides:
- Global Alignment 1 & 2: The two sequences aligned over their entire length, with gaps (-) inserted for optimal alignment
- Global Score: Integer score representing overall similarity (can be negative)
- Local Alignment 1 & 2: The most similar subsequences from each sequence, aligned with gaps
- Local Score: Positive integer representing the similarity of the aligned subsequences
- Timestamp: When the alignment was computed