Sequence Alignment

This application implements two fundamental algorithms for DNA sequence alignment:

- Needleman-Wunsch (global alignment)
- Smith-Waterman (local alignment)

These algorithms enable biologists to score the similarity between DNA sequences and view the aligned portions.

The Needleman-Wunsch algorithm aligns two DNA sequences over their entire length. Here's how it works:

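Create a matrix with the nucleotides of sequence 1 along the vertical axis and those of sequence 2 along the horizontal axis, plus a leading row and column that represent the empty prefix
Initialize the first row and column with cumulative gap penalties: the cell at position i holds i times the gap penalty (default gap penalty: -2)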
Fill each cell with the maximum score from:
- Cell to the left + gap penalty (horizontal movement)
- Cell above + gap penalty (vertical movement)
- Diagonal cell + match reward (+1) or mismatch penalty (-1)
Trace back from the bottom-right cell to the top-left cell, following the moves that produced each score
The alignment score is the value in the bottom-right cell
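
The steps above can be sketched in Python as follows. This is a minimal illustration, not the application's actual code: the function name `needleman_wunsch` and its parameter names are hypothetical, and when several optimal alignments exist this sketch simply prefers the diagonal move, then the vertical one.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment sketch; returns (aligned1, aligned2, score)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # score[i][j] is the best score for aligning seq1[:i] with seq2[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # first column: cumulative gap penalties
    for j in range(1, cols):
        score[0][j] = j * gap  # first row: cumulative gap penalties

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # diagonal: match or mismatch
                score[i - 1][j] + gap,       # from above: gap in seq2
                score[i][j - 1] + gap,       # from the left: gap in seq1
            )

    # Trace back from the bottom-right cell to the top-left cell.
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1]); out2.append("-")
            i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), score[-1][-1]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```

In the example, three matches (+3) and one gap (-2) give a global score of 1.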

Use case: When you need to align entire sequences to compare overall similarity.

The Smith-Waterman algorithm finds the most similar subsequences within two DNA sequences:

Create a matrix laid out as in Needleman-Wunsch, but initialize the first row and column to zero
Fill each cell with the maximum score from:
- Cell to the left + gap penalty
- Cell above + gap penalty
- Diagonal cell + match reward or mismatch penalty
- Zero (0) - allows alignment to start fresh
Trace back from the highest-scoring cell until reaching a cell whose score is zero or that has no recorded direction
The alignment score is the highest value in the entire matrix
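
A possible Python sketch of these steps is shown below. It is illustrative only: the name `smith_waterman` is hypothetical, the first row and column are assumed to start at zero (the standard formulation), and if several cells share the highest score the sketch keeps the first one it encounters.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Local alignment sketch; returns (aligned1, aligned2, score)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # First row and column stay at zero, so an alignment can start anywhere.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a fresh alignment here
                score[i - 1][j - 1] + pair,  # diagonal: match or mismatch
                score[i - 1][j] + gap,       # from above: gap in seq2
                score[i][j - 1] + gap,       # from the left: gap in seq1
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1]); out2.append("-")
            i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), best


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))  # ('ACG', 'ACG', 3)
print(smith_waterman("AAAA", "TTTT"))            # ('', '', 0)
```

The second call shows that two sequences with nothing in common produce an empty local alignment with a score of 0.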

Use case: When you need to find conserved regions or similar subsequences within larger DNA sequences.

Parameter	Value	Description
Match Score	+1	Reward for matching nucleotides
Mismatch Penalty	-1	Penalty for non-matching nucleotides
Gap Penalty	-2	Penalty for inserting a gap in alignment
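
To see how these parameters combine, the following sketch scores an already-aligned pair of strings column by column. The helper name `score_alignment` is hypothetical and is not part of the application; it only illustrates the arithmetic under the default values in the table.

```python
def score_alignment(aligned1, aligned2, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two aligned strings of equal length."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            total += gap        # a gap in either sequence
        elif a == b:
            total += match      # identical nucleotides
        else:
            total += mismatch   # different nucleotides
    return total


print(score_alignment("ACGT", "A-GT"))        # 3 matches + 1 gap = 1
print(score_alignment("GATTACA", "GACTACA"))  # 6 matches + 1 mismatch = 5
```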

DNA sequences should be provided as strings containing only the nucleotide letters A, C, G, and T.

The input is case-insensitive. Sequences can be entered manually (separated by newlines) or uploaded as a CSV file.
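
As a rough sketch of how manually entered input could be normalized, the code below uppercases each line and rejects unexpected characters. It assumes the accepted alphabet is A, C, G, and T; the names `normalize_sequence` and `parse_manual_input` are hypothetical. The CSV layout is not described here, so CSV parsing is deliberately left out.

```python
VALID_NUCLEOTIDES = frozenset("ACGT")  # assumed alphabet, not confirmed by the application


def normalize_sequence(raw):
    """Uppercase one sequence and check that it uses only valid letters."""
    seq = raw.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    invalid = sorted(set(seq) - VALID_NUCLEOTIDES)
    if invalid:
        raise ValueError(f"invalid characters in sequence: {invalid}")
    return seq


def parse_manual_input(text):
    """Split newline-separated input into sequences, skipping blank lines."""
    return [normalize_sequence(line) for line in text.splitlines() if line.strip()]


print(parse_manual_input("acgt\n\nGGcc\n"))  # ['ACGT', 'GGCC']
```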

For each pair of sequences, the application provides:

Global Alignment 1 & 2: The two sequences aligned over their entire length, with gaps (-) inserted for optimal alignment
Global Score: Integer score representing overall similarity (can be negative)
Local Alignment 1 & 2: The most similar subsequences from each sequence, aligned with gaps
Local Score: Non-negative integer representing the similarity of the aligned subsequences (0 when no local alignment scores above zero)
Timestamp: When the alignment was computed
