Glossary

Protein splicing: An intramolecular reaction in which an internal segment of a precursor protein (called an intein) is excised and the flanking N-terminal and C-terminal segments (called exteins) are ligated to each other.

Post translation: Alteration of a protein after translation; may include cleavage from a larger precursor protein, the removal of amino acids, and the attachment of other molecules to the protein.

Pattern: Molecular biological patterns usually occur at the level of the characters making up the gene or protein sequence. A pattern language must be defined in order to apply different criteria to different positions of a sequence. In order to have position-specific comparison done by a computer, a pattern-matching algorithm must allow alternative residues at a given position, repetitions of a residue, exclusion of alternative residues, weighting, and ideally, combinatorial representation.
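
Several of the position-specific criteria listed above (alternative residues, repetition, exclusion) can be expressed with ordinary regular expressions. The following is a minimal sketch, assuming Python's standard `re` module; the motif and the test sequence are hypothetical and chosen only for illustration. Weighting and combinatorial representation are not covered by plain regular expressions.

```python
import re

# Hypothetical motif, for illustration only:
#   [ST]    alternative residues at one position (S or T)
#   .       any residue
#   G{2,3}  repetition: two or three glycines
#   [^P]    exclusion: any residue except proline
#   [KR]    alternative residues (K or R)
motif = re.compile(r"[ST].G{2,3}[^P][KR]")

def find_motif(sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in motif.finditer(sequence)]

# The second candidate (SPGGGP...) is rejected by the proline exclusion.
hits = find_motif("MASAGGLKTTSPGGGPRA")
print(hits)
```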

Peptide: A short stretch of amino acids each covalently coupled by a peptide (amide) bond.

Peptide bond (amide bond): A covalent bond formed between two amino acids when the amino group of one is linked to the carboxy group of another (resulting in the elimination of one water molecule).

Polypeptide: A single chain of covalently attached amino acids joined by peptide bonds. Polypeptide chains usually fold into a compact, stable form (a domain) that is part (or all) of the final protein.

Post-transcriptional modification: Alterations made to pre-mRNA before it leaves the nucleus and becomes mature mRNA.

Post-translational modification: Alterations made to a protein after its synthesis at the ribosome. These modifications, such as the addition of carbohydrate or fatty acid chains, may be critical to the function of the protein.

Primary sequence (protein): The linear sequence of amino acids in a polypeptide or protein.

Protein: A molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene coding for the protein.

Profile: Sequence profiles are usually derived from multiple alignments of sequences with a known relationship, and consist of tables of position-specific scores and gap penalties. Each position in the profile contains scores for all of the possible amino acids, as well as one penalty score for opening and one for continuing a gap at the specified position. Attempts have been made to further improve the sensitivity of the profile by refining the procedures used to construct a profile from a given multiple alignment. Other representations for sequence domains or motifs, such as hidden Markov models, do not necessarily require the presence of a correct and complete multiple alignment.
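
The table of position-specific scores can be sketched as follows. This is a simplified illustration, not a published profile-construction procedure: it assumes a uniform background frequency, a fixed pseudocount, and log-odds scoring, and it leaves out the per-position gap penalties. The names `build_profile` and `score` and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping amino acid -> log-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency
    profile = []
    for column in zip(*alignment):
        # Count residues in this column; gap characters are ignored.
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, segment):
    """Sum the position-specific scores of an ungapped segment."""
    return sum(column[aa] for column, aa in zip(profile, segment))

alignment = ["ACDK", "ACEK", "GCDK"]  # toy multiple alignment
profile = build_profile(alignment)
print(score(profile, "ACDK"), score(profile, "WWWW"))
```

A segment that resembles the aligned sequences scores higher than an unrelated one, because conserved residues receive positive log-odds scores at their positions.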

Prokaryote: An organism or cell that lacks a membrane-bounded nucleus. Bacteria (including the cyanobacteria, formerly called blue-green algae) and archaea are the prokaryotes (cf. Eukaryote).

Protein families: Sets of proteins that share a common evolutionary origin, reflected in related function and usually in similar sequence (primary structure) or in similar secondary or tertiary structure. More loosely, subsets of proteins with related structure and function.

P value: The probability of an alignment with the score in question or better occurring by chance.

Pairwise alignment: An alignment in which two sequences are padded with gaps so that they have the same length and display maximum similarity/conservation on a character-by-character basis.
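
One common way to compute such an alignment is global dynamic programming in the style of Needleman and Wunsch. The sketch below assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) chosen only for illustration; the function name `global_align` is hypothetical, and real tools typically use substitution matrices and affine gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (aligned_a, aligned_b, score) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, padding with '-' where a gap was used.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

print(global_align("ACGT", "AGT"))
```

Both returned strings have the same length, and removing the `-` characters from each recovers the original sequence.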

Paralogous: Homologous sequences (sequences that share a common evolutionary ancestor) that diverged by gene duplication, as opposed to orthologs, which diverged by speciation.

Perl: An interpreted computer language for easily manipulating text, files and processes.

Phylogeny: A classification scheme that indicates the evolutionary relationships between organisms.

PROSITE: A database of "patterns" (regular expressions) specific for various protein motifs.
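
Because PROSITE patterns are regular expressions written in their own notation, they can be translated mechanically into the syntax of a general-purpose regex engine. The sketch below handles only a simplified subset of the notation (elements separated by `-`, `x` for any residue, `[...]` for alternatives, `{...}` for exclusions, `(n)` or `(n,m)` for repetition, `<` and `>` as sequence-end anchors, and a trailing `.`). The function name `prosite_to_regex` and the example pattern are hypothetical, not taken from a database entry.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (simplified subset) into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")     # terminus anchors
        element = element.replace("{", "[^").replace("}", "]")    # excluded residues
        element = element.replace("(", "{").replace(")", "}")     # repetition counts
        element = element.replace("x", ".")                       # any residue
        parts.append(element)
    return "".join(parts)

# Hypothetical pattern, for illustration only.
regex = prosite_to_regex("[ST]-x(2,3)-G-{P}-[KR].")
print(regex)
match = re.search(regex, "MKSAAGLKQ")
print(match.start(), match.group())
```

Anchors placed inside square brackets are not handled by this sketch.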