EMBOSS is the "European Molecular Biology Open Software Suite".
It is a free, open-source software analysis package developed specifically for the needs of the molecular biology
user community.
EMBOSS contains more than 100 programs.
The functionality of EMBOSS overlaps considerably with that of the commercial
GCG (Wisconsin Package). The
table below maps the programs of the two packages onto each other; a "-" marks a program with no direct
counterpart in the other package (full details of all the programs that make up the
GCG package can be found here).
| GCG program |
EMBOSS program |
Description/Comments |
| Assemble |
merger |
Construct new sequences from pieces of existing sequences; merger only accepts
2 sequences while assemble accepts several. |
| BackTranslate |
backtranseq |
Backtranslate protein -> nucleotide sequence |
| BestFit |
water
matcher |
Bestfit uses the Smith-Waterman algorithm to find the best local alignment
between 2 sequences. water uses Smith-Waterman, matcher uses Pearson's lalign algorithm. |
| BLAST
Psiblast |
dbiBlast |
NCBI homology search between query and database |
| Breakup |
splitter |
Splits a sequence into (overlapping) smaller sequences |
| Chopup |
- |
Helps to convert a non-GCG sequence format.
Not needed in EMBOSS because it reads most sequence formats without conversion |
| CodonFrequency |
chips
compseq
cusp |
CodonFrequency -- tabulates codon usage.
compseq -- counts composition of dimer/trimer in sequence.
chips -- calculates codon usage stats
cusp -- creates a codon usage table. |
| CodonPreference |
syco
wobble |
Recognize protein coding sequences |
| CoilScan |
pepcoil |
Predicts coiled-coil regions |
| Compare + Dotplot |
dottup +
dotmatcher |
2-sequence comparison |
| Composition |
compseq
pepstats |
Sequence composition |
| CompressText |
- |
Removes extra whitespace in text files. Can be done via Unix shell script. |
| CompTable |
- |
Creates a scoring matrix |
| Consensus |
prophecy |
Creates a consensus sequence or matrices/profiles from multiple alignments |
| Correspond |
codcmp |
Codon usage table comparison |
| Corrupt |
msbar |
Randomly mutate sequence |
| DataSet |
dbiflat
dbiblast
dbigcg |
Creates searchable sequence database. GCG's Dataset requires sequences in
GCG format, whereas dbiflat, dbiblast, dbigcg will take most formats between them. |
| Detab |
- |
Replaces tabs with spaces in sequence files. Can be performed by Unix shell command. |
| Distances |
- |
Calculates pairwise evolutionary distances between aligned sequences. The Phylip package can do this. |
| Diverge |
- |
Estimates pairwise substitutions per site between 2 or more coding sequences.
The Phylip package can do this. |
| DotPlot |
dottup
dotmatcher |
|
| ExtractPeptide |
transeq |
ExtractPeptide takes the output of Map and can write one or more of the reading-frame
translations. transeq translates one or more of the frames or specific
regions directly from an input nucleotide sequence. |
| FastA
FastX
Tfasta
TfastX |
- |
Pearson's homology-search programs, available as standalone programs on molbiol. |
| Fetch |
seqret
seqretsplit |
Pull one or more sequences out of the databases. seqret/seqretsplit can
save output in various sequence formats. |
| Figure |
- |
Generates plots from other GCG programs. |
| FindPatterns |
fuzznuc
fuzzpro |
Searches for patterns in a sequence or database. |
| FingerPrint |
- |
Finds the products of T1 ribonuclease digestion. |
| FitConsensus |
- |
Use after Consensus to find the best fits. |
| FrameAlign |
- |
Finds best local alignment including frame shifts between a protein and nucleotide sequence. |
| Frames |
plotorf
showorf |
Show open reading frames. plotorf does this graphically |
| Framesearch |
- |
Homology searches including frameshifts between protein and nucleotide sequences |
| FromEMBL
FromFasta
FromGenbank
FromIG
FromPIR
FromStaden
Fromtrace |
- |
Converts from various formats to GCG sequence format. Unnecessary in EMBOSS
because it can accept most sequence formats, but seqret can convert between formats if desired. |
| Gap |
needle
stretcher |
Needleman-Wunsch algorithm to compare 2 sequences. stretcher uses the Myers-Miller
algorithm which is more memory-efficient. |
| Gapshow |
plotcon |
Graphical representation of similarity of 2 sequences. |
| GCGtoBlast |
- |
Makes a BLAST database. Use NCBI's 'formatdb' instead. |
| GelAssemble
GelDisassemble
GelEnter
GelMerge
GelStart
GelView |
megamerger
merger
union |
Parts of GCG's gel assembly suite. |
| Getseq |
seqret |
Type in a new sequence |
| Growtree |
- |
Creates phylogenetic tree. Can use Phylip or Clustal instead. |
| HelicalWheel |
pepwheel |
Plots peptide sequence as helical wheel to help recognize amphiphilic regions. |
| HmmerAlign
HmmerBuild
HmmerCalibrate
HmmerEmit
HmmerFetch
HmmerIndex
HmmerPfam
HmmerSearch |
- |
The HMMER package is available on molbiol. |
| HTHScan |
helixturnhelix |
Finds HTH motifs in protein sequences. |
| Isoelectric |
iep |
Calculates isoelectric point of a protein. |
| Lineup |
- |
Edits multiple sequence alignments |
| Listfile |
- |
Sends files for printing. Can use the Unix psprint command instead. |
| Lookup |
- |
Versatile program for finding sequences in a database. "whichdb" in EMBOSS can
search for accession numbers, but lookup is much more sophisticated.
Can use NCBI tool at http://www.ncbi.nlm.nih.gov/ |
| Map
Mapplot
Mapsort |
restrict
remap
restover |
Finds restriction enzyme cleavage sites. |
| MeltTemp |
dan |
Computes melting temperature of oligos |
| MEME |
- |
Finds conserved motifs in a group of unaligned sequences. |
| MFold |
- |
Predicts nucleotide secondary structure. GCG's version is an old version
of Zuker's MFOLD. |
| Moment |
pepnet
octanol |
Makes a contour plot of the helical hydrophobic moment of a peptide sequence |
| Motifs |
patmatmotifs |
Finds common Prosite motifs in a sequence. Note that not all Prosite motifs
will be found due to a bug in the GCG and EMBOSS programs. Use Interproscan
instead (http://www.ebi.ac.uk/InterProScan/) |
| Meme + Motifsearch |
prophecy +
profit |
Search a sequence or database with a matrix or profile. |
| Names |
infoseq |
Provides some information about sequence specifications. |
| NetBlast
Netfetch |
- |
Remote access to NCBI's BLAST. Use standalone BLAST on molbiol instead. |
| NoOverlap |
diffseq |
Finds differences between 2 sequences. NoOverlap can work with a group of sequences. |
| OldDistances |
- |
Makes a table of the pairwise similarities within a group of sequences. |
| onecase |
- |
Converts a sequence into lower or upper case. Can be performed by a Unix shell command. |
| Overlap |
- |
Compares 2 sets of sequences using Wilbur-Lipman algorithm. |
| Paupdisplay
Paupsearch |
- |
PAUP Phylogenetic Analysis. |
| Pepdata |
getorf
sixpack |
Translates in all 6 reading frames. sixpack displays the DNA sequence with
6-frame translations and orfs. |
| Pepplot |
pepinfo |
Pepplot plots protein secondary structure and hydrophobicity. pepinfo plots
hydrophobicity, and garnier does protein secondary structure prediction. |
| Peptidemap |
digest |
Enzyme/reagent cleavage map of a protein. |
| Peptidesort |
- |
Sorts fragments from an enzyme/reagent cleavage of a protein according
to position, mol. wt., and HPLC retention. |
| Peptidestructure
Plotstructure |
garnier |
Secondary structure prediction. |
| Pileup |
(emma) |
Multiple sequence alignment. ("emma" is an interface to the ClustalW alignment program.
Can also use the standalone Clustal on molbiol, or ClustalW at the EBI.) |
| PlasmidMap |
cirdna
lindna |
Plot DNA constructs. |
| PlotFold |
- |
Plots MFold output. |
| PlotSimilarity |
plotcon |
Graphical representation of the similarity along a set of aligned sequences. |
| Pretty
prettybox |
cons
prettyplot
showalign |
Calculates consensus sequence from a multiple sequence alignment, and displays them prettily. |
| Prime |
eprimer3 |
Selects oligonucleotide primers. |
| Profilegap
Profilemake |
prophecy
prophet |
Creates matrices/profiles from multiple alignments. Gapped alignment for profiles and sequences. |
| PrimePair |
primersearch |
Evaluates individual primers to determine their compatibility for use as PCR primer pairs. |
| Profilescan |
patmatdb |
Searches sequences or a database for protein motifs. Profilescan uses the Gribskov method. |
| Profilesearch |
profit |
Scans a sequence or database with a matrix or profile. |
| Profilesegments |
- |
Alignments for results of Profilesearch |
| Publish |
seqret
showseq |
Makes publication-quality displays of sequences. |
| Reformat |
seqret |
GCG requires input sequences to be in GCG format, hence other formats
need to be converted with 'reformat'. EMBOSS programs accept most
sequence formats, but 'seqret' can be used to convert between formats if desired. |
| Repeat |
equicktandem
etandem
einverted
palindrome |
Finds tandem repeats in sequences. The equivalent group of EMBOSS programs
will also look for inverted or palindromic repeats. |
| Replace |
biosed
degapseq |
Replaces characters in a text file. Degapseq is specific for replacing gap
characters. Can be performed with Unix shell utilities like sed, awk or tr. |
| Reverse |
revseq |
Reverse/complement a sequence. |
| Sample |
extractseq |
Extract regions from a sequence. |
| Seg |
maskseq |
Masks off low-complexity regions from a sequence. |
| Seqed |
biosed
cutseq
degapseq
descseq
entret
extractfeat
extractseq
listor
maskfeat
maskseq
newseq
noreturn
notseq
nthseq
pasteseq
revseq
seqret
seqretsplit
skipseq
splitter
trimest
trimseq
union
vectorstrip
yank |
Sequence editor. EMBOSS has many tools for specific editing tasks. Or use
a text editor (but not a word processor). |
| SeqLab |
- |
X-windows interface to GCG. |
| Setkeys |
- |
Redefines keyboard keys, mainly used for GCG's gel assembly programs. |
| Shiftover |
- |
Moves text by column. Use the Unix nedit editor instead. |
| Shuffle |
shuffleseq |
Shuffles a sequence. |
| Simplify |
- |
Reduce the number of symbols in a sequence. |
| Spew |
- |
Sends a sequence from a remote computer to your desktop.
An old-fashioned method of file transfer, rarely used now. |
| SPScan |
sigcleave |
Predicts signal peptides in protein sequences. |
| Ssearch |
- |
Part of Pearson's Fasta package, available as a standalone program on molbiol. |
| StatPlot |
- |
Plotting program. Rarely used. |
| StemLoop |
palindrome
einverted |
Finds inverted repeats. |
| Stringsearch |
textsearch |
Finds text phrases in sequence or database. Use NCBI
Entrez instead. |
| Terminator |
- |
Searches for prokaryotic factor-independent RNA polymerase terminators according
to the method of Brendel and Trifonov. |
| Testcode |
wobble |
Plots 3rd-position variability as an indicator of potential coding regions. |
| ToFastA
ToIG
ToPIR
ToStaden |
seqret |
EMBOSS accepts most sequence formats, therefore format conversion is rarely required. |
| Translate |
transeq |
Translates nucleotide -> Protein sequences |
| Transmem |
- |
Predicts transmembrane helices. |
| Window + Statplot |
freak |
Residue/base frequency table or plot. |
| Wordsearch/Segments |
- |
Homology search using Wilbur/Lipman algorithm. Segments displays the result. |
| Xnu |
- |
Masks tandem repeats for future BLAST search. Available as a standalone program on molbiol |
| - |
abiview |
Reads ABI file and displays trace |
| - |
antigenic |
Finds antigenic sites in proteins |
| - |
banana |
Bending and curvature plot in B-DNA |
| - |
btwisted |
Calculates the twisting in a B-DNA sequence |
| - |
cai |
CAI codon adaptation index, to measure synonymous codon usage bias. |
| - |
chaos |
Create a chaos game representation plot for a sequence |
| - |
charge |
Protein charge plot. |
| - |
checktrans |
Reports STOP codons and ORF statistics of a protein |
| - |
coderet |
Extract CDS, mRNA and translations from feature tables |
| - |
cpgplot
cpgreport
newcpgreport
newcpgseek |
Plots and reports CpG-rich regions. |
| seqed |
cutseq |
Removes a specified section from a sequence. seqed is interactive, cutseq is command-line. |
| seqed |
descseq |
Alter name/description of sequence. |
| Findpatterns |
dreg |
Regular expression search of a sequence. Findpatterns is an approximate equivalent. |
| - |
emowse |
Protein identification by Mass spectrometry. |
| - |
est2genome |
Align EST and genomic DNA sequences. |
| - |
extractfeat |
Extract features from a sequence. |
| - |
findkm |
Find Km and Vmax for an enzyme reaction by a Hanes/Woolf plot |
| - |
fuzztran |
Protein pattern search after translation |
| - |
geecee |
Calculates the fractional GC content of nucleic acid sequences |
| - |
isochore |
Plots isochores in large DNA sequences |
| - |
listor |
Writes a list file of the logical OR of two sets of sequences |
| - |
marscan |
Finds MAR/SAR sites in nucleic sequences |
| - |
maskfeat |
Mask off features of a sequence. |
| - |
mwcontam |
Shows mol wts that match across a set of files |
| - |
mwfilter |
Filter noisy mol wts from mass spec output |
| - |
noreturn |
Removes carriage returns from ASCII files. Can be performed by Unix utilities like 'tr'. |
| Reformat |
nthseq |
Pulls one sequence out of a multiple set. Reformat will pull a sequence
out of an MSF or RSF file. |
| - |
oddcomp |
Finds protein sequence regions with a biased composition |
| - |
pestfind |
Finds PEST motifs as potential proteolytic cleavage sites |
| - |
polydot |
Displays all-against-all dotplots of a set of sequences |
| - |
printsextract |
Extract data from PRINTS |
| - |
pscan |
Scans proteins using PRINTS |
| - |
rebaseextract
redata |
Search and extract from REBASE. |
| - |
recoder |
Remove restriction sites but maintain the same translation |
| - |
seqmatchall |
All-against-all comparison of a set of sequences. |
| - |
showdb |
Shows info about currently available databases. |
| - |
showfeat |
Shows features of a sequence |
| - |
silent |
Silent mutation restriction enzyme scan |
| - |
sirna |
Finds siRNA duplexes in mRNA |
| - |
stssearch |
Searches a DNA database for matches with a set of STS primers |
| - |
supermatcher |
Finds a match of a large sequence against one or more sequences |
| - |
tfextract |
Extract data from TRANSFAC database. |
| gcghelp |
tfm |
Shows documentation for a program. |
| - |
tfscan |
Scans DNA sequences for transcription factors |
| - |
tmap |
Displays membrane spanning regions |
| - |
tranalign |
Align nucleic coding regions given the aligned proteins |
| - |
trimest
trimseq |
Trim bits off ends of sequences. Can be done interactively with GCG's seqed. |
| - |
twofeat |
Finds neighbouring pairs of features in sequences |
| - |
vectorstrip |
Strips out DNA between a pair of vector sequences |
| - |
wordcount |
Counts words of a specified size in a DNA sequence |
| - |
wordmatch |
Finds all exact matches of a given size between 2 sequences |
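
Several GCG utilities in the table (Detab, onecase, CompressText) have no EMBOSS counterpart because, as noted above, the same job can be done with ordinary text tools; the EMBOSS program noreturn falls into the same category. As a rough illustration of how small these tasks are, the sketch below reproduces them in Python. The function names are hypothetical and chosen only to echo the table entries; they are not part of EMBOSS or GCG, and the functions operate on plain text, so sequence header lines would need to be handled separately.

```python
import re


def detab(text: str, tabsize: int = 8) -> str:
    """Replace tabs with spaces, aligning to tab stops (cf. GCG Detab)."""
    return text.expandtabs(tabsize)


def onecase(text: str, upper: bool = True) -> str:
    """Convert text to a single case (cf. GCG onecase)."""
    return text.upper() if upper else text.lower()


def noreturn(text: str) -> str:
    """Remove carriage returns, e.g. turning CRLF line endings into LF."""
    return text.replace("\r", "")


def compress_text(text: str) -> str:
    """Collapse runs of spaces/tabs to one space and drop trailing
    whitespace on each line (cf. GCG CompressText)."""
    lines = text.split("\n")
    return "\n".join(re.sub(r"[ \t]+", " ", line).rstrip() for line in lines)


if __name__ == "__main__":
    raw = "acgt\tttga  \r\nggcc   aatt\r\n"
    cleaned = compress_text(detab(noreturn(raw), tabsize=4))
    print(onecase(cleaned))
```

Each function is a one-liner around the standard library, which is consistent with the table's remark that these tasks can be left to shell utilities such as tr, sed or awk.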