Computational Biology Research Group University of Oxford
EMBOSS suite
 

EMBOSS

EMBOSS stands for "European Molecular Biology Open Software Suite". It is a free, open-source sequence analysis package developed specifically for the needs of the molecular biology user community.

The EMBOSS suite can be run from the Unix command line or online.

EMBOSS contains more than 100 programs. Some of the areas covered are:

  • Sequence alignment
  • Rapid database searching with sequence patterns
  • Protein motif identification, including domain analysis
  • Nucleotide sequence pattern analysis, for example to identify CpG islands or repeats
  • Codon usage analysis for small genomes
  • Rapid identification of sequence patterns in large scale sequence sets
  • Presentation tools for publication
  • And much more...


Run EMBOSS programs online

Go to EMBOSS Explorer.


Comparison between the programs in GCG and EMBOSS

There is a large overlap in functionality between EMBOSS and the commercial GCG (Wisconsin Package). The table below maps the programs of the two packages to each other; a "-" in a column means that package has no direct equivalent. (Full details of all the programs that make up the GCG package can be found here.)

Much of the content of this section was originally conceived at Helix Systems, CIT, NIH (http://helix.nih.gov/) and has been reproduced here with kind permission.



GCG program | EMBOSS program | Description/Comments
Assemble | merger | Constructs new sequences from pieces of existing sequences; merger accepts only 2 sequences, whereas Assemble accepts several.
BackTranslate | backtranseq | Back-translates protein -> nucleotide sequence.
BestFit | water, matcher | BestFit uses the Smith-Waterman algorithm to find the best local alignment between 2 sequences. water uses Smith-Waterman; matcher uses Pearson's lalign algorithm.
BLAST, Psiblast | dbiblast | NCBI homology search between query and database.
Breakup | splitter | Splits a sequence into (overlapping) smaller sequences.
Chopup | - | Helps to convert a non-GCG sequence format. Not needed in EMBOSS because it reads most sequence formats without conversion.
CodonFrequency | chips, compseq, cusp | CodonFrequency tabulates codon usage. compseq counts the composition of dimers/trimers in a sequence. chips calculates codon usage statistics. cusp creates a codon usage table.
CodonPreference | syco, wobble | Recognize protein coding sequences.
CoilScan | pepcoil | Predicts coiled-coil regions.
Compare + Dotplot | dottup + dotmatcher | 2-sequence comparison.
Composition | compseq, pepstats | Sequence composition.
CompressText | - | Removes extra whitespace in text files. Can be done via Unix shell script.
CompTable | - | Creates a scoring matrix.
Consensus | prophecy | Creates a consensus sequence or matrices/profiles from multiple alignments.
Correspond | codcmp | Codon usage table comparison.
Corrupt | msbar | Randomly mutates a sequence.
DataSet | dbiflat, dbiblast, dbigcg | Creates a searchable sequence database. GCG's DataSet requires sequences in GCG format, whereas dbiflat, dbiblast and dbigcg will take most formats between them.
Detab | - | Replaces tabs with spaces in sequence files. Can be performed by Unix shell command.
Distances | - | Calculates pairwise evolutionary distances between aligned sequences. The Phylip package can do this.
Diverge | - | Estimates pairwise substitutions per site between 2 or more coding sequences. The Phylip package can do this.
DotPlot | dottup, dotmatcher |
ExtractPeptide | transeq | ExtractPeptide takes the output of Map and can write one or more of the reading-frame translations. transeq translates one or more of the frames or specific regions directly from an input nucleotide sequence.
FastA, FastX, Tfasta, TfastX | - | Pearson's homology-search programs, available as standalone programs on molbiol.
Fetch | seqret, seqretsplit | Pull one or more sequences out of the databases. seqret/seqretsplit can save output in various sequence formats.
Figure | - | Generates plots from other GCG programs.
FindPatterns | fuzznuc, fuzzpro | Searches for patterns in a sequence or database.
FingerPrint | - | Finds the products of T1 ribonuclease digestion.
FitConsensus | - | Use after Consensus to find the best fits.
FrameAlign | - | Finds the best local alignment, including frame shifts, between a protein and a nucleotide sequence.
Frames | plotorf, showorf | Show open reading frames. plotorf does this graphically.
Framesearch | - | Homology searches including frameshifts between protein and nucleotide sequences.
FromEMBL, FromFasta, FromGenbank, FromIG, FromPIR, FromStaden, Fromtrace | - | Convert from various formats to GCG sequence format. Unnecessary in EMBOSS because it can accept most sequence formats, but seqret can convert between formats if desired.
Gap | needle, stretcher | Needleman-Wunsch algorithm to compare 2 sequences. stretcher uses the Myers-Miller algorithm, which is more memory-efficient.
Gapshow | plotcon | Graphical representation of the similarity of 2 sequences.
GCGtoBlast | - | Makes a BLAST database. Use NCBI's 'formatdb' instead.
GelAssemble, GelDisassemble, GelEnter, GelMerge, GelStart, GelView | megamerger, merger, union | Parts of GCG's gel assembly suite.
Getseq | seqret | Type in a new sequence.
Growtree | - | Creates a phylogenetic tree. Can use Phylip or Clustal instead.
HelicalWheel | pepwheel | Plots a peptide sequence as a helical wheel to help recognize amphiphilic regions.
HmmerAlign, HmmerBuild, HmmerCalibrate, HmmerEmit, HmmerFetch, HmmerIndex, HmmerPfam, HmmerSearch | - | The HMMER package is available on molbiol.
HTHScan | helixturnhelix | Finds HTH motifs in protein sequences.
Isoelectric | iep | Calculates the isoelectric point of a protein.
Lineup | - | Edits multiple sequence alignments.
Listfile | - | For printing. Can use the Unix psprint command instead.
Lookup | - | Versatile program for finding sequences in a database. "whichdb" in EMBOSS can search for accession numbers, but Lookup is much more sophisticated. Can use the NCBI tool at http://www.ncbi.nlm.nih.gov/
Map, Mapplot, Mapsort | restrict, remap, restover | Find restriction enzyme cleavage sites.
MeltTemp | dan | Computes the melting temperature of oligos.
MEME | - | Finds conserved motifs in a group of unaligned sequences.
MFold | - | Predicts nucleotide secondary structure. GCG's version is an old version of Zuker's MFOLD.
Moment | pepnet, octanol | Makes a contour plot of the helical hydrophobic moment of a peptide sequence.
Motifs | patmatmotifs | Finds common Prosite motifs in a sequence. Note that not all Prosite motifs will be found due to a bug in the GCG and EMBOSS programs. Use InterProScan instead (http://www.ebi.ac.uk/InterProScan/).
Meme + Motifsearch | prophecy + profit | Search a sequence or database with a matrix or profile.
Names | infoseq | Provides some information about sequence specifications.
NetBlast, Netfetch | - | Remote access to NCBI's BLAST. Use standalone BLAST on molbiol instead.
NoOverlap | diffseq | Finds differences between 2 sequences. NoOverlap can work with a group of sequences.
OldDistances | - | Makes a table of the pairwise similarities within a group of sequences.
onecase | - | Converts a sequence into lower or upper case. Can be performed by Unix shell command.
Overlap | - | Compares 2 sets of sequences using the Wilbur-Lipman algorithm.
Paupdisplay, Paupsearch | - | PAUP phylogenetic analysis.
Pepdata | getorf, sixpack | Translates in all 6 reading frames. sixpack displays the DNA sequence with 6-frame translations and ORFs.
Pepplot | pepinfo | Pepplot plots protein secondary structure and hydrophobicity. pepinfo plots hydrophobicity, and garnier does protein secondary structure prediction.
Peptidemap | digest | Enzyme/reagent cleavage map of a protein.
Peptidesort | - | Sorts fragments from an enzyme/reagent cleavage of a protein according to position, mol. wt., and HPLC retention.
Peptidestructure, Plotstructure | garnier | Secondary structure prediction.
Pileup | (emma) | Multiple sequence alignment. "emma" is an interface to the clustalw alignment program. Can also use the standalone Clustal on molbiol, or ClustalW at the EBI.
PlasmidMap | cirdna, lindna | Plot DNA constructs.
PlotFold | - | Plots MFold output.
PlotSimilarity | plotcon | Graphical representation of the similarity along a set of aligned sequences.
Pretty, Prettybox | cons, prettyplot, showalign | Calculate a consensus sequence from a multiple sequence alignment and display it prettily.
Prime | eprimer3 | Selects oligonucleotide primers.
Profilegap, Profilemake | prophecy, prophet | Create matrices/profiles from multiple alignments. Gapped alignment for profiles and sequences.
PrimePair | primersearch | Evaluates individual primers to determine their compatibility for use as PCR primer pairs.
Profilescan | patmatdb | Searches sequences or a database for protein motifs. Profilescan uses the Gribskov method.
Profilesearch | profit | Scans a sequence or database with a matrix or profile.
Profilesegments | - | Alignments for the results of Profilesearch.
Publish | seqret, showseq | Make publication-quality displays of sequences.
Reformat | seqret | GCG requires input sequences to be in GCG format, hence other formats need to be converted with 'reformat'. EMBOSS programs accept most sequence formats, but 'seqret' can be used to convert between formats if desired.
Repeat | equicktandem, etandem, einverted, palindrome | Finds tandem repeats in sequences. The equivalent group of EMBOSS programs will also look for inverted or palindromic repeats.
Replace | biosed, degapseq | Replaces characters in a text file. degapseq is specific to gap characters. Can be performed with Unix shell utilities like sed, awk or tr.
Reverse | revseq | Reverse/complement a sequence.
Sample | extractseq | Extract regions from a sequence.
Seg | maskseq | Masks off low-complexity regions from a sequence.
Seqed | biosed, cutseq, degapseq, descseq, entret, extractfeat, extractseq, listor, maskfeat, maskseq, newseq, noreturn, notseq, nthseq, pasteseq, revseq, seqret, seqretsplit, skipseq, splitter, trimest, trimseq, union, vectorstrip, yank | Sequence editor. EMBOSS has many tools for specific editing tasks. Or use a text editor (but not a word processor).
SeqLab | - | X-windows interface to GCG.
Setkeys | - | Redefines keyboard keys, mainly used for GCG's gel assembly programs.
Shiftover | - | Moves text by column. Use the Unix nedit editor instead.
Shuffle | shuffleseq | Shuffles a sequence.
Simplify | - | Reduces the number of symbols in a sequence.
Spew | - | Sends a sequence from a remote computer to your desktop. Old-fashioned way of file transfer, rarely used now.
SPScan | sigcleave | Predicts signal peptides in protein sequences.
Ssearch | - | Part of Pearson's Fasta package, available as a standalone program on molbiol.
StatPlot | - | Plotting program. Rarely used.
StemLoop | palindrome, etandem | Finds inverted repeats.
Stringsearch | textsearch | Finds text phrases in a sequence or database. Use NCBI Entrez instead.
Terminator | - | Searches for prokaryotic factor-independent RNA polymerase terminators according to the method of Brendel and Trifonov.
Testcode | wobble | Plots 3rd-position variability as an indicator of potential coding regions.
ToFastA, ToIG, ToPIR, ToStaden | seqret | EMBOSS accepts most sequence formats, therefore format conversion is rarely required.
Translate | transeq | Translates nucleotide -> protein sequences.
Transmem | - | Predicts transmembrane helices.
Window + Statplot | freak | Residue/base frequency table or plot.
Wordsearch/Segments | - | Homology search using the Wilbur/Lipman algorithm. Segments displays the result.
Xnu | - | Masks tandem repeats for a future BLAST search. Available as a standalone program on molbiol.
- | abiview | Reads an ABI file and displays the trace.
- | antigenic | Finds antigenic sites in proteins.
- | banana | Bending and curvature plot in B-DNA.
- | btwisted | Calculates the twisting in a B-DNA sequence.
- | cai | CAI codon adaptation index, to measure synonymous codon usage bias.
- | chaos | Creates a chaos game representation plot for a sequence.
- | charge | Protein charge plot.
- | checktrans | Reports STOP codons and ORF statistics of a protein.
- | coderet | Extracts CDS, mRNA and translations from feature tables.
- | cpgplot, cpgreport, newcpgreport, newcpgseek | Plot and report CpG-rich regions.
Seqed | cutseq | Removes a specified section from a sequence. Seqed is interactive, cutseq is command-line.
Seqed | descseq | Alters the name/description of a sequence.
FindPatterns | dreg | Regular expression search of a sequence. FindPatterns is an approximate equivalent.
- | emowse | Protein identification by mass spectrometry.
- | est2genome | Aligns EST and genomic DNA sequences.
- | extractfeat | Extracts features from a sequence.
- | findkm | Finds Km and Vmax for an enzyme reaction by a Hanes/Woolf plot.
- | fuzztran | Protein pattern search after translation.
- | geecee | Calculates the fractional GC content of nucleic acid sequences.
- | isochore | Plots isochores in large DNA sequences.
- | listor | Writes a list file of the logical OR of two sets of sequences.
- | marscan | Finds MAR/SAR sites in nucleic sequences.
- | maskfeat | Masks off features of a sequence.
- | mwcontam | Shows mol. wts. that match across a set of files.
- | mwfilter | Filters noisy mol. wts. from mass spec output.
- | noreturn | Removes carriage returns from ASCII files. Can be performed by Unix utilities like 'tr'.
Reformat | nthseq | Pulls one sequence out of a multiple set. Reformat will pull a sequence out of an MSF or RSF file.
- | oddcomp | Finds protein sequence regions with a biased composition.
- | pestfind | Finds PEST motifs as potential proteolytic cleavage sites.
- | polydot | Displays all-against-all dotplots of a set of sequences.
- | printsextract | Extracts data from PRINTS.
- | pscan | Scans proteins using PRINTS.
- | rebaseextract, redata | Search and extract from REBASE.
- | recoder | Removes restriction sites but maintains the same translation.
- | seqmatchall | All-against-all comparison of a set of sequences.
- | showdb | Shows information about currently available databases.
- | showfeat | Shows features of a sequence.
- | silent | Silent mutation restriction enzyme scan.
- | sirna | Finds siRNA duplexes in mRNA.
- | stssearch | Searches a DNA database for matches with a set of STS primers.
- | supermatcher | Finds a match of a large sequence against one or more sequences.
- | tfextract | Extracts data from the TRANSFAC database.
gcghelp | tfm | Shows documentation for a program.
- | tfscan | Scans DNA sequences for transcription factors.
- | tmap | Displays membrane-spanning regions.
- | tranalign | Aligns nucleic coding regions given the aligned proteins.
- | trimest, trimseq | Trim bits off the ends of sequences. Can be done interactively with GCG's Seqed.
- | twofeat | Finds neighbouring pairs of features in sequences.
- | vectorstrip | Strips out DNA between a pair of vector sequences.
- | wordcount | Counts words of a specified size in a DNA sequence.
- | wordmatch | Finds all exact matches of a given size between 2 sequences.
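
For anyone migrating scripts from GCG, the mapping above can be held in a small lookup structure. The following is only a sketch in Python: it copies a handful of rows from the table, and the names `GCG_TO_EMBOSS` and `emboss_equivalents` are made up for this illustration and are not part of EMBOSS or GCG.

```python
# Illustrative excerpt of the GCG -> EMBOSS mapping table above.
# Only a few rows are copied; an empty list stands for "-" in the
# EMBOSS column (no direct equivalent listed).
GCG_TO_EMBOSS = {
    "BackTranslate": ["backtranseq"],
    "BestFit": ["water", "matcher"],
    "Gap": ["needle", "stretcher"],
    "Reformat": ["seqret"],
    "Reverse": ["revseq"],
    "Translate": ["transeq"],
    "Lineup": [],
}


def emboss_equivalents(gcg_name):
    """Return the EMBOSS programs listed for a GCG program name.

    The lookup ignores case. None means the name is not in this excerpt,
    which is different from an empty list (listed, but no equivalent).
    """
    lookup = {name.lower(): progs for name, progs in GCG_TO_EMBOSS.items()}
    return lookup.get(gcg_name.lower())


if __name__ == "__main__":
    for name in ("bestfit", "Lineup", "NoSuchProgram"):
        print(name, "->", emboss_equivalents(name))
```

Keeping "not in the excerpt" (None) separate from "no EMBOSS equivalent" (empty list) avoids treating a missing row as a confirmed gap.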

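Two of the simpler operations in the table, reverse/complement (revseq) and fractional GC content (geecee), are easy to picture in a few lines of Python. This is a rough sketch of the idea only, not how the EMBOSS programs are implemented: it handles plain A/C/G/T and leaves other characters, such as ambiguity codes, uncomplemented. The function names are chosen for this example.

```python
# Complement table for unambiguous DNA bases only (an assumption of this
# sketch); any other character is passed through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C characters in the sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(gc_fraction("ATGC"))
```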
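
The BestFit/water row refers to Smith-Waterman local alignment. As a reminder of what that algorithm computes, here is a minimal Python sketch that returns only the best local alignment score. It assumes a deliberately simple scoring scheme (fixed match/mismatch scores and a linear gap penalty, all chosen for illustration); the real programs use substitution matrices and separate gap-opening and gap-extension penalties, so their scores will differ.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Classic dynamic programming: each cell holds the best score of a local
    alignment ending at that pair of positions, floored at zero so that an
    alignment can start anywhere. Only two rows are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared stretch "ACG" gives three matches at 2 points each.
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6
```

The zero floor is what makes the alignment local; dropping it, and initialising the first row and column with gap penalties, gives the global Needleman-Wunsch comparison mentioned in the Gap/needle row.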





This file last modified Friday September 10, 2010