Computational Biology Research Group University of Oxford
BioPivot
 

Introduction

This tutorial shows how to create a Deep Zoom Collection that can be viewed and filtered in Microsoft Live Labs Pivot, using a ChIP-Seq example dataset. The same approach can be applied to any GFF3-based data set that has corresponding GBrowse or UCSC views.

Prerequisites

  • Pivot application for Windows Vista/7 (local download from CBRG site) or the PivotViewer plugin.
  • BioPivot tools installed.
  • A directory on a UNIX-based file system in which to store the collection of images and the CXML metadata file. Ideally the directory should be mounted and visible on a web server so that you can view the data in Pivot at each stage. For example, if you are using the Apache web server you can create the example below under a public_html directory and then copy and paste the URL into the Pivot Browser.
  • A GFF3 file. An example is included in the BioPivot tools download, in the data directory.
  • UCSC or GBrowse database that contains the data you want to view. In the example, our CBRG Human GBrowse database is used.

Creating a basic collection

Download the example GFF3 file and config file. The config file contains more definitions than the first analysis needs; the extra ones are ignored until they are needed. Other examples are available in the data directory when you install BioPivot. Please note that the config file must be tab-separated: multiple spaces between name and type are not allowed.
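Because a config file that uses spaces instead of tabs is easy to produce by accident, a quick check before running the tools can save time. The sketch below is only an illustration: the function name check_tab_separated is hypothetical, and it assumes that blank lines and lines starting with # can be skipped, which this page does not confirm.

```shell
# check_tab_separated FILE
# Reports every line that contains no tab character at all.
# Assumption: blank lines and "#" comment lines are harmless and can be skipped.
check_tab_separated() {
    awk '/^[ \t]*(#|$)/ { next }
         index($0, "\t") == 0 { printf "line %d has no tab: %s\n", NR, $0; bad = 1 }
         END { exit bad }' "$1"
}

# Example use:
# check_tab_separated types.cfg && echo "types.cfg looks tab-separated"
```

The function exits with a non-zero status when it finds a suspect line, so it can be chained in front of the gff32pivot.pl command with &&.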

Run the following command in the same directory:

gff32pivot.pl example.gff3 -dzi -generateimage -conf types.cfg -browser gbrowse2 -cxml example.cxml

This will create a directory called img (if it doesn't already exist) and, using the URL specified in the types.cfg file, download images for the coordinates in the GFF3 file that define each region of interest (ROI). The URL should be based on one of:

  • A UCSC session
  • A bookmarked GBrowse session

You can use http, https and password-protected URLs. In the URL, put the placeholders $chromosome, $start and $end where the ROI coordinates would usually go. Other examples of a UCSC config and GBrowse2 config are here.
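To make the placeholder idea concrete, the sketch below expands a URL template for a single region. It only illustrates the substitution; how gff32pivot.pl performs it internally is not described here. The function name expand_roi_url and the example.org URL are hypothetical.

```shell
# expand_roi_url TEMPLATE CHROMOSOME START END
# Replaces the literal placeholders $chromosome, $start and $end in TEMPLATE.
expand_roi_url() {
    printf '%s\n' "$1" | sed \
        -e "s|\\\$chromosome|$2|g" \
        -e "s|\\\$start|$3|g" \
        -e "s|\\\$end|$4|g"
}

# The template is single-quoted so the shell leaves the placeholders alone.
expand_roi_url 'http://example.org/browser?region=$chromosome:$start..$end' chr1 100 200
# prints http://example.org/browser?region=chr1:100..200
```

Pasting an expanded URL into a web browser is a simple way to confirm that the bookmarked session returns the view you expect before downloading many images.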

Once the images have been downloaded into the image directory, the -dzi flag creates an XML file for each image and a subdirectory containing a pyramid of image tiles at the different zoom levels. Finally, the example.cxml data file is created. This is the file that is loaded into the Pivot viewer; it contains the facet data derived from the GFF file.
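As a rough guide to how deep each pyramid is, the Deep Zoom convention halves the image size at every level until it reaches a single pixel. The sketch below counts the levels for a given image size under that convention; it is not taken from the BioPivot code, and the function name dzi_level_count is hypothetical.

```shell
# dzi_level_count WIDTH HEIGHT
# Number of pyramid levels under the usual Deep Zoom convention.
dzi_level_count() {
    n=$(( $1 > $2 ? $1 : $2 ))    # the longer side decides the depth
    levels=1                      # the smallest level is a single pixel
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))      # each level halves the size, rounding up
        levels=$(( levels + 1 ))
    done
    echo "$levels"
}

dzi_level_count 1024 768
# prints 11
```

Larger browser images therefore add only a few extra levels, but the number of tiles at the deepest levels grows with the image area.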

When you run gff32pivot.pl for the first time, it is a good idea to specify the -test option, for example:

gff32pivot.pl example.gff3 -test 1 -generateimage -conf types.cfg -browser gbrowse2 -cxml example.cxml

Then check that the image has been downloaded correctly by viewing it in the img directory.

The example contains only 5 genome regions. With real data you may have several thousand genome regions to visualise, in which case the command may take several hours to run, and you should use something like nohup so that your process is not terminated when you log off.
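One possible way to do this is sketched below. The helper run_detached and the log file name gff32pivot.log are hypothetical; the gff32pivot.pl options are the ones used earlier in this tutorial.

```shell
# run_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND under nohup in the background and sends all output to LOGFILE.
run_detached() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "started PID $! (output in $log)"
}

# Example use:
# run_detached gff32pivot.log gff32pivot.pl example.gff3 -dzi -generateimage \
#     -conf types.cfg -browser gbrowse2 -cxml example.cxml
```

You can then log off and later inspect the log file to see how far the run has progressed.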

Adding further annotations

If your GFF3 file contains the eight standard columns and the ninth column is empty (as in example.gff3), this doesn't really show the power of using Pivot. Adding name/value pairs to the ninth column that further describe each genome region can be very powerful. For example, you may want to filter for all the genome regions that contain CpG islands, in which case you can add "overlaps_CPG=TRUE;" or "overlaps_CPG=FALSE;" to the ninth column. Or you may want to be able to "filter for all genome regions that are 1000bp upstream from the nearest gene". For this you can add the calculated distance from the nearest gene to the region, e.g. "distance_from_exon1=950;". You can add just about any criterion to the ninth column and filter on it later.
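As a minimal sketch of adding a pair by hand, the function below appends the length of each region to the ninth column. The attribute name region_length and the function name add_region_length are hypothetical and are not part of BioPivot.

```shell
# add_region_length FILE
# Appends "region_length=<end - start + 1>;" to column 9 of every feature line.
add_region_length() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 8 { print; next }   # pass comments and short lines through
         {
             attrs = (NF >= 9 && $9 != ".") ? $9 : ""
             if (attrs != "" && attrs !~ /;$/) attrs = attrs ";"
             $9 = attrs "region_length=" ($5 - $4 + 1) ";"
             print
         }' "$1"
}

# Example use:
# add_region_length example.gff3 > example.length.gff3
```

A numeric attribute like this would also need a matching entry of type Number in the config file, as described under "Updating the config file".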

There are a few tools in BioPivot that make this easier to do. Run the following command:

annotate.pl example.gff3 hg18 > example.annotated.gff3

This uses the cisgenome (ref) tools refgene_getnearestgene and refgene_getlocationsummary to find the nearest gene to each genome region and to determine whether the region falls in gene-specific elements such as introns and exons. Using the precalculated RefSeq/GO index, it also adds relevant GO terms, which can be very useful when filtering data.

If you look at the GFF3 file now you will see that various fields in the ninth column have been added.
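A quick way to see which names are now present is to list the distinct attribute names in the ninth column. This is a small sketch rather than a BioPivot tool; the function name list_attribute_names is hypothetical.

```shell
# list_attribute_names FILE
# Prints each distinct name found in the name=value pairs of column 9.
list_attribute_names() {
    grep -v '^#' "$1" | cut -f9 | tr ';' '\n' |
        sed -n 's/^ *\([^=]*\)=.*/\1/p' | sort -u
}

# Example use:
# list_attribute_names example.annotated.gff3
```

Each name in the output is a candidate facet in Pivot.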

Adding overlaps

We will now test whether each interval in our set overlaps a CpG island. This is done using the following script:

intersectappend.pl example.annotated.gff3 cpg.gtf -facet overlap_CPG > example.annotated.overlap.gff3

This script is a wrapper around BEDTools (ref) that adds "overlaps_CPG=TRUE;" or "overlaps_CPG=FALSE;" to the ninth column. Any GTF, GFF or BED file can be used, such as publicly available features downloaded from the UCSC Table Browser or BioMart.
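To illustrate what the TRUE/FALSE flag means, the sketch below performs the same kind of test with a naive awk scan instead of BEDTools. It is not how intersectappend.pl is implemented, it is far slower on large files, and it assumes that both files are non-empty and use GFF-style columns (chromosome in column 1, start in column 4, end in column 5). The function name flag_overlaps is hypothetical.

```shell
# flag_overlaps REGIONS FEATURES FACET
# Appends FACET=TRUE; or FACET=FALSE; to column 9 of each line in REGIONS,
# depending on whether it overlaps any interval in FEATURES.
flag_overlaps() {
    awk -v facet="$3" 'BEGIN { FS = OFS = "\t" }
        /^#/ { if (NR != FNR) print; next }   # keep comments from REGIONS only
        NR == FNR { n++; chr[n] = $1; s[n] = $4 + 0; e[n] = $5 + 0; next }
        {
            hit = "FALSE"
            for (i = 1; i <= n; i++)
                if (chr[i] == $1 && s[i] <= $5 + 0 && $4 + 0 <= e[i]) { hit = "TRUE"; break }
            attrs = (NF >= 9 && $9 != ".") ? $9 : ""
            if (attrs != "" && attrs !~ /;$/) attrs = attrs ";"
            $9 = attrs facet "=" hit ";"
            print
        }' "$2" "$1"
}

# Example use:
# flag_overlaps example.annotated.gff3 cpg.gtf overlaps_CPG
```

Two intervals on the same chromosome overlap when each one starts no later than the other ends, which is the condition tested inside the loop.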

Updating the config file

Whenever a new name/value pair is added to the GFF3 file, an entry must be added to the config file stating its data type. When this is translated to the CXML file, it allows the Pivot viewer to use the correct graphical component. For example, a String type is selected using a checkbox and a Number type is filtered using a slider bar. There are also other types, including LINK, which allows Pivot to link to another site via a URL. Here is an example config file.

Make a copy of the types.cfg file (call it newtypes.cfg) and add a new row, with the name and type separated by a tab:

overlaps_CPG String
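To catch forgotten entries, you can compare the attribute names in the GFF3 file with the first column of the config file. The sketch below assumes that each config entry has the name in the first tab-separated field; the function name missing_config_entries is hypothetical.

```shell
# missing_config_entries GFF3_FILE CONFIG_FILE
# Prints every attribute name used in column 9 of GFF3_FILE that has
# no line in CONFIG_FILE whose first tab-separated field matches it.
missing_config_entries() {
    grep -v '^#' "$1" | cut -f9 | tr ';' '\n' |
        sed -n 's/^ *\([^=]*\)=.*/\1/p' | sort -u |
        while IFS= read -r name; do
            cut -f1 "$2" | grep -qxF -- "$name" || echo "$name"
        done
}

# Example use:
# missing_config_entries example.annotated.overlap.gff3 newtypes.cfg
```

No output means that every attribute name has a corresponding config entry.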

Updating the cxml file

Once you have made all the changes, rerun the gff32pivot.pl script, this time leaving out the -generateimage and -dzi options. If you don't omit them, all the images will be downloaded and indexed again. You only need to include these two options when you add or remove data. So, in this case, run:

gff32pivot.pl example.annotated.overlap.gff3 -conf newtypes.cfg -cxml example.cxml

This will then regenerate the cxml containing the new information.

View in Pivot

Go to the examples page to try out using Pivot within a web browser and using the standalone Pivot Windows client.

Setting up Pivot in Silverlight on Apache

The advantage of running Pivot in Silverlight is that it runs within a web browser on both Windows and Mac OS. Most guides describe serving it from IIS, but it can also be served from Apache. To do this, add the following entries to the /etc/mime.types file and restart the server.

application/xaml+xml            xaml
application/xaml+xml            xap
application/manifest            manifest
application/x-ms-application    application
application/x-ms-xbap           xbap
application/octet-stream        deploy
application/vnd.ms-xpsdocument  xps
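After editing the file, a quick check that each extension is registered can rule out typing mistakes. The sketch below is only an illustration and assumes the usual mime.types layout of a type followed by one or more extensions; the function name missing_mime_exts is hypothetical.

```shell
# missing_mime_exts MIME_TYPES_FILE EXT [EXT...]
# Prints each extension that is not listed on any non-comment line.
missing_mime_exts() {
    file=$1
    shift
    for ext in "$@"; do
        awk -v ext="$ext" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == ext) found = 1 }
            END { exit found ? 0 : 1 }' "$file" || echo "$ext"
    done
}

# Example use:
# missing_mime_exts /etc/mime.types xaml xap manifest application xbap deploy xps
```

No output means that all of the listed extensions were found.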

Then download the file SilverlightApplication164.xap to the web server.

A quick way to make a page that will embed the Silverlight viewer is to copy the example index.html file. Then edit this copy to add in the URL (as below)

<param name="initParams" value="collection=http://www.cbrg.ox.ac.uk/data/biopivot/example/small/example.cxml" />

and make sure the SilverlightApplication164.xap and Silverlight.js file URL paths are referenced in the page correctly.

Load the URL of the page into your web browser and (if the Silverlight control is installed) after a short time the cxml and tiles will download and you will see the Pivot viewer embedded in the web page.




This file last modified Wednesday November 10, 2010