Introduction
This tutorial shows you how to create a Deep Zoom Collection that can be viewed and filtered in Microsoft Live Labs Pivot, using a ChIP-Seq example dataset. The same approach can be applied to any GFF3-based dataset that has corresponding GBrowse or UCSC views.
Prerequisites
- Pivot application for Windows Vista/7 (local download from CBRG site) or the PivotViewer plugin.
- BioPivot tools installed.
- A directory on a UNIX-based file system in which to store the collection of images and the CXML metadata file. Ideally the directory should be mounted and visible on a web server so that you can view the data in Pivot at each stage. For example, if you are using the Apache web server you can create the example below under a public_html directory and then copy and paste the URL into the Pivot browser.
- A GFF3 file. An example is included in the BioPivot tools download, in the data directory.
- A UCSC or GBrowse database that contains the data you want to view. In the example, our CBRG Human GBrowse database is used.
Creating a basic collection
Download the example GFF3 file and config file. This config file actually has more definitions than you need when you run the first analysis. However, these are ignored until they are needed. Other examples are available in the data directory when you install BioPivot. Please note the config file must be tab separated. Multiple spaces between name and type are not allowed.
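Because a space-separated entry is easy to miss by eye, a quick check before running the conversion can help. The sketch below is a hypothetical helper, not part of BioPivot. It assumes each facet definition is a single line of the form name<TAB>type, and that lines starting with # are comments; it flags lines whose fields appear to be separated by spaces instead of a tab. Other kinds of line in your config may need different handling.

```python
def find_space_separated_lines(config_text):
    """Return (line_number, line) pairs that contain no tab but do contain
    space-separated fields, which suggests spaces were used instead of a tab."""
    suspects = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if "\t" not in stripped and len(stripped.split()) > 1:
            suspects.append((number, line))
    return suspects


# Hypothetical config content: the second entry uses spaces and is flagged.
sample = "overlaps_CPG\tString\ndistance_from_exon1   Number\n"
for number, line in find_space_separated_lines(sample):
    print(f"line {number}: expected a tab between name and type: {line!r}")
```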
Run the following command in the same directory:
gff32pivot.pl example.gff3 -dzi -generateimage -conf types.cfg -browser gbrowse2 -cxml example.cxml
This will create a directory called img (if it doesn't already exist) and, using the URL specified in the types.cfg file, download images based on that URL and the GFF3 file coordinates that define each region of interest (ROI). The URL referenced should be based on:
- A UCSC session
- A bookmarked GBrowse session
You can use http, https and password protected URLs. The extra information you need to put in the URL is $chromosome, $start and $end where the ROI coordinates would usually go. Other examples of a UCSC config and GBrowse2 config are here.
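To make the placeholder substitution concrete, here is a minimal sketch of how $chromosome, $start and $end could be filled in from one GFF3 line. It illustrates the idea only and is not the code gff32pivot.pl uses; the URL and the feature line are invented for the example.

```python
from string import Template


def roi_url(url_template, gff3_line):
    """Fill $chromosome, $start and $end from one GFF3 feature line."""
    columns = gff3_line.rstrip("\n").split("\t")
    # GFF3 columns 1, 4 and 5 hold the sequence ID, start and end.
    return Template(url_template).safe_substitute(
        chromosome=columns[0], start=columns[3], end=columns[4]
    )


# Hypothetical GBrowse-style URL; substitute your own bookmarked session URL.
template = "http://example.org/gb2/gbrowse_img/human/?name=$chromosome:$start..$end"
feature = "chr1\tChIP\tpeak\t1000\t2000\t.\t+\t.\t"
print(roi_url(template, feature))
```

safe_substitute is used so that any other $ characters in a real session URL are left untouched.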
Once the images have been downloaded into the img directory, the -dzi flag creates an XML file for each image and a subdirectory containing a pyramid of image tiles at the different zoom levels. Finally, the example.cxml data file is created. This is the file that is loaded into the Pivot viewer; it contains the facet data derived from the GFF3 file.
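To give a feel for what the tile pyramid contains, the sketch below lists the image size at each zoom level, assuming the standard Deep Zoom layout in which the top level is the full-size image and each level below it halves the dimensions down to a single pixel. The image size is hypothetical, and the tool BioPivot calls may differ in details such as tile size.

```python
def deep_zoom_levels(width, height):
    """Return (level, width, height) for each level of a Deep Zoom pyramid."""
    # The top level is ceil(log2(longest side)); bit_length gives that exactly.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level, -1, -1):
        scale = 2 ** (max_level - level)
        # Ceiling division keeps every level at least one pixel in size.
        levels.append((level, -(-width // scale), -(-height // scale)))
    return levels


# Hypothetical 1000 x 300 pixel browser image.
for level, w, h in deep_zoom_levels(1000, 300):
    print(f"level {level}: {w} x {h}")
```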
When you run gff32pivot.pl for the first time, it is a good idea to specify the -test option, for example:
gff32pivot.pl example.gff3 -test 1 -generateimage -conf types.cfg -browser gbrowse2 -cxml example.cxml
This lets you check that your image has been pulled down correctly (look in the img directory to view it).
The example contains only 5 genome regions. With real data you may have several thousand genome regions to visualise, in which case the command may take several hours to run and you should use something like nohup so that your process is not terminated when you log off.
Adding further annotations
If your GFF3 file contains the standard 8 columns and the ninth column is empty (as in example.gff3), this doesn't really show the power of using Pivot. Adding name/value pairs to the ninth column that further describe your genome region can be very powerful. For example, you may want to filter for all the genome regions that contain CpG islands, in which case you can add "overlaps_CPG=TRUE;" or "overlaps_CPG=FALSE;" to the ninth column. Or you may want to be able to "filter for all genome regions that are 1000bp upstream from the nearest gene". For this you can add the calculated distance from the nearest gene to the region, e.g. "distance_from_exon1=950;". You can add just about any criterion to the ninth column and filter on it later.
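If you want to add your own name/value pairs by script, something along these lines would do it. This is a minimal sketch under the assumption that attributes are written as name=value; pairs, as in the examples above; the function name and the sample line are made up for illustration.

```python
def add_attribute(gff3_line, name, value):
    """Append name=value; to the ninth column of a GFF3 feature line."""
    columns = gff3_line.rstrip("\n").split("\t")
    if len(columns) == 8:
        columns.append("")  # tolerate a missing ninth column
    if len(columns) != 9:
        raise ValueError("expected 8 or 9 tab-separated columns")
    attributes = columns[8]
    if attributes == ".":
        attributes = ""  # "." is the GFF3 placeholder for an empty column
    if attributes and not attributes.endswith(";"):
        attributes += ";"
    columns[8] = f"{attributes}{name}={value};"
    return "\t".join(columns)


line = "chr1\tChIP\tpeak\t1000\t2000\t.\t+\t.\t"
line = add_attribute(line, "overlaps_CPG", "TRUE")
line = add_attribute(line, "distance_from_exon1", 950)
print(line)
```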
There are a few tools in BioPivot that make this easier to do. Run the following command:
annotate.pl example.gff3 hg18 > example.annotated.gff3
This uses the cisgenome (ref) tools refgene_getnearestgene and refgene_getlocationsummary to find the nearest gene to each genome region and whether the region falls in gene-specific elements such as introns and exons. Using the precalculated RefSeq/GO index, it also adds relevant GO terms, which can be very useful when filtering data.
If you look at the GFF3 file now you will see that various fields in the ninth column have been added.
Adding overlaps
We will now test to see if our set of intervals overlap CpG islands. This is done using the following script:
intersectappend.pl example.annotated.gff3 cpg.gtf -facet overlaps_CPG > example.annotated.overlap.gff3
This script is a wrapper around BEDTools (ref) that adds "overlaps_CPG=TRUE;" or "overlaps_CPG=FALSE;" to the ninth column. Any GTF, GFF or BED file can be used, for example publicly available features downloaded from the UCSC Table Browser or BioMart.
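For readers who want to see what the TRUE/FALSE flag means, here is a small pure-Python illustration of the overlap test. It is not how BEDTools or intersectappend.pl work internally (BEDTools is far more efficient on large files); it is a simple linear scan over invented coordinates, treating intervals as 1-based and inclusive as in GFF3.

```python
def overlaps_any(chromosome, start, end, features):
    """Return True if [start, end] shares at least one base with any feature
    on the same chromosome. Coordinates are 1-based and inclusive."""
    return any(
        chromosome == f_chrom and start <= f_end and f_start <= end
        for f_chrom, f_start, f_end in features
    )


# Hypothetical CpG island coordinates (chromosome, start, end).
cpg_islands = [("chr1", 1500, 1800), ("chr2", 100, 400)]
regions = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]

for chromosome, start, end in regions:
    flag = "TRUE" if overlaps_any(chromosome, start, end, cpg_islands) else "FALSE"
    print(f"{chromosome}:{start}-{end}\toverlaps_CPG={flag};")
```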
Updating the config file
Whenever a new name/value pair is added to the GFF3 file, an entry must be added to the config file stating the type of the data. When this is translated to the CXML file, it allows the Pivot viewer to use the correct graphical component. For example, a String type is selected using a checkbox and a Number type is filtered using a slider bar. There are also other types, including LINK, which allows Pivot to link to another site via a URL. Here is an example config file.
Make a copy of the types.cfg file (call it newtypes.cfg) and add a new tab-separated row:
overlaps_CPG String
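A quick way to spot attributes that still lack a config entry is to compare the names used in the ninth column with the names in the config file. The sketch below is a hypothetical helper, assuming name=value; pairs in the GFF3 file and name<TAB>type lines in the config; both sample inputs are invented.

```python
def attribute_names(gff3_text):
    """Collect every attribute name used in the ninth column of a GFF3 file."""
    names = set()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 comments/directives
        columns = line.split("\t")
        if len(columns) < 9:
            continue
        for pair in columns[8].split(";"):
            if "=" in pair:
                names.add(pair.split("=", 1)[0].strip())
    return names


def config_names(config_text):
    """Collect the first tab-separated field of each config line (assumed to be the name)."""
    return {line.split("\t")[0].strip() for line in config_text.splitlines() if "\t" in line}


gff3 = "chr1\tChIP\tpeak\t1000\t2000\t.\t+\t.\toverlaps_CPG=TRUE;distance_from_exon1=950;\n"
config = "distance_from_exon1\tNumber\n"
print(sorted(attribute_names(gff3) - config_names(config)))
```

Any name this prints is used in the GFF3 file but has no row in the config yet.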
Updating the cxml file
Once you have made all the changes, rerun the gff32pivot.pl script, this time leaving out the -generateimage and -dzi options. If you don't omit them, you will redownload and reindex all the images; you only need to include these two options if you add or remove data. So, in this case, run:
gff32pivot.pl example.annotated.overlap.gff3 -conf newtypes.cfg -cxml example.cxml
This will then regenerate the cxml containing the new information.
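One way to confirm that the new facet made it into the regenerated file is to list the facet categories it declares. The sketch below assumes the usual CXML layout, in which each facet is declared as a FacetCategory element with Name and Type attributes; the sample document is hand-written for illustration and is not real BioPivot output.

```python
import xml.etree.ElementTree as ET


def facet_categories(cxml_text):
    """Return {name: type} for every FacetCategory element in a CXML document."""
    root = ET.fromstring(cxml_text)
    categories = {}
    for element in root.iter():
        # Compare local names so the check works whatever namespace is declared.
        if element.tag.rsplit("}", 1)[-1] == "FacetCategory":
            categories[element.get("Name")] = element.get("Type")
    return categories


# Minimal hand-written CXML fragment for illustration only.
sample = """<Collection xmlns="http://schemas.microsoft.com/collection/metadata/2009">
  <FacetCategories>
    <FacetCategory Name="overlaps_CPG" Type="String" />
    <FacetCategory Name="distance_from_exon1" Type="Number" />
  </FacetCategories>
</Collection>"""
print(facet_categories(sample))
```

To run it on your own collection, read example.cxml in binary mode and pass the contents to facet_categories.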
View in Pivot
Go to the examples page to try out using Pivot within a web browser and using the standalone Pivot Windows client.
Setting up Pivot in Silverlight on Apache
The advantage of running Pivot in Silverlight is that it runs within a web browser on both Windows and Mac OS. There are various links describing how to run this using IIS, but it can also be run on an Apache server. To do this you will need to add the following lines to the /etc/mime.types file and restart the server:
application/xaml+xml xaml
application/x-silverlight-app xap
application/manifest manifest
application/x-ms-application application
application/x-ms-xbap xbap
application/octet-stream deploy
application/vnd.ms-xpsdocument xps
Then download the SilverlightApplication164.xap file to the web server.
A quick way to make a page that will embed the Silverlight viewer is to copy the example index.html file. Then edit this copy to add in the URL (as below)
<param name="initParams" value="collection=http://www.cbrg.ox.ac.uk/data/biopivot/example/small/example.cxml" />
and make sure the SilverlightApplication164.xap and Silverlight.js file URL paths are referenced in the page correctly.
Load the URL of the page into your web browser and (if the Silverlight control is installed) after a short time the cxml and tiles will download and you will see the Pivot viewer embedded in the web page.