Genome Browser User's Guide
Last updated on 22 Nov. 2005.
Content by the (vertebrate)
UCSC Genome Browser staff.
Edited for Archaeal Browser users by Todd Lowe.
What does the Genome Browser do?
As genome sequences are completed, the issue of effective sequence display becomes critical: it is not
helpful to have several million letters of genomic DNA shown as plain text! As an
alternative, the UCSC Genome Browser provides a rapid and reliable display of any
requested portion of genomes at any scale, together with dozens of
aligned annotation tracks (known genes, predicted genes, RNA genes, promoters, functional array data,
gene homologies to other species, and more). Some of
the annotation tracks are computed at UCSC from publicly available
sequence data. Others are provided by collaborators and contributors. Users can
also add their own custom tracks to the browser for educational or research
purposes.
The Genome Browser stacks annotation tracks beneath genome
coordinate positions, allowing rapid visual correlation of different types of
information. The Genome Browser itself does not draw conclusions; rather, it
collates all relevant information in one location, leaving the exploration and
interpretation to the user.
The Genome Browser supports text- and sequence-based searches that provide quick, precise
access to any region of specific interest. Secondary links from individual
entries within annotation tracks lead to sequence details and supplementary off-site
databases. To control information overload, tracks need not be displayed in full. Tracks
can be hidden, collapsed into a condensed or single-line display, or filtered according
to the user's criteria. Zooming and scrolling controls help to narrow or broaden the
displayed chromosomal range to focus on the exact region of interest.
Clicking on an individual item within a track opens a details page containing a
summary of properties and links to off-site repositories such as
PubMed, NCBI, KEGG, and InterPro. The page provides item-specific information on
position, strand, data source, and encoded protein, genomic sequence
and alignment, as appropriate to the nature of the track.
A blue navigation bar at the top of the browser provides links to several other tools
and data sources. For instance, the DNA link
enables the user to view the raw genomic DNA sequence for the coordinate range
displayed in the browser window. Track features can be highlighted within this DNA
using a range of text formatting options. Other links tie the Genome Browser to the BLAT
alignment tool and provide access to the underlying relational database via the
Table Browser.
The browser data represents an immense collaborative effort
involving thousands of people from the international research community.
The Archaeal UCSC Bioinformatics Group itself does no sequencing. Although it creates
the majority of the annotation tracks in-house, the annotations are based on
publicly available data contributed by other labs and research groups. The majority of the sequence data, annotation tracks, and even
software are in the public domain and are available for anyone
to download.
In addition to the Genome Browser, the UCSC Genome Bioinformatics group provides
several other tools for viewing and interpreting genome data:
Getting Started: Genome Browser gateways
The UCSC Genome Bioinformatics home page
provides access to Genome Browsers on several different genome assemblies.
To get started, click the Browser link on the blue sidebar. This will take
you to a Gateway page where you can select which genome to display.
Opening the Genome Browser at a specific position
To get oriented in using the Genome Browser, try viewing a gene or region of
the genome with which you are already familiar, or use the default position.
To open the Genome Browser window:
1. Select the clade, genome and assembly that you wish to display from
the corresponding pull-down menus. To access older assembly versions
that are no longer available from the menu, look in the Genome
Browser archives.
2. Specify the genome location you'd like the Genome Browser to open to.
To select a location, enter a valid position query in the
position or search term text box at the top of the Gateway
page or accept the default position already
displayed. The search supports several different types of queries: gene
symbols, mRNA or EST accession numbers, chromosome
bands, descriptive terms likely to occur in GenBank text, or specific chromosomal
ranges. To display a region encompassed by two features on the same
chromosome, use a semi-colon, e.g. CRYBB3;CRYBB1. The Gateway page shows examples of some of the search requests supported by
the Genome Browser.
3. Click the submit button to open up the Genome Browser window to the
requested location. In cases where a specific term (accession, gene name,
etc.) was queried, the item will be highlighted in the display.
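The query types listed in step 2 can be told apart with simple pattern matching. The sketch below is a hypothetical illustration, not the browser's actual search code: it assumes that a chromosomal range is written as `chrom:start-end` (commas allowed in the numbers), treats a semicolon-separated pair as a request for the region spanning two features, and treats anything else as a free-text term. The `span_of` helper and its `locations` lookup table are also assumptions made for the example.

```python
import re
from typing import Dict, Tuple

# Assumed range syntax for this sketch: "chrom:start-end", commas allowed.
RANGE_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")

Region = Tuple[str, int, int]


def classify_query(query: str) -> tuple:
    """Return ("range", region), ("pair", (first, second)) or ("term", text)."""
    q = query.strip()
    m = RANGE_RE.match(q)
    if m:
        start = int(m.group("start").replace(",", ""))
        end = int(m.group("end").replace(",", ""))
        return ("range", (m.group("chrom"), start, end))
    if ";" in q:
        first, second = (part.strip() for part in q.split(";", 1))
        return ("pair", (first, second))
    return ("term", q)


def span_of(first: str, second: str, locations: Dict[str, Region]) -> Region:
    """Smallest region that covers two features on the same chromosome."""
    chrom_a, start_a, end_a = locations[first]
    chrom_b, start_b, end_b = locations[second]
    if chrom_a != chrom_b:
        raise ValueError("both features must lie on the same chromosome")
    return (chrom_a, min(start_a, start_b), max(end_a, end_b))
```

For example, `classify_query("chr:1,000-2,500")` returns `("range", ("chr", 1000, 2500))`, while `classify_query("CRYBB3;CRYBB1")` returns a `"pair"` whose span can then be computed with `span_of` once both features have been located.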
Occasionally the Gateway page returns a list of several matches in response
to a search, rather than immediately displaying the Genome Browser
window. When this occurs, click on the item in which you're interested and
the Genome Browser will open to that location.
The search mechanism is not a site-wide search engine. Instead, it
primarily searches GenBank mRNA records whose text annotations can include
gene names, gene symbols, journal title words, author names, and RefSeq
mRNAs. Searches on other selected identifiers, such as NP and NM accession
numbers, OMIM identifiers, and Entrez Gene IDs, are also supported. However, some
types of queries will return an error, e.g. post-assembly GenBank
entries, withdrawn gene names, and abandoned synonyms. If your initial
query is unsuccessful, try entering a different related term that
may produce the same location. For example, if a query on a gene symbol
produces no results, try entering an mRNA accession, gene ID number, or
descriptive words associated with the gene.
Finding a genome location using BLAT
If you have genomic, mRNA, or protein sequence, but don't know the name or
the location to which it maps in the genome, the
BLAT tool will rapidly locate the position by homology alignment, provided that the
region has been sequenced. This search will find close members of the gene
family, as well as assembly duplication artifacts. An entire set of query sequences can
be looked up simultaneously when provided in FASTA format.
A successful BLAT search
returns a list of one or more genome locations that match the input sequence. To
view one of the alignments in the Genome Browser, click the browser link for the
match. The details link can be used to preview the alignment to determine if
it is of sufficient match quality to merit viewing in the Genome Browser. If too many BLAT hits
occur, try narrowing the search by filtering the sequence in slow mode with
RepeatMasker, then rerunning the BLAT search.
For more information
on conducting and fine-tuning BLAT searches, refer to the
BLAT section of this document.
Opening the Genome Browser with a custom annotation track
You can open the Genome Browser window with a custom annotation track displayed
by using the Add Your Own Tracks feature, accessed from the Gateway page. For more information on creating and using custom annotation tracks, refer to the Custom
Annotation Track documentation.
Annotation track data can be entered in one of three ways:
- Enter the file name for an annotation track source file in the
Annotation File text box.
- Type or paste the annotation track data into the large text box.
- If the annotation data is accessible through a URL, enter the URL in the
large text box.
Once you've entered the annotation information, click the submit button at the top of
the Gateway page to open up the Genome Browser with the annotation track
displayed.
The Genome Browser also provides a collection of custom annotation tracks contributed by the UCSC Genome Bioinformatics group and the research community.
NOTE: If an annotation track does not display correctly when
you attempt to upload it, you may need to reset the Genome Browser to
its default settings, then reload the track. For information on
troubleshooting display problems with custom annotation tracks, refer to the
section on Troubleshooting Annotation Display Problems.
Viewing genome data as text
The
Table Browser,
a portal to the underlying open source MySQL relational database driving the Genome
Browser, displays genomic data as columns of text rather than as
graphical tracks. For more information on using the Table Browser, see the
section Getting started on the Table Browser.
Opening the Genome Browser from external gateways
Several external gateways provide direct links into the Genome Browser. Examples
include: Entrez Gene,
AceView,
Ensembl, SuperFamily, GeneLynx, and GeneCards. Journal
articles can also link to the browser and provide custom tracks. Be sure to use
the assembly date appropriate to the provided coordinates when using data from
a journal source.
Tips for Use
To facilitate your return to regions of interest within the Genome
Browser, save the coordinate range or bookmark any display that you plan to
revisit or wish to share with others.
It is usually best to work with the most recent assembly even though a full
set of tracks might not yet be ready. Be aware that the coordinates of a given feature
on an unfinished chromosome may change from one assembly to the next as
gaps are filled, artifactual duplications are reduced, and strand orientations
are corrected. The Genome Browser offers multiple tools that can correctly
convert coordinates between different assembly releases. For more information
on conversion tools,
see the section Converting data between assemblies.
To ensure uninterrupted browser services for your research during UCSC server
maintenance and power outages, bookmark a
mirror site that
replicates the UCSC genome browser.
Bear in mind that the Genome Browser cannot outperform the underlying quality of
the draft genome. Assembly errors and sequence gaps may still occur well into
the sequencing process due to regions that are intrinsically difficult to sequence.
Artifactual duplications arise as unavoidable compromises during a build, causing
misleading matches in genome coordinates found by alignment.
Interpreting and fine-tuning the Genome Browser display
The Genome Browser annotation tracks page displays a genome location
specified through a Gateway search, a BLAT search, or an uploaded
custom annotation track. There are five main features on this page: a set of
navigation controls, a
chromosome ideogram, the annotation tracks image,
display configuration buttons, and a set of track
display controls.
The first time you open the
Genome Browser, it will use the application default values to configure the
annotation tracks display. By manipulating the navigation, configuration
and display controls, you can customize the annotation tracks display to
suit your needs.
For a complete description of the annotation tracks available in all assembly
versions supported by the Genome Browser, see the
Annotation Track Descriptions section.
The Genome Browser retains user preferences from session to session within the
same web browser, although it never monitors or records user activities or submitted
data.
To restore the default settings, click the "Click here to reset"
link on the Genome Browser Gateway page. To return the display to the default
set of tracks (but retain other configured Genome Browser settings), click
the default tracks button on the Genome Browser page.
Display conventions
The annotation tracks displayed in the Genome Browser use a common set of
display conventions:
- Annotation track descriptions:
Each annotation track has an associated description page that contains a
discussion of the track, the methods used to create the annotation, the data
sources and credits for the track, and (in some cases) filter and
configuration options to
fine-tune the information displayed in the track. To view the
description page, click on the mini-button to the left of a displayed
track or on the label for the track in the Track Controls section.
- Annotation track details pages:
When an annotation track is displayed in full, pack, or squish mode,
each line item
within the track has an associated details page that can be displayed by
clicking on the item or its label. The
information contained in the details page varies by annotation track, but
may include basic position information about the item, related links
to outside sites
and databases, links to genomic alignments, or links to corresponding
mRNA, genomic, and protein sequences.
- Gene prediction tracks:
Coding exons are represented by blocks connected by horizontal lines
representing introns. The 5' and 3' untranslated regions (UTRs) are displayed
as thinner blocks on the leading and trailing ends of the aligning
regions. In full display mode, arrowheads on
the connecting intron lines indicate the direction of transcription.
In situations where no intron is visible (e.g. single-exon genes,
extremely zoomed-in displays), the arrowheads are displayed on the exon
block itself.
- Pat space layout (PSL) alignment tracks:
Aligning regions (usually exons) are shown as black blocks. In dense display
mode, the degree of darkness corresponds to the number of features aligning to
the region or the quality of the match. In full display mode, the aligning regions are connected by lines
representing gaps in the alignment (typically spliced-out introns), with arrowheads indicating
the direction of transcription, which is determined by looking at the
splice sites. In situations where no gap lines are
visible, the arrowheads are displayed on the block itself.
To prevent display problems, the Genome Browser imposes an upper
limit on the number of alignments that can be viewed simultaneously
within the tracks image. When this limit is exceeded, the Browser
displays the best several hundred alignments in a condensed display
mode, then lists the number of undisplayed alignments in the last
row of the track. In this situation, try zooming in to display
more entries or to return the track to full display mode.
- "Chain" tracks (2-species alignment):
Chain tracks display boxes joined together by either single or double
lines. The boxes represent aligning regions. Single lines indicate gaps
that are largely due to a deletion in the genome of the first species or
an insertion in the genome of the second species. Double lines represent
more complex gaps that involve substantial sequence in both species.
This may result from inversions, overlapping deletions, an abundance of
local mutation, or an unsequenced gap in one species. In cases where
there are multiple chains over a particular portion of the genome,
chains with single-lined gaps are often due to processed pseudogenes,
while chains with double-lined gaps are more often due to paralogs and
unprocessed pseudogenes. In the fuller display modes, the individual
feature names indicate the chromosome, strand, and location
(in thousands) of the match for each matching alignment.
- "Net" tracks (2-species alignment):
Boxes represent ungapped alignments, while lines represent gaps. Clicking
on a box displays detailed information about the chain as a whole, while
clicking on a line shows information on the gap. The detailed information
is useful in determining the cause of the gap or, for lower level chains,
the genomic rearrangement. Individual items in the display are
categorized as one of four types (other than gap):
  - Top: The best, longest match. Displayed on level 1.
  - Syn: Lineups on the same chromosome as the gap in the level above it.
  - Inv: A lineup on the same chromosome as the gap above it, but in the opposite orientation.
  - NonSyn: A match to a chromosome different from the gap in the level above.
- Cross-species synteny and orthology tracks:
Sequences from an organism that match regions of the displayed genome
are represented by blocks of various colors. The block color indicates the
chromosomal location of the sequence in the organism's genome, as shown by the
Chromosome Color Key below the annotation track window.
- "Wiggle" tracks (Cross-species conservation):
These tracks plot a continuous function along a chromosome. Data is
displayed in windows of a set number of base pairs in width.
The score for each window is given on a logarithmic scale, which
displays as "mountain ranges". The display characteristics vary among
the tracks in this group. See the individual track descriptions for more
information on interpreting the display.
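The Chain line styles and Net categories described above follow rules simple enough to write down directly. The sketch below is an interpretation for illustration only, not UCSC's chain/net software: it assumes a chain gap is summarized by the number of unaligned bases on each side (first and second species), counts any unaligned sequence on both sides as "substantial" (double line), and classifies a net item from its level plus the chromosome and strand of the item and of the gap it fills. All function and parameter names are hypothetical.

```python
from typing import Optional


def chain_gap_style(first_gap: int, second_gap: int) -> str:
    """'single' if only one species has unaligned bases in the gap, else 'double'.

    Assumption: any unaligned sequence on both sides counts as "substantial".
    """
    if first_gap < 0 or second_gap < 0:
        raise ValueError("gap sizes must be non-negative")
    return "double" if first_gap > 0 and second_gap > 0 else "single"


def net_category(level: int, item_chrom: str, item_strand: str,
                 parent_chrom: Optional[str] = None,
                 parent_strand: Optional[str] = None) -> str:
    """Classify a net item as Top, Syn, Inv, or NonSyn."""
    if level == 1:
        return "Top"     # the best, longest match
    if item_chrom != parent_chrom:
        return "NonSyn"  # matches a different chromosome than the gap above
    if item_strand != parent_strand:
        return "Inv"     # same chromosome, opposite orientation
    return "Syn"         # same chromosome, same orientation
```

A gap of 120 unaligned bases in one species and none in the other would be drawn with a single line under these rules, whereas 300 and 250 bases on the two sides would be drawn with a double line.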
Changing the display mode of an individual annotation track
Each annotation track within the window may have up to five display modes:
- Hide: the
track is not displayed at all. To hide all the annotation tracks,
click the hide all button. This mode is useful for restricting
the display to only those tracks in which you are interested. For
example, someone who is not interested in RNA genes may want
to hide these tracks to reduce track clutter and improve speed.
- Dense: the
track is displayed with all features collapsed into a single
line. This mode is useful for reducing the amount of space used by a
track when you don't need individual line item details or when you just
want to get an overall view of an annotation. For example, by opening an
entire chromosome and setting the RefSeq Genes track to dense, you can get
a feel for the known gene density of the chromosome without displaying
excessive detail.
- Full: the
track is displayed with each annotation feature on a separate line. It is
recommended that you use this option sparingly, due to the large number
of individual track items that may potentially align at the selected position. For
example, hundreds of ESTs might align with a specified gene. When
the number of lines within a requested track location exceeds 250, the
track automatically defaults to a more tightly-packed display mode. In
this situation,
you can restore the track display to full mode by narrowing the chromosomal
range displayed or by using a track filter to reduce the number of items
displayed. On tracks that contain only hide, dense, and full modes, you
can toggle between full and dense display modes by clicking on the
track's center label.
- Squish: the
track is displayed with each annotation feature shown separately, but
at 50% the height of full mode. Features are unlabeled, and more than one
may be drawn on the same line. This mode is useful for reducing the
amount of space used by a track when you want to view a large number of
individual features and get an overall view of an annotation. It is
particularly good for displaying tracks in which a large number of
features align to a particular section of a chromosome, e.g. EST tracks.
- Pack: the
track is displayed with each annotation feature shown separately and
labeled, but not necessarily displayed on a separate line.
This mode is useful for reducing the
amount of space used by a track when you want to view the large number of
individual features allowed by squish mode, but need the labeling and
display size provided by full mode. When the number of lines within the
requested track location exceeds 250, the track automatically defaults
to squish display mode. In this situation, you can restore the track
display to pack mode by narrowing the chromosomal range displayed or by
using a track filter to reduce the number of items
displayed. To toggle between pack and full display modes, click on
the track's center label.
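The automatic fallback described for full and pack modes can be summarized as a small decision function. This is a sketch under stated assumptions: the 250-line limit and the pack-to-squish fallback come from the text above, but the text says only that full mode falls back to "a more tightly-packed display mode", so mapping full to pack here is an assumption. The function and constant names are hypothetical.

```python
LINE_LIMIT = 250  # threshold stated in the text above

# Assumed fallback targets; only "pack -> squish" is stated explicitly.
FALLBACK = {"full": "pack", "pack": "squish"}

MODES = {"hide", "dense", "squish", "pack", "full"}


def effective_mode(requested: str, line_count: int) -> str:
    """Mode the track would be drawn in, given how many lines it needs."""
    if requested not in MODES:
        raise ValueError(f"unknown display mode: {requested!r}")
    if requested in FALLBACK and line_count > LINE_LIMIT:
        return FALLBACK[requested]
    return requested
```

Narrowing the displayed range or filtering the track lowers `line_count`, which is why both actions restore the requested mode.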
The track display controls are grouped into categories that reflect the type
of data in the track, e.g. Gene Prediction Tracks, mRNA and EST tracks, etc.
To change the display mode for a track, find the track's controller in the
Track Controls section at the bottom of the Genome Browser page, select the desired
mode from the control's display menu, and then click the refresh button.
Alternatively, you can toggle between dense and full modes
for a displayed track (or pack mode when available) by clicking on the optional
center label for the track.
Changing the display mode for a group of tracks
Track display modes may be set individually or as a group on the Genome
Browser Track Configuration page. To access the configuration page, click the
configure button on the annotation tracks page or the
configure tracks and display button on the Gateway page. Exercise
caution when using the show all buttons on track groups or
assemblies that contain a large number of tracks; this may seriously impact the
display performance of the Genome Browser or cause your Internet browser to
time out.
Hiding the track display controls
The entire set of track display controls at the bottom of the annotation
tracks page may be hidden from view by unchecking the Show track controls
under main graphic option in the Configure Image section of the Track
Configuration page.
Changing the display of a track by using filters and configuration options
Some tracks have additional filter and configuration capabilities. These options
let the user modify the color or restrict the data displayed within an annotation track.
Filters are useful for focusing attention on items relevant to the current task
in tracks that contain large amounts of data. Configuration
options let the user adjust the display to best show the data of interest.
For example, the min vertical viewing range value on wiggle tracks
can be used to establish a data threshold. By setting the min value to "50", only data values greater than 50 percent will display.
To access filter and configuration options for a specific annotation track, open
the track's description page by clicking the label for the track's control menu
under the Track Controls section or the mini-button to the left of the displayed
track. The filter and configuration section is located at the top of the
description page. In most instances, more information about the configuration
options is available within the description text or through a special help link
located in the configuration section.
Filter and configuration settings are persistent from session to session on the
same web browser. To return the Genome Browser display to the default set of
tracks (but retain other configured Genome Browser settings), click the
default tracks button on the Genome Browser tracks page.
To remove all user configuration settings and completely restore the defaults,
click the "Click here to reset" link on the Genome Browser Gateway
page.
Zooming and scrolling the tracks display
At times you may want to adjust the amount of flanking region displayed in the
annotation tracks window or adjust the scale of the display. At a scale of 1 pixel per
base pair, the window accurately displays the width of exons and introns, and
indicates the direction of transcription (using arrowheads) for multi-exon features. At a
coarser scale, certain features, such as thin exons, may disappear. Also, some
exons may falsely appear to fall within RepeatMasker features at some scales.
Click the zoom in and zoom out buttons at the top of the
Genome Browser page to zoom in or out on the center of the annotation tracks window by 1.5, 3 or
10-fold. Alternatively, you can zoom in 3-fold on the display by clicking
anywhere on the
Base Position track. In this case, the zoom is centered on the coordinate of the
mouse click. To view the base composition of the sequence underlying the
current annotation track display, click the base button.
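Zooming keeps a chosen center point fixed and rescales the window width by the zoom factor. The sketch below assumes 0-based, half-open coordinates, rounds to whole bases, and clamps the result to the chromosome; the browser's own rounding and edge handling may differ, and the function name and `chrom_size` parameter are hypothetical.

```python
from typing import Optional, Tuple


def zoom(start: int, end: int, factor: float, chrom_size: int,
         center: Optional[float] = None) -> Tuple[int, int]:
    """Zoom by `factor` (>1 zooms in, <1 zooms out) about `center`.

    Coordinates are treated as 0-based, half-open [start, end).
    If `center` is None, the middle of the current window is used,
    as with the zoom buttons; a click position can be passed instead.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if center is None:
        center = (start + end) / 2
    new_width = max(1, round((end - start) / factor))
    new_start = round(center - new_width / 2)
    # Clamp to the chromosome, preserving the width where possible.
    new_start = max(0, min(new_start, chrom_size - new_width))
    return new_start, min(chrom_size, new_start + new_width)
```

Zooming in 3-fold on a 3,000-base window from 1,000 to 4,000 yields the 1,000-base window from 2,000 to 3,000; zooming out by the same factor (`factor=1/3`) returns to the original window.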
To scroll the annotation tracks sideways to the left or right by 10%, 50%, or 95% of the displayed
size (as given in base pairs), click the corresponding move arrow. It is also
possible to scroll the left or right side of the tracks by a specified number
of vertical gridlines while keeping the
position of the opposite side fixed. To do this, click the appropriate move
start or move end arrow, located under the annotation tracks window.
For example, to keep the left-hand display coordinate fixed but increase the
right-hand coordinate, you would click the right-hand move end arrow. To increase
or decrease the gridline scroll interval, edit the value in the move start or
move end text box.
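The move arrows shift the whole window by a fraction of its width, while the move start and move end arrows shift only one edge. The sketch below assumes 0-based, half-open coordinates, clamps whole-window moves to the chromosome, and expresses the gridline interval in bases through a hypothetical `gridline_bp` parameter, since the text does not say how many bases one gridline covers.

```python
from typing import Tuple


def scroll(start: int, end: int, fraction: float, chrom_size: int) -> Tuple[int, int]:
    """Shift the window right (fraction > 0) or left (fraction < 0)
    by a fraction of its width, e.g. 0.10, 0.50 or 0.95."""
    width = end - start
    shift = round(width * fraction)
    new_start = max(0, min(start + shift, chrom_size - width))
    return new_start, new_start + width


def move_edge(start: int, end: int, edge: str, gridlines: int,
              gridline_bp: int) -> Tuple[int, int]:
    """Move one edge by `gridlines` gridlines, keeping the other edge fixed.

    Positive `gridlines` moves the edge toward higher coordinates.
    """
    delta = gridlines * gridline_bp
    if edge == "start":
        start += delta
    elif edge == "end":
        end += delta
    else:
        raise ValueError("edge must be 'start' or 'end'")
    if start >= end:
        raise ValueError("window would be empty")
    return start, end
```

Keeping the left-hand coordinate fixed while extending the right-hand one, as in the example above, corresponds to `move_edge(start, end, "end", n, gridline_bp)` with a positive `n`.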
Changing the displayed track position
To display a completely different position in the genome, enter the new query in
the position/search text box, then click the jump button. For more information on
valid entries for this text box, refer to the Getting
Started section.
Changing the width of the annotation track window
By default, the width of the annotation track window is set to 620 pixels. To modify the width to
best suit the display capabilities of your monitor, enter a new value in the
image width text box on the Track Configuration page, then click the
submit button. For example, setting the
display to 1100 pixels on a 19" monitor will increase the visible portion of the genome
and reduce the need for redraws. The maximum supported width is 5000 pixels.
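Whether exons and introns are drawn at their true width depends on how many bases fall on each pixel of the image. The arithmetic below uses the 620-pixel default width and the 5000-pixel maximum stated above; the function names are hypothetical.

```python
DEFAULT_WIDTH_PX = 620  # default image width stated above
MAX_WIDTH_PX = 5000     # maximum supported width stated above


def bases_per_pixel(start: int, end: int, width_px: int = DEFAULT_WIDTH_PX) -> float:
    """How many bases of the displayed range fall on each pixel."""
    if not 0 < width_px <= MAX_WIDTH_PX:
        raise ValueError(f"width must be 1 to {MAX_WIDTH_PX} pixels")
    if end <= start:
        raise ValueError("end must be greater than start")
    return (end - start) / width_px


def at_base_resolution(start: int, end: int, width_px: int = DEFAULT_WIDTH_PX) -> bool:
    """True when every base gets at least one pixel."""
    return bases_per_pixel(start, end, width_px) <= 1.0
```

A 62,000-base window at the default width packs 100 bases into each pixel; widening the image to 1100 pixels brings that down to about 56.4.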
Changing the text size in the annotation track image
The annotation track image may be adjusted to display text in a range of
fonts from "tiny" to "huge". To change the size of the
text, select an option from the text size pull-down menu on the
Track Configuration page, then click Submit. The text size is set to
"small" by default.
Hiding the annotation track labels
The track and element labels displayed above and to the left of the tracks
in the annotation tracks image may be hidden from view by unchecking the
Display track descriptions above each track and Display labels
to the left of items in tracks boxes, respectively, on the Track
Configuration page.
Hiding the display grid on the annotation tracks image
The light blue vertical guidelines on the annotation tracks image may be
removed by unchecking the Show light blue vertical guidelines box
on the Track Configuration page.
Hiding the chromosome ideogram
The chromosome ideogram, located just above the annotation tracks image,
provides a graphical overview of the features on the selected chromosome,
including its bands, the position of the centromere, and an indication of
the region currently displayed in the annotation tracks image.
To hide the ideogram, uncheck the Display chromosome ideogram above
main graphic box on the Track Configuration page.
Printing a copy of the annotation track window
The Genome Browser provides a mechanism for saving a copy of
the currently displayed annotation tracks image to a file that can be printed
or edited. Images saved in PostScript format can be printed at high resolution
and edited by drawing programs such as Adobe Illustrator. This is useful for
generating figures intended for publication. Images can also be saved in PDF
format for viewing by Adobe Acrobat Reader.
To print or save the image to a file:
1. Click the PDF/PS link in the menu on the annotation tracks page.
2. Click the PostScript or PDF link.
BLAT (BLAST-Like Alignment Tool) is a very fast sequence alignment tool similar to
BLAST.
For more information on BLAT's internal scoring schemes and its overall n-mer
alignment seed strategy, refer to W. James Kent (2002) BLAT -
The BLAST-Like Alignment Tool, Genome Res 12:4 656-664.
On DNA queries, BLAT is designed to quickly find sequences with 95% or greater
similarity of length 40 bases or more. It may miss genomic alignments that
are more divergent or shorter than these minimums, although it will find perfect
sequence matches of 33 bases and sometimes as few as 22 bases. The tool is
capable of aligning sequences that contain large introns.
On protein queries, BLAT rapidly locates genomic sequences with 80% or greater
similarity of length 20 amino acids or more. In general, gene family members that
arose within the last 350 million years can be detected. More
divergent sequences can be aligned by using NCBI's BLAST and
PSI-BLAST, then using BLAT to align the resulting match onto the UCSC archaeal genome. In practice, DNA BLAT works well on species in the same or a nearby genus, and protein BLAT works
well between more distant phyla.
Some common uses of BLAT include:
- finding the genomic coordinates of a gene or protein
- searching for gene family members
- finding homologs of a query from another species
Making a BLAT query
To locate a nucleotide or protein within a genome using BLAT:
1. Open the BLAT Search Genome page by clicking the BLAT link on the top
blue menu bar of any of the Genome Browser pages.
2. Select the genome, assembly, query type, output sort order, and output
type. To order the search results based on the closeness of the sequence
match, choose one of the score options in the Sort output menu.
The score is determined by the number of matches vs. mismatches in the
final alignment of the query to the genome.
3. If the sequence to be uploaded is in an unformatted plain text file, enter
the file name in the Upload sequence text box, then click the
submit file button. Otherwise, paste the sequence or FASTA-formatted list into the large
edit box, and then click the submit button. Input sequence can be
obtained from the Genome Browser as well as from a custom annotation
track.
Header lines may be included in the input text if they are preceded by > and
contain unique names.
Multiple sequences may be submitted at the same time if they are of the
same type and are preceded by unique header lines. Numbers, spaces, and extraneous
characters are ignored:
>sequence_1
ATGCAGAGCAAGGTGCTGCTGGCCGTCGCCCTGTGGCTCTGCGTGGAGAC
CCGGGCCGCCTCTGTGGGTTTGCCTAGTGTTTCTCTTGATCTGCCCAGGC
>sequence_2
ATGTTGTTTACCGTAAGCTGTAGTAAAATGAGCTCGATTGTTGACAGAGA
TGACAGTAGTATTTTTGATGGGTTGGTGGAAGAAGATGACAAGGACAAAG
>sequence_3
ATGCTGCGAACAGAGAGCTGCCGCCCCAGGTCGCCCGCCGGACAGGTGGC
CGCGGCGTCCCCGCTCCTGCTGCTGCTGCTGCTGCTCGCCTGGTGCGCGG
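The input rules above (header lines start with > and must be unique; numbers, spaces, and extraneous characters are ignored) can be mirrored when preparing a submission locally. This sketch illustrates those rules and is not BLAT's own parser: it interprets "extraneous characters" as anything other than ASCII letters, and the name `query` given to a sequence pasted without a header is a hypothetical placeholder.

```python
from typing import Dict, List, Optional


def parse_blat_input(text: str) -> Dict[str, str]:
    """Split pasted input into {header_name: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        seq = "".join(chunks)
        if name is None and not seq:
            return  # nothing before the first header
        key = name if name is not None else "query"  # hypothetical default name
        if key in records:
            raise ValueError(f"duplicate header: {key}")
        records[key] = seq

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            flush()
            name = line[1:].strip()
            chunks = []
        else:
            # Keep letters only; digits, spaces and punctuation are dropped.
            chunks.append("".join(ch for ch in line if ch.isascii() and ch.isalpha()).upper())
    flush()
    return records
```

Feeding it the three-sequence example above returns a dictionary keyed by `sequence_1`, `sequence_2`, and `sequence_3`, each mapped to its concatenated 100-base sequence.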
BLAT limitations
DNA input sequences are limited to a maximum length of 25,000 bases. Protein or
translated input sequences must not exceed 5000 letters. As many as 25
sequences may be submitted at the same time. The maximum combined length of DNA
input for multiple sequence submissions is 50,000 bases (with a 25,000 base
limit per individual sequence). For
protein or translated input, the maximum combined input length is 12,500 letters
(with a 5000 letter limit per individual sequence).
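Checking a batch against these limits before submitting avoids a rejected query. The numbers below are the ones stated above; the function is a hypothetical helper that takes the sequences as a name-to-sequence mapping and reports every limit that is exceeded.

```python
from typing import Dict, List

MAX_SEQUENCES = 25
LIMITS = {
    # query kind: (per-sequence limit, combined limit)
    "dna": (25_000, 50_000),
    "protein": (5_000, 12_500),  # also applies to translated input
}


def check_blat_limits(seqs: Dict[str, str], kind: str = "dna") -> List[str]:
    """Return a list of problems; an empty list means the batch fits."""
    per_seq, combined = LIMITS[kind]
    problems: List[str] = []
    if len(seqs) > MAX_SEQUENCES:
        problems.append(f"{len(seqs)} sequences exceeds the limit of {MAX_SEQUENCES}")
    for name, seq in seqs.items():
        if len(seq) > per_seq:
            problems.append(f"{name}: {len(seq)} exceeds {per_seq} per sequence")
    total = sum(len(s) for s in seqs.values())
    if total > combined:
        problems.append(f"combined length {total} exceeds {combined}")
    return problems
```

Two 25,000-base DNA sequences pass exactly at the limit, while three 20,000-base sequences fail on the 50,000-base combined total even though each one is individually acceptable.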
NOTE: Program-driven BLAT use is
limited to a maximum of one hit every 15 seconds and no more than 5000 hits per
day.
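A script that calls BLAT repeatedly can enforce these limits on its own side. The sketch below is a client-side throttle for the stated limits (one query every 15 seconds, at most 5000 per day); it does not contact any server itself. The class name is hypothetical, and treating a "day" as a rolling 24-hour window is an assumption, since the text does not define the day boundary. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class BlatThrottle:
    """Client-side rate limiter: minimum spacing plus a rolling daily cap."""

    def __init__(self, min_interval: float = 15.0, daily_cap: int = 5000,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.daily_cap = daily_cap
        self.clock = clock
        self.sleep = sleep
        self.history: Deque[float] = deque()  # timestamps of recent queries

    def wait(self) -> None:
        """Block until another query is allowed, then record it."""
        now = self.clock()
        # Forget queries older than 24 hours (assumed rolling window).
        while self.history and now - self.history[0] >= 86_400:
            self.history.popleft()
        if len(self.history) >= self.daily_cap:
            raise RuntimeError("daily query cap reached")
        if self.history:
            delay = self.min_interval - (now - self.history[-1])
            if delay > 0:
                self.sleep(delay)
                now = self.clock()
        self.history.append(now)
```

Calling `throttle.wait()` immediately before each query keeps a batch script within the spacing limit; raising an error at the daily cap, rather than sleeping until the next day, is a deliberate choice so that a runaway script stops visibly.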
BLAT query search results
If a query returns successfully, BLAT will display a flat database file that
summarizes the alignments found. A BLAT query often generates multiple
hits. These can arise from multiple copies of a sequence in the genome, paralogs,
pseudogenes, statistical coincidences, or artifactual assembly duplications, or from
repeats or common retrotransposons within the query itself. When too many hits occur, try resubmitting the query sequence
after filtering in slow mode with RepeatMasker.
Items in the search results list are ordered by the criteria specified in the
Sort output menu. Each line item provides links to view the details of the sequence
alignment or to open the corresponding view in the Genome Browser. The
details link gives the letter-by-letter alignment of the sequence to the
genome. It is recommended
that you first examine the details of the alignment for match quality before viewing
the sequence in the Genome Browser.
When several nearby BLAT matches occur on a single chromosome, a
simple trick can be used to quickly adjust the Genome Browser track window to display
all of them: open the Genome Browser with the match that has the lowest chromosome
start coordinate, paste in the highest chromosome end coordinate from the list of
matches, then click the jump button.
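When the hit list is long, the same trick can be done programmatically: take the smallest start and the largest end among the hits on the chosen chromosome. The sketch assumes hits are given as `(chromosome, start, end)` tuples, a simplified and hypothetical representation of the results list, and formats the result as a `chrom:start-end` position string for the position/search box.

```python
from typing import Iterable, Tuple

Hit = Tuple[str, int, int]


def spanning_position(hits: Iterable[Hit], chrom: str) -> str:
    """Position string covering every hit on `chrom`."""
    on_chrom = [(s, e) for c, s, e in hits if c == chrom]
    if not on_chrom:
        raise ValueError(f"no hits on {chrom}")
    start = min(s for s, _ in on_chrom)
    end = max(e for _, e in on_chrom)
    return f"{chrom}:{start}-{end}"
```

For hits at 5200-5400, 1000-1300, and 3000-3100 on the same chromosome, the spanning position runs from 1000 to 5400, regardless of the order in which the hits are listed.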
Creating a custom annotation track from BLAT output
To make a custom track directly from BLAT, select the
PSL format
output option. The resulting PSL track can be uploaded into the Genome Browser by
pasting it into the Add Your Own Tracks text box, accessed from the Browser
Gateway page.
Using BLAT for large batch jobs or commercial use
For large batch jobs, or to change BLAT's internal parameters, it is best to install
command-line BLAT on your own Linux server. Sources and executables are free for
academic, personal, and non-profit purposes. BLAT source may be downloaded from
http://www.soe.ucsc.edu/~kent (look for the blatSrc*.zip file
with the most recent date). For BLAT executables, go to
http://www.soe.ucsc.edu/~kent/exe/; binaries are sorted by
platform.
Non-exclusive commercial licenses are available from the
Genome
BLAT website.
BLAT documentation
For more information on the BLAT suite of programs, see the
BLAT
Program Specifications and the
Blat section of the Genome
Browser FAQ.
Getting started on the Table Browser
The Table
Browser provides text-based access to the genome assemblies and annotation
data stored in the Genome Browser database. As a flexible alternative to the
graphical Genome Browser, this tool offers an enhanced level of
query support that includes restrictions based on field values, free-form
SQL queries, and combined queries on multiple tables. Output can be filtered
to restrict the fields and lines returned, and may be organized into one of
several formats, including a simple tab-delimited file that can be loaded into
a spreadsheet or database as well as advanced formats that may be uploaded into
the Genome Browser as custom annotation tracks.
The Table Browser provides a convenient
alternative to downloading and manipulating the entire genome and its massive data tracks.
For information on using the Table Browser features, refer to the Table Browser User Guide.
Configuring the DNA sequence display
The Genome Browser lets you configure the retrieval, formatting, and coloring of the
DNA sequence underlying the features in the displayed annotation track window.
Retrieval options allow you to pad the sequence with extra bases at its upstream
or downstream end. Formatting options range from simply displaying exons in upper case to
elaborately marking up a sequence according to data from multiple tracks. The DNA sequence covered by
various tracks can be highlighted by case, underlining, bold or italic fonts, and color.
The DNA display configuration feature can be useful to highlight features within a
genomic sequence, point out overlaps between two types of features (for example, known
genes vs. gene predictions), or mask out unwanted features.
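To illustrate what the simplest formatting option does, the following Python sketch lower-cases a sequence and then upper-cases the bases that fall inside exons. It assumes half-open (start, end) exon coordinates in the same coordinate system as the sequence. exons_upper is an illustrative name, not part of the Genome Browser.

```python
def exons_upper(seq, seq_start, exons):
    """Lower-case the whole sequence, then upper-case the bases inside exons.

    seq       -- DNA string whose first base is at coordinate seq_start
    exons     -- half-open (start, end) pairs in the same coordinates
    Parts of an exon that fall outside seq are ignored.
    """
    chars = list(seq.lower())
    for start, end in exons:
        lo = max(start - seq_start, 0)
        hi = min(end - seq_start, len(chars))
        for i in range(lo, hi):
            chars[i] = chars[i].upper()
    return "".join(chars)
```

For example, exons_upper("ACGTACGTAC", 100, [(102, 105), (108, 120)]) returns "acGTAcgtAC".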
Using the DNA text formatting feature
To access the feature, click on the DNA link on the top blue menu
bar on the Genome Browser page. The Get DNA in Window page that appears contains
sections for configuring the retrieval and output format.
To display extra bases upstream of the 5' end of your sequence or downstream
of the 3' end of the sequence, enter the number of bases in the corresponding
text box. This option is useful when looking for regulatory regions.
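The padding arithmetic is simple, with one subtlety: for a feature on the minus strand, the 5' (upstream) end is at the higher coordinate. The sketch below assumes half-open coordinates and strand-aware padding; padded_range is a hypothetical helper, and whether a particular Genome Browser page pads relative to the feature's strand or always relative to the plus strand should be checked on that page.

```python
def padded_range(start, end, upstream, downstream, strand="+", chrom_size=None):
    """Extend a half-open interval by extra bases on its 5' and 3' ends.

    On the '-' strand the 5' end is the higher coordinate, so the padding
    swaps sides. The result is clipped to the chromosome bounds.
    """
    if strand == "-":
        upstream, downstream = downstream, upstream
    new_start = max(start - upstream, 0)
    new_end = end + downstream
    if chrom_size is not None:
        new_end = min(new_end, chrom_size)
    return new_start, new_end
```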
The Sequence Formatting section lists several options for
adjusting the case of all or part of the DNA sequence. To choose one of these formats,
click the corresponding option button, then click the get DNA button. To access a table
of extended formatting options, click the Extended case/color options button.
The Extended DNA Case/Color page presents a table with many more format options. The page
provides instructions for using the formatting table, as well as examples of its use. The
list of tracks in the Track Name column is automatically generated from the list of tracks
available on the current genome.
Tips for Use
A few caveats mentioned on the Extended DNA Case/Color page bear repeating. Keep the formatting simple at first: it is easy to
make a display that is pretty to look at but is also completely cryptic. Also, be careful when
requesting complex formatting for a large chromosomal region: when all the HTML tags have been
added to the output page, the file size may exceed the size limits that your internet browser,
clipboard, and other software can safely display. The largest region that can be
formatted by the tool is approximately 10 Mbp.
Converting data between assemblies
Coordinates of features frequently
change from one assembly to the next as gaps are closed, strand orientations are
corrected, and duplications are reduced. Occasionally, a chunk of sequence may
be moved to an entirely different chromosome as the map is
refined. There are 3 different methods available for migrating data from one
assembly to another: BLAT alignment, coordinate conversion, and lifting of
coordinates. The BLAT alignment tool is described in the section
Using BLAT alignments.
Coordinate conversion
The Genome Browser Convert tool is useful for
locating the position of a feature of interest in a newer release of a genome.
During the conversion process, portions of the genome in the coordinate range of
the original assembly are aligned to the new assembly while preserving their order and
orientation. In general, it is easier to achieve successful conversions with shorter
sequences. NOTE: At the present time, this tool may be used only on human
genome assemblies.
The conversion tool works by performing a BLAT search on the first 1000, last 1000, and middle
1000 bases
in the current window. If all three searches land uniquely in the same order on
the other version, the program announces a successful
conversion. If the search results are not so straightforward, the user is given
various options to find the corresponding sequence.
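The three-probe strategy can be expressed in a few lines. The Python sketch below is a hypothetical re-creation of the decision rule as described above, not the Convert tool's actual code. It picks first, middle, and last probe windows, then accepts the conversion only if each probe aligns exactly once and all three alignments agree in chromosome, strand, and order. The BLAT search itself is not shown; alignments are passed in as (chrom, start, end, strand) tuples.

```python
def probe_windows(start, end, size=1000):
    """Return the (first, middle, last) probe intervals of `size` bases for a window."""
    mid = (start + end) // 2
    return [
        (start, min(start + size, end)),
        (max(mid - size // 2, start), min(mid + size // 2, end)),
        (max(end - size, start), end),
    ]

def converted_range(hits):
    """Decide whether three probe alignments give an unambiguous conversion.

    `hits` holds one list per probe (first, middle, last), each containing
    (chrom, start, end, strand) alignments on the new assembly.
    Returns (chrom, start, end) on success, or None.
    """
    if any(len(h) != 1 for h in hits):
        return None  # a probe failed to align, or aligned more than once
    (c1, s1, e1, st1), (c2, s2, e2, st2), (c3, s3, e3, st3) = (h[0] for h in hits)
    if not (c1 == c2 == c3 and st1 == st2 == st3):
        return None
    starts = [s1, s2, s3]
    if st1 == "-":
        starts.reverse()  # on the reverse strand the probes appear in reverse order
    if starts != sorted(starts):
        return None
    return c1, min(s1, s2, s3), max(e1, e2, e3)
```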
Frequently, if the feature the user is looking for is tied to an mRNA, it is
simplest just to BLAT the mRNA.
Using the convert tool
To access the conversion tool, click the Convert link in the top menu bar on the
Genome Browser page. On the page that displays, select the assembly
version in which the feature is located in the Original Draft list, then
pick the assembly version you'd like to convert to in the New Draft list.
Modify the position coordinates if necessary, then click the submit
button. Note that archived assembly versions are not accessible from the
conversion tool.
If the match is successful, the Genome Browser will announce success and display
the coordinate ranges for both the original and new assemblies. Clicking the
browser link to the right of a coordinate range opens the Genome Browser at
that position on the corresponding assembly. The Alignment Details section
shows which sequences of the original draft were aligned by BLAT to determine
the new set of coordinates. Note that the conversion is a best guess: it is
recommended that you check with local landmarks and use
common sense when evaluating the results.
If the conversion is unsuccessful, the Genome Browser returns a failure message
and a possible explanation for the failure.
Lifting coordinates
The liftOver tool is useful for converting a large number of coordinates to a
different assembly. This command-line utility requires access to a Linux platform.
The executable file can be downloaded from
www.soe.ucsc.edu/~kent/exe/linux/liftOver.gz.
LiftOver requires a UCSC-generated over.chain file as input. Pre-generated files
for a given assembly are located in the liftOver subdirectory of the assembly's
downloads directory. If the desired conversion file is not available, one can be obtained by sending a
request to the genome mailing list.
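liftOver reads its input regions as a BED file and is invoked as liftOver oldFile map.chain newFile unMapped, where unMapped receives the regions that could not be converted. The Python sketch below writes coordinates to a BED file and builds that command line. The helper names are hypothetical, any chain file name you pass in stands for the over.chain file you downloaded, and run_liftover simply calls the executable if it is on your PATH.

```python
import shutil
import subprocess

def write_bed(path, regions):
    """Write (chrom, start, end, name) tuples as a tab-separated BED file."""
    with open(path, "w") as f:
        for chrom, start, end, name in regions:
            f.write(f"{chrom}\t{start}\t{end}\t{name}\n")

def liftover_command(old_bed, chain_file, new_bed, unmapped):
    """Build the argument list for: liftOver oldFile map.chain newFile unMapped."""
    return ["liftOver", old_bed, chain_file, new_bed, unmapped]

def run_liftover(cmd):
    """Run liftOver if it is on PATH; raise otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("liftOver executable not found on PATH")
    subprocess.run(cmd, check=True)
```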
Creating custom annotation tracks
The Genome Browser provides dozens of aligned annotation tracks that have been
computed at UCSC or have been provided by outside collaborators. In addition to these
standard tracks, it is also possible for users to upload their own annotations for
temporary display in the browser. These custom annotation tracks are viewable
only on the machine from which they were uploaded and are only
kept for 8 hours after the last time they were accessed. Optionally, users can
make custom annotations viewable by others as well.
Custom tracks are an important research feature of the Genome Browser. Because
space is limited in the Genome Browser track window, many excellent genome-wide tracks
cannot be included in the set of tracks packaged with the Genome Browser. Other
tracks of interest may be excluded from distribution because the annotation
track data is too specific to be of general interest or can't be shared until
journal publication. To view a list of custom annotation tracks submitted by
Genome Browser users, click the Custom
Tracks link on the Genome Browser home page.
Custom annotation tracks are similar to standard tracks, but never become part of the
MySQL genome database. Each track has its own controller and persists even when not
displayed in the Genome Browser window, e.g. if the position changes to a range
that no longer includes the track. Typically, custom annotation tracks are aligned
under corresponding genomic sequence, but they can also be completely unrelated to
the data. For example, a track can be displayed under a long sequence
consisting of millions of Ns.
This section presents a summary of the steps involved in constructing and
displaying a custom annotation track. For a complete discussion of custom
annotation tracks, file formats, custom URLs, and a troubleshooting guide, refer
to the document Displaying Your Own Annotations in the Genome Browser.
Creating a custom annotation track
Genome Browser annotation tracks are based on files in line-oriented format.
Each line in the file defines a display characteristic for the annotation track
or defines a data item within the track.
Annotation files contain 3 types of lines: browser lines, track lines, and data
lines. Empty lines and lines starting with # in the
annotation file are ignored.
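Because each line's type is determined by its first word, a file can be sorted into the three line types mechanically. The sketch below shows that rule in Python. classify_line is a hypothetical helper, and it treats any non-empty, non-comment line that does not begin with browser or track as a data line.

```python
def classify_line(line):
    """Return 'browser', 'track', 'data', or None (ignored) for one annotation-file line."""
    if not line.strip() or line.startswith("#"):
        return None  # empty lines and comment lines are ignored
    keyword = line.split()[0]
    if keyword in ("browser", "track"):
        return keyword
    return "data"
```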
The easiest way to create a correctly formatted annotation track is by
collecting PSL output from BLAT. Advanced users can make custom tracks from the Table
Browser and track-formatted DNA. To create a custom annotation track from scratch,
it's best to begin with the examples and experiment with new lines and altered values
in a spreadsheet or text editor.
To construct an annotation file and display it in the Genome Browser, follow
these steps:
Step 1. Format the data set
Formulate your data set as a tab-separated file using one of the formats supported by the Genome Browser. Annotation data can be in
standard GFF format or in a format designed specifically for the Human Genome Project, including
GTF, PSL, or BED. GFF and GTF files must be tab-delimited rather than
space-delimited to display correctly. You may include more than one data
set in your annotation file. However, all of the data lines for a given annotation track must be in the same format.
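A small formatting helper keeps data lines consistently tab-delimited. This Python sketch writes the first three to five BED columns (chrom, chromStart, chromEnd, name, score). It assumes BED's 0-based, end-exclusive coordinates and the 0 to 1000 score range, and rejecting empty intervals is a simplifying choice of this sketch; bed_line is a hypothetical name.

```python
def bed_line(chrom, start, end, name=None, score=None):
    """Format one BED record as a tab-delimited line (0-based start, end exclusive)."""
    if not 0 <= start < end:
        raise ValueError(f"invalid interval {start}-{end}")
    if score is not None:
        if name is None:
            raise ValueError("a BED score column must follow a name column")
        if not 0 <= score <= 1000:
            raise ValueError("BED scores range from 0 to 1000")
    fields = [chrom, str(start), str(end)]
    if name is not None:
        fields.append(name)
    if score is not None:
        fields.append(str(score))
    return "\t".join(fields)
```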
Step 2. Define the Genome Browser display characteristics
Add one or more optional browser lines to the beginning of your formatted data file to configure the overall
display of the Genome Browser when it initially displays your annotation data. Browser lines allow you to configure such things
as the genome position that the browser will initially open to, the width of the display, and the configuration of the other
annotation tracks that are shown (or hidden) in the initial display. NOTE: If
the browser position is not explicitly set in the annotation file, the initial display will default
to the position setting most recently used by the user, which may not be an
appropriate position for viewing the annotation track.
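The two browser lines used in this guide's examples, browser position and browser hide all, can be generated as follows. browser_lines is an illustrative helper that covers only those two settings; setting the position explicitly avoids the fallback to the most recently used position described above.

```python
def browser_lines(position=None, hide_all=False):
    """Build optional browser lines for the top of an annotation file.

    position -- (chrom, start, end) tuple for the initial display, or None
    hide_all -- if True, hide all other tracks in the initial display
    """
    lines = []
    if position is not None:
        chrom, start, end = position
        lines.append(f"browser position {chrom}:{start}-{end}")
    if hide_all:
        lines.append("browser hide all")
    return lines
```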
Step 3. Define the annotation track display characteristics
Following the browser lines and immediately preceding the formatted data, add
a track line to define the display attributes for your annotation data set. Track lines allow you to define annotation track characteristics
such as the name, description, colors, initial display mode, use score,
etc. If you have included more than one data set in your annotation
file, insert a track line at the beginning of each new set of data.
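A track line is a single line of space-separated attribute=value pairs, with values that contain spaces enclosed in double quotes. The hypothetical track_line helper below assembles one from keyword arguments, so the attribute names you pass (visibility, color, useScore, and so on) are written out exactly as given; it does not check that they are valid track attributes.

```python
def _fmt(value):
    """Quote a value if it contains spaces."""
    value = str(value)
    return f'"{value}"' if " " in value else value

def track_line(name, description=None, **attrs):
    """Build a single-line track definition from attribute keyword arguments."""
    parts = ["track", f"name={_fmt(name)}"]
    if description is not None:
        parts.append(f'description="{description}"')
    for key, value in attrs.items():  # keyword order is preserved
        parts.append(f"{key}={_fmt(value)}")
    return " ".join(parts)
```

For example, track_line("pairedReads", "Clone Paired Reads", visibility=2, color="0,128,0", useScore=1) yields the track line of the paired-reads example below on one line.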
Example:
Here is an example of an annotation file that defines 2 separate
annotation tracks in BED format. The first track displays blue one-base
tick marks every 10000 bases on chr2.
The second track displays red 100-base features alternating with
blank space in the same region of chr2.
browser position chr2:20100000-20140000
track name=spacer description="Blue ticks every 10000 bases" color=0,0,255
chr2 20100000 20100001
chr2 20110000 20110001
chr2 20120000 20120001
track name=even description="Red ticks every 100 bases, skip 100" color=255,0,0
chr2 20100000 20100100 first
chr2 20100200 20100300 second
chr2 20100400 20100500 third
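Regularly spaced data lines like these are easier to generate than to type. This Python sketch reproduces the spacing of the two tracks above as tab-delimited BED lines; tick_track and alternating_track are hypothetical helpers.

```python
def tick_track(chrom, start, stop, step):
    """Yield one-base BED features every `step` bases in [start, stop)."""
    for pos in range(start, stop, step):
        yield f"{chrom}\t{pos}\t{pos + 1}"

def alternating_track(chrom, start, width, names):
    """Yield named `width`-base features separated by gaps of the same width."""
    for i, name in enumerate(names):
        pos = start + 2 * width * i
        yield f"{chrom}\t{pos}\t{pos + width}\t{name}"
```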
Example:
This example shows an annotation file containing one data set in BED
format. The track displays paired features with a thick end and thin
end, and hatch marks indicating the direction of transcription. The track
labels display in green (0,128,0), and the gray level of each feature
reflects the score value of that line. NOTE: The track line in this example
has been
split over 2 lines for documentation purposes. If you paste this example into
the browser, you must remove the line break to display the track successfully.
browser position chr2:1000-10000
browser hide all
track name=pairedReads description="Clone Paired Reads" visibility=2
color=0,128,0 useScore=1
chr2 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
chr2 2000 6000 cloneB 200 - 2000 6000 0 2 433,399, 0,3601
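The last three columns of a BED line like these describe the blocks (here, the two reads of a clone pair): blockCount, a comma-separated list of blockSizes, and a comma-separated list of blockStarts measured relative to chromStart. The first block must start at 0, and the last blockStart plus the last blockSize must equal chromEnd minus chromStart. The Python sketch below checks only those rules; check_bed12_blocks is a hypothetical helper, not a full BED validator.

```python
def check_bed12_blocks(line):
    """Check the block columns (10-12) of a BED line against chromStart/chromEnd."""
    f = line.split()
    chrom_start, chrom_end = int(f[1]), int(f[2])
    block_count = int(f[9])
    sizes = [int(x) for x in f[10].split(",") if x]   # trailing comma allowed
    starts = [int(x) for x in f[11].split(",") if x]  # relative to chromStart
    if len(sizes) != block_count or len(starts) != block_count:
        return False
    if starts[0] != 0:
        return False  # the first block must begin at chromStart
    # The last block must end exactly at chromEnd.
    return starts[-1] + sizes[-1] == chrom_end - chrom_start
```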
Step 4. View your annotation track in the Genome Browser
To view your annotation data in the Genome Browser, open up the UCSC Genome
Bioinformatics home page (http://archaea.ucsc.edu/) and click on the Genome
Browser link in the top menu bar. On the Genome Browser Gateway page that
displays, select the genome and assembly on which your
annotation data is based, then click the add your own tracks button. Upload your annotation
file by entering the name of your file in the Annotation File box or by pasting the contents of your file into the large edit
box. Scroll back to the top of the page and click the submit button to
display the Genome Browser track window with your annotation. If you
encounter difficulties in displaying your annotation, read the section Troubleshooting Annotation Display Problems.
To upload a custom annotation track from another machine or web site, paste the
URL of the track into the large edit box. Custom tracks can be displayed in
conjunction with ordinary BLAT tracks.
Step 5. (Optional) Add details pages for individual track features
After you've constructed your track and have successfully displayed it in the
Genome Browser, you may wish to customize the details pages for individual track
features. The Genome Browser automatically creates a default details page for each
feature in the track containing the feature's name, position information, and a
link to the corresponding DNA sequence. To view the
details page for a feature in your custom annotation track (in full display
mode), click on the item's label in the annotation track window.
You can add a link from a details page to an external web page containing
additional information about the feature by using the track line url attribute.
In the annotation file, set the url attribute in the track line to point
to a publicly available page on a web server. The Genome Browser
substitutes each occurrence of '$$' in the URL string with the name of the
feature (in BED format, the name field of the data line). You can take advantage of this feature to provide
individualized information for each feature in your track by creating HTML anchors
that correspond to the feature names in your web page.
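The '$$' substitution is a plain text replacement, and the target page needs only one named anchor per feature. The Python sketch below shows both halves; details_url and anchor_page are hypothetical helpers. Whether the browser escapes feature names before substitution is not covered here, so names without spaces or special characters are the safest choice.

```python
import html

def details_url(url_template, item_name):
    """Replace every '$$' in a track's url attribute with the feature name."""
    return url_template.replace("$$", item_name)

def anchor_page(descriptions):
    """Build a simple HTML page with one named anchor per feature name."""
    rows = []
    for name, text in descriptions.items():
        n = html.escape(name)
        rows.append(f'<h2><a name="{n}">{n}</a></h2>\n<p>{html.escape(text)}</p>')
    return "<html><body>\n" + "\n".join(rows) + "\n</body></html>\n"
```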
Example:
Here is an example of a file in which the url attribute has been set to
point to the file http://archaea.ucsc.edu/goldenPath/help/clones.html. The '#$$'
appended to the end of the file name in the example points to the HTML NAME tag within the file that
matches the name of the feature (cloneA, cloneB, etc.).
NOTE: The track line in this example has been split over 3 lines for
display purposes. If you paste this example into
the browser, you must remove the line breaks to display the track successfully.
browser position chr2:1000-10000
browser hide all
track name=clones description="Clones" visibility=2
color=0,128,0 useScore=1
url="http://archaea.ucsc.edu/goldenPath/help/clones.html#$$"
chr2 1000 5000 cloneA 960
chr2 2000 6000 cloneB 200
chr2 5000 9000 cloneC 700
chr2 6000 10000 cloneD 600
chr2 11000 15000 cloneE 300
chr2 13000 17000 cloneF 100
Sharing your annotation track with others
The previous steps showed you how to upload annotation data for your own use on your own machine. However, many users would like to
share their annotation data with members of their research group on different machines
or with colleagues at other sites. To make your Genome Browser annotation track
viewable by others, follow the steps below. (Note that
some of the URL examples in this section have been broken up into 2
lines for documentation display purposes).
Step 1.
Put your formatted annotation file on your web site. Be sure that the file permissions allow it to be read by others.
Step 2.
Construct a URL that will link this annotation file to the Genome Browser. The URL must contain 3 pieces of information specific
to your annotation data:
- The species or genome assembly on which your annotation data is based.
To automatically display the most recent assembly for a given organism, set
the org parameter: e.g. org=thermococcus. To specify a
particular genome assembly for an organism, use the db parameter,
db=database_name, where database_name is the UCSC code for
the genome assembly.
- The genome position that the Genome Browser should initially open to. This information is of the form
position=chr_position, where chr_position is a chromosome name, with or without a set of
coordinates. Examples of this include: position=chr22, position=chr22:15916196-31832390.
- The URL of the annotation file on your web site. This information is
of the form hgt.customText=URL, where
URL points to the annotation file on your website. An example of an
annotation file URL is http://archaea.ucsc.edu/goldenPath/help/test.bed.
Combine the above pieces of information into a URL of the following
format, replacing species_name, chr_position, and URL with the values for your annotation file:
http://archaea.ucsc.edu/cgi-bin/hgTracks?org=species_name&position=chr_position&hgt.customText=URL
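The link can also be assembled programmatically. This Python sketch uses the standard library's urlencode, which percent-encodes characters such as ':' and '/' inside the parameter values. Standard query-string parsing decodes this encoding, but the result will not look character-for-character like the template above. custom_track_url is a hypothetical helper, and its default base address is taken from this guide.

```python
from urllib.parse import urlencode

def custom_track_url(annotation_url, position, org=None, db=None,
                     base="http://archaea.ucsc.edu/cgi-bin/hgTracks"):
    """Build an hgTracks link that loads a custom track from annotation_url.

    Give exactly one of org (latest assembly for an organism) or db
    (a specific assembly code).
    """
    if (org is None) == (db is None):
        raise ValueError("give exactly one of org or db")
    params = {"org": org} if org is not None else {"db": db}
    params["position"] = position
    params["hgt.customText"] = annotation_url
    return base + "?" + urlencode(params)
```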
Summary Tips for viewing annotation track data
For more information on configuring
and using the tracks displayed in the Genome Browser track window, see the section
Fine-tuning the Genome Browser display.
- To display a description page with more information about the track,
click on the mini-button to the left of a track.
- To display a details page with additional information about a specific line
item within a track in full display mode, click on the item or its label.
- A track does not appear in the browser if its display mode is set to
hide. To restrict the browser's display to only those tracks in
which you're interested, set the display mode of the unwanted tracks to
hide.
- A track set to full display mode will default to a more tightly
packed display mode if the total number of lines in the track exceeds 250.
- To quickly toggle between full and dense or pack display modes, click on
the track's center label.
- Track data can also be viewed as text tables, arranged in columns, using
the Table Browser.
- For specific information about a given track,
look at the track's description page.