Breaking-Cas is a versatile system for detecting putative sgRNA off-targets in CRISPR/Cas applications. Its main features are summarized in Table 1:
|Input||One or several FASTA sequences up to 20,000 nucleotides in total. File uploading allowed.|
|Output||Rich interactive web page containing detailed information about candidate oligos, on-targets and off-targets. Scores, coordinates and overlapping genes are shown. Mini genome browsers allow checking the genomic environment of each putative off-target. Results can be downloaded as tables.|
|Throughput||Medium/High. Multiple queries in batch are allowed.|
|Use cases||To design sgRNAs and evaluate putative undesired off-targets for CRISPR/Cas applications. It can be used for any eukaryotic genome available at ENSEMBL/ENSEMBLGENOMES.|
|Genomes||More than 650 (as of April, 2016).|
|sgRNA constraints||Versatile system: oligo size, mismatch number, PAM sequence and position, and scoring system can be customized by the user.|
|Pros||Very fast and easy to use. Results are interactive, very detailed and well organized.|
|Cons||Does not include any method to measure sgRNA efficiency.|
Table 1. Breaking-Cas's features. See Table 2 of Graham and Root, 2015 for a recent comparison of the very same fields for other similar tools.
This guide illustrates the use of Breaking-Cas with the analysis of the sms-2 gene from Caenorhabditis elegans (roundworm).
Following steps 1 to 3 is equivalent to using the "Fill with example" link at the bottom of the Breaking-Cas form.
Step 1. In the input form, select Caenorhabditis elegans as the organism (type the name or select it from the alphabetical list).
Step 2. Copy and paste the nucleotide sequence of the sms-2 gene (in FASTA format) into the text area:
(Alternatively, upload a text file containing the same sequence).
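Before pasting or uploading, it can be useful to check that the input respects the 20,000-nucleotide total limit listed in Table 1. The following is a minimal sketch, not part of Breaking-Cas: the function names `record_lengths` and `fits_input_limit` are hypothetical, and it assumes plain FASTA text with unique record identifiers.

```python
MAX_TOTAL_NT = 20_000  # total input limit stated in Table 1


def record_lengths(fasta_text):
    """Return {record_id: sequence_length} for plain FASTA text.

    Assumes record identifiers are unique; a repeated identifier
    would overwrite the earlier record in this simple sketch.
    """
    lengths = {}
    current = None
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a placeholder name when the header is empty.
            current = parts[0] if parts else f"record_{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def fits_input_limit(fasta_text, limit=MAX_TOTAL_NT):
    """True if all records together stay within the nucleotide limit."""
    return sum(record_lengths(fasta_text).values()) <= limit
```

Calling `fits_input_limit(open("my_gene.fa").read())` on a local file (the file name is only an example) gives a quick yes/no answer before submitting.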
Step 3. Select settings for the nuclease of interest (Streptococcus pyogenes Cas9 by default) or enter your own parameters:
PAM sequence (in IUPAC notation),
PAM position (5' or 3'),
Guide length (18-25 nt),
Mismatches allowed: recommended 4.
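To make the IUPAC notation of the PAM concrete, the sketch below translates a PAM such as NRG into a regular expression and checks the ranges listed above. It is an illustration only: `pam_to_regex`, `pam_matches` and `check_settings` are hypothetical helper names, and the server's own validation may differ.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_to_regex(pam):
    """Translate an IUPAC PAM (e.g. 'NRG') into a regular expression."""
    return "".join(f"[{IUPAC[base]}]" for base in pam.upper())


def pam_matches(pam, site):
    """True if the concrete sequence `site` is compatible with the IUPAC PAM."""
    return re.fullmatch(pam_to_regex(pam), site.upper()) is not None


def check_settings(pam, pam_position, guide_length, mismatches):
    """Hypothetical sanity check mirroring the fields of the input form."""
    if not pam or any(base not in IUPAC for base in pam.upper()):
        raise ValueError("PAM must be written in IUPAC notation")
    if pam_position not in ("5'", "3'"):
        raise ValueError("PAM position must be 5' or 3'")
    if not 18 <= guide_length <= 25:
        raise ValueError("guide length must be between 18 and 25 nt")
    if mismatches < 0:
        raise ValueError("number of mismatches cannot be negative")
    return True
```

With these helpers, `pam_matches("NRG", "TAG")` is true while `pam_matches("NGG", "TAG")` is false, which is the distinction between the NGG and NRG settings discussed for S. pyogenes Cas9.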
Positional weights are editable to take into account the position-dependent relevance of mismatches
in the oligo when calculating off-target scores (see Box 1 below).
A value of "0" means that a mismatch at that position is fully tolerated by the nuclease, whereas a value of
"1" means that any off-target with a mismatch at that position will not be recognized by the nuclease
(Soff=0). Weights are obtained experimentally by evaluating the nuclease activity on an
array of oligos covering a large part of the sequence possibilities (e.g. those obtained by
Hsu et al. (2013) for Cas9, included in this server in the presets for that nuclease).
Changing these values (as well as the other parameters of the nuclease)
allows this server to be used with other experimental datasets on nuclease preferences
(e.g. unpublished ones), provided they are compatible with Hsu's formulation. Nevertheless,
this is an advanced feature and should be used only by experienced users. Wrong weights
can lead to misleading off-target scores. Inexperienced users are advised to leave the default values offered by
Finally, enter your email address to receive a message when the analysis is finished (optional), and click the Submit button.
If all input fields are correct, a "waiting room" page will open and the analysis will start.
Depending on the size of the input sequences and genome, the process will take from a few seconds to several hours.
When the analysis is finished, the results can be opened in a new window (available online for a few days)
or downloaded as a compressed folder (recommended). In the latter case, uncompress the file and open
"index.html" with any current web browser to see the interactive results page.
Interactive Results Page
Oligo candidate details and their genomic targets are presented as an interactive web page divided into three
main parts: Upper Section, Left Panel and Right Panel.
UPPER SECTION: Project Details
Input sequence ID, Organism name, and parameters used are indicated.
LEFT PANEL: Oligo Candidates
This table shows all oligo candidates found for the input sequence
(e.g. 20 nt followed by a compatible PAM (NRG) in the case of Cas9).
By default, results are sorted by the aggregated score.
Thanks to the explicit PAM sequence provided, users can limit their gRNA candidates to those with a desired PAM
while allowing a more generic PAM for off-targets. The typical case is S. pyogenes's Cas9, where NGG is desirable for
gRNAs, whereas off-targets can contain NAG or NGG (NRG in IUPAC).
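The candidate search described above (a guide followed by a compatible PAM, on either strand) can be sketched as follows. This is an illustration under stated assumptions, not the server's code: `find_candidates` is a hypothetical function, it handles only a 3' PAM, and the 1-based coordinates it reports cover the guide without the PAM, which may not match the START/END convention of the results table.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidates(seq, guide_length=20, pam_regex="[ACGT]GG"):
    """List guides followed by a 3' PAM on both strands.

    The default PAM pattern corresponds to NGG. Coordinates are 1-based,
    inclusive, refer to the input (+) strand and cover the guide only.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidates be reported as well.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_length}}})({pam_regex}))")
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            start = match.start(1)        # 0-based on the scanned strand
            end = start + guide_length    # exclusive
            if strand == "+":
                first, last = start + 1, end
            else:
                first, last = len(seq) - end + 1, len(seq) - start
            hits.append({
                "start": first, "end": last, "strand": strand,
                "oligo": match.group(1), "pam": match.group(2),
            })
    return hits
```

Passing `pam_regex="[ACGT][AG]G"` would accept NRG instead, which is the looser pattern the text describes for off-targets.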
Use the text boxes right above the column headers to apply numerical filters to any column (for example, to show only oligos with score > 99, type ">99" in the SCORE text box and press Enter).
Click on any header to sort values (click again for reverse sorting).
Click on any row to select a candidate and show its target details in the RIGHT PANEL.
Runs of four consecutive Ts are highlighted in red since they can cause problems in CRISPR/Cas setups involving pol-III, due to their similarity to the termination motif of this polymerase.
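The same check for stretches of four or more consecutive Ts can be run on any list of oligos outside the web page. A minimal sketch (the function name `poly_t_runs` is hypothetical):

```python
import re


def poly_t_runs(oligo, run_length=4):
    """Return (start, end) pairs (0-based, end exclusive) of every stretch
    of at least `run_length` consecutive Ts in the oligo."""
    pattern = re.compile("T{%d,}" % run_length)
    return [(m.start(), m.end()) for m in pattern.finditer(oligo.upper())]
```

An empty list means the oligo contains no stretch that resembles the pol-III termination motif.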
Left panel details:
|START||Start position of the oligo in the input sequence.|
|END||End position of the oligo in the input sequence.|
|STRAND||Strand (+ or -) of the input sequence where the oligo was found.|
|OLIGO||Nucleotide sequence of the oligo candidate. PAM is also indicated.|
|ONTARGETS||Number of alignments with zero mismatches in the reference genome.|
|OFFTARGETS||Number of alignments with one or more mismatches in the reference genome.|
|GENES||Number of overlapping or nearby genes. Both on-targets and off-targets are considered.|
|SCORE||Aggregated score (Sguide) based on the number and quality of off-targets (see BOX 1 for details)|
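The filtering and sorting offered by the left panel can be reproduced on a downloaded table. The sketch below is only an illustration: it assumes the rows have already been loaded as dictionaries keyed by the column names above, and it supports a small filter syntax (`>`, `>=`, `<`, `<=`, `=`) that is an assumption here, since only the ">99" form is documented for the page.

```python
import operator
import re

OPERATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "=": operator.eq,
}


def parse_filter(expression):
    """Turn a text filter such as '>99' into a predicate on numbers."""
    match = re.fullmatch(r"\s*(>=|<=|>|<|=)?\s*(-?\d+(?:\.\d+)?)\s*", expression)
    if match is None:
        raise ValueError(f"cannot parse filter: {expression!r}")
    compare = OPERATORS[match.group(1) or "="]
    threshold = float(match.group(2))
    return lambda value: compare(value, threshold)


def filter_and_sort(rows, column, expression, sort_by="SCORE", descending=True):
    """Keep rows whose `column` passes the filter, sorted by `sort_by`."""
    keep = parse_filter(expression)
    selected = [row for row in rows if keep(row[column])]
    return sorted(selected, key=lambda row: row[sort_by], reverse=descending)
```

For example, `filter_and_sort(rows, "SCORE", ">99")` keeps only candidates scoring above 99 and lists the best one first.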
RIGHT PANEL: Target Details of Selected Oligo Candidate
When an oligo is clicked on the left panel, target details will
appear in the form of minibrowsers illustrating the hybridization sites in the genome.
Above each minibrowser, the Score (Soff), chromosome, coordinates, strand and overlapping genes
(linked to ENSEMBL pages) are indicated for each genomic hit. See BOX 1 for details on score calculations.
Minibrowsers in yellow correspond to ON-TARGETS (perfect alignments).
Minibrowsers in grey correspond to potential OFF-TARGETS (alignments with mismatches).
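The distinction between on-targets and off-targets comes down to counting mismatches between the oligo and the genomic site it aligns to. A minimal sketch, assuming both sequences are given in the same orientation and have the same length (the function names are hypothetical):

```python
def mismatch_positions(oligo, site):
    """0-based positions where the oligo and the genomic site differ."""
    if len(oligo) != len(site):
        raise ValueError("oligo and site must have the same length")
    return [i for i, (a, b) in enumerate(zip(oligo.upper(), site.upper()))
            if a != b]


def classify_hit(oligo, site):
    """'on-target' for a perfect alignment, 'off-target' otherwise."""
    return "off-target" if mismatch_positions(oligo, site) else "on-target"
```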
For a quick interpretation of alignments, the two DNA strands and the oligo are displayed.
Use the zooming tool on the right to increase/reduce the flanking area.
Click the ENSEMBL logo to open the region in the original ENSEMBL genome browser, which shows more features and provides links to additional information associated with the genomic region.
TIP: double-click on any minibrowser region to expand/collapse genes (useful to view splicing variants).
A comparison between Breaking-Cas and CRISPR Design Tool (MIT) is available here (pdf). Results are very similar yet not
identical, probably due to small differences in estimation of pairwise distance between mismatches for each off-target.
External Tools for Evaluating sgRNA Efficiency
Not only the number and characteristics of potential off-targets should
be taken into account when designing sgRNA, but also its on-target
efficiency. Thus, it is advisable to use Breaking-Cas
in conjunction with a tool aimed at predicting on-target efficiency,
since the final goal is to find sgRNAs with high on-target efficiency
and a low number of off-targets.
sgRNA Scorer 1.0 (Chari et al, 2015)
is an interactive web tool that can be
used to assign a predicted activity to sgRNAs selected by Breaking-Cas
(or other similar tools). The predicted activity is based on support vector machine models
from experimental results obtained with Cas9 from S. pyogenes or S. thermophilus
in human 293T cells, so its application is restricted to CRISPR/Cas9 experiments.
sgRNA Designer (Doench et al, 2014) is
another interactive tool that
provides ranked lists of sgRNAs and their predicted efficiency, which can be used in combination with Breaking-Cas results
(restricted to S. pyogenes Cas9 nuclease). In this case, the predictor is based on the
analysis of the ability of a pool of 1,814 sgRNAs, covering all possible target sites
of a panel of several human and mouse
genes, to produce null alleles of the target gene.
WU-CRISPR (Wong et al, 2015) uses the same experimental data as the sgRNA Designer
predictor. The authors compared the most potent sgRNAs
(top 20% in ranking) with the least potent sgRNAs (bottom 20%) and found sequence features (different from those used for sgRNA Designer)
related to sgRNA efficiency. These new sequence characteristics were incorporated into an algorithm
available as a web tool or stand-alone package.
SSC (Xu et al, 2015)
is a web page where the user can search a DNA sequence for sgRNAs compatible
with CRISPR/Cas9. This tool predicts the efficiency of each sgRNA based on sequence features extracted from the analysis of the
outcome of several published CRISPR/Cas9 experiments. The method can consider different features
for CRISPR knockout (native Cas9) and CRISPR inhibition/activation experiments (dCas9).
Several algorithms have been proposed to analyze sgRNA
features that determine efficiency, defined as the capacity
to generate indels at target sites.
As efficiency is specific to each particular Cas nuclease,
these algorithms focus mainly on sgRNAs for the
widely used Cas9 and are not generalizable to other systems.
BOX 1: Score calculation details
A. SPECIAL CASE: off-target score calculation for Cas9 from Streptococcus pyogenes:
For each off-target, the probability of being a true secondary target for Streptococcus pyogenes's Cas9
is estimated as described in Optimized CRISPR Design tool (Zhang Lab, MIT):
|Position (p)||Weight (W)|
Table 2. Streptococcus pyogenes's Cas9 position-dependent weights for mismatches (crispr.mit.edu/about)
A score (Soff) is calculated for each off-target based
on the number and position of the mismatches. The higher the score, the higher the
probability of acting as a true secondary Cas9 site. In general, for Streptococcus pyogenes's Cas9, mismatches at the last positions
(close to the PAM) strongly decrease the off-target's score. The formula is the product of three factors,
T1 x T2 x T3, which are defined in terms of:
the positions with mismatches;
the effect of each mismatch position (see Table 2; from Hsu et al., Nature Biotechnology 2013);
the averaged pairwise distance between mismatches (computed as an approximation);
the total number of mismatches.
The above formula takes into account the influence of the mismatches due to their position (T1), the effect of the mean pairwise
distance between mismatches (T2) and penalizes targets
with many mismatches (T3).
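For reference, the formulation published for the MIT CRISPR Design tool, which this box follows, can be written as below. This is a sketch rather than the server's exact expression: the symbols M (set of mismatch positions), d-bar (mean pairwise distance between mismatches) and n (number of mismatches) are names chosen here for illustration, p and W follow Table 2, the factor of 100 is an assumption made so that the scale matches the imputed on-target score of 100, and the server approximates d-bar instead of computing it exactly.

```latex
S_{\mathrm{off}} = 100 \times
  \underbrace{\prod_{p \in M} \left(1 - W_p\right)}_{T_1} \times
  \underbrace{\frac{1}{\dfrac{19 - \bar{d}}{19} \times 4 + 1}}_{T_2} \times
  \underbrace{\frac{1}{n^{2}}}_{T_3}
```

Written this way, a weight of 0 leaves T1 unchanged and a weight of 1 at any mismatched position drives the whole score to 0, in agreement with the description of the positional weights.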
B. GENERAL CASE: off-target score calculation when there is no experimental evidence for mismatch weights:
If there is no experimental evidence for the positional influence of mismatches, the probability of each off-target being a true secondary target
is estimated as above, except that all positional weights are set to "0", so the factor of the equation
that depends on the mismatch positions (T1) is always 1.
Also, in the general case, the "19" in the T2 term of the formula is replaced by oligosize-1.
It is possible to enter customized positional weights (values between 0 and 1, where "0" is a fully allowed mismatch and "1" a fully forbidden mismatch)
for nucleases for which the user has experimental evidence, but modify them at your own risk.
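The special and general cases can be sketched with a single function that takes the positional weights and the oligo size as parameters. Several points are assumptions rather than the server's implementation: the factor of 100, the use of the exact mean pairwise distance (the server uses an approximation), and setting T2 to 1 when there are fewer than two mismatches. The Cas9 weights listed are the values commonly reproduced from Hsu et al. (2013); they should be checked against the server's preset before being relied upon.

```python
from itertools import combinations
from math import prod

# Positional weights for S. pyogenes Cas9, 5' to 3' (position 20 is next to
# the PAM). ASSUMPTION: values as commonly reproduced from Hsu et al. (2013);
# verify against the preset offered by the server.
HSU_WEIGHTS = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
               0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804,
               0.685, 0.583]


def off_target_score(mismatch_positions, weights=None, oligo_size=20):
    """Sketch of Soff = 100 x T1 x T2 x T3 for one alignment.

    mismatch_positions: 0-based positions of the mismatches in the oligo.
    weights: positional weights (0 = tolerated, 1 = forbidden); None means
             the general case, where every weight is 0 and T1 is 1.
    """
    n = len(mismatch_positions)
    if n == 0:
        return 100.0  # imputed score of a perfect alignment
    if weights is None:
        weights = [0.0] * oligo_size
    if len(weights) != oligo_size or max(mismatch_positions) >= oligo_size:
        raise ValueError("positions and weights must fit the oligo size")

    # T1: positional effect of each mismatch.
    t1 = prod(1.0 - weights[p] for p in mismatch_positions)

    # T2: effect of the mean pairwise distance between mismatches.
    # ASSUMPTION: exact mean, and T2 = 1 for a single mismatch.
    if n < 2:
        t2 = 1.0
    else:
        pairs = list(combinations(sorted(mismatch_positions), 2))
        mean_distance = sum(b - a for a, b in pairs) / len(pairs)
        span = oligo_size - 1  # the "19" of the Cas9 case
        t2 = 1.0 / (((span - mean_distance) / span) * 4 + 1)

    # T3: penalty for the total number of mismatches.
    t3 = 1.0 / n ** 2
    return 100.0 * t1 * t2 * t3
```

Under these assumptions, `off_target_score([19], weights=HSU_WEIGHTS)` scores a single PAM-proximal mismatch, while `off_target_score([9, 10], oligo_size=23)` applies the general case to a 23 nt oligo with no positional information.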
C. Aggregated score for guide candidates:
The sum of all Soff for a candidate guide is used for calculating
a global score (Sguide) as:
This Sguide can be used as the main criterion to select gRNA candidates: the higher the
aggregate score for a gRNA candidate, the less "problematic" its off-targets will be.
Oligo alignments with no mismatches (on-targets) are not included
in the above formulas and have an imputed score (Soff) of 100.
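As an illustration of the aggregation step, the sketch below follows the formulation published for the MIT CRISPR Design tool, 100 / (100 + sum of Soff), rescaled to a 0-100 range. Both the exact expression and the rescaling are assumptions, since the formula itself is not reproduced in this text; the function name `guide_score` and the `(mismatches, Soff)` input layout are hypothetical.

```python
def guide_score(hits):
    """Aggregate the off-target scores of one candidate guide.

    hits: iterable of (number_of_mismatches, soff) pairs, one per alignment.
    Perfect alignments (on-targets, zero mismatches) are left out of the sum.
    """
    off_target_sum = sum(soff for n_mismatches, soff in hits
                         if n_mismatches > 0)
    return 100.0 * 100.0 / (100.0 + off_target_sum)
```

A guide with no off-targets scores 100, and the score falls as the summed off-target scores grow, which matches the interpretation that higher values indicate less problematic off-targets.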