Informatica 30 (2006) 357–364
A Three-Phase Algorithm for Computer Aided siRNA Design
Saint Joseph College, West Hartford, CT 06117, USA
Superarray Bioscience Corporation, 7320 Executive Way, Frederick, MD 21704, USA
Yufang Wang and Benjamin Ray Seyfarth
University of Southern Mississippi, Hattiesburg, MS 39406, USA
Keywords: siRNA, RNA interference, three-phase, Smith-Waterman, BLAST
Received: July 10, 2005
As our knowledge of RNA interference accumulates, it is desirable to incorporate as many selection rules as possible into a computer-aided siRNA design tool. This paper presents an algorithm for siRNA selection in which nearly all published siRNA design rules are categorized into three groups and applied in three phases according to their identified impact on siRNA function. The tool gives users maximum flexibility to adjust each rule and to reorganize the rules among the three phases based on their own preferences and/or empirical data. When generally accepted stringency settings were used to select siRNAs for 23,484 human genes represented in the RefSeq Database (NCBI, human genome build 35.1), we found 1,915 protein-coding genes (8.2%) for which no suitable siRNA sequence could be found. Curiously, two of these 1,915 genes had validated siRNA sequences published. After close examination of another 105 published human siRNA sequences, we conclude that (A) many of the published siRNA sequences may not be the best for their target genes; (B) some of the published siRNAs may risk off-target silencing; and (C) some published rules have to be compromised in order to select a testable siRNA sequence for the hard-to-design genes.
Summary (Povzetek): An algorithm for genome processing is presented.
Since the seminal paper published by Craig C. Mello’s
group in 1998, RNA interference (RNAi) has
emerged as a powerful technique to knock out/down the
expression of target genes for gene function studies in
various organisms [2,3,4]. What is truly remarkable
about the RNAi effect is that it is sequence-specific. This
means that as long as we know the sequence of the
transcript to be targeted, we can design a short double-stranded RNA (small interfering RNA or siRNA) to knock down, if not eliminate, the expression of the target
gene without changing the genetic make-up of the cells.
Compared to the anti-sense oligonucleotide technology
developed earlier [5,6], RNAi is much more effective
because RNAi is achieved by catalytic components
within the cell [1,7,8,9].
Understandably, designing the best siRNA has
become the subject of intense competition among academic
research groups as well as commercial providers of
siRNA. The following is a summary of some major
published design rules.
The length of functional siRNAs: The length of
siRNA ranges from 19 to 30 base pairs (bps)
[2,10,11]. Double-stranded RNA longer than 30 bps
is likely to induce an antiviral interferon response, which
causes a general shutdown of cellular translation instead
of gene-specific RNAi [12,13,14].
The GC content of functional siRNA: The optimal
GC content of siRNA should be between 30% and
55% [10,14,15]. GC-rich sequences, in general, have
the tendency to form quadruplex or hairpin
structures. Sequences with GC stretches of more than 7
in a row may form duplexes too stable to be
unwound [16,17,18,19]. On the other hand,
sequences with extremely low GC content cannot
form stable siRNA duplexes.
The thermo-stability bias at the 5’ end of the
antisense strand: Since it is desirable to have only
the antisense strand incorporated into the RISC,
lowering the thermo-stability at the 5’ end
of the antisense strand can promote helicase unwinding
of siRNA duplexes from this end [17,20,21].
Concerning tandem repeats and palindromes:
Since sequences containing tandem repeats or
palindromes may form internal fold-back structures,...
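The length, GC-content, and 5’-end rules summarized above can be illustrated with a minimal Python sketch. This is not the three-phase algorithm of the paper. The function names, the default thresholds (taken from the ranges quoted above), and especially the A/U-count proxy for thermo-stability bias are assumptions made for illustration only; a real tool would use a proper thermodynamic model.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def longest_gc_run(seq):
    """Length of the longest uninterrupted stretch of G/C bases."""
    runs = re.findall(r"[GC]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def end_au_bias(sense, window=4):
    """Crude, hypothetical proxy for thermo-stability bias.

    The 5' end of the antisense strand pairs with the 3' end of the
    sense strand, so more A/U there (relative to the sense 5' end)
    suggests a less stable antisense 5' end. A positive value is
    favourable under this assumption.
    """
    sense = sense.upper().replace("T", "U")
    au_at_3p = sum(base in "AU" for base in sense[-window:])
    au_at_5p = sum(base in "AU" for base in sense[:window])
    return au_at_3p - au_at_5p


def passes_basic_rules(sense, min_len=19, max_len=30,
                       min_gc=0.30, max_gc=0.55, max_gc_run=7):
    """Apply the length, GC-content, and GC-stretch rules to a sense strand."""
    if not (min_len <= len(sense) <= max_len):
        return False
    if not (min_gc <= gc_fraction(sense) <= max_gc):
        return False
    # "GC stretches over 7 in a row" are rejected.
    return longest_gc_run(sense) <= max_gc_run


candidate = "GCAUGCAUAGCUAGCUAAU"  # made-up 19-nt sense strand
print(passes_basic_rules(candidate), end_au_bias(candidate))
```

Keeping every threshold as a keyword argument mirrors the flexibility described in the abstract, where each rule can be adjusted by the user.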