Современные методы геномных исследований

реклама
Современные методы
геномных исследований и
секвенирования ДНК
Константин Валерьевич Крутовский
В.н.с. Лаборатории популяционной генетики
Института общей генетики им. Н. И. Вавилова РАН
Нayчный pyководитeль Научно-образовательного центра геномных
исследований и Лаборатории лесной геномики
Сибирского федерального университета
Профессор Отделения лесной генетики и селекции
Гёттингенского университета, Германия
Адьюнкт профессор Отделения по изучению и управлению экосистем
Техасского агро-механический университета, США
Лекция студентам СФУ 10 июня 2014 г.
1
Learning objectives and outcomes
• Brief introduction into genomics (structure,
objectives, etc.)
• Genome-sequencing methods:
‒ Sanger sequencing
‒ Next-generation sequencing (NGS)
technology
• de novo sequencing, resequencing, and
target sequencing
Лекция студентам СФУ 10 июня 2014 г.
2
Что такое «Геномика»?
• Термин «геном» (genome) был предложен немецким ботаником проф. Hans
Winkler (1877- 1945) в 1920 г. (University of Hamburg), который объединил
термины «ген» (“gene”) и «хромосома» (“chromosome”) для обозначения
одновременно всех генов во всех хромосомах ядра клетки
• Термин «геномика» (genomics) был предложен относительно недавно в 1986 г.
Thomas Roderick (Jackson Laboratory, USA) для нового журнала Genomics и
описания научной дисциплины связанной с секвенированием, картированием и
анализом генома
• Геномика более широкое понятие в настоящее время и охватывает сравнение
геномов разных видов (comparative genomics), их эволюцию (evolutionary
genomics) и функционирование генома в целом (functional genomics)
Геномика – это изучение генов и их функций в их полной
совокупности и взаимодействии
Лекция студентам СФУ 10 июня 2014 г.
3
Основы геномной структуры
Ген
Строчка в тексте/Предложение
(состоящее из 4-х «букв»-нуклеотидов A, T, C и G,
и 3-х буквенных «слов»-триплетов или кодонов)
Хромосома
Глава
Геном
Генофонд
Основная задача геномики - полное секвенирование и расшифровка генома
Лекция студентам СФУ 10 июня 2014 г.
4
Современные методы секвенирования ДНК
• Выпуск с 2005 г. коммерческих высокопроизводительных секвенаторов ДНК на основе
новых технологий массивного параллельного
секвенирования компаниями Roche и Illumina
• Создание в 2012 г. Научно-образовательного центра
геномных исследований Сибирского федерального
университета при поддержке Отдела генетики и
селекции Центра защиты леса Красноярского края и
Лаборатории лесной генетики и селекции Института
леса им. В. Н. Сукачева СОРАН (http://genome.sfukras.ru) оборудованного самым производительным
секвенатором Illumina HiSeq 2000
Лекция студентам СФУ 10 июня 2014 г.
5
Проблема полногеномного секвенирования с помощью новых секвенаторов:
возможность секвенировать только короткими фрагментами 100-150 но
Выделение
тотальной
геномной ДНК
Дополнительное
фрагментирование
ДНК
Геном
Длинные
фрагменты
ДНК
Биоинформатическая обработка на
суперкомпьютерах миллионов «ридов»
и сборка их в контиги и скаффолды
Короткие
фрагменты
ДНК
Приготовление
библиотек для
секвенирования
и кластеризация
(cBot)
Секвенирование на HiSeq 2000 и
генерация сотен миллионов
коротких прочтений («ридов»)
Референсный
Геном
Лекция студентам СФУ 10 июня 2014 г.
6
Nitrogenous, Nucleoside and Nucleotide Bases
Нуклеотид
(Азотистые основания )
(nucleoside with a
phosphate at 5’ carbon)
пуриновые А.о.:
(A)
(G)
аденин
гуанин
пиримидиновые А.о.
(C)
цитозин
тимин
(T)
(U) урацил
Nucleoside
(nitrogenous base linked to a 2-deoxyD-ribose at 1’ carbon)
Нуклеозиды —гликозиламины, содержащие азотистое
основание, связанное с сахаром (рибозой или дезоксирибозой).
Лекция студентам СФУ 10 июня 2014 г.
7
Synthesis of a Nucleotide:
Adenosine monophospate (AMP)
Лекция студентам СФУ 10 июня 2014 г.
8
Nucleotide polymer
• DNA Polymerase
“backbone”
цепь
“backbone”
цепь
A
T
G
C
Phosphodiester bond
C
G
Лекция студентам СФУ 10 июня 2014 г.
9
DNA Sequencing Methods
1nd generation sequencing methods:
─ chemical degradation of nucleotides method (Allan Maxam and Walter Gilbert, 1977)
─ chain-termination or dideoxy method (Frederick Sanger, 1977)
2nd generation high-throughput massively parallel shotgun sequencing methods:
– sequencing-by-synthesis methods:
 pyrosequencing by Jonathan Rothberg (Marguilis et al. 2005) (454 Life Sciences, a subsidiary of
Roche Diagnostics, originally a subsidiary of CuraGen Corporation) based on fixing fragmented
(nebulized) and adapter-ligated DNA fragments to small DNA-capture beads PCR amplified in a
water-in-oil emulsion in a PicoTiterPlate, and then sequenced using DNA polymerase, ATP
sulfurylase, and luciferase (to generate light for detection of the individual nucleotides added to the
template DNA) and adding sequentially 4 DNA nucleotides in a fixed order using the Genome
Sequencer FLX Instrument (there is also a downscaled Junior version);
 based on reversible dye-terminators, the DNA are extended one nucleotide at a time followed by
image acquisition - the camera takes images of the fluorescently labeled nucleotides, then the dye
along with the terminal 3' blocker is chemically removed from the DNA, allowing the next cycle
(Solexa, now Illumina) using Illumina Genome Analyzer II, HiSeq and MiSeq instruments;
 semiconductor sequencing developed by Ion Torrent Systems Inc. founded by Jonathan Rothberg
(now Life Technologies), based on the detection of hydrogen ions that are released during the
DNA synthesis, as opposed to the optical methods used in other sequencing systems (Ion Torrent
and Ion Proton Sequencers - Personal Genome Machines).
– sequencing-by-ligation method developed by Applied Biosystems (now Life Technologies), the
DNA is amplified by emulsion PCR and then sequenced using the SOLiD Instrument.
Лекция студентам СФУ 10 июня 2014 г.
10
DNA Sequencing Methods
3rd generation single molecular (SM) based sequencing methods:
– sequencing-by-synthesis methods:
 SM method by Helicos Biosciences uses DNA fragments with added poly-A tail
adapters which are attached to the flow cell surface. The next steps involve extensionbased sequencing with cyclic washes of the flow cell with fluorescently labeled
nucleotides (one nucleotide type at a time, as with the Sanger method). The reads are
short, up to 55 bp per run, but there are recent improvements;
 Single molecule real time (SMRT) sequencing by Pacific Biosciences is based on the
sequencing by synthesis of the DNA in zero-mode wave-guides (ZMWs) - small welllike containers with the signal capturing tools located at the bottom of the well using
DNA polymerase attached to the ZMW bottom and producing reads of up to 15 Kbp,
with mean read lengths of 2.5 to 2.9 Kbp;
4th generation single molecular (SM) based sequencing methods:
Nanopore DNA sequencing by Oxford Nanopore Technologies is based on the readout
of electrical signals occurring at nucleotides passing through pores in membranes created
by the pore-forming protein α-hemolysin covalently bound with cyclodextrin within the
nanopore that will bind transiently to the DNA molecule being detected. The DNA passing
through the nanopore changes its ion current. Each type of the nucleotide blocks the ion
flow through the pore for a different period of time (GridION system and miniaturised
MinION instrument).
Лекция студентам СФУ 10 июня 2014 г.
11
Dideoxy (Sanger) Method
Dideoxynucleotides without a hydroxyl
group at 3'-end, which prevents strand
extension are used together with normal
nucleotides in the sequencing reaction:
PPP
O
5'
CH2
O
BASE
Frederick Sanger
1918 –2013
OH
3'
Лекция студентам СФУ 10 июня 2014 г.
12
Dideoxy (Sanger) Method
As a result there is a mixture of fragments of different size in a sequencing reaction:
Sequencing cycle:
• 94°C: DNA denaturing
• 45°C: primer annealing
A
A
A
T
T
T
• 60-72°C: thermostable DNA
polymerase primer extension
Repeat 25-35 times
ddATP in the reaction: a ddA will be
occasionally added to the growing strand
whenever there is a T in the template strand
Лекция студентам СФУ 10 июня 2014 г.
13
How to visualize DNA fragments?
• Radioactivity
– radioactive isotope labeled primers (kinase with 32P)
– radioactive isotope labeled dNTPs (gamma 35S or 32P)
• Fluorescence
– ddNTPs chemically synthesized to contain fluorescent dye
– each ddNTP fluoresces at a different wavelength allowing
identification
• Polyacrylamide gel electrophoresis - high resolution of
fragments differing by a single dNTP
– slab gels
– capillary gels: require only a tiny amount of sample to be
loaded, run much faster than slab gels, best for high
throughput sequencing
Лекция студентам СФУ 10 июня 2014 г.
14
Sequencing gel electrophoresis of DNA fragments amplified
using radioactive isotope labeled primers or ddNTP
• Radioactively labelled
primer or dNTPs in
sequencing reaction
• 4 different ddNTPs used in 4
separate reactions,
respectively
• Separation of sequencing
products by slab gel
electrophoresis
• Autoradiography
Лекция студентам СФУ 10 июня 2014 г.
15
Sl b
l
An automated sequencer using
mixed ddNTPs labelled by 4
different fluorescent dyes
Single lane or capillary output:
Slab or capillary gel electrophoresis:
Лекция студентам СФУ 10 июня 2014 г.
17
Лекция студентам СФУ 10 июня 2014 г.
18
Automated Version of the Dideoxy Method
ABI PRISM 3730xLGenetic Analyzer
96 capillaries
Лекция студентам СФУ 10 июня 2014 г.
19
Dideoxy (Sanger) Method
Advantages:
• relatively long fragments (500-750 bp)
• low frequency of sequencing errors (“gold standard”)
Disadvantages:
• expensive
• laborious
• low productivity
ABI PRISM Genetic Analyzers:
Лекция студентам СФУ 10 июня 2014 г.
20
Next Generation Sequencing (NGS)
Overview of Next generation sequencing (NGS) technologies:
second, third and fourth generations
Second generations:
•
Sequencing-by-synthesis using luciferase and light signal detection
- Pyrosequencing (454 / Roche)
•
Sequencing-by-synthesis using reversible nucleotide terminators
and fluorescent signal detection - Illumina sequencing technology
•
Application of Illumina next-generation sequencing for marker
discovery and genotyping




Whole genome resequencing
Whole transcriptome sequencing
Genotyping by Sequencing
Genomic complexity reduction and target resequencing
Лекция студентам СФУ 10 июня 2014 г.
21
Available 2nd-Generation Sequencing Technologies
500-800 bp
www.roche-applied-science.com
www.illumina.com
/2x300
1 x 75 bp
2 x 50 bp
www.lifetechnologies.com
Niedringhaus et al. (2011) Landscape of Next-Generation Sequencing Technologies.
Anal. Chem. 83: 4327–4341
Illumina dominates the market with 71% of the market share (revenue
of about $1.6 billion in 2013), followed by ABI Life Technologies at
16%, Roche at 10%, and PacBio at 3%.
Лекция студентам СФУ 10 июня 2014 г.
22
2nd Generation Sequencing:
Pyrosequencing method – Basic Principle
• Pyrosequencing is based on the generation of light signal through release of diphosphate
or pyrophosphate (PPi) on nucleotide addition: (NA)n + dNTP  (NA)n+1 + PPI
• PPi is used to generate ATP from adenosine phosphosulfate (APS): APS + PPI  ATP
• ATP and luciferase generate light by conversion of luciferin to oxyluciferin.
pyrophospate
DNA Polymerase I (from E. coli)
adenosine phosphosulfate
adenosine triphosphate
(from fireflies; oxidizes
luciferin and emits light)
– visible light is generated and is
proportional to the number of
incorporated nucleotides
– 1 pmol DNA = 6×1011 ATP = 6×109
photons at 560 nm
Лекция студентам СФУ 10 июня 2014 г.
23
Pyrosequencing: Preparation of DNA library
• shearing genomic DNA to small DNA fragments 500-800 bp long
• attaching single DNA fragments to very small plastic beads (one
fragment per bead)
• emulsion-based clonal PCR amplification (emPCR) of the DNA on
each bead to cover each bead with a cluster of identical fragments to
enhance the light signal
• placing each bead in a separate well on a PicoTiterPlate, a fiber optic
chip with up to 1.6 million wells (A mix of enzymes such as DNA
polymerase, ATP sulfurylase, and luciferase are also packed into the
well.)
• The PicoTiterPlate is then placed into the GS FLX System for
sequencing.
Лекция студентам СФУ 10 июня 2014 г.
24
Pyrosequencing – Basic Principle
• Sequence-by-synthesis via DNA polymerase directed chain extension,
one base at a time in the presence of a reporter (luciferase). Each
nucleotide is added separately in a separate cycle.
• Only one of four will generate a light signal. Luciferase will emit a
photon of light in response to the pyrophosphate (PPi) released upon
nucleotide addition by DNA polymerase
• The remaining nucleotides are either (1) washed away or (2) removed
enzymatically by apyrase (nucleotide degradation enzyme).
• The light signal is recorded on a pyrogram:
DNA sequence: T A G T CC GG A
Nucleotide added: A T C A G
C T
Лекция студентам СФУ 10 июня 2014 г.
25
Pyrosequencing Results:
Height of peak
indicates the number
of dNTPs added
This sequence: TTTGGGGTTGCAGTT
Лекция студентам СФУ 10 июня 2014 г.
26
Pyrosequencing
25 Mbp in about 4 hours
APS = Adenosine phosphosulfate
http://www.youtube.com/watch?v=bFNjxKHP8Jc
Лекция студентам СФУ 10 июня 2014 г.
27
Pyrosequencing
Sequencing by synthesis
• Advantages:
–
–
–
–
–
–
accurate - low frequency of sequencing errors
relatively long fragments (500-800 bp)
parallel processing
automated
no need for labeled primers and nucleotides
no need for gel electrophoresis
• Disadvantages:
–
–
–
–
expensive
laborious
low productivity
nonlinear light response after more than 5-6 identical nucleotides
Лекция студентам СФУ 10 июня 2014 г.
28
Illumina: Preparation a DNA library for
sequencing - Cluster formation
Samples are loaded in a
flowcell with 8 lanes and
clusters attached to the cell
surface are generated and
prepared for sequencing
using cBot instrument
Hybridization and
second strand
synthesis
Flowcell with 8 separate lanes and
primers complementary to adapters
immobilized on the flowcell surface.
Bridge PCR
Linearization and primer
annealing
Лекция студентам СФУ 10 июня 2014 г.
cBOT
HiSeq2000
29
Illumina: Sequencing by Synthesis (SBS)
• Sequencing is performed by addition of all
4 labeled reversible terminators (1 base is
added per cycle for all the fragments).
• The block and phluorophore are then
removed, and another cycle starts up to the
desired read length.
HiSeq: 120-150 Millions of clusters per lane
(8 lanes per flowcell); up to 150 bp × 2 (total
output = 600 Gb for a full run with 2
flowcells)
MiSeq: 25 Millions of clusters per flowcell
with a single lane; up to 300 bp × 2 (total
output = 15 Gb for a full run)
Лекция студентам СФУ 10 июня 2014 г.
30
Illumina sequencing technology in 12 steps
http://www.illumina.com/downloads/SS_DNAsequencing.pdf
Лекция студентам СФУ 10 июня 2014 г.
31
1. Fragment genomic DNA
and ligate adapters
2. Attach DNA to surface
3. Bridge amplification
DNA
adapters
4. Fragments become double
stranded
5. Denature the doublestranded molecules
6. Complete amplification
Randomly fragment genomic DNA and ligate
adapters to both ends of the fragments
Лекция студентам СФУ 10 июня 2014 г.
32
adapter
DNA
fragment
adapter
dense
lawn of
primers
1. Fragment genomic DNA and
ligate adapters
2. Attach DNA fragments to
surface
3. Bridge amplification
4. Fragments become double
stranded
5. Denature the doublestranded molecules
6. Complete amplification
Bind single-stranded fragments randomly to
the inside surface of the flow cell channels
Лекция студентам СФУ 10 июня 2014 г.
33
1. Fragment genomic DNA and
ligate adapters
2. Attach DNA fragments to
surface
3. Bridge amplification
4. Fragments become double
stranded
5. Denature the doublestranded molecules
6. Complete amplification
Add unlabeled nucleotides and enzyme to
initiate solid-phase bridge amplification
Лекция студентам СФУ 10 июня 2014 г.
34
1. Fragment genomic DNA and
ligate adapters
2. Attach DNA fragments to
surface
attached terminus free
terminus
attached
terminus
3. Bridge amplification
4. Fragments become double
stranded
5. Denature the double- stranded
molecules
6. Complete amplification
The enzyme incorporates nucleotides to build doublestranded bridges on the solid-phase substrate
Лекция студентам СФУ 10 июня 2014 г.
35
1. Fragment genomic DNA and
ligate adapters
2. Attach DNA fragments to
surface
attached
attached
3. Bridge amplification
4. Fragments become double
stranded
5. Denature the doublestranded molecules
6. Complete amplification
Denaturation leaves single-stranded
templates anchored to the substrate
Лекция студентам СФУ 10 июня 2014 г.
36
1. Fragment genomic DNA and
ligate adapters
2. Attach DNA fragments to
surface
3. Bridge amplification
4. Fragments become double
stranded
5. Denature the double- stranded
molecules
clusters
6. Complete amplification
7. Reversed complement
Several million dense clusters of double- strands are cleaved and
stranded DNA are generated in each
washed away
channel of the flow cell
Лекция студентам СФУ 10 июня 2014 г.
37
8. First nucleotide base adding and
the first nucleotide base synthesis
9. Image first base
10. The block and phluorophore are then
removed
11. Second nucleotide base adding and
the second nucleotide base synthesis
12. Image second base
13. The block and phluorophore are then
removed
Laser
14. Sequencing over multiple chemistry
cycles
The first sequencing cycle begins
by adding four labeled reversible
terminators, primers, and DNA 15. Align data
Лекция студентам СФУ 10 июня 2014 г.
polymerase
38
8. First nucleotide base adding and the
first nucleotide base synthesis
9. Image first base
11. The block and phluorophore are then
removed
12. Second nucleotide base adding and
the second nucleotide base synthesis
13. Image second base
14. The block and phluorophore are then
removed
15. Sequencing over multiple chemistry
cycles
After laser excitation, the emitted 16. Align data
fluorescence from each cluster is
captured and the first base is identified
Лекция студентам СФУ 10 июня 2014 г.
39
8. First nucleotide base adding and the
first nucleotide base synthesis
9. Image first base
10. The block and phluorophore are then
removed
11. Second nucleotide base adding and
the second nucleotide base synthesis
12. Image second base
13. The block and phluorophore are then
removed
Laser
14. Sequencing over multiple chemistry
cycles
The next cycle repeats the
15. Align data
incorporation of four labeled
reversible terminators, primers,
and DNA polymerase Лекция студентам СФУ 10 июня 2014 г.
40
8. First nucleotide base adding and the
first nucleotide base synthesis
9. Image first base
10. The block and phluorophore are then
removed
11. Second nucleotide base adding and
the second nucleotide base synthesis
12. Image second base
13. The block and phluorophore are then
removed
14. Sequencing over multiple chemistry
cycles
After laser excitation the image is
captured as before, and the identity
of the second base is recorded.
15. Align data
Лекция студентам СФУ 10 июня 2014 г.
41
8. First nucleotide base adding and the
first nucleotide base synthesis
9. Image first base
10. The block and phluorophore are then
removed
11. Second nucleotide base adding and
the second nucleotide base synthesis
12. Image second base
13. The block and phluorophore are then
removed
14. Sequencing over multiple
chemistry cycles
15. Process sequence reads, generate
contigs, map (align) reads or contigs
to the reference sequence, if it is
available
www.youtube.com/watch?v=l99aKKHcxC4
42
The sequencing cycles are repeated
to determine the sequence of bases
in a fragment, one base at a time.
Лекция студентам СФУ 10 июня 2014 г.
8. First nucleotide base adding and the
first nucleotide base synthesis
9. Image first base
Reference
sequence
10. The block and phluorophore are then
removed
11. Second nucleotide base adding and
the second nucleotide base synthesis
Unknown
variant
identified
and called
Known
SNP
called
12. Image second base
13. The block and phluorophore are then
removed
14. Sequencing over multiple chemistry
cycles
15. Process sequence reads, generate
The data are aligned and compared
contigs, map (align) reads or contigs
to a reference, and sequencing
to the reference sequence, if it is
differences are identified.
available
43
Лекция студентам СФУ 10 июня 2014 г.
Single, paired-end and multiplexed sequencing
Single read sequencing:
Primer 1
Adapter 1 with
annealing site
for Primer 1
Pair-end sequencing:
Primer 1
Adapter 2
Adapter 1 with
annealing site
for Primer 1
• resequencing
• expression quantification
Adapter 2 with
annealing site
for Primer 2
Primer 2
• de novo sequencing for better assembly
• allow to resolve better transcript isoforms
• can be used to detect gene fusion
Multiplexed sequencing:
Primer 1
Adapter 1 with
annealing site
for Primer 1
Adapter 2 with
Barcode annealing site
for Primer 2 and
barcode tag
Primer 2
• The sequencing of a 6-nucleotides
barcode on one of the adapters allows to
identify samples pooled together and
sequenced in a single lane.
• Up to 96 samples per lane.
Лекция студентам СФУ 10 июня 2014 г.
44
Illumina Sequencers and Genotypers
Лекция студентам СФУ 10 июня 2014 г.
45
Illumina Sequencers
Лекция студентам СФУ 10 июня 2014 г.
46
Illumina HiSeq 1000, 2000 or 2500
HiSeq 2000
Лекция студентам СФУ 10 июня 2014 г.
47
Illumina MiSeq Desktop Sequencer
Лекция студентам СФУ 10 июня 2014 г.
48
Third-Fourth Generation Sequencing Technologies
• PacificBiosciences (www.pacificbiosciences.com):
PacBioRS
• Complete Genomics (www.completegenomics.com)
(only for human genome)
• Ion Torrent and Ion Proton (Life/ABI)
(www.iontorrent.com): Ion Personal Genome Machine
(PGM™) sequencer
• Oxford Nanopore (www.nanoporetech.com/):
GridION (Fourth generation)
Лекция студентам СФУ 10 июня 2014 г.
49
PacBio Single Molecule Real Time
(SMRT) Sequencing
ZMW = Zero Mode Waveguide
http://www.youtube.com/watch?v=v8p4ph2MAvI
Лекция студентам СФУ 10 июня 2014 г.
50
PacBio Sample prep
Лекция студентам СФУ 10 июня 2014 г.
51
PacBio SMRT Sequencing
Лекция студентам СФУ 10 июня 2014 г.
52
PacBio Base Modification Detection
(Application in development)
methylated
unmethylated
A methylated adenine in the template (top) slows the incorporation of a thymine in the replicating strand of DNA.
The rate of incorporation can be compared to an unmodified version of the same template (bottom) which has a
much faster thymine addition. Differences between the modified and unmodified incorporation rates indicate
potential sites of modified bases. These differences often span multiple bases, creating a distinctive signature.
Flusberg et al. (2010) Nature Methods 7: 461-465
Лекция студентам СФУ 10 июня 2014 г.
53
Ion Torrent & Proton: Personal Genome Machine (PGM)
When a nucleotide is added to a DNA template and is then incorporated into a
strand of DNA, a hydrogen ion is released. The charge from that ion changes the
pH of the solution, which can be detected by a ion sensor.
• reads up to 200-400 (Torrent) or 200 (Proton) bp long
• from 30 Mb to >2 Gb per run (different Torrent chips)
• 10 Gb (Proton)
• max 4-6 M reads (4-7 hours; Torrent), 60-80 M reads (2-4 hours; Proton)
• useful for Amplicon-seq, smallRNA-seq, small genomes sequencing (i.e. bacterial, virus)
(www.iontorrent.com)
http://www.youtube.com/watch?v=yVf2295JqUg
Лекция студентам СФУ 10 июня 2014 г.
54
Oxford Nanopore: The GridION system
• Electrical single-molecule sequencing
• Protein nanopores used as biosensor
• Exonuclease sequencing: combining a protein
nanopore and processive enzyme for the
sequential identification of DNA bases as they
pass through the pore
• Oxford Nanopore signed a commercialisation
agreement with Illumina for this technology,
however commercialisation timelines have not
been disclosed
Лекция студентам СФУ 10 июня 2014 г.
55
The GridION system: Nanopore sensing
T
C
A
Exonuclease sequencing: combining a protein nanopore and
processive enzyme (e.g., a exonuclease) for the sequential
identification of DNA bases as they pass through the pore.
Лекция студентам СФУ 10 июня 2014 г.
56
The GridION system: Electrical sequencing trace
• The system is designed to give ultra-high read length (tens of Kb)
• First generation commercial system is designed to achieve tens of
Gb per day
Лекция студентам СФУ 10 июня 2014 г.
57
Third-Forth Generation Sequencing Technologies
35 bp
200-400 bp
Niedringhaus et al. (2011) Landscape of Next-Generation Sequencing Technologies. Anal.
Chem. 83: 4327–4341
Лекция студентам СФУ 10 июня 2014 г.
58
NGS Platform statistics
Instrument
Illumina HiSeq X (2 flow cells)
Illumina HiSeq 2500 - high output v4
Life Technologies SOLiD – 5500xl
Illumina NextSeq 500
Oxford Nanopore GridION 8000
Ion Torrent - Proton III
Illumina MiSeq v3
Ion Torrent – PGM 318 chip
Ion Torrent – PGM 316 chip
Oxford Nanopore MinION
454 FLX+
Ion Torrent – PGM 314 chip
Pacific Biosciences RS II
Amplification Run time
BridgePCR
BridgePCR
emPCR
BridgePCR
None - SMS
emPCR
bridgePCR
emPCR
emPCR
None - SMS
emPCR
emPCR
None - SMS
3 days
6 days
8 days
30 hrs.
varies
6 hrs.
55 hrs.
7.3 hrs.
4.9 hrs.
≤6 hrs.
20 hrs.
3.7 hrs.
2 hrs.
M of Read /
run
Bases /
read
6000
2000
1410
400
10
500
22
4.75
2.5
0.1
1
0.475
0.03
300
250
110
300
10000
175
600
400
400
9000
650
400
3000
Reagent
Reagent Reagent Cost Gbp / cost / Gb,
Cost / run, $ Cost / Gb, $ /M reads, $
run
$
12750
14950
10503
4000
1000
1000
1442
874
674
900
6200
474
100
7
30
68
33
10
11
109
460
674
1000
9538
2495
1111
2
7
7
10
100
2
66
184
270
9000
6200
998
3333
1800
500
155.1
120
100
87.5
13.2
1.9
1
0.9
0.65
0.19
0.09
7
30
68
33
10
11
109
460
674
1000
9538
2495
1111
‘Benchtop’ NGS technology: will substitute the capillary electrophoresis
(CE) sequencers for common experiments, such as Illumina libraries
verification, amplicon sequencing and small genome sequencing.
Лекция студентам СФУ 10 июня 2014 г.
59
Overview of genome sequencing analysis
Genome
whole genome
shortgun de novo
sequencing
whole genome
resequencing
target
sequencing
de novo
assembling
assembling via
mapping to a
reference sequence
assembling via
mapping to a
target sequence
Genome annotation
(repetitive DNA
elements, proteincoding &other genes)
Лекция студентам СФУ 10 июня 2014 г.
60
Applications of Genome Sequencing
Purpose
Template
Example
genome sequencing
The Genome 10K project (sequencing 10,000
vertebrate species genomes, approximately one for
every vertebrate genus); 1K Plant Genomes Project,
10K fungi genomes
De novo
sequencing
Resequencing
ancient DNA
Extinct Neanderthal genome
metagenomics
Human guts
whole genomes
Sequencing 1000 individual human genomes project
genomic regions
Assessment of genomic rearrangements or diseaseassociated regions
somatic mutations
Transcriptome
full-length transcripts
Noncoding RNAs
Epigenetics
Methylation changes
Sequencing mutations in cancer
Defining regulated messenger RNA transcripts
Identifying and quantifying microRNAs in samples
Measuring methylation changes in cancer
Table 13.15 in Bioinformatics and Functional Genomics by J. Pevsner (2nd ed., Wiley-Blackwell, 2009) p. 538
Лекция студентам СФУ 10 июня 2014 г.
61
Genome sizes in nucleotide base pairs
plasmids
viruses
bacteria
fungi
plants
algae
insects
The size of the human
genome is ~ 3.2 X 109 bp
mollusks
bony fish
The human genome is
thought to contain ~21,000
protein and ~2,000 RNA
coding genes.
amphibians
reptiles
birds
mammals
104
105
106
107
108
109
conifers
1010
1011
1012
http://www3.kumc.edu/jcalvet/PowerPoint/bioc801b.ppt
Лекция студентам СФУ 10 июня 2014 г.
62
Eukaryotic completed genome projects > 2 Gb
Genus, species
Subgroup
Size (Mb)
#chr
Pinus taeda
Land Plants
20150
12
Loblolly pine
Picea abies
Land Plants
20000
12
Norway spruce
Triticum urartu
Land Plants
4940
7
wheat A-genome progenitor
Aegilops tauschii
Land Plants
4360
7
Tausch's goatgrass
Macropus eugenii
Mammals
3800
8
tammar wallaby
Oryctolagus cuniculus
Mammals
3500
22
rabbit
Land Plants
3480
12
pepper
Cavia porcellus
Mammals
3400
31
guinea pig
Homo sapiens
Mammals
3200
23
human
Pan troglodytes
Mammals
3100
24
chimpanzee
Bos taurus
Mammals
3000
30
cow
Dasypus novemcinctus
Mammals
3000
32
nine-banded armadillo
Loxodonta africana
Mammals
3000
28
African savanna elephant
Sorex araneus
Mammals
3000
20
European shrew
Rattus norvegicus
Mammals
2750
21
rat
Nicotiana sylvestris
Land Plants
2636
12
tobacco
Mammals
2400
39
dog
Land Plants
2365
10
corn
Capsicum annuum
Canis familiaris
Zea mays
Лекция студентам СФУ 10 июня 2014 г.
Common name
63
База данных геномных проектов http://www.genomesonline.org
•
•
•
•
Complete Projects 6366
Permanent Drafts 16884
Incomplete Projects 24508
Targeted Projects
920
Лекция студентам СФУ 10 июня 2014 г.
• Organisms 48772
Archaea
873
Bacteria
35373
Eukarya
8155
64
GC content varies across genomes
Bacteria
Number of species
in each GC class
10
5
Plants
5
Invertebrates
3
Vertebrates
10
5
20
30
40
50
60
70
80
GC content (%)
Fig. 13.15 in Bioinformatics and Functional Genomics by J. Pevsner (2nd ed., Wiley-Blackwell, 2009) p. 556
Лекция студентам СФУ 10 июня 2014 г.
65
Genomic markers development and genotyping using next generation sequencing
bar-coded DNA
or mRNA library
pools
individual genomic or
mRNA (tissue-specific or
total)
target-enriched
genomic DNA
bar-coded pools
individual
genomic DNA
LOB 2
LOB 3
est9217
20 cM
LOB 1
LOB 4
LOB 5
image
analysis
LOB 6
est9155
est1623
est8569
est9022
est2253
est1576
est9157
est0048
est8972
est8647
est8500
est8513
est8613
est8596
est0893
est8747
est8837
est9092
est8614
est8612
est8939
est1955
LOB 11
LOB 12
est0674
est8580
est9076
est8732
est2615
est8510
est9036
est1626
est1165
est2610
est1950
est1764
est0624
est8436
est1635
est1643
est2290
est0149
est0500
est1750
est8540
est8564
est9044
est9113
estce8
est8702
est8415
est8429
genotyping and
phenotyping in
mapping or natural
population
LOB 10
est1956
est9156
est9151
est2781
est1454
est9008
est8560
est8781
LOB 9
est9034
est8473
est0739
est2166
est0066
est9102
est9198
LOB 8
est8907
estce9
est0464
LOB 7
est2358
est8531
est1934
est8843
est9053
est8537
next generation
high-throughput
massively parallel
DNA sequencing
est2274
est0606
est8738
est8887
est2009
est8886
est8565
est8725
estc9
est8898
quantitative trait loci
(QTL), candidate gene or
association mapping
high-throughput
SNP genotyping
SNP marker
development
Лекция студентам СФУ 10 июня 2014 г.
DNA chromatograms
sequence
processing
and analysis
66
Genomic DNA target enrichment for high-throughput massively parallel
sequencing using the Agilent's SureSelect Target Enrichment System
• Our library contained 647,634 baits (2X coverage) representing 35,550 unigenes
• Capture target size ≈ 39 Mb
http://www.opengenomics.com/SureSelect_Target_Enrichment_System
Лекция студентам СФУ 10 июня 2014 г.
67
Genomic DNA enrichment for 35,550 genes in loblolly pine for highthroughput massively parallel sequencing using bar-coding and the
Agilent's SureSelect Target Enrichment System
647,634 oligonucleotide hybridization 120 bp long probes (baits) based on 35,550 unigenes build by Dr. Chun Liang (Miami University) were designed to target 39 Mb of gene space using Agilent Genomic Workbench software to gene enrich DNA libraries for sequencing
Example of 113 baits covering unigene #14
Лекция студентам СФУ 10 июня 2014 г.
68
USDA NIFA Climate Change Program 1: Regional Approaches to Climate Change
Project: “Integrating research, education and extension for enhancing southern pine
climate change mitigation and adaptation” http://www.pinemap.org
• 2,8 млн SNPs уже генотипировано в почти 40,000 генах ладанной сосны в моей
лаборатории в Texas A&M University в этом проекте путём прямого секвенирования
геномной ДНК, обогащённой экзомными районами с помощью гибридизации
тотальной ДНК с 600 млн олигонуклетидных проб, представляющих почти полный
транскриптом (~40 тыс. экспрессируемых генов) ладанной сосны
• более чем 400 деревьях со всего ареала, профенотипированных по большому
числу адаптивных и селекционно-ценных признаков, а также изученных по
большому числу средовых факторов будут генотипированы по всем обнаруженным
SNPs для обнаружения аллелей и гаплотипов связанных с изменчивостью адаптивных
признаков, а также с устойчивостью к средовым факторам
• фактически, это означает переход от отдельных маркёров к полному
генотированию через секвенирование!
• эра маркёров заканчивается – наступает эра полногеномного секвенирования!
• популяционная геномика вместе с молекулярной экологией (экогеномикой)
позволят:
− обнаружить гены и аллели ответственные за адаптацию
− связать генотипы с адаптивными фенотипами и средой
Лекция студентам СФУ 10 июня 2014 г.
69
Спасибо за внимание!
По вопросам биоинформатики
обращайтесь к Юлии Путинцевой
yuliya-putintseva@rambler.ru
Лекция студентам СФУ 9 июня 2014 г.
70
Скачать