Theodosius Dobzhansky Center for Genome Bioinformatics
St. Petersburg State University, founded 1724
May 25, 2015

"Nothing in biology makes sense except in the light of evolution"
Theodosius Dobzhansky

Genomic Medicine

GWATCH: a new frontier in genome-wide association analyses and data release

GWATCH
A suite of genome tools and programs to discover, view and assess hits from GWAS and WGS
• Genome
• Wide
• Association
• Tracks
• Chromosome
• Highway
Display everything beyond Manhattan

GWATCH Snapshots
[Figure: 2D and 3D snapshots; axis labels: Tests, SNPs, Genetic association]

2D Snapshot: chromosome 7, CFTR region
A heat plot of p-values for 131 ARG association tests
200+ SNPs in each region
~26,000 SNP-test combinations in the PROX1 region, chr 1

PROX1 2D Snapshot
PROX1 polarized

Promise of GWATCH
• Automate analyses of GWAS and WGS
• Improve display
• Instant replication of hits without Bonferroni penalty
• Open data release without violating patient privacy
• Let's take a ride along a chromosome

Genetic Diversity of the Russian Mycobacterium tuberculosis strains
GMTV Database
Ekaterina Chernyaeva
St. Petersburg State University
11.06.2015

MTBC History
• The MTBC emerged about 70,000 years ago, accompanied the migrations of anatomically modern humans out of Africa, and expanded as human population density increased during the Neolithic period

M. tuberculosis H37Rv genome
Size: 4,411,532 bp
Genes: 3,993 protein-coding, 50 RNA-coding
GC content: 65%
Insertion elements: 56 copies of IS elements (families IS3, IS5, IS21, IS30, IS110, IS256, ISL3, IS1535)

MTBC genetic diversity
• MTBC members are characterized by 99.9% similarity of the 16S rRNA gene, but differ widely in their host tropisms, phenotypes and pathogenicity
• 14 "Regions of Difference" discriminate all MTBC members

GMTV Database
Genome-wide Mycobacterium tuberculosis Variation Database
http://mtb.dobzhanskycenter.ru
• Includes 1084 M.
tuberculosis genomes collected across Russia
• Over 69,000 SNP or indel variants
• Drug resistance information
• Clinical data (diagnosis, HIV status)
• Geographical information
• Gender
• Year of isolation

TB Centers of surveillance
Countries: UK, USA, China, Canada, Portugal, Germany, Georgia, Uzbekistan, Netherlands, Malawi, Uganda, South Africa
Global collection: Ethiopia, Vietnam, Mexico, South Korea, Pakistan, Senegal, Cambodia, Gambia, Malaysia, Sri Lanka, Nepal, India, Ghana, Sierra Leone, Tanzania, Iran, Afghanistan, Turkey, Singapore, Burkina Faso, Turkmenistan, Colombia, Puerto Rico, Nicaragua, Mongolia, Indonesia, Thailand, Burma, Laos, the Philippines, Guatemala, El Salvador, Somalia

Clinical Data in GMTV
• HIV status:
  - 11 isolates from HIV-infected patients
  - 19 isolates from HIV-negative patients
• Clinical outcome:
  - 12 patients with extrapulmonary TB
  - 15 patients with pulmonary TB
  - 3 patients with both pulmonary and extrapulmonary localizations
• Drug resistance:
  - 478 multidrug-resistant (MDR)
  - 60 extensively drug-resistant (XDR)

Comparative Genomics
10,000 vertebrate genomes: coming now

Sequencing 10,000 Vertebrate Species
Of circa 60,000 species:
• 3,000 of 5,000 mammals
• 2,000 of 10,000 birds
• 1,300 of 8,500 reptiles
• 700 of 6,500 amphibians
• 3,000 of 30,000 fish

Role of G10K Community of Scientists
• Gather voucher specimens
• Identify species' biological communities
• Set standards for genome:
  - Assembly
  - Annotation
  - Release on browser
• Monitor progress
• Rapid data release
• Raise funds
• Spawn offspring…

Felidae Genome Consortium
20,343
O'Brien, S. J., Wildt, D. E., Goldman, D., Merril, C. R., and Bush, M.:

The Cheetah genome: statistics
• Namibian cheetah "Chewbacca": 75x coverage, Illumina HiSeq, BGI
• Six cheetahs sequenced at ~6x coverage
  - Tanzania: 3, A. jubatus raineyi
  - Namibia: 3, A.
jubatus jubatus
  - 8 mate pair libraries
• N50
  - Contigs: 28.2 kbp
  - Scaffolds: 3.1 Mbp
• Estimated genome size
  - 25x raw reads: 2.395 Gbp
  - Assembly: 2.375 Gbp
• Assembly
• SOAPdenovo
• Assisted assembly with FCA 6.0
  - RH map: 3,000 markers
  - SNP array LM: 60,000 markers

[Figure: SNV frequency in different mammal genomes; y-axis from 0 to 0.0025]

Feline Genome Project, May 25 2015
FELIDAE GENOMES sequenced or promised

Species genome sequence | Common name | Where done | Publication status / plans
Felis catus | Domestic cat | Wash U | GigaScience 2014; PNAS 2014; Gen Res 2009
Felis silvestris | Wildcat | NIAAA | Analysis
Acinonyx jubatus | Cheetah | BGI | Submitted
Panthera leo | Lion | Korea | Nature Comm 2014
Panthera leo | Af & Asian lion | BGI | Writing up now
Panthera tigris | Tiger | Korea | Nature Comm 2014
Panthera uncia | Snow leopard | Korea | Nature Comm 2014
Panthera pardus | Leopard | Hudson Alpha; Moleculo Illumina | Just started
Panthera onca | Jaguar | Univ. Porto Alegre | Analyses
Lynx pardinus | Iberian lynx | Spain | Writing up now
Puma concolor | Florida panther & | BGI | Just started
Puma concolor | Western puma | UCSC | Just started
Neofelis nebulosa | Clouded leopard | Smithsonian | Gathering samples
Neofelis diardi | Sunda clouded leopard | Smithsonian | Gathering samples
Prionailurus bengalensis | Leopard cat | Natural History Museum of Denmark | Gathering samples
Prionailurus viverrinus | Fishing cat | Natural History Museum of Denmark | Gathering samples
Caracal caracal | Caracal | Natural History Museum of Denmark | Gathering samples
Prionailurus rubiginosus | Rusty-spotted cat | Natural History Museum of Denmark | Gathering samples
Crocuta crocuta | Hyena | BGI | Pending; reads done at BGI
Canis familiaris | Dog | Broad | Nature
Manis pentadactyla | Chinese pangolin | Wash U | Analyses
Manis javanica | Malayan pangolin | U Malaysia | Analyses

Conservation Genetics
Shujin Luo & Jae-Haup Kim

G10K Offspring
1000 fungal genomes

Highlights of Dobzhansky Center 2012-2015
• Center staffed to ~30 employees and occupied labs on Sredniy Prospekt, Oct.
2012
• Hosted 6 lab retreats and 30+ science visitors; chaired 6 conferences
• Hosted international web sites
• Led and coordinated the Genome10K project
• Published >140 peer-reviewed papers in high-ranking journals
• Initiated the Genome Russia Project, 2015-2018
• Education in courses, Coursera Online, ConGen and G10K
• Accessed and housed sequence and genotype data from 30 species genomes and 12,000 human study participants in disease gene association studies

1000 genomes
Studies of human genomic variation have great potential to identify genes that may underlie differences in disease resistance (e.g., MHC region) or drug metabolism

Goals and objectives: Genome Russia Mission
• 1. Create a biobank of tissues and DNA from representatives of the major ethnic groups living in Russia.
• 2. Compile a catalog of genomic variation within ethnic groups.
• 3. Document the structure and distribution of genes for various diseases inherited by the population of Russia.
• 4. Create a Russian HapMap, a project that will help researchers find genes associated with human disease and response to drugs.
• 5. Identify and characterize specific changes in genomic DNA that are unique to the population of Russia.
• 6. Study the routes of ancient geographic migrations of the ancestors of the modern peoples of Russia.
• 7. Develop innovative bioinformatic algorithms and approaches applicable to the analysis of diseases caused by genes characterized in the course of the project.

The International HapMap Project is a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States to develop a public resource that will help researchers find genes associated with human disease and response to drugs.
The Russian project will contribute to this international effort by concentrating on the particular features and diversity of the Russian population.

Russian HapMap

Genome Russia Consortium
21 groups from 14 cities
1000 RU Genomes

And now I really would like to say "Thanks" to…
• GWATCH
• Russian Mycobacterium tuberculosis GMTV Database
• Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University

Excited author in the process of research.