S162 Electrophoresis 2009, 30, S162–S173

Celebrating 30 years

Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective

Nicolas Guex (1), Manuel C. Peitsch (1, 2), Torsten Schwede (3, 4)

(1) Swiss Institute of Bioinformatics, Lausanne, Switzerland
(2) Philip Morris International, Research and Development, Neuchâtel, Switzerland
(3) Biozentrum, University of Basel, Switzerland
(4) Swiss Institute of Bioinformatics, Basel, Switzerland

Received March 3, 2009
Revised April 13, 2009
Accepted April 14, 2009

SWISS-MODEL pioneered the field of automated modeling as the first protein modeling service on the Internet. In combination with the visualization tool Swiss-PdbViewer, the Internet-based Workspace and the SWISS-MODEL Repository, it provides a fully integrated sequence to structure analysis and modeling platform. This computational environment is made freely available to the scientific community with the aim to hide the computational complexity of structural bioinformatics and encourage bench scientists to make use of the ever-increasing structural information available. Indeed, over the last decade, the availability of structural information has significantly increased for many organisms as a direct consequence of the complementary nature of comparative protein modeling and experimental structure determination. This has a very positive and enabling impact on many different applications in biomedical research as described in this paper.

Keywords: Bioinformatics / Homology modeling / Protein structure / SWISS-MODEL / Swiss-PdbViewer

DOI 10.1002/elps.200900140

1 Introduction

Comparative protein structure modeling and experimental efforts complement each other with the goal of providing structural models for diverse applications in biomedical research. Stable, accurate, reliable and fully automated modeling pipelines are required to provide structural information for the rapidly growing amount of sequence data.
SWISS-MODEL pioneered the field of automated modeling as the first protein modeling service on the Internet (e-mail-based interface in 1991 and the first web-based interface in 1993). In combination with the visualization tool Swiss-PdbViewer (aka DeepView), it provides a fully integrated sequence to structure platform, which has been described in our 1997 paper “SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modelling” in Electrophoresis [1]. When the original SWISS-MODEL and Swiss-PdbViewer article was published, protein modeling was still a very specialized field, mostly due to the necessity to use specialized hardware and software. Our goal was to hide much of the complexity of comparative modeling and make this technology accessible to a broader audience and to empower non-structural scientists to leverage the available molecular structure information to design experiments. Since the pioneering days, several other groups have followed suit and developed similar servers to automate a variety of algorithms and methods, focusing on different aspects of the modeling workflow: While programs like SWISS-MODEL [1–5] or COMPOSER [6] derive the model coordinates using information from aligned template fragments in Cartesian space, many servers are based on MODELLER, applying satisfaction of spatial restraint techniques to generate the model coordinates [7, 8].

Correspondence: Professor Torsten Schwede, Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
E-mail: torsten.schwede@unibas.ch
Fax: +41-61-267-15-84

Abbreviations: DAS, distributed annotation system; HMM, hidden Markov model; PDB, protein data bank; PSI, protein structure initiative; RMSD, root mean square deviation

© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
The introduction of hidden Markov model (HMM) methods [9, 10] has significantly improved the sensitivity of template detection and accuracy of target–template alignments in comparative modeling. Several methods have been developed that attempt to combine information from multiple template structures, e.g. through iterative clustering approaches [11], conformational space annealing methods [12] or by profile–profile threading alignment followed by iterative refinement of the assembly of threading fragments [13, 14]. Recently, methods originally developed for fragment-based de novo modeling have been shown to be effective for comparative modeling [15] and refinement of structure models [16]. Several modeling servers for specialized tasks have been developed, e.g. modeling of antibodies [17, 18]. For a list of available modeling servers, please refer to [19, 20].

During the most recent Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, it became apparent that the best fully automated modeling methods have improved to a level where they challenge many human predictors in producing accurate models [14, 20, 21]. In the following paragraphs, we will describe the SWISS-MODEL protein structure prediction and analysis environment, which today consists of a modeling server [3], a web-based personalized workspace [2, 22], the visual front end Swiss-PdbViewer [1] and a repository of annotated comparative models [23–25].

Authors appear in alphabetical order
www.electrophoresis-journal.com

2 How SWISS-MODEL and Swiss-PdbViewer evolved over the last decade

2.1 Comparative protein structure modeling

Homology (or comparative) protein structure modeling is the method of choice for generating reliable and accurate 3-D models of proteins that share significant sequence similarity with proteins of known structure.
Automated modeling servers made different modeling algorithms easily accessible to the general user and removed the need to learn idiosyncratic software commands, making them valuable tools for modeling experts and non-experts alike. By removing the individual personal expert bias, the development of automated modeling pipelines has made modeling reproducible. Moreover, assessing automated modeling methods on larger data sets makes it possible to estimate their expected accuracy [20, 26–29]. Today, all protein structure modeling approaches make use of one or more automated pipelines.

The SWISS-MODEL pipeline consists, similarly to most homology modeling approaches, of the following steps: First, a library of experimental template structures is searched for templates sharing significant sequence similarity with the targeted protein, and the most suitable template(s) are selected. Based on the alignment between the sequence of the target protein and the template structure(s), the coordinates of the model are constructed for the structurally conserved regions of the model. Residues corresponding to insertions and deletions in the target–template alignment have to be modeled de novo without using template information. After applying limited molecular mechanics-based energy minimization to regularize the geometry of the models, model quality estimation methods are used to detect potential errors and inaccuracies. Since the initial implementation over a decade ago, all of these steps have been further developed and significantly improved.

Comparative modeling critically depends on the detection of suitable templates from a library of structures. To this end, we created the SWISS-MODEL template library derived from the remediated protein data bank (PDB), which aims to remove some of the inconsistencies in the original depositions [30]. The SWISS-MODEL template library contains searchable sequence databases, profiles and structure quality annotation (e.g. experimental resolution, mean force potential scores) for each chain, excluding low quality entries (e.g. entries consisting only of Cα coordinates). The introduction of profile-based sequence comparison methods such as PSI-BLAST [31] and later HMM-HMM profile methods [10] has significantly improved the sensitivity and precision of template selection and alignment. Today, SWISS-MODEL uses a hierarchical approach to first identify target regions sharing high sequence similarity to their templates before applying more sensitive HMM-HMM profile methods to detect and align more distantly related templates. Possible templates are ranked according to their E-value, sequence identity to the target, resolution and structure quality [23]. Templates are progressively selected from this list, where new templates are added if they significantly increase the coverage of the target sequence, or add new information (e.g. templates spanning several domains help to infer relative domain orientation).

Coordinate building in the SWISS-MODEL pipeline is performed by transferring template information from aligned template fragments in Cartesian space. Regions corresponding to insertions and deletions in the alignment are built using both backbone libraries and de novo loop-building procedures. First, an ensemble of fragments compatible with the flanking regions is constructed using constraint satisfaction programming. The best fragment is selected using a scoring scheme, which accounts for force field energy, steric hindrance and favorable interactions like hydrogen bond formation. In cases where constraint satisfaction programming does not give a satisfactory solution and for loops above ten residues, a library derived from experimental structures is searched to find compatible fragments.
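The progressive template selection described above (rank candidates, then add a template only if it contributes enough new coverage) can be illustrated with a minimal Python sketch. This is not the SWISS-MODEL implementation: the `Template` fields, the sort key (E-value, then sequence identity, then resolution) and the `min_new_residues` threshold are assumptions chosen for illustration, and the structure quality score and the "new information" criterion mentioned in the text are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical record for one candidate template."""
    name: str
    e_value: float     # lower is better
    identity: float    # percent identity to the target, higher is better
    resolution: float  # experimental resolution in Angstrom, lower is better
    start: int         # first target residue covered (1-based, inclusive)
    end: int           # last target residue covered (inclusive)


def select_templates(templates, min_new_residues=20):
    """Greedy selection: walk the ranked list and keep a template only if it
    covers at least `min_new_residues` target residues not yet covered."""
    # Assumed ranking: E-value first, then identity, then resolution.
    ranked = sorted(templates, key=lambda t: (t.e_value, -t.identity, t.resolution))
    covered = set()
    selected = []
    for template in ranked:
        residues = set(range(template.start, template.end + 1))
        gain = len(residues - covered)
        # The top-ranked template is always kept.
        if not selected or gain >= min_new_residues:
            selected.append(template)
            covered |= residues
    return selected


candidates = [
    Template("A", 1e-50, 60.0, 1.8, 1, 100),
    Template("B", 1e-40, 55.0, 2.0, 5, 105),   # adds only 5 new residues
    Template("C", 1e-10, 35.0, 2.5, 90, 200),  # adds residues 101-200
]
chosen = [t.name for t in select_templates(candidates)]
```

With the example candidates, the second template is skipped because it adds little coverage, while the more distant third template is kept because it extends the model into a region no better-ranked template covers.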
The reconstruction of the amino acid side chains is based on the weighted positions of corresponding residues in the template structures. Starting with conserved residues, the model side chains are built by isosterically replacing template structure side chains. Feasible side chain conformations are selected from a backbone-dependent rotamer library [32], which has been carefully constructed taking the quality of the source structures into account. A scoring function assessing favorable interactions (hydrogen bonds, disulfide bridges) and unfavorably close contacts is applied to select the most likely conformation. The stereochemistry of the resulting models is regularized using a short energy minimization procedure with the GROMOS96 force field [33]. Model quality estimation is performed using mean force potential approaches such as ANOLEA [34] and QMEAN [35].

2.2 Automated modeling server

The first automated protein modeling server was built before the advent of the Web and its highly interactive technology. In 1991, for the first time a modeling request could be submitted using a formatted E-mail. With the arrival of the World Wide Web, SWISS-MODEL was among the first bioinformatics services available on the Web as part of the ExPASy system [36]. The first Web-based user interface to SWISS-MODEL automatically created a correctly formatted E-mail and sent it to the modeling server [5]. In more recent years, this aging interface and communication mode was replaced by the SWISS-MODEL Workspace. Today, an interactive personalized web-based working environment [2, 22] allows several projects to be performed in parallel. In addition to structure modeling, SWISS-MODEL Workspace offers different types of modeling-related tasks such as domain assignment, template selection, prediction of secondary structure or disordered segments, or model quality estimation.
In-page visualization using Java applets provides a fast preview of the overall fold of the model, while further detailed exploration and fine-tuning of the models is possible with Swiss-PdbViewer (see below). Currently, SWISS-MODEL Workspace receives 1500 interactive modeling requests every day.

2.3 SWISS-MODEL Repository

In spring 1998, we subjected all entries of Swiss-Prot and TrEMBL (equivalent to all protein sequences known at that time) to the SWISS-MODEL pipeline in a completely automated process called 3D-Crunch [26]. This followed several experiments that tested the concept of genome scale protein modeling on bacterial [37–39] and yeast genomes [38, 40]. This was the first time a large-scale data set was available to analyze the performance of an automated modeling pipeline. 3D-Crunch and the early experiments, which used both confirmed and putative proteins derived from several bacterial genomes and the yeast genome, enabled us to make a first analysis of the potential of protein modeling to close the sequence to structure gap. In the following months, this was instrumental in improving the server's performance and provided the initial seed models for the SWISS-MODEL Repository [23–25, 39].

In later years [23–25], the SWISS-MODEL Repository has been developed as a relational database of annotated models, aiming at comprehensive and up-to-date coverage of selected model proteomes. As interactive model building can be relatively time-consuming, a comprehensive database of pre-computed models provides the opportunity to crosslink model information with other biological data resources, such as sequence databases or genome browsers, in real time. In the repository, model target sequences are uniquely identified by the md5 cryptographic hash of their full-length amino acid sequence. This mechanism allows the redundancy in protein sequence databases to be reduced, and facilitates cross-referencing with resources using different accession code systems [23]. Regular incremental updates include new target sequences from the UniProt database [41] and newly available template structures [42]. However, when major improvements to the underlying modeling algorithms have been made, full updates are required.

The SWISS-MODEL Repository web interface (Fig. 1) can be queried for specific proteins using database accession codes (e.g. UniProt AC and ID, GenBank, IPI, Refseq) or directly with the protein amino acid sequence, or fragments thereof, e.g. for a specific domain (http://swissmodel.expasy.org/repository/). The functional and domain annotation for the target protein is retrieved dynamically using web service protocols in real time to ensure that the latest annotation information is provided, even if the model has been built some time before. In order to allow for additional (not pre-computed) analyses on the models or on the underlying protein target sequence, we have implemented a tight link between the SWISS-MODEL Repository and the corresponding modules in the Workspace, which allow, e.g. for estimation of model quality using different global and local quality scores.

2.4 Database interoperability and programmatic access

The integration of different types of data, such as sequence annotations and 3-D structure information for large amounts of diverse data in heterogeneous formats, is still an open challenge in Bioinformatics. Protein models provide a natural bridge connecting sequence-based data resources, such as genome browsers, with protein structure information.
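As an aside on the Repository's use of md5 hashes as sequence identifiers, the following Python sketch shows how entries from databases with different accession systems collapse onto one key when their sequences are identical. It is an illustration only: the normalization step (stripping whitespace and upper-casing) and the accession names are assumptions, not the Repository's documented procedure.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Return the md5 hex digest of a full-length amino acid sequence.

    Assumption: whitespace is removed and letters are upper-cased first,
    so that formatting differences do not change the identifier.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def group_by_sequence(entries):
    """Map md5 digest -> list of accession codes sharing that sequence."""
    groups = {}
    for accession, sequence in entries:
        groups.setdefault(sequence_md5(sequence), []).append(accession)
    return groups


# Hypothetical accession codes from two different resources.
entries = [
    ("DB1:ACC_1", "MKTAYIAKQR"),
    ("DB2:ACC_2", "mktay iakqr\n"),  # same sequence, different formatting
    ("DB1:ACC_3", "MKTAYIAKQK"),     # differs in the last residue
]
groups = group_by_sequence(entries)
```

Because the key depends only on the sequence, a model computed once can be cross-referenced from any resource that holds the same protein, regardless of the accession code it uses.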
However, unlike experimental results that remain static once entered into the corresponding databases, model information is intrinsically dynamic, as models need to be re-calculated when better template structures become available or improvements in modeling algorithms allow building better models for a given target sequence. We have therefore developed technologies capable of dynamic integration of sequence, experimental and model structure information.

The Protein Model Portal [43] is a component of the Protein Structure Initiative (PSI) structural genomics knowledge base [44] and provides a single interface to access several million pre-built models from (i) the SWISS-MODEL Repository [23], (ii) ModBase [7] and (iii) several large-scale PSI centers, as well as (iv) experimental structures from the PDB [42].

The “distributed annotation system” (DAS) [45] is a light-weight mechanism for web-service-based annotation exchange, which is widely used in genome browsers and other software frameworks for sequence annotation. The DAS concept relies on an XML specification that defines the communication between server and client. We have implemented a DAS-server for the SWISS-MODEL Repository based on the DAS/1 standard. Any DAS compatible annotation system can thereby extend its sequence annotation by 3-D model information using either UniProt accession codes or md5-hashes of the corresponding amino acid sequences as identifiers. The SWISS-MODEL Repository DAS service is accessible at http://swissmodel.expasy.org/service/das/swissmodel/.

Figure 1. Example of a SWISS-MODEL Repository entry for a model of UniProt entry A4C2S2, a protein of unknown function from Polaribacter irgensii 23-P.

2.5 Accuracy of automated models

Possible applications of protein models depend largely on the quality of the models.
Therefore, evaluation of model quality is a crucial step in homology modeling. During the 3D-Crunch experiment, a control set of 1200 models for proteins of known 3-D structure was generated, sharing 25–95% sequence identity between the template and the target. For the first time it was thereby possible to analyze the reliability of automated modeling on a large scale [26, 46].

SWISS-MODEL was the first comparative modeling service to join the EVA project for the continuous and automated assessment of modeling servers [29]. Between 2000 and 2006 (256 weekly releases of the PDB), the sequences of 21 318 proteins representing 18 078 distinct protein target chains were submitted to SWISS-MODEL. The resulting models have been evaluated based on the root mean square deviation (RMSD) of Cα atoms following global superposition of the model and the experimental target structures. This process allows estimating the overall expected accuracy as a function of the percentage of sequence identity between target and best template. As expected, model RMSD increases with decreasing alignment accuracy. All models and evaluation results are available on the EVA website [29].

While the assessment of a prediction method can provide an estimate of the average performance of a method, the differences in accuracy reached for different modeling targets are much larger than the differences between different methods for the same modeling target [20, 21, 29]. At the time of modeling, the accuracy of a model is unknown and cannot be measured directly as the “real” structure is unknown. Therefore, the accuracy of each model has to be estimated individually using model quality estimation methods [35, 47].
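For readers unfamiliar with the measure used in the EVA evaluation, the Cα RMSD between a model and its experimental structure is the square root of the mean squared distance between equivalent Cα atoms. The Python sketch below is a minimal illustration and makes two assumptions that the text does not spell out: the two coordinate lists are already paired residue by residue, and the structures have already been globally superposed (the superposition step itself is not shown).

```python
import math


def ca_rmsd(model_ca, target_ca):
    """RMSD over paired C-alpha coordinates, each given as an (x, y, z) tuple.

    Assumes both structures are already superposed in the same frame and
    that position i in one list corresponds to position i in the other.
    """
    if len(model_ca) != len(target_ca):
        raise ValueError("coordinate lists must have the same length")
    if not model_ca:
        raise ValueError("at least one atom pair is required")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_ca, target_ca):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model_ca))


# Toy example with two residues; the second Cα is displaced by 2 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
rmsd = ca_rmsd(model, target)
```

Because the deviations are squared before averaging, a few badly placed residues (for example in a misaligned loop) dominate the value, which is one reason why a single global RMSD should be complemented by per-residue quality estimates.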
2.6 Protein structure visualization and analysis with Swiss-PdbViewer

The aim of enabling non-specialists to utilize structural data on standard desktop computers creates a particular challenge. Indeed, while providing such an environment creates the opportunity for scientists with no particular expertise in structural biology to obtain and visualize protein models in a completely automated way, it also opens the door to overinterpretations as these models are certainly not devoid of errors and inaccuracies. Over the years we incorporated various validation tools on the SWISS-MODEL server (WhatIf [48], ANOLEA [34]) and, in Swiss-PdbViewer, coloring schemes (“protein problems”) and tools such as Ramachandran plots and mean force potential to highlight residues with abnormal topologies [49]. Further guidance on the proper utilization and limitation of those modeling tools has been disseminated through the publication of a chapter in Current Protocols [50] and through on-line courses and tutorials for students. In particular, our long-standing collaboration with Prof. Gale Rhodes (University of Southern Maine), who continuously maintained and updated his tutorial for each new release of Swiss-PdbViewer, has been key to the success of this application.

2.7 New Swiss-PdbViewer features

The basic functionality of Swiss-PdbViewer [1, 49] has remained the same, with the main purpose to serve as (i) a simple way to visualize, align and compare structures and (ii) an interface with the SWISS-MODEL server. Since its inception, particular emphasis has been put on the user interface and interface reactivity. The interface and all algorithms are implemented in native C code and are very efficient. Windows are synchronized to provide visual feedback between the structure(s) displayed, the sequence alignment and the residues selected for display.
Swiss-PdbViewer also provides an extended set of tools to manipulate sequence–structure alignments, selections and display, and to allow general protein analysis and validation. However, compared with the version described in the original article, the current version offers a more complete set of tools, including the possibility to compute molecular surfaces and electrostatic potentials and to detect cavities. Furthermore, it is now possible to perform homology modeling directly within the application. The original loop building approach that uses a loop library has been supplemented with a de novo loop-building method based on satisfaction of spatial restraints, and the rotamer library has been updated to a backbone-dependent rotamer library [32]. Support for energy minimization is provided through an implementation of the GROMOS96 [33] force field.

In the original version of Swiss-PdbViewer, the paradigm was to use several structures to facilitate the exploration of one given target sequence. Therefore, it was not well suited to the exploration of structural differences in sequence families until structural models for each individual member of the family were obtained. Thus, we changed the paradigm, and several sequences and/or structures can now be loaded simultaneously.

With the increasing number of whole genome association studies, mapping non-synonymous SNPs in their structural context has become a common task. However, it is relatively tedious to map SNPs to structures, principally because of the necessity to convert genomic coordinates to structural coordinates. To facilitate this process, we introduced the possibility to load cDNA sequences in the software: the translated amino acid sequences and their predicted structural information remain associated with their nucleotide sequences throughout the process, even when alignments are altered.
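The coordinate conversion behind SNP mapping can be made concrete with a small Python sketch. It is not Swiss-PdbViewer code: it assumes the position is already given relative to the start of the coding sequence (no introns or untranslated regions), uses the standard genetic code, and the function names are hypothetical.

```python
# Standard genetic code, in the conventional T, C, A, G ordering
# for each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def residue_index(cds_position: int) -> int:
    """Convert a 1-based nucleotide position in the coding sequence
    to the 1-based index of the residue whose codon contains it."""
    if cds_position < 1:
        raise ValueError("positions are 1-based")
    return (cds_position - 1) // 3 + 1


def snp_effect(cds: str, cds_position: int, alt_base: str):
    """Return (residue index, reference amino acid, variant amino acid)
    for a single-base substitution in the coding sequence."""
    cds = cds.upper()
    index = residue_index(cds_position)
    codon_start = 3 * (index - 1)
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position lies outside a complete codon")
    offset = (cds_position - 1) % 3
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    return index, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


cds = "ATGGCCAAA"                       # translates to M, A, K
missense = snp_effect(cds, 5, "T")      # GCC -> GTC
synonymous = snp_effect(cds, 6, "T")    # GCC -> GCT
```

The residue index returned here is what would be looked up in a structure or model; a variant is non-synonymous when the reference and variant amino acids differ, as in the first example.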
Since its first release, Swiss-PdbViewer has been tightly linked to SWISS-MODEL, and thus it has been extended to support the recently released SWISS-MODEL Workspace [2] through direct communication with the server. Modeling templates can be searched and retrieved from the server using BLAST, modeling requests can be submitted and then models can be retrieved directly from the server. Overall, the communication capabilities of Swiss-PdbViewer have been increased and it is now possible to import sequences, structures and compounds or to align sequences using MUSCLE [51] on a remote server (Fig. 2). Furthermore, the addition of a scripting language created the possibility for users to write additional commands for the user interface and/or to process sequences in batch mode.

We added new ways to superpose structures. The popular “Magic Fit” command relies on the correct detection of a stretch of similar residues at sequence level to “seed” a structural fit. However, this will fail when no sequence similarity can reliably be identified for distantly related proteins. Therefore, we included a method for sequence-independent superposition using vectorized secondary structure information as seed to search for possible ways to superpose distantly related proteins. As this method relies on similarly organized secondary structure elements, it cannot be used to explore the conservation of finer-grained local similarities based on sparse residues such as catalytic ones. Thus, as a way to identify common local structural arrangements of residues, we also developed a method that allows searching for specific 3-D motifs in a given set of proteins. Briefly, for each position of the motif, it is possible to specify a list of desired amino acids, the secondary structure, the minimum and maximum backbone separation between residues and a set of additional distance constraints between any pairs of atoms. Those 3-D motifs can be generated directly from within Swiss-PdbViewer and then submitted to the Vital-IT cluster (http://spdbv.vital-it.ch/) to search a non-redundant set of structures. Results can then be retrieved directly in the interface for visualization, superposition and analysis.

Figure 2. Swiss-PdbViewer – a tool for protein structure modeling, visualization and analysis. Structural data can be retrieved directly from the PDB [42] using accession numbers or simple text queries. When available, electron density maps can be retrieved from the Uppsala Electron-Density Server [92]. Small molecular compounds can be retrieved from PubChem [93], and energy minimized with the Dundee PRODRG2 server [94]. cDNA sequences can be retrieved from GenBank [93], whereas amino acid sequences can be imported from ExPASy [36] or GenBank. Identification of homologous sequences or structures is achieved using the BLAST service of the SWISS-MODEL Workspace [2, 3]. Protein structures can be searched for the presence of user-defined 3-D motifs, and sequences can be aligned using built-in tools or external tools, such as MUSCLE [51] running at the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the Swiss Institute of Bioinformatics. Protein modeling requests can be directly submitted to SWISS-MODEL and results re-imported into the workspace for further refinement.

3 Structural coverage – the structure gap

Since DNA sequencing data is outgrowing structure determination efforts at exponential rates (Fig. 3), protein structure modeling will be the only available method to generate accurate structural models for the vast majority of proteins.
Recent work by Levitt (personal communication) has confirmed the notion that, although the number of multi-domain architecture families grows rapidly and at the same rate as the number of newly sequenced genes [52], almost all of this complexity arises from the arrangement of known single domains within a chain, particularly for eukaryotes. For model organisms, humans and known pathogens, the repertoire of structural domains is finite by definition. Comprehensive coverage of the complete protein domain space by representative structures appears as a reachable goal in the mid-term perspective, and has been set as one of the scientific aims of the PSI structural genomics efforts [53–55]. Structural genomics has considerably increased the structural coverage of protein sequence space, contributed significantly to the description of novel structural families and often provided the first representatives for functional groups that had not been structurally characterized before [56–58]. As a consequence, for many organisms, the availability of structure information has significantly changed over the last decade.

The quality of a protein model reflects the deviation of the template structure relative to the actual structure of the target as well as limitations of sequence comparison and alignment methods. It is generally accepted that the percentage of sequence identity between target and template allows for a reasonable first estimate of the model quality, and that the core Cα atoms of protein models sharing 50% sequence identity with their templates will deviate by approximately 1.0 Å RMSD from their experimentally elucidated structures for regions of proteins not subject to molecular rearrangements upon binding to another molecular entity.
Taking Escherichia coli as an example, during the 3D-Crunch experiment in 1998 [26] only a very small fraction of sequence entries were amenable to protein modeling using templates sharing more than 30% sequence identity with the target protein, resulting in a coverage of 15% of the target sequences. Today, profile-based methods for sequence comparison and alignments allow extending target–template alignments to more remotely related templates, while at the same time, experimental template structures are available for many more protein families.

Figure 3. Number of entries in public sequence and structure databases. Although the number of entries in the PDB [42] is growing exponentially, sequence databases [59, 95] are growing at a much higher rate – widening the structure knowledge gap.

Figure 4. Retrospective analysis of structural coverage of E. coli over time. We have analyzed retrospectively which structure information – either experimental structures or models of various levels of target–template sequence identity – was available at a given point in time for the residues in the proteome of the model organism E. coli.

We have computed a retrospective estimate of structural coverage of the E. coli proteome (Fig. 4). For each of the 4173 sequences in the complete E. coli proteome obtained from UniProt [59], a PSI-BLAST [31] profile was calculated using a non-redundant protein sequence database (current as of December 2008). This profile was used to search the sequences of experimentally determined structures deposited in the PDB [42] as of December 2008 for suitable templates. For each year starting in 1972, we recorded the highest sequence identity to the closest template for each of the 1 358 278 residues in the E. coli proteome. Figure 4 shows the steady increase in structural knowledge. Today, for about 50% of all E.
coli protein sequences, a model can be built using a template sharing at least 30% sequence identity with the target sequence, covering approx. 23% of all residues, compared with ca. 11% in 1998. This observation may lend support to the early prediction (1990s) that by 2020 we will be able to generate at least one reasonable quality protein structure model for most proteins of the major model organisms.

4 Applications of models in biomedical research

There is a wide range of applications for comparative models [60, 61], such as designing experiments for site-directed mutagenesis or protein engineering, predicting ligand binding sites and docking small molecules in structure-based drug discovery [62, 63], studying the effect of mutations and SNPs [64, 65], phasing X-ray diffraction data in molecular replacement [16, 66], as well as protein engineering and design. Hereafter, we provide only a few examples of applications of models mainly built with our modeling environment.

4.1 Functional analysis of proteins

Insights into the 3-D structure of a protein can be of great assistance in assigning its molecular function, while its biological role and localization are much more difficult to relate to its structure. Predicting the molecular function of a protein on the sole basis of a 3-D structure is, however, in itself a very challenging task. Indeed, if the active site has been observed previously [41, 67, 68] or, if the protein has been co-crystallized with a cognate ligand, we have a better chance of succeeding. In our hands, we were able to verify and confirm the assignment of several Caenorhabditis elegans insulin-like genes using low-accuracy models [69]. Similarly, the trimeric nature of the CD40L was first proposed based on a low-accuracy model where the target and the TNF-α template share less than 26% sequence identity [70].
Generally speaking, the study and comparison of 3-D structural features, as opposed to the study of linear sequence alignments, allows one to reason about how proteins might interact with other molecular entities and permits the mapping of functional epitopes [71, 72]. Similarly, the combination of experimental mutagenesis data with biophysical measurements allows models to be built that fit the data and that can in turn be used to propose new hypotheses [73, 74].

4.2 Studying the impact of mutations and SNPs on protein function

Diseases, or less-severe phenotypic variations, which can be unequivocally assigned to single point mutations, provide a good framework to understand the molecular function and biological role of a protein. Therefore, protein models can be readily applied to interpret the impact mutations can have on the overall structure and, thus, the function of a protein [64, 65]. It is through "visual inspection" associated with a good knowledge and understanding of the rules underlying protein structure that the most useful hypotheses regarding the reasons for mutant malfunction can be made (for concrete examples see [45, 64, 65, 75, 76]). There is an increasingly large body of data on naturally occurring mutations (over 43 000 human sequence variants are reported in Swiss-Prot) and SNPs, of which a sizeable proportion will alter the translated protein sequences. Interpreting the potential functional effects of these mutants will be crucial to elucidate the molecular basis of human diseases. The ability to map mutations onto structures or models is also particularly relevant in the context of infectious diseases, where agents such as HIV and influenza have a high mutation rate and for which a wealth of sequence data is collected.

4.3 Planning site-directed mutagenesis experiments

One definite advantage of 3-D structures and models in functional protein analysis is that they provide a solid basis for site-directed mutagenesis experiments aimed at elucidating the molecular function of proteins. Even medium- and low-accuracy models can be used as frameworks for experiment planning to guide the selection of key mutants designed to test functional hypotheses [77] or to modulate a protein's biophysical properties [78]. These experimentally generated mutants complement the naturally occurring ones mentioned in Section 4.2 and, together with the mapping of other facts such as sites of post-translational modifications, greatly contribute to the elucidation of protein function [79]. For instance, the comparative models that were generated for the Fas ligand, its protein family members [5] and receptor illustrate how models can be applied to (i) understand the impact of naturally occurring mutations [80, 81], (ii) plan experimental mutagenesis and (iii) interpret and map other known features such as glycosylation sites to understand the finer molecular function of a protein.

4.4 Molecular replacement

Solving the phase problem in crystallography experiments is a crucial step towards reconstructing atomic structures that optimally fit the experimental data. As phases cannot be measured directly, they have to be obtained indirectly using experimental methods such as heavy-atom isomorphous replacement or anomalous scattering, or by molecular replacement [16, 61, 66, 82]. The first application of a model built with SWISS-MODEL in molecular replacement was performed by Karpusas and co-workers [83] to obtain a 2 Å resolution X-ray structure of the human CD40 ligand (PDB entry 1aly). The authors used our published murine homology model [70] (PDB entry 1cda) to build a human model of CD40L and then applied the latter "model of a model" to the molecular replacement approach.
A more recent example is described in [13].

5 Concluding remarks

Over the last 15 years, we have witnessed the transition from a situation where structural information was available only in rare cases to today's context, where for many model organisms either experimental structures or models are available for a large fraction of proteins. Protein modeling today is well established and routinely used in various biomedical research applications. However, there are still major challenges ahead:

(i) Template coverage: Systematic international structural genomics projects have contributed significantly to the increase in novel structural information in the PDB in recent years [56]. However, continued effort in this direction is required to map out the remaining uncharted regions of the protein universe. Membrane proteins in particular will require significant attention. Depending on the biomedical interest in specific protein families, different levels of sampling granularity may be appropriate. Since large protein families tend to be functionally more diverse, finer-grained sampling will be required to elucidate functional differences.

(ii) Modeling complexes from individual domains: Often, the structures of individual domains are experimentally more tractable than those of multi-domain proteins or complexes. Computational modeling of the relative orientation of the individual domain components is therefore an important goal. Remarkable progress has been made in this endeavor in recent years, as documented in the community-wide experiment on the comparative evaluation of protein–protein docking for structure prediction [84]. With the increasing amount of complete genome data becoming available, approaches based on mutual information analysis are becoming increasingly powerful [85–87].

(iii) Model refinement: Comparative modeling methods rest on the assumption that structural information for the target protein can be inferred from the template structures of evolutionarily related proteins. However, with increasing evolutionary distance, considerable structural differences between target and template will occur. Recently, significant progress has been reported for model refinement using Monte Carlo sampling approaches initially developed for ab initio modeling [16].

(iv) Modeling small induced differences: While evolutionary inference often allows modeling conserved properties of a protein such as its overall fold, it is often desirable to predict small, functionally divergent features, such as variations in substrate specificity or ligand affinity within a family of proteins, structural effects of mutations or non-synonymous SNPs, or other functional properties. Also, some regions in the apo structure of a protein may not correspond to the conformation it adopts when binding a partner, e.g. activation loops of kinases.

(v) Model quality estimation: Possible applications of models ultimately depend on their accuracy. However, at the time of modeling, the accuracy of a model is unknown and has to be predicted. Several approaches for estimating the expected accuracy of models have been developed [35, 47, 88–90]. However, there is still a long way to go until the suitability of a model for a certain application can be predicted reliably.

(vi) Visualization of uncertainty and precision: Experimental structures as well as models have limitations in their precision, which may even vary for different regions within the same structure. However, many graphical molecular representations suggest invariable atomic precision throughout the structure and do not visualize the uncertainty of the underlying structure data. With the increase in available low-resolution experimental data and composite experimental–computational models, the question of visualization of uncertainty will become more urgent.

(vii) Integrative/hybrid modeling: Ultimately, all structure determination methods are "hybrid" methods as they rely, to different extents, on both experimental data and computational components such as molecular force fields. While many low-resolution experimental techniques do not produce sufficient data to directly derive atomic precision structures, they still provide valuable information about certain aspects of the macromolecular assembly. By combining various complementary sources of information, both experimental and computational, it is possible to derive an integrative model that would not have been possible with any of the individual components alone, as has been impressively demonstrated for the nuclear pore complex (NPC) [91].

In many ways, the challenges and limitations of comparative modeling that existed 12 years ago are still valid today. However, protein structure modeling has made the transition from a niche approach for anecdotal examples to a mainstream technology applicable to a majority of proteins. The availability of whole genome data not only allows for better evolutionary inference and improved sequence alignments; in combination with automated structure modeling, it also opens the possibility to compare proteins and to analyze functional differences in their structural context. Now is the time to change our mental picture of a protein from a "linear string of letters" to a "3-D structure in the functional context of its evolutionary relatives". We hope that the SWISS-MODEL and Swiss-PdbViewer suite of tools will contribute to making that change.

We are particularly thankful to Timothy N. C. Wells (GlaxoWellcome, now at Medicines for Malaria Venture), Jonathan C. K.
Knowles (GlaxoWellcome, now at Roche), and Allan Baxter (GlaxoSmithKline), who established the necessary environment at the beginning of this project, and to Michael W. Lutz and David B. Searls for their support. Furthermore, we are deeply indebted to Stanley K. Burt, Robert W. Lebherz and Jack R. Collins as well as the entire staff at the Advanced Biomedical Computing Center at NCI-Frederick (Frederick, MD, USA) for their support in operating the US mirror of the SWISS-MODEL server. We are extremely grateful to Gale Rhodes of the University of Southern Maine for coordinating the active Swiss-PdbViewer user community and for his outstanding commitment to teaching in structural biology. We thank Alexander Diemand for his contributions to the Swiss-PdbViewer Linux code. We are deeply indebted to Konstantin Arnold, Jürgen Kopp, Rainer Pöhlmann, Michael Podvinec, Lorenza Bordoli and Florian Kiefer for their many contributions to the development and daily operations of the SWISS-MODEL Server, Repository and Workspace. We gratefully acknowledge financial support by GlaxoSmithKline, Novartis, the Swiss National Science Foundation (SNF), the Biozentrum of the University of Basel and the Swiss Institute of Bioinformatics.

The authors have declared no conflict of interest.

6 References

[1] Guex, N., Peitsch, M. C., Electrophoresis 1997, 18, 2714–2723.
[2] Arnold, K., Bordoli, L., Kopp, J., Schwede, T., Bioinformatics 2006, 22, 195–201.
[3] Schwede, T., Kopp, J., Guex, N., Peitsch, M. C., Nucleic Acids Res. 2003, 31, 3381–3385.
[4] Peitsch, M. C., Biochem. Soc. Trans. 1996, 24, 274–279.
[5] Peitsch, M. C., Biotechnology 1995, 13, 658–660.
[6] Srinivasan, N., Blundell, T. L., Protein Eng. 1993, 6, 501–512.
[7] Pieper, U., Eswar, N., Webb, B. M., Eramian, D., Kelly, L., Barkan, D. T., Carter, H. et al., Nucleic Acids Res. 2009, 37, D347–D354.
[8] Sali, A., Blundell, T. L., J. Mol. Biol. 1993, 234, 779–815.
[9] Karplus, K., Barrett, C., Hughey, R., Bioinformatics 1998, 14, 846–856.
[10] Soding, J., Bioinformatics 2005, 21, 951–960.
[11] Fernandez-Fuentes, N., Madrid-Aliste, C. J., Rai, B. K., Fajardo, J. E., Fiser, A., Nucleic Acids Res. 2007, 35, W363–W368.
[12] Joo, K., Lee, J., Lee, S., Seo, J. H., Lee, S. J., Lee, J., Proteins 2007, 69, 83–89.
[13] Zhou, H., Pandit, S. B., Lee, S. Y., Borreguero, J., Chen, H., Wroblewska, L., Skolnick, J., Proteins 2007, 69, 90–97.
[14] Zhang, Y., Proteins 2007, 69, 108–117.
[15] Chivian, D., Baker, D., Nucleic Acids Res. 2006, 34, e112.
[16] Qian, B., Raman, S., Das, R., Bradley, P., McCoy, A. J., Read, R. J., Baker, D., Nature 2007, 450, 259–264.
[17] Sivasubramanian, A., Sircar, A., Chaudhury, S., Gray, J. J., Proteins 2009, 74, 497–514.
[18] Marcatili, P., Rosi, A., Tramontano, A., Bioinformatics 2008, 24, 1953–1954.
[19] Fox, J. A., McMillan, S., Ouellette, B. F., Nucleic Acids Res. 2006, 34, W3–W5.
[20] Battey, J. N., Kopp, J., Bordoli, L., Read, R. J., Clarke, N. D., Schwede, T., Proteins 2007, 69, 68–82.
[21] Kopp, J., Bordoli, L., Battey, J. N., Kiefer, F., Schwede, T., Proteins 2007, 69, 38–56.
[22] Bordoli, L., Kiefer, F., Arnold, K., Benkert, P., Battey, J., Schwede, T., Nat. Protoc. 2009, 4, 1–13.
[23] Kiefer, F., Arnold, K., Kunzli, M., Bordoli, L., Schwede, T., Nucleic Acids Res. 2009, 37, D387–D392.
[24] Kopp, J., Schwede, T., Nucleic Acids Res. 2006, 34, D315–D318.
[25] Kopp, J., Schwede, T., Nucleic Acids Res. 2004, 32, D230–D234.
[26] Peitsch, M. C., Schwede, T., Guex, N., Pharmacogenomics 2000, 1, 257–266.
[27] Marti-Renom, M. A., Stuart, A. C., Fiser, A., Sanchez, R., Melo, F., Sali, A., Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 291–325.
[28] Rychlewski, L., Fischer, D., Protein Sci. 2005, 14, 240–245.
[29] Koh, I. Y., Eyrich, V. A., Marti-Renom, M. A., Przybylski, D., Madhusudhan, M. S., Eswar, N., Grana, O. et al., Nucleic Acids Res. 2003, 31, 3311–3315.
[30] Henrick, K., Feng, Z., Bluhm, W. F., Dimitropoulos, D., Doreleijers, J. F., Dutta, S., Flippen-Anderson, J. L. et al., Nucleic Acids Res. 2008, 36, D426–D433.
[31] Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., Lipman, D. J., Nucleic Acids Res. 1997, 25, 3389–3402.
[32] Lovell, S. C., Word, J. M., Richardson, J. S., Richardson, D. C., Proteins 2000, 40, 389–408.
[33] van Gunsteren, W. F., Billeter, S. R., Eising, A., Hünenberger, P. H., Krüger, P., Mark, A. E., Scott, W. R. P. et al., Biomolecular Simulations: The GROMOS96 Manual and User Guide, VdF Hochschulverlag ETHZ, Zürich 1996.
[34] Melo, F., Feytmans, E., J. Mol. Biol. 1998, 277, 1141–1152.
[35] Benkert, P., Tosatto, S. C., Schomburg, D., Proteins 2008, 71, 261–277.
[36] Appel, R. D., Bairoch, A., Hochstrasser, D. F., Trends Biochem. Sci. 1994, 19, 258–260.
[37] Peitsch, M. C., Wilkins, M. R., Tonella, L., Sanchez, J. C., Appel, R. D., Hochstrasser, D. F., Electrophoresis 1997, 18, 498–501.
[38] Peitsch, M. C., Guex, N., in: Wilkins, M. R., Williams, K. L., Appel, R. O., Hochstrasser, D. F. (Eds.), Proteome Research: New Frontiers in Functional Genomics, Springer 1997, pp. 177–186.
[39] Peitsch, M. C., Proc. Int. Conf. Intell. Syst. Mol. Biol. 1997, 5, 234–236.
[40] Sanchez, R., Sali, A., Proc. Natl. Acad. Sci. USA 1998, 95, 13597–13602.
[41] Bairoch, A., Apweiler, R., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E. et al., Nucleic Acids Res. 2005, 33, D154–D159.
[42] Berman, H., Henrick, K., Nakamura, H., Markley, J. L., Nucleic Acids Res. 2007, 35, D301–D303.
[43] Arnold, K., Kiefer, F., Kopp, J., Battey, J. N., Podvinec, M., Westbrook, J. D., Berman, H. M. et al., J. Struct. Funct. Genomics 2009, 10, 1–8.
[44] Berman, H. M., Westbrook, J. D., Gabanyi, M. J., Tao, W., Shah, R., Kouranov, A., Schwede, T. et al., Nucleic Acids Res. 2009, 37, D365–D368.
[45] Jenkinson, A. M., Albrecht, M., Birney, E., Blankenburg, H., Down, T., Finn, R. D., Hermjakob, H. et al., BMC Bioinformatics 2008, 9, S3.
[46] Schwede, T., Diemand, A., Guex, N., Peitsch, M. C., Res. Microbiol. 2000, 151, 107–112.
[47] Cozzetto, D., Kryshtafovych, A., Ceriani, M., Tramontano, A., Proteins 2007, 69, 175–183.
[48] Hooft, R. W., Vriend, G., Sander, C., Abola, E. E., Nature 1996, 381, 272.
[49] Guex, N., Diemand, A., Peitsch, M. C., Trends Biochem. Sci. 1999, 24, 364–367.
[50] Guex, N., Schwede, T., Peitsch, M. C., Curr. Protoc. Protein Sci. 2001, Chapter 2, Unit 2 8.
[51] Edgar, R. C., Nucleic Acids Res. 2004, 32, 1792–1797.
[52] Yooseph, S., Sutton, G., Rusch, D. B., Halpern, A. L., Williamson, S. J., Remington, K., Eisen, J. A. et al., PLoS Biol. 2007, 5, e16.
[53] Burley, S. K., Nat. Struct. Biol. 2000, 7, 932–934.
[54] Kim, S. H., Curr. Opin. Struct. Biol. 2000, 10, 380–383.
[55] Sanchez, R., Pieper, U., Melo, F., Eswar, N., Marti-Renom, M. A., Madhusudhan, M. S., Mirkovic, N. et al., Nat. Struct. Biol. 2000, 7, 986–990.
[56] Levitt, M., Proc. Natl. Acad. Sci. USA 2007, 104, 3183–3188.
[57] Nair, R., Liu, J., Soong, T. T., Acton, T. B., Everett, J. K., Kouranov, A., Fiser, A. et al., J. Struct. Funct. Genomics 2009, 10, 181–191.
[58] Redfern, O. C., Dessailly, B., Orengo, C. A., Curr. Opin. Struct. Biol. 2008, 18, 394–402.
[59] UniProt Consortium, Nucleic Acids Res. 2009, 37, D169–D174.
[60] Schwede, T., Sali, A., Honig, B., Levitt, M., Berman, H. M., Jones, D., Brenner, S. E. et al., 2009, 17, 151–159.
[61] Tramontano, A., in: Schwede, T., Peitsch, M. C. (Eds.), Computational Structural Biology, World Scientific Publishing, Singapore 2008.
[62] Hillisch, A., Pineda, L. F., Hilgenfeld, R., Drug Discov. Today 2004, 9, 659–669.
[63] Vangrevelinghe, E., Zimmermann, K., Schoepfer, J., Portmann, R., Fabbro, D., Furet, P., J. Med. Chem. 2003, 46, 2656–2662.
[64] Feyfant, E., Sali, A., Fiser, A., Protein Sci. 2007, 16, 2030–2041.
[65] Wattenhofer, M., Di Iorio, M. V., Rabionet, R., Dougherty, L., Pampanos, A., Schwede, T., Montserrat-Sentis, B. et al., J. Mol. Med. 2002, 80, 124–131.
[66] Raimondo, D., Giorgetti, A., Giorgetti, A., Bosi, S., Tramontano, A., Proteins 2007, 66, 689–696.
[67] Bartlett, G. J., Porter, C. T., Borkakoti, N., Thornton, J. M., J. Mol. Biol. 2002, 324, 105–121.
[68] Laskowski, R. A., Thornton, J. M., Humblet, C., Singh, J., J. Mol. Biol. 1996, 259, 175–201.
[69] Duret, L., Guex, N., Peitsch, M. C., Bairoch, A., Genome Res. 1998, 8, 348–353.
[70] Peitsch, M. C., Jongeneel, C. V., Int. Immunol. 1993, 5, 233–238.
[71] Wan, Y., Zheng, Y. Z., Harris, J. M., Brown, R., Waters, M. J., Mol. Endocrinol. 2003, 17, 2240–2250.
[72] Guimaraes, A. J., Hamilton, A. J., de, M. G. H. L., Nosanchuk, J. D., Zancope-Oliveira, R. M., PLoS ONE 2008, 3, e3449.
[73] Scheib, H., McLay, I., Guex, N., Clare, J. J., Blaney, F. E., Dale, T. J., Tate, S. N. et al., J. Mol. Model 2006, 12, 813–822.
[74] Sanders, R. W., Hsu, S. T., van Anken, E., Liscaljet, I. M., Dankers, M., Bontjer, I., Land, A. et al., Mol. Biol. Cell 2008, 19, 4707–4716.
[75] O’Hara, F. P., Guex, N., Word, J. M., Miller, L. A., Becker, J. A., Walsh, S. L., Scangarella, N. E. et al., J. Infect. Dis. 2008, 197, 187–194.
[76] Pajerowska-Mukhtar, K. M., Mukhtar, M. S., Guex, N., Halim, V. A., Rosahl, S., Somssich, I. E., Gebhardt, C., Planta 2008, 228, 293–306.
[77] Junne, T., Schwede, T., Goder, V., Spiess, M., Mol. Biol. Cell 2006, 17, 4063–4068.
[78] Schwede, T. F., Badeker, M., Langer, M., Retey, J., Schulz, G. E., Protein Eng. 1999, 12, 151–153.
[79] Peitsch, M. C., Bioinformatics 2002, 18, 934–938.
[80] Hahne, M., Peitsch, M. C., Irmler, M., Schroter, M., Lowin, B., Rousseau, M., Bron, C. et al., Int. Immunol. 1995, 7, 1381–1386.
[81] Notarangelo, L. D., Peitsch, M. C., Immunol. Today 1996, 17, 511–516.
[82] Stirnimann, C. U., Grütter, M. G., in: Schwede, T., Peitsch, M. C. (Eds.), Computational Structural Biology, World Scientific Publishing, Singapore 2008.
[83] Karpusas, M., Hsu, Y. M., Wang, J. H., Thompson, J., Lederman, S., Chess, L., Thomas, D., Structure 1995, 3, 1031–1039.
[84] Janin, J., Wodak, S., Structure 2007, 15, 755–759.
[85] Gobel, U., Sander, C., Schneider, R., Valencia, A., Proteins 1994, 18, 309–317.
[86] Burger, L., van Nimwegen, E., Mol. Syst. Biol. 2008, 4, 165.
[87] Weigt, M., White, R. A., Szurmant, H., Hoch, J. A., Hwa, T., Proc. Natl. Acad. Sci. USA 2009, 106, 67–72.
[88] Eramian, D., Eswar, N., Shen, M. Y., Sali, A., Protein Sci. 2008, 17, 1881–1893.
[89] Paluszewski, M., Karplus, K., Proteins 2009, 75, 540–549.
[90] Wallner, B., Elofsson, A., Proteins 2007, 69, 184–193.
[91] Alber, F., Dokudovskaya, S., Veenhoff, L. M., Zhang, W., Kipper, J., Devos, D., Suprapto, A. et al., Nature 2007, 450, 683–694.
[92] Kleywegt, G. J., Harris, M. R., Zou, J. Y., Taylor, T. C., Wahlby, A., Jones, T. A., Acta Crystallogr. D Biol. Crystallogr. 2004, 60, 2240–2249.
[93] Sayers, E. W., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Chetvernin, V., Church, D. M. et al., Nucleic Acids Res. 2009, 37, D5–D15.
[94] Schuttelkopf, A. W., van Aalten, D. M., Acta Crystallogr. D Biol. Crystallogr. 2004, 60, 1355–1363.
[95] Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M., Bairoch, A., Methods Mol. Biol. 2007, 406, 89–112.

Dr. Nicolas Guex studied plant biology and biochemistry at the University of Lausanne. In the early nineties, during the course of his Ph.D., he pioneered the use of molecular biology in the Institute of Plant Biology, isolated and sequenced two genes of the glyoxylate cycle, built a molecular model for one of them and initiated the development of Swiss-PdbViewer. He obtained his Ph.D.
in 1995 and joined the group of Dr. Manuel Peitsch at GlaxoWellcome, where he contributed to the development of SWISS-MODEL. From 1996 to 2002, he also taught postgraduate Structural Biology modules at the University of Geneva, EPFL and for the Swiss Institute of Bioinformatics. During his 12 years at GlaxoSmithKline, he held positions of increasing responsibility, led a group specialized in Evolutionary and Structural Bioinformatics and contributed to several drug discovery research programs. In 2008 he returned to the Swiss Institute of Bioinformatics, in the Vital-IT team, where he continues the development of Swiss-PdbViewer, contributes his bioinformatics and biology expertise to research groups and develops specialized software to support research projects that require high-performance computing. Nicolas has been developing and optimizing computer software since 1979.

Manuel C. Peitsch is Director Computational Sciences and Bioinformatics with Philip Morris International Research and Development, which he joined from the Novartis Institutes of BioMedical Research (NIBR), where he successively led Informatics & Knowledge Management and later Systems Biology. Prior to joining Novartis in 2001, Manuel held several leadership positions in bioinformatics, scientific computing and knowledge management with GlaxoWellcome and GlaxoSmithKline. Manuel obtained his Ph.D. in biochemistry from the University of Lausanne (Switzerland) and spent his post-doctoral years at the Laboratory of Mathematical Biology of the National Cancer Institute in Frederick, MD, and at the University of Lausanne. Since 2002 he has been Professor for Bioinformatics at the University of Basel. Manuel is a co-founder of several initiatives, including two start-up companies and the Swiss Institute of Bioinformatics.
He is a member of the Swiss National Research Council, the Chairman of the Executive Board of the Swiss Institute of Bioinformatics and an active scientific advisor to several academic and commercial entities.

Torsten Schwede obtained his Ph.D. in chemistry from the Albert-Ludwigs University of Freiburg i.Br. (Germany) for his studies in the field of protein X-ray crystallography. As a postdoctoral fellow at GlaxoWellcome in Geneva, and later as research scientist at GSK R&D, his research interests focused on computational structural biology. In the group of Manuel Peitsch, he took responsibility for the further development of the SWISS-MODEL server. Since 2001 he has been professor for Structural Bioinformatics at the Biozentrum of the University of Basel and group leader at the Swiss Institute of Bioinformatics (SIB). His research group is devoted to molecular modeling of protein structures and their functional properties. Central to this work is the development of fully automated expert systems, such as the SWISS-MODEL server for comparative protein structure modeling and the Protein Model Portal of the PSI Structural Genomics Knowledgebase. Applied aspects such as simulation of protein–ligand interactions and structure-based protein engineering complement his group's research activities. Torsten is chairman of the Biozentrum's research core program "Computational and Systems Biology". In addition, he serves on several boards, including the executive board of the Swiss Institute of Bioinformatics and the scientific advisory board of the PDBe (EMBL-EBI).