UCLA Genotyping and Sequencing Core Resources
Last updated 07/01/14
The UCLA Genotyping and Sequencing Core Facility offers a wide range of core genetic techniques to research groups on the UCLA campus and in the broader scientific community. Services include DNA sequencing (both Sanger sequencing and Next Generation/Whole Genome Sequencing); genotyping of SNP and microsatellite markers, both in genome scans and in fine-mapping studies; mutation detection; quantitative PCR for gene expression analysis; single-cell gene expression; methylation analysis; copy number variation analysis; database and bioinformatics services; and statistical genetics. The Core also provides training and trouble-shooting for students and postdoctoral fellows, and offers frequent seminars on emerging genetic technologies.
Dr. Papp is a Professor in the Department of Human Genetics at UCLA, Director of the UCLA Genotyping and Sequencing Core since 2000, and an active member of the Human Genetics Bioinformatics Core. She has spent the last 15 years of her career working in genetic core facilities. Dr. Papp has extensive experience directing both large and small genomics projects, and in developing genomic databases and error-checking algorithms to improve the quality of genotyping data. Her databases have been used at the Wellcome Trust Centre for Human Genetics at the University of Oxford, Saint Bartholomew's Hospital in London, the Pasteur Institute of France, and the French National Center for Genotyping. Before coming to UCLA, Dr. Papp worked in large genome centers in France (the National Institute of Genotyping in Paris) and England (the Wellcome Trust Centre for Human Genetics at the University of Oxford). She currently runs data cores for three large, multi-site research consortia.
The Core employs three Staff Research Associates and one programmer to operate a wide range of analytic equipment.
Major Equipment: Laboratory
- MiSeq Personal Sequencer
- Roche Genome Sequencer FLX Titanium (454)
- Roche GS Junior (454)
- Roche LightCycler 480
- QuantStudio 3D Digital PCR Instrument
- BioMark MX/HX Genetic Analysis System
- ABI 3730 capillary DNA sequencers
- ABI TaqMan 7900 Real-time PCR instrument
- PSQ96 Pyrosequencer
- GeneMachine Hydroshear
- Qiagen TissueLyser
- Agilent Bioanalyzer
- Multimode Microplate Reader Spectrophotometer
- Beckman Coulter Counter
- Five ABI dual-block PCR thermal cyclers
- Sequencing Cluster Datarig
The Core consists of 1944 square feet of wet laboratory fully equipped for standard molecular genetic techniques, and an additional 122 square feet of office space. In addition to the standard equipment, the lab is equipped for automated high-throughput SNP and microsatellite genotyping, sequencing, and mutation detection. High-throughput sample preparation is performed on a Biomek NX 8-channel robot, a Biomek FX 96-channel robot, and a Robbins Hydra Microdispenser robot. These instruments are used for DNA dilutions, reaction set-up, and reaction pooling. DNA amplification is performed using 96- and 384-well PCR plates on ABI GeneAmp PCR System 9700 thermal cyclers. Polymorphism detection is conducted using a Roche LightCycler 480. For SNP genotyping, the Core has platforms appropriate to high-, medium-, and low-throughput projects. For high throughput, the SNPlex assay is performed on two ABI 3730 capillary instruments; for medium throughput, the ABI TaqMan 7900 Real-time PCR instrument or the Beckman CEQ 8000 capillary instrument can be used; for low throughput, the PSQ96 Pyrosequencer is employed. The ABI TaqMan 7900 Real-time, Quantitative PCR platform can also be used for gene expression analysis, mutation screening, and determining dissociation curves. Sequencing is performed primarily on the ABI 3730 capillary sequencer, which gives the longest reads and highest data quality. For data interpretation and analysis, the lab is equipped with 19 personal computers running Windows, Macintosh, and Linux operating systems.
The Sequencing Unit of the Core has an ABI 3730XL 96-capillary sequencer available for running Sanger sequencing. The normal sequencing turn-around time is 12-24 hours.
The facility offers two sequencing services: 1. Full Service, in which the customer provides the template as plasmid, PCR product, or BAC, and Core personnel carry out the sequencing reactions and run the sample. 2. Ready-to-Run, where the customer performs the sequencing reaction, and brings in the reacted sample for running on the capillary sequencer. All samples entering the Core laboratory are logged into the Core's sample-tracking database (IGDB) and given a unique tracking ID. On completion of sequencing and analysis, data is immediately available on Web-Seq, the Core's password protected, web-based file retrieval system. The Core's scientists are also available for trouble-shooting and technical support at any stage of the sequencing process.
Next Generation Sequencing
The Core operates both short-read and long-read instruments.
The short-read Illumina MiSeq enables researchers to go from sample prep to data analysis in as little as eight hours.
Applications possible on this instrument include:
- Highly multiplexed PCR amplicon sequencing
- TruSeq Amplicon - Cancer Panel
- Targeted resequencing
- Small genome resequencing
- De novo sequencing
- Small RNA sequencing
- Library QC for HiSeq
- 16S metagenomics studies
The long-read Roche Genome Sequencer FLX (formerly 454) with Titanium Series reagents provides sequence read lengths up to 400 base pairs, and 400 to 600 million base pairs per instrument run.
Applications possible on this instrument include:
- De novo Sequencing
- Comparative Genomics
- Whole Genome Sequencing
- Amplicon Resequencing
- Transcriptome Analysis
- Gene Regulation Studies
- Small RNA Identification
- Methylation Analysis
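For project planning, the read-length and yield figures quoted above for the Genome Sequencer FLX can be turned into a rough estimate of read count and average coverage. The following is a minimal sketch, not a Core tool: the function names and the 5 Mb example genome are illustrative assumptions, and only the 400 bp read length and the 400 to 600 million bp per-run yield come from the text.

```python
def min_reads_per_run(yield_bp: float, max_read_length_bp: float) -> float:
    """Lower bound on reads per run: total bases / longest read length.

    Shorter reads would mean more reads for the same yield.
    """
    return yield_bp / max_read_length_bp


def mean_coverage(yield_bp: float, genome_size_bp: float) -> float:
    """Average fold coverage: total bases sequenced / genome size."""
    return yield_bp / genome_size_bp


if __name__ == "__main__":
    MAX_READ_LENGTH_BP = 400        # from the text: reads up to 400 bp
    RUN_YIELDS_BP = (400e6, 600e6)  # from the text: 400 to 600 million bp per run
    GENOME_SIZE_BP = 5e6            # assumption: hypothetical 5 Mb genome

    for run_yield in RUN_YIELDS_BP:
        reads = min_reads_per_run(run_yield, MAX_READ_LENGTH_BP)
        coverage = mean_coverage(run_yield, GENOME_SIZE_BP)
        print(f"{run_yield / 1e6:.0f} Mb run: at least {reads:,.0f} reads, "
              f"about {coverage:.0f}x mean coverage")
```

Mean coverage is only an average; actual per-base coverage varies across the genome, so a real project would normally plan for a margin above the minimum required depth.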
Genotyping of Multiallelic Markers
The Genotyping unit of the Core is highly automated in order to produce data quickly, accurately, and as inexpensively as possible. Genotyping is performed on the two ABI 3730 instruments. These instruments are capable of generating over 10,000 genotypes per instrument per day, 7 days a week. Liquid handling is performed using three robotic systems: a Robbins Hydra Microdispenser and a Biomek FX 96-channel robot for rapid, accurate preparation of 96- and 384-well microtiter plates, and a Biomek NX 8-channel robot to set up PCR reactions and to perform other liquid handling tasks. Marker amplification is performed using ABI GeneAmp PCR System 9700 thermal cyclers. Together, this equipment increases both genotyping speed and accuracy over manual methods.
PCR products are resolved using the ABI 3730 data collection software and sized by application of the Genemapper software package from Applied Biosystems. Two positive control samples of CEPH 1347-2 per 96-well plate are used to validate the accuracy of genotype calls for each marker. The computer-generated genotypes are checked by technicians blind to disease diagnosis and family structure. The called genotypes are imported to a relational database for error-checking and archival.
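The positive-control check described above can be pictured as a simple concordance test: a plate's calls for a marker are accepted only if both control wells reproduce the known genotype of the control sample. The sketch below is an illustration under stated assumptions, not the Core's software. The function names, the representation of a genotype as a pair of binned allele sizes, and the example allele sizes are all hypothetical.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # assumption: two binned allele sizes, in base pairs


def normalize(genotype: Genotype) -> Tuple[int, ...]:
    """Sort the alleles so that (152, 148) and (148, 152) compare equal."""
    return tuple(sorted(genotype))


def plate_passes(control_calls: List[Genotype],
                 expected: Genotype,
                 required_controls: int = 2) -> bool:
    """Accept a plate for one marker only if enough control wells were
    called and every called control matches the expected genotype."""
    if len(control_calls) < required_controls:
        return False
    return all(normalize(call) == normalize(expected) for call in control_calls)


if __name__ == "__main__":
    expected_control = (148, 152)  # hypothetical reference genotype for the control
    print(plate_passes([(152, 148), (148, 152)], expected_control))  # both controls agree
    print(plate_passes([(148, 152), (148, 150)], expected_control))  # one control disagrees
```

A plate that fails this check would be a candidate for re-calling or re-running before its genotypes are passed on for review.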
Genotyping of SNPs
The Core currently has three available technologies for different applications of SNP detection, from discovery and validation to screening, linkage and association studies. Low throughput SNP genotyping can be performed on the PSQ96 Pyrosequencer. For medium throughput, the allelic discrimination assay is run on an ABI PRISM 7900 Sequence Detection System, or the primer extension assay on the Beckman CEQ 8000 capillary instrument. For high throughput, the SNPlex oligonucleotide ligation assay is run on the ABI 3730 capillary sequencer. The 3730, ABI's latest sequencing platform, is capable of performing over 200,000 SNP genotypes per day using the multiplexed SNPlex assay at a low per-SNP cost. With the current 48-plex primer sets, the cost is as low as $0.14 per SNP genotype, depending on the project size.
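The throughput and cost figures quoted above for the SNPlex assay can be combined into a first-pass project estimate. The sketch below is illustrative only: the function name and the example project size are assumptions, the $0.14 figure is the lowest quoted price rather than a guaranteed rate, and an actual quote from the Core may differ.

```python
import math


def snplex_estimate(n_samples: int,
                    n_snps: int,
                    plex: int = 48,                       # from the text: 48-plex primer sets
                    genotypes_per_day: int = 200_000,     # from the text: quoted daily capacity
                    min_cost_per_genotype: float = 0.14   # from the text: lowest quoted price
                    ) -> dict:
    """Rough size, duration and minimum cost of a SNPlex project."""
    total_genotypes = n_samples * n_snps
    return {
        "total_genotypes": total_genotypes,
        # Number of multiplexed primer pools needed to cover all SNPs.
        "primer_pools": math.ceil(n_snps / plex),
        # Instrument days at the quoted daily capacity.
        "instrument_days": math.ceil(total_genotypes / genotypes_per_day),
        # Lower bound on cost; the real price depends on project size.
        "min_cost_usd": round(total_genotypes * min_cost_per_genotype, 2),
    }


if __name__ == "__main__":
    # Hypothetical project: 1,000 samples typed at 480 SNPs.
    for key, value in snplex_estimate(1000, 480).items():
        print(f"{key}: {value}")
```

Because the per-genotype price is a floor, the cost returned here should be read as a minimum for budgeting discussions, not as a quote.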
The Core offers a number of options for gene expression analysis.
- The ABI TaqMan 7900 Real-time PCR instrument is equipped to run individual or 384-well microfluidic cards, and supports two methods for gene expression analysis: TaqMan probe-based and SYBR Green-based.
- The Fluidigm BioMark Genetic Analysis System can perform single-cell gene expression profiling in a massively parallel format.
- Transcriptome Sequencing on the GS-FLX. Using the GS-FLX instrument and Assembler software it is possible to perform straightforward cDNA sequencing to comprehensively analyze an organism's transcriptome, from small non-coding RNA to full-length coding messenger RNA (mRNA).
- Serial analysis of gene expression (SAGE) on the GS-FLX.
Methylation Analysis can be performed at a number of levels. The PSQ96 Pyrosequencer can give high-precision determination of specific methylation sites. The Roche LightCycler 480 Real-Time PCR System can give the percent methylation of a specific DNA region. At the largest scale, the GS-FLX can give whole genome methylation analysis using chromatin immunoprecipitation (ChIP).
Data Management and Quality Control
Once the genotypes have been called, they are imported into a Microsoft SQL Server database for error-checking and data-cleaning. This Integrated Genetic Database (IGDB) holds all the genotypes generated in the Core, as well as information on the individual projects, such as locus information, pedigree information, phenotypic data, tissue source, DNA concentration, sample location, and instruments and technicians generating the data for the project. The database also holds marker information such as size range, heterozygosity, allele frequency, and genetic and physical maps.
Scientists in the UCLA Department of Human Genetics have developed statistical methods to trap errors and allow a more accurate dataset to be passed on for further statistical analysis. The quality checks are both local and global. That is, each genotype is evaluated independently according to a number of quality parameters, then the overall dataset is judged by population-based statistical methods, including checking Hardy-Weinberg equilibrium and comparisons to published allele frequencies. These methods are relevant for datasets both with and without family structure information. Finally, for pedigree-based datasets, a statistical analysis providing posterior mistyping probabilities at each genotype is performed. All results obtained during the quality control process are fed back into the relational database. The use of a relational database confers the additional advantages of improved integrity, management, manipulation, and presentation of the considerable amounts of data generated in large genome studies. Tests can be applied during the course of a study, as more data become available. When the data have been thoroughly checked and validated, the results can be exported in a variety of formats for analysis by different statistical packages.
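As an illustration of the kind of population-based check mentioned above, the sketch below applies the textbook chi-square goodness-of-fit test for Hardy-Weinberg equilibrium to genotype counts at a biallelic marker. It is not the Core's own method, which is not described in detail here. The function name and example counts are assumptions, and the test as written is only a reasonable approximation for large samples of unrelated individuals.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Returns (chi-square statistic, p-value) for observed counts of the
    three genotypes AA, AB and BB at a biallelic marker.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts in exact HWE proportions
    print(hwe_chi_square(50, 0, 50))   # no heterozygotes: strong deviation
```

A marker with a very small p-value would be flagged for review, since a marked deficit or excess of heterozygotes can indicate a genotyping problem such as a null allele or allele drop-out. For small samples or rare alleles, an exact test is usually preferred over the chi-square approximation.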
Special consideration is given to the issues of data security and patient confidentiality. To safeguard patient confidentiality to the highest degree, no information that could identify a patient is stored in the Genotyping Core databases connected to the network. Serial back-ups of the databases are stored at a remote site. Raw image data is maintained on line for a period of several months while the data is likely to require frequent referencing. After this the image files are archived permanently.
All data submitted to or generated by the Core is entered into the Core’s Integrated Genetic Database (IGDB). The Core accepts only de-identified data, so all data in the IGDB will be identified by barcode only. Every data record has two release fields: 1. Release Private, that is, ready to return to the Principal Investigator or owner of the data; and 2. Release Public, that is, ready to post publicly in an anonymized form. The Release Private field is checked True as soon as a set of data is complete and fully cleaned. Release Public will generally be permitted shortly after publication of the data, on approval of the Principal Investigator. Once the Release Public field is checked True, de-identified data can be made publicly available through the Core’s web-based front-end. Other non-sensitive categories of data (statistics, metadata, and marker allele frequencies without trait information) will be publicly available over the web as soon as they are cleaned and flagged ready for Release Private. There is also information posted on the same site regarding experimental protocols, methodology, and definitions of variables. Researchers who want more information than the de-identified versions available on the web may submit on-line requests to the Core, which will forward them to the owner of the data.
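The two release fields described above amount to a small state model for each record. The sketch below shows one way such a model could be expressed. It is an assumption-laden illustration rather than the IGDB schema: the class, field, and function names are hypothetical, and the rule that a record must be released privately before it can be released publicly is an assumption inferred from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataRecord:
    barcode: str                    # records are identified by barcode only
    release_private: bool = False   # ready to return to the PI or data owner
    release_public: bool = False    # ready to post in anonymized form


def mark_cleaned(record: DataRecord) -> None:
    """Flag a complete, fully cleaned record as ready for private release."""
    record.release_private = True


def approve_public_release(record: DataRecord,
                           published: bool,
                           pi_approved: bool) -> bool:
    """Set Release Public only for cleaned, published, PI-approved records.

    Assumption: private release is a prerequisite for public release.
    """
    if record.release_private and published and pi_approved:
        record.release_public = True
    return record.release_public


def publicly_visible(records: List[DataRecord]) -> List[str]:
    """Barcodes of records that the web front-end may show publicly."""
    return [r.barcode for r in records if r.release_public]


if __name__ == "__main__":
    first, second = DataRecord("BC0001"), DataRecord("BC0002")
    mark_cleaned(first)
    approve_public_release(first, published=True, pi_approved=True)
    print(publicly_visible([first, second]))
```

Keeping the two flags separate lets a dataset be returned to its owner as soon as it is clean, while public posting waits on publication and the owner's approval.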
Outside Investigator Access
Subject to individual IRBs and consents, de-identified data will be shared with qualified outside investigators. Access will be determined via a simple online application system. An investigator will fill out an application with their name, department, title, and an abstract stating the data requested, the hypothesis to be tested, and the basic analytic methods. These applications will be reviewed by the Principal Investigators. This review committee will include both investigators on the grant and outside experts who are not co-investigators on the current proposal. Researchers will be eligible for sharing approval based on submission of an application containing a scientifically and ethically sound hypothesis and analytic plan. Investigators will also be required to submit their IRB approval or exemption status with their application. No access will be granted without documented IRB approval. A report of the application review will be generated by each committee member for documentation purposes. The reports will be stored by the UCLA Ataxia Database staff. A record will be kept of all released data, including what data was released, when, to whom, and the constitution of the review committee approving the release.
The Integrated Genetic Database is hosted by the UCLA Department of Human Genetics Bioinformatics Core. The hardware resources provided by the Bioinformatics Core include an extremely fast and high-throughput (Gb Ethernet) Local Area Network and connection to the second-generation Internet (Internet2). The database is implemented in Microsoft SQL Server 2012 and runs on a dedicated database server. The Bioinformatics Core also provides the necessary staffing to maintain these systems. Transaction logs and serial back-ups of the database are stored off-site.
The Core provides services to research groups on the UCLA campus and in the broader scientific community. Scientists from across the U.S. and Europe have made use of the facility. The Core currently has active accounts with 432 research groups. The Core’s user database holds information on 4707 users, and there have been 913 unique visitors to the laboratory in the last year. The Core website provides information about the Core, technical protocols, genetic education including a genetic glossary, and links and ordering information. The Core website receives approximately 6000 hits per month.
Administrative Support and Institutional Commitment
UCLA has made a commitment to supporting and facilitating the research of all its scientists by providing a system of core facilities. These cores are staffed and equipped at a cutting-edge level, allowing scientists to concentrate more time on intellectual problem solving, and less on acquiring and learning new technologies. The David Geffen School of Medicine at UCLA provides administrative support for many of the financial aspects of running a core. The Department of Human Genetics, where the Genotyping and Sequencing Core is housed, provides the space in which the Core operates, and the daily assistance of the Department's staff of five administrative specialists.
To calculate a budget for your grant, visit our Prices webpage. For genotyping projects, these figures should be used as estimates only. Before you run your project, you must meet with the Core to obtain a quote, as details of your project set-up may affect the cost. You should also consider making allowances in your budget to cover the possibility of having to do a small percentage of repeats of genotyping experiments, in case of missing data.