Information for Attendees of the Sequencing Technology Workshop, 2019

Part of the 6th Annual RNA Symposium of the RNA Institute

Instructor: Morgan Sammons, Assistant Professor of Biology, State University of New York at Albany

Instructor: Ryan Meng, Bioinformatics Support Specialist, State University of New York at Albany

Many online resources cover how to learn about and use high-throughput sequencing data in your own work. The Harvard Chan Bioinformatics Core provides extremely detailed, well-organized introductions to a range of sequencing approaches, software, and workflows. I highly recommend it for further reading.

Main Presentation
Presentation given by Morgan as part of the workshop.
Bulk RNA-seq Replicate Guidelines
Excellent manuscript published in RNA discussing why biological replicates matter and how to select the number of replicates in a bulk RNA-seq experiment.

Tools used during the symposium

STAR: a splice-aware alignment tool.

Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., … Gingeras, T. R. (2012). STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England), 29(1), 15-21. doi: 10.1093/bioinformatics/bts635

DESeq2: An R-based package for differential gene expression analysis using raw read counts derived from STAR. Short Tutorial

Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550 doi: 10.1186/s13059-014-0550-8

Samtools: software for manipulating SAM and BAM alignment files.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9 doi: 10.1093/bioinformatics/btp352
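
DESeq2 starts from a matrix of raw counts with one row per gene and one column per sample. One way STAR produces such counts is the `--quantMode GeneCounts` option, which writes a `ReadsPerGene.out.tab` file per sample: four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`) followed by one row per gene with unstranded and stranded counts. The sketch below is only an illustration of how those files could be stitched into one matrix in Python. It is not the workflow used at the workshop, and the function names, sample names, and counts are all made up for the example.

```python
import csv
import io


def read_star_counts(handle, column=1):
    """Parse one STAR ReadsPerGene.out.tab file into {gene_id: count}.

    column=1 is the unstranded count; columns 2 and 3 hold the two
    stranded counts. Pick the one that matches your library prep.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the four summary rows (N_unmapped, ...).
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(samples):
    """Combine per-sample dicts into a header plus gene-by-sample rows."""
    names = list(samples)
    genes = sorted(set().union(*(c.keys() for c in samples.values())))
    # A gene missing from one sample is recorded as zero reads.
    rows = [[g] + [samples[n].get(g, 0) for n in names] for g in genes]
    return ["gene"] + names, rows


# Toy text standing in for two ReadsPerGene.out.tab files (made-up numbers).
WT_TEXT = "\n".join([
    "N_unmapped\t10\t10\t10",
    "N_multimapping\t5\t5\t5",
    "N_noFeature\t3\t40\t4",
    "N_ambiguous\t1\t0\t1",
    "YAL001C\t120\t2\t118",
    "YAL002W\t30\t29\t1",
]) + "\n"
KO_TEXT = "\n".join([
    "N_unmapped\t8\t8\t8",
    "N_multimapping\t4\t4\t4",
    "N_noFeature\t2\t30\t3",
    "N_ambiguous\t0\t0\t0",
    "YAL001C\t45\t1\t44",
    "YAL003W\t7\t7\t0",
]) + "\n"

if __name__ == "__main__":
    samples = {
        "wt": read_star_counts(io.StringIO(WT_TEXT)),
        "ko": read_star_counts(io.StringIO(KO_TEXT)),
    }
    header, rows = build_count_matrix(samples)
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(value) for value in row))
```

With real data you would pass open file handles instead of `io.StringIO` objects and write the result to a tab-separated file that R can read.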

Commands used for Genome Indexing and Alignment

Genome Indexing with STAR

We used the Saccharomyces cerevisiae sacCer3 genome from UCSC found at the Illumina iGenomes website.

iGenomes is a nice source for the genomes of many model organisms used in research.

You can use STAR (or another aligner) to build an index for your own organism from a FASTA file of its genome.
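
As a rough sketch of what index building looks like, the Python below assembles a STAR `genomeGenerate` command. It is not the exact command used at the workshop: the file names (`genome.fa`, `genes.gtf`, `star_index`), the thread count, and the read length are placeholders. The `--genomeSAindexNbases` value follows the STAR manual's rule of thumb for small genomes, min(14, log2(genome length)/2 - 1), and `--sjdbOverhang` is set to read length minus one as the manual suggests.

```python
import math
import shlex


def suggested_sa_index_nbases(genome_length):
    """STAR manual rule of thumb: min(14, log2(genome_length) / 2 - 1)."""
    return int(min(14, math.log2(genome_length) / 2 - 1))


def star_index_command(fasta, gtf, genome_dir, genome_length,
                       threads=4, read_length=100):
    """Return the STAR genomeGenerate command as a list of arguments."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # Ideal overhang is the read length minus one.
        "--sjdbOverhang", str(read_length - 1),
        # Scaled down for small genomes such as yeast.
        "--genomeSAindexNbases", str(suggested_sa_index_nbases(genome_length)),
    ]


if __name__ == "__main__":
    # Placeholder paths; the yeast genome is roughly 12 million bases.
    cmd = star_index_command("genome.fa", "genes.gtf", "star_index", 12_000_000)
    print(shlex.join(cmd))
```

Printing the command rather than running it lets you check it first. With STAR installed, the same list can be passed to `subprocess.run(cmd, check=True)`.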

Genome Alignment with STAR

We used STAR in a cluster setting to align our raw FASTQ files to the sacCer3 genome we indexed in the prior steps.

The script we used was written for a cluster running the Slurm job scheduler. If you are performing these tasks locally, you do not need the Slurm directives. If you are working on your home institution's cluster, it may use a different scheduler.

Another point to remember is that we requested specific computational resources for the alignment. We chose these numbers based on the resources available on the cluster and on how long we expected the alignment to take for yeast, whose genome is much smaller than those of human or mouse.
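
To make the scheduler and resource discussion concrete, here is a minimal sketch that writes out a Slurm batch script wrapping a STAR alignment. The resource numbers (8 CPUs, 16 GB, 2 hours), the job name, and the file names are illustrative assumptions, not the values we actually requested. Adjust them to your cluster and genome size.

```python
import shlex


def star_align_command(fastq, genome_dir, out_prefix, threads):
    """Return a single-end STAR alignment command as a list of arguments."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "GeneCounts",
        "--outFileNamePrefix", out_prefix,
    ]
    if fastq.endswith(".gz"):
        # Tell STAR how to decompress gzipped FASTQ files on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


def slurm_script(command, job_name, cpus, mem_gb, hours):
    """Wrap a command in a batch script with #SBATCH resource requests."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours:02d}:00:00",
        "",
        shlex.join(command),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder inputs: one gzipped FASTQ file and a prebuilt index folder.
    align = star_align_command("sample1.fastq.gz", "star_index", "sample1_", 8)
    print(slurm_script(align, "star_align", cpus=8, mem_gb=16, hours=2))
```

Saving the printed text to a file and submitting it with `sbatch` is the usual Slurm pattern. Run locally, only the final STAR line is needed.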

Non-alignment Based Strategies for Differential Gene Expression

We used salmon to perform transcript quantification of our RNA-seq data without prior alignment to a reference genome.

The major advantage over STAR (or other aligners) is speed. salmon and a similar tool called kallisto do not perform full base-by-base alignment. Instead, they use lightweight mapping of reads to a reference transcriptome (quasi-mapping in salmon, pseudoalignment in kallisto), which drastically speeds up the process of quantifying your RNA-seq data.

The other big advantage is how little processing power salmon and kallisto need for transcript quantification. Both programs can be run on your laptop or desktop and do not require anything more than that!
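
A minimal sketch of the two salmon steps (index the transcriptome, then quantify reads) is shown below as Python that assembles the commands. The file and directory names are placeholders, and this is not necessarily the exact invocation we used. `-l A` asks salmon to detect the library type automatically.

```python
import shlex


def salmon_index_command(transcripts_fasta, index_dir):
    """Build a salmon index from a transcriptome FASTA file."""
    return ["salmon", "index", "-t", transcripts_fasta, "-i", index_dir]


def salmon_quant_command(index_dir, out_dir, reads, mates=None, threads=4):
    """Quantify single-end reads, or paired-end reads if mates is given."""
    cmd = ["salmon", "quant", "-i", index_dir, "-l", "A", "-p", str(threads)]
    if mates is None:
        cmd += ["-r", reads]                # single-end reads
    else:
        cmd += ["-1", reads, "-2", mates]   # paired-end reads
    cmd += ["-o", out_dir]
    return cmd


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    print(shlex.join(salmon_index_command("transcripts.fa", "salmon_index")))
    print(shlex.join(
        salmon_quant_command("salmon_index", "sample1_quant", "sample1.fastq.gz")
    ))
```

With salmon installed, each list can be run with `subprocess.run(cmd, check=True)`. The output directory then contains a `quant.sf` file with per-transcript estimates.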

Information for ABIO681 - Spring 2019

Syllabus


Reading due 02/05
Casadevall and Fang 2009

| Presentation Date | Paper | Presenters |
|---|---|---|
| 12-Feb | Fields and Song 1989 | McCauley/Altreith |
| 19-Feb | Wang 1993 and Wilson 1991 | Lin/Catizone |
| 26-Feb | DeRisi et al 1996 | Koslow/McCarthy |
| 5-Mar | Burns et al 1994 and Giaever et al 2002 | O’Keefe/Martin |
| 12-Mar | Krogan et al 2006 | Soyer/Sammons |
| 26-Mar | Bentley et al 2008 | Durham/Naik |
| 9-Apr | Ren et al 2000 and Johnson et al 2007 | Waldern/Moskwa |

| Presentation Date | Paper | Authors | Reviewers |
|---|---|---|---|
| 16-Apr | Barski et al 2007 | O’Keefe/Naik | Moskwa/McCarthy |
| 23-Apr | Korthout et al 2018 | Altreith/Martin | Durham/Waldern |


Reading due 04/30 #1
Casadevall and Fang 2014
Reading due 04/30 #2
Berg 2016

| Presentation Date | Paper | Authors | Reviewers |
|---|---|---|---|
| 7-May | Lee et al 2019 | McCauley/Catizone | Koslow/Lin |

23 Dec 2017 by sammons

Links I really like

UCSC Genome Browser
Ubiquitous and fantastically data rich, UCSC Genome Browser is my go-to.
WashU Epigenome Browser
Excellent 3D visualization tools, and it hosts a number of consortium data hubs.
Cancer Cell Line Encyclopedia
Search for your favorite genes and compare expression across multiple cancer cell lines. Very well done.
HOMER
One of the best annotated software suites, this is still my favorite semi-integrated next-gen sequencing analysis software. Bonus: it is great at its original function, DNA motif analysis.
Bedtools
Indispensable tool for analysis of genomic intervals/peak locations.
Homebrew
Easiest way to install and update command line software tools on Mac.
Homebrew Formulas
Nice list of software that can be installed/managed by homebrew.
Gene Expression Omnibus
NIH-funded repository for genomics data of all types
Sequence Read Archive
NIH-funded repository for next-generation sequencing reads.
deeptools
Very nice suite of tools for some ChIP-seq/RNA-seq applications; many uses!

DESeq2

STAR Aligner
Super fast and accurate aligner: requires high RAM for mammalian genomes, but speed can’t be beat.
Bowtie2
My favorite short read aligner (mainly because I know how to use it).
HOCOMOCO
Nice database of DNA-binding protein motifs
Firebrowse
Integration of (primarily) the Cancer Genome Atlas (TCGA) data
cBioPortal
Integration of TCGA and many other datasets
EMBOSS
Software suite that performs numerous DNA/RNA sequence manipulations
JASPAR
Transcription factor binding motif database
Software Carpentry
Learn how to code (I need to do this)

21 Apr 2017 by sammons
Intro to Next-Gen Sequencing
A completely non-comprehensive introduction to next-gen sequencing. Primarily designed as part of a larger bootcamp held @ UAlbany.
Getting Started with Next-Gen Sequencing
Tips and tricks for getting started in sequencing at UAlbany
Applications: ATAC-seq
Data I presented that turned into this paper with the Wherry lab @ the University of Pennsylvania