2017 SUMMER SCHOOL FOR BIG DATA IN BIOLOGY


REGISTRATION IS NOW OPEN FOR 2017 COURSES.

The Center for Computational Biology and Bioinformatics at The University of Texas at Austin is proud to host the 4th Annual Summer School for Big Data in Biology May 22–25, 2017. The summer school provides a unique hands-on opportunity to acquire valuable skills directly from experts in the field, with courses tailored to novice, intermediate, and advanced users.

This year we are offering 8 courses. Each will meet for four half-days (either mornings or afternoons) for a total of twelve hours. Instructors will post lectures, datasets, exercises, and course information on a website accessible to enrolled participants. There will be no examinations, but participants may request certificates of completion. Academic credit will not be issued. Please carefully check the specified prerequisite knowledge before enrolling in a course. Payment information for courses is at the bottom of this page.

We now offer credit for educational professional development during our 2017 Summer School for Big Data in Biology! Teachers currently working in PK–12 settings can earn 12 hours of Continuing Professional Education (CPE) credit by attending one of our courses held May 22–25, 2017.

The archive of 2016 courses can be found here, and the archive of 2016 courses can be found here.

  1. UTEID: To obtain a UTEID, go here.
  2. TACC: To sign up for a TACC account, go here.

Need a quick refresher before the summer school?

Many courses assume familiarity with Unix, R, and/or Python. If you need a refresher on these topics, check out the following 3-hour short courses.

Introduction to R: Monday, February 20. For course details and to register, visit: https://stat.utexas.edu/training/software-short-courses

Introduction to Python: Monday, February 27. For course details and to register, visit: https://stat.utexas.edu/training/software-short-courses

Intro to Unix: Wednesday, May 3. For course details and to register, visit: http://ccbb.utexas.edu/shortcourses.html

MORNING COURSES: Mon - Thur, May 22-25, 9 a.m.-12 p.m.
AFTERNOON COURSES: Mon - Thur, May 22-25, 1:30 p.m.-4:30 p.m.

Programming

  • Afternoon: (New!) Intro to Biocomputing

DNA and RNA sequencing methods and analyses

  • Morning: Introduction to Core NGS Concepts and Tools
  • Afternoon: Genome Variant Analysis
  • Morning: Metagenomic Analysis of Microbial Communities
  • Afternoon: Machine Learning Methods in Gene Expression Analysis
  • Afternoon: Introduction to RNA-Seq

Proteomics

  • Morning: Introduction to Proteomics

Data Mining

  • Morning: (New!) Mining Behavioral Data: From Philosophy to Application

Course Descriptions

TOPIC: PROGRAMMING

Introduction to Biocomputing

Day and Time: Mon-Thur 1:30 p.m. – 4:30 p.m. Location: MEZ 1.118
Description: An introduction to the Unix command line and Python. Unix basics will include file navigation, pipes, and core utilities. Python basics will cover data types, loops, conditionals, and objects. After the basics are covered, the focus will turn to bioinformatics applications. No previous programming experience assumed.
Instructor: Benjamin (Benni) Goetz, M.S., Bioinformatics Consultant
Instructor Bio: Benni is a bioinformatics consultant in CCBB. Python, Bash, and huge computing clusters are some of his favorite things. In a previous life, Benni studied pure math: differential geometry in particular.
Laptop requirement: Students must bring their own laptops. Windows users should have the PuTTY SSH client installed. Mac or Linux users will not need to install anything extra.

 Please read this disclaimer if you are using the UT ProCard for payment!

TOPIC: DNA AND RNA SEQUENCING ANALYSIS

Introduction to Core NGS Concepts and Tools

Day and Time: Mon-Thur 9:00 a.m. - 12:00 p.m. Location: MEZ 4.144
Description: This course provides an introduction to the concepts and vocabulary of Next Generation Sequencing (NGS) with an emphasis on common protocols, tools and file formats used in NGS data analysis. Subjects covered include quality assessment and manipulation of raw NGS sequences (FastQC, cutadapt), read mapping (bwa, bowtie2), the Sequence Alignment Map (SAM) format, and tools for manipulating BAM files (samtools, bedtools). Participants will gain hands-on experience using these and other NGS tools in the Linux command line environment at TACC, as well as exposure to the many bioinformatics resources TACC makes available.
Instructors: Anna Battenhouse (Associate Research Scientist) and Amelia Weber Hall (Graduate Student)
Instructor Bios: Anna received a B.A. in English Literature from Carleton College in 1978. After a career in commercial software development from 1982-2005, Anna began her retirement career as a Research Associate in functional genomics in the Iyer lab, and obtained a B.S. in Biochemistry from UT Austin in 2013.

Amelia Weber Hall is a graduate student in the Iyer Lab. Amelia received her B.S. in Molecular Genetics from the University of Rochester in 2007. She worked as a laboratory technician for Richard Aldrich in UT's Department of Neuroscience from 2007-2010. In 2010 she began graduate work in Microbiology and is due to complete her PhD this spring.
Teaching Assistants: Rayna Harris (Graduate student in the Hofmann lab) and Dakota Derryberry (Graduate student in the Wilke lab)
Computer Lab: This course will take place in a computer lab with internet access and a terminal program.
UTEID and TACC Account required: Attendees must have UT EIDs for access to our course wiki, as well as accounts on TACC. Please be sure you know both your UT EID and your TACC username when you come to class. To obtain a UTEID, go here. To sign up for a TACC account, go here.

Genome Variant Analysis

Day and Time: Mon-Thur 1:30 p.m. – 4:30 p.m. Location: MEZ 4.136
Description: This four-day course teaches you how to analyze next generation sequencing data through a series of interactive tutorials that provide hands-on familiarity with a variety of analysis tools (such as Trimmomatic, FastQC, SAMtools, Bowtie2, BWA, breseq, IGV, GATK, and more). Major data analysis topics covered will include read pre-processing, analyzing read quality, genome assembly, read alignment, detection of single nucleotide variants, detection of structural variants, visual representation of such variants, rare variant detection within populations, target enrichment strategies, and more. Initially the class will focus on prokaryotic samples, as many of the same principles of analysis apply to eukaryotes. Later portions of the class will give each participant the option to choose between more in-depth prokaryotic analysis or eukaryotic analysis, depending on personal relevance. The class will primarily focus on Illumina sequencing data, but discussion will cover alternative library preparation methods as well as alternative technologies.
Instructor: Daniel E. Deatherage, Ph.D., Postdoctoral Researcher
Instructor Bio: Daniel Deatherage earned his doctorate at The Ohio State University studying epigenetic effects of ovarian cancer. His postdoctoral work in Dr. Jeffrey Barrick’s lab has focused on using next generation sequencing to identify ultra rare mutations within evolving populations. He has been teaching or assistant teaching this class for 5 years.
Preferred or prerequisite skills: The interactive tutorials allow self-paced progress, so no background is required; however, familiarity with the command line is helpful and will allow you to complete more content during the course.
Computer Lab: Lab computers will be Unix/Mac based and will only require terminal access. Laptops are discouraged initially, but if they are to be used, they must be able to connect to TACC and transfer files to and from TACC. Instructions for personal laptop use will be presented before the end of class.
TACC Account required: Attendees will need a TACC account. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.

Metagenomic Analysis of Microbial Communities

Day and Time: Mon-Thur 9:00 a.m. - 12:00 p.m. Location: MEZ 2.102
Description: In this four-day introductory course students will learn how to analyze next-generation targeted amplicon metagenomic sequence data from entire microbial communities. We will cover theoretical aspects and provide hands-on sessions that go through all the steps necessary to complete an analysis of microbial communities. The course will focus on targeted amplicon metagenomics (e.g., hypervariable regions of the 16S rRNA gene), but concepts of shotgun metagenomics approaches will also be presented. Real next-generation sequencing data (provided by the instructor) will be used in multiple interactive tutorials. These tutorials will cover raw sequence analysis through statistical analysis and data interpretation. The data will be analyzed using a number of programs, but the main focus of the course will be on QIIME. Students do not need to have any prior computational skills or experience with metagenomic analyses. However, basic command line skills would be beneficial.
Instructors: Kasie Raymann, Ph.D., Post-Doctoral Fellow; and Louis-Marie Bobay, Ph.D., Post-Doctoral Fellow
Instructor Bios: Kasie Raymann earned a B.S. in biology from Indiana University Bloomington and a Ph.D. in evolutionary biology from Institut Pasteur in Paris, France. Her doctoral research focused on using large-scale comparative genomics and phylogenetics to investigate the evolutionary relationship between Archaea and Eukaryotes. Kasie started her postdoctoral research at the University of Texas at Austin in October 2014, under the mentorship of Nancy Moran. As a postdoctoral fellow, she is addressing organismal evolution at a finer scale, within the gut microbiome of animals. Her research seeks to understand the population dynamics of microbial communities and the evolutionary processes that shape communities over time.

Louis-Marie Bobay earned a Masters in biology from the Ecole Normale Supérieure of Lyon and a Ph.D. in genomic evolution from Institut Pasteur in Paris. Louis-Marie's research focused on the detection and evolution of viral sequences integrated into bacterial chromosomes. Since 2014, he has been a postdoc with Howard Ochman at the University of Texas, where he works on two questions: the impact of recombination on bacterial population genomics and speciation, and the short-term evolution of artificial microbial communities.
Preferred or prerequisite skills: Familiarity with basic Linux/Unix command line is recommended.
UTEID and TACC Account required: Attendees must have UT EIDs for access to our course wiki, as well as accounts on TACC. Please be sure you know both your UT EID and your TACC username when you come to class. To obtain a UTEID, go here. To sign up for a TACC account, go here.

Machine Learning Methods in Gene Expression Analysis

Day and Time: Mon-Thur 1:30 p.m. - 4:30 p.m. Location: MEZ 1.122
Description: This four-day course will introduce a selection of machine learning methods used in bioinformatic analyses of RNA-seq and other types of gene expression data (RT-qPCR, etc.). We will cover normalization, unsupervised learning and clustering, feature selection and extraction, and supervised learning methods for classification (e.g., random forests, SVM, LDA, kNN, etc.) and regression (with an emphasis on regularization methods appropriate for high-dimensional problems). Participants will have the opportunity to apply these methods as implemented in R and python to publicly available data.
Instructor: Dennis Wylie, Ph.D., Bioinformatics Consultant
Instructor Bio: Dennis Wylie joined the CCBB Bioinformatics group in 2015. He has experience in NGS data analysis including variant calling and RNA-Seq-based biomarker discovery and predictive modeling (classification, regression, etc.). Prior to UT, he earned a PhD in Biophysics from UC Berkeley applying stochastic simulation methods to problems in immunology, did postdoctoral work modeling the transmission of infectious disease, and spent six years as a bioinformatician in industry.
Preferred or prerequisite skills: This course is recommended for students with some prior knowledge of either R or python.
Laptop requirements: Participants are expected to provide their own laptops with recent versions of R and/or python installed. Students will be instructed to download several free software packages (including R packages and python libraries such as pandas and sklearn).
TACC Account required: Attendees will need a TACC account. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.

Introduction to RNA-Seq

Day and Time: Mon-Thur 1:30 p.m. - 4:30 p.m. Location: MEZ 4.144
Description: This four-day course provides an introduction to methods for analysis of RNA-seq data. It assumes familiarity and comfort with Linux command line and TACC. A typical RNA-seq workflow will be featured, starting from quality assessment of raw data, mapping (bwa, HISAT2), differential expression analysis (DESeq2, ballgown), splice variant analysis (StringTie) and downstream analyses and visualization. Participants will gain hands-on experience using these tools in a Linux command line environment at TACC.
Instructor: Dhivya Arasappan, M.S., Bioinformatics Consultant
Instructor Bio: Dhivya Arasappan joined UT's Genome Sequencing and Analysis Facility (GSAF) as a Bioinformatician in 2009. Dhivya has over 7 years of experience analyzing NGS data from multiple platforms including Illumina, PacBio and SOLiD. Her areas of expertise include de novo genome assembly, particularly using hybrid sequencing data, RNA-Seq analysis, exome analysis, and benchmarking of bioinformatics tools.
Preferred or prerequisite skills: This course is intended for students who are familiar with Unix, TACC, and R programming.
Computer Lab: This course will take place in a computer lab with a terminal, SSH client, and the Broad Integrative Genomics Viewer installed.
TACC Account required: Attendees will need a TACC account. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.

TOPIC: PROTEOMICS

Introduction to Proteomics

Day and Time: Mon-Thur 9:00 a.m. - 12:00 p.m. Location: MEZ 4.136
Description: This four-day course will focus on the fundamental knowledge and skillsets needed to utilize mass spectrometry-based proteomics for biological research. We will cover key concepts of experimental design, instrumentation, and data processing. Participants will learn how to use a standard data analysis pipeline to generate peptide and protein identifications and interpret the biological significance of results. Topics will include quantitative proteomics and post-translational modifications. The goal of this course is to provide non-experts with the foundational knowledge necessary to access the power of proteomics for their own research interests.
Instructor: Dr. Daniel Boutz, Research Associate
Instructor Bio: Daniel Boutz received his Ph.D. in Molecular Biology from the University of California, Los Angeles. As a post-doctoral researcher and now Research Associate with Edward Marcotte at the Center for Systems and Synthetic Biology at UT Austin, he specializes in the use of mass spectrometry-based proteomics for systems-level studies of cellular biology and immunology.
Teaching Assistant: TBD
Computer Lab: This course will take place in a Windows computer lab with the software preinstalled.

TOPIC: DATA MINING

Mining Behavioral Data: From Philosophy to Application

Day and Time: Mon-Thur 9:00 a.m. – 12:00 p.m. Location: MEZ 1.204
Description: This class provides a high-level introduction to the philosophy and practice of behavioral data mining. Emphasis is placed on both the "why" and the "how" of data mining. The first two days will cover philosophical, measurement, and design issues, including the distinction between realist vs. antirealist views of science, the tension between prediction- and explanation-focused approaches to data analysis, and a review of foundational issues in measurement (including reliability and validity). Days three and four focus, respectively, on providing a broad comparative overview of common approaches to statistical and causal inference (e.g., frequentist vs. Bayesian methods; hypothesis-testing vs. estimation), and introducing basic machine learning methods and practices that prioritize good outcome prediction above traditional scientific understanding (e.g., cross-validation, commonly used estimators, etc.). After taking this course, participants should have a basic understanding of the critical elements involved in successfully mining behavioral data at both small and large scales.
Instructor: Dr. Tal Yarkoni, Research Assistant Professor
Instructor Bio: Tal Yarkoni is a Research Assistant Professor in the Department of Psychology at UT-Austin. His research focuses on developing and applying new methods for the acquisition, management, and analysis of behavioral and neuroimaging data, with a particular focus on the analysis of individual differences. He is a longtime proponent and practitioner of open, reproducible science.
Preferred or prerequisite skills: Not mandatory, but participants will get much more from the course if they have a working Python environment on their laptop. I recommend installing Python 3 using the Anaconda Distribution, available freely for all platforms (https://www.continuum.io/downloads).

REGISTRATION AND PAYMENT

We will accept personal credit cards (American Express, MasterCard, Visa, Discover), UT ProCards (but please read this for important information regarding the use of the ProCard that could affect your registration), and IDT (interdepartmental transfer). Registration dates and fees are as follows:

Registration dates: March 2, 2016 - May 13, 2016

Registration fees by category:

UT Austin or BEACON

  • Students* $175/course
  • Faculty or Staff* $275/course

UT-System

  • Students* $275/course
  • Faculty or Staff* $275/course

Non-UT Other

  • Students** $275/course
  • Participants $550/course
  • Groups of 5 or more from same agency or institution: $440 per person/course (Call 512-471-5261 for group registration)
  • * Our staff will confirm affiliations with UT.
  • ** Non-UT students must send us a copy of their current student identification. Send PDF scan to this email address
  • Contact our office at 512-471-5261 for more information.

Refund and Cancellation Policy
A full refund of registration fees, less a $25 cancellation fee, will be available if requested in writing and received by May 16, 2016. No refunds will be made after that date. Please note that course substitutions cannot be made. If you fail to cancel by the deadline and do not attend, you are still responsible for full payment. UT-Austin reserves the right to cancel courses and to return all fees in the event of insufficient registration.

Miscellaneous

Location: All workshops will take place in the MEZ building. Room numbers will be released to each class roster shortly before the course begins.

Food: Beverages and snacks will be served during 15-minute morning and afternoon breaks. There are also soda and snack machines located in the MEZ building.

Parking: Parking on campus is at a premium. Since the Summer School occurs during the break between the spring and summer semesters, the UT Shuttles will not be operating. The nearest parking garages are the Brazos parking garage (BRG) located at 210 E. MLK, the San Antonio Garage (SAG) located at 2420 San Antonio Street, and the AT&T Executive Education & Conference Center parking garage (CCG) located at 1900 University Avenue. Parking in any of the "A," "F," "D," "OV" or "O" spaces on campus might result in the issuance of a citation. Rates for all campus garages.

Visitor Information:

  • UT Campus Visitor Center information
  • UT Austin maps
  • Austin and the UT Drag Travel Guide