BIOINFORMATICS AND GENOME ANALYSIS

Academic year and teacher
Academic year
2022/2023
Teacher
SILVIA FUSELLI
Credits
6
Didactic period
First semester
SSD
BIO/18

Training objectives

Genomics studies the content, structure, expression and evolution of the genetic material that encodes the structures and functions of living organisms and is inherited from generation to generation. This course explores the branch of bioinformatics that makes it possible to analyse “in silico” the high-throughput output of genetic and genomic sequencing projects, using next-generation methodologies.
Knowledge and understanding
The course aims to
- provide knowledge of the structure and organization of prokaryotic and eukaryotic genomes
- provide knowledge of the strategies and techniques used to study different genomes
- teach how to retrieve and interpret information from the most important biological databases
- provide theoretical elements of bioinformatics and computational genomics
The practical activities will give students a basic knowledge of the Linux operating system, which is needed to analyse the large datasets produced by genome sequencing projects.
Ability to apply knowledge and understanding
The students will be able to
- design the analysis of a prokaryotic or eukaryotic genome, or the study of a reduced representation of a genome
- retrieve information from the most important biological databases and extract gene, genome and, in part, protein data useful for designing experiments or analysing data from genomic sequencing experiments
During the laboratory sessions the students will go through the bioinformatic workflow of a sequencing project. By applying the most commonly used bioinformatic tools, they will learn how to extract biologically meaningful information from the raw data produced by second- and/or third-generation sequencing approaches.

Prerequisites

No formal prerequisite courses. The bioinformatic analysis of genomic data requires a good knowledge of genetics, in particular of the laws of inheritance and of mutational mechanisms. A good knowledge of molecular biology is also required, in particular of nucleic acid replication, transcription and translation. Basic knowledge of biostatistics is expected.

Course programme

Lectures (recorded if necessary) and computer laboratory.

The central dogma and the molecules of inheritance (4h)
Nucleic acids and the genetic code. Amino acids and their substitutions. Mutation as a source of variation. Different kinds of mutations, definitions, functional effects of synonymous and nonsynonymous changes.
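As an illustration of the distinction between synonymous and nonsynonymous changes, the following minimal Python sketch translates codons with the standard genetic code and classifies a single-codon substitution. It is not part of the course material: the names `CODON_TABLE`, `translate` and `classify_substitution` are hypothetical, and the sketch assumes DNA codons on the coding strand and the standard (nuclear) genetic code only.

```python
# Standard genetic code, built from the conventional TCAG ordering.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna):
    """Translate a coding-strand DNA sequence, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_substitution(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same residue, else 'nonsynonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


if __name__ == "__main__":
    print(translate("ATGGAGTGA"))               # Met, Glu, stop
    print(classify_substitution("GAA", "GAG"))  # both codons encode Glu
    print(classify_substitution("GAG", "GTG"))  # Glu replaced by Val
```

In this sketch a change that introduces a stop codon is simply reported as nonsynonymous; a finer classification (missense versus nonsense) would need one extra check on `'*'`.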

Sizes and organization of genomes (8h)
- Prokaryotic and eukaryotic genomes. Chromosomes: structure, numbers, ploidy, K, N and C paradoxes.
- Genes: traditional and extended definitions, the ENCODE project. Simple and complex genomes: how many genes and functional regions; gene expression and epigenetics.
- The human genome: an example of a complex eukaryotic genome. Contents of the human genome; description and definition of genomic variation; characterizing human genomic variation: international projects; examples of genomic regions coding for the proteome: extremely short and long genes, gene families (moderately repetitive DNA).

Methods for genome analysis, Next Generation Sequencing (NGS) (10h)
- Frederick Sanger and the development of DNA sequencing.
- Second-generation sequencing methods (NGS or High Throughput Sequencing). Library preparation, controls and quantification, sequencing and signal detection.
- Third-generation sequencing methods (Single Molecule Real Time technology and Nanopore sequencing).
- Chromosome conformation capture techniques.

Metagenomics: a new perspective in the field of ecology (4h)
- How to use new sequencing technologies to study environmental samples. Definitions and examples. From samples to sequences: standard workflow of a metagenomic analysis.
- Barcoding and metabarcoding.

Comparing sequences: pairwise and multiple sequence alignments (6h)
- Alignments: why? Similarity and homology; global and local alignments.
- Alignment algorithms: substitution matrices for DNA and proteins, gap penalties; exhaustive and heuristic algorithms (Needleman-Wunsch, Smith-Waterman, FASTA, BLAST).
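To make the idea of an exhaustive global alignment concrete, here is a minimal Python sketch of the Needleman-Wunsch dynamic programming algorithm. It is an illustration under simplifying assumptions, not the course's reference implementation: the function name `needleman_wunsch` is hypothetical, scoring uses a flat match/mismatch value instead of a substitution matrix, and the gap penalty is linear (each gap position costs the same).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths. This cost is the reason heuristic methods such as FASTA and BLAST are used for database searches.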

Searching sequences in biological databases and Basic Local Alignment Search Tool (BLAST) (4h)
- Biological databanks (computer exercises to learn how to search specific databases): National Center for Biotechnology Information (NCBI); ENSEMBL. Students are required to bring their own laptops; 12 laptops are available.
- How to use BLAST: practical exercises.

Bioinformatics and Next Generation Sequencing (4+12h)
Students will use a computer to analyse sequencing data from High Throughput Sequencing (Illumina technology) or long reads. In particular, we will explore pipelines based on common bioinformatics tools to convert raw data files into biologically meaningful information.
In detail:
basics of BASH, the command interpreter used on the Linux operating system (4h); from raw data (sequences in FASTQ format) to variable sites (VCF format). Standard programs and modules will be run to go through the following steps: FASTQ quality control and trimming; alignment to a reference genome; BAM refinement; BAM check and visualization; variant calling; variant filtering and validation (12h).
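The first step of this workflow, FASTQ quality control and trimming, can be sketched in a few lines of Python. This is only an illustration of what the step does, not a replacement for the dedicated programs used in the laboratory. The function names (`read_fastq`, `phred_scores`, `trim_3prime`) and the quality threshold of 20 are hypothetical, and the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Toy record: six high-quality bases followed by two low-quality ones.
    example = io.StringIO("@read1\nACGTACGT\n+\nIIIIII##\n")
    for name, seq, qual in read_fastq(example):
        scores = phred_scores(qual)
        print(name, sum(scores) / len(scores))
        print(trim_3prime(seq, qual))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the threshold of 20 used here keeps bases with an estimated error rate of at most 1%.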

Didactic methods

Lectures and computer laboratory. The course comprises 52 hours (6 CFU): 40 hours of lectures and 12 hours of laboratory practice. Lectures are given weekly, in class or recorded, with PowerPoint slides and videos illustrating NGS technologies; biological databases and alignment algorithms are described and explored through online exercises.
The practical part of the course requires the Linux operating system.

The tutorials are held on the Linux operating system in the computer lab. Alternatively, students can work on their own PCs; where programs that are difficult to install are required, the teacher will demonstrate the analysis on her own computer.

Learning assessment procedures

The aim of the exam is to verify the extent to which the learning objectives have been achieved. The exam is divided into two parts, which take place on the same day. A minimum score of 9/15 is required to pass each part. The exam is passed only if both parts are passed.
Part I:
If the exam is written: 4 questions (multiple choice and short open questions). This part verifies basic knowledge and understanding of genomics, sequencing methods, metagenomics and sequence alignments. Time: 1 h.
Part II (written or oral):
The student will demonstrate knowledge of the Linux file system and the ability to use BASH commands. A few questions will be asked on the analysis workflow of the practical part of the course, specifically why particular software modules are run and how they improve the result. Knowledge of the main file formats is required. This part lasts about 1 h if written.

Reference texts

Fondamenti di bioinformatica
Manuela Helmer Citterich, Fabrizio Ferrè, Giulio Pavesi, Graziano Pesole, Chiara Romualdi
2018 (Zanichelli)

Next-Generation Sequencing Data Analysis by Xinkun Wang (Taylor and Francis Group) covers most of the topics of the course.

Online resources, pdf version of the course slides, examples of practical exercises with solutions (course website).