Course Outline (F2019)

BME501: Bioinformatics

Instructor(s): Eric Harley [Coordinator]
Office: ENG 287B
Phone: TBA
Email: eharley@scs.ryerson.ca
Office Hours: Thursdays 2-4 pm
Calendar Description: Introduction to analysis, management, and visualization of cellular information at the molecular level. The course includes an overview of mathematical modeling and simulation, pattern matching, methods for phylogenetics, gene recognition, distributed and parallel biological computing, designing and managing biological databases (both relational and object-oriented), linking disparate databases and data, data mining, reasoning by analogy, hypothesis formation and testing by machine.
Prerequisites: (BLG 600 or BLG 601) and MTH 312




Compulsory Text(s):
  1. “Exploring Bioinformatics: A Project-Based Approach”, Second Edition, by Caroline St. Clair & Jonathan E. Visick, Jones & Bartlett Learning, 2015.
Reference Text(s):
  1. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, I.H. Witten, E. Frank, M.A. Hall, Elsevier, Morgan Kaufmann Publishers, 2011.
Learning Objectives (Indicators)  

At the end of this course, the successful student will be able to:

  1. Develop further knowledge of science in support of application to engineering problems. (1a)
  2. Apply mathematical principles, skills, and tools to solve engineering problems, highlighting limitations or a range of applications; use algorithms and available software to solve mathematical models. (1b)
  3. Evaluate sources of information, check the feasibility of design based on obtained results, and assess the reliability of conclusions. (2a)
  4. Develop further knowledge of uses of modern instrumentation, data collection techniques, and equipment to conduct experiments and obtain valid data. (5a)
  5. Apply statistical procedures, investigate possible artefacts, verify experimental results, consider possible extensions of results to other areas, interpret results with regard to given assumptions, and assess accuracy of results. (5b)
  6. Discuss the responsibility of the engineer to protect the public interest when working with genes and genetic data. (8b)
  7. Discuss ethical protocols and risks when collecting, analyzing and sharing genetic data or modifying genes. (10a)

NOTE: Numbers in parentheses refer to the graduate attributes required by the Canadian Engineering Accreditation Board (CEAB).

Course Organization

3.0 hours of lecture per week for 13 weeks
0.0 hours of lab/tutorial per week for 12 weeks

Course Evaluation
Midterm Exam 25 %
Quizzes 5 %
Assignments 25 %
Final Exam 45 %
TOTAL: 100 %

Note: In order for a student to pass a course with "Theory and Laboratory" components, in addition to earning a minimum overall course mark of 50%, the student must pass the Laboratory and Theory portions separately by achieving a minimum of 50% in the combined Laboratory components and 50% in the combined Theory components. Please refer to the "Course Evaluation" section for details on the Theory and Laboratory components.

Examinations
Midterm exam in Week 8, two hours, multiple-choice, short-answer and programming, closed book (covers Weeks 1-5).
Final exam, during exam period, three hours, closed book, comprehensive, written on a computer in a computer lab.
Course Content



Chapters / Topic, description

Introduction to BME 501
   - NCBI databases
   - Parkinson's Disease
   - primary databases and metadatabases
   - genome-wide association studies
   - Data mining (Ch 1, 2) -- class, attribute, instance




Computational Manipulation of DNA
   - Introduction to Python
   - genetic screening for cystic fibrosis
   - computational algorithms, string manipulation
   - Data mining (Ch 4.1) -- 0R and 1R rules
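
As an illustration of the string-manipulation topic above, here is a minimal Python sketch. The function names are hypothetical and are not taken from the course materials or the textbook. It assumes plain DNA strings over A, C, G, T, and compares a sample against a reference of the same length, which is roughly the kind of check a screening exercise might start from.

```python
# Illustrative sketch only: helper names are assumptions, not course code.

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Both sequences must have the same length; positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(find_differences("ATGC", "ATCC"))
```

Real screening data would need handling for insertions and deletions, which this position-by-position comparison does not attempt.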




Sequence Alignment
   - Origin of new influenza virus strains
   - optimal global and local alignments of DNA
   - alignment parameters
   - Needleman-Wunsch algorithm, EMBOSS implementation
   - two-dimensional arrays, dynamic programming
   - Data Mining (Ch 4.2) -- Naive Bayes
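
To show how the Needleman-Wunsch algorithm uses a two-dimensional array and dynamic programming, here is a minimal Python sketch that computes only the optimal global alignment score. The scoring parameters (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration. They are not the EMBOSS defaults and not necessarily the values used in the course.

```python
# Illustrative sketch only: scoring parameters are assumed, not course values.

def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table; each cell depends on its three neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap              # gap in b
            left = score[i][j - 1] + gap            # gap in a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
```

Recovering the alignment itself would additionally require a traceback from the bottom-right cell, which is omitted here to keep the sketch short.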




Database Searching and Multiple Alignment
   - searching sequence databases for matches (BLAST)
   - multiple sequence alignment using ClustalW
   - alignment algorithms and heuristics
   - overuse of agricultural antibiotics
   - antibiotic resistance
   - dynamic programming
   - Data mining (Ch 5) -- credibility, accuracy



   - protect the public interest
   - privacy issues when collecting and analyzing data
   - risks and responsibilities when modifying genes
   - CRISPR-Cas9 potential



Midterm (Wednesday, Oct 23, 2 h)
   - Data mining (Ch 4.3) -- decision trees




Substitution Matrices and Protein Alignments
   - scoring matrices for protein alignment
   - deriving substitution matrices
   - nested hash tables




Distance Measurement in Molecular Phylogenetics
   - Evolutionary relationships
   - distance metrics (Jukes-Cantor, Kimura, Tamura)
   - introduction to phylogenetic trees, phylogeny.fr
   - Data Mining (Ch 4.8) -- clustering
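
As an example of the distance metrics listed above, the Jukes-Cantor distance corrects the observed proportion p of differing sites between two aligned sequences as d = -(3/4) ln(1 - (4/3) p). The Python sketch below is a minimal illustration. The function names are hypothetical, and it assumes two pre-aligned sequences of equal length with no gap characters.

```python
# Illustrative sketch only: assumes pre-aligned, gap-free, equal-length input.
import math


def p_distance(a, b):
    """Return the proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(a, b):
    """Return the Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The logarithm's argument is zero or negative: distance undefined.
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    print(round(jukes_cantor("ACGT", "ACGA"), 4))
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site.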




Tree-building in Molecular Phylogenetics
   - How to use distance measurements
   - agglomerative clustering
   - single linkage, UPGMA, neighbor joining
   - probabilistic methods in phylogenetics




Sequence-Based Gene Prediction
   - Prediction of genes in a resistance plasmid
   - ORF finding and promoter prediction
   - NCBI ORF Finder, NEBcutter, EasyGene
   - pattern matching algorithms
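
To make the ORF-finding topic concrete, here is a minimal Python sketch that scans the three forward reading frames for an ATG start codon followed by an in-frame stop codon. It is a simplified illustration under stated assumptions and does not reproduce how NCBI ORF Finder or any other tool listed above works. The function name and the `min_codons` threshold are hypothetical, and the reverse strand is ignored.

```python
# Illustrative sketch only: forward strand, standard stop codons, ATG starts.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ORF on the forward strand.

    start is the 0-based index of the ATG, end is one past the stop codon,
    and min_codons counts the codons before the stop (including the ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

A fuller treatment would also scan the reverse complement and would typically use a much larger minimum length to filter out spurious short ORFs.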

