DBW – Databases and Web development. 2021-22 Exercises (Deadline 20th Feb)

Personal web site

A personal web site, free in format and content, installed on the course server. It should include:

- Links to the solved exercises (below)
- A "project" section including links to the presentations and to the running application

Web application to execute an external program (Clustal-Omega)

Prepare a web application (PHP or Python/Flask, running on the course server) to perform multiple sequence alignment using Clustal-Omega (the executable can be obtained from https://www.clustal.org). It should have:
- Input options:
  - A set of protein sequences (in FASTA)
  - A set of UniProt IDs (sequences can be obtained from https://www.uniprot.org/uniprot/{id}.fasta)
  - A file upload as an alternative input source
- Program options (minimum set):
  - output format
  - (Optional) other Clustal-O options
- Check the input for errors (e.g. unknown format, no sequences available, ...)
- Format the output (taking into account the possible output formats) and allow users to download the results.
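The input checks listed above can be sketched in Python (one of the two suggested languages). This is a minimal sketch, not a reference solution: the function names `parse_fasta` and `uniprot_urls`, the accepted residue characters, the loose ID pattern and the two-sequence minimum are assumptions chosen for illustration. Actually downloading each URL (for example with `urllib.request.urlopen`) is left out so the checks can run offline.

```python
import re

# Assumption: a loose UniProt ID check (accession or entry name such as
# P69905 or HBA_HUMAN); this is NOT UniProt's official accession regex.
UNIPROT_ID = re.compile(r"^[A-Za-z0-9_]+$")
# Assumption: one-letter residue codes plus '*' (stop) and '-' (gap).
RESIDUES = re.compile(r"^[A-Za-z*\-]+$")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs or raise ValueError."""
    records, header, chunks = [], None, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            # Sequence data before any '>' header: not FASTA.
            raise ValueError(f"Unknown format: line {lineno} is outside a FASTA record")
        elif not RESIDUES.match(line):
            raise ValueError(f"Invalid characters in sequence on line {lineno}")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("No sequences available")
    empty = [h for h, s in records if not s]
    if empty:
        raise ValueError("Empty sequence(s): " + ", ".join(empty))
    if len(records) < 2:
        raise ValueError("A multiple alignment needs at least two sequences")
    return records


def uniprot_urls(ids_text):
    """Split a list of UniProt IDs and build the FASTA URL for each one."""
    ids = [i for i in re.split(r"[\s,;]+", ids_text) if i]
    if not ids:
        raise ValueError("No sequences available")
    bad = [i for i in ids if not UNIPROT_ID.match(i)]
    if bad:
        raise ValueError("Invalid UniProt id(s): " + ", ".join(bad))
    # URL pattern as given in the exercise statement.
    return [f"https://www.uniprot.org/uniprot/{i}.fasta" for i in ids]
```

Because the pasted text, the uploaded file contents and the sequences downloaded from UniProt are all FASTA, the same `parse_fasta` check can be applied to all three input sources, giving the application a single error path.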
Recommended procedure:
1. Prepare a local installation of ClustalO (see the download and install instructions at https://www.clustal.org).
2. Test the local installation from the command line before running it through PHP (or Flask).
3. Examine the ClustalO help to determine which options to include.
4. Prepare the web application. You can use the BLAST execution in the PDBBrowser example as a guideline.
5. Test and complete the local application
6. Copy the scripts to your space on the server. Adapt the details of the installation as needed, and test.
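Steps 2 to 5 revolve around calling the `clustalo` executable from the web application. The sketch below does this from Python with `subprocess`, assuming `clustalo` is on the `PATH` (pass the full path as `exe` if it is installed elsewhere on the server). The options `-i`, `-o`, `--outfmt` and `--force` are taken from `clustalo --help`; the function names, temporary file names, timeout and the file extension used for each download are illustrative choices, not part of Clustal-Omega.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Values accepted by clustalo --outfmt, mapped to a download extension
# (the extensions are an illustrative choice).
OUTPUT_FORMATS = {
    "clustal": "aln",
    "fasta": "fasta",
    "msf": "msf",
    "phylip": "phy",
    "selex": "selex",
    "stockholm": "sto",
    "vienna": "vienna",
}


def output_extension(outfmt):
    """Validate the requested format and return its file extension."""
    if outfmt not in OUTPUT_FORMATS:
        raise ValueError(f"Unknown output format: {outfmt}")
    return OUTPUT_FORMATS[outfmt]


def build_command(exe, infile, outfile, outfmt):
    """Argument list equivalent to the command-line test, e.g.
    clustalo -i input.fasta -o alignment.aln --outfmt=clustal --force"""
    output_extension(outfmt)  # reject unknown formats before running anything
    return [exe, "-i", str(infile), "-o", str(outfile), f"--outfmt={outfmt}", "--force"]


def run_clustalo(fasta_text, outfmt="clustal", exe="clustalo", timeout=300):
    """Run Clustal-Omega on validated FASTA text; return (alignment, filename)."""
    ext = output_extension(outfmt)
    exe_path = shutil.which(exe)
    if exe_path is None:
        raise RuntimeError(f"Executable not found: {exe}")
    with tempfile.TemporaryDirectory() as tmp:
        infile = Path(tmp) / "input.fasta"
        outfile = Path(tmp) / f"alignment.{ext}"
        infile.write_text(fasta_text)
        # List argv, no shell=True: user input is never interpreted by a shell.
        proc = subprocess.run(build_command(exe_path, infile, outfile, outfmt),
                              capture_output=True, text=True, timeout=timeout)
        if proc.returncode != 0:
            raise RuntimeError("clustalo failed: " + proc.stderr.strip())
        return outfile.read_text(), outfile.name
```

In a Flask view, the returned text and file name could be sent back with a `Content-Disposition: attachment; filename=...` header so the browser offers the result as a download; for on-page display, formats such as Clustal are best shown inside a `<pre>` element so the column alignment is preserved.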

Design a Data Model

You are the manager of a bioinformatics support service and need to build a database to manage data from your users' studies. Define a data model (entities, attributes, and relationships) to hold the data for ONE of the following cases:

- A series of GWAS (Genome-Wide Association Studies) based on SNP-array data. Data should include: 1) a variable set of phenotypic traits, 2) the reference, supplier, and genes included in the SNP array used in the study, 3) sample and user details, 4) GWAS results: genes, annotations, and associated statistics, 5) references.
- A series of proteomics analyses. Data should include: 1) sample and user details, 2) the experimental protocols used, 3) the protein sequences found, 4) the proteins identified from those sequences, 5) references.
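As an illustration of the kind of model expected, the sketch below encodes one possible design for the GWAS case as SQL tables, created in an in-memory SQLite database through Python's `sqlite3` so it can be checked without a server. All table and column names are hypothetical, and the course DBMS may differ (e.g. MySQL), in which case the types would need adapting. The variable set of phenotypic traits is handled with a `trait` table plus a `sample_trait` table holding one row per (sample, trait) value, instead of one column per trait; many-to-many relationships (arrays to genes, studies to publications) get their own link tables.

```python
import sqlite3

# Hypothetical schema for the GWAS case; names and types are illustrative.
SCHEMA = """
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL, institution TEXT);

CREATE TABLE snp_array (
    id INTEGER PRIMARY KEY,
    reference TEXT NOT NULL,          -- catalogue reference of the array
    supplier TEXT NOT NULL);

CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE NOT NULL);

-- M:N link: genes covered by each SNP array
CREATE TABLE array_gene (
    array_id INTEGER REFERENCES snp_array(id),
    gene_id INTEGER REFERENCES gene(id),
    PRIMARY KEY (array_id, gene_id));

CREATE TABLE study (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES app_user(id),
    array_id INTEGER NOT NULL REFERENCES snp_array(id));

CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES study(id),
    code TEXT NOT NULL, collected_on TEXT);

-- Variable set of traits: one row per trait definition...
CREATE TABLE trait (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, unit TEXT);

-- ...and one row per (sample, trait) measurement
CREATE TABLE sample_trait (
    sample_id INTEGER REFERENCES sample(id),
    trait_id INTEGER REFERENCES trait(id),
    value TEXT NOT NULL,
    PRIMARY KEY (sample_id, trait_id));

CREATE TABLE gwas_result (
    id INTEGER PRIMARY KEY,
    study_id INTEGER NOT NULL REFERENCES study(id),
    trait_id INTEGER NOT NULL REFERENCES trait(id),
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    annotation TEXT, p_value REAL, effect_size REAL);

CREATE TABLE publication (id INTEGER PRIMARY KEY, doi TEXT UNIQUE, citation TEXT NOT NULL);

-- M:N link: references cited by each study
CREATE TABLE study_publication (
    study_id INTEGER REFERENCES study(id),
    publication_id INTEGER REFERENCES publication(id),
    PRIMARY KEY (study_id, publication_id));
"""


def create_db(path=":memory:"):
    """Create the schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```

With this layout, adding a new phenotypic trait means inserting a row in `trait` rather than altering the schema. The proteomics case could follow the same pattern, with link tables between samples, protocols, sequences and identified proteins.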