DBW – Databases and Web development. 2021-22 Exercises (Deadline 20th Feb)
Personal web site
A personal web site, free in format and content, installed on the course server. It should include:
- Links to the solved exercises (below)
- A "project" section including links to the presentations and to the running application
Web application to execute an external program (CLUSTAL-Omega)
- Prepare a web application (PHP or Python/Flask, running on the course server) to perform multiple sequence alignment using Clustal-Omega (the executable can be obtained from https://www.clustal.org).
It should have:
- Input options:
  - A set of protein sequences (in FASTA format)
  - A set of UniProt IDs (sequences can be obtained from https://www.uniprot.org/uniprot/{id}.fasta)
  - A file upload as an alternative input source
- Program options (minimum set):
  - Output format
  - (Optional) other Clustal-O options
- Checks of the input for errors (e.g. unknown format, no sequences available, ...)
- Formatted output (be aware of the possible output formats) and a way to download the results.
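The input checks listed above could be sketched as follows in Python (one of the two languages allowed for the application). This is only a minimal illustration, not a required design: the function names `parse_fasta` and `uniprot_fasta_url`, the set of accepted residue letters, and the minimum of two sequences are assumptions made for the example. The UniProt URL is the template given in the exercise text.

```python
# Minimal sketch of FASTA input checking. All names here are hypothetical.
# Assumption: these letters are accepted as protein residues (20 standard
# amino acids plus B, Z, X, U, O, '*' and '-').
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO*-")

# URL template taken from the exercise text.
UNIPROT_URL = "https://www.uniprot.org/uniprot/{id}.fasta"


def uniprot_fasta_url(uniprot_id):
    """Fill the URL template with a (whitespace-stripped) UniProt ID."""
    return UNIPROT_URL.format(id=uniprot_id.strip())


def parse_fasta(text, min_records=2):
    """Return a list of (header, sequence) tuples or raise ValueError."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("Unknown format: text found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))

    if not records:
        raise ValueError("No sequences available")
    for name, seq in records:
        if not seq:
            raise ValueError(f"Empty sequence for record '{name}'")
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            raise ValueError(f"Unexpected characters in '{name}': {''.join(bad)}")
    # Assumption: a multiple alignment needs at least two sequences.
    if len(records) < min_records:
        raise ValueError(f"At least {min_records} sequences are needed")
    return records


example = ">seq1 test\nMKTAYIAK\nQRQISFVK\n>seq2\nMKTAYLAK\n"
parsed = parse_fasta(example)
```

Raising `ValueError` with a readable message lets the web layer show the same text to the user, whichever input source (text box, UniProt IDs, or uploaded file) the sequences came from.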
Recommended procedure:
- Prepare a local installation of ClustalO (see "Clustal-O download and install")
- Test the local installation from the command line before running it through PHP
- Examine ClustalO help to determine the options to include.
- Prepare the web application. You can use the BLAST execution from the PDBBrowser example as a guideline.
- Test and complete the local application
- Copy the scripts to your space on the server. Adapt the details of the installation as needed, and test.
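For the Python/Flask route, the step of running the external program could look roughly like the sketch below. It assumes the executable is installed as `clustalo` and reachable on the PATH, and it uses the `-i`, `-o`, `--outfmt` and `--force` options; check them against the help of your installed version. The function names and the timeout value are hypothetical.

```python
import subprocess

# Output format names assumed to be accepted by --outfmt;
# verify with `clustalo --help` on your installation.
ALLOWED_FORMATS = {"fasta", "clustal", "msf", "phylip", "selex", "stockholm", "vienna"}


def build_clustalo_command(infile, outfile, outfmt="clustal", executable="clustalo"):
    """Build the argument list for a Clustal-Omega run (hypothetical helper)."""
    if outfmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported output format: {outfmt}")
    # A list of arguments (no shell) keeps user input out of shell parsing.
    return [executable, "-i", infile, "-o", outfile, f"--outfmt={outfmt}", "--force"]


def run_clustalo(infile, outfile, outfmt="clustal", timeout=300):
    """Run Clustal-Omega and return the output path, or raise RuntimeError."""
    cmd = build_clustalo_command(infile, outfile, outfmt)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "Clustal-Omega failed")
    return outfile


command = build_clustalo_command("input.fasta", "result.aln", "clustal")
```

Restricting the output format to a fixed set means that the value coming from the web form is never passed to the program unchecked.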
Design a Data Model
You are the manager of a bioinformatics support service and need to build a database to manage data from your users' studies. Define a data model (entities, attributes and relationships) to hold data from ONE of the following cases.
- A series of GWAS (Genome Wide Association Studies) based on SNP-array data. Data should include: 1) A variable set of phenotypic traits, 2) Reference, supplier and genes included in the SNP array used in the study, 3) Sample and user details, 4) GWAS results: genes, annotations, and associated statistics, 5) References.
- A series of proteomics analyses. Data should include: 1) Sample and user details, 2) Experimental protocols used, 3) Protein sequences found, 4) Identified proteins from sequences, 5) References.
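To illustrate what "entities, attributes and relationships" mean in practice, the sketch below declares two entities that appear in both cases (users and samples) and a one-to-many relationship between them, using Python's built-in `sqlite3` module. It is not a solution to either case: the table names, the columns, and the assumption that one user owns many samples are invented for the example.

```python
import sqlite3

# Hypothetical schema: two entities and one relationship (a user owns many samples).
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    label     TEXT NOT NULL
);
"""


def create_demo_db():
    """Create an in-memory database with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
    conn.executescript(SCHEMA)
    return conn


conn = create_demo_db()
conn.execute(
    "INSERT INTO users (user_id, name, email) VALUES (?, ?, ?)",
    (1, "Example User", "user@example.org"),
)
conn.executemany(
    "INSERT INTO samples (user_id, label) VALUES (?, ?)",
    [(1, "S1"), (1, "S2")],
)
rows = conn.execute(
    "SELECT u.name, s.label FROM users u "
    "JOIN samples s ON s.user_id = u.user_id ORDER BY s.label"
).fetchall()
```

Each table is an entity, each column an attribute, and the foreign key `samples.user_id` expresses the relationship. A complete model would add the remaining entities of the chosen case in the same way.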