Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies

This site contains the experimental materials, as well as their descriptions, associated with the experimental section of the PhD thesis developed by Idafen Santana-Perez.


The content described below is aimed at supporting the claims made in the thesis and should not be considered a publication in itself.

This material is published following the Research Object specification.

Last modified: December 23, 2015

Abstract

This experiment deals with the reproduction of Makeflow's version of the BLAST workflow. The Basic Local Alignment Search Tool (BLAST) is a bioinformatics algorithm for comparing sequence information. Different types of genetic information, such as nucleotide or amino-acid sequences, can be queried against a database to find commonalities between sequences. For more information about this experiment, see Section 6.2.3.1 of the thesis.
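As a purely illustrative sketch, unrelated to the bundle itself, this is what a typical BLAST+ protein query looks like on the command line (the file and database names are hypothetical):

    # Hypothetical names, shown only to illustrate a BLAST query
    makeblastdb -in sequences.fasta -dbtype prot -out mydb   # format the sequence database
    blastp -query query.fasta -db mydb -out results.out      # query it for similar sequences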

Experimental Material

In this section we include the files containing the datasets and the scripts used in the experimental section of the thesis.

All these files are included in the blast.zip bundle. This is the main hierarchy of its contents once unzipped:

  • BLAST: a folder containing the files related to the BLAST workflow use case.
    • runBlastAws.py: a Precip file that creates a VM on AWS EC2, deploys the necessary software, and executes the BLAST workflow.
    • Vagrantfile: a Vagrant file that creates a VM on a local Vagrant environment, deploys the necessary software, and executes the BLAST workflow.
    • annotations: a folder containing the three datasets describing the workflow and its requirements.


The annotations folder contains the following files:
  • Workflow Requirements dataset (wfiConf.ttl): this RDF dataset contains the annotations about the requirements of the BLAST workflow.
  • Software Stacks dataset (swc.ttl): an RDF dataset containing the description of the Software Components necessary for executing the BLAST workflow.
  • Scientific Virtual Appliance dataset (sva.owl): an RDF dataset containing the annotations of the Scientific Virtual Appliances used in this experimentation.

Executing it

To obtain either the Precip or Vagrant files, the Infrastructure Specification Algorithm (ISA) must be executed with the files provided in the bundle.

The latest version of the ISA can be obtained here. To execute it, follow the instructions included on the site, using the files from the annotations folder of the workflow. The specific parameters for the workflow are:

  • BLAST:
    • WF_URI=http://purl.org/net/wicus-reqs/resource/Workflow/blast_WF
    • SSH_TYPE=dsa


Along with the executable jar, a file defining the list of available providers is also provided, as well as the text files containing the invocation sequence for each workflow.
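For orientation only, a minimal sketch of what an ISA run might look like from the command line; the jar name, flag names, and file layout below are assumptions, so consult the provided invocation sequence files and the ISA site for the actual command:

    # Hypothetical invocation: the jar name and argument layout are assumptions,
    # not the documented ISA interface.
    java -jar isa.jar \
      --wfreqs annotations/wfiConf.ttl \
      --software annotations/swc.ttl \
      --sva annotations/sva.owl \
      --wf-uri http://purl.org/net/wicus-reqs/resource/Workflow/blast_WF \
      --ssh-type dsa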

To execute the obtained files, Precip and/or Vagrant must be installed first. To do so, please refer to their corresponding websites:

To execute the Precip scripts, either those provided in the bundle or those obtained by executing the ISA, the following steps must be performed (see the consolidated example after this list):

  • Set the PYTHONPATH environment variable to the decompressed precip folder (e.g. "setenv PYTHONPATH $PWD/precip")
  • Execute the Python scripts included in the zip file:
    • AWS instructions:
      • Set the AMAZON_EC2_REGION, AMAZON_EC2_URL, AMAZON_EC2_ACCESS_KEY, and AMAZON_EC2_SECRET_KEY environment variables, indicating the AWS region and endpoint to be used, and the access and secret keys of the user on AWS.
      • Execute the corresponding file (e.g. "python runBlastAws.py")
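Put together, an AWS session might look like the following (the region, endpoint, and key values are placeholders; the original example uses csh-style setenv, shown here with the equivalent bash export):

    # Placeholders: substitute your own AWS region, endpoint, and credentials.
    export PYTHONPATH=$PWD/precip
    export AMAZON_EC2_REGION=us-east-1
    export AMAZON_EC2_URL=https://ec2.us-east-1.amazonaws.com
    export AMAZON_EC2_ACCESS_KEY=<your-access-key>
    export AMAZON_EC2_SECRET_KEY=<your-secret-key>
    python runBlastAws.py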

To execute the Vagrant scripts, once in the corresponding directory of the bundle, run the vagrant up command. This will read the Vagrantfile to create the environment and execute the workflow.
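For example, assuming the bundle was unzipped into the current directory (the BLAST folder name comes from the hierarchy above):

    # Run from the folder containing the Vagrantfile
    cd BLAST
    vagrant up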

About the authors


Author


• Idafen Santana-Perez (isantana@fi.upm.es), Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain


Supervisors


• Oscar Corcho (ocorcho@fi.upm.es), Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
• María Pérez-Hernández (mperez@fi.upm.es), Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain

Acknowledgements

This material is based upon work supported in part by the FPU grant from the Spanish Science and Innovation Ministry (MICINN), and the Ministerio de Economía y Competitividad (Spain) project "4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos" (TIN2013-46238-C4-2-R).
We would like to thank the Makeflow team for their support.