Scientific workflow system

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 146.169.4.7 at 14:32, 21 February 2012 (Scientific workflows). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A scientific workflow system is a specialized form of workflow management system designed to compose and execute a series of computational or data manipulation steps, or a workflow, in a scientific application. Bioinformatics workflow management systems are in turn a specialized form of scientific workflow system, focused on a single domain of science: bioinformatics.

The rising interest in scientific workflow systems has coincided with rising interest in e-Science technologies and applications, and in grid computing. The vision of e-Science is that of geographically distributed scientists collaborating on large-scale scientific experiments and knowledge discovery applications using distributed computing resources, data sets and devices. Scientific workflow systems play an important role in enabling this vision.

There are many motives for differentiating scientific workflows from traditional business process workflows. These include:

  • providing an easy-to-use environment for individual application scientists themselves to create their own workflows
  • providing interactive tools for the scientists enabling them to execute their workflows and view their results in real-time
  • simplifying the process of sharing and reusing workflows between scientists
  • enabling scientists to track the provenance of workflow execution results and of the workflow creation steps

Because these systems focus on the scientists, their design shifts away from workflow scheduling, the concern typically addressed by grid computing environments when optimizing the execution of complex computations on predefined resources, toward a domain-specific view of which data types, tools and distributed resources should be made available to scientists and how they can be made easily accessible.

Scientific workflows

The simplest computerized scientific workflows are scripts that call in data, programs and other inputs and produce outputs that may include visualizations and analytical results. These may be written in environments such as R or MATLAB, or in a scripting language such as Python or Perl run from a command-line interface.
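As a rough illustration of such a script, the Python sketch below chains three steps: call in the data, analyse it, and produce a text report. The inline data set and the function names load, analyse and report are hypothetical; a real script would typically read files, query databases or call external programs instead.

```python
"""A minimal script-style scientific workflow: read data, analyse it, report results.

The inline data set and function names are hypothetical placeholders.
"""
import csv
import io
import statistics

# Hypothetical raw measurements; a real script would read these from a file or instrument.
RAW_CSV = """sample,value
a,1.5
b,2.5
c,3.5
"""


def load(text: str) -> list[float]:
    """Step 1: call in the data."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["value"]) for row in reader]


def analyse(values: list[float]) -> dict[str, float]:
    """Step 2: run the analysis (simple descriptive statistics)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def report(summary: dict[str, float]) -> str:
    """Step 3: produce an output, here a plain-text report."""
    return "\n".join(f"{key}: {value:g}" for key, value in summary.items())


if __name__ == "__main__":
    print(report(analyse(load(RAW_CSV))))
```

Each step is an ordinary function, so the script can later be lifted into a workflow system node by node.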

More specialized scientific workflow systems, such as Discovery Net, the Taverna workbench and Kepler, provide a visual programming front end with a drag-and-drop interface that lets users construct their applications as a visual graph by connecting nodes together. Each directed edge in the graph typically represents a connection from the output of one application to the input of the next.
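The graph model can be sketched in a few lines of Python, assuming each node is a plain function and each edge names the input parameter it feeds. The helper run_workflow and the three example nodes are hypothetical; the standard-library graphlib.TopologicalSorter (Python 3.9+) supplies an execution order in which every node runs after all of its upstream nodes. Systems such as Taverna or Kepler have their own execution models, so this only illustrates the idea of an edge as a data connection.

```python
"""A sketch of a workflow as a directed graph of nodes.

Each edge carries the output of one node to the input of the next. The helper and
node names are hypothetical, not the API of any particular workflow system.
"""
from graphlib import TopologicalSorter
from typing import Any, Callable

# Each node is a function; its keyword arguments are filled from upstream outputs.
Node = Callable[..., Any]


def run_workflow(nodes: dict[str, Node], edges: list[tuple[str, str, str]]) -> dict[str, Any]:
    """Execute nodes in dependency order.

    edges holds (source_node, target_node, target_parameter) triples: the output of
    source_node is passed to target_node as the keyword argument target_parameter.
    """
    # Map every node to the set of nodes it depends on.
    deps: dict[str, set[str]] = {name: set() for name in nodes}
    for src, dst, _ in edges:
        deps[dst].add(src)

    results: dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        kwargs = {param: results[src] for src, dst, param in edges if dst == name}
        results[name] = nodes[name](**kwargs)
    return results


# Hypothetical three-step pipeline: load -> filter -> summarise.
nodes = {
    "load": lambda: [4, -1, 7, 0, 3],
    "filter": lambda data: [x for x in data if x > 0],
    "summarise": lambda data: sum(data),
}
edges = [("load", "filter", "data"), ("filter", "summarise", "data")]

results = run_workflow(nodes, edges)
print(results["summarise"])  # 14
```

Keeping the graph as data (nodes plus edges) rather than as nested calls is what allows a visual editor to display, rewire and re-run it.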

Scientific workflows are now recognized as a crucial element of the cyberinfrastructure that supports e-Science. Typically sitting on top of a middleware layer, scientific workflows are a means by which scientists can model, design, execute, debug, reconfigure and re-run their analysis and visualization pipelines. Part of the established scientific method is to create a record of the origins of a result: how it was obtained, the experimental methods used, machine calibrations and parameters, and so on. The same holds in e-Science, except that the provenance data record the workflow activities invoked, the services and databases accessed, the data sets used, and so forth. Such information helps a scientist interpret their workflow results and helps other scientists establish trust in the experimental result.[1]
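A minimal sketch of automatic provenance capture, assuming each workflow step is a Python function: a decorator records which activity was invoked, a digest of its inputs and output, and when it ran. The decorator traced, the PROVENANCE list and the record fields are hypothetical and far simpler than a standard provenance model such as W3C PROV.

```python
"""A sketch of provenance capture for workflow steps.

Each call to a decorated step appends a record of the activity invoked, digests of
its inputs and output, and the time it ran. The record layout is an assumption.
"""
import functools
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Callable

PROVENANCE: list[dict[str, Any]] = []


def digest(value: Any) -> str:
    """Short, stable fingerprint of a JSON-serialisable value (first 12 hex chars of SHA-256)."""
    encoded = json.dumps(value, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()[:12]


def traced(step: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a workflow step so that every invocation is logged to PROVENANCE."""
    @functools.wraps(step)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        output = step(*args, **kwargs)
        PROVENANCE.append({
            "activity": step.__name__,
            "inputs": digest([args, kwargs]),
            "output": digest(output),
            "time": datetime.now(timezone.utc).isoformat(),
        })
        return output
    return wrapper


@traced
def normalise(values: list[float]) -> list[float]:
    total = sum(values)
    return [v / total for v in values]


@traced
def top(values: list[float]) -> float:
    return max(values)


top(normalise([1.0, 3.0]))
for record in PROVENANCE:
    print(record["activity"], record["inputs"], "->", record["output"])
```

Storing digests rather than full values keeps the record small while still letting another scientist check that a re-run consumed and produced identical data.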

Examples

There are many examples of scientific workflow systems[2]:

A survey and comparison of some of the above systems can be found in the paper Scientific workflow systems - can one size fit all?[3]

Sharing Workflows

In addition to the workflow systems themselves, communities such as the social networking site myExperiment have emerged to facilitate the sharing and collaborative development of scientific workflows. Galaxy provides collaborative mechanisms for editing and publishing workflow definitions and workflow results directly on a Galaxy installation.

Analysis of Scientific Workflows

A key assumption underlying all scientific workflow systems is that scientists themselves, drawing on some practical familiarity with programming, will be able to use a workflow system to develop their applications. Workflow analysis techniques can be used to verify certain properties of such workflows before they are executed. One example is the formal analysis framework described by Curcin et al. in the paper The design and implementation of a workflow analysis tool,[4] which supports verification and profiling of both the control-flow and the data-flow aspects of scientific workflows in the Discovery Net system. The authors note that introducing program analysis and verification into the workflow world requires a detailed understanding of the execution semantics of each workflow language, including the execution properties of nodes and arcs in the workflow graph, the functional equivalences between workflow patterns, data type safety and many other issues. Such analysis is difficult to carry out manually, so addressing these issues requires building on formal methods typically used in computer science research. Addressing them from a practical perspective further requires using these formal methods to develop user-level tools that reason about the properties of both workflows and workflow systems. In the authors' view, it is the lack of such tools that is preventing workflows from evolving from nice-to-have academic toys into production-level tools used outside the narrow circle of early adopters and workflow enthusiasts.
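To make "verification before execution" concrete, the Python sketch below checks two simple properties of a workflow graph without running any node: data type safety along every edge, and the absence of cycles. The port declarations, node names and the verify function are hypothetical; this is an illustration of the general idea, not the Discovery Net analysis framework, which works from formal execution semantics.

```python
"""A sketch of pre-execution workflow verification.

Two example properties are checked statically: every edge connects an output type to
a matching input type, and the graph is acyclic. Port types and node names are hypothetical.
"""
from graphlib import CycleError, TopologicalSorter
from typing import Optional

# node -> (input type, output type); hypothetical port declarations.
PORTS: dict[str, tuple[Optional[str], str]] = {
    "read_sequences": (None, "FASTA"),
    "align": ("FASTA", "Alignment"),
    "build_tree": ("Alignment", "Tree"),
    "plot": ("Tree", "Image"),
}


def verify(ports: dict[str, tuple[Optional[str], str]], edges: list[tuple[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []

    # Property 1: type safety along every edge.
    for src, dst in edges:
        produced, expected = ports[src][1], ports[dst][0]
        if produced != expected:
            problems.append(f"{src} -> {dst}: produces {produced}, expects {expected}")

    # Property 2: no cycles, so a well-defined execution order exists.
    deps: dict[str, set[str]] = {name: set() for name in ports}
    for src, dst in edges:
        deps[dst].add(src)
    try:
        tuple(TopologicalSorter(deps).static_order())
    except CycleError as err:
        problems.append(f"cycle detected: {err.args[1]}")

    return problems


good = [("read_sequences", "align"), ("align", "build_tree"), ("build_tree", "plot")]
bad = [("read_sequences", "build_tree"), ("build_tree", "plot"), ("plot", "build_tree")]
print(verify(PORTS, good))  # []
for problem in verify(PORTS, bad):
    print(problem)
```

Real analysis tools check much richer properties, but the principle is the same: reason about the graph and its declared semantics so that errors surface before any expensive computation starts.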

References

  1. ^ Automatic capture and efficient storage of e-Science experiment provenance, Concurrency and Computation: Practice and Experience, 2008; 20:419–429
  2. ^ Barker, Adam; Van Hemert, Jano (2008), Scientific Workflow: A Survey and Research Directions, Lecture Notes in Computer Science, vol. 4967, Gdansk, Poland: Springer Berlin / Heidelberg, pp. 746–753, doi:10.1007/978-3-540-68111-3_78, ISBN 978-3-540-68105-2
  3. ^ Curcin, V; Ghanem, M (2008), Scientific workflow systems - can one size fit all?, Biomedical Engineering Conference, 2008. CIBEC 2008, IEEE, doi:10.1109/CIBEC.2008.4786077, ISBN 978-1-4244-2695-9
  4. ^ Curcin, V; et al., The design and implementation of a workflow analysis tool, Philosophical Transactions of the Royal Society A, doi:10.1098/rsta.2010.0157


See also