Continuous analytics

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Werowe (talk | contribs) at 19:10, 17 May 2016. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Continuous Analytics is a process for releasing analytics code in the same manner that Continuous Release and Continuous Integration are applied to traditional software projects, such as Agile Java development.


Analytics and Continuous Analytics Defined

Analytics is the application of mathematics and statistics to big data. Data scientists write analytics programs to look for solutions to business problems, like forecasting demand or setting an optimal price.

Traditionally, data scientists have not been part of IT development teams in the way that regular Java programmers are. Their skills have tended to set them apart in a department of their own, not normally related to IT. It is therefore reasonable to conclude that their approach to writing software code does not enjoy the same efficiencies as that of the traditional programming team. In particular, traditional programming has adopted the Continuous Release approach and the Agile methodology, which release software in a continuous cycle of iterations. Because working this way has become commonplace, many software tools exist to support it.

Continuous Analytics, then, is the extension of the Continuous Release software development model to the big data analytics development team. The goal of the Continuous Analytics practitioner is to bring writing analytics code, installing big data software such as Apache Spark, and running automated unit and functional tests across all of it into the same process used by a traditional Agile development project.

That means getting data scientists to keep their Scala, Python, and R code in the same kind of code repository that regular programmers use, such as Git or Subversion, so that software like Jenkins can pull it from there and run it through the build process. It also means saving the configuration of the big data cluster (sets of virtual machines) in some kind of repository as well, for example as Docker images. That makes it possible to send out analytics code and big data software and objects in the same automated way as the Continuous Integration process.
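As an illustration of what running analytics code through a build process could look like, the following is a minimal sketch in Python. It assumes a hypothetical analytics function, moving_average_forecast, that forecasts demand as the mean of recent observations, together with unit tests that a build server could run on every commit. The function name, its parameters, and the test cases are assumptions made for this example and do not come from the cited sources.

```python
import unittest


def moving_average_forecast(history, window=3):
    """Hypothetical analytics routine: forecast the next value as the
    mean of the last `window` observations in `history`."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("not enough history for the requested window")
    recent = history[-window:]
    return sum(recent) / window


class MovingAverageForecastTest(unittest.TestCase):
    """Unit tests of the kind an automated build could run on each commit."""

    def test_forecast_is_mean_of_recent_window(self):
        # Only the last two observations (30 and 40) are averaged.
        self.assertEqual(
            moving_average_forecast([10, 20, 30, 40], window=2), 35.0
        )

    def test_short_history_is_rejected(self):
        # Fewer observations than the window size is treated as an error.
        with self.assertRaises(ValueError):
            moving_average_forecast([10], window=3)
```

If this were saved in a repository as, say, test_forecast.py (a hypothetical file name), a build job could check out the code and run python -m unittest, failing the build whenever a test fails.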


[1]

[2]

  1. "Continuous Analytics Define". Southern Pacific Review. Retrieved 17 May 2016.
  2. Pushkarev, Stepan. "Tear down the Wall between Data Science and DevOps". LinkedIn. Retrieved 17 May 2016.