Julia: An Introduction for Advanced Analytics and Data Science

Author: Ray Johnson Posted In: Data, Machine Learning

There are numerous machine learning frameworks and languages, and picking the right one can be challenging and confusing. I’m about to add another language to the mix: Julia.

I have been exploring Julia for the last year and have found it impressive. First, Julia was designed with performance in mind: it is extremely fast and efficient (ref. Julia Benchmarks). Parallelism and distributed computing are built into the language, and the package library covers a wide variety of subjects: statistical analysis, optimization, machine learning, scientific computing, and supercomputing simulations. Given these capabilities, I feel Julia was designed with the future in mind.
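
To make the parallelism claim a little more concrete, here is a minimal sketch that splits a sum across tasks using the standard `Base.Threads` module. It assumes Julia 1.3 or later (for `Threads.@spawn`) and that Julia was started with several threads, for example `julia --threads=4` on Julia 1.5 or later. With a single thread it still runs correctly, just without a speedup. The function name `parallel_sum` is hypothetical.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: split `xs` into roughly one chunk per available thread,
# sum each chunk in its own task, then combine the partial sums.
function parallel_sum(xs::AbstractVector{<:Number})
    isempty(xs) && return zero(eltype(xs))   # avoid a zero-length chunk size
    chunk_len = cld(length(xs), nthreads())
    tasks = [@spawn sum(chunk) for chunk in Iterators.partition(xs, chunk_len)]
    return sum(fetch, tasks)                 # wait for each task and add results
end

xs = collect(1:1_000_000)
println(parallel_sum(xs) == sum(xs))  # true
```

Chunking by `nthreads()` keeps the number of tasks small; for real workloads you would measure before settling on a chunk size.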

Julia also offers stability: it has reached its 1.0 release and is well positioned to address growing data requirements. It is one of the few languages to have joined the petaflop club, and it can spread a workload across thousands of cores; for example, it was used to catalog 188 million astronomical objects. Despite all this, Julia is simple and expressive enough to be your everyday go-to language.

From a data science perspective, packages are available for clustering, deep learning, computer vision, generalized linear modeling, neural networks, and data manipulation and visualization. Wrapper packages also give Julia access to R, Python, and C++ functionality, along with popular frameworks such as TensorFlow.
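
As a small taste of interoperability, the sketch below uses Julia's built-in `ccall` to call the C standard library function `strlen` directly, with no wrapper code. Note that this is plain C interop; access to R, Python, and C++ goes through packages such as RCall.jl, PyCall.jl, and Cxx.jl, which are not shown here. The wrapper name `c_strlen` is hypothetical, and the example assumes a Linux or macOS system where `strlen` is available in the running process.

```julia
# Hypothetical wrapper around the C library's strlen.
# ccall(symbol, return type, (argument types...), arguments...)
# Cstring converts the Julia String to a NUL-terminated char*.
c_strlen(s::String) = Int(ccall(:strlen, Csize_t, (Cstring,), s))

println(c_strlen("Julia"))  # 5
```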

Julia Examples

To get a feel for Julia, here are a few simple examples that highlight some of Julia’s basic features:

For those familiar with LaTeX, Julia lets you use Unicode characters as variable names and in expressions. The simplest case is substituting a single Unicode character, in this case π (pi), which you enter by typing \pi followed by the Tab key. This makes calculations more expressive. The result can then be used in a calculation in the following manner:

Unicode Substitution
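
A minimal sketch of Unicode names in practice, assuming you are typing into the Julia REPL or an editor that supports Julia's LaTeX-style tab completion. The function `circle_area` and the variable `θ` are illustrative names only.

```julia
# Typing \pi then Tab inserts π, a predefined constant in Julia,
# so it can be used directly in a calculation.
circle_area(r) = π * r^2

# Unicode letters can also name variables, e.g. θ (typed as \theta<Tab>).
θ = π / 4

# ≈ (typed as \approx<Tab>) is isapprox, a tolerant floating-point comparison.
println(sin(θ)^2 + cos(θ)^2 ≈ 1)  # true
println(circle_area(2.0))         # 12.566370614359172
```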

Mathematical formulas can be written in their native form. Notice, too, that array comprehensions make it clear what the function is computing.

Native expression of mathematical formulas
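
A small sketch of both ideas, using a hypothetical polynomial `poly`. Julia accepts numeric literal coefficients such as `3x^2` (meaning `3 * x^2`), so the code reads much like the formula, and a comprehension states what is computed for each element.

```julia
# Numeric literal coefficients: 3x^2 + 2x + 1 is written as in math notation.
poly(x) = 3x^2 + 2x + 1

# A comprehension evaluates an expression for every element of a range.
ys = [poly(x) for x in 0:4]
println(ys)  # [1, 6, 17, 34, 57]

# An optional `if` clause filters the elements first.
even_squares = [x^2 for x in 1:10 if iseven(x)]
println(even_squares)  # [4, 16, 36, 64, 100]
```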

Julia's linear programming capabilities are a favorite of mine. Defining and solving optimization problems in Julia is very easy:

Julia Optimization Problem

Substituting the resulting values of x and y shows that the function has been maximized.
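
Dedicated modeling packages such as JuMP.jl are the usual way to state and solve optimization problems in Julia. To keep the sketch below self-contained with only the standard library, it swaps in a much simpler technique: brute-force vertex enumeration for a linear program in two variables, using `LinearAlgebra`. The problem data and the name `solve_lp_2d` are hypothetical, and the sketch assumes a feasible, bounded problem (it does not detect unbounded ones).

```julia
using LinearAlgebra

# Hypothetical brute-force solver: maximize c'v subject to A*v <= b, v in R^2.
# For a feasible, bounded LP the optimum lies at a vertex of the feasible
# region, i.e. where two constraint lines intersect, so every pair is tried.
function solve_lp_2d(c, A, b; tol = 1e-9)
    best_v, best_val = nothing, -Inf
    m = size(A, 1)
    for i in 1:m-1, j in i+1:m
        M = A[[i, j], :]
        abs(det(M)) < tol && continue        # parallel lines: no single intersection
        v = M \ b[[i, j]]                    # intersection of constraints i and j
        all(A * v .<= b .+ tol) || continue  # skip points outside the feasible region
        val = dot(c, v)
        if val > best_val
            best_v, best_val = v, val
        end
    end
    return best_v, best_val
end

# Hypothetical problem: maximize 2x + 3y
# subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0 (as -x <= 0, -y <= 0).
c = [2.0, 3.0]
A = [1.0 1.0; 1.0 3.0; -1.0 0.0; 0.0 -1.0]
b = [4.0, 6.0, 0.0, 0.0]

v, val = solve_lp_2d(c, A, b)
println("x = ", v[1], ", y = ", v[2], ", objective = ", val)
```

For this data the optimum is x = 3, y = 1 with an objective of 9 (up to floating-point rounding), which you can confirm by substituting those values back into the constraints and the objective.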

Getting Started

These samples are trivial and are meant only to give you a feel for the Julia language. Before you start coding everything in Julia, however, there are a few things to consider. Organizations have existing investments in languages and frameworks, and adding another one can be disruptive if it is not introduced properly.

  • Julia is maturing – Version 1.0 is your best bet for the most recent and stable environment.
  • Use the Julia community – They are very helpful and can provide a great deal of guidance.
  • Learn to think in Julia – Understanding Julia paradigms will make you more productive.
  • Optimize after the fact – After you create your solution, go back and look for opportunities for optimization.
  • Use Julia capabilities – Consider using native functionality first as it tends to be optimized.
  • Vet packages – There are a lot of Julia packages, and some are better than others. Check with the community and do your homework.
  • Enjoy – Once you have embraced Julia, you will find it very rewarding.

There are a large number of Julia resources available. I would invest time in understanding Julia’s nature and the features that are specific to it. A good exercise is to take an existing solution and port it to Julia based on the solution’s requirements, rather than translating it line by line from the source language. In other words, when you port the solution, do it the Julia way. You will get better results and a better understanding of Julia.
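
One common reading of doing it “the Julia way” is favoring multiple dispatch over class-based designs carried over from other languages. The sketch below is a hypothetical illustration (the `Shape`, `Circle`, `Rect`, `area`, and `total_area` names are made up): behavior lives in methods of a generic function, and Julia selects the method from the argument types.

```julia
# Plain data types; no methods are attached to them.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Rect <: Shape
    w::Float64
    h::Float64
end

# One generic function `area`, with one method per concrete type.
area(s::Circle) = π * s.r^2
area(s::Rect) = s.w * s.h

# Generic code works for any Shape that has an `area` method;
# dispatch picks the right method for each element at run time.
total_area(shapes) = sum(area, shapes)

shapes = Shape[Circle(1.0), Rect(2.0, 3.0)]
println(total_area(shapes))
```

Adding a new shape only requires a new type and a new `area` method; `total_area` does not change.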

Summary

Julia is just one of many languages and frameworks that can be used for advanced analytics. The important thing to remember is that knowing a language or framework does not make you a Data Scientist. You still need to understand the data science process and have a solid grasp of the underlying theories and methodologies. Languages and frameworks extend a Data Scientist’s capabilities and, hopefully, increase your overall productivity.

I hope this very cursory overview of Julia piques your interest and encourages you to delve deeper into its capabilities and consider it for your next project.

Resources

Core Julia site: https://julialang.org/

Julia Documentation: https://docs.julialang.org

Julia Package Listing: https://pkg.julialang.org/

Julia online: https://juliabox.com/

IDE with Julia support: https://atom.io/