Towards the end of last year I wrote a post about the advent of modern programming languages, including the JIT-compiled Julia and the visual programming language ADL from Trading Technologies. My conclusion (based on a not very scientific sample) was that we appear to have reached a tipping point, where the speed of newer, high-level languages is approaching that of the fastest compiled languages like C/C++.
Now comes a formal academic study of the topic: A Comparison of Programming Languages in Economics (Aruoba and Fernandez-Villaverde, 2014). Using the neoclassical growth model, the authors benchmark C++11, Fortran 2008, Java, Julia, Python, Matlab, Mathematica, and R, implementing the same algorithm, value function iteration with grid search, in each language. They report execution times on a Mac and on a Windows computer and briefly comment on the strengths and weaknesses of each language.
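To make the benchmarked algorithm concrete, here is a minimal Python sketch of value function iteration with grid search. It is a deliberate simplification, not the authors' code: it assumes a deterministic model with log utility, Cobb-Douglas output k^α and full depreciation, and the parameter values (α = 0.33, β = 0.95), grid bounds and 101-point grid are illustrative assumptions. Each iteration applies the Bellman operator by searching every grid point for the best next-period capital k', and stops once successive value functions differ by less than a tolerance.

```python
import math


def solve_growth_vfi(alpha=0.33, beta=0.95, n_grid=101, tol=1e-6, max_iter=2000):
    """Value function iteration with grid search for a simplified, deterministic
    neoclassical growth model (assumed: log utility, full depreciation):

        V(k) = max_{k'} [ log(k**alpha - k') + beta * V(k') ]

    Returns (grid, value function, policy k'(k), iterations used).
    """
    # Steady state under full depreciation: k* = (alpha * beta) ** (1 / (1 - alpha)).
    k_ss = (alpha * beta) ** (1.0 / (1.0 - alpha))
    # Illustrative grid bounds: 50% to 150% of the steady state.
    lo, hi = 0.5 * k_ss, 1.5 * k_ss
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]

    # Precompute period utility log(c) for every (k, k') pair on the grid;
    # infeasible choices (c <= 0) get -inf so the search never selects them.
    util = [
        [math.log(k ** alpha - kp) if k ** alpha > kp else -math.inf for kp in grid]
        for k in grid
    ]

    v = [0.0] * n_grid
    policy_idx = [0] * n_grid
    iteration = 0
    for iteration in range(1, max_iter + 1):
        discounted = [beta * x for x in v]
        v_new = []
        for i, row in enumerate(util):
            # Grid search: evaluate every candidate k' and keep the best one.
            best_j = max(range(n_grid), key=lambda j: row[j] + discounted[j])
            policy_idx[i] = best_j
            v_new.append(row[best_j] + discounted[best_j])
        # Sup-norm distance between successive iterates; the Bellman operator
        # is a contraction with modulus beta, so this shrinks geometrically.
        diff = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if diff < tol:
            break

    policy = [grid[j] for j in policy_idx]
    return grid, v, policy, iteration
```

The full-depreciation, log-utility case is chosen because it has the closed-form policy k' = αβk^α, so comparing the returned `policy` with that formula (it should agree to within a few grid steps) is a quick correctness check. The nested loop over iterations, states and candidate choices is the sort of tight inner loop where interpreted languages pay the heaviest price.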
The conclusions from the study mirror my own thoughts on the subject very closely. The authors find that:
- C++ and Fortran are still considerably faster than any other alternative, although one needs to be careful with the choice of compiler.
- C++ compilers have advanced enough that, contrary to the situation in the 1990s and some folk wisdom, C++ code runs slightly faster (5-7 percent) than Fortran code.
- Julia delivers outstanding performance: its code runs only 2.64 to 2.70 times slower than the code produced by the best C++ compiler.
- Baseline Python is slow. Using the PyPy implementation, it runs around 44 times slower than C++; using the default CPython interpreter, it runs between 155 and 269 times slower.
- Matlab is between 9 and 11 times slower than the best C++ executable.
- R runs between 475 and 491 times slower than C++. If the R code is compiled, it is between 243 and 282 times slower.
- Hybrid programming and special approaches can deliver considerable speed-ups. For example, when combined with MEX files, Matlab is only 1.24 to 1.64 times slower than C++, and when combined with Rcpp, R is between 3.66 and 5.41 times slower. Similar numbers hold in the Python ecosystem for Numba (a just-in-time compiler for Python that uses decorators) and Cython (a static compiler for writing C extensions for Python).
- Mathematica is only about three times slower than C++, but only after a considerable rewriting of the code to take advantage of the peculiarities of the language. The baseline version of the algorithm in Mathematica is considerably slower.
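The figures above are ratios: each implementation's execution time divided by that of the fastest C++ executable. A minimal sketch of that normalisation in Python, using the standard-library `timeit` module, follows; the two functions being timed (`interpreted_loop` and `builtin_sum`) are hypothetical stand-ins, not the paper's benchmark codes.

```python
import timeit


def relative_times(implementations, number=10, repeat=3):
    """Time each zero-argument callable and express its best time as a
    multiple of the fastest one, so the fastest implementation scores 1.0."""
    best = {}
    for name, func in implementations.items():
        # Best of several repeats, to damp noise from other processes.
        best[name] = min(timeit.repeat(func, number=number, repeat=repeat))
    fastest = min(best.values())
    return {name: t / fastest for name, t in best.items()}


# Hypothetical stand-ins: the same sum computed by an interpreted Python loop
# and by the C-implemented built-in sum().
def interpreted_loop(n=100_000):
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n=100_000):
    return sum(range(n))
```

Calling `relative_times({"loop": interpreted_loop, "builtin": builtin_sum})` returns 1.0 for the faster entry and a multiple for the other; the exact multiple depends on the machine and Python version. Taking the best of several repeats is a common convention, assumed here; it is not necessarily how the paper measured its times.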
C++ is still the benchmark for speed, but its lead is narrow. It is barely faster than the old stalwart, Fortran, and only 1.5 to 3 times faster than its up-and-coming rivals among the higher-level languages (especially when you allow for hybrid programming to speed up the slowest algorithms).
So, as regards developing financial models and trading systems, my questions are (as before):
- Why would anyone prefer Python, given that there is a much faster, free alternative in Julia, which is just as easy a language to program in?
- What justification is there for preferring R to Matlab, other than cost?
- Why does anyone bother with Java? If speed is the critical issue, there are faster alternatives. If you like the relative simplicity of the syntax, Julia is cleaner, simpler and just as fast in execution.
When you reach a point where a high-level language like Matlab (combined with MEX files) is only around 1.5x to 2x slower than C++, you really have to question whether C++ is an appropriate choice. Yes, of course, in mission-critical applications where you need access to the hardware layer for speed, C++ is the way to go. But for many applications, that just isn't the case.
What matters far, far more are the months of costly and laborious programming effort often required to reproduce basic functionality that is already built into higher-level languages like Matlab or Mathematica. Not only that, but the end result of a C++/Java development effort is likely to be far less flexible by comparison. That's a huge drawback. Rarely, if ever, does a piece of research translate flawlessly into production: you have to iterate towards a final solution, often making significant changes to the design of the system in the light of practical experience.
If I had to guess, based on my experience, I would say that 80% or more of development tasks in quantitative research and trading would produce a superior result if preference were given to a higher-level language for the initial development. When the system is stable enough to put into production, you simply create a hybrid application by recoding in C++ any mission-critical components for which speed is an issue.
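Deciding which components are "mission-critical" is an empirical question. One way to answer it, sketched here in Python with the standard-library profiler (`cProfile` and `pstats`), is to profile the prototype and recode only the functions that dominate run time. The toy pipeline (`load_prices`, `moving_average`, `run_strategy`) is entirely hypothetical.

```python
import cProfile
import io
import pstats


# Hypothetical toy pipeline: cheap data loading plus one deliberately slow step.
def load_prices(n):
    return [100.0 + 0.01 * i for i in range(n)]


def moving_average(prices, window):
    # Naive O(n * window) rolling mean: the hot spot profiling should expose.
    out = []
    for i in range(window - 1, len(prices)):
        out.append(sum(prices[i - window + 1 : i + 1]) / window)
    return out


def run_strategy(n=20_000, window=50):
    return moving_average(load_prices(n), window)


def profile_report(func, top=10):
    """Run func() under cProfile and return the `top` entries, sorted by
    cumulative time, as a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

Printing `profile_report(run_strategy)` should show `moving_average` accounting for almost all of the run time, which marks it, and only it, as the candidate for recoding in a compiled language.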
Finally, where does that leave my beloved Mathematica? To be fair, while you don't have the joys of strong typing to contend with, Mathematica's syntax is just as demanding and uncompromising as C++: a missed comma or incorrectly placed bracket is just as critical. But the point is that while in C++ the syntactic rigor is merely annoying, in Mathematica it's worth putting up with because productivity is so much greater. A competent programmer can write, in a single line of Mathematica code, a program that would require hundreds, if not thousands, of lines of C++. Sure, he might get the syntax wrong at first, but it's only a single line of code, and the interactive GUI makes debugging very simple.
That said, while Mathematica can be very tedious to use for procedural programming, it excels in three areas:
1. Symbolic programming. For anything involving mathematical symbols and equations, Mathematica is #1.
2. User interface. In Mathematica it is trivial to build a sophisticated, dynamic GUI in no time at all, often in one or two lines of code.
3. Functional programming. Anything that can be expressed as a function, Mathematica handles extremely well. I don't mean finding a square root: I mean extremely complex functions that might take hundreds of lines of code in another language.
It is also worth pointing out that Mathematica comes supplied with functionality that Matlab provides only through numerous, costly add-on packages.
CONCLUSION
Before I allow a development team to start mindlessly coding up a system in Java or C++, I want to hear why they aren't going to build it 10x faster in another, higher-level language. “We always use C++/Java for production” is not a reason. Specifically, which parts of the system require the additional 1.5x speed-up, and why can't those parts be coded as DLLs (Matlab MEX functions)?
Finally, on a cost-benefit basis, ask yourself how much you might benefit if the months and tens (or hundreds) of thousands of dollars wasted on developing in C++ were instead spent on researching and developing new trading ideas.