Algorithmic trading strategies python
Konstantinos Thanks again for the course and I must once again congratulate you on a fantastic course and learning environment with the Python Quant Platform. It has substantially increased my ability with Python and also with general Linux infrastructure such as cloud servers, etc.
Martin As a side note, I wanted to thank you for creating such a fantastic course. I really felt like I've learned a lot in a short time and definitely feel like you've given a great foundation for me to continue exploring the world of fin-tech.
So again, a huge thank you! Andrew
A Perfect Symbiosis
Finding the right algorithm to automatically and successfully trade in financial markets is the holy grail in finance. Not too long ago, algorithmic trading was only available to institutional players with deep pockets and lots of assets under management. Recent developments in open source software, cloud computing, open data as well as online trading platforms have leveled the playing field for smaller institutions and individual traders.
This makes it possible to get started in this fascinating field being equipped with a modern notebook and an Internet connection only. Nowadays, Python and its ecosystem of powerful packages is the technology platform of choice for algorithmic trading. Among others, Python allows you to do efficient data analytics with e. This is an in-depth, intensive online course about Python version 3. Such a course at the intersection of two vast and exciting fields can hardly cover all topics of relevance.
However, it can cover a range of important meta topics in-depth: An incomplete list of the technical and financial topics comprises: Have a look at the table of contents of the PDF version of the online course material. The course offers a unique learning experience with the following features and benefits. The Python Quants offer a University Certificate Program (not included) based, among others, on this course that provides an interactive learning experience e.
Hilpisch is founder and managing partner of The Python Quants, a group focusing on the use of open source technologies for financial data science, algorithmic trading and computational finance. He is the author of the books.
Our discussion covers some libraries which might be less well-known within the Python data community.
We suggest that developers familiar with Python should jump to this part. Finally, we introduce Cuemacro's open-source financial market libraries written in Python: Chartpy (visualisation), Findatapy (market data) and Finmarketpy (backtesting trading strategies). We conclude by presenting some examples of market analysis written in Python using these libraries.
The most important aspects you need to consider when choosing a programming language are related to time. One determining factor which you need to pay attention to is execution time, or the time it takes to run your analysis. Another equally important factor is development time, or the time it takes to write the actual code.
The relative importance of execution time versus development time is a key consideration when it comes to choosing an appropriate programming language. When running a high frequency trading (HFT) strategy in production, execution time is likely to be crucial. This contrasts with longer term trading strategies or prototyping, where execution time is less of a consideration.
We expand upon this idea of balancing execution and development time in the following sections, in which we discuss the relative merits of different types of programming languages for financial market applications. Lower level languages tend to use static typing. Static typing involves specifying the type of data we want to store in variables at compilation, before runtime, which reduces the amount of processing needed at execution.
This means they do not need to worry about freeing up memory space once they have finished using a variable. This, of course, does not totally eliminate the chances of a memory leak in code, which can crash the program. While the bytecode is more portable than the machine code, it still needs to be translated to machine code at time of execution by the virtual machine - known as just-in-time (JIT) compilation.
This introduces a startup time delay to your program. Historically, JVMs have been slow at executing Java bytecode.
In recent years, however, they have become faster. Furthermore, owing to bytecode to machine code JIT compilation, you can execute the same Java bytecode on a number of different platforms without having to recompile the source code.
This adds to the convenience of using Java: It is possible in principle to compile your code on a Mac and run it on Linux or Windows, reducing development time when using multiple operating systems.
Java is not unique in being compiled to bytecode. C#, which bears many similarities to Java in its syntax, and other languages from the .NET Framework are also compiled to an intermediate bytecode. When the primary goal is to reduce development time, rather than execution time, we can turn to interpreted languages, which are very useful for scripting.
Common interpreted languages used in finance include Python, Matlab and R. They are chosen since they reduce development time when prototyping trading strategies. Interpreted languages are generally dynamically typed as opposed to statically typed. This means that the types of variables are associated with their assigned values at runtime, and not specified by the programmer or inferred by a compiler.
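As a small illustration of dynamic typing in Python, the sketch below rebinds one name to values of different types and shows that a type mismatch only surfaces when the offending line actually runs. The variable `price` and the helper `mid` are hypothetical names chosen for this example.

```python
# Dynamic typing: a name is bound to whatever object it is assigned at runtime.
price = 101            # bound to an int
print(type(price).__name__)

price = 101.25         # the same name is rebound to a float
print(type(price).__name__)

price = "101.25"       # and now to a str; no declaration was ever needed
print(type(price).__name__)


def mid(bid, ask):
    """Return the mid price; argument types are only checked when the code runs."""
    return (bid + ask) / 2


print(mid(100, 102))        # ints work
print(mid(100.5, 101.5))    # floats work too

try:
    mid("100", "102")       # fails only at runtime, when str / int is attempted
except TypeError as exc:
    print("TypeError:", exc)
```

In a statically typed language the last call would typically be rejected at compile time, before the program ever ran.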
This is one feature that makes scripting languages less verbose, making it quicker to write code. On the flip side, execution can take longer. Whilst Matlab is primarily known for its matrix algebra capabilities, it also has many libraries known as toolboxes, which offer additional functionality ranging from signal processing to computational finance to image analysis.
Matlab remains popular partially because so much legacy code in financial firms is written in it. It can also interface well with many other languages with minimal effort, including Python and Java. In recent years Matlab has faced competition from R and Python. Both R and Python offer similar functionality to Matlab, but have the added benefit of being open-source languages.
However, there is an implicit cost from transitioning from Matlab to either Python or R, notably in terms of time spent learning a new language. It also takes time to rewrite legacy Matlab code in Python and R.
R is an open-source version of the statistical package S. Historically, cutting edge statistical techniques have tended to be implemented in R before other languages. This has attracted a large following among the data science community. However, if your application is not purely based around statistics, R might not be the best choice.
Julia is a more recent scripting language, which has been designed to address many of the issues associated with R and Python. For an introduction to Julia, see this issue's "Julia - A new language for technical computing". In particular, when Julia code is first run, it generates native machine code for execution.
This contrasts with R and Python code which is executed by an interpreter. Theoretically, native machine code should be quicker than interpreted code. NumFOCUS gives a set of benchmarks that indicate the language has comparable performance with C for a number of functions such as matrix multiplication and sorting lists.
So far we have focused on imperative languages. But what about using other types of languages? Haskell is a functional language. For programmers used to imperative programming and the idea of mainly using loops, it can be challenging to adopt a functional approach to programming.
However, certain mathematical problems can be more naturally expressed in a functional framework. Lisp is another common functional language and is often used in natural language processing. Indeed, one of the biggest companies in this area, RavenPack, actively uses Lisp. F#, Microsoft's functional language, also has the benefit of being part of the .NET Framework, so it can be called easily by other .NET Framework languages such as C#.
The JVM also has functional languages, such as Clojure.
Scala combines object-oriented development with functional elements and also compiles into Java bytecode. Q is a query-based language. It might seem odd to consider using a database language for financial analysis. This avoids the overhead associated with retrieving the data from a database. Another downside of Q is that it tends to be relatively complicated to get to grips with although there is the simpler q-SQL language which, as the name suggests, has a similar syntax to SQL.
So far we have discussed the relative merits of several languages when analysing financial data. As we have noted, the language chosen largely depends on the aims of your analysis. However, for most other purposes, where short execution time is not the primary consideration, such as when analysing lower frequency data, there are many other choices. Python can be viewed as a compromise language for market analysis. It has a lot of libraries, just as R and Matlab do. An important part of any larger programming project is the ability to reuse code.
This is facilitated by object-oriented coding, which tends to be easier in Python than in either R or Matlab. Parallelising code, or splitting up the computation into chunks which can be solved at the same time, can cut execution time.
Today, processors usually have many cores for computations, hence a processor can run multiple calculations at the same time. Notably, one drawback of Python is its global interpreter lock (GIL), which only allows one native thread to execute at any one time.
As a result, the GIL can make it more challenging to parallelise code. Later in the article we discuss other techniques for reducing execution time for Python code. A number of large financial organisations use Python and have adopted it in their core processes. Quartz is used for pricing trades, managing exposure and computing risk metrics across all asset classes.
Of course, this is not to say that the sell side has suddenly dumped technologies like the. Many large quant hedge funds, such as AHL, have also adopted Python. In recent years, financial firms have begun to open source some of their code. This is likely to be helpful for the adoption of Python within the financial community. Another library, Pandas, very popular for data analysis, originally started as a project at the investment management firm AQR. Just as with R and Matlab, it is beneficial to vectorise Python code.
For example, rather than using a for-loop code structure to multiply matrices, which can be slow, we can use highly optimised matrix multiplication functions instead. Admittedly, in more complicated cases it is not always trivial to vectorise code in this way. As we discussed earlier, given that Python has the GIL, it can be more challenging to do true parallelised computation within a single process.
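The matrix multiplication case can be sketched as follows, assuming NumPy is installed. The function `matmul_loops` is a hypothetical helper written only for comparison, and the matrix sizes are arbitrary.

```python
import numpy as np


def matmul_loops(a, b):
    """Multiply two matrices with explicit Python for-loops (slow)."""
    n_rows, n_inner = a.shape
    n_inner_b, n_cols = b.shape
    if n_inner != n_inner_b:
        raise ValueError("inner dimensions must match")
    out = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                out[i, j] += a[i, k] * b[k, j]
    return out


rng = np.random.default_rng(seed=0)   # fixed seed so the run is repeatable
a = rng.standard_normal((30, 20))
b = rng.standard_normal((20, 10))

slow = matmul_loops(a, b)
fast = a @ b  # vectorised: a single call into NumPy's optimised routine

# Both approaches give the same result up to floating point rounding
print(np.allclose(slow, fast))
```

Timing the two versions, for instance with the standard library's `timeit` module, should show the vectorised call running much faster, with the gap widening as the matrices grow.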
You need to use a work-around, such as the multiprocessing library, which creates separate Python processes in memory. This approach allows you to do computation on multiple cores. Nonetheless, this also makes it more challenging to share memory between the processes.
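A minimal sketch of this work-around, using only the standard library's `multiprocessing.Pool`. The function names and the toy workload (a sum of squares) are assumptions made for illustration, not part of any trading library.

```python
from multiprocessing import Pool


def sum_of_squares(bounds):
    """Worker: sum i*i for i in [start, stop). Runs in a separate process."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        stop = start + step + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def parallel_sum_of_squares(n, workers=4):
    """Farm the chunks out to a pool of worker processes and combine the results."""
    with Pool(processes=workers) as pool:
        partial_sums = pool.map(sum_of_squares, split_range(n, workers))
    return sum(partial_sums)


if __name__ == "__main__":
    # The guard matters: on platforms that spawn new interpreters, each worker
    # re-imports this module and must not start a pool of its own.
    n = 1_000_000
    print(parallel_sum_of_squares(n) == sum(i * i for i in range(n)))
```

Each worker receives only a small `(start, stop)` tuple and returns a single integer, which keeps the cost of passing data between processes low. Sending large arrays to the workers would involve copying them, which is the memory-sharing difficulty mentioned above.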
These include the threading library and the asyncio library, which handle asynchronous IO requests without blocking. Cython presents another way to speed up Python code. Cython is a static compiler for Python, which also lets you call C functions and declare C types. Python code has dynamic typing, unlike, for example, Java, which has static typing. If you declare C types in Cython, it allows you to convert your slow Python for loops into C.
Cython also makes it possible to release the GIL in order to use multithreading directly in functions without the need to create separate processes. Many libraries in Python extensively use Cython. Admittedly, Cython is not a magic bullet to reduce the execution time of Python code. In some cases, it can be time consuming to rewrite Python code for Cython's compiler. This can be the case when your Python code contains more complicated syntax which cannot be converted easily into low level C code by Cython.
An alternative to Cython is Numba. It uses the low level virtual machine (LLVM) compiler infrastructure to generate machine code from Python at runtime, which can also be done on a static basis. GPUs are typically useful for large scale computations with repeatable operations, like matrix multiplication. Python now boasts a lot more data libraries than it did several years ago. This has encouraged quants to use Python. In this section we discuss some of the most popular Python data libraries.
The SciPy stack comprises several popular libraries for scientific and technical computing. The first step of learning Python is developing a basic understanding of the syntax. For those wishing to analyse financial markets, it is important to have an understanding of the SciPy stack. In particular, we would recommend focusing on NumPy and Pandas, given that financial market data often consists of time series data.
NumPy is at the core of the stack and offers a large number of functions to deal with matrix manipulation of 'ndarray' objects, which are n-dimensional arrays. These types of functions are at the source of much of the computation in financial analysis. NumPy can be viewed as the Python equivalent of Matlab's matrix functionality. Pandas is a Python data analysis library which deals with time series.
It offers functions to perform common manipulations of time series, such as aligning or sorting them. At its core are several data structures, which can be seen as Python's equivalent of R's data frames. The underlying dates and data within these data structures are stored as NumPy arrays.
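The sorting and aligning operations can be sketched as below, assuming Pandas is installed. The two series, their names and their values are made up for illustration.

```python
import pandas as pd

# Two hypothetical price series observed on partly different dates
# (the values are made up for illustration)
eurusd = pd.Series(
    [1.10, 1.11, 1.12],
    index=pd.to_datetime(["2016-01-06", "2016-01-04", "2016-01-05"]),
    name="EURUSD",
)
gbpusd = pd.Series(
    [1.47, 1.46],
    index=pd.to_datetime(["2016-01-05", "2016-01-07"]),
    name="GBPUSD",
)

# Sorting: put the observations into chronological order
eurusd = eurusd.sort_index()

# Aligning: reindex both series onto the union of their dates,
# leaving NaN where a series has no observation
eurusd_aligned, gbpusd_aligned = eurusd.align(gbpusd, join="outer")

# The same alignment happens implicitly when combining them into a DataFrame
prices = pd.concat([eurusd, gbpusd], axis=1)
print(prices)

# The underlying dates and values are exposed as NumPy arrays
print(type(prices.index.values).__name__, type(prices["EURUSD"].to_numpy()).__name__)
```

Missing observations appear as NaN after alignment. Whether to forward-fill them, drop them or leave them depends on the analysis.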
IPython is an interactive notebook-based environment for Python code. We can combine Python code, text and results in a single file with IPython. It enables us to create interactive research documents, where the code and results of our output are in a single place.
This contrasts with the typical alternative, such as a static PDF file. One of the reasons for R's popularity is its ggplot library which produces high quality visualisations. Matplotlib is the most popular visualisation library for Python and it is designed to replicate much of the functionality of ggplot. Matplotlib can generate a multitude of plots, ranging from simple 2D plots to more complicated 3D plots and animations.
However, some of its functionality can be challenging to use, which has led to the development of wrappers to simplify its interface. These include the libraries Seaborn and Chartpy. The SciPy library - not to be confused with the SciPy stack - provides methods for a number of different computations used in financial analysis, including numerical integration, optimisation, interpolation, linear algebra, statistics and image processing.
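A brief sketch of three of these computations with the SciPy library, assuming SciPy and NumPy are installed. The functions being integrated and minimised, and the yield curve points, are toy examples invented for this illustration.

```python
import numpy as np
from scipy import integrate, interpolate, optimize


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


# Numerical integration: probability mass of a standard normal on [-1.96, 1.96]
area, abs_error = integrate.quad(normal_pdf, -1.96, 1.96)
print(round(area, 3))

# Optimisation: find the w that minimises a simple quadratic objective
result = optimize.minimize_scalar(lambda w: (w - 0.3) ** 2 + 1.0)
print(round(result.x, 4))

# Interpolation: linear interpolation of a hypothetical yield curve
tenors = np.array([1.0, 2.0, 5.0, 10.0])   # years (made-up points)
yields = np.array([0.8, 1.0, 1.6, 2.1])    # per cent (made-up values)
curve = interpolate.interp1d(tenors, yields)
three_year = float(curve(3.0))
print(round(three_year, 2))
```

Each routine has many more options than shown here, such as bounds and alternative methods for the optimiser or higher-order splines for interpolation.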
As computing power has become cheaper and more datasets have become available, the interest in machine learning has grown significantly. In a nutshell, the idea of machine learning is to make inferences between different variables within a dataset where we do not know the underlying function or a process beforehand.
Python has many libraries for machine learning; we describe a few popular ones. Scikit-learn is perhaps the best known of the machine learning libraries for Python. It can be used for a number of tasks including classification, regression, clustering, dimensionality reduction, model selection and pre-processing.
The algorithms range from linear regressions to techniques which can handle non-linear relationships, like support vector machines and k-nearest neighbours. The deep learning library TensorFlow was released by Google. TFLearn provides a simplified interface for using TensorFlow, similar to scikit-learn.
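To make the contrast between linear and non-linear techniques concrete, the sketch below fits a linear regression and a k-nearest neighbours regressor to the same synthetic data, assuming scikit-learn and NumPy are installed. The data (a noise-free sine wave) is an assumption chosen for illustration, and the scores are in-sample, so they say nothing about out-of-sample performance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic, noise-free data with a non-linear relationship: y = sin(x)
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

# Both estimators share the same fit / score interface
linear = LinearRegression().fit(X, y)
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

# In-sample R^2: the straight line cannot follow the curve, while the
# nearest-neighbour average tracks it closely
linear_r2 = linear.score(X, y)
knn_r2 = knn.score(X, y)
print(linear_r2 < knn_r2)
```

The shared `fit` and `score` interface is what makes it straightforward to swap one model for another in scikit-learn.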