.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.


.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix")
       .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index; */
                  sizeof(float) }                       /* the data is stored row-major */
            );
        });

The snippet above binds a lambda function, which can create ``py::buffer_info``
description records on demand describing a given matrix. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        size_t itemsize;
        std::string format;
        int ndim;
        std::vector<size_t> shape;
        std::vector<size_t> strides;
    };

To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix")
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            /* Convert the byte strides into element strides (outer, inner) */
            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            new (&m) Matrix(map);
        });

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                /* Pointer to buffer */
            sizeof(Scalar),          /* Size of one scalar */
            /* Python struct-style format descriptor */
            py::format_descriptor<Scalar>::format(),
            /* Number of dimensions */
            2,
            /* Buffer dimensions */
            { (size_t) m.rows(),
              (size_t) m.cols() },
            /* Strides (in bytes) for each index */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
        );
     })

For a much easier approach of binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.

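
Both ``py::buffer_info`` and NumPy arrays describe a memory layout through a
shape and one byte stride per index. As a rough illustration of what those
strides mean, the following plain C++ sketch (no pybind11 involved) computes
element addresses from byte strides for the two dense layouts. The names
``View2D``, ``row_major`` and ``col_major`` are hypothetical and exist only for
this example.

.. code-block:: cpp

    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical stand-in for the shape/strides part of py::buffer_info (2-D only).
    struct View2D {
        float *ptr;                          // first element
        std::array<std::size_t, 2> shape;    // {rows, cols}
        std::array<std::size_t, 2> strides;  // step in bytes for each index

        float &at(std::size_t i, std::size_t j) const {
            assert(i < shape[0] && j < shape[1]);
            // Strides are given in bytes, so the offset is applied to a char pointer.
            char *p = reinterpret_cast<char *>(ptr) + i * strides[0] + j * strides[1];
            return *reinterpret_cast<float *>(p);
        }
    };

    // Dense C-style (row-major) layout: moving down one row skips `cols` elements.
    inline View2D row_major(float *ptr, std::size_t rows, std::size_t cols) {
        return {ptr, {rows, cols}, {sizeof(float) * cols, sizeof(float)}};
    }

    // Dense Fortran-style (column-major) layout: moving right one column skips `rows` elements.
    inline View2D col_major(float *ptr, std::size_t rows, std::size_t cols) {
        return {ptr, {rows, cols}, {sizeof(float), sizeof(float) * rows}};
    }

For a 2x3 view over the values 0 to 5, element ``(1, 0)`` is 3 under
``row_major`` and 1 under ``col_major``: the same memory is read with different
strides.
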

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first need
to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, which expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    PYBIND11_NUMPY_DTYPE(A, x, y);
    PYBIND11_NUMPY_DTYPE(B, z, a);

    /* now both A and B can be used as template arguments to py::array_t */

Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function like below causes 4 calls to be made to ``my_func`` with
each of the array elements. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of dtype
``numpy.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3],[5, 7]])
    >>> y = np.array([[2, 4],[6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of dtype ``numpy.int64`` but need to be ``numpy.int32`` and
``numpy.float32``, respectively).

Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was

.. code-block:: cpp

    double my_func(int x, float y, my_custom_type *z);

This can be done with a stateful lambda closure:

.. code-block:: cpp

    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
    m.def("vectorized_func",
        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
            return py::vectorize(stateful_closure)(x, y);
        }
    );

In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>
    #include <stdexcept>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;

        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_PLUGIN(test) {
        py::module m("test");
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
        return m.ptr();
    }

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.
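
As a rough mental model of the element-wise loop described in this section, the
plain C++ sketch below applies a scalar function to two equally sized arrays
while reusing a scalar third argument for every call. It is not pybind11's
implementation and ignores N-dimensional shapes and general broadcasting; the
names ``apply_elementwise`` and ``example_func`` (including its body) are
hypothetical and exist only for this illustration.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical, simplified model of a vectorized call: one invocation of
    // `f` per element, with the scalar `z` reused for each of them.
    template <typename Func>
    std::vector<double> apply_elementwise(Func f, const std::vector<int> &x,
                                          const std::vector<float> &y, double z) {
        assert(x.size() == y.size());  // this sketch requires matching sizes
        std::vector<double> result(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            result[i] = f(x[i], y[i], z);  // the loop runs entirely in C++
        return result;
    }

    // Made-up scalar function with the same signature as my_func in the text.
    inline double example_func(int x, float y, double z) { return x * y + z; }

Calling ``apply_elementwise(example_func, {1, 3, 5, 7}, {2, 4, 6, 8}, 3)`` makes
four calls to ``example_func`` and yields ``{5, 15, 33, 59}``.
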