.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix")
        .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index; */
                  sizeof(float) }                       /* row-major: one row spans cols() floats */
            );
        });

The snippet above binds a lambda function, which can create ``py::buffer_info``
description records on demand describing a given matrix. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        size_t itemsize;
        std::string format;
        int ndim;
        std::vector<size_t> shape;
        std::vector<size_t> strides;
    };

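On the Python side, the same fields are visible through the built-in
``memoryview`` type. The following sketch uses only the standard library
(``array.array`` stands in for a bound ``Matrix``, and all variable names are
illustrative rather than part of pybind11). It shows how format, item size,
shape and strides describe a 2 x 3 block of floats without copying it.

.. code-block:: python

    from array import array

    # Six C floats stored contiguously; this plays the role of Matrix::m_data.
    values = array("f", [1, 2, 3, 4, 5, 6])

    # A memoryview requests the buffer without copying. Casting through bytes
    # lets us reinterpret the flat buffer as a 2 x 3 row-major view.
    matrix = memoryview(values).cast("B").cast("f", (2, 3))

    print(matrix.format, matrix.itemsize, matrix.ndim)  # cf. format, itemsize, ndim
    print(matrix.shape, matrix.strides)                 # cf. shape, strides (in bytes)

    before = matrix[1, 2]
    values[5] = 60.0          # write through the original object
    after = matrix[1, 2]      # the view observes the change
    print(before, after)

Because ``matrix`` and ``values`` refer to the same memory, the write through
``values`` is visible through the view, which is the behaviour that
``copy = False`` relies on.
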
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix")
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            /* Convert byte strides into element strides (outer, inner) */
            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            /* Copy-construct the matrix in place from the mapped buffer */
            new (&m) Matrix(map);
        });

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                /* Pointer to buffer */
            sizeof(Scalar),          /* Size of one scalar */
            /* Python struct-style format descriptor */
            py::format_descriptor<Scalar>::format(),
            /* Number of dimensions */
            2,
            /* Buffer dimensions */
            { (size_t) m.rows(),
              (size_t) m.cols() },
            /* Strides (in bytes) for each index */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
        );
    })

For a much easier approach to binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

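To make the two dense layouts concrete, the following standard-library sketch
computes the byte strides that a densely packed array of a given shape has in
C and in Fortran order. The helper ``dense_strides`` is a hypothetical name
introduced for illustration only; it is not part of pybind11 or NumPy.

.. code-block:: python

    def dense_strides(shape, itemsize, order="C"):
        """Byte strides of a densely packed array with the given shape.

        order="C" makes the last index vary fastest (row-major),
        order="F" makes the first index vary fastest (column-major).
        """
        dims = list(shape)
        if order == "C":
            dims.reverse()            # walk from the fastest-varying axis
        strides, step = [], itemsize
        for extent in dims:
            strides.append(step)      # distance between neighbours on this axis
            step *= extent
        if order == "C":
            strides.reverse()         # back to the original axis order
        return tuple(strides)

    c_strides = dense_strides((2, 3), 8, order="C")
    f_strides = dense_strides((2, 3), 8, order="F")
    print(c_strides, f_strides)

    # Cross-check the C layout against a dense view made by the interpreter.
    view = memoryview(bytes(2 * 3 * 8)).cast("d", (2, 3))
    print(view.strides == c_strides, view.c_contiguous)

An array whose strides differ from both results is not dense in either
ordering, which is the case the ``c_style`` and ``f_style`` flags are meant to
rule out.
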
Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first need
to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, which expects the type followed by its field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    PYBIND11_NUMPY_DTYPE(A, x, y);
    PYBIND11_NUMPY_DTYPE(B, z, a);

    /* now both A and B can be used as template arguments to py::array_t */

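The layout has to be registered because field offsets depend on padding chosen
by the compiler. As a rough illustration (this is not pybind11 code), the
standard ``ctypes`` module can mirror the two structs above and report the
offsets that a matching record dtype would have to use. The helper ``layout``
is a hypothetical name, and the exact numbers are platform dependent.

.. code-block:: python

    import ctypes

    class A(ctypes.Structure):
        # Mirrors `struct A { int x; double y; };`
        _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_double)]

    class B(ctypes.Structure):
        # Mirrors `struct B { int z; A a; };`
        _fields_ = [("z", ctypes.c_int), ("a", A)]

    def layout(struct_type):
        """Return ({field name: byte offset}, total size in bytes)."""
        offsets = {name: getattr(struct_type, name).offset
                   for name, _ in struct_type._fields_}
        return offsets, ctypes.sizeof(struct_type)

    print(layout(A))  # offset of y may exceed sizeof(int) because of padding
    print(layout(B))  # the nested struct A is placed at its own alignment
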
Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function like below causes 4 calls to be made to ``my_func`` with
each of the array elements. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.dtype.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3],[5, 7]])
    >>> y = np.array([[2, 4],[6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
``numpy.dtype.float32``, respectively).

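The semantics of the call above can be modelled in plain Python. This is only
a conceptual sketch: the body of ``my_func`` is a made-up stand-in (the
original C++ body is not shown), ``vectorized_sketch`` is a hypothetical name,
and the real loop runs in compiled C++ rather than in the interpreter.

.. code-block:: python

    calls = []  # records every scalar invocation so they can be counted

    def my_func(x, y, z):
        # Hypothetical body, chosen only so that the result is easy to check.
        calls.append((x, y, z))
        return x * y + z

    def vectorized_sketch(xs, ys, z):
        """Apply my_func element-wise to two equally shaped 2-D nested lists.

        Each element is converted to the scalar type of the C++ signature
        (int, float, double), and the scalar z is reused for every element.
        """
        return [[my_func(int(x), float(y), float(z))
                 for x, y in zip(x_row, y_row)]
                for x_row, y_row in zip(xs, ys)]

    result = vectorized_sketch([[1, 3], [5, 7]], [[2, 4], [6, 8]], 3)
    print(len(calls))  # 4 scalar calls, one per element
    print(result)      # [[5.0, 15.0], [33.0, 59.0]]
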
Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was:

.. code-block:: cpp

    double my_func(int x, float y, my_custom_type *z);

This can be done with a stateful lambda closure:

.. code-block:: cpp

    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
    m.def("vectorized_func",
        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
            return py::vectorize(stateful_closure)(x, y);
        }
    );

In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;

        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_PLUGIN(test) {
        py::module m("test");
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
        return m.ptr();
    }

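For comparison, the same sequence of steps (request the buffers, validate
them, allocate the output, loop over the elements) can be written against any
buffer object from Python using ``memoryview``. This standard-library sketch
is an illustration only: it uses ``array.array`` in place of NumPy arrays and
says nothing about how pybind11 is implemented.

.. code-block:: python

    from array import array

    def add_arrays(input1, input2):
        """Element-wise sum of two one-dimensional buffers of C doubles."""
        # memoryview() plays the role of request(): it exposes format,
        # dimensionality and shape of the underlying buffer without copying.
        with memoryview(input1) as buf1, memoryview(input2) as buf2:
            if buf1.format != "d" or buf2.format != "d":
                raise TypeError("Expected buffers of double precision values")
            if buf1.ndim != 1 or buf2.ndim != 1:
                raise ValueError("Number of dimensions must be one")
            if buf1.shape != buf2.shape:
                raise ValueError("Input shapes must match")

            # Allocate a zero-filled output buffer of the same length.
            result = array("d", [0.0] * buf1.shape[0])
            for idx in range(buf1.shape[0]):
                result[idx] = buf1[idx] + buf2[idx]
        return result

    total = add_arrays(array("d", [1, 2, 3]), array("d", [10, 20, 30]))
    print(total)
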
.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.