numpy.rst revision 12391
12740SN/A.. _numpy:
27534Ssteve.reinhardt@amd.com
31046SN/ANumPy
41046SN/A#####
51046SN/A
61046SN/ABuffer protocol
71046SN/A===============
81046SN/A
91046SN/APython supports an extremely general and convenient approach for exchanging
101046SN/Adata between plugin libraries. Types can expose a buffer view [#f2]_, which
111046SN/Aprovides fast direct access to the raw internal data representation. Suppose we
121046SN/Awant to bind the following simplistic Matrix class:
131046SN/A
141046SN/A.. code-block:: cpp
151046SN/A
161046SN/A    class Matrix {
171046SN/A    public:
181046SN/A        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
191046SN/A            m_data = new float[rows*cols];
201046SN/A        }
211046SN/A        float *data() { return m_data; }
221046SN/A        size_t rows() const { return m_rows; }
231046SN/A        size_t cols() const { return m_cols; }
241046SN/A    private:
251046SN/A        size_t m_rows, m_cols;
261046SN/A        float *m_data;
272665SN/A    };
282665SN/A
292665SN/AThe following binding code exposes the ``Matrix`` contents as a buffer object,
301046SN/Amaking it possible to cast Matrices into NumPy arrays. It is even possible to
315766Snate@binkert.orgcompletely avoid copy operations with Python expressions like
328331Ssteve.reinhardt@amd.com``np.array(matrix_instance, copy = False)``.
331438SN/A
346654Snate@binkert.org.. code-block:: cpp
356654Snate@binkert.org
366654Snate@binkert.org    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
376654Snate@binkert.org       .def_buffer([](Matrix &m) -> py::buffer_info {
386654Snate@binkert.org            return py::buffer_info(
394762Snate@binkert.org                m.data(),                               /* Pointer to buffer */
406654Snate@binkert.org                sizeof(float),                          /* Size of one scalar */
413102Sstever@eecs.umich.edu                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
423102Sstever@eecs.umich.edu                2,                                      /* Number of dimensions */
433102Sstever@eecs.umich.edu                { m.rows(), m.cols() },                 /* Buffer dimensions */
443102Sstever@eecs.umich.edu                { sizeof(float) * m.rows(),             /* Strides (in bytes) for each index */
456654Snate@binkert.org                  sizeof(float) }
463102Sstever@eecs.umich.edu            );
473102Sstever@eecs.umich.edu        });
487528Ssteve.reinhardt@amd.com
497528Ssteve.reinhardt@amd.comSupporting the buffer protocol in a new type involves specifying the special
503102Sstever@eecs.umich.edu``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
516654Snate@binkert.org``def_buffer()`` method with a lambda function that creates a
526654Snate@binkert.org``py::buffer_info`` description record on demand describing a given matrix
53679SN/Ainstance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
54679SN/Aspecification.
55679SN/A
56679SN/A.. code-block:: cpp
57679SN/A
58679SN/A    struct buffer_info {
591692SN/A        void *ptr;
60679SN/A        ssize_t itemsize;
61679SN/A        std::string format;
62679SN/A        ssize_t ndim;
63679SN/A        std::vector<ssize_t> shape;
64679SN/A        std::vector<ssize_t> strides;
65679SN/A    };
66679SN/A
67679SN/ATo create a C++ function that can take a Python buffer object as an argument,
68679SN/Asimply use the type ``py::buffer`` as one of its arguments. Buffers can exist
69679SN/Ain a great variety of configurations, hence some safety checks are usually
70679SN/Anecessary in the function body. Below, you can see an basic example on how to
71679SN/Adefine a custom constructor for the Eigen double precision matrix
72679SN/A(``Eigen::MatrixXd``) type, which supports initialization from compatible
73679SN/Abuffer objects (e.g. a NumPy matrix).
741692SN/A
75679SN/A.. code-block:: cpp
76679SN/A
77679SN/A    /* Bind MatrixXd (or some other Eigen type) to Python */
78679SN/A    typedef Eigen::MatrixXd Matrix;
79679SN/A
80679SN/A    typedef Matrix::Scalar Scalar;
81679SN/A    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
82679SN/A
83679SN/A    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
84679SN/A        .def("__init__", [](Matrix &m, py::buffer b) {
85679SN/A            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
86679SN/A
87679SN/A            /* Request a buffer descriptor from Python */
88679SN/A            py::buffer_info info = b.request();
89679SN/A
902740SN/A            /* Some sanity checks ... */
91679SN/A            if (info.format != py::format_descriptor<Scalar>::format())
92679SN/A                throw std::runtime_error("Incompatible format: expected a double array!");
93679SN/A
944762Snate@binkert.org            if (info.ndim != 2)
954762Snate@binkert.org                throw std::runtime_error("Incompatible buffer dimension!");
964762Snate@binkert.org
972738SN/A            auto strides = Strides(
982738SN/A                info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
992738SN/A                info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));
1007673Snate@binkert.org
1017673Snate@binkert.org            auto map = Eigen::Map<Matrix, 0, Strides>(
1028331Ssteve.reinhardt@amd.com                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
1038331Ssteve.reinhardt@amd.com
1047673Snate@binkert.org            new (&m) Matrix(map);
1052740SN/A        });
1062740SN/A
1072740SN/AFor reference, the ``def_buffer()`` call for this Eigen data type should look
1082740SN/Aas follows:
1091692SN/A
1101427SN/A.. code-block:: cpp
1117493Ssteve.reinhardt@amd.com
1127493Ssteve.reinhardt@amd.com    .def_buffer([](Matrix &m) -> py::buffer_info {
1137493Ssteve.reinhardt@amd.com        return py::buffer_info(
1147493Ssteve.reinhardt@amd.com            m.data(),                                /* Pointer to buffer */
1151427SN/A            sizeof(Scalar),                          /* Size of one scalar */
1167493Ssteve.reinhardt@amd.com            py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
117679SN/A            2,                                       /* Number of dimensions */
118679SN/A            { m.rows(), m.cols() },                  /* Buffer dimensions */
119679SN/A            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
1202740SN/A              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
121679SN/A                                                     /* Strides (in bytes) for each index */
122679SN/A        );
1231310SN/A     })
1246654Snate@binkert.org
1254762Snate@binkert.orgFor a much easier approach of binding Eigen types (although with some
1262740SN/Alimitations), refer to the section on :doc:`/advanced/cast/eigen`.
1272740SN/A
1282740SN/A.. seealso::
1292740SN/A
1302740SN/A    The file :file:`tests/test_buffers.cpp` contains a complete example
1312740SN/A    that demonstrates using the buffer protocol with pybind11 in more detail.
1327673Snate@binkert.org
1332740SN/A.. [#f2] http://docs.python.org/3/c-api/buffer.html
1342740SN/A
1352740SN/AArrays
1362740SN/A======
1374762Snate@binkert.org
1384762Snate@binkert.orgBy exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
1392740SN/Arestrict the function so that it only accepts NumPy arrays (rather than any
1404762Snate@binkert.orgtype of Python object satisfying the buffer protocol).
1414762Snate@binkert.org
1424762Snate@binkert.orgIn many situations, we want to define a function which only accepts a NumPy
1434762Snate@binkert.orgarray of a certain data type. This is possible via the ``py::array_t<T>``
144679SN/Atemplate. For instance, the following function requires the argument to be a
1452711SN/ANumPy array containing double precision values.
146679SN/A
1472711SN/A.. code-block:: cpp
1482711SN/A
1491692SN/A    void f(py::array_t<double> array);
1501310SN/A
1511427SN/AWhen it is invoked with a different type (e.g. an integer or a list of
1522740SN/Aintegers), the binding code will attempt to cast the input into a NumPy array
1532740SN/Aof the requested type. Note that this feature requires the
1542740SN/A:file:`pybind11/numpy.h` header to be included.
1552740SN/A
1562740SN/AData in NumPy arrays is not guaranteed to packed in a dense manner;
1572740SN/Afurthermore, entries can be separated by arbitrary column and row strides.
1582740SN/ASometimes, it can be useful to require a function to only accept dense arrays
1597528Ssteve.reinhardt@amd.comusing either the C (row-major) or Fortran (column-major) ordering. This can be
1603105Sstever@eecs.umich.eduaccomplished via a second template argument with values ``py::array::c_style``
1612740SN/Aor ``py::array::f_style``.
1621310SN/A
1631692SN/A.. code-block:: cpp
1641585SN/A
1651692SN/A    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
1661692SN/A
1671692SN/AThe ``py::array::forcecast`` argument is the default value of the second
1681692SN/Atemplate parameter, and it ensures that non-conforming arguments are converted
1691692SN/Ainto an array satisfying the specified requirements instead of trying the next
1702740SN/Afunction overload.
1712740SN/A
1722740SN/AStructured types
1732740SN/A================
1741692SN/A
1755610Snate@binkert.orgIn order for ``py::array_t`` to work with structured (record) types, we first
1761692SN/Aneed to register the memory layout of the type. This can be done via
1772740SN/A``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
1781692SN/Aexpects the type followed by field names:
1797528Ssteve.reinhardt@amd.com
1803105Sstever@eecs.umich.edu.. code-block:: cpp
1812740SN/A
1822712SN/A    struct A {
1835610Snate@binkert.org        int x;
1845610Snate@binkert.org        double y;
1851692SN/A    };
1864762Snate@binkert.org
1874762Snate@binkert.org    struct B {
1884762Snate@binkert.org        int z;
1895610Snate@binkert.org        A a;
1904762Snate@binkert.org    };
1915610Snate@binkert.org
1924859Snate@binkert.org    // ...
1932740SN/A    PYBIND11_MODULE(test, m) {
1942740SN/A        // ...
1952740SN/A
1962740SN/A        PYBIND11_NUMPY_DTYPE(A, x, y);
1972740SN/A        PYBIND11_NUMPY_DTYPE(B, z, a);
1982740SN/A        /* now both A and B can be used as template arguments to py::array_t */
1992740SN/A    }
2002740SN/A
2011527SN/AThe structure should consist of fundamental arithmetic types, ``std::complex``,
2022740SN/Apreviously registered substructures, and arrays of any of the above. Both C++
2031585SN/Aarrays and ``std::array`` are supported. While there is a static assertion to
2041427SN/Aprevent many types of unsupported structures, it is still the user's
2052738SN/Aresponsibility to use only "plain" structures that can be safely manipulated as
2062738SN/Araw memory without violating invariants.
2073105Sstever@eecs.umich.edu
2082738SN/AVectorizing functions
2091427SN/A=====================
2101427SN/A
2111427SN/ASuppose we want to bind a function with the following signature to Python so
2121427SN/Athat it can process arbitrary NumPy array arguments (vectors, matrices, general
2131427SN/AN-D arrays) in addition to its normal arguments:
2141427SN/A
2151427SN/A.. code-block:: cpp
2161427SN/A
2171427SN/A    double my_func(int x, float y, double z);
2181427SN/A
2191427SN/AAfter including the ``pybind11/numpy.h`` header, this is extremely simple:
2201427SN/A
2217493Ssteve.reinhardt@amd.com.. code-block:: cpp
2221427SN/A
2231427SN/A    m.def("vectorized_func", py::vectorize(my_func));
2241427SN/A
2253100SN/AInvoking the function like below causes 4 calls to be made to ``my_func`` with
2263100SN/Aeach of the array elements. The significant advantage of this compared to
2273100SN/Asolutions like ``numpy.vectorize()`` is that the loop over the elements runs
2283100SN/Aentirely on the C++ side and can be crunched down into a tight, optimized loop
2293100SN/Aby the compiler. The result is returned as a NumPy array of type
2303100SN/A``numpy.dtype.float64``.
2313105Sstever@eecs.umich.edu
2323105Sstever@eecs.umich.edu.. code-block:: pycon
2333105Sstever@eecs.umich.edu
2343105Sstever@eecs.umich.edu    >>> x = np.array([[1, 3],[5, 7]])
2353105Sstever@eecs.umich.edu    >>> y = np.array([[2, 4],[6, 8]])
2368321Ssteve.reinhardt@amd.com    >>> z = 3
2373105Sstever@eecs.umich.edu    >>> result = vectorized_func(x, y, z)
2383105Sstever@eecs.umich.edu
2393105Sstever@eecs.umich.eduThe scalar argument ``z`` is transparently replicated 4 times.  The input
2403105Sstever@eecs.umich.eduarrays ``x`` and ``y`` are automatically converted into the right types (they
2413105Sstever@eecs.umich.eduare of type  ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
2428321Ssteve.reinhardt@amd.com``numpy.dtype.float32``, respectively).
2438321Ssteve.reinhardt@amd.com
2448321Ssteve.reinhardt@amd.com.. note::
2458321Ssteve.reinhardt@amd.com
2468321Ssteve.reinhardt@amd.com    Only arithmetic, complex, and POD types passed by value or by ``const &``
2478321Ssteve.reinhardt@amd.com    reference are vectorized; all other arguments are passed through as-is.
2488321Ssteve.reinhardt@amd.com    Functions taking rvalue reference arguments cannot be vectorized.
2498321Ssteve.reinhardt@amd.com
2508321Ssteve.reinhardt@amd.comIn cases where the computation is too complicated to be reduced to
2518321Ssteve.reinhardt@amd.com``vectorize``, it will be necessary to create and access the buffer contents
2528321Ssteve.reinhardt@amd.commanually. The following snippet contains a complete example that shows how this
2538321Ssteve.reinhardt@amd.comworks (the code is somewhat contrived, since it could have been done more
2548321Ssteve.reinhardt@amd.comsimply using ``vectorize``).
2558321Ssteve.reinhardt@amd.com
2563105Sstever@eecs.umich.edu.. code-block:: cpp
2573105Sstever@eecs.umich.edu
2583105Sstever@eecs.umich.edu    #include <pybind11/pybind11.h>
2593105Sstever@eecs.umich.edu    #include <pybind11/numpy.h>
2603105Sstever@eecs.umich.edu
2613105Sstever@eecs.umich.edu    namespace py = pybind11;
2623105Sstever@eecs.umich.edu
2633105Sstever@eecs.umich.edu    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
2643105Sstever@eecs.umich.edu        auto buf1 = input1.request(), buf2 = input2.request();
2653105Sstever@eecs.umich.edu
2663105Sstever@eecs.umich.edu        if (buf1.ndim != 1 || buf2.ndim != 1)
2673105Sstever@eecs.umich.edu            throw std::runtime_error("Number of dimensions must be one");
2683105Sstever@eecs.umich.edu
2693105Sstever@eecs.umich.edu        if (buf1.size != buf2.size)
2703105Sstever@eecs.umich.edu            throw std::runtime_error("Input shapes must match");
2713105Sstever@eecs.umich.edu
2723105Sstever@eecs.umich.edu        /* No pointer is passed, so NumPy will allocate the buffer */
2733105Sstever@eecs.umich.edu        auto result = py::array_t<double>(buf1.size);
2743105Sstever@eecs.umich.edu
2751585SN/A        auto buf3 = result.request();
2761310SN/A
2771310SN/A        double *ptr1 = (double *) buf1.ptr,
2781310SN/A               *ptr2 = (double *) buf2.ptr,
2791310SN/A               *ptr3 = (double *) buf3.ptr;
2807673Snate@binkert.org
2811310SN/A        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
2821310SN/A            ptr3[idx] = ptr1[idx] + ptr2[idx];
2831310SN/A
2841310SN/A        return result;
2851427SN/A    }
2861310SN/A
2871310SN/A    PYBIND11_MODULE(test, m) {
2882738SN/A        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
2893105Sstever@eecs.umich.edu    }
2902738SN/A
2912738SN/A.. seealso::
2922740SN/A
2932740SN/A    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
2942740SN/A    example that demonstrates using :func:`vectorize` in more detail.
2952740SN/A
2962740SN/ADirect access
2972740SN/A=============
2982740SN/A
2993105Sstever@eecs.umich.eduFor performance reasons, particularly when dealing with very large arrays, it
3001310SN/Ais often desirable to directly access array elements without internal checking
3013105Sstever@eecs.umich.eduof dimensions and bounds on every access when indices are known to be already
3023105Sstever@eecs.umich.eduvalid.  To avoid such checks, the ``array`` class and ``array_t<T>`` template
3033105Sstever@eecs.umich.educlass offer an unchecked proxy object that can be used for this unchecked
3043105Sstever@eecs.umich.eduaccess through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
3053105Sstever@eecs.umich.eduwhere ``N`` gives the required dimensionality of the array:
3068321Ssteve.reinhardt@amd.com
3073105Sstever@eecs.umich.edu.. code-block:: cpp
3083105Sstever@eecs.umich.edu
3093105Sstever@eecs.umich.edu    m.def("sum_3d", [](py::array_t<double> x) {
3103105Sstever@eecs.umich.edu        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
3113105Sstever@eecs.umich.edu        double sum = 0;
3121310SN/A        for (ssize_t i = 0; i < r.shape(0); i++)
3131585SN/A            for (ssize_t j = 0; j < r.shape(1); j++)
3147675Snate@binkert.org                for (ssize_t k = 0; k < r.shape(2); k++)
3157675Snate@binkert.org                    sum += r(i, j, k);
3167675Snate@binkert.org        return sum;
3177675Snate@binkert.org    });
3187675Snate@binkert.org    m.def("increment_3d", [](py::array_t<double> x) {
3197675Snate@binkert.org        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
3207675Snate@binkert.org        for (ssize_t i = 0; i < r.shape(0); i++)
3217675Snate@binkert.org            for (ssize_t j = 0; j < r.shape(1); j++)
3227675Snate@binkert.org                for (ssize_t k = 0; k < r.shape(2); k++)
3231692SN/A                    r(i, j, k) += 1.0;
3241692SN/A    }, py::arg().noconvert());
3251585SN/A
3267528Ssteve.reinhardt@amd.comTo obtain the proxy from an ``array`` object, you must specify both the data
3277528Ssteve.reinhardt@amd.comtype and number of dimensions as template arguments, such as ``auto r =
3287528Ssteve.reinhardt@amd.commyarray.mutable_unchecked<float, 2>()``.
3291585SN/A
3301585SN/AIf the number of dimensions is not known at compile time, you can omit the
3311585SN/Adimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
3323100SN/A``arr.unchecked<T>()``.  This will give you a proxy object that works in the
3333100SN/Asame way, but results in less optimizable code and thus a small efficiency
3343100SN/Aloss in tight loops.
3358596Ssteve.reinhardt@amd.com
3368596Ssteve.reinhardt@amd.comNote that the returned proxy object directly references the array's data, and
3378596Ssteve.reinhardt@amd.comonly reads its shape, strides, and writeable flag when constructed.  You must
3388596Ssteve.reinhardt@amd.comtake care to ensure that the referenced array is not destroyed or reshaped for
3398596Ssteve.reinhardt@amd.comthe duration of the returned object, typically by limiting the scope of the
3408596Ssteve.reinhardt@amd.comreturned instance.
3418596Ssteve.reinhardt@amd.com
3428596Ssteve.reinhardt@amd.comThe returned proxy object supports some of the same methods as ``py::array`` so
3438596Ssteve.reinhardt@amd.comthat it can be used as a drop-in replacement for some existing, index-checked
3448596Ssteve.reinhardt@amd.comuses of ``py::array``:
3458596Ssteve.reinhardt@amd.com
3468596Ssteve.reinhardt@amd.com- ``r.ndim()`` returns the number of dimensions
3478596Ssteve.reinhardt@amd.com
3488596Ssteve.reinhardt@amd.com- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
3498596Ssteve.reinhardt@amd.com  the ``const T`` or ``T`` data, respectively, at the given indices.  The
3508596Ssteve.reinhardt@amd.com  latter is only available to proxies obtained via ``a.mutable_unchecked()``.
3518596Ssteve.reinhardt@amd.com
3528596Ssteve.reinhardt@amd.com- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
3538596Ssteve.reinhardt@amd.com
3548596Ssteve.reinhardt@amd.com- ``ndim()`` returns the number of dimensions.
3558596Ssteve.reinhardt@amd.com
3568596Ssteve.reinhardt@amd.com- ``shape(n)`` returns the size of dimension ``n``
3578596Ssteve.reinhardt@amd.com
3588596Ssteve.reinhardt@amd.com- ``size()`` returns the total number of elements (i.e. the product of the shapes).
3598596Ssteve.reinhardt@amd.com
3608596Ssteve.reinhardt@amd.com- ``nbytes()`` returns the number of bytes used by the referenced elements
3618596Ssteve.reinhardt@amd.com  (i.e. ``itemsize()`` times ``size()``).
3628596Ssteve.reinhardt@amd.com
3638596Ssteve.reinhardt@amd.com.. seealso::
3648596Ssteve.reinhardt@amd.com
3658596Ssteve.reinhardt@amd.com    The file :file:`tests/test_numpy_array.cpp` contains additional examples
3668596Ssteve.reinhardt@amd.com    demonstrating the use of this feature.
3678596Ssteve.reinhardt@amd.com