numpy.rst revision 14299:2fbea9df56d2
1.. _numpy:
2
3NumPy
4#####
5
6Buffer protocol
7===============
8
9Python supports an extremely general and convenient approach for exchanging
10data between plugin libraries. Types can expose a buffer view [#f2]_, which
11provides fast direct access to the raw internal data representation. Suppose we
12want to bind the following simplistic Matrix class:
13
14.. code-block:: cpp
15
16    class Matrix {
17    public:
18        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
19            m_data = new float[rows*cols];
20        }
21        float *data() { return m_data; }
22        size_t rows() const { return m_rows; }
23        size_t cols() const { return m_cols; }
24    private:
25        size_t m_rows, m_cols;
26        float *m_data;
27    };
28
29The following binding code exposes the ``Matrix`` contents as a buffer object,
30making it possible to cast Matrices into NumPy arrays. It is even possible to
31completely avoid copy operations with Python expressions like
32``np.array(matrix_instance, copy = False)``.
33
34.. code-block:: cpp
35
36    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
37       .def_buffer([](Matrix &m) -> py::buffer_info {
38            return py::buffer_info(
39                m.data(),                               /* Pointer to buffer */
40                sizeof(float),                          /* Size of one scalar */
41                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
42                2,                                      /* Number of dimensions */
43                { m.rows(), m.cols() },                 /* Buffer dimensions */
44                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
45                  sizeof(float) }
46            );
47        });
48
49Supporting the buffer protocol in a new type involves specifying the special
50``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
51``def_buffer()`` method with a lambda function that creates a
52``py::buffer_info`` description record on demand describing a given matrix
53instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
54specification.
55
56.. code-block:: cpp
57
58    struct buffer_info {
59        void *ptr;
60        ssize_t itemsize;
61        std::string format;
62        ssize_t ndim;
63        std::vector<ssize_t> shape;
64        std::vector<ssize_t> strides;
65    };
66
67To create a C++ function that can take a Python buffer object as an argument,
68simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
69in a great variety of configurations, hence some safety checks are usually
70necessary in the function body. Below, you can see an basic example on how to
71define a custom constructor for the Eigen double precision matrix
72(``Eigen::MatrixXd``) type, which supports initialization from compatible
73buffer objects (e.g. a NumPy matrix).
74
75.. code-block:: cpp
76
77    /* Bind MatrixXd (or some other Eigen type) to Python */
78    typedef Eigen::MatrixXd Matrix;
79
80    typedef Matrix::Scalar Scalar;
81    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
82
83    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
84        .def("__init__", [](Matrix &m, py::buffer b) {
85            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
86
87            /* Request a buffer descriptor from Python */
88            py::buffer_info info = b.request();
89
90            /* Some sanity checks ... */
91            if (info.format != py::format_descriptor<Scalar>::format())
92                throw std::runtime_error("Incompatible format: expected a double array!");
93
94            if (info.ndim != 2)
95                throw std::runtime_error("Incompatible buffer dimension!");
96
97            auto strides = Strides(
98                info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
99                info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));
100
101            auto map = Eigen::Map<Matrix, 0, Strides>(
102                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
103
104            new (&m) Matrix(map);
105        });
106
107For reference, the ``def_buffer()`` call for this Eigen data type should look
108as follows:
109
110.. code-block:: cpp
111
112    .def_buffer([](Matrix &m) -> py::buffer_info {
113        return py::buffer_info(
114            m.data(),                                /* Pointer to buffer */
115            sizeof(Scalar),                          /* Size of one scalar */
116            py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
117            2,                                       /* Number of dimensions */
118            { m.rows(), m.cols() },                  /* Buffer dimensions */
119            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
120              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
121                                                     /* Strides (in bytes) for each index */
122        );
123     })
124
125For a much easier approach of binding Eigen types (although with some
126limitations), refer to the section on :doc:`/advanced/cast/eigen`.
127
128.. seealso::
129
130    The file :file:`tests/test_buffers.cpp` contains a complete example
131    that demonstrates using the buffer protocol with pybind11 in more detail.
132
133.. [#f2] http://docs.python.org/3/c-api/buffer.html
134
135Arrays
136======
137
138By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
139restrict the function so that it only accepts NumPy arrays (rather than any
140type of Python object satisfying the buffer protocol).
141
142In many situations, we want to define a function which only accepts a NumPy
143array of a certain data type. This is possible via the ``py::array_t<T>``
144template. For instance, the following function requires the argument to be a
145NumPy array containing double precision values.
146
147.. code-block:: cpp
148
149    void f(py::array_t<double> array);
150
151When it is invoked with a different type (e.g. an integer or a list of
152integers), the binding code will attempt to cast the input into a NumPy array
153of the requested type. Note that this feature requires the
154:file:`pybind11/numpy.h` header to be included.
155
156Data in NumPy arrays is not guaranteed to packed in a dense manner;
157furthermore, entries can be separated by arbitrary column and row strides.
158Sometimes, it can be useful to require a function to only accept dense arrays
159using either the C (row-major) or Fortran (column-major) ordering. This can be
160accomplished via a second template argument with values ``py::array::c_style``
161or ``py::array::f_style``.
162
163.. code-block:: cpp
164
165    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
166
167The ``py::array::forcecast`` argument is the default value of the second
168template parameter, and it ensures that non-conforming arguments are converted
169into an array satisfying the specified requirements instead of trying the next
170function overload.
171
172Structured types
173================
174
175In order for ``py::array_t`` to work with structured (record) types, we first
176need to register the memory layout of the type. This can be done via
177``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
178expects the type followed by field names:
179
180.. code-block:: cpp
181
182    struct A {
183        int x;
184        double y;
185    };
186
187    struct B {
188        int z;
189        A a;
190    };
191
192    // ...
193    PYBIND11_MODULE(test, m) {
194        // ...
195
196        PYBIND11_NUMPY_DTYPE(A, x, y);
197        PYBIND11_NUMPY_DTYPE(B, z, a);
198        /* now both A and B can be used as template arguments to py::array_t */
199    }
200
201The structure should consist of fundamental arithmetic types, ``std::complex``,
202previously registered substructures, and arrays of any of the above. Both C++
203arrays and ``std::array`` are supported. While there is a static assertion to
204prevent many types of unsupported structures, it is still the user's
205responsibility to use only "plain" structures that can be safely manipulated as
206raw memory without violating invariants.
207
208Vectorizing functions
209=====================
210
211Suppose we want to bind a function with the following signature to Python so
212that it can process arbitrary NumPy array arguments (vectors, matrices, general
213N-D arrays) in addition to its normal arguments:
214
215.. code-block:: cpp
216
217    double my_func(int x, float y, double z);
218
219After including the ``pybind11/numpy.h`` header, this is extremely simple:
220
221.. code-block:: cpp
222
223    m.def("vectorized_func", py::vectorize(my_func));
224
225Invoking the function like below causes 4 calls to be made to ``my_func`` with
226each of the array elements. The significant advantage of this compared to
227solutions like ``numpy.vectorize()`` is that the loop over the elements runs
228entirely on the C++ side and can be crunched down into a tight, optimized loop
229by the compiler. The result is returned as a NumPy array of type
230``numpy.dtype.float64``.
231
232.. code-block:: pycon
233
234    >>> x = np.array([[1, 3],[5, 7]])
235    >>> y = np.array([[2, 4],[6, 8]])
236    >>> z = 3
237    >>> result = vectorized_func(x, y, z)
238
239The scalar argument ``z`` is transparently replicated 4 times.  The input
240arrays ``x`` and ``y`` are automatically converted into the right types (they
241are of type  ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
242``numpy.dtype.float32``, respectively).
243
244.. note::
245
246    Only arithmetic, complex, and POD types passed by value or by ``const &``
247    reference are vectorized; all other arguments are passed through as-is.
248    Functions taking rvalue reference arguments cannot be vectorized.
249
250In cases where the computation is too complicated to be reduced to
251``vectorize``, it will be necessary to create and access the buffer contents
252manually. The following snippet contains a complete example that shows how this
253works (the code is somewhat contrived, since it could have been done more
254simply using ``vectorize``).
255
256.. code-block:: cpp
257
258    #include <pybind11/pybind11.h>
259    #include <pybind11/numpy.h>
260
261    namespace py = pybind11;
262
263    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
264        py::buffer_info buf1 = input1.request(), buf2 = input2.request();
265
266        if (buf1.ndim != 1 || buf2.ndim != 1)
267            throw std::runtime_error("Number of dimensions must be one");
268
269        if (buf1.size != buf2.size)
270            throw std::runtime_error("Input shapes must match");
271
272        /* No pointer is passed, so NumPy will allocate the buffer */
273        auto result = py::array_t<double>(buf1.size);
274
275        py::buffer_info buf3 = result.request();
276
277        double *ptr1 = (double *) buf1.ptr,
278               *ptr2 = (double *) buf2.ptr,
279               *ptr3 = (double *) buf3.ptr;
280
281        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
282            ptr3[idx] = ptr1[idx] + ptr2[idx];
283
284        return result;
285    }
286
287    PYBIND11_MODULE(test, m) {
288        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
289    }
290
291.. seealso::
292
293    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
294    example that demonstrates using :func:`vectorize` in more detail.
295
296Direct access
297=============
298
299For performance reasons, particularly when dealing with very large arrays, it
300is often desirable to directly access array elements without internal checking
301of dimensions and bounds on every access when indices are known to be already
302valid.  To avoid such checks, the ``array`` class and ``array_t<T>`` template
303class offer an unchecked proxy object that can be used for this unchecked
304access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
305where ``N`` gives the required dimensionality of the array:
306
307.. code-block:: cpp
308
309    m.def("sum_3d", [](py::array_t<double> x) {
310        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
311        double sum = 0;
312        for (ssize_t i = 0; i < r.shape(0); i++)
313            for (ssize_t j = 0; j < r.shape(1); j++)
314                for (ssize_t k = 0; k < r.shape(2); k++)
315                    sum += r(i, j, k);
316        return sum;
317    });
318    m.def("increment_3d", [](py::array_t<double> x) {
319        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
320        for (ssize_t i = 0; i < r.shape(0); i++)
321            for (ssize_t j = 0; j < r.shape(1); j++)
322                for (ssize_t k = 0; k < r.shape(2); k++)
323                    r(i, j, k) += 1.0;
324    }, py::arg().noconvert());
325
326To obtain the proxy from an ``array`` object, you must specify both the data
327type and number of dimensions as template arguments, such as ``auto r =
328myarray.mutable_unchecked<float, 2>()``.
329
330If the number of dimensions is not known at compile time, you can omit the
331dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
332``arr.unchecked<T>()``.  This will give you a proxy object that works in the
333same way, but results in less optimizable code and thus a small efficiency
334loss in tight loops.
335
336Note that the returned proxy object directly references the array's data, and
337only reads its shape, strides, and writeable flag when constructed.  You must
338take care to ensure that the referenced array is not destroyed or reshaped for
339the duration of the returned object, typically by limiting the scope of the
340returned instance.
341
342The returned proxy object supports some of the same methods as ``py::array`` so
343that it can be used as a drop-in replacement for some existing, index-checked
344uses of ``py::array``:
345
346- ``r.ndim()`` returns the number of dimensions
347
348- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
349  the ``const T`` or ``T`` data, respectively, at the given indices.  The
350  latter is only available to proxies obtained via ``a.mutable_unchecked()``.
351
352- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
353
354- ``ndim()`` returns the number of dimensions.
355
356- ``shape(n)`` returns the size of dimension ``n``
357
358- ``size()`` returns the total number of elements (i.e. the product of the shapes).
359
360- ``nbytes()`` returns the number of bytes used by the referenced elements
361  (i.e. ``itemsize()`` times ``size()``).
362
363.. seealso::
364
365    The file :file:`tests/test_numpy_array.cpp` contains additional examples
366    demonstrating the use of this feature.
367
368Ellipsis
369========
370
371Python 3 provides a convenient ``...`` ellipsis notation that is often used to
372slice multidimensional arrays. For instance, the following snippet extracts the
373middle dimensions of a tensor with the first and last index set to zero.
374
375.. code-block:: python
376
377   a = # a NumPy array
378   b = a[0, ..., 0]
379
380The function ``py::ellipsis()`` function can be used to perform the same
381operation on the C++ side:
382
383.. code-block:: cpp
384
385   py::array a = /* A NumPy array */;
386   py::array b = a[py::make_tuple(0, py::ellipsis(), 0)];
387