numpy.rst revision 12037:d28054ac6ec9
1.. _numpy:
2
3NumPy
4#####
5
6Buffer protocol
7===============
8
9Python supports an extremely general and convenient approach for exchanging
10data between plugin libraries. Types can expose a buffer view [#f2]_, which
11provides fast direct access to the raw internal data representation. Suppose we
12want to bind the following simplistic Matrix class:
13
14.. code-block:: cpp
15
16    class Matrix {
17    public:
18        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
19            m_data = new float[rows*cols];
20        }
21        float *data() { return m_data; }
22        size_t rows() const { return m_rows; }
23        size_t cols() const { return m_cols; }
24    private:
25        size_t m_rows, m_cols;
26        float *m_data;
27    };
28
29The following binding code exposes the ``Matrix`` contents as a buffer object,
30making it possible to cast Matrices into NumPy arrays. It is even possible to
31completely avoid copy operations with Python expressions like
32``np.array(matrix_instance, copy = False)``.
33
34.. code-block:: cpp
35
36    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
37       .def_buffer([](Matrix &m) -> py::buffer_info {
38            return py::buffer_info(
39                m.data(),                               /* Pointer to buffer */
40                sizeof(float),                          /* Size of one scalar */
41                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
42                2,                                      /* Number of dimensions */
43                { m.rows(), m.cols() },                 /* Buffer dimensions */
44                { sizeof(float) * m.rows(),             /* Strides (in bytes) for each index */
45                  sizeof(float) }
46            );
47        });
48
49Supporting the buffer protocol in a new type involves specifying the special
50``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
51``def_buffer()`` method with a lambda function that creates a
52``py::buffer_info`` description record on demand describing a given matrix
53instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
54specification.
55
56.. code-block:: cpp
57
58    struct buffer_info {
59        void *ptr;
60        size_t itemsize;
61        std::string format;
62        int ndim;
63        std::vector<size_t> shape;
64        std::vector<size_t> strides;
65    };
66
67To create a C++ function that can take a Python buffer object as an argument,
68simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
69in a great variety of configurations, hence some safety checks are usually
70necessary in the function body. Below, you can see an basic example on how to
71define a custom constructor for the Eigen double precision matrix
72(``Eigen::MatrixXd``) type, which supports initialization from compatible
73buffer objects (e.g. a NumPy matrix).
74
75.. code-block:: cpp
76
77    /* Bind MatrixXd (or some other Eigen type) to Python */
78    typedef Eigen::MatrixXd Matrix;
79
80    typedef Matrix::Scalar Scalar;
81    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
82
83    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
84        .def("__init__", [](Matrix &m, py::buffer b) {
85            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
86
87            /* Request a buffer descriptor from Python */
88            py::buffer_info info = b.request();
89
90            /* Some sanity checks ... */
91            if (info.format != py::format_descriptor<Scalar>::format())
92                throw std::runtime_error("Incompatible format: expected a double array!");
93
94            if (info.ndim != 2)
95                throw std::runtime_error("Incompatible buffer dimension!");
96
97            auto strides = Strides(
98                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
99                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));
100
101            auto map = Eigen::Map<Matrix, 0, Strides>(
102                static_cat<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
103
104            new (&m) Matrix(map);
105        });
106
107For reference, the ``def_buffer()`` call for this Eigen data type should look
108as follows:
109
110.. code-block:: cpp
111
112    .def_buffer([](Matrix &m) -> py::buffer_info {
113        return py::buffer_info(
114            m.data(),                /* Pointer to buffer */
115            sizeof(Scalar),          /* Size of one scalar */
116            /* Python struct-style format descriptor */
117            py::format_descriptor<Scalar>::format(),
118            /* Number of dimensions */
119            2,
120            /* Buffer dimensions */
121            { (size_t) m.rows(),
122              (size_t) m.cols() },
123            /* Strides (in bytes) for each index */
124            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
125              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
126        );
127     })
128
129For a much easier approach of binding Eigen types (although with some
130limitations), refer to the section on :doc:`/advanced/cast/eigen`.
131
132.. seealso::
133
134    The file :file:`tests/test_buffers.cpp` contains a complete example
135    that demonstrates using the buffer protocol with pybind11 in more detail.
136
137.. [#f2] http://docs.python.org/3/c-api/buffer.html
138
139Arrays
140======
141
142By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
143restrict the function so that it only accepts NumPy arrays (rather than any
144type of Python object satisfying the buffer protocol).
145
146In many situations, we want to define a function which only accepts a NumPy
147array of a certain data type. This is possible via the ``py::array_t<T>``
148template. For instance, the following function requires the argument to be a
149NumPy array containing double precision values.
150
151.. code-block:: cpp
152
153    void f(py::array_t<double> array);
154
155When it is invoked with a different type (e.g. an integer or a list of
156integers), the binding code will attempt to cast the input into a NumPy array
157of the requested type. Note that this feature requires the
158:file:`pybind11/numpy.h` header to be included.
159
160Data in NumPy arrays is not guaranteed to packed in a dense manner;
161furthermore, entries can be separated by arbitrary column and row strides.
162Sometimes, it can be useful to require a function to only accept dense arrays
163using either the C (row-major) or Fortran (column-major) ordering. This can be
164accomplished via a second template argument with values ``py::array::c_style``
165or ``py::array::f_style``.
166
167.. code-block:: cpp
168
169    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
170
171The ``py::array::forcecast`` argument is the default value of the second
172template parameter, and it ensures that non-conforming arguments are converted
173into an array satisfying the specified requirements instead of trying the next
174function overload.
175
176Structured types
177================
178
179In order for ``py::array_t`` to work with structured (record) types, we first
180need to register the memory layout of the type. This can be done via
181``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
182expects the type followed by field names:
183
184.. code-block:: cpp
185
186    struct A {
187        int x;
188        double y;
189    };
190
191    struct B {
192        int z;
193        A a;
194    };
195
196    // ...
197    PYBIND11_PLUGIN(test) {
198        // ...
199
200        PYBIND11_NUMPY_DTYPE(A, x, y);
201        PYBIND11_NUMPY_DTYPE(B, z, a);
202        /* now both A and B can be used as template arguments to py::array_t */
203    }
204
205Vectorizing functions
206=====================
207
208Suppose we want to bind a function with the following signature to Python so
209that it can process arbitrary NumPy array arguments (vectors, matrices, general
210N-D arrays) in addition to its normal arguments:
211
212.. code-block:: cpp
213
214    double my_func(int x, float y, double z);
215
216After including the ``pybind11/numpy.h`` header, this is extremely simple:
217
218.. code-block:: cpp
219
220    m.def("vectorized_func", py::vectorize(my_func));
221
222Invoking the function like below causes 4 calls to be made to ``my_func`` with
223each of the array elements. The significant advantage of this compared to
224solutions like ``numpy.vectorize()`` is that the loop over the elements runs
225entirely on the C++ side and can be crunched down into a tight, optimized loop
226by the compiler. The result is returned as a NumPy array of type
227``numpy.dtype.float64``.
228
229.. code-block:: pycon
230
231    >>> x = np.array([[1, 3],[5, 7]])
232    >>> y = np.array([[2, 4],[6, 8]])
233    >>> z = 3
234    >>> result = vectorized_func(x, y, z)
235
236The scalar argument ``z`` is transparently replicated 4 times.  The input
237arrays ``x`` and ``y`` are automatically converted into the right types (they
238are of type  ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
239``numpy.dtype.float32``, respectively)
240
241Sometimes we might want to explicitly exclude an argument from the vectorization
242because it makes little sense to wrap it in a NumPy array. For instance,
243suppose the function signature was
244
245.. code-block:: cpp
246
247    double my_func(int x, float y, my_custom_type *z);
248
249This can be done with a stateful Lambda closure:
250
251.. code-block:: cpp
252
253    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
254    m.def("vectorized_func",
255        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
256            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
257            return py::vectorize(stateful_closure)(x, y);
258        }
259    );
260
261In cases where the computation is too complicated to be reduced to
262``vectorize``, it will be necessary to create and access the buffer contents
263manually. The following snippet contains a complete example that shows how this
264works (the code is somewhat contrived, since it could have been done more
265simply using ``vectorize``).
266
267.. code-block:: cpp
268
269    #include <pybind11/pybind11.h>
270    #include <pybind11/numpy.h>
271
272    namespace py = pybind11;
273
274    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
275        auto buf1 = input1.request(), buf2 = input2.request();
276
277        if (buf1.ndim != 1 || buf2.ndim != 1)
278            throw std::runtime_error("Number of dimensions must be one");
279
280        if (buf1.size != buf2.size)
281            throw std::runtime_error("Input shapes must match");
282
283        /* No pointer is passed, so NumPy will allocate the buffer */
284        auto result = py::array_t<double>(buf1.size);
285
286        auto buf3 = result.request();
287
288        double *ptr1 = (double *) buf1.ptr,
289               *ptr2 = (double *) buf2.ptr,
290               *ptr3 = (double *) buf3.ptr;
291
292        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
293            ptr3[idx] = ptr1[idx] + ptr2[idx];
294
295        return result;
296    }
297
298    PYBIND11_PLUGIN(test) {
299        py::module m("test");
300        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
301        return m.ptr();
302    }
303
304.. seealso::
305
306    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
307    example that demonstrates using :func:`vectorize` in more detail.
308
309Direct access
310=============
311
312For performance reasons, particularly when dealing with very large arrays, it
313is often desirable to directly access array elements without internal checking
314of dimensions and bounds on every access when indices are known to be already
315valid.  To avoid such checks, the ``array`` class and ``array_t<T>`` template
316class offer an unchecked proxy object that can be used for this unchecked
317access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
318where ``N`` gives the required dimensionality of the array:
319
320.. code-block:: cpp
321
322    m.def("sum_3d", [](py::array_t<double> x) {
323        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
324        double sum = 0;
325        for (size_t i = 0; i < r.shape(0); i++)
326            for (size_t j = 0; j < r.shape(1); j++)
327                for (size_t k = 0; k < r.shape(2); k++)
328                    sum += r(i, j, k);
329        return sum;
330    });
331    m.def("increment_3d", [](py::array_t<double> x) {
332        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
333        for (size_t i = 0; i < r.shape(0); i++)
334            for (size_t j = 0; j < r.shape(1); j++)
335                for (size_t k = 0; k < r.shape(2); k++)
336                    r(i, j, k) += 1.0;
337    }, py::arg().noconvert());
338
339To obtain the proxy from an ``array`` object, you must specify both the data
340type and number of dimensions as template arguments, such as ``auto r =
341myarray.mutable_unchecked<float, 2>()``.
342
343If the number of dimensions is not known at compile time, you can omit the
344dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
345``arr.unchecked<T>()``.  This will give you a proxy object that works in the
346same way, but results in less optimizable code and thus a small efficiency
347loss in tight loops.
348
349Note that the returned proxy object directly references the array's data, and
350only reads its shape, strides, and writeable flag when constructed.  You must
351take care to ensure that the referenced array is not destroyed or reshaped for
352the duration of the returned object, typically by limiting the scope of the
353returned instance.
354
355The returned proxy object supports some of the same methods as ``py::array`` so
356that it can be used as a drop-in replacement for some existing, index-checked
357uses of ``py::array``:
358
359- ``r.ndim()`` returns the number of dimensions
360
361- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
362  the ``const T`` or ``T`` data, respectively, at the given indices.  The
363  latter is only available to proxies obtained via ``a.mutable_unchecked()``.
364
365- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
366
367- ``ndim()`` returns the number of dimensions.
368
369- ``shape(n)`` returns the size of dimension ``n``
370
371- ``size()`` returns the total number of elements (i.e. the product of the shapes).
372
373- ``nbytes()`` returns the number of bytes used by the referenced elements
374  (i.e. ``itemsize()`` times ``size()``).
375
376.. seealso::
377
378    The file :file:`tests/test_numpy_array.cpp` contains additional examples
379    demonstrating the use of this feature.
380