.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix")
       .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
                  sizeof(float) }
            );
        });
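
On the Python side, the zero-copy cast can then be exercised as follows. This
is a minimal sketch: it assumes the binding above is compiled into a module
named ``example`` and that a ``.def(py::init<size_t, size_t>())`` constructor
binding was also added so that a ``Matrix`` can be created from Python.

.. code-block:: pycon

    >>> import numpy as np
    >>> import example
    >>> m = example.Matrix(3, 4)
    >>> a = np.array(m, copy=False)  # shares memory with the C++ Matrix
    >>> a.shape
    (3, 4)
    >>> a.dtype
    dtype('float32')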

The snippet above binds a lambda function that creates a ``py::buffer_info``
description record on demand for a given matrix. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        size_t itemsize;
        std::string format;
        int ndim;
        std::vector<size_t> shape;
        std::vector<size_t> strides;
    };

To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix")
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            new (&m) Matrix(map);
        });
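
With this constructor in place, a compatible NumPy array can be passed to the
bound type directly. A minimal sketch, assuming the binding above is compiled
into a module named ``example``:

.. code-block:: pycon

    >>> import numpy as np
    >>> import example
    >>> m = example.Matrix(np.array([[1.0, 2.0], [3.0, 4.0]]))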

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                /* Pointer to buffer */
            sizeof(Scalar),          /* Size of one scalar */
            /* Python struct-style format descriptor */
            py::format_descriptor<Scalar>::format(),
            /* Number of dimensions */
            2,
            /* Buffer dimensions */
            { (size_t) m.rows(),
              (size_t) m.cols() },
            /* Strides (in bytes) for each index */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
        );
    })

For a much easier approach of binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).
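
For instance, the signature then becomes (the function name is illustrative):

.. code-block:: cpp

    void f(py::array array);  /* accepts any NumPy array, regardless of dtype */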

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.
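
For instance (a minimal sketch, assuming the function above is bound as
``example.f``), the implicit conversion makes the following calls work:

.. code-block:: pycon

    >>> import numpy as np
    >>> import example
    >>> example.f(np.array([1.0, 2.0]))  # already a float64 array
    >>> example.f([1, 2, 3])             # a list of ints is converted first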

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.
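
Conversely, omitting ``forcecast`` is what allows overload resolution to move
on to the next signature when the dtype does not match. A minimal sketch of
this pattern (the function name is illustrative; verify the dispatch behavior
against the pybind11 version in use):

.. code-block:: cpp

    /* Without forcecast, a float64 array selects the first overload, while an
       int32 array falls through to the second instead of being converted. */
    m.def("process", [](py::array_t<double, py::array::c_style> a) { /* ... */ });
    m.def("process", [](py::array_t<int, py::array::c_style> a) { /* ... */ });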

Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first need
to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, which expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    PYBIND11_NUMPY_DTYPE(A, x, y);
    PYBIND11_NUMPY_DTYPE(B, z, a);

    /* now both A and B can be used as template arguments to py::array_t */
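
Once registered, arrays of these records can be processed just like arrays of
plain scalars. A minimal sketch (the function name is illustrative):

.. code-block:: cpp

    m.def("sum_x", [](py::array_t<A> arr) {
        py::buffer_info buf = arr.request();
        if (buf.ndim != 1 || buf.strides[0] != sizeof(A))
            throw std::runtime_error("Expected a dense 1-D array of A records");
        auto ptr = static_cast<A *>(buf.ptr);
        int total = 0;
        for (size_t i = 0; i < buf.shape[0]; i++)
            total += ptr[i].x;  /* read the 'x' field of each record */
        return total;
    });

On the Python side, a compatible array can be constructed with a matching
structured dtype (the field names correspond to the names passed to the macro).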

Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below causes 4 calls to be made to ``my_func``,
one for each pair of array elements. The significant advantage of this compared
to solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3], [5, 7]])
    >>> y = np.array([[2, 4], [6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.int64``, but need to be ``numpy.int32`` and
``numpy.float32``, respectively).

Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was

.. code-block:: cpp

    double my_func(int x, float y, my_custom_type *z);

This can be done with a stateful lambda closure:

.. code-block:: cpp

    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
    m.def("vectorized_func",
        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
            return py::vectorize(stateful_closure)(x, y);
        }
    );

In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;

        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_PLUGIN(test) {
        py::module m("test");
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
        return m.ptr();
    }
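
Once compiled, the extension can be exercised as follows (a minimal sketch; the
exact formatting of the printed array depends on the NumPy version):

.. code-block:: pycon

    >>> import numpy as np
    >>> import test
    >>> test.add_arrays(np.array([1.0, 2.0]), np.array([3.0, 4.0]))
    array([ 4.,  6.])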

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.
300