numpy.rst revision 12037
1.. _numpy: 2 3NumPy 4##### 5 6Buffer protocol 7=============== 8 9Python supports an extremely general and convenient approach for exchanging 10data between plugin libraries. Types can expose a buffer view [#f2]_, which 11provides fast direct access to the raw internal data representation. Suppose we 12want to bind the following simplistic Matrix class: 13 14.. code-block:: cpp 15 16 class Matrix { 17 public: 18 Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) { 19 m_data = new float[rows*cols]; 20 } 21 float *data() { return m_data; } 22 size_t rows() const { return m_rows; } 23 size_t cols() const { return m_cols; } 24 private: 25 size_t m_rows, m_cols; 26 float *m_data; 27 }; 28 29The following binding code exposes the ``Matrix`` contents as a buffer object, 30making it possible to cast Matrices into NumPy arrays. It is even possible to 31completely avoid copy operations with Python expressions like 32``np.array(matrix_instance, copy = False)``. 33 34.. code-block:: cpp 35 36 py::class_<Matrix>(m, "Matrix", py::buffer_protocol()) 37 .def_buffer([](Matrix &m) -> py::buffer_info { 38 return py::buffer_info( 39 m.data(), /* Pointer to buffer */ 40 sizeof(float), /* Size of one scalar */ 41 py::format_descriptor<float>::format(), /* Python struct-style format descriptor */ 42 2, /* Number of dimensions */ 43 { m.rows(), m.cols() }, /* Buffer dimensions */ 44 { sizeof(float) * m.rows(), /* Strides (in bytes) for each index */ 45 sizeof(float) } 46 ); 47 }); 48 49Supporting the buffer protocol in a new type involves specifying the special 50``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the 51``def_buffer()`` method with a lambda function that creates a 52``py::buffer_info`` description record on demand describing a given matrix 53instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol 54specification. 55 56.. code-block:: cpp 57 58 struct buffer_info { 59 void *ptr; 60 size_t itemsize; 61 std::string format; 62 int ndim; 63 std::vector<size_t> shape; 64 std::vector<size_t> strides; 65 }; 66 67To create a C++ function that can take a Python buffer object as an argument, 68simply use the type ``py::buffer`` as one of its arguments. Buffers can exist 69in a great variety of configurations, hence some safety checks are usually 70necessary in the function body. Below, you can see an basic example on how to 71define a custom constructor for the Eigen double precision matrix 72(``Eigen::MatrixXd``) type, which supports initialization from compatible 73buffer objects (e.g. a NumPy matrix). 74 75.. code-block:: cpp 76 77 /* Bind MatrixXd (or some other Eigen type) to Python */ 78 typedef Eigen::MatrixXd Matrix; 79 80 typedef Matrix::Scalar Scalar; 81 constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit; 82 83 py::class_<Matrix>(m, "Matrix", py::buffer_protocol()) 84 .def("__init__", [](Matrix &m, py::buffer b) { 85 typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides; 86 87 /* Request a buffer descriptor from Python */ 88 py::buffer_info info = b.request(); 89 90 /* Some sanity checks ... */ 91 if (info.format != py::format_descriptor<Scalar>::format()) 92 throw std::runtime_error("Incompatible format: expected a double array!"); 93 94 if (info.ndim != 2) 95 throw std::runtime_error("Incompatible buffer dimension!"); 96 97 auto strides = Strides( 98 info.strides[rowMajor ? 0 : 1] / sizeof(Scalar), 99 info.strides[rowMajor ? 1 : 0] / sizeof(Scalar)); 100 101 auto map = Eigen::Map<Matrix, 0, Strides>( 102 static_cat<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides); 103 104 new (&m) Matrix(map); 105 }); 106 107For reference, the ``def_buffer()`` call for this Eigen data type should look 108as follows: 109 110.. code-block:: cpp 111 112 .def_buffer([](Matrix &m) -> py::buffer_info { 113 return py::buffer_info( 114 m.data(), /* Pointer to buffer */ 115 sizeof(Scalar), /* Size of one scalar */ 116 /* Python struct-style format descriptor */ 117 py::format_descriptor<Scalar>::format(), 118 /* Number of dimensions */ 119 2, 120 /* Buffer dimensions */ 121 { (size_t) m.rows(), 122 (size_t) m.cols() }, 123 /* Strides (in bytes) for each index */ 124 { sizeof(Scalar) * (rowMajor ? m.cols() : 1), 125 sizeof(Scalar) * (rowMajor ? 1 : m.rows()) } 126 ); 127 }) 128 129For a much easier approach of binding Eigen types (although with some 130limitations), refer to the section on :doc:`/advanced/cast/eigen`. 131 132.. seealso:: 133 134 The file :file:`tests/test_buffers.cpp` contains a complete example 135 that demonstrates using the buffer protocol with pybind11 in more detail. 136 137.. [#f2] http://docs.python.org/3/c-api/buffer.html 138 139Arrays 140====== 141 142By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can 143restrict the function so that it only accepts NumPy arrays (rather than any 144type of Python object satisfying the buffer protocol). 145 146In many situations, we want to define a function which only accepts a NumPy 147array of a certain data type. This is possible via the ``py::array_t<T>`` 148template. For instance, the following function requires the argument to be a 149NumPy array containing double precision values. 150 151.. code-block:: cpp 152 153 void f(py::array_t<double> array); 154 155When it is invoked with a different type (e.g. an integer or a list of 156integers), the binding code will attempt to cast the input into a NumPy array 157of the requested type. Note that this feature requires the 158:file:`pybind11/numpy.h` header to be included. 159 160Data in NumPy arrays is not guaranteed to packed in a dense manner; 161furthermore, entries can be separated by arbitrary column and row strides. 162Sometimes, it can be useful to require a function to only accept dense arrays 163using either the C (row-major) or Fortran (column-major) ordering. This can be 164accomplished via a second template argument with values ``py::array::c_style`` 165or ``py::array::f_style``. 166 167.. code-block:: cpp 168 169 void f(py::array_t<double, py::array::c_style | py::array::forcecast> array); 170 171The ``py::array::forcecast`` argument is the default value of the second 172template parameter, and it ensures that non-conforming arguments are converted 173into an array satisfying the specified requirements instead of trying the next 174function overload. 175 176Structured types 177================ 178 179In order for ``py::array_t`` to work with structured (record) types, we first 180need to register the memory layout of the type. This can be done via 181``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which 182expects the type followed by field names: 183 184.. code-block:: cpp 185 186 struct A { 187 int x; 188 double y; 189 }; 190 191 struct B { 192 int z; 193 A a; 194 }; 195 196 // ... 197 PYBIND11_PLUGIN(test) { 198 // ... 199 200 PYBIND11_NUMPY_DTYPE(A, x, y); 201 PYBIND11_NUMPY_DTYPE(B, z, a); 202 /* now both A and B can be used as template arguments to py::array_t */ 203 } 204 205Vectorizing functions 206===================== 207 208Suppose we want to bind a function with the following signature to Python so 209that it can process arbitrary NumPy array arguments (vectors, matrices, general 210N-D arrays) in addition to its normal arguments: 211 212.. code-block:: cpp 213 214 double my_func(int x, float y, double z); 215 216After including the ``pybind11/numpy.h`` header, this is extremely simple: 217 218.. code-block:: cpp 219 220 m.def("vectorized_func", py::vectorize(my_func)); 221 222Invoking the function like below causes 4 calls to be made to ``my_func`` with 223each of the array elements. The significant advantage of this compared to 224solutions like ``numpy.vectorize()`` is that the loop over the elements runs 225entirely on the C++ side and can be crunched down into a tight, optimized loop 226by the compiler. The result is returned as a NumPy array of type 227``numpy.dtype.float64``. 228 229.. code-block:: pycon 230 231 >>> x = np.array([[1, 3],[5, 7]]) 232 >>> y = np.array([[2, 4],[6, 8]]) 233 >>> z = 3 234 >>> result = vectorized_func(x, y, z) 235 236The scalar argument ``z`` is transparently replicated 4 times. The input 237arrays ``x`` and ``y`` are automatically converted into the right types (they 238are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and 239``numpy.dtype.float32``, respectively) 240 241Sometimes we might want to explicitly exclude an argument from the vectorization 242because it makes little sense to wrap it in a NumPy array. For instance, 243suppose the function signature was 244 245.. code-block:: cpp 246 247 double my_func(int x, float y, my_custom_type *z); 248 249This can be done with a stateful Lambda closure: 250 251.. code-block:: cpp 252 253 // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization) 254 m.def("vectorized_func", 255 [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) { 256 auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); }; 257 return py::vectorize(stateful_closure)(x, y); 258 } 259 ); 260 261In cases where the computation is too complicated to be reduced to 262``vectorize``, it will be necessary to create and access the buffer contents 263manually. The following snippet contains a complete example that shows how this 264works (the code is somewhat contrived, since it could have been done more 265simply using ``vectorize``). 266 267.. code-block:: cpp 268 269 #include <pybind11/pybind11.h> 270 #include <pybind11/numpy.h> 271 272 namespace py = pybind11; 273 274 py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) { 275 auto buf1 = input1.request(), buf2 = input2.request(); 276 277 if (buf1.ndim != 1 || buf2.ndim != 1) 278 throw std::runtime_error("Number of dimensions must be one"); 279 280 if (buf1.size != buf2.size) 281 throw std::runtime_error("Input shapes must match"); 282 283 /* No pointer is passed, so NumPy will allocate the buffer */ 284 auto result = py::array_t<double>(buf1.size); 285 286 auto buf3 = result.request(); 287 288 double *ptr1 = (double *) buf1.ptr, 289 *ptr2 = (double *) buf2.ptr, 290 *ptr3 = (double *) buf3.ptr; 291 292 for (size_t idx = 0; idx < buf1.shape[0]; idx++) 293 ptr3[idx] = ptr1[idx] + ptr2[idx]; 294 295 return result; 296 } 297 298 PYBIND11_PLUGIN(test) { 299 py::module m("test"); 300 m.def("add_arrays", &add_arrays, "Add two NumPy arrays"); 301 return m.ptr(); 302 } 303 304.. seealso:: 305 306 The file :file:`tests/test_numpy_vectorize.cpp` contains a complete 307 example that demonstrates using :func:`vectorize` in more detail. 308 309Direct access 310============= 311 312For performance reasons, particularly when dealing with very large arrays, it 313is often desirable to directly access array elements without internal checking 314of dimensions and bounds on every access when indices are known to be already 315valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template 316class offer an unchecked proxy object that can be used for this unchecked 317access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods, 318where ``N`` gives the required dimensionality of the array: 319 320.. code-block:: cpp 321 322 m.def("sum_3d", [](py::array_t<double> x) { 323 auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable 324 double sum = 0; 325 for (size_t i = 0; i < r.shape(0); i++) 326 for (size_t j = 0; j < r.shape(1); j++) 327 for (size_t k = 0; k < r.shape(2); k++) 328 sum += r(i, j, k); 329 return sum; 330 }); 331 m.def("increment_3d", [](py::array_t<double> x) { 332 auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false 333 for (size_t i = 0; i < r.shape(0); i++) 334 for (size_t j = 0; j < r.shape(1); j++) 335 for (size_t k = 0; k < r.shape(2); k++) 336 r(i, j, k) += 1.0; 337 }, py::arg().noconvert()); 338 339To obtain the proxy from an ``array`` object, you must specify both the data 340type and number of dimensions as template arguments, such as ``auto r = 341myarray.mutable_unchecked<float, 2>()``. 342 343If the number of dimensions is not known at compile time, you can omit the 344dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or 345``arr.unchecked<T>()``. This will give you a proxy object that works in the 346same way, but results in less optimizable code and thus a small efficiency 347loss in tight loops. 348 349Note that the returned proxy object directly references the array's data, and 350only reads its shape, strides, and writeable flag when constructed. You must 351take care to ensure that the referenced array is not destroyed or reshaped for 352the duration of the returned object, typically by limiting the scope of the 353returned instance. 354 355The returned proxy object supports some of the same methods as ``py::array`` so 356that it can be used as a drop-in replacement for some existing, index-checked 357uses of ``py::array``: 358 359- ``r.ndim()`` returns the number of dimensions 360 361- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to 362 the ``const T`` or ``T`` data, respectively, at the given indices. The 363 latter is only available to proxies obtained via ``a.mutable_unchecked()``. 364 365- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``. 366 367- ``ndim()`` returns the number of dimensions. 368 369- ``shape(n)`` returns the size of dimension ``n`` 370 371- ``size()`` returns the total number of elements (i.e. the product of the shapes). 372 373- ``nbytes()`` returns the number of bytes used by the referenced elements 374 (i.e. ``itemsize()`` times ``size()``). 375 376.. seealso:: 377 378 The file :file:`tests/test_numpy_array.cpp` contains additional examples 379 demonstrating the use of this feature. 380