.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix")
       .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
                  sizeof(float) }
            );
        });

The snippet above binds a lambda function, which can create ``py::buffer_info``
description records on demand describing a given matrix. The data is stored in
row-major order, so stepping to the next row skips ``m.cols()`` elements. The
contents of ``py::buffer_info`` mirror the Python buffer protocol specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        size_t itemsize;
        std::string format;
        int ndim;
        std::vector<size_t> shape;
        std::vector<size_t> strides;
    };

To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body.
Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix")
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            new (&m) Matrix(map);
        });

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                /* Pointer to buffer */
            sizeof(Scalar),          /* Size of one scalar */
            /* Python struct-style format descriptor */
            py::format_descriptor<Scalar>::format(),
            /* Number of dimensions */
            2,
            /* Buffer dimensions */
            { (size_t) m.rows(),
              (size_t) m.cols() },
            /* Strides (in bytes) for each index */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
        );
     })

For a much easier approach of binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first need
to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, which expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    PYBIND11_NUMPY_DTYPE(A, x, y);
    PYBIND11_NUMPY_DTYPE(B, z, a);

    /* now both A and B can be used as template arguments to py::array_t */

Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function like below causes 4 calls to be made to ``my_func`` with
each of the array elements. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler.
The result is returned as a NumPy array of type
``numpy.dtype.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3],[5, 7]])
    >>> y = np.array([[2, 4],[6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
``numpy.dtype.float32``, respectively).

Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was

.. code-block:: cpp

    double my_func(int x, float y, my_custom_type *z);

This can be done with a stateful lambda closure:

.. code-block:: cpp

    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
    m.def("vectorized_func",
        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
            return py::vectorize(stateful_closure)(x, y);
        }
    );

In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;

        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_PLUGIN(test) {
        py::module m("test");
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
        return m.ptr();
    }

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.
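
The ``add_arrays`` example in this section assumes one-dimensional input and
indexes the raw pointers directly. For buffers with more than one dimension,
element addresses follow from the byte strides described in the buffer protocol
section. The following is a minimal sketch of that index arithmetic in plain
C++, independent of pybind11. The names ``View2D``, ``dense_strides``, ``at``
and ``sum`` are hypothetical and introduced purely for illustration; they are
not part of the pybind11 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the ptr/shape/strides fields of a buffer
// description; this is NOT a pybind11 type.
struct View2D {
    const unsigned char *ptr;          // start of the raw data
    std::vector<std::size_t> shape;    // { rows, cols }
    std::vector<std::size_t> strides;  // bytes to step per index, one per dimension
};

// Byte strides of a densely packed rows x cols array of doubles,
// in either C (row-major) or Fortran (column-major) ordering.
inline std::vector<std::size_t> dense_strides(std::size_t rows, std::size_t cols,
                                              bool row_major) {
    return row_major
        ? std::vector<std::size_t>{ sizeof(double) * cols, sizeof(double) }
        : std::vector<std::size_t>{ sizeof(double), sizeof(double) * rows };
}

// Read element (i, j) by stepping through the data in bytes.
inline double at(const View2D &v, std::size_t i, std::size_t j) {
    assert(v.shape.size() == 2 && v.strides.size() == 2);
    assert(i < v.shape[0] && j < v.shape[1]);
    double value;
    std::memcpy(&value, v.ptr + i * v.strides[0] + j * v.strides[1], sizeof(double));
    return value;
}

// Sum all elements; the result does not depend on the memory ordering.
inline double sum(const View2D &v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.shape[0]; ++i)
        for (std::size_t j = 0; j < v.shape[1]; ++j)
            total += at(v, i, j);
    return total;
}
```

For a 2x3 array of doubles, ``dense_strides(2, 3, true)`` yields ``{24, 8}``
and ``dense_strides(2, 3, false)`` yields ``{8, 16}`` on platforms where
``sizeof(double)`` is 8. As long as the data were stored in the matching order,
``at`` returns the same logical element for both layouts.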