
CS602 – Data-Driven Development with – Fall’19

Handout 9

NumPy package

The NumPy package (https://docs.scipy.org/doc/numpy/) provides efficient operations on arrays of numerical data. Many other packages, including those for scientific computation, build on NumPy.

import numpy

import numpy as np

The package defines the following data types:

integer    int8, int16, int32, int64, uint8, . . .

float      float16, float32, float64, . . .

complex    complex64, complex128, . . .

boolean    bool_ (also available as bool8 in NumPy versions before 2.0)
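As a brief illustration of what the numeric suffixes in these type names mean, the sketch below inspects a few of the listed types. The suffix is the number of bits per element. The particular types and values chosen here are only examples.

```python
import numpy as np

# The suffix in each type name is the number of bits used per element.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)         # int8, 1 byte (8 bits) per element

# np.iinfo reports the range of values an integer type can hold.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)        # -128 127
uint8_info = np.iinfo(np.uint8)
print(uint8_info.min, uint8_info.max)      # 0 255

# Floating point and complex types follow the same naming rule.
float32_bytes = np.dtype(np.float32).itemsize        # 4 bytes = 32 bits
complex128_bytes = np.dtype(np.complex128).itemsize  # 16 bytes = 128 bits
print(float32_bytes, complex128_bytes)
```

Smaller types save memory but hold a narrower range of values, so the choice depends on the data.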

NDARRAY

An ndarray (a.k.a. NumPy array) is a multidimensional, homogeneous array of items: every element has the same type.

import numpy as np

>>> a = np.array([10, 20, 30, 40, 50])

>>> type(a)

numpy.ndarray

>>> a

array([10, 20, 30, 40, 50])

>>> a*2

array([ 20, 40, 60, 80, 100])

>>> a

array([10, 20, 30, 40, 50])

>>> a = a*3

>>> a

array([ 30, 60, 90, 120, 150])

>>> b = np.array(range(1, 6))

>>> b

array([1, 2, 3, 4, 5])

>>> c = a + b

>>> c

array([ 31, 62, 93, 124, 155])

>>> e = np.array([(1.5,2,3), (4,5,6)])

>>> e

array([[1.5, 2. , 3. ],

[4. , 5. , 6. ]])

>>> e[1,1]

5.0

>>> e[1][1]

5.0

>>> type(e)

numpy.ndarray

NDARRAY ATTRIBUTES

The more important attributes of an ndarray object are:

ndarray.ndim - the number of axes (dimensions) of the array.

ndarray.shape - the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

ndarray.size - the total number of elements of the array. This is equal to the product of the elements of shape.

ndarray.dtype - an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own.

>>> e

array([[1.5, 2. , 3. ],

[4. , 5. , 6. ]])

>>> e.ndim

2 # two dimensions

>>> e.shape

(2, 3) # 2 rows, 3 columns

>>> e.dtype

dtype('float64') # type of each element

>>> e.size

6 # number of elements = 2*3
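To illustrate the remark that a dtype can be specified with standard Python types or with NumPy's own types, here is a minimal sketch. The arrays are made up for the example, apart from e, which repeats the array used above.

```python
import numpy as np

# A standard Python type can be passed as dtype when creating an array.
f = np.array([1, 2, 3], dtype=float)       # Python float maps to float64
print(f.dtype)                             # float64

c = np.array([1, 2], dtype=complex)        # Python complex maps to complex128
print(c.dtype)                             # complex128

# NumPy's own types give finer control over the element size.
g = np.array([1, 2, 3], dtype=np.float32)
print(g.dtype, g.itemsize)                 # float32, 4 bytes per element

# astype returns a converted copy; float -> int conversion drops the fraction.
e = np.array([(1.5, 2, 3), (4, 5, 6)])
e_int = e.astype(int)
print(e_int)

# The attributes are related: len(shape) == ndim, product of shape == size.
print(e.ndim, e.shape, e.size)             # 2 (2, 3) 6
```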

CREATE NDARRAY

There are a number of ways to create and initialize new NumPy arrays, for example:

– from a Python list or tuple

– using functions dedicated to generating NumPy arrays, such as arange, linspace, random, randint, etc.

– reading data from files

** Note - these descriptions are not complete

All functions below return an ndarray

numpy.loadtxt(filepath)

Returns an ndarray of the values read from a text file. Each line of the file becomes a row.

numpy.linspace(start, stop, num=50)

Returns an ndarray of num evenly spaced samples, calculated over the interval [start, stop].

numpy.arange([start,] stop[, step])

Returns an ndarray of evenly spaced values from the half-open interval [start, stop). For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point rounding error, this rule may result in the last element being greater than stop.

numpy.full(shape, value)

Returns an ndarray of the given shape, filled with value.

numpy.zeros(shape, dtype=float)

Returns an ndarray of zeros with the given shape and dtype.

numpy.ones(shape, dtype=float)

Returns an ndarray of ones with the given shape and dtype.

numpy.zeros_like(a, dtype=None)
numpy.ones_like(a, dtype=None)

Return an ndarray of zeros/ones with the same shape and type as the given array a.

numpy.random.randint(low, high=None, size=None)

Returns a size-shaped ndarray of random integers drawn uniformly from [low, high), or a single random int if size is not provided. If high is None, the integers are drawn from [0, low). size is an optional int or tuple of ints defining the shape of the ndarray.

numpy.random.choice(a, size=None, p=None)

Returns a size-shaped ndarray of elements picked at random from a. size is an optional int or tuple of ints defining the shape of the ndarray. p gives the probability associated with each element of a; the probabilities must add up to 1. If p is omitted, all elements are equally likely.

numpy.reshape(a, shape)

Returns an ndarray with the same data as a, but with dimensions given by the specified shape.
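The deterministic functions listed above can be tried out with a short sketch like the following. All arguments are arbitrary values picked for illustration. The floating point arange call shows the rounding caveat on typical IEEE-754 hardware.

```python
import numpy as np

# linspace: num evenly spaced samples over the closed interval [start, stop].
ls = np.linspace(0, 1, num=5)              # 0, 0.25, 0.5, 0.75, 1
print(ls)

# arange: like range(), values from the half-open interval [start, stop).
ar = np.arange(2, 11, 3)                   # 2, 5, 8
print(ar)

# With a floating point step the length is ceil((stop - start)/step).
# Here (1.3 - 1)/0.1 evaluates to slightly more than 3, so the result
# has 4 elements and the last one is about 1.3, i.e. not below stop.
fl = np.arange(1, 1.3, 0.1)
print(len(fl), fl)

# full / zeros / ones take the shape as an int or a tuple of ints.
fu = np.full((2, 3), 7)
ze = np.zeros((2, 2))                      # dtype defaults to float
on = np.ones(4, dtype=int)

# zeros_like copies the shape and dtype of an existing array.
zl = np.zeros_like(fu)

# reshape keeps the data but changes the dimensions; total size must match.
rs = np.reshape(np.arange(6), (2, 3))
print(rs)
```

When the number of points matters more than the step, linspace is usually the safer choice for floating point ranges, since its length is given explicitly.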

>>> np.random.seed() # to start off the randomizer

>>> h = np.random.randint(100, 120, 5)

>>> h

array([106, 113, 116, 113, 106])

>>> k = np.random.randint(100, 120, size = (3, 6))

>>> k

array([[101, 117, 112, 112, 112, 101],
       [119, 103, 107, 101, 100, 109],
       [106, 107, 112, 114, 115, 102]])

>>> m = np.random.choice(['red', 'blue', 'green' ], size = [2,3])

>>> m

array([['red', 'green', 'red'],

['red', 'blue', 'blue']], dtype='<U5')

>>> n = np.random.choice(['red', 'blue', 'green'], size = [4,7], p = (.1, .6, .3))

>>> n

array([['green', 'green', 'green', 'blue', 'blue', 'green', 'blue'],

['red', 'blue', 'blue', 'green', 'blue', 'blue', 'blue'],

['blue', 'blue', 'blue', 'blue', 'red', 'blue', 'green'],

['blue', 'blue', 'blue', 'green', 'blue', 'blue', 'green']], dtype='<U5')

>>> m = np.loadtxt('midtermData1.txt')
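The file midtermData1.txt is not reproduced in this handout, so as a stand-in the sketch below passes np.loadtxt a small in-memory text through io.StringIO, which loadtxt accepts in place of a file path. The numbers are made up for illustration.

```python
import io
import numpy as np

# Made-up data standing in for a text file: one row per line,
# values separated by whitespace.
text = io.StringIO("1 2 3\n4 5 6\n")

scores = np.loadtxt(text)      # each line becomes a row; dtype defaults to float
print(scores)
print(scores.shape)            # (2, 3)
```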

Practice problems:

Generate an ndarray to store:

· a 30 x 5 matrix of 0s

· a vector of 17 random integers in the 18-95 range

· a 10x10 matrix filled with about 50% of zeros and 50% of ones

· a 20x3 matrix of even values starting from 66 and up

SLICING

Indexing and slicing use the usual slicing syntax. Differences from lists:

- slices for the various dimensions separated by comma, e.g. x[0:2, 0:3]

- the result is not a copy of the portion of the original data, but an ndarray view on the portion of the data

- changing the content of the view changes the original ndarray

>>> lst = [[row*10+col for col in range(4)] for row in range(5)]

>>> x = np.array(lst)

>>> x

array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])

>>> x[0:2, 0:3] # rows 0, 1 and cols 0,1,2

array([[ 0, 1, 2],

[10, 11, 12]])

>>> x[:, 3] # col 3

array([ 3, 13, 23, 33, 43])

>>> x[::2, 0] # every other row (0, 2, 4), col 0

array([ 0, 20, 40])
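The point that a slice is a view rather than a copy can be checked with a small sketch like the one below. It rebuilds the same array x and contrasts it with a plain Python list. The names view, part and independent are introduced here only for illustration.

```python
import numpy as np

x = np.array([[row*10 + col for col in range(4)] for row in range(5)])

# A slice of an ndarray is a view: it shares its data with x.
view = x[0:2, 0:3]
view[0, 0] = 99
print(x[0, 0])                 # 99, the original array changed too

# A slice of a Python list is a copy: the original list is unaffected.
lst = [0, 1, 2, 3]
part = lst[0:2]
part[0] = 99
print(lst[0])                  # 0

# Use copy() when an independent ndarray is needed.
independent = x[0:2, 0:3].copy()
independent[0, 1] = -1
print(x[0, 1])                 # 1, unchanged
```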

Practice problems:

1. what will be printed by the following code?

lst = [[row*10+col for col in range(4)] for row in range(5)]

x = np.array(lst)

y = np.arange(21).reshape(3,7)

print('x = \n', x, '\n y = \n', y)

z = np.full((2,2), 8)

x[:2, :2] = z

y[1:3, -2:] = z

print('z= \n', z, '\n x = \n', x, '\n y = \n',y)

2. Given array x defined in the example above, what is the result of evaluating each of the following expressions:

1. x[3,3]

2. x[1,:]

3. x[:, 3]

4. x[1:3, :]

5. x[::3, 2:]

3. Generate the matrix shown below and write down the expression to generate the highlighted parts:

VECTORIZATION AND UFUNC

Vectorized operations – operations that are applied to each element of the array, no loops required.

>>> a = np.array([10, 20, 30, 40, 50])

>>> b = np.array(range(1, 6))

>>> a

array([10, 20, 30, 40, 50])

>>> b

array([1, 2, 3, 4, 5])

>>> 2*a + 3

array([ 23, 43, 63, 83, 103])

>>> c = a - 2*b

>>> c

array([ 8, 16, 24, 32, 40])

>>> c % 3

array([2, 1, 0, 2, 1], dtype=int32)
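To make the "no loops required" point concrete, the sketch below computes a - 2*b three ways: with an explicit loop, with the vectorized expression, and with the ufunc calls that the arithmetic operators stand for. The variable names looped, vectorized, same and roots are introduced here only for illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
b = np.array(range(1, 6))

# Element-by-element loop, as one would write it for plain lists.
looped = np.zeros(len(a), dtype=a.dtype)
for i in range(len(a)):
    looped[i] = a[i] - 2 * b[i]

# Vectorized form: one expression, no explicit loop.
vectorized = a - 2 * b
print(np.array_equal(looped, vectorized))    # True

# Arithmetic operators are shorthand for universal functions (ufuncs).
same = np.subtract(a, np.multiply(2, b))
print(same)                                  # [ 8 16 24 32 40]

# Other ufuncs, such as np.sqrt, also act on every element.
roots = np.sqrt(np.array([1, 4, 9]))
print(roots)                                 # [1. 2. 3.]
```

On large arrays the vectorized form is typically much faster than the loop, because the iteration runs inside NumPy's compiled code.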

>>> matrix = np.random.rand(3,5) *100+50 # random floats in [50, 150)

>>> matrix

array([[102.67458931, 130.42761583, 140.99279117,  82.19500047, 137.34975161],
       [145.03804322,  53.90727239, 146.04476238, 114.44195163,  70.12476056],
       [122.61003385, 112.72862743,  87.3816173 , 106.72014036,  70.37626365]])
>>> matrix = np.round(matrix, 2)