# Comparing NumPy efficiency vs Pure Python

##### Files associated with this lesson:

Lecture.ipynb

```
import sys
import numpy as np
```

### Hands on!

As we have already told you, `numpy` will make your array-processing code more efficient. But the question is, *how much more efficient?* You may be surprised by some of the results: sometimes `numpy` optimizations speed up your code by 100x or even 1000x. For more information, watch this talk by Jake VanderPlas: *Performance Python: Seven Strategies for Optimizing Your Numerical Code*.

#### Size of objects

As we've discussed before, Python numbers are "boxed": an integer object contains not only the actual value of the int, but also extra information about the object (in CPython, for example, its reference count and a pointer to its type). So what would be the *expected* size of an integer in bytes? 2? 4? We can check the real value with `sys.getsizeof`:

```
sys.getsizeof(1)
```

28 bytes (on a typical 64-bit CPython build)! That's a lot of memory for just one tiny `int`. With larger numbers, it gets worse:

```
sys.getsizeof(10**100)
```
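Why does the size keep growing? CPython stores arbitrary-precision integers as a variable-length array of fixed-size internal "digits", so the object gets bigger as the value needs more bits. A minimal sketch to watch this, assuming a CPython interpreter (`sys.int_info` describes a CPython implementation detail, and exact byte counts vary by version and build):

```python
import sys

# How many bits CPython packs into each internal "digit" of an int
print("bits per digit:", sys.int_info.bits_per_digit)

# Object size for increasingly large integers
sizes = []
for n in (1, 2**40, 2**100, 10**100):
    size = sys.getsizeof(n)
    sizes.append(size)
    print(f"{n.bit_length():>4} bits -> {size} bytes")
```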

NumPy numbers are a lot more space-efficient: they're mapped closely to their C representation, they have a fixed size, and we get to pick the type that best fits our data.

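For example, the default NumPy integer type takes 8 bytes of memory on most 64-bit platforms:

```
np.dtype(np.int_).itemsize
```

`np.int_` is NumPy's "default" integer type. It's just an alias for a fixed-size type such as `np.int64` or `np.int32`, depending on the platform; on this platform, it's `np.int64`:

```
np.dtype(np.int_) == np.dtype(np.int64)
```

Older code often uses `np.int` here instead; that name was only an alias for Python's built-in `int`, and it was deprecated in NumPy 1.20 and removed in NumPy 1.24.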

That's why it takes 8 bytes: 64 bits ÷ 8 bits per byte = 8 bytes.
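These per-object costs add up in containers. Here is a rough back-of-the-envelope sketch, assuming a 64-bit CPython build: a list holds a pointer to each boxed `int` object, while a NumPy array stores the raw fixed-size values back to back, so its data takes exactly `size * itemsize` bytes (reported by `ndarray.nbytes`). The list figure is only an approximation, since CPython caches small integers and shares those objects.

```python
import sys
import numpy as np

values = list(range(1000))  # sample data: 1,000 small Python ints

# List footprint: the list object (header + one pointer per element)
# plus every boxed int object it points to (an approximation)
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Array footprint: 1,000 values of 8 bytes each, stored contiguously
arr = np.array(values, dtype=np.int64)
array_bytes = arr.nbytes

print(f"list: ~{list_bytes} bytes, array data: {array_bytes} bytes")
```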

NumPy also offers more granularity when choosing the type of our arrays. For example, you can create an "unsigned 8-bit int" that takes just 1 byte in memory:

```
np.dtype(np.uint8).itemsize
```

That means that if you're dealing with small numbers (`0-255`), you can save a lot of memory. As a reminder, to set the type of an array, just pass the `dtype` argument:

```
np.array([0, 255], dtype=np.uint8)
```
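To see how much the choice of dtype matters, here's a small sketch storing the same one million zeros as `uint8` and as `int64`; `nbytes` reports the size of each array's data buffer.

```python
import numpy as np

n = 1_000_000
small = np.zeros(n, dtype=np.uint8)  # 1 byte per element
large = np.zeros(n, dtype=np.int64)  # 8 bytes per element

print(small.nbytes)  # 1000000
print(large.nbytes)  # 8000000
```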

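You have to be aware of the "limits" of that type. For example, if an arithmetic result exceeds the 255 limit, it silently wraps around, so `255 + 1` gives back 0:

```
np.array([0, 255], dtype=np.uint8) + 1
```

Older NumPy versions also wrapped out-of-range values when creating the array, so `np.array([0, 256], dtype=np.uint8)` gave back `[0, 0]`; recent versions raise an `OverflowError` instead.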
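To avoid these surprises, you can check the range of an integer type before choosing it. A small sketch using `np.iinfo`, which reports the minimum and maximum of an integer dtype; the sample `values` are made up for illustration:

```python
import numpy as np

info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

values = np.array([0, 100, 300])  # hypothetical data, default integer dtype
fits = (values >= info.min) & (values <= info.max)
print(fits)  # [ True  True False]

# Only downcast when every value is in range
if fits.all():
    values = values.astype(np.uint8)
```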

For a complete reference of NumPy data types, check the NumPy documentation on data types.

### A note about performance and efficiency

Let's now compare the same operation written in pure Python and with NumPy to see the real performance impact. We want to compute *the sum of the squares of the elements* of an array. Starting from an array of one million random integers, in pure Python it looks like this:

```
a = np.random.randint(1, 999, size=1_000_000)
```

```
sum([x ** 2 for x in a])
```

The same operation with numpy is performed in this way:

```
np.sum(a ** 2)
```
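`a ** 2` first builds a temporary array of squares, and `np.sum` then adds it up. The same result can be written in a few equivalent ways; here's a quick sketch (the sample array comes from `np.random.default_rng` with an arbitrary seed so the run is reproducible):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.integers(1, 999, size=1_000_000)  # int64 by default

s1 = np.sum(a ** 2)  # function form, as above
s2 = (a ** 2).sum()  # method form
s3 = np.dot(a, a)    # dot product of a with itself, avoids building the array of squares

print(s1 == s2 == s3)  # True
```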

How much time does the operation take? We can use the `%time` magic function to get an idea of the elapsed time:

```
%time sum([x ** 2 for x in a])
```

At the time of this writing, it takes `205 ms`. On its own this doesn't tell us much, so let's compare it with NumPy's version:

```
%time np.sum(a ** 2)
```

NumPy's version takes `2.7 ms`, which is roughly 75 times faster than the pure Python version. (Exact timings depend on your machine.)
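`%time` only works inside IPython/Jupyter and measures a single run, so its numbers are noisy (`%timeit` repeats the measurement). Outside a notebook, the standard-library `timeit` module does the same job. The sketch below also converts the array to a list of plain Python ints with `tolist()`, because iterating over a NumPy array directly yields NumPy scalars rather than Python ints. Your timings will differ from the ones quoted above.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(1, 999, size=1_000_000)  # NumPy int64 array
py_list = a.tolist()                       # the same values as plain Python ints

def pure_python():
    return sum(x ** 2 for x in py_list)

def with_numpy():
    return int(np.sum(a ** 2))

# Both versions must agree before we compare their speed
assert pure_python() == with_numpy()

# Best of 3 single runs for each version
t_py = min(timeit.repeat(pure_python, repeat=3, number=1))
t_np = min(timeit.repeat(with_numpy, repeat=3, number=1))
print(f"pure Python: {t_py * 1e3:.1f} ms")
print(f"NumPy:       {t_np * 1e3:.2f} ms ({t_py / t_np:.0f}x faster)")
```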

### Implications when coding numpy

To make these operations as efficient as possible, NumPy stores an array's elements in a contiguous block of memory, all with the same fixed-size type. That means you can't just change an array's type or length without copying the entire array to a new memory location:

```
arr = np.array([1, 2, 3])
```

```
arr.dtype
```
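You can inspect this layout directly. In the sketch below, `flags["C_CONTIGUOUS"]`, `itemsize` and `strides` describe how the elements sit in memory, and `np.append`, which looks like it grows the array, actually returns a new array in a fresh block of memory and leaves the original untouched:

```python
import numpy as np

arr = np.array([1, 2, 3])

# One contiguous block: moving to the next element advances exactly itemsize bytes
print(arr.flags["C_CONTIGUOUS"], arr.itemsize, arr.strides)

# np.append copies everything into a new, larger array
bigger = np.append(arr, 4)
print(np.shares_memory(arr, bigger))  # False
print(arr, bigger)                    # [1 2 3] [1 2 3 4]
```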

Let's try assigning a float:

```
arr[1] = 3.5
```

The array's type hasn't changed: the decimal part is dropped and we only keep the integer part (`3`):

```
arr
```
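Note that the float is truncated toward zero, not rounded. A quick sketch; if rounding is what you want, round explicitly before storing (here with `np.round`):

```python
import numpy as np

arr = np.array([1, 2, 3])
arr[0] = 2.9            # stored as 2: the fractional part is dropped
arr[1] = -3.9           # stored as -3: truncation goes toward zero
arr[2] = np.round(3.9)  # round first to get the nearest integer: 4
print(arr)  # [ 2 -3  4]
```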

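Moreover, some operations will fail because the types are incompatible. Here, adding `.5` produces float results, which can't be stored back in place into the integer array `arr`: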

```
arr += .5
```
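When you do need fractional values, one way out is to convert explicitly. A minimal sketch: `astype` allocates a brand-new float array, after which `+= .5` works, while the original integer array is left unchanged.

```python
import numpy as np

arr = np.array([1, 2, 3])
farr = arr.astype(np.float64)  # new array, new memory, float dtype
farr += .5                     # fine: float result into a float array

print(farr)                         # [1.5 2.5 3.5]
print(np.shares_memory(arr, farr))  # False
print(arr)                          # [1 2 3]
```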