NumPy Typecodes Cheatsheet
If you've used NumPy long enough, you've almost certainly run into its cryptic type abbreviations, and were probably annoyed by them. NumPy is gradually replacing them with something more readable, but they are still present internally and frequently leak out here and there.
There are several ways to specify a data type in NumPy. You can do it:
- as a type object: np.array([1,2,3], dtype=np.int16)
- as a string: np.array([1,2,3], dtype='int16')
- as an array protocol abbreviation: np.array([1,2,3], dtype='>i2')
- as a NumPy "1-char string typecode": np.array([1,2,3], dtype='h')
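with the last variant being the least readable of them all. A typical scenario where you might want the array protocol form is decoding raw data (as read from a file, a network bytestream, etc.), because it spells out the byte order, the kind of data, and the item size in bytes: '>i2' is a big-endian signed integer of 2 bytes.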
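As a rough illustration of the raw-data scenario, the sketch below decodes a made-up six-byte payload with np.frombuffer. The payload and the variable names are assumptions chosen for the example, not something taken from a real file format.

```python
import numpy as np

# Hypothetical payload: three big-endian signed 16-bit integers (1, 2, 3),
# as they might arrive from a file or a network stream.
raw = b"\x00\x01\x00\x02\x00\x03"

# '>i2' reads as: '>' big-endian, 'i' signed integer, '2' bytes per item.
values = np.frombuffer(raw, dtype=">i2")
print(values.tolist())    # [1, 2, 3]

# The same bytes read as little-endian give different numbers.
swapped = np.frombuffer(raw, dtype="<i2")
print(swapped.tolist())   # [256, 512, 768]

# The type object, the string name and the typecode name the same dtype.
same = np.dtype(np.int16) == np.dtype("int16") == np.dtype("h")
print(same)               # True
```

Getting the byte order wrong does not raise an error, it silently yields wrong values, which is why the explicit '>' or '<' prefix is worth the reduced readability here.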
As for the "1-char string typecodes", you can stumble upon them when you create an array from a string:
>>> np.array('python')
array('python', dtype='<U6')
or in less obvious situations, where you least expect it.
Here's a cheatsheet summarizing the most common type abbreviations in NumPy:
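A table along these lines can also be generated on your own installation. The sketch below assumes that np.typecodes['All'] lists every one-character code known to your NumPy version (its exact contents vary between versions) and that the columns of interest are the typecode, the kind, the array protocol string, and the readable name.

```python
import numpy as np

# Collect one row per typecode: (char, kind, array-protocol string, name).
rows = []
for code in np.typecodes["All"]:
    dt = np.dtype(code)
    rows.append((code, dt.kind, dt.str, dt.name))

# Print the rows as a fixed-width table.
print(f"{'char':<6}{'kind':<6}{'str':<8}{'name'}")
for code, kind, proto, name in rows:
    print(f"{code:<6}{kind:<6}{proto:<8}{name}")
```

The byte-order prefix in the third column depends on the machine, and some names (for example those of the C long types) depend on the platform, so the output is best treated as a description of the local setup rather than a universal reference.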
You can get the preferred representation from the dtype object, which can be helpful, for example, if you don’t remember what a certain abbreviation means:
>>> np.array('python').dtype.type
numpy.str_
This is a tad more readable than <U6, though less informative: the string length is lost.
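Besides type, a dtype object exposes several other attributes that translate between the notations. A minimal sketch, using the 'h' typecode and the string array from above as examples:

```python
import numpy as np

dt = np.dtype("h")         # one-character typecode for a C short

print(dt.name)             # int16
print(dt.char)             # h
print(dt.kind)             # i
print(dt.itemsize)         # 2
print(dt.str)              # '<i2' on a little-endian machine

# For strings, name keeps the size (in bits) that dtype.type drops.
s = np.array("python").dtype
print(s.name)              # str192 (6 characters * 4 bytes * 8 bits)
print(s.type is np.str_)   # True
```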
One of the useful applications of those typecodes is to distinguish between data types:
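A minimal sketch of such a check, assuming the np.typecodes dictionary is the lookup table in question. It maps category names to strings of one-character codes, so a membership test on dtype.char is enough:

```python
import numpy as np

# Category names available in this NumPy version.
categories = sorted(np.typecodes)
print(categories)

a = np.zeros(10, dtype=np.uint8)
print(a.dtype.char)        # B

is_unsigned = a.dtype.char in np.typecodes["UnsignedInteger"]
is_integer = a.dtype.char in np.typecodes["AllInteger"]
is_float = a.dtype.char in np.typecodes["AllFloat"]
print(is_unsigned, is_integer, is_float)   # True True False
```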
One minor downside of this method is that bools, strings, bytes, objects, and voids (?, U, S, O, and V, respectively) have no dedicated keys in the np.typecodes dict for some reason; they appear only under 'All'.
Another option to tell between type categories is to check the dtype kind attribute:
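The kind attribute is a single character that names the family of a dtype regardless of its size. The sample arrays below are made up for illustration:

```python
import numpy as np

samples = {
    "bool": np.array([True, False]),
    "int16": np.array([1, 2, 3], dtype=np.int16),
    "uint8": np.zeros(3, dtype=np.uint8),
    "float64": np.array([1.0, 2.0]),
    "complex128": np.array([1j]),
    "str": np.array("python"),
    "bytes": np.array(b"python"),
    "object": np.array([None]),
}

# Map each label to the kind character of its array's dtype.
kinds = {label: arr.dtype.kind for label, arr in samples.items()}
print(kinds)
# {'bool': 'b', 'int16': 'i', 'uint8': 'u', 'float64': 'f',
#  'complex128': 'c', 'str': 'U', 'bytes': 'S', 'object': 'O'}

# A kind check covers a whole family, whatever the item size.
a = np.zeros(10, dtype=np.uint8)
any_integer = a.dtype.kind in "iu"
print(any_integer)         # True
```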
A cleaner approach to checking a data type or a group of data types, shown here for an integer array a, is:
>>> np.issubdtype(a.dtype, np.integer)
True
>>> np.issubdtype(a.dtype, np.floating)
False
or, using the pandas library (imported as pd):
>>> pd.api.types.is_integer_dtype(a.dtype)
True
>>> pd.api.types.is_float_dtype(a.dtype)
False
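np.issubdtype tests against NumPy's hierarchy of abstract scalar types, so broader or narrower groups can be checked in the same way. A short sketch, assuming a is an array of unsigned 8-bit integers:

```python
import numpy as np

a = np.zeros(10, dtype=np.uint8)

# From the broadest group to the narrowest.
checks = {
    "number": np.issubdtype(a.dtype, np.number),
    "integer": np.issubdtype(a.dtype, np.integer),
    "unsignedinteger": np.issubdtype(a.dtype, np.unsignedinteger),
    "signedinteger": np.issubdtype(a.dtype, np.signedinteger),
    "floating": np.issubdtype(a.dtype, np.floating),
}
for group, result in checks.items():
    print(f"{group:<16}{result}")

# Booleans are not part of the integer branch of the hierarchy.
bool_is_integer = np.issubdtype(np.bool_, np.integer)
print(bool_is_integer)     # False
```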
For example, if you have an array a = np.zeros(10, dtype=np.uint8), any of the following expressions checks whether it is an array of unsigned integers:
>>> a.dtype.kind == 'u'
True
>>> a.dtype.char in np.typecodes['UnsignedInteger']
True
>>> np.issubdtype(a.dtype, np.unsignedinteger)
True
Hopefully, this cheatsheet will help you deal with NumPy data types more efficiently.