Numpy 기초

불리언 인덱싱(booleam indexing)

numpy 불리언 인덱싱은 배열 각 요소의 선택 여부를 Ture, False 로 지정하는 방식입니다.
해당 인덱스의 True만을 조회합니다.

import numpy as np
import matplotlib.pyplot as plt
def pprint(arr):
    print('type : {}'.format(type(arr)))
    print('shape : {}, demention : {}, dtype : {}'.format(arr.shape, arr.ndim, arr.dtype))
    print("array's Data : \n", arr)

a1배열에서 요소의 값이 짝수인 요소들의 총합 구하기

a1 = np.arange(1, 25).reshape((4, 6))
pprint(a1)

type : <class 'numpy.ndarray'>
shape : (4, 6), demention : 2, dtype : int64
array's Data : 
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

even_arr = a1%2==0
pprint(even_arr)

type : <class 'numpy.ndarray'>
shape : (4, 6), demention : 2, dtype : bool
array's Data : 
 [[False  True False  True False  True]
 [False  True False  True False  True]
 [False  True False  True False  True]
 [False  True False  True False  True]]

# a1[a1%2==0] 과 동일한 의미이다.
a1[even_arr]

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24])

Boolean Indexing 의 응용

2014년 시애틀 강수량 데이터 사용
1년중 1월의 평균 강수량 구하기

import pandas as pd

# 데이터 로딩
rains_in_seattle = pd.read_csv('./seattle2014.csv')
rains_arr = rains_in_seattle['PRCP'].values
print('Data Sizd:', len(rains_arr))

Data Sizd: 365

# 날짜 배열
days_arr = np.arange(0, 365)

# 1월인 날을 구하기 위해 boolean indexing
condition_jan = days_arr < 31

# 40일까지 조회
condition_jan[:40]

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False, False, False])

# 1월의 강수량 추출 (40일)
rains_jan = rains_arr[condition_jan]

# 1월의 강수량 데이터 수
len(rains_jan)

# 1월 강수량 총합
np.sum(rains_jan)

# 1월 평균 강수량
np.mean(rains_jan)

30.322580645161292

원하는 달을 input으로 받아 강수량을 알려주는 프로그램 작성

days_arr = np.arange(0, 361)

month = int(input('해당 월을 숫자로 입력하시오'))

start = (month-1)*30
end = month * 30
rains_month = days_arr[start:end]

print('데이터 수 : ', len(rains_month))
print('총 강수량 : ', np.sum(rains_month))
print('평균 강수량 : ', np.mean(rains_month))

해당 월을 숫자로 입력하시오7
데이터 수 :  30
총 강수량 :  5835
평균 강수량 :  194.5

팬시 인덱싱(Fancy indexing)

배열에 인덱스 배열을 전달하여 요소를 참조하는 방법이다.

arr = np.arange(1, 25).reshape((4, 6))
pprint(arr)

type : <class 'numpy.ndarray'>
shape : (4, 6), demention : 2, dtype : int64
array's Data : 
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

fancy case 1

[arr[0,0], arr[1,2], arr[2,2], arr[3,3]]

[1, 9, 15, 22]

# (0,0), (1,1), (2,2), (3,3) 으로 배열 전달
arr[[0, 1, 2, 3], [0, 1, 2, 3]]

array([ 1,  8, 15, 22])

fancy case 2

# 전체 행에 대해 1,2 번 칼럼 참조
arr[:, [1, 2]]

array([[ 2,  3],
       [ 8,  9],
       [14, 15],
       [20, 21]])

배열 변환

배열을 변환하는 방법으로 전치, shape변환, 요소추가, 결합, 분리 등이 있다.

[numpy.ndarray 객체] . T

전치(Tranpose)는 행렬의 인덱스가 바뀌는 변환이다. (a, b) -> (b, a)로 변환된다.

a = np.random.randint(1, 10, (2, 3))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[4 1 5]
 [4 2 5]]

pprint(a.T)

type : <class 'numpy.ndarray'>
shape : (3, 2), demention : 2, dtype : int64
array's Data : 
 [[4 4]
 [1 2]
 [5 5]]

배열 형태 변경

[numpy.ndarray 객체] . ravel( )

ravel 은 배열의 shape을 1차원 배열로 만든다.

a = np.random.randint(1, 10, (2, 3))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[9 3 6]
 [3 5 6]]

a.ravel()

array([9, 3, 6, 3, 5, 6])

b = a.ravel()
pprint(b)

type : <class 'numpy.ndarray'>
shape : (6,), demention : 1, dtype : int64
array's Data : 
 [9 3 6 3 5 6]

b[0]=99
pprint(b)

type : <class 'numpy.ndarray'>
shape : (6,), demention : 1, dtype : int64
array's Data : 
 [99  3  6  3  5  6]

반환 행렬의 데이터를 변경하면 원래의 a 행렬도 변경된다.

pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[99  3  6]
 [ 3  5  6]]

[numpy.ndarray 객체] . reshape( )

reshape은 데이터 변경 없이 지정된 shape으로 변환한다.

a = np.random.randint(1, 10, (2, 3))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[2 2 1]
 [8 7 7]]

result = a.reshape((3, 2, 1))
pprint(result)

type : <class 'numpy.ndarray'>
shape : (3, 2, 1), demention : 3, dtype : int64
array's Data : 
 [[[2]
  [2]]

 [[1]
  [8]]

 [[7]
  [7]]]

배열 요소 추가 삭제

resize( )

np.resize(a, new_shape)
np.ndarry.resize(new_shape, refcheck=True)
배열의 shape과 크기를 변경합니다.

a = np.random.randint(1, 10, (2, 6))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 6), demention : 2, dtype : int64
array's Data : 
 [[6 6 3 6 4 5]
 [2 8 9 6 6 8]]

a.resize((6, 2))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (6, 2), demention : 2, dtype : int64
array's Data : 
 [[6 6]
 [3 6]
 [4 5]
 [2 8]
 [9 6]
 [6 8]]

배열의 요소 수를 늘이거나 줄일 수 있다.

a = np.random.randint(1, 10, (2, 6))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 6), demention : 2, dtype : int64
array's Data : 
 [[4 4 1 6 6 5]
 [8 6 3 8 1 5]]

a.resize((2, 10))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 10), demention : 2, dtype : int64
array's Data : 
 [[4 4 1 6 6 5 8 6 3 8]
 [1 5 0 0 0 0 0 0 0 0]]

a = np.random.randint(1, 10, (2, 6))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (2, 6), demention : 2, dtype : int64
array's Data : 
 [[1 6 7 3 7 5]
 [5 8 1 5 1 8]]

a.resize((3, 3))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (3, 3), demention : 2, dtype : int64
array's Data : 
 [[1 6 7]
 [3 7 5]
 [5 8 1]]

append( )

np.append(arr, values, axis=None)
배열의 끝에 값을 추가

a = np.arange(1, 10).reshape(3, 3)
pprint(a)
b = np.arange(10, 19).reshape(3, 3)
pprint(b)

type : <class 'numpy.ndarray'>
shape : (3, 3), demention : 2, dtype : int64
array's Data : 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
type : <class 'numpy.ndarray'>
shape : (3, 3), demention : 2, dtype : int64
array's Data : 
 [[10 11 12]
 [13 14 15]
 [16 17 18]]

axis을 자정하지 않으면 1차원 배열로 변형되어 결함된다.

result = np.append(a, b)
pprint(result)

type : <class 'numpy.ndarray'>
shape : (18,), demention : 1, dtype : int64
array's Data : 
 [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18]

pprint(a)

type : <class 'numpy.ndarray'>
shape : (3, 3), demention : 2, dtype : int64
array's Data : 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

axis=0 설정시 나머지 shape은 같아야한다. (이외는 오류 발생)

result = np.append(a, b, axis=0)
pprint(result)

type : <class 'numpy.ndarray'>
shape : (6, 3), demention : 2, dtype : int64
array's Data : 
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [16 17 18]]

different_shape_arr = np.arange(10, 20).reshape(2, 5)
pprint(different_shape_arr)

type : <class 'numpy.ndarray'>
shape : (2, 5), demention : 2, dtype : int64
array's Data : 
 [[10 11 12 13 14]
 [15 16 17 18 19]]

오류 발생 예제

np.append(a, different_shape_arr, axis=0)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-57-5f18e0ed6945> in <module>
----> 1 np.append(a, different_shape_arr, axis=0)

<__array_function__ internals> in append(*args, **kwargs)

~/opt/anaconda3/lib/python3.8/site-packages/numpy/lib/function_base.py in append(arr, values, axis)
   4743         values = ravel(values)
   4744         axis = arr.ndim-1
-> 4745     return concatenate((arr, values), axis=axis)
   4746 
   4747 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 5

axis = 1 설정 시, shape[1]을 제외한 나머지 shape은 같아야한다.

# axis = 1
result = np.append(a, b, axis=1)
pprint(result)

type : <class 'numpy.ndarray'>
shape : (3, 6), demention : 2, dtype : int64
array's Data : 
 [[ 1  2  3 10 11 12]
 [ 4  5  6 13 14 15]
 [ 7  8  9 16 17 18]]

insert( )

np.insert(arr, obj, values, axis=None)
axis를 지정하지 않으며 1차원 배열로 변환
추가할 방향을 axis 로 지정

a = np.arange(1, 10).reshape(3, 3)
pprint(a)

type : <class 'numpy.ndarray'>
shape : (3, 3), demention : 2, dtype : int64
array's Data : 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

# a 배열을 1차원 배열로 변환하고 1번index에 999 추가
np.insert(a, 1, 999)

array([  1, 999,   2,   3,   4,   5,   6,   7,   8,   9])

# a 배열의 axis 0방향 1번 인덱스에 추가
# index가 1인 row에 999 추가
np.insert(a, 1, 999, axis=0)

array([[  1,   2,   3],
       [999, 999, 999],
       [  4,   5,   6],
       [  7,   8,   9]])

# a 배열의 axis 1 방향 1번 인덱스에 추가
# index가 1인 column에 999 추가
np.insert(a, 1, 999, axis=1)

array([[  1, 999,   2,   3],
       [  4, 999,   5,   6],
       [  7, 999,   8,   9]])

delete( )

np.delete(arr, obj, axis=None)
axis를 지정하지 않으며 1차원 배열로 변환
삭제할 방향을 axis 로 지정
delete 함수는 원본 배열을 변경하지 않으며 새로운 배열을 반환

a = np.arange(1, 10).reshape(3, 3)
pprint(a)

type : <class 'numpy.ndarray'>
shape : (3, 3), demention : 2, dtype : int64
array's Data : 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

# a 배열을 1차원 배열로 변환하고 1번 index 삭제
np.delete(a, 1)

array([1, 3, 4, 5, 6, 7, 8, 9])

# a 배열의 axis 0 방향 1번 index 행을 삭제한 배열 생성하여 반환
np.delete(a, 1, axis=0)

array([[1, 2, 3],
       [7, 8, 9]])

# a 배열의 axis 1방향 1번 index 인 열을 삭제한 배열을 생성하여 반환
np.delete(a, 1, axis=1)

array([[1, 3],
       [4, 6],
       [7, 9]])

배열 결합

배열과 배열을 결합하는 함수로 np.concatenate, np.vstack, np.hstack 가 있다.

np.concatenate

concatenate((a1, a2, …), axis=0)

a = np.arange(1, 7).reshape((2, 3))
pprint(a)
b = np.arange(7, 13).reshape((2, 3))
pprint(b)

type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[1 2 3]
 [4 5 6]]
type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[ 7  8  9]
 [10 11 12]]

# axis=0 방향으로 두 배열 결합, axis 기본값 = 0
result = np.concatenate((a, b))
result

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

# axis =1 방향으로 두 배열 결합
result = np.concatenate((a, b), axis=1)
result

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

np.vstack 수직 방향 배열 결합

np.vstack(tuple)
튜플로 설정된 여러 배열을 수직방향으로 연결
np.concatenate(tup, axis=0)와 동일하다.

a = np.arange(1, 7).reshape((2, 3))
pprint(a)
b = np.arange(7, 13).reshape((2, 3))
pprint(b)

type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[1 2 3]
 [4 5 6]]
type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[ 7  8  9]
 [10 11 12]]

np.vstack((a, b))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

np.vstack((a, b, a, b))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

np.vhstack 수평방향 배열 결합

np.hstack(tuple)
튜플로 설정된 여러 배열을 수평방향으로 연결(axis=1 방향)

a = np.arange(1, 7).reshape((2, 3))
pprint(a)
b = np.arange(7, 13).reshape((2, 3))
pprint(b)

type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[1 2 3]
 [4 5 6]]
type : <class 'numpy.ndarray'>
shape : (2, 3), demention : 2, dtype : int64
array's Data : 
 [[ 7  8  9]
 [10 11 12]]

np.hstack((a, b))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

np.hstack((a, b, a, b, ))

array([[ 1,  2,  3,  7,  8,  9,  1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12,  4,  5,  6, 10, 11, 12]])

배열 분리

np.hsplit( ) : 지정한 배열을 수평 방향으로 분할
np.vsplit( ) : 지정한 배열을 수직 방향으로 분할

np.hsplit(ary, indices_or_sections)

a = np.arange(1, 25).reshape((4, 6))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (4, 6), demention : 2, dtype : int64
array's Data : 
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

result = np.hsplit(a, 2)
result

[array([[ 1,  2,  3],
        [ 7,  8,  9],
        [13, 14, 15],
        [19, 20, 21]]),
 array([[ 4,  5,  6],
        [10, 11, 12],
        [16, 17, 18],
        [22, 23, 24]])]

result = np.hsplit(a, 3)
result

[array([[ 1,  2],
        [ 7,  8],
        [13, 14],
        [19, 20]]),
 array([[ 3,  4],
        [ 9, 10],
        [15, 16],
        [21, 22]]),
 array([[ 5,  6],
        [11, 12],
        [17, 18],
        [23, 24]])]

np.hsplit(a, [1, 3, 5])

[array([[ 1],
        [ 7],
        [13],
        [19]]),
 array([[ 2,  3],
        [ 8,  9],
        [14, 15],
        [20, 21]]),
 array([[ 4,  5],
        [10, 11],
        [16, 17],
        [22, 23]]),
 array([[ 6],
        [12],
        [18],
        [24]])]

np.vsplit(ary, indices_or_sections)

a = np.arange(1, 25).reshape((4, 6))
pprint(a)

type : <class 'numpy.ndarray'>
shape : (4, 6), demention : 2, dtype : int64
array's Data : 
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]

result=np.vsplit(a, 2)
result

[array([[ 1,  2,  3,  4,  5,  6],
        [ 7,  8,  9, 10, 11, 12]]),
 array([[13, 14, 15, 16, 17, 18],
        [19, 20, 21, 22, 23, 24]])]

np.array(result).shape

(2, 2, 6)

result=np.vsplit(a, 4)
result

[array([[1, 2, 3, 4, 5, 6]]),
 array([[ 7,  8,  9, 10, 11, 12]]),
 array([[13, 14, 15, 16, 17, 18]]),
 array([[19, 20, 21, 22, 23, 24]])]

np.array(result).shape

(4, 1, 6)

# row를 1, 2-3, 4번째 라인으로 구분
np.vsplit(a, [1, 3])

[array([[1, 2, 3, 4, 5, 6]]),
 array([[ 7,  8,  9, 10, 11, 12],
        [13, 14, 15, 16, 17, 18]]),
 array([[19, 20, 21, 22, 23, 24]])]

day 3

Numpy