[Python] Logistic Regression (Prediction) Model - iris

당신을 조아하준 2020. 11. 26. 14:51

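[Analysis Topic]

Using the ubiquitous iris dataset, build a logistic regression model that classifies (predicts) the dependent variable Species (the iris variety) from the independent variables.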

#✅Load data
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/my_data_set/iris.csv')
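
For readers working outside Colab, or without the CSV on Drive, roughly the same table can be rebuilt from the copy of iris bundled with scikit-learn. This is a minimal sketch, not the author's loading step: the column names and the plain species strings ('setosa', 'versicolor', 'virginica') are assumptions about what the CSV contains, chosen to match the names used later in this post.

```python
from sklearn.datasets import load_iris

# Load the bundled iris data as a DataFrame (four features + integer target).
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Rename to the column names used in this post (assumed to match the CSV).
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

# Turn the integer target back into species names so 'species' is a string column.
df['species'] = df['species'].map(dict(enumerate(iris.target_names)))

print(df.shape)
print(df.head(3))
```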

#✅EDA (exploratory data analysis)
print(df.info())
print(df.isna().sum())
print(df.shape)
print(df.columns)

#print(df['species'].value_counts()) # check the class balance of the dependent variable

#✅EDA insights
# No missing values
# species is a categorical variable with 3 classes
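
The two insights above can also be checked programmatically rather than read off the printed output. A small sketch, assuming scikit-learn's bundled iris data as a stand-in for the CSV (so the target column is called 'target' here, not 'species'):

```python
from sklearn.datasets import load_iris

# Stand-in for the CSV: scikit-learn's bundled iris data.
iris = load_iris(as_frame=True)
df = iris.frame

# Total number of missing cells across all columns.
n_missing = int(df.isna().sum().sum())

# Number of rows per class, and number of distinct classes.
class_counts = df['target'].value_counts().sort_index()
n_classes = df['target'].nunique()

print('missing cells:', n_missing)
print('classes:', n_classes)
print(class_counts)
```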

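#✅Data preprocessing
# Categorical variable encoding: for independent variables fed to LogisticRegression(), one-hot encoding is recommended because it does not impose a magnitude or order on the category codes.
# When the categorical variable is the dependent variable, convert it to integers with LabelEncoder.
# Uses the preprocessing module of the scikit-learn library.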
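
Iris has no categorical independent variables, so the one-hot recommendation above never gets exercised in this post. As an illustration only, here is a sketch on a made-up table; the 'color' and 'size' columns are hypothetical and are not part of the iris data.

```python
import pandas as pd

# Hypothetical toy data: 'color' is a made-up categorical feature.
toy = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'green'],
    'size': [1.0, 2.5, 3.0, 2.0],
})

# One indicator column per category; no order is implied between colors.
encoded = pd.get_dummies(toy, columns=['color'])

print(encoded)
```

An integer coding such as red=0, green=1, blue=2 would instead tell a linear model that blue is "twice" green, which is the problem one-hot encoding avoids.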

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['species'] = le.fit_transform(df['species'])
#print(df.head(4))
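
After encoding, it helps to know which integer stands for which species, since the confusion matrix and report further down are labelled with those integers. A minimal sketch; the label strings are assumed and may differ from the ones in the CSV.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed species strings; the CSV may spell them differently.
labels = ['setosa', 'versicolor', 'virginica', 'setosa']

le = LabelEncoder()
codes = le.fit_transform(labels)

# classes_ is sorted, and a label's code is its position in classes_.
mapping = {str(name): int(code) for code, name in enumerate(le.classes_)}
print(mapping)

# inverse_transform turns predicted integers back into species names.
names = le.inverse_transform([2, 0])
print(names)
```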

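# Scale the continuous variables
# Note: fitting the scaler on the full dataset before the split lets test-set min/max values influence the scaling; fitting it on the training split only avoids that leakage.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler() # create the scaler object

num_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
df[num_cols] = scaler.fit_transform(df[num_cols])
#print(df.head(3))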
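
With its default settings, MinMaxScaler rescales each column to the range [0, 1] using (x - min) / (max - min). The sketch below checks that formula against the scaler on a made-up single column; the values are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column with arbitrary values (min = 4, max = 8).
x = np.array([[4.0], [5.0], [6.0], [8.0]])

# Min-max scaling by hand: (x - min) / (max - min), computed per column.
manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# The same transformation via scikit-learn.
scaled = MinMaxScaler().fit_transform(x)

print(manual.ravel())
print(scaled.ravel())
```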

#✅Train/test split (8:2)
from sklearn.model_selection import train_test_split

X = df.drop(['species'], axis = 1)
y = df['species']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, stratify = y, random_state = 1)

# Stratified sampling (stratify = y) keeps the class balance the same in both splits
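
The effect of stratify = y can be confirmed by counting classes in each split. A sketch using scikit-learn's bundled iris data as a stand-in for the CSV, with the same split settings as above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for the CSV: features X and integer target y.
X, y = load_iris(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Per-class row counts in each split.
train_counts = y_train.value_counts().sort_index()
test_counts = y_test.value_counts().sort_index()

print(train_counts)
print(test_counts)
```

With 50 rows per class and an 8:2 split, each class contributes 40 rows to training and 10 to testing.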

#✅Model training and prediction
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state = 1, multi_class = 'multinomial', solver = 'lbfgs')
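# The iris target (species) has more than two classes, so multi_class = 'multinomial' with solver = 'lbfgs' is the recommended combination.
# Note: scikit-learn 1.5 and later deprecate the multi_class parameter; there, lbfgs already fits a multinomial model for multiclass targets, so the argument can be omitted.
model.fit(X_train, y_train) # fit the model on the training split
pred = model.predict(X_test) # predict on the test split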
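
One alternative arrangement, not the one used in this post, wraps the scaler and the model in a scikit-learn Pipeline so that the scaler is fitted on the training split only. The sketch below assumes the bundled iris data as a stand-in for the CSV and also shows predict_proba, which returns the per-class probabilities behind each prediction.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# The scaler is fitted inside fit(), on the training rows only.
pipe = make_pipeline(MinMaxScaler(), LogisticRegression(random_state=1))
pipe.fit(X_train, y_train)

pred = pipe.predict(X_test)         # predicted class per test row
proba = pipe.predict_proba(X_test)  # one probability per class, per row

print(proba[:3].round(3))
print('test accuracy:', pipe.score(X_test, y_test))
```

Each row of proba sums to 1, and the predicted class is the column with the largest probability.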

#✅Model evaluation
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
acc = accuracy_score(y_test, pred)
rpt = classification_report(y_test, pred)
cm = confusion_matrix(y_test, pred)

print('accuracy:', acc)
print('classification report:\n', rpt)
print('confusion matrix:\n', cm)
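
The per-class numbers in the classification report can be derived directly from the confusion matrix, whose rows are true classes and whose columns are predicted classes. A sketch on made-up labels (not results from the iris model), cross-checked against scikit-learn's own metric functions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up labels for illustration; not output from the model in this post.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class

tp = np.diag(cm)                 # correctly predicted rows per class
precision = tp / cm.sum(axis=0)  # TP / everything predicted as that class
recall = tp / cm.sum(axis=1)     # TP / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(cm)
print(precision, recall, f1)

# Reference values from scikit-learn, one per class.
ref_precision = precision_score(y_true, y_pred, average=None)
ref_recall = recall_score(y_true, y_pred, average=None)
ref_f1 = f1_score(y_true, y_pred, average=None)
```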

#✅Completed (accuracy, precision, recall, F1 score (harmonic mean of precision and recall), confusion matrix)
