This is the third article in a series. Links to previous articles: first, second
In this article, I will explain how to work with the Pandas library to create a Decision Tree.
3.1 Importing the library
# Import pandas under the conventional alias pd
import pandas as pd
3.2 DataFrame and Series
Pandas is built around two data structures, the DataFrame and the Series.
Let's take a look at the following Excel-like table.
The entire table is called a DataFrame, and its columns are the attributes of the data. A single row (or a single column) taken out of the table is a Series.
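As a rough illustration of the two structures, the sketch below builds a tiny table and pulls one row and one column out of it. The column names and values are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical three-row table; names and values are illustrative only.
df = pd.DataFrame({
    "Weather": ["sunny", "cloudy", "rainy"],
    "Golf": ["×", "○", "○"],
})

row = df.loc[0]        # one row, returned as a Series indexed by column name
col = df["Weather"]    # one column, returned as a Series indexed by row label

print(type(df).__name__)   # DataFrame
print(type(row).__name__)  # Series
print(type(col).__name__)  # Series
print(row["Golf"])         # ×
print(col.loc[1])          # cloudy
```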
3.3 Creating a DataFrame
An Excel spreadsheet is read with read_excel and written with ExcelWriter.
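# Read an Excel file located in the same folder as the notebook (.ipynb)
df0 = pd.read_excel("data_golf.xlsx")
# Display the DataFrame as HTML (メイリオ is the Meiryo font)
from IPython.display import HTML
html = "<div style='font-family:\"メイリオ\";'>"+df0.to_html()+"</div>"
HTML(html)
# Write to an Excel file (the with block closes the writer, so no explicit f.close() is needed)
with pd.ExcelWriter("data_golf2.xlsx") as f:
    df0.to_excel(f)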
Creating a DataFrame from a dictionary (associative array): each dictionary entry holds the values of one DataFrame column.
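# Dictionary: each key is a column name and each list holds that column's values
# (column names and values are shown in English)
d = {
    "Weather":["sunny","sunny","cloudy","rainy","rainy","rainy","cloudy","sunny","sunny","rainy","sunny","cloudy","cloudy","rainy"],
    "Temperature":["hot","hot","hot","warm","cool","cool","cool","warm","cool","warm","warm","warm","hot","warm"],
    "Humidity":["high","high","high","high","normal","normal","normal","high","normal","normal","normal","high","normal","high"],
    "Wind":["no","yes","no","no","no","yes","yes","no","no","no","yes","yes","no","yes"],
    "Golf":["×","×","○","○","○","×","○","×","○","○","○","○","○","×"],
}
df0 = pd.DataFrame(d)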
Creating a DataFrame from a list of lists: each inner list holds the values of one DataFrame row.
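# List of lists: each inner list is one row
# (column names and values are shown in English)
d = [["sunny","hot","high","no","×"],
     ["sunny","hot","high","yes","×"],
     ["cloudy","hot","high","no","○"],
     ["rainy","warm","high","no","○"],
     ["rainy","cool","normal","no","○"],
     ["rainy","cool","normal","yes","×"],
     ["cloudy","cool","normal","yes","○"],
     ["sunny","warm","high","no","×"],
     ["sunny","cool","normal","no","○"],
     ["rainy","warm","normal","no","○"],
     ["sunny","warm","normal","yes","○"],
     ["cloudy","warm","high","yes","○"],
     ["cloudy","hot","normal","no","○"],
     ["rainy","warm","high","yes","×"],
    ]
# columns sets the column names and index sets the row labels. If they are omitted, pandas numbers them from 0.
df0 = pd.DataFrame(d,columns=["Weather","Temperature","Humidity","Wind","Golf"],index=range(len(d)))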
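The two constructions describe the same table from different directions: the dictionary is organized by column, the list of lists by row. A minimal sketch, using a hypothetical two-column sample rather than the full table, suggests they produce identical DataFrames:

```python
import pandas as pd

# Hypothetical sample; names and values are illustrative only.
by_column = {
    "Weather": ["sunny", "cloudy", "rainy"],
    "Golf": ["×", "○", "○"],
}
by_row = [
    ["sunny", "×"],
    ["cloudy", "○"],
    ["rainy", "○"],
]

df_a = pd.DataFrame(by_column)                             # column-oriented input
df_b = pd.DataFrame(by_row, columns=["Weather", "Golf"])   # row-oriented input

# Both constructions give the same table.
print(df_a.equals(df_b))  # True
```

Which form is more convenient usually depends on how the raw data arrives: records read one at a time fit the row form, while per-attribute lists fit the dictionary form.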
3.4 Getting information from the table
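# Get basic information about the table
# Number of rows and columns
print(df0.shape) # (14, 5)
# Number of rows
print(df0.shape[0]) # 14
# Column names
print(df0.columns) # Index(['Weather', 'Temperature', 'Humidity', 'Wind', 'Golf'], dtype='object')
# Row labels (the index; for df0 they were assigned automatically, starting from 0)
print(df0.index) # RangeIndex(start=0, stop=14, step=1)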
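3.5 Retrieving values with loc and iloc
# Retrieve values
# Specify a row label and a column name to get a single value
# Weather of row No. 1 (the second row, since labels start at 0)
print(df0.loc[1,"Weather"]) # sunny
# Specify lists of row labels and column names
# Weather and Golf of rows 1, 2 and 4, returned as a DataFrame
df = df0.loc[[1,2,4],["Weather","Golf"]]
print(df)
# Output
#   Weather Golf
# 1   sunny    ×
# 2  cloudy    ○
# 4   rainy    ○
# iloc selects by integer position instead of by label. Positions start at 0.
# Rows 1 to 3, all columns except the last. In iloc the slice 1:4 excludes its end, so row 4 is not included.
df = df0.iloc[1:4,:-1]
print(df)
# Output
#   Weather Temperature Humidity Wind
# 1   sunny         hot     high  yes
# 2  cloudy         hot     high   no
# 3   rainy        warm     high   no
# Retrieve one row as a Series
# Get the first row. s is a Series
s = df0.iloc[0,:]
# As with a dictionary, a value is read with s["column name"]
print(s["Weather"]) # sunny
# Get all values as an array (numpy.ndarray).
print(df0.values)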
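With the default 0-based index, labels and positions coincide, which can hide the difference between loc and iloc. The sketch below uses a hypothetical table with a non-default index (10, 20, 30) to make the distinction visible; the names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with a non-default index; names and values are illustrative only.
df = pd.DataFrame(
    {"Weather": ["sunny", "cloudy", "rainy"], "Golf": ["×", "○", "○"]},
    index=[10, 20, 30],
)

print(df.loc[20, "Weather"])  # selects by label 20 -> cloudy
print(df.iloc[1, 0])          # selects by position (row 1, column 0) -> cloudy

# Label slices include the end point; position slices do not.
print(len(df.loc[10:20]))     # 2 (labels 10 and 20)
print(len(df.iloc[0:1]))      # 1 (position 0 only)
```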
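3.6 Looping over the data with iterrows and items
# Loop over the data
# Loop row by row.
for i,row in df0.iterrows():
    # i is the row label (index), row is a Series
    print(i,row)
# Loop column by column. iteritems was removed in pandas 2.0; items is its replacement.
for i,col in df0.items():
    # i is the column name, col is a Series
    print(i,col)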
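As a small usage sketch, a row loop can tally values by hand. The example below counts, for each weather value, how many rows are marked ○; the table is a hypothetical four-row sample and the `played` dictionary is a name introduced only for this illustration.

```python
import pandas as pd

# Hypothetical sample; names and values are illustrative only.
df = pd.DataFrame({
    "Weather": ["sunny", "sunny", "cloudy", "rainy"],
    "Golf": ["×", "○", "○", "○"],
})

played = {}
for _, row in df.iterrows():
    # row is a Series, so each value is read by column name
    if row["Golf"] == "○":
        played[row["Weather"]] = played.get(row["Weather"], 0) + 1

print(played)  # {'sunny': 1, 'cloudy': 1, 'rainy': 1}
```

Row-by-row loops are easy to read but tend to be slow on large tables, so they are best kept for small data or for logic that is hard to express otherwise.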
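3.7 Counting value frequencies with value_counts
# Count how often each value occurs
# Get the Weather column. s is a Series
s = df0.loc[:,"Weather"]
# Count the occurrences of each value
print(s.value_counts())
# Output (pandas 2.0 and later print "Name: count" instead)
# sunny     5
# rainy     5
# cloudy    4
# Name: Weather, dtype: int64
# For example, get the number of rows where the weather is sunny
print(s.value_counts()["sunny"]) # 5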
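When proportions are more useful than raw counts, value_counts accepts normalize=True. The following is a minimal sketch on a hypothetical four-value Series, not on the article's table.

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
s = pd.Series(["sunny", "sunny", "cloudy", "rainy"], name="Weather")

counts = s.value_counts()                 # absolute frequencies
shares = s.value_counts(normalize=True)   # relative frequencies that sum to 1

print(counts["sunny"])  # 2
print(shares["sunny"])  # 0.5
print(shares.sum())     # 1.0
```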
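3.8 Retrieving specific rows with query
# Retrieve rows that match a condition
# Rows where the weather is sunny
print(df0.query("Weather=='sunny'"))
# Output
#    Weather Temperature Humidity Wind Golf
# 0    sunny         hot     high   no    ×
# 1    sunny         hot     high  yes    ×
# 7    sunny        warm     high   no    ×
# 8    sunny        cool   normal   no    ○
# 10   sunny        warm   normal  yes    ○
# Rows where the weather is sunny and golf was played (○)
print(df0.query("Weather=='sunny' and Golf=='○'"))
# Output
#    Weather Temperature Humidity Wind Golf
# 8    sunny        cool   normal   no    ○
# 10   sunny        warm   normal  yes    ○
# Rows where the weather is sunny or golf was played (○)
print(df0.query("Weather=='sunny' or Golf=='○'"))
# Output
#    Weather Temperature Humidity Wind Golf
# 0    sunny         hot     high   no    ×
# 1    sunny         hot     high  yes    ×
# 2   cloudy         hot     high   no    ○
# 3    rainy        warm     high   no    ○
# 4    rainy        cool   normal   no    ○
# 6   cloudy        cool   normal  yes    ○
# 7    sunny        warm     high   no    ×
# 8    sunny        cool   normal   no    ○
# 9    rainy        warm   normal   no    ○
# 10   sunny        warm   normal  yes    ○
# 11  cloudy        warm     high  yes    ○
# 12  cloudy         hot   normal   no    ○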
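A query string is one way to filter rows; the same selection can also be written as a boolean mask, and a Python variable can be referenced inside the query string with the @ prefix. The sketch below compares the forms on a hypothetical four-row sample; the variable name `target` is introduced only for this example.

```python
import pandas as pd

# Hypothetical sample; names and values are illustrative only.
df = pd.DataFrame({
    "Weather": ["sunny", "sunny", "cloudy", "rainy"],
    "Golf": ["×", "○", "○", "×"],
})

# The same filter written as a query string and as a boolean mask
by_query = df.query("Weather == 'sunny' and Golf == '○'")
by_mask = df[(df["Weather"] == "sunny") & (df["Golf"] == "○")]
print(by_query.equals(by_mask))  # True

# A local variable can be used inside the query string with @
target = "sunny"
by_var = df.query("Weather == @target")
print(list(by_var.index))  # [0, 1]
```

The query form tends to read more naturally when several conditions are combined, while the mask form works with any column name, including names that are not valid Python identifiers.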
Thanks for reading!
We would be glad to hear whether you liked this article, whether the translation was clear, and whether it was useful to you.