Let's write and understand the Decision Tree in Python from scratch! Part 3. The Pandas Data Analysis Library

Hello, Habr! Here is a translation of the article "Python で 0 からディシジョンツリーを作って理解する（3. データ分析ライブラリ Pandas 編）".

This is the third article in a series. Links to previous articles: first, second

In this article, I will explain how to work with the Pandas library to create a Decision Tree.

3.1 Importing the library

# Import pandas under the alias pd
import pandas as pd

3.2 DataFrame and Series

Pandas works with two main structures: the DataFrame and the Series.

Let's take a look at the following Excel-like table.

One row of data is called a Series, the columns are called the attributes of that data, and the entire table is called a DataFrame. A single column taken on its own is also a Series.
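The relationship between the two structures can be sketched on a small table. The column names and values below are hypothetical stand-ins, not the article's data.

```python
import pandas as pd

# Hypothetical three-row table; names and values are illustrative only.
df = pd.DataFrame({
    "weather": ["sunny", "cloudy", "rain"],
    "wind": ["none", "strong", "none"],
    "golf": ["no", "yes", "yes"],
})

row = df.loc[0]        # one row: a Series indexed by the column names
col = df["weather"]    # one column: a Series indexed by the row labels

print(type(df).__name__)    # DataFrame
print(type(row).__name__)   # Series
print(type(col).__name__)   # Series
print(row["golf"])          # no
```

A row Series is addressed by column name and a column Series by row label, which is why the same type can represent both.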

3.3 Creating a DataFrame

An Excel spreadsheet is read with read_excel and written with ExcelWriter.

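# Read the Excel file, which is in the same folder as the ipynb notebook
df0 = pd.read_excel("data_golf.xlsx")

# Display the DataFrame as an HTML table
from IPython.display import HTML
html = "<div style='font-family:\"メイリオ\";'>"+df0.to_html()+"</div>"
HTML(html)

# Save to an Excel file (the with block makes an explicit f.close unnecessary)
with pd.ExcelWriter("data_golf2.xlsx") as f:
    df0.to_excel(f)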

Creating a DataFrame from a dictionary (associative array): each dictionary entry holds the data of one DataFrame column.

# Create from a dictionary: the data is grouped by column
 
d = {
    "":["","","","","","","","","","","","","",""],
    "":["","","","","","","","","","","","","",""], 
    "":["","","","","","","","","","","","","",""],
    "":["","","","","","","","","","","","","",""],
    "":["×","×","○","○","○","×","○","×","○","○","○","○","○","×"],
}
df0 = pd.DataFrame(d)

Creating a DataFrame from arrays: each inner list holds the data of one DataFrame row.

# Create from arrays: the data is grouped by row
d = [["","","","","×"],
     ["","","","","×"],
     ["","","","","○"],
     ["","","","","○"],
     ["","","","","○"],
     ["","","","","×"],
     ["","","","","○"],
     ["","","","","×"],
     ["","","","","○"],
     ["","","","","○"],
     ["","","","","○"],
     ["","","","","○"],
     ["","","","","○"],
     ["","","","","×"],
    ]
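# Column names and row labels are set with columns and index. If they are omitted, sequential integers starting from 0 are assigned.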

df0 = pd.DataFrame(d,columns=["","","","",""],index=range(len(d)))
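As a minimal sketch of the two construction styles above, the following builds the same small table once column by column and once row by row. The names `by_column`, `by_row` and the data are hypothetical.

```python
import pandas as pd

# Column-oriented input: each key is a column name, each list is that column.
by_column = {
    "weather": ["sunny", "rain"],
    "golf": ["no", "yes"],
}

# Row-oriented input: each inner list is one row; column names are passed separately.
by_row = [
    ["sunny", "no"],
    ["rain", "yes"],
]

df_a = pd.DataFrame(by_column)
df_b = pd.DataFrame(by_row, columns=["weather", "golf"])

# Both routes give the same table.
print(df_a.equals(df_b))
```

Which form is more convenient depends on how the data arrives: records read one at a time suit the row form, while per-attribute lists suit the dictionary form.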

3.4 Getting information from the table

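# Getting information about the table

# Number of rows and columns
print(df0.shape) # Output: (14, 5)

# Number of rows
print(df0.shape[0]) # Output: 14

# Column names
print(df0.columns) # Output: Index(['', '', '', '', ''], dtype='object')

# Row labels (the index of df0 was assigned automatically)
print(df0.index) # Output: RangeIndex(start=0, stop=14, step=1)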
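The same attributes can be tried on a tiny table, which also shows what happens when columns and index are left out. The table contents and the labels `day1`, `day2` are hypothetical.

```python
import pandas as pd

rows = [
    ["sunny", "none", "no"],
    ["cloudy", "strong", "yes"],
]

# Explicit column names and row labels.
named = pd.DataFrame(rows, columns=["weather", "wind", "golf"], index=["day1", "day2"])
# Nothing specified: pandas numbers rows and columns from 0.
default = pd.DataFrame(rows)

n_rows, n_cols = named.shape
print(n_rows, n_cols)          # 2 3
print(len(named))              # 2, same as named.shape[0]
print(list(named.columns))     # ['weather', 'wind', 'golf']
print(list(named.index))       # ['day1', 'day2']
print(list(default.columns))   # [0, 1, 2]
print(list(default.index))     # [0, 1]
```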

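3.5 Retrieving values with loc and iloc

# Getting values

# Get a single value by specifying a row label and a column name
# Here: the value of one column in row 1
print(df0.loc[1,""])

# Get several values by specifying lists of row labels and column names
# Here: two columns of rows 1, 2 and 4, returned as a DataFrame
df = df0.loc[[1,2,4],["",""]]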
print(df)
# Output:
#                    
# 1                               ×
# 2                        ○
# 4                            ○

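# iloc selects rows and columns by integer position. Positions start at 0.
# Get rows 1 to 3 and every column except the last. Note that in iloc the slice 1:4 does not include row 4.
df = df0.iloc[1:4,:-1]
print(df)
# Output: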
#                
# 1                              
# 2                    
# 3                        


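# Get a single row (Series)
# Get the first row. s is a Series
s = df0.iloc[0,:]
# A value in the row can then be read by column name, as in s["column name"]
print(s[""])

# Get all values as an array (numpy.ndarray).
print(df0.values)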
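The difference between label-based loc and position-based iloc is easiest to see when the row labels are not 0, 1, 2. The sketch below uses a hypothetical table whose labels are 10, 20, 30.

```python
import pandas as pd

# Hypothetical table with non-default row labels, to separate labels from positions.
df = pd.DataFrame(
    {"weather": ["sunny", "cloudy", "rain"], "golf": ["no", "yes", "yes"]},
    index=[10, 20, 30],
)

by_label = df.loc[20, "golf"]     # row labelled 20, column "golf"
by_position = df.iloc[1, 1]       # second row, second column: the same cell

label_slice = df.loc[10:20]       # label slices include the end label
position_slice = df.iloc[0:2]     # position slices exclude the end position

print(by_label, by_position)                    # yes yes
print(len(label_slice), len(position_slice))    # 2 2
print(df.iloc[0:2, :-1].columns.tolist())       # ['weather']
```

With the default RangeIndex used in the article, labels and positions coincide, so the two accessors differ mainly in how slices treat their end point.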

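3.6 Looping through the data with iterrows and items (iteritems in older pandas)

# Looping over the data
# Loop over the rows, one row at a time.
for i,row in df0.iterrows():
    # i is the row label (index), row is a Series
    print(i,row)

# Loop over the columns, one column at a time.
# DataFrame.iteritems was removed in pandas 2.0; items is the equivalent method.
for i,col in df0.items():
    # i is the column name, col is a Series
    print(i,col)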
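A compact sketch of both loops on a hypothetical two-row table, collecting what each iteration yields instead of printing whole Series objects:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"weather": ["sunny", "rain"], "golf": ["no", "yes"]})

# Row-wise: each step yields (row label, Series of that row).
seen_rows = []
for label, row in df.iterrows():
    seen_rows.append((label, row["weather"], row["golf"]))

# Column-wise: each step yields (column name, Series of that column).
seen_columns = []
for name, col in df.items():
    seen_columns.append((name, col.tolist()))

print(seen_rows)      # [(0, 'sunny', 'no'), (1, 'rain', 'yes')]
print(seen_columns)   # [('weather', ['sunny', 'rain']), ('golf', ['no', 'yes'])]
```

iterrows builds a new Series for every row, so it is convenient for small tables like this one but slow on large ones.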

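3.7 Counting value frequencies with value_counts

# Frequency of values
# Get all rows of one column. s is a Series
s = df0.loc[:,""]

# Count how many times each value occurs
print(s.value_counts())
# Output: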
#     5
#     5
#     4
# Name: , dtype: int64

# For example, the number of occurrences of one particular value
print(s.value_counts()[""]) # Output: 5
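As an illustration with hypothetical values, value_counts returns a Series whose index holds the distinct values and whose data holds their counts, sorted from most to least frequent:

```python
import pandas as pd

# Hypothetical column, for illustration only.
s = pd.Series(["sunny", "rain", "sunny", "cloudy", "sunny", "rain"], name="weather")

counts = s.value_counts()
print(counts["sunny"])           # 3
print(counts.index[0])           # sunny (the most frequent value comes first)
print(counts.get("snow", 0))     # 0 (a value that never occurs, with a default)

# normalize=True returns relative frequencies instead of counts.
shares = s.value_counts(normalize=True)
print(shares["sunny"])           # 0.5
```

Using get with a default avoids a KeyError when asking for a value that does not appear in the column.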

3.8 Retrieving specific data with query

# Extracting data
# Get the rows where a column equals a given value.
print(df0.query("==''"))
# Output:
#                    
# 0                            ×
# 1                            ×
# 7                            ×
# 8                            ○
# 10                                ○

# Get the rows where one column equals a given value and another column equals ○
print(df0.query("=='' and =='○'"))
# Output:
#                    
# 8                            ○
# 10                                ○

# Get the rows where one column equals a given value or another column equals ○
print(df0.query("=='' or =='○'"))
# Output:
#                    
# 0                            ×
# 1                            ×
# 2                        ○
# 3                            ○
# 4                            ○
# 6                        ○
# 7                            ×
# 8                            ○
# 9                                ○
# 10                                ○
# 11                            ○
# 12                                ○
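The query calls above can be compared with ordinary boolean indexing. The sketch below uses a hypothetical four-row table and shows that both forms select the same rows and keep the original row labels.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "weather": ["sunny", "sunny", "rain", "cloudy"],
    "golf": ["no", "yes", "no", "yes"],
})

# Conditions written as a query string.
both = df.query("weather == 'sunny' and golf == 'yes'")
either = df.query("weather == 'sunny' or golf == 'yes'")

# The same conditions written as boolean masks.
both_mask = df[(df["weather"] == "sunny") & (df["golf"] == "yes")]
either_mask = df[(df["weather"] == "sunny") | (df["golf"] == "yes")]

# A Python variable can be referenced inside the query string with @.
target = "sunny"
by_variable = df.query("weather == @target")

print(list(both.index))          # [1]
print(list(either.index))        # [0, 1, 3]
print(both.equals(both_mask))    # True
print(list(by_variable.index))   # [0, 1]
```

The result keeps the row labels of the source table, which is why the outputs in this section show labels such as 8 and 10 rather than renumbered rows.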

Thanks for reading!

We would be very glad to hear whether you liked this article, whether the translation was clear, and whether it was useful to you.
