Let's write and understand a Decision Tree in Python from scratch! Part 3. The Pandas data analysis library

Hello, Habr! Here is a translation of the article "Python で 0 からディシジョンツリーを作って理解する (3. データ分析ライブラリ Pandas 編)".



This is the third article in a series. Links to previous articles: first, second



In this article, I will explain how to work with the Pandas library to create a Decision Tree.



3.1 Importing the library



# Import pandas and refer to it as pd from here on
import pandas as pd


3.2 DataFrame and Series



Pandas is built around two data structures: DataFrame and Series.

Let's take a look at the following Excel-like table.



One row of data is called a Series, the columns are the attributes of that data, and the entire table is called a DataFrame.
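As a minimal sketch of these terms, the snippet below builds a tiny table and checks which pandas type each piece has. The column names and values are illustrative assumptions, not the article's data. It also shows that a single column, selected on its own, is a Series too.

```python
import pandas as pd

# Illustrative table; names and values are made up for this sketch
df = pd.DataFrame({"Weather": ["Sunny", "Rain"], "Golf": ["×", "○"]})

row = df.iloc[0]     # one row of the table
col = df["Weather"]  # one column of the table

print(type(df).__name__)   # DataFrame
print(type(row).__name__)  # Series
print(type(col).__name__)  # Series
print(list(row.index))     # a row is labelled by the column names
```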





3.3 Creating a DataFrame



An Excel spreadsheet is read with read_excel and written with ExcelWriter.

# Read the Excel file, located in the same folder as the ipynb notebook
df0 = pd.read_excel("data_golf.xlsx")

# Display the DataFrame as an HTML table
from IPython.display import HTML
html = "<div style='font-family:\"メイリオ\";'>"+df0.to_html()+"</div>"
HTML(html)

# Save to an Excel file (inside with there is no need to call f.close)
with pd.ExcelWriter("data_golf2.xlsx") as f:
    df0.to_excel(f)


Creating a DataFrame from a dictionary (associative array): the dictionary groups the data by DataFrame column



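# Creating from a dictionary: the data is grouped by column
# (labels are English stand-ins, assumed from the classic golf dataset)

d = {
    "Weather":["Sunny","Sunny","Cloudy","Rain","Rain","Rain","Cloudy","Sunny","Sunny","Rain","Sunny","Cloudy","Cloudy","Rain"],
    "Temperature":["Hot","Hot","Hot","Warm","Cool","Cool","Cool","Warm","Cool","Warm","Warm","Warm","Hot","Warm"],
    "Humidity":["High","High","High","High","Normal","Normal","Normal","High","Normal","Normal","Normal","High","Normal","High"],
    "Wind":["No","Yes","No","No","No","Yes","Yes","No","No","No","Yes","Yes","No","Yes"],
    "Golf":["×","×","○","○","○","×","○","×","○","○","○","○","○","×"],
}
df0 = pd.DataFrame(d)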


Creating a DataFrame from an array: the array groups the data by DataFrame row



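# Creating from an array: the data is grouped by row
# (labels are English stand-ins, assumed from the classic golf dataset)
d = [["Sunny","Hot","High","No","×"],
     ["Sunny","Hot","High","Yes","×"],
     ["Cloudy","Hot","High","No","○"],
     ["Rain","Warm","High","No","○"],
     ["Rain","Cool","Normal","No","○"],
     ["Rain","Cool","Normal","Yes","×"],
     ["Cloudy","Cool","Normal","Yes","○"],
     ["Sunny","Warm","High","No","×"],
     ["Sunny","Cool","Normal","No","○"],
     ["Rain","Warm","Normal","No","○"],
     ["Sunny","Warm","Normal","Yes","○"],
     ["Cloudy","Warm","High","Yes","○"],
     ["Cloudy","Hot","Normal","No","○"],
     ["Rain","Warm","High","Yes","×"],
    ]
# Column names and row labels are passed through columns and index. If they are omitted, numbering from 0 is used.

df0 = pd.DataFrame(d,columns=["Weather","Temperature","Humidity","Wind","Golf"],index=range(len(d)))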
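The two constructions above describe the same table from different directions. As a small sketch (with shortened, illustrative data rather than the full golf table), the following checks that a dictionary of columns and a list of rows produce identical DataFrames.

```python
import pandas as pd

# Illustrative data; names and values are assumptions for this sketch
by_column = {"Weather": ["Sunny", "Rain", "Cloudy"], "Golf": ["×", "○", "○"]}
by_row = [["Sunny", "×"], ["Rain", "○"], ["Cloudy", "○"]]

df_a = pd.DataFrame(by_column)                            # keys become column names
df_b = pd.DataFrame(by_row, columns=["Weather", "Golf"])  # names given explicitly

print(df_a.equals(df_b))  # both describe the same table
print(df_a.shape)
```

The dictionary form is convenient when the data is already collected per attribute; the list-of-rows form reads more like the spreadsheet itself.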


3.4 Getting information from the table



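# Getting information about the table

# Number of rows and columns
print(df0.shape) # Output: (14, 5)

# Number of rows
print(df0.shape[0]) # Output: 14

# Column names
print(df0.columns) # Output: Index(['Weather', 'Temperature', 'Humidity', 'Wind', 'Golf'], dtype='object')

# Index (here the index of df0 is a plain row number)
print(df0.index) # Output: RangeIndex(start=0, stop=14, step=1)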


3.5 Retrieving values with loc and iloc



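# Retrieving values

# loc selects by row label and column name
# Value of the "Weather" column in row No. 1
print(df0.loc[1,"Weather"]) # Output: Sunny

# Lists of labels select several rows and columns at once
# Rows 1, 2 and 4 of the "Weather" and "Golf" columns, returned as a DataFrame
df = df0.loc[[1,2,4],["Weather","Golf"]]
print(df)
# Output:
#    Weather  Golf
# 1    Sunny     ×
# 2   Cloudy     ○
# 4     Rain     ○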

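# iloc selects by position. Positions start at 0.
# Rows 1 to 3, all columns except the last. In iloc the end of a slice is excluded, so 1:4 does not include row 4.
df = df0.iloc[1:4,:-1]
print(df)
# Output:
#    Weather  Temperature  Humidity  Wind
# 1    Sunny          Hot      High   Yes
# 2   Cloudy          Hot      High    No
# 3     Rain         Warm      High    No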


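# Retrieving a single row (Series)
# Take the first row of the table. s is a Series
s = df0.iloc[0,:]
# As with a dictionary, a value is retrieved by column name: s["column name"]
print(s["Weather"]) # Output: Sunny

# All values of the table as an array (numpy.ndarray).
print(df0.values)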
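The difference between label-based and position-based selection is easiest to see side by side. The sketch below uses a small illustrative table (its names and values are assumptions) to show that a loc slice includes its end label while an iloc slice excludes its end position, and that labels and positions stop coinciding once rows are filtered out.

```python
import pandas as pd

# Illustrative table; names and values are made up for this sketch
df = pd.DataFrame({
    "Weather": ["Sunny", "Rain", "Cloudy", "Rain", "Sunny"],
    "Golf": ["×", "○", "○", "×", "○"],
})

print(len(df.loc[1:3]))   # label slice includes the end label: 3 rows
print(len(df.iloc[1:3]))  # position slice excludes the end: 2 rows

# After filtering, the remaining rows keep their original labels
played = df[df["Golf"] == "○"]
print(list(played.index))         # [1, 2, 4]
print(played.iloc[0]["Weather"])  # first remaining row: Rain
print(played.loc[4, "Weather"])   # row labelled 4: Sunny
```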


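3.6 Looping over the data with iterrows and items (formerly iteritems)



# Looping over the data
# Loop over the rows. Each row is handled in turn.
for i,row in df0.iterrows():
    # i is the row label (index), row is a Series
    print(i,row)

# Loop over the columns. Each column is handled in turn.
# iteritems was removed in pandas 2.0; items is its replacement.
for i,col in df0.items():
    # i is the column name, col is a Series
    print(i,col)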
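As a hedged illustration of what these loops are useful for, the sketch below collects row labels that satisfy a condition and lists the distinct values of each column. The table is illustrative, and the column loop uses items(), the method current pandas releases provide for iterating over columns.

```python
import pandas as pd

# Illustrative table; names and values are made up for this sketch
df = pd.DataFrame({
    "Weather": ["Sunny", "Rain", "Cloudy", "Rain"],
    "Golf": ["×", "○", "○", "×"],
})

# Row by row: collect the labels of the rows where Golf is "○"
played = [i for i, row in df.iterrows() if row["Golf"] == "○"]
print(played)  # [1, 2]

# Column by column: list the distinct values of each column
distinct = {name: sorted(col.unique()) for name, col in df.items()}
print(distinct["Weather"])  # ['Cloudy', 'Rain', 'Sunny']
```

Explicit loops like these are fine for small tables; for larger ones, the vectorized tools in the following sections are usually the better fit.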


3.7 Counting value frequencies with value_counts



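# Frequency of values
# Take the "Weather" column. s is a Series
s = df0.loc[:,"Weather"]

# Count how many times each value occurs
print(s.value_counts())
# Output:
# Sunny     5
# Rain      5
# Cloudy    4
# Name: Weather, dtype: int64

# For example, the number of rows where the weather is "Sunny"
print(s.value_counts()["Sunny"]) # Output: 5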
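Beyond raw counts, value_counts can also return proportions through its normalize parameter. The sketch below runs on a short illustrative Series (not the article's data) and also shows a safe lookup for a value that never occurs.

```python
import pandas as pd

# Illustrative Series; values are made up for this sketch
s = pd.Series(["Sunny", "Sunny", "Rain", "Cloudy", "Rain", "Sunny"], name="Weather")

counts = s.value_counts()                # occurrences of each value
shares = s.value_counts(normalize=True)  # the same, as fractions of the total

print(counts["Sunny"])        # 3
print(shares["Sunny"])        # 0.5
print(counts.index[0])        # most frequent value: Sunny
print(counts.get("Snow", 0))  # 0, because "Snow" never occurs
```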


3.8 Retrieving specific data with query



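# Retrieving specific data
# Rows where the weather is "Sunny".
print(df0.query("Weather=='Sunny'"))
# Output:
#     Weather  Temperature  Humidity  Wind  Golf
# 0     Sunny          Hot      High    No     ×
# 1     Sunny          Hot      High   Yes     ×
# 7     Sunny         Warm      High    No     ×
# 8     Sunny         Cool    Normal    No     ○
# 10    Sunny         Warm    Normal   Yes     ○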

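# Rows where the weather is "Sunny" and golf was played
print(df0.query("Weather=='Sunny' and Golf=='○'"))
# Output:
#     Weather  Temperature  Humidity  Wind  Golf
# 8     Sunny         Cool    Normal    No     ○
# 10    Sunny         Warm    Normal   Yes     ○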

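# Rows where the weather is "Sunny" or golf was played
print(df0.query("Weather=='Sunny' or Golf=='○'"))
# Output:
#     Weather  Temperature  Humidity  Wind  Golf
# 0     Sunny          Hot      High    No     ×
# 1     Sunny          Hot      High   Yes     ×
# 2    Cloudy          Hot      High    No     ○
# 3      Rain         Warm      High    No     ○
# 4      Rain         Cool    Normal    No     ○
# 6    Cloudy         Cool    Normal   Yes     ○
# 7     Sunny         Warm      High    No     ×
# 8     Sunny         Cool    Normal    No     ○
# 9      Rain         Warm    Normal    No     ○
# 10    Sunny         Warm    Normal   Yes     ○
# 11   Cloudy         Warm      High   Yes     ○
# 12   Cloudy          Hot    Normal    No     ○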
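A query string is one of two common ways to filter rows; the other is a boolean mask. The sketch below, on an illustrative table with assumed names and values, checks that both give the same result and shows how a Python variable can be referenced inside a query with the @ prefix.

```python
import pandas as pd

# Illustrative table; names and values are made up for this sketch
df = pd.DataFrame({
    "Weather": ["Sunny", "Rain", "Cloudy", "Sunny"],
    "Golf": ["×", "○", "○", "○"],
})

# The same filter written as a query string and as a boolean mask
by_query = df.query("Weather == 'Sunny' and Golf == '○'")
by_mask = df[(df["Weather"] == "Sunny") & (df["Golf"] == "○")]
print(by_query.equals(by_mask))
print(list(by_query.index))  # [3]

# A Python variable is referenced inside the query string with @
wanted = "Rain"
print(list(df.query("Weather == @wanted").index))  # [1]
```

The query form tends to read better for compound conditions, while masks are handy when the condition is built up programmatically.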


Thanks for reading!



We would be very glad to hear whether you liked this article, whether the translation was clear, and whether it was useful to you.


