EDA becomes easier with SWEETVIZ

Sweetviz is an open-source Python library that generates ready-to-view EDA reports with just two lines of code. It lets you quickly build a detailed report on every feature of a dataset. Sweetviz also supports target analysis, comparison of two datasets, comparison of two parts of one dataset split by a condition, and detection of correlations and associations, and it can save the report as an HTML file.





Using the library

Install the library with pip:





pip install sweetviz
      
      



Then import it:





import sweetviz as sv
      
      








Sweetviz works with DataFrame objects, so import pandas as well:





import pandas as pd
      
      



The three main functions, analyze(), compare() and compare_intra(), all take a DataFrame as input. To get a DataFrame, read the csv file with pd.read_csv():





df = pd.read_csv('PermitLog.csv')
      
      



As mentioned above, a report takes just 2 lines of code. Now let's move on to EDA.





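The analyze() function builds a report on the whole dataset. It takes a DataFrame (or, as here, a list with the DataFrame and a display name for the report; we use «Permit Logs»):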





report = sv.analyze([df, "Permit Logs"])
      
      



analyze() returns a DataframeReport object, which we store in report.





To get the HTML version, call show_html() on report. It takes the name of the output file as its argument:





report.show_html('common analysis.html')
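When the script runs on a remote machine or in a batch job, it can be convenient to write the report without opening a browser tab. The sketch below assumes the installed Sweetviz version accepts the `filepath`, `open_browser` and `layout` arguments of show_html(); the helper name `save_report` is hypothetical and not part of the library.

```python
def save_report(df, name, path):
    """Build a Sweetviz report for df and write it to path without opening a browser."""
    # Imported lazily so the helper can be defined even where sweetviz is missing.
    import sweetviz as sv

    report = sv.analyze([df, name])
    # open_browser=False only writes the file; layout='vertical' suits narrow screens.
    report.show_html(filepath=path, open_browser=False, layout='vertical')
```

With the DataFrame from above, a call could look like `save_report(df, 'Permit Logs', 'common analysis.html')`.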
      
      



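The report opens in the browser and shows a summary for every column of the dataset. For example, the «concept:name» column holds event names such as «Start trip», «End trip» and «Permit SUBMITTED by EMPLOYEE».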





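To look at the roles involved in each case, we need the unique combinations of «case:id» and «org:role», because the same pair repeats across many rows of the log. Build a DataFrame with only these two columns: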





# Keep only the two columns of interest, then drop repeated pairs.
df_unique_ids_roles = df[['case:id', 'org:role']].copy()
df_unique_ids_roles = df_unique_ids_roles.drop_duplicates()
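One detail is easy to miss here: drop_duplicates() returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a variable. A small illustration on a made-up log (the column names follow the article, the values are invented):

```python
import pandas as pd

# Toy event log; only the column names come from the article.
log = pd.DataFrame({
    'case:id': ['c1', 'c1', 'c1', 'c2', 'c2'],
    'org:role': ['EMPLOYEE', 'EMPLOYEE', 'SUPERVISOR', 'EMPLOYEE', 'EMPLOYEE'],
})

pairs = log[['case:id', 'org:role']]
pairs.drop_duplicates()                 # result discarded, pairs is unchanged
unique_pairs = pairs.drop_duplicates()  # result kept

print(len(pairs))         # 5
print(len(unique_pairs))  # 3
```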
      
      



Next, call analyze() and show_html(), naming the report «Unique ids and roles»:





report = sv.analyze([df_unique_ids_roles, 'Unique ids and roles'])
report.show_html('Unique ids and roles.html')
      
      



The report shows that «org:role» takes 8 values:





  • EMPLOYEE,
  • UNDEFINED,
  • SUPERVISOR,
  • ADMINISTRATION,
  • BUDGET OWNER,
  • PRE_APPROVER,
  • DIRECTOR,
  • MISSING.





The most frequent role is «EMPLOYEE».





Note also that 20 % of the values are «UNDEFINED». To see how the rows with this «org:role» value differ from the rest, Sweetviz provides compare_intra().





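compare_intra() splits one dataset into two groups by a condition and compares them. Here the first group contains the rows where «org:role» is «UNDEFINED», and the second group contains all other rows.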





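compare_intra() takes three arguments: the DataFrame, a boolean condition, and the names of the two groups. Like analyze(), it returns a DataframeReport, which we store in report and save as «Undefined role vs other.html» with show_html():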





report = sv.compare_intra(df, df["org:role"] == "UNDEFINED", ["Undefined role", "Other"])
report.show_html('Undefined role vs other.html')
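The condition passed to compare_intra() is an ordinary boolean pandas Series with one entry per row. As a rough sketch of the split (done here with plain pandas on invented data, not with Sweetviz itself), rows where the condition is True go to the first named group and the remaining rows to the second:

```python
import pandas as pd

# Invented rows; only the column names come from the article.
log = pd.DataFrame({
    'case:id': ['c1', 'c2', 'c3', 'c4', 'c5'],
    'org:role': ['UNDEFINED', 'EMPLOYEE', 'UNDEFINED', 'SUPERVISOR', 'EMPLOYEE'],
})

# Boolean Series: True marks rows for the "Undefined role" group.
condition = log['org:role'] == 'UNDEFINED'

undefined_rows = log[condition]
other_rows = log[~condition]

print(len(undefined_rows), len(other_rows))  # 2 3
print(round(condition.mean() * 100))         # 40 (share of True rows, in percent)
```

Checking the group sizes this way before building the report is a cheap guard against a condition that accidentally matches nothing.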
      
      



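The report compares the two groups feature by feature. The resource column («org:resource») takes the values «SYSTEM» and «STAFF MEMBER», and in «concept:name» the values «Request Payment» and «Payment Handled» deserve attention. This helps characterize the rows where «org:role» is «UNDEFINED».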





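compare_intra() is not the only comparison function in Sweetviz. To compare two separate datasets (for example, train and test), Sweetviz provides compare(). Here we use it to compare the data before 2018 with the data from 2018 onward.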





First, split the data into date_before_2018 and date_after_2018:





reg_exp = '201[67]'
mask = df['time:timestamp'].str.contains(reg_exp)
date_before_2018, date_after_2018 = df[mask], df[~mask]
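Matching years as text works when «2016» or «2017» can only appear in the year part of the string. An alternative sketch, assuming every value in «time:timestamp» can be parsed by pd.to_datetime, compares on the parsed year instead. Unlike the regular expression, it also puts any year earlier than 2016 into the first group. The sample values below are invented.

```python
import pandas as pd

# Invented timestamps; the real log format may differ.
log = pd.DataFrame({
    'time:timestamp': [
        '2016-11-03 09:15:00',
        '2017-06-21 14:02:10',
        '2018-01-05 08:00:00',
        '2018-03-12 20:17:45',
    ],
})

# Parse the strings once, then compare on the year instead of matching text.
timestamps = pd.to_datetime(log['time:timestamp'])
mask = timestamps.dt.year < 2018

before_2018, from_2018 = log[mask], log[~mask]
print(len(before_2018), len(from_2018))  # 2 2
```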
      
      



Here the mask is built with a regular expression: rows whose «time:timestamp» contains «2016» or «2017» go into date_before_2018, and all other rows go into date_after_2018.





compare() takes the two datasets, each paired with a display name. As before, the HTML report is produced with show_html().





report = sv.compare([date_before_2018, 'Before 2018'], [date_after_2018, 'After 2018'])
report.show_html('date.html')
      
      



Among the features compared in the report is «case:OrganizationalEntity».





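Now consider the «case:Overspent» column, which takes the values «True» and «False». Sweetviz can show how the other features relate to a target feature, in this case «case:Overspent». To do this, pass the target_feat argument to analyze():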





report = sv.analyze([df, 'Target: case:Overspent'], target_feat='case:Overspent')
report.show_html('target overspent.html')
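According to the Sweetviz documentation, only boolean and numerical features can serve as the target. pandas usually parses a clean True/False column as bool on its own, but if the column arrives as text, a conversion along these lines may be needed first. This is a sketch on invented data, assuming the only labels are «True» and «False»:

```python
import pandas as pd

# Invented sample where the target was read as text.
log = pd.DataFrame({'case:Overspent': ['True', 'False', 'False', 'True']})

# Map the text labels to real booleans; unexpected labels become missing values.
overspent = log['case:Overspent'].map({'True': True, 'False': False})
assert overspent.notna().all(), 'unexpected label in case:Overspent'
log['case:Overspent'] = overspent.astype(bool)

print(log['case:Overspent'].dtype)  # bool
print(log['case:Overspent'].sum())  # 2
```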
      
      



The report now shows how the other features relate to «case:Overspent».





Sweetviz is a powerful and useful library for preliminary data analysis: two lines of code produce a visual report that makes it easier to explore a large dataset and to spot dependencies and anomalies. Its three main functions, analyze(), compare() and compare_intra(), produce reports for full and targeted analysis and for comparing datasets or groups within one dataset.







