Overview

Dataset statistics

Number of variables: 47
Number of observations: 10000
Missing cells: 157882
Missing cells (%): 33.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.9 MiB
Average record size in memory: 414.0 B
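The missing-cell percentage is computed over every cell in the table (variables × observations): 157882 / (47 × 10000) ≈ 33.6%. Below is a minimal plain-Python sketch of that count. It assumes the rows are a list of dicts in which `None` (or an absent key) marks a missing cell. The helper `missing_cell_stats` is hypothetical and is not a ydata-profiling function.

```python
from typing import Any, Dict, List, Optional, Tuple


def missing_cell_stats(
    records: List[Dict[str, Optional[Any]]], columns: List[str]
) -> Tuple[int, float]:
    """Return (missing_cells, missing_pct) over all columns x rows.

    A cell counts as missing if its key is absent or its value is None
    (a convention assumed for this sketch).
    """
    total = len(records) * len(columns)
    missing = sum(1 for row in records for col in columns if row.get(col) is None)
    pct = 100.0 * missing / total if total else 0.0
    return missing, pct


# Cross-check against the report: 157882 missing cells out of 47 x 10000.
print(round(100 * 157882 / (47 * 10000), 1))  # 33.6
```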

Variable types

Numeric: 15
Categorical: 13
Text: 7
Unsupported: 10
DateTime: 1
Boolean: 1

Dataset

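Description: 6270000_대구광역시_07_22_03_P_건강기능식품일반판매업_4월 (Daegu Metropolitan City, health functional food general retail business, April)
Author: 대구광역시 (Daegu Metropolitan City)
URL: http://data.daegu.go.kr/open/data/dataView.do?dataSetId=DMI_0000084950&dataSetDetailId=DDI_0000084971&provdMethod=FILE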

Alerts

개방서비스명 (open service name) has constant value "건강기능식품일반판매업" [Constant]
개방서비스ID (open service ID) has constant value "07_22_03_P" [Constant]
다중이용업소여부 (multi-use establishment flag) has a constant value [Constant]
남성종사자수 (number of male workers) is highly imbalanced (98.3%) [Imbalance]
여성종사자수 (number of female workers) is highly imbalanced (98.3%) [Imbalance]
급수시설구분명 (water supply facility type) is highly imbalanced (66.2%) [Imbalance]
공장생산직종업원수 (factory production employees) is highly imbalanced (62.0%) [Imbalance]
인허가취소일자 (license cancellation date) has 10000 (100.0%) missing values [Missing]
폐업일자 (closure date) has 2538 (25.4%) missing values [Missing]
휴업시작일자 (suspension start date) has 10000 (100.0%) missing values [Missing]
휴업종료일자 (suspension end date) has 10000 (100.0%) missing values [Missing]
재개업일자 (reopening date) has 10000 (100.0%) missing values [Missing]
소재지전화 (location phone number) has 4628 (46.3%) missing values [Missing]
소재지면적 (location floor area) has 3655 (36.5%) missing values [Missing]
도로명전체주소 (full road-name address) has 2289 (22.9%) missing values [Missing]
도로명우편번호 (road-name postal code) has 2362 (23.6%) missing values [Missing]
업태구분명 (business category name) has 10000 (100.0%) missing values [Missing]
좌표정보(X) (X coordinate) has 195 (1.9%) missing values [Missing]
좌표정보(Y) (Y coordinate) has 195 (1.9%) missing values [Missing]
영업장주변구분명 (business surroundings type) has 10000 (100.0%) missing values [Missing]
등급구분명 (grade type) has 10000 (100.0%) missing values [Missing]
총종업원수 (total employees) has 10000 (100.0%) missing values [Missing]
본사종업원수 (headquarters employees) has 4015 (40.2%) missing values [Missing]
공장사무직종업원수 (factory office employees) has 4016 (40.2%) missing values [Missing]
공장판매직종업원수 (factory sales employees) has 4014 (40.1%) missing values [Missing]
보증액 (deposit amount) has 9955 (99.6%) missing values [Missing]
월세액 (monthly rent) has 9954 (99.5%) missing values [Missing]
전통업소지정번호 (traditional business designation number) has 10000 (100.0%) missing values [Missing]
전통업소주된음식 (traditional business main food) has 10000 (100.0%) missing values [Missing]
홈페이지 (homepage) has 9972 (99.7%) missing values [Missing]
소재지우편번호 (location postal code) is highly skewed (γ1 = -43.08182548) [Skewed]
본사종업원수 is highly skewed (γ1 = 21.58804641) [Skewed]
공장사무직종업원수 is highly skewed (γ1 = 21.08179218) [Skewed]
시설총규모 (total facility size) is highly skewed (γ1 = 36.43189737) [Skewed]
번호 (row number) has unique values [Unique]
관리번호 (management number) has unique values [Unique]
인허가취소일자 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
휴업시작일자 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
휴업종료일자 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
재개업일자 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
업태구분명 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
영업장주변구분명 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
등급구분명 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
총종업원수 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
전통업소지정번호 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
전통업소주된음식 is an unsupported type; check if it needs cleaning or further analysis [Unsupported]
본사종업원수 has 5946 (59.5%) zeros [Zeros]
공장사무직종업원수 has 5733 (57.3%) zeros [Zeros]
공장판매직종업원수 has 5163 (51.6%) zeros [Zeros]
시설총규모 has 9840 (98.4%) zeros [Zeros]
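The Constant and 100% Missing alerts identify columns that carry no information for analysis. Below is a minimal plain-Python sketch of both checks. It assumes the table is stored as a dict mapping each column name to a list of values, with `None` marking a missing cell. `columns_to_drop` is a hypothetical helper and is not part of ydata-profiling.

```python
from typing import Any, Dict, List


def columns_to_drop(table: Dict[str, List[Any]]) -> Dict[str, str]:
    """Flag columns that are entirely missing or hold a single distinct value.

    `table` maps column name -> list of cell values; None marks a missing cell.
    Values must be hashable so they can be placed in a set.
    """
    reasons: Dict[str, str] = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if not present:
            reasons[name] = "all missing"
        elif len(set(present)) == 1:
            # One distinct non-missing value: no variation to analyse.
            reasons[name] = "constant"
    return reasons


# Toy table shaped like three columns of this dataset.
table = {
    "개방서비스ID": ["07_22_03_P"] * 3,
    "휴업시작일자": [None] * 3,
    "번호": [1, 3, 4],
}
print(columns_to_drop(table))
```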

Reproduction

Analysis started: 2023-12-10 19:18:05.110120
Analysis finished: 2023-12-10 19:18:08.731663
Duration: 3.62 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

번호 (row number)
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6689.123
Minimum: 1
Maximum: 13379
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 675.9
Q1: 3356.75
Median: 6681
Q3: 10001.25
95th percentile: 12715.05
Maximum: 13379
Range: 13378
Interquartile range (IQR): 6644.5
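The quartiles are fractional even though the IDs are integers (e.g. Q1 = 3356.75). This pattern fits linear interpolation between sorted values, which is the default in pandas' `Series.quantile`; whether ydata-profiling uses that default is an assumption here. Below is a plain-Python sketch of the rule and of the IQR (Q3 − Q1).

```python
import math
from typing import Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation at position (n - 1) * q of the sorted data."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def iqr(values: Sequence[float]) -> float:
    """Interquartile range: Q3 - Q1."""
    return linear_quantile(values, 0.75) - linear_quantile(values, 0.25)


data = [1, 2, 3, 4]
print(linear_quantile(data, 0.25), linear_quantile(data, 0.75), iqr(data))  # 1.75 3.25 1.5

# The reported IQR for 번호 is Q3 - Q1.
print(10001.25 - 3356.75)  # 6644.5
```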

Descriptive statistics

Standard deviation: 3852.0761
Coefficient of variation (CV): 0.57587162
Kurtosis: -1.1926845
Mean: 6689.123
Median Absolute Deviation (MAD): 3322.5
Skewness: -0.00021480162
Sum: 66891230
Variance: 14838490
Monotonicity: Not monotonic
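Several of these statistics are derived from others. The coefficient of variation is the standard deviation divided by the mean. The variance is the squared standard deviation. The MAD is the median of the absolute deviations from the median. The sketch below checks the reported 번호 figures and then shows the MAD rule on toy data. It assumes nothing beyond the numbers listed above; the toy list is illustrative only.

```python
import statistics


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])


# Reported figures for 번호
mean, std = 6689.123, 3852.0761
cv = std / mean
variance = std ** 2
print(round(cv, 8))     # 0.57587162
print(round(variance))  # 14838490

# Toy data: median 3, absolute deviations [2, 1, 0, 1, 97], MAD = 1
print(mad([1, 2, 3, 4, 100]))  # 1
```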
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value                Count  Frequency (%)
12577                1      < 0.1%
12471                1      < 0.1%
12045                1      < 0.1%
4938                 1      < 0.1%
8303                 1      < 0.1%
9759                 1      < 0.1%
8393                 1      < 0.1%
13196                1      < 0.1%
5420                 1      < 0.1%
6463                 1      < 0.1%
Other values (9990)  9990   99.9%

Extreme values: minimum
Value  Count  Frequency (%)
1      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%
11     1      < 0.1%
12     1      < 0.1%

Extreme values: maximum
Value  Count  Frequency (%)
13379  1      < 0.1%
13378  1      < 0.1%
13377  1      < 0.1%
13376  1      < 0.1%
13375  1      < 0.1%
13373  1      < 0.1%
13370  1      < 0.1%
13368  1      < 0.1%
13366  1      < 0.1%
13365  1      < 0.1%

개방서비스명 (open service name)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
건강기능식품일반판매업 (health functional food general retail business): 10000

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 건강기능식품일반판매업
2nd row: 건강기능식품일반판매업
3rd row: 건강기능식품일반판매업
4th row: 건강기능식품일반판매업
5th row: 건강기능식품일반판매업

Common Values

Value                   Count  Frequency (%)
건강기능식품일반판매업  10000  100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

개방서비스ID (open service ID)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
07_22_03_P: 10000

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 07_22_03_P
2nd row: 07_22_03_P
3rd row: 07_22_03_P
4th row: 07_22_03_P
5th row: 07_22_03_P

Common Values

Value       Count  Frequency (%)
07_22_03_P  10000  100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

개방자치단체코드 (open local government code)
Real number (ℝ)

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3448096
Minimum: 3410000
Maximum: 3480000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 3410000
5th percentile: 3410000
Q1: 3430000
Median: 3450000
Q3: 3470000
95th percentile: 3480000
Maximum: 3480000
Range: 70000
Interquartile range (IQR): 40000

Descriptive statistics

Standard deviation: 21106.381
Coefficient of variation (CV): 0.0061211699
Kurtosis: -1.0817732
Mean: 3448096
Median Absolute Deviation (MAD): 20000
Skewness: -0.41454521
Sum: 3.448096 × 10^10
Variance: 4.4547933 × 10^8
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=8)]

Common values
Value    Count  Frequency (%)
3470000  2173   21.7%
3460000  1909   19.1%
3450000  1674   16.7%
3420000  1365   13.7%
3410000  851    8.5%
3440000  776    7.8%
3430000  728    7.3%
3480000  524    5.2%

Extreme values: minimum
Value    Count  Frequency (%)
3410000  851    8.5%
3420000  1365   13.7%
3430000  728    7.3%
3440000  776    7.8%
3450000  1674   16.7%
3460000  1909   19.1%
3470000  2173   21.7%
3480000  524    5.2%

Extreme values: maximum
Value    Count  Frequency (%)
3480000  524    5.2%
3470000  2173   21.7%
3460000  1909   19.1%
3450000  1674   16.7%
3440000  776    7.8%
3430000  728    7.3%
3420000  1365   13.7%
3410000  851    8.5%
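Although 개방자치단체코드 is profiled as a real number, it takes only eight values and works as a category label. Its mean of 3448096 is just a count-weighted average of codes and has no meaning as a code. The sketch below rebuilds the frequency table from the counts above and confirms that arithmetic. Casting the column to a string or categorical type before profiling would avoid the numeric summary; this is a suggestion, not something the report itself does.

```python
from collections import Counter

# Counts per local-government code, copied from the tables above.
code_counts = Counter({
    "3410000": 851, "3420000": 1365, "3430000": 728, "3440000": 776,
    "3450000": 1674, "3460000": 1909, "3470000": 2173, "3480000": 524,
})

total = sum(code_counts.values())

# Treated as categories, the useful summary is each code's share.
for code, count in code_counts.most_common():
    print(f"{code}  {count:5d}  {100 * count / total:.1f}%")

# The numeric "mean" the profiler reports is only this weighted average.
weighted_mean = sum(int(code) * n for code, n in code_counts.items()) / total
print(weighted_mean)  # 3448096.0
```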

관리번호 (management number)
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 22
Mean length: 22
Min length: 22

Characters and Unicode

Total characters: 220000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 3470000-134-2019-00091
2nd row: 3450000-134-2017-00065
3rd row: 3440000-134-2014-00027
4th row: 3480000-134-2015-00018
5th row: 3460000-134-2009-00038

Common values
Value                   Count  Frequency (%)
3470000-134-2019-00091  1      < 0.1%
3480000-134-2009-00006  1      < 0.1%
3440000-134-2016-00003  1      < 0.1%
3410000-134-2009-00008  1      < 0.1%
3470000-134-2019-00069  1      < 0.1%
3440000-134-2019-00028  1      < 0.1%
3460000-134-2010-00119  1      < 0.1%
3470000-134-2010-00031  1      < 0.1%
3460000-134-2009-00094  1      < 0.1%
3470000-134-2016-00008  1      < 0.1%
Other values (9990)     9990   99.9%
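Every 관리번호 value has the same 22-character shape: four hyphen-separated digit groups of lengths 7, 3, 4 and 5. In the samples above, the first group matches a 개방자치단체코드 value and the third looks like a four-digit year. These readings are inferred from the samples and are not documented field definitions. The sketch below validates the format and splits it into fields. The names `MgmtNo` and `parse_mgmt_no` and the field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Four digit groups of lengths 7, 3, 4 and 5, separated by hyphens.
MGMT_NO = re.compile(r"(\d{7})-(\d{3})-(\d{4})-(\d{5})")


class MgmtNo(NamedTuple):
    region_code: str  # matches 개방자치단체코드 in the samples (inferred)
    group: str        # "134" in every sample shown (meaning unknown)
    year: int         # plausibly a registration year (inferred)
    serial: int


def parse_mgmt_no(value: str) -> Optional[MgmtNo]:
    """Split a management number into fields, or return None if malformed."""
    m = MGMT_NO.fullmatch(value)
    if m is None:
        return None
    region, group, year, serial = m.groups()
    return MgmtNo(region, group, int(year), int(serial))


print(parse_mgmt_no("3470000-134-2019-00091"))
# MgmtNo(region_code='3470000', group='134', year=2019, serial=91)
```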

Most occurring characters

Value  Count  Frequency (%)
0      84410  38.4%
-      30000  13.6%
4      25227  11.5%
3      23902  10.9%
1      21988  10.0%
2      15056  6.8%
7      4584   2.1%
5      4537   2.1%
6      4415   2.0%
8      2965   1.3%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    190000  86.4%
Dash Punctuation  30000   13.6%
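These character tables come from counting every character across all values and mapping each one to its Unicode general category. Python's `unicodedata.category` returns `Nd` (Decimal Number) for ASCII digits and `Pd` (Dash Punctuation) for the hyphen-minus, which matches the two categories reported. The sketch below runs the count on the five sample rows. The long-name mapping covers only these two codes.

```python
import unicodedata
from collections import Counter

samples = [
    "3470000-134-2019-00091",
    "3450000-134-2017-00065",
    "3440000-134-2014-00027",
    "3480000-134-2015-00018",
    "3460000-134-2009-00038",
]

# Long names for the only two general categories that occur here.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Pd": "Dash Punctuation"}

chars = Counter(ch for s in samples for ch in s)
categories = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for s in samples
    for ch in s
)

total = sum(chars.values())
for ch, n in chars.most_common(3):
    print(repr(ch), n, f"{100 * n / total:.1f}%")
print(categories)
```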

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      84410  44.4%
4      25227  13.3%
3      23902  12.6%
1      21988  11.6%
2      15056  7.9%
7      4584   2.4%
5      4537   2.4%
6      4415   2.3%
8      2965   1.6%
9      2916   1.5%
Dash Punctuation
Value  Count  Frequency (%)
-      30000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  220000  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      84410  38.4%
-      30000  13.6%
4      25227  11.5%
3      23902  10.9%
1      21988  10.0%
2      15056  6.8%
7      4584   2.1%
5      4537   2.1%
6      4415   2.0%
8      2965   1.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  220000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      84410  38.4%
-      30000  13.6%
4      25227  11.5%
3      23902  10.9%
1      21988  10.0%
2      15056  6.8%
7      4584   2.1%
5      4537   2.1%
6      4415   2.0%
8      2965   1.3%

인허가일자 (license date)
Real number (ℝ)

Distinct: 3218
Distinct (%): 32.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20110044
Minimum: 20040204
Maximum: 20200429
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20040204
5th percentile: 20040618
Q1: 20070605
Median: 20110428
Q3: 20141112
95th percentile: 20190612
Maximum: 20200429
Range: 160225
Interquartile range (IQR): 70507.5

Descriptive statistics

Standard deviation: 48059.741
Coefficient of variation (CV): 0.0023898377
Kurtosis: -1.0278005
Mean: 20110044
Median Absolute Deviation (MAD): 39697.5
Skewness: 0.096148663
Sum: 2.0110044 × 10^11
Variance: 2.3097387 × 10^9
Monotonicity: Not monotonic
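인허가일자 stores dates as YYYYMMDD integers, so the numeric summaries describe that encoding rather than time. The range of 160225 is not a number of days. The mean of 20110044 is not a valid calendar date (year 2011, month 00, day 44). Parsing the values into real dates first gives meaningful spans. Below is a minimal standard-library sketch; `parse_yyyymmdd` is a hypothetical helper.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20200429 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


first = parse_yyyymmdd(20040204)  # reported minimum
last = parse_yyyymmdd(20200429)   # reported maximum
print(first, last, (last - first).days)  # 2004-02-04 2020-04-29 5929

# The reported mean does not decode to a real date.
try:
    parse_yyyymmdd(20110044)
except ValueError as err:
    print("not a valid date:", err)
```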
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value                Count  Frequency (%)
20040618             269    2.7%
20040617             91     0.9%
20040615             40     0.4%
20040616             40     0.4%
20040614             40     0.4%
20131204             34     0.3%
20041005             32     0.3%
20040621             32     0.3%
20040831             30     0.3%
20040916             29     0.3%
Other values (3208)  9363   93.6%

Extreme values: minimum
Value     Count  Frequency (%)
20040204  1      < 0.1%
20040214  1      < 0.1%
20040304  1      < 0.1%
20040320  1      < 0.1%
20040326  1      < 0.1%
20040401  1      < 0.1%
20040407  1      < 0.1%
20040408  3      < 0.1%
20040412  2      < 0.1%
20040414  2      < 0.1%

Extreme values: maximum
Value     Count  Frequency (%)
20200429  5      0.1%
20200428  2      < 0.1%
20200427  7      0.1%
20200424  4      < 0.1%
20200423  2      < 0.1%
20200422  4      < 0.1%
20200421  4      < 0.1%
20200420  6      0.1%
20200417  3      < 0.1%
20200416  4      < 0.1%

인허가취소일자 (license cancellation date)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

[Variable name missing from source]
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
3: 7462
1: 2538

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Common Values

Value  Count  Frequency (%)
3      7462   74.6%
1      2538   25.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

영업상태명 (business status name)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
폐업 (closed): 7462
영업/정상 (operating/normal): 2538

Length

Max length: 5
Median length: 2
Mean length: 2.7614
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 영업/정상
2nd row: 영업/정상
3rd row: 폐업
4th row: 영업/정상
5th row: 폐업

Common Values

Value      Count  Frequency (%)
폐업       7462   74.6%
영업/정상  2538   25.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]
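영업상태명 has the same two counts (7462 and 2538) as the two-valued variable with codes 3 and 1 profiled just before it. This suggests a one-to-one mapping between code and name, but matching totals do not prove it, because the profile summarises each column on its own. Counting (code, name) pairs row by row settles the question. The sketch below assumes the two columns are available as parallel lists. The toy rows are the first five sample rows of each column as shown in the report.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def code_name_pairs(codes: List[str], names: List[str]) -> Counter:
    """Count (code, name) pairs across rows."""
    return Counter(zip(codes, names))


def is_one_to_one(pairs: Counter) -> bool:
    """True if every code maps to exactly one name and every name to one code."""
    code_to_names: Dict[str, Set[str]] = defaultdict(set)
    name_to_codes: Dict[str, Set[str]] = defaultdict(set)
    for code, name in pairs:
        code_to_names[code].add(name)
        name_to_codes[name].add(code)
    return all(len(v) == 1 for v in code_to_names.values()) and all(
        len(v) == 1 for v in name_to_codes.values()
    )


# First five sample rows of each column, as listed in the report.
codes = ["1", "1", "3", "1", "3"]
names = ["영업/정상", "영업/정상", "폐업", "영업/정상", "폐업"]
pairs = code_name_pairs(codes, names)
print(pairs)
print(is_one_to_one(pairs))
```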

[Variable name missing from source]
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
2: 7462
1: 2538

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value  Count  Frequency (%)
2      7462   74.6%
1      2538   25.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

[Variable name missing from source]
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
폐업 (closed): 7462
영업 (operating): 2538

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 영업
2nd row: 영업
3rd row: 폐업
4th row: 영업
5th row: 폐업

Common Values

Value  Count  Frequency (%)
폐업   7462   74.6%
영업   2538   25.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

폐업일자 (closure date)
Real number (ℝ)

MISSING

Distinct: 2797
Distinct (%): 37.5%
Missing: 2538
Missing (%): 25.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 20139956
Minimum: 20040419
Maximum: 20200429
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20040419
5th percentile: 20060918
Q1: 20110425
Median: 20150130
Q3: 20171222
95th percentile: 20191112
Maximum: 20200429
Range: 160010
Interquartile range (IQR): 60797

Descriptive statistics

Standard deviation: 41344.317
Coefficient of variation (CV): 0.0020528504
Kurtosis: -0.80478798
Mean: 20139956
Median Absolute Deviation (MAD): 30190
Skewness: -0.51074793
Sum: 1.5028435 × 10^11
Variance: 1.7093525 × 10^9
Monotonicity: Not monotonic
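폐업일자 is missing for exactly 2538 rows, the same number as the 영업/정상 (operating) status in 영업상태명. This fits the expectation that only closed (폐업) businesses carry a closure date. Matching totals do not guarantee that the same rows are involved, so a row-level check is still needed. The sketch below assumes status and closure date are parallel lists, with `None` for a missing date. `inconsistent_rows` is a hypothetical helper, and the toy data is illustrative.

```python
from typing import List, Optional, Tuple


def inconsistent_rows(
    statuses: List[str],
    closure_dates: List[Optional[int]],
    closed_label: str = "폐업",
) -> List[Tuple[int, str, Optional[int]]]:
    """Return rows where 'status is closed' and 'has a closure date' disagree."""
    bad = []
    for i, (status, closed_on) in enumerate(zip(statuses, closure_dates)):
        is_closed = status == closed_label
        has_date = closed_on is not None
        if is_closed != has_date:
            bad.append((i, status, closed_on))
    return bad


statuses = ["영업/정상", "폐업", "폐업", "영업/정상"]
dates = [None, 20171229, None, 20190531]  # rows 2 and 3 are deliberately inconsistent
print(inconsistent_rows(statuses, dates))
```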
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value                Count  Frequency (%)
20171229             55     0.5%
20171228             44     0.4%
20171205             38     0.4%
20190531             31     0.3%
20171222             29     0.3%
20171211             29     0.3%
20171221             29     0.3%
20181203             27     0.3%
20191231             26     0.3%
20191105             24     0.2%
Other values (2787)  7130   71.3%
(Missing)            2538   25.4%

Extreme values: minimum
Value     Count  Frequency (%)
20040419  1      < 0.1%
20040610  1      < 0.1%
20040624  2      < 0.1%
20040701  1      < 0.1%
20040713  1      < 0.1%
20040714  1      < 0.1%
20040715  2      < 0.1%
20040722  1      < 0.1%
20040802  1      < 0.1%
20040809  1      < 0.1%

Extreme values: maximum
Value     Count  Frequency (%)
20200429  1      < 0.1%
20200428  2      < 0.1%
20200424  3      < 0.1%
20200423  4      < 0.1%
20200421  2      < 0.1%
20200420  1      < 0.1%
20200413  2      < 0.1%
20200410  1      < 0.1%
20200409  1      < 0.1%
20200408  1      < 0.1%

휴업시작일자 (suspension start date)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

휴업종료일자 (suspension end date)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

재개업일자 (reopening date)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

소재지전화 (location phone number)
Text

MISSING

Distinct: 4908
Distinct (%): 91.4%
Missing: 4628
Missing (%): 46.3%
Memory size: 156.2 KiB