Overview

Dataset statistics

Number of variables: 15
Number of observations: 10000
Missing cells: 21811
Missing cells (%): 14.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.3 MiB
Average record size in memory: 132.0 B

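Missing cells (%) is the missing-cell count divided by the total number of cells (variables × observations). A quick arithmetic check that uses only the figures above:

```python
# Figures copied from the dataset statistics above
n_variables = 15
n_observations = 10_000
missing_cells = 21_811

total_cells = n_variables * n_observations  # 150,000 cells in the table
missing_pct = 100 * missing_cells / total_cells
print(f"Missing cells (%): {missing_pct:.1f}%")  # Missing cells (%): 14.5%
```
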
Variable types

Text: 4
Categorical: 6
Unsupported: 1
Numeric: 4

Dataset

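Description: 관리번호 (management number), 점용종류 (occupancy type), 점용구분 (occupancy category), 세입구분 (revenue category), 시도 (province or metropolitan city), 시군구 (district), 법정동 (legal dong), 본번 (main lot number), 부번 (sub lot number), 점용시작일 (occupancy start date), 점용종료일 (occupancy end date), 도로명주소 (road-name address), 도로명주소 본번 (road-name address main number), 도로명주소 부번 (road-name address sub number), 점용면적 (occupied area)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-17393/S/1/datasetView.do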

Alerts

High correlation: 도로명주소 부번 is highly correlated overall with 점용종료일 and 2 other fields
High correlation: 세입구분 is highly correlated overall with 시도
High correlation: 시도 is highly correlated overall with 부번 and 8 other fields
High correlation: 점용종류 is highly correlated overall with 시도
High correlation: 점용구분 is highly correlated overall with 시도
High correlation: 시군구 is highly correlated overall with 시도
High correlation: 부번 is highly correlated overall with 시도
High correlation: 점용시작일 is highly correlated overall with 점용종료일 and 1 other field
High correlation: 점용종료일 is highly correlated overall with 점용시작일 and 2 other fields
High correlation: 점용면적 is highly correlated overall with 시도 and 1 other field
Imbalance: 점용종류 is highly imbalanced (71.1%)
Imbalance: 시도 is highly imbalanced (93.8%)
Imbalance: 도로명주소 부번 is highly imbalanced (92.8%)
Missing: 부번 has 1414 (14.1%) missing values
Missing: 점용시작일 has 2283 (22.8%) missing values
Missing: 점용종료일 has 2284 (22.8%) missing values
Missing: 도로명주소 has 7610 (76.1%) missing values
Missing: 도로명주소 본번 has 8146 (81.5%) missing values
Skewed: 점용종료일 is highly skewed (γ1 = 55.85319521)
Skewed: 점용면적 is highly skewed (γ1 = 42.17333817)
Unique: every value of 관리번호 is unique
Unsupported: 본번 has an unsupported type; check whether it needs cleaning or further analysis
Zeros: 부번 has 102 (1.0%) zeros

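Most of the alerts above are correlation flags. For a pair of categorical columns, such as 세입구분 and 시도, a standard association measure is Cramér's V, which rescales the chi-squared statistic of a contingency table to the range 0 to 1. The sketch below computes the plain, uncorrected statistic. It illustrates the measure only: ydata-profiling v4.5.1 may use a bias-corrected variant and applies its own threshold for "highly correlated". The function name and the example tables are hypothetical.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in an all-zero row or column
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical 2x2 tables, not taken from this dataset
print(cramers_v([[50, 0], [0, 50]]))    # 1.0: each row maps to a single column
print(cramers_v([[25, 25], [25, 25]]))  # 0.0: rows and columns are independent
```
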
Reproduction

Analysis started: 2024-07-20 09:58:57.612320
Analysis finished: 2024-07-20 09:59:07.101593
Duration: 9.49 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

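The duration is the difference between the two timestamps above, which can be checked with the standard datetime module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section
started = datetime.fromisoformat("2024-07-20 09:58:57.612320")
finished = datetime.fromisoformat("2024-07-20 09:59:07.101593")
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 9.49 seconds
```
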
Variables

관리번호
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Characters and Unicode

Total characters: 200000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
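The category names in these tables are Unicode general categories: Decimal Number is Nd, Connector Punctuation is Pc, and Other Letter (which covers Hangul syllables) is Lo. Python's standard unicodedata module returns the same codes, so category counts can be sketched as follows. The mapping table and function name are illustrative, and ydata-profiling may rely on a different Unicode library internally.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Lo": "Other Letter",
}

def category_counts(values):
    """Count the characters of all strings in `values` by Unicode general category."""
    counts = Counter(unicodedata.category(ch) for v in values for ch in v)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}

# One 관리번호 sample row and one 법정동 sample row from this report
print(category_counts(["200911215103_1_00115", "역삼동"]))
```
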

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 200911215103_1_00115
2nd row: 201811680101_1_27023
3rd row: 200911380107_1_00298
4th row: 201511230104_1_14595
5th row: 201211680106_1_12938
Value  Count  Frequency (%)
200911215103_1_00115  1  < 0.1%
201211215107_1_13782  1  < 0.1%
200911500101_1_00078  1  < 0.1%
201211620101_5_17627  1  < 0.1%
200911440120_1_00196  1  < 0.1%
201311680106_1_13237  1  < 0.1%
200911650101_1_00080  1  < 0.1%
201111230109_1_11878  1  < 0.1%
200911110135_5_00213  1  < 0.1%
201511260101_1_21620  1  < 0.1%
Other values (9990)  9990  99.9%

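Every 관리번호 is exactly 20 characters drawn from the ten digits and the underscore, and the 20000 underscores across 10000 values average two per ID. All sample rows follow the layout of 12 digits, underscore, 1 digit, underscore, 5 digits. A minimal validation sketch, assuming that layout holds for every row (it is inferred from the samples, not taken from a published specification); `ID_PATTERN` and `check_ids` are hypothetical names:

```python
import re
from collections import Counter

# Layout inferred from the sample rows: 12 digits, "_", 1 digit, "_", 5 digits (20 chars)
ID_PATTERN = re.compile(r"\d{12}_\d_\d{5}")

samples = [
    "200911215103_1_00115",
    "201811680101_1_27023",
    "200911380107_1_00298",
    "201511230104_1_14595",
    "201211680106_1_12938",
]

def check_ids(ids):
    """Return (all IDs match the inferred layout, per-character counts)."""
    all_match = all(ID_PATTERN.fullmatch(s) for s in ids)
    counts = Counter("".join(ids))
    return all_match, counts

ok, counts = check_ids(samples)
print(ok, counts["_"])  # True 10
```

On the full column, the same character counts should reproduce the Most occurring characters table below (for example 20000 underscores).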
Most occurring characters

Value  Count  Frequency (%)
1  61308  30.7%
0  46077  23.0%
2  21264  10.6%
_  20000  10.0%
3  8772  4.4%
5  8394  4.2%
9  8321  4.2%
6  7291  3.6%
4  6802  3.4%
8  6132  3.1%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  180000  90.0%
Connector Punctuation  20000  10.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  61308  34.1%
0  46077  25.6%
2  21264  11.8%
3  8772  4.9%
5  8394  4.7%
9  8321  4.6%
6  7291  4.1%
4  6802  3.8%
8  6132  3.4%
7  5639  3.1%

Connector Punctuation
Value  Count  Frequency (%)
_  20000  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  200000  100.0%

Most frequent character per script

Common: identical to the most occurring characters table above, since all 200000 characters belong to the Common script.

Most occurring blocks

Value  Count  Frequency (%)
ASCII  200000  100.0%

Most frequent character per block

ASCII: identical to the most occurring characters table above, since all 200000 characters belong to the ASCII block.

점용종류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

도로: 9252
구거: 408
하천: 340

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 도로
2nd row: 도로
3rd row: 도로
4th row: 도로
5th row: 도로

Common Values

Value  Count  Frequency (%)
도로 (road)  9252  92.5%
구거 (ditch)  408  4.1%
하천 (river)  340  3.4%

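The imbalance percentages in the alerts match one minus the normalized Shannon entropy of the category counts, 1 - H / log2(k) for k categories. The sketch below reproduces 71.1% for 점용종류 and 93.8% for 시도 (counting <NA> as its own category) from the counts in this report. It is a reconstruction that fits these numbers, not ydata-profiling's source, and the function name is hypothetical.

```python
import math

def imbalance(counts):
    """1 - normalized Shannon entropy: 0 for equal-sized categories, near 1 when one dominates."""
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    n = sum(counts)
    entropy = -sum(c / n * math.log2(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log2(len(counts))

print(f"점용종류: {imbalance([9252, 408, 340]):.1%}")  # 점용종류: 71.1%
print(f"시도: {imbalance([9927, 73]):.1%}")            # 시도: 93.8%
```
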
Length

[Figure: Histogram of lengths of the category]

점용구분
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

계속: 4271
폐쇄: 2786
일시: 2119
무단: 824

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 계속
2nd row: 일시
3rd row: 계속
4th row: 폐쇄
5th row: 폐쇄

Common Values

Value  Count  Frequency (%)
계속 (continuing)  4271  42.7%
폐쇄 (closed)  2786  27.9%
일시 (temporary)  2119  21.2%
무단 (unauthorized)  824  8.2%

Length

[Figure: Histogram of lengths of the category]

세입구분
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

구세: 6029
시세: 3971

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 시세
2nd row: 구세
3rd row: 시세
4th row: 시세
5th row: 구세

Common Values

Value  Count  Frequency (%)
구세 (district tax)  6029  60.3%
시세 (city tax)  3971  39.7%

Length

[Figure: Histogram of lengths of the category]

시도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

서울: 9927
<NA>: 73

Length

Max length: 4
Median length: 2
Mean length: 2.0146
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울
2nd row: 서울
3rd row: 서울
4th row: 서울
5th row: 서울

Common Values

Value  Count  Frequency (%)
서울 (Seoul)  9927  99.3%
<NA>  73  0.7%

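Missing is 0 for 시도 even though 73 rows show <NA>. The length statistics suggest these are literal four-character strings rather than true nulls: a maximum length of 4 and a mean length of 2.0146 are exactly what 9927 two-character values and 73 four-character values give, (9927 × 2 + 73 × 4) / 10000 = 2.0146. A minimal sketch that maps such placeholders to None before profiling; the token set and the function name are hypothetical.

```python
# Placeholder strings to treat as missing. "<NA>" is the token seen in this report;
# extend the set only with tokens you have confirmed in your own data.
NA_TOKENS = {"<NA>"}

def normalize_missing(values):
    """Replace placeholder strings with None so that they count as missing."""
    return [None if v in NA_TOKENS else v for v in values]

# Rebuild the 시도 column from the counts above
column = ["서울"] * 9927 + ["<NA>"] * 73
mean_length = sum(len(v) for v in column) / len(column)

cleaned = normalize_missing(column)
missing = sum(v is None for v in cleaned)
print(f"mean length before cleaning: {mean_length:.4f}, missing after: {missing}")
```
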
Length

[Figure: Histogram of lengths of the category]

시군구
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

종로구: 1114
강남구: 1080
관악구: 683
강서구: 567
동대문구: 550
Other values (21): 6006

Length

Max length: 4
Median length: 3
Mean length: 3.1012
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 광진구
2nd row: 강남구
3rd row: 은평구
4th row: 동대문구
5th row: 강남구

Common Values

Value  Count  Frequency (%)
종로구  1114  11.1%
강남구  1080  10.8%
관악구  683  6.8%
강서구  567  5.7%
동대문구  550  5.5%
성북구  504  5.0%
영등포구  457  4.6%
송파구  411  4.1%
마포구  390  3.9%
중구  388  3.9%
Other values (16)  3856  38.6%

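The Common Values tables list the ten most frequent values and fold the rest into an "Other values (k)" row, where k is the number of remaining distinct values (26 - 10 = 16 for 시군구); frequencies are taken over all 10000 rows. A minimal sketch of that layout with collections.Counter. The function name is hypothetical and this is not ydata-profiling's own code; the example reuses the 점용구분 counts because that column's full distribution is listed in this report.

```python
from collections import Counter

def frequency_table(values, top=10):
    """Rows of (label, count, percent) for the `top` most common values plus an 'Other values' row."""
    counts = Counter(values)
    n = len(values)
    rows = [(value, c, 100 * c / n) for value, c in counts.most_common(top)]
    rest = counts.most_common()[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / n))
    return rows

# 점용구분 rebuilt from its counts, folding everything after the top 2
data = ["계속"] * 4271 + ["폐쇄"] * 2786 + ["일시"] * 2119 + ["무단"] * 824
for label, count, pct in frequency_table(data, top=2):
    print(f"{label}  {count}  {pct:.1f}%")
```
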
Length

[Figure: Histogram of lengths of the category]

법정동
Text

Distinct: 434
Distinct (%): 4.4%
Missing: 73
Missing (%): 0.7%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.2130553
Min length: 2

Characters and Unicode

Total characters: 31896
Distinct characters: 211
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 0.3%

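Distinct counts every different value; Unique counts only the values that occur exactly once. Both percentages use the non-missing rows as the denominator: for this column 434 / 9927 ≈ 4.4% and 31 / 9927 ≈ 0.3%. A standard-library sketch of these definitions, with a hypothetical function name:

```python
from collections import Counter

def distinct_stats(values):
    """Distinct and unique counts over non-missing values (None counts as missing)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
        "missing": len(values) - n,
    }

print(distinct_stats(["신림동", "신림동", "역삼동", None]))
```
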
Sample

1st row: 구의동
2nd row: 역삼동
3rd row: 응암동
4th row: 전농동
5th row: 대치동
Value  Count  Frequency (%)
신림동  409  4.1%
화곡동  316  3.2%
봉천동  254  2.6%
역삼동  253  2.5%
논현동  192  1.9%
삼성동  156  1.6%
신사동  126  1.3%
청담동  124  1.2%
수유동  115  1.2%
정릉동  114  1.1%
Other values (424)  7868  79.3%

Most occurring characters

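Value  Count  Frequency (%)
(not preserved)  9641  30.2%
(not preserved)  1213  3.8%
(not preserved)  1080  3.4%
(not preserved)  614  1.9%
(not preserved)  509  1.6%
(not preserved)  495  1.6%
(not preserved)  458  1.4%
(not preserved)  387  1.2%
(not preserved)  381  1.2%
(not preserved)  361  1.1%
Other values (201)  16757  52.5%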

Most occurring categories

Value  Count  Frequency (%)
Other Letter  30894  96.9%
Decimal Number  1002  3.1%

Most frequent character per category

Other Letter
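Value  Count  Frequency (%)
(not preserved)  9641  31.2%
(not preserved)  1213  3.9%
(not preserved)  1080  3.5%
(not preserved)  614  2.0%
(not preserved)  509  1.6%
(not preserved)  495  1.6%
(not preserved)  458  1.5%
(not preserved)  387  1.3%
(not preserved)  381  1.2%
(not preserved)  361  1.2%
Other values (193)  15755  51.0%

Decimal Number
Value  Count  Frequency (%)
1  229  22.9%
2  228  22.8%
3  193  19.3%
4  114  11.4%
5  96  9.6%
7  81  8.1%
6  53  5.3%
8  8  0.8%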

Most occurring scripts

Value  Count  Frequency (%)
Hangul  30894  96.9%
Common  1002  3.1%

Most frequent character per script

Hangul: identical to the Other Letter table above.
Common: identical to the Decimal Number table above.

Most occurring blocks

Value  Count  Frequency (%)
Hangul  30894  96.9%
ASCII  1002  3.1%

Most frequent character per block

Hangul: identical to the Other Letter table above.
ASCII: identical to the Decimal Number table above.

본번
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

부번
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 419
Distinct (%): 4.9%
Missing: 1414
Missing (%): 14.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.148498
Minimum: 0
Maximum: 3039
Zeros: 102
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 10
Q3: 27
95th percentile: 157.75
Maximum: 3039
Range: 3039
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 145.32637
Coefficient of variation (CV): 3.4479609
Kurtosis: 146.07098
Mean: 42.148498
Median Absolute Deviation (MAD): 8
Skewness: 10.234007
Sum: 361887
Variance: 21119.754
Monotonicity: Not monotonic
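The quantile and descriptive statistics above are internally consistent: IQR = Q3 - Q1 = 27 - 3 = 24, CV = standard deviation / mean = 145.32637 / 42.148498 ≈ 3.448, variance = 145.32637² ≈ 21119.75, and Sum / Mean = 361887 / 42.148498 ≈ 8586 = 10000 - 1414, so they are computed over the non-missing values only. The sketch below reproduces these definitions with the standard statistics module, assuming pandas-style conventions (linear interpolation for quantiles, sample standard deviation with an n - 1 denominator). It is not ydata-profiling's code, the function name is hypothetical, and skewness and kurtosis are left out.

```python
import statistics

def describe(values):
    """Quantile and descriptive statistics for a numeric column.

    Missing entries (None) are dropped first. Quantiles use linear interpolation
    (method="inclusive") and the standard deviation uses the n - 1 denominator;
    both are assumptions chosen to mirror pandas' defaults.
    """
    xs = sorted(v for v in values if v is not None)
    quartiles = statistics.quantiles(xs, n=4, method="inclusive")
    ventiles = statistics.quantiles(xs, n=20, method="inclusive")  # cut points in 5% steps
    median = statistics.median(xs)
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)
    return {
        "min": xs[0],
        "p5": ventiles[0],
        "Q1": quartiles[0],
        "median": median,
        "Q3": quartiles[2],
        "p95": ventiles[-1],
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "IQR": quartiles[2] - quartiles[0],
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "CV": std / mean,
        "MAD": statistics.median(abs(x - median) for x in xs),
        "sum": sum(xs),
    }

# Hypothetical toy column with one missing entry
print(describe([1, 2, 3, 4, None, 10]))
```
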

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
1  1025  10.2%
2  659  6.6%
3  495  5.0%
4  435  4.3%
5  372  3.7%
6  346  3.5%
7  260  2.6%
8  255  2.5%
9  229  2.3%
11  206  2.1%
Other values (409)  4304  43.0%
(Missing)  1414  14.1%

Minimum 10 values

Value  Count  Frequency (%)
0  102  1.0%
1  1025  10.2%
2  659  6.6%
3  495  5.0%
4  435  4.3%
5  372  3.7%
6  346  3.5%
7  260  2.6%
8  255  2.5%
9  229  2.3%

Maximum 10 values

Value  Count  Frequency (%)
3039  1  < 0.1%
3036  1  < 0.1%
3005  1  < 0.1%
2670  1  < 0.1%
2660  1  < 0.1%
2588  1  < 0.1%
2471  1  < 0.1%
2094  1  < 0.1%
2050  1  < 0.1%
1839  1  < 0.1%

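The two extreme-value tables above list the smallest and largest distinct values with their counts. As in the table that includes the (Missing) row, frequencies are taken over all 10000 rows, missing ones included, which is why 102 zeros give 1.0%. A sketch with hypothetical names and toy data:

```python
from collections import Counter

def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values as (value, count, percent) rows.

    Percentages use len(values) as the denominator, so missing entries (None)
    lower every frequency, matching how the tables above report them.
    """
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts)

    def rows(keys):
        return [(v, counts[v], 100 * counts[v] / n) for v in keys]

    return rows(ordered[:k]), rows(ordered[::-1][:k])

# Hypothetical column: a few 부번-like values plus one missing entry
smallest, largest = extreme_values([0, 1, 1, 2, None, 3039, 3036], k=2)
print(smallest)
print(largest)
```
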
점용시작일
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 917
Distinct (%): 11.9%
Missing: 2283
Missing (%): 22.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 20172170
Minimum: 19961101
Maximum: 20241201
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 19961101
5th percentile: 20080101
Q1: 20130101
Median: 20180101
Q3: 20220101
95th percentile: 20240101
Maximum: 20241201
Range: 280100
Interquartile range (IQR): 90000

Descriptive statistics

Standard deviation: 55111.817
Coefficient of variation (CV): 0.0027320719
Kurtosis: -1.0073214
Mean: 20172170
Median Absolute Deviation (MAD): 50000
Skewness: -0.43619034
Sum: 1.5566863 × 10^11
Variance: 3.0373124 × 10^9
Monotonicity: Not monotonic
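점용시작일 holds dates encoded as YYYYMMDD numbers (minimum 19961101, maximum 20241201), so the numeric summaries describe the encoding rather than elapsed time: the range 280100 is not a day count, and the mean 20172170 does not correspond to a calendar date because its month part would be 21. A sketch that converts the values to datetime.date before computing spans, assuming every non-missing value is a valid YYYYMMDD number (possibly stored as a float when the column contains missing values); the function name is hypothetical.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Convert a number such as 19961101 (or 19961101.0) to a datetime.date; None stays None."""
    if value is None:
        return None
    return datetime.strptime(str(int(value)), "%Y%m%d").date()

start = parse_yyyymmdd(19961101)  # reported minimum
end = parse_yyyymmdd(20241201)    # reported maximum
print(start, end, (end - start).days)  # 1996-11-01 2024-12-01 10257

try:
    parse_yyyymmdd(20172170)  # the reported mean: month part 21, so not a date
except ValueError as exc:
    print("not a date:", exc)
```
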