Overview

Dataset statistics

Number of variables: 13
Number of observations: 283
Missing cells: 711
Missing cells (%): 19.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 29.7 KiB
Average record size in memory: 107.5 B
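
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the report divides missing cells by variables × observations and total memory by the number of observations (the function names are illustrative and not part of ydata-profiling):

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells, in percent, over the whole table."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


def avg_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average memory per row in bytes, from a total given in KiB."""
    return total_kib * 1024 / n_observations


print(f"{missing_cell_pct(711, 13, 283):.1f}%")     # 711 missing out of 13 * 283 cells
print(f"{avg_record_size_bytes(29.7, 283):.1f} B")  # 29.7 KiB spread over 283 rows
```

Both results round to the reported 19.3% and 107.5 B.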

Variable types

Text: 3
DateTime: 4
Numeric: 3
Categorical: 3

Alerts

허가용량 is highly overall correlated with 사업개시용량 and 1 other field (High correlation)
사업개시용량 is highly overall correlated with 허가용량 and 2 other fields (High correlation)
설치면적(㎡) is highly overall correlated with 허가용량 and 1 other field (High correlation)
설치위치 is highly overall correlated with 사업개시용량 and 1 other field (High correlation)
지목 is highly overall correlated with 설치위치 (High correlation)
설치위치 is highly imbalanced (70.3%) (Imbalance)
공사신고용량 has 152 (53.7%) missing values (Missing)
사업개시용량 has 201 (71.0%) missing values (Missing)
공사신고일(발행번호) has 153 (54.1%) missing values (Missing)
사업개시일 has 201 (71.0%) missing values (Missing)
설치면적(㎡) has 4 (1.4%) missing values (Missing)
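
The Missing alerts can be cross-checked against the dataset statistics. This is a sketch, assuming each percentage is the column's missing count divided by the 283 observations; the dictionary simply restates the counts from the alerts.

```python
N_OBSERVATIONS = 283

# Missing counts per column, copied from the alerts above.
missing_by_column = {
    "공사신고용량": 152,
    "사업개시용량": 201,
    "공사신고일(발행번호)": 153,
    "사업개시일": 201,
    "설치면적(㎡)": 4,
}


def missing_pct(count: int, n_rows: int = N_OBSERVATIONS) -> float:
    """Percentage of rows in which a column is missing."""
    return 100.0 * count / n_rows


for column, count in missing_by_column.items():
    print(f"{column}: {count} ({missing_pct(count):.1f}%)")

# Sum of the per-column counts, to compare with the table-wide figure.
total_missing = sum(missing_by_column.values())
print(total_missing)
```

The five counts sum to 711, the "Missing cells" figure in the dataset statistics, so these columns account for every missing cell.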

Reproduction

Analysis started: 2024-01-09 22:21:51.420407
Analysis finished: 2024-01-09 22:21:53.014912
Duration: 1.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Distinct: 279
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 18
Median length: 14
Mean length: 10.176678
Min length: 2

Characters and Unicode

Total characters: 2880
Distinct characters: 228
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
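
As a rough illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for one sample value. The `block_of` helper is a simplified assumption covering only the two ranges relevant here, since the standard library has no block or script lookup, and it is not how ydata-profiling itself computes them.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


def block_of(ch: str) -> str:
    """Very rough block lookup: ASCII, Hangul Syllables, or 'Other'."""
    code = ord(ch)
    if code < 0x80:
        return "ASCII"
    if 0xAC00 <= code <= 0xD7A3:
        return "Hangul"
    return "Other"


sample = "용신1호 태양광발전소"  # one of the sample rows of this variable
print(category_counts(sample))
print(Counter(block_of(ch) for ch in sample))
```

The two-letter codes map onto the category names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, and `Zs` is Space Separator.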

Unique

Unique: 275
Unique (%): 97.2%

Sample

1st row: 도림태양광발전소
2nd row: 용신1호 태양광발전소
3rd row: 용신2호 태양광발전소
4th row: 용신3호 태양광발전소
5th row: 용신4호 태양광발전소
Value | Count | Frequency (%)
태양광발전소 | 208 | 40.9%
충남신재생에너지 | 4 | 0.8%
인화2호 | 2 | 0.4%
태양광에너지 | 2 | 0.4%
관리6호 | 2 | 0.4%
발전소 | 2 | 0.4%
프라임 | 2 | 0.4%
대성태양광발전소 | 2 | 0.4%
해오름 | 2 | 0.4%
관리5호 | 2 | 0.4%
Other values (279) | 280 | 55.1%
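
The table above appears to count whitespace-separated tokens rather than whole values. A minimal sketch of that kind of tally, assuming plain `str.split()` tokenization (the report's exact tokenizer is not shown here) and using only the five sample rows:

```python
from collections import Counter

sample_rows = [
    "도림태양광발전소",
    "용신1호 태양광발전소",
    "용신2호 태양광발전소",
    "용신3호 태양광발전소",
    "용신4호 태양광발전소",
]

# Split each value on whitespace and count every resulting token.
token_counts = Counter(token for row in sample_rows for token in row.split())
total_tokens = sum(token_counts.values())

for token, count in token_counts.most_common(3):
    print(f"{token}\t{count}\t{100 * count / total_tokens:.1f}%")
```

In the report's table, 태양광발전소 accounts for 208 of the 508 counted tokens (40.9%).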

Most occurring characters

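Value | Count | Frequency (%)
(not preserved) | 269 | 9.3%
(not preserved) | 262 | 9.1%
(not preserved) | 261 | 9.1%
(not preserved) | 261 | 9.1%
(not preserved) | 260 | 9.0%
(not preserved) | 255 | 8.9%
(space) | 232 | 8.1%
(not preserved) | 116 | 4.0%
(not preserved) | 46 | 1.6%
2 | 44 | 1.5%
Other values (218) | 874 | 30.3%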

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2429 | 84.3%
Space Separator | 232 | 8.1%
Decimal Number | 183 | 6.4%
Uppercase Letter | 15 | 0.5%
Close Punctuation | 6 | 0.2%
Open Punctuation | 6 | 0.2%
Other Symbol | 5 | 0.2%
Lowercase Letter | 4 | 0.1%

Most frequent character per category

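Other Letter
Value | Count | Frequency (%)
(not preserved) | 269 | 11.1%
(not preserved) | 262 | 10.8%
(not preserved) | 261 | 10.7%
(not preserved) | 261 | 10.7%
(not preserved) | 260 | 10.7%
(not preserved) | 255 | 10.5%
(not preserved) | 116 | 4.8%
(not preserved) | 46 | 1.9%
(not preserved) | 25 | 1.0%
(not preserved) | 25 | 1.0%
Other values (192) | 649 | 26.7%

Decimal Number
Value | Count | Frequency (%)
2 | 44 | 24.0%
1 | 41 | 22.4%
3 | 34 | 18.6%
5 | 11 | 6.0%
7 | 10 | 5.5%
6 | 10 | 5.5%
4 | 10 | 5.5%
9 | 10 | 5.5%
8 | 8 | 4.4%
0 | 5 | 2.7%

Uppercase Letter
Value | Count | Frequency (%)
J | 4 | 26.7%
S | 2 | 13.3%
H | 2 | 13.3%
C | 1 | 6.7%
A | 1 | 6.7%
P | 1 | 6.7%
D | 1 | 6.7%
Y | 1 | 6.7%
L | 1 | 6.7%
E | 1 | 6.7%

Lowercase Letter
Value | Count | Frequency (%)
s | 2 | 50.0%
c | 2 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 232 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not preserved) | 5 | 100.0%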

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2434 | 84.5%
Common | 427 | 14.8%
Latin | 19 | 0.7%

Most frequent character per script

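Hangul
Value | Count | Frequency (%)
(not preserved) | 269 | 11.1%
(not preserved) | 262 | 10.8%
(not preserved) | 261 | 10.7%
(not preserved) | 261 | 10.7%
(not preserved) | 260 | 10.7%
(not preserved) | 255 | 10.5%
(not preserved) | 116 | 4.8%
(not preserved) | 46 | 1.9%
(not preserved) | 25 | 1.0%
(not preserved) | 25 | 1.0%
Other values (193) | 654 | 26.9%

Common
Value | Count | Frequency (%)
(space) | 232 | 54.3%
2 | 44 | 10.3%
1 | 41 | 9.6%
3 | 34 | 8.0%
5 | 11 | 2.6%
7 | 10 | 2.3%
6 | 10 | 2.3%
4 | 10 | 2.3%
9 | 10 | 2.3%
8 | 8 | 1.9%
Other values (3) | 17 | 4.0%

Latin
Value | Count | Frequency (%)
J | 4 | 21.1%
S | 2 | 10.5%
s | 2 | 10.5%
c | 2 | 10.5%
H | 2 | 10.5%
C | 1 | 5.3%
A | 1 | 5.3%
P | 1 | 5.3%
D | 1 | 5.3%
Y | 1 | 5.3%
Other values (2) | 2 | 10.5%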

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2429 | 84.3%
ASCII | 446 | 15.5%
None | 5 | 0.2%

Most frequent character per block

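Hangul
Value | Count | Frequency (%)
(not preserved) | 269 | 11.1%
(not preserved) | 262 | 10.8%
(not preserved) | 261 | 10.7%
(not preserved) | 261 | 10.7%
(not preserved) | 260 | 10.7%
(not preserved) | 255 | 10.5%
(not preserved) | 116 | 4.8%
(not preserved) | 46 | 1.9%
(not preserved) | 25 | 1.0%
(not preserved) | 25 | 1.0%
Other values (192) | 649 | 26.7%

ASCII
Value | Count | Frequency (%)
(space) | 232 | 52.0%
2 | 44 | 9.9%
1 | 41 | 9.2%
3 | 34 | 7.6%
5 | 11 | 2.5%
7 | 10 | 2.2%
6 | 10 | 2.2%
4 | 10 | 2.2%
9 | 10 | 2.2%
8 | 8 | 1.8%
Other values (15) | 36 | 8.1%

None
Value | Count | Frequency (%)
(not preserved) | 5 | 100.0%
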
Distinct: 165
Distinct (%): 58.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 44
Median length: 39
Mean length: 19.699647
Min length: 11

Characters and Unicode

Total characters: 5575
Distinct characters: 84
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 143
Unique (%): 50.5%

Sample

1st row: 태안군 원북면 신두리 324-1(1필지)
2nd row: 태안군 근흥면 용신리 2-26
3rd row: 태안군 근흥면 용신리 2-28
4th row: 태안군 근흥면 용신리 2-29
5th row: 태안군 근흥면 용신리 2-23
Value | Count | Frequency (%)
태안군 | 281 | 23.5%
소원면 | 69 | 5.8%
남면 | 49 | 4.1%
안면읍 | 47 | 3.9%
모항리 | 46 | 3.8%
신온리 | 40 | 3.3%
원북면 | 36 | 3.0%
766-5 | 35 | 2.9%
이원면 | 32 | 2.7%
정당리 | 26 | 2.2%
Other values (263) | 536 | 44.8%

Most occurring characters

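Value | Count | Frequency (%)
(space) | 916 | 16.4%
(not preserved) | 352 | 6.3%
1 | 352 | 6.3%
(not preserved) | 305 | 5.5%
- | 298 | 5.3%
(not preserved) | 282 | 5.1%
(not preserved) | 281 | 5.0%
(not preserved) | 259 | 4.6%
5 | 217 | 3.9%
6 | 210 | 3.8%
Other values (74) | 2103 | 37.7%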

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2556 | 45.8%
Decimal Number | 1678 | 30.1%
Space Separator | 916 | 16.4%
Dash Punctuation | 298 | 5.3%
Other Punctuation | 79 | 1.4%
Open Punctuation | 24 | 0.4%
Close Punctuation | 24 | 0.4%

Most frequent character per category

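Other Letter
Value | Count | Frequency (%)
(not preserved) | 352 | 13.8%
(not preserved) | 305 | 11.9%
(not preserved) | 282 | 11.0%
(not preserved) | 281 | 11.0%
(not preserved) | 259 | 10.1%
(not preserved) | 139 | 5.4%
(not preserved) | 71 | 2.8%
(not preserved) | 69 | 2.7%
(not preserved) | 64 | 2.5%
(not preserved) | 55 | 2.2%
Other values (58) | 679 | 26.6%

Decimal Number
Value | Count | Frequency (%)
1 | 352 | 21.0%
5 | 217 | 12.9%
6 | 210 | 12.5%
2 | 203 | 12.1%
4 | 155 | 9.2%
3 | 147 | 8.8%
7 | 131 | 7.8%
8 | 108 | 6.4%
0 | 84 | 5.0%
9 | 71 | 4.2%

Other Punctuation
Value | Count | Frequency (%)
, | 77 | 97.5%
. | 2 | 2.5%

Space Separator
Value | Count | Frequency (%)
(space) | 916 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 298 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 24 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 24 | 100.0%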

Most occurring scripts

Value | Count | Frequency (%)
Common | 3019 | 54.2%
Hangul | 2556 | 45.8%

Most frequent character per script

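Hangul
Value | Count | Frequency (%)
(not preserved) | 352 | 13.8%
(not preserved) | 305 | 11.9%
(not preserved) | 282 | 11.0%
(not preserved) | 281 | 11.0%
(not preserved) | 259 | 10.1%
(not preserved) | 139 | 5.4%
(not preserved) | 71 | 2.8%
(not preserved) | 69 | 2.7%
(not preserved) | 64 | 2.5%
(not preserved) | 55 | 2.2%
Other values (58) | 679 | 26.6%

Common
Value | Count | Frequency (%)
(space) | 916 | 30.3%
1 | 352 | 11.7%
- | 298 | 9.9%
5 | 217 | 7.2%
6 | 210 | 7.0%
2 | 203 | 6.7%
4 | 155 | 5.1%
3 | 147 | 4.9%
7 | 131 | 4.3%
8 | 108 | 3.6%
Other values (6) | 282 | 9.3%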

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3019 | 54.2%
Hangul | 2556 | 45.8%

Most frequent character per block

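ASCII
Value | Count | Frequency (%)
(space) | 916 | 30.3%
1 | 352 | 11.7%
- | 298 | 9.9%
5 | 217 | 7.2%
6 | 210 | 7.0%
2 | 203 | 6.7%
4 | 155 | 5.1%
3 | 147 | 4.9%
7 | 131 | 4.3%
8 | 108 | 3.6%
Other values (6) | 282 | 9.3%

Hangul
Value | Count | Frequency (%)
(not preserved) | 352 | 13.8%
(not preserved) | 305 | 11.9%
(not preserved) | 282 | 11.0%
(not preserved) | 281 | 11.0%
(not preserved) | 259 | 10.1%
(not preserved) | 139 | 5.4%
(not preserved) | 71 | 2.8%
(not preserved) | 69 | 2.7%
(not preserved) | 64 | 2.5%
(not preserved) | 55 | 2.2%
Other values (58) | 679 | 26.6%
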
Distinct: 71
Distinct (%): 25.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
Minimum: 2015-01-05 00:00:00
Maximum: 2019-11-21 00:00:00
Histogram with fixed size bins (bins=50)
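
Fixed-size binning splits the span between the minimum and maximum into equally wide intervals. A sketch of the bin edges for this variable, assuming 50 equal-width bins between the reported minimum and maximum (the helper name is illustrative, and the report's own binning code is not shown here):

```python
from datetime import datetime


def fixed_bin_edges(start: datetime, end: datetime, bins: int) -> list[datetime]:
    """Return bins + 1 equally spaced edges from start to end inclusive."""
    width = (end - start) / bins
    return [start + i * width for i in range(bins + 1)]


edges = fixed_bin_edges(datetime(2015, 1, 5), datetime(2019, 11, 21), bins=50)
print(len(edges))            # 51 edges delimit 50 bins
print(edges[1] - edges[0])   # width of a single bin
```
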
Distinct: 79
Distinct (%): 27.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
Minimum: 2018-01-04 00:00:00
Maximum: 2022-11-20 00:00:00
Histogram with fixed size bins (bins=50)

허가용량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 98
Distinct (%): 34.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 208.83041
Minimum: 14.8
Maximum: 499.95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 14.8
5th percentile: 64.103
Q1: 99
Median: 99.33
Q3: 336.2
95th percentile: 498.96
Maximum: 499.95
Range: 485.15
Interquartile range (IQR): 237.2
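
Range and IQR are simple differences of the quantiles listed above. A small check, with the values copied from this section:

```python
import math

minimum, q1, q3, maximum = 14.8, 99.0, 336.2, 499.95

value_range = maximum - minimum   # spread of the full data
iqr = q3 - q1                     # spread of the middle 50% of the data

print(round(value_range, 2), round(iqr, 2))
assert math.isclose(value_range, 485.15)
assert math.isclose(iqr, 237.2)
```
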

Descriptive statistics

Standard deviation: 169.47787
Coefficient of variation (CV): 0.81155742
Kurtosis: -0.92677886
Mean: 208.83041
Median Absolute Deviation (MAD): 1.41
Skewness: 0.9232973
Sum: 59099.005
Variance: 28722.747
Monotonicity: Not monotonic
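
Several of these figures are tied together by definition: the mean is the sum over the 283 observations, the variance is the squared standard deviation, and the CV is the standard deviation divided by the mean. The sketch below checks those relations using the reported values. It then shows, on a made-up four-element list built from values that appear in this variable, how a MAD can stay small while the standard deviation is large; `mad` is an illustrative helper, not the report's implementation.

```python
import math
import statistics

n, total = 283, 59099.005
mean, std = 208.83041, 169.47787

# Relations between the reported statistics (tolerance covers display rounding).
assert math.isclose(total / n, mean, rel_tol=1e-6)
assert math.isclose(std ** 2, 28722.747, rel_tol=1e-6)
assert math.isclose(std / mean, 0.81155742, rel_tol=1e-6)


def mad(values):
    """Median absolute deviation around the median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


# Illustrative data only: most values cluster near 99, one sits near 499.
toy = [99.0, 99.33, 99.33, 498.96]
print(mad(toy), statistics.pstdev(toy))
```

Because the MAD is built from medians, a cluster of near-identical values keeps it small even when a distant group inflates the standard deviation.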
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
99.33 | 57 | 20.1%
99.0 | 37 | 13.1%
498.96 | 20 | 7.1%
99.2 | 19 | 6.7%
97.92 | 13 | 4.6%
496.8 | 9 | 3.2%
487.08 | 8 | 2.8%
496.47 | 6 | 2.1%
491.36 | 6 | 2.1%
296.4 | 5 | 1.8%
Other values (88) | 103 | 36.4%

Minimum 10 values
Value | Count | Frequency (%)
14.8 | 1 | 0.4%
19.8 | 1 | 0.4%
24.8 | 1 | 0.4%
25.13 | 1 | 0.4%
29.6 | 1 | 0.4%
33.75 | 1 | 0.4%
39.42 | 2 | 0.7%
40.04 | 1 | 0.4%
40.15 | 1 | 0.4%
43.8 | 1 | 0.4%

Maximum 10 values
Value | Count | Frequency (%)
499.95 | 2 | 0.7%
499.62 | 1 | 0.4%
499.32 | 2 | 0.7%
498.96 | 20 | 7.1%
498.22 | 1 | 0.4%
496.8 | 9 | 3.2%
496.47 | 6 | 2.1%
495.88 | 1 | 0.4%
495.72 | 1 | 0.4%
495.36 | 2 | 0.7%

공사신고용량
Text

MISSING 

Distinct: 55
Distinct (%): 42.0%
Missing: 152
Missing (%): 53.7%
Memory size: 2.3 KiB