Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 629
Duplicate rows (%): 6.3%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
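
The percentage and per-record figures follow directly from the raw counts. A minimal Python check, assuming 1 KiB means 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Values copied from the Dataset statistics.
n_rows = 10_000
n_duplicates = 629
total_kib = 488.3  # assumption: 1 KiB = 1024 bytes

duplicate_pct = 100 * n_duplicates / n_rows
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures the report shows (6.3% and 50.0 B).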

Variable types

Text: 1
Categorical: 2
Numeric: 2

Dataset

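Description: Monthly auction price data for the Namchon Agricultural Products Wholesale Market in Incheon Metropolitan City. Columns include 품목 (item), 등급 (grade), 단량 (unit quantity), 단위 (unit), and 평균가 (average price).
Author: Incheon Metropolitan City (인천광역시)
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15051664&srcSe=7661IVAWM27C61E190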

Alerts

단위 has constant value "" [Constant]
Dataset has 629 (6.3%) duplicate rows [Duplicates]
단량 is highly overall correlated with 평균가 [High correlation]
평균가 is highly overall correlated with 단량 [High correlation]
등급 is highly imbalanced (54.4%) [Imbalance]
평균가 is highly skewed (γ1 = 99.15503283) [Skewed]

Reproduction

Analysis started: 2024-05-17 21:42:23.472968
Analysis finished: 2024-05-17 21:42:25.774869
Duration: 2.3 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
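
A report like this one can be regenerated roughly as follows. This is only a sketch: the CSV file name, the report title, and the helper name `build_report` are assumptions, and the settings in config.json are not reproduced here.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (sketch)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding of the real file is not stated in the report.
    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Namchon wholesale market prices")
    report.to_file(output_path)
    return output_path

# Example call (hypothetical file name):
# build_report("namchon_monthly_prices.csv")
```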

Variables

품목
Text

Distinct: 407
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 17
Mean length: 9.2899
Min length: 5

Characters and Unicode

Total characters: 92899
Distinct characters: 295
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
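
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one sample value of 품목; the helper name `category_counts` is illustrative.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Ps')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "옥수수(미백)"  # first sample row of 품목
counts = category_counts(sample)
print(dict(counts))
```

For this sample the result is five `Lo` (Other Letter), one `Ps` (Open Punctuation), and one `Pe` (Close Punctuation), the same category names the report uses.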

Unique

Unique: 57
Unique (%): 0.6%

Sample

1st row: 옥수수(미백)
2nd row: 깻잎(기타)
3rd row: 참다래(키위)(키위(수입))
4th row: 근대(근대(일반))
5th row: 무(가을무)

Value | Count | Frequency (%)
표고버섯(생표고 | 251 | 2.4%
오이(백다다기 | 207 | 2.0%
기타(엽경채류(기타 | 197 | 1.9%
수박(수박(일반)(꼭지절단 | 189 | 1.8%
표고버섯(표고버섯(일반 | 146 | 1.4%
새송이(새송이(일반 | 145 | 1.4%
가지(가지(일반 | 143 | 1.4%
시금치(시금치(일반 | 142 | 1.4%
풋고추(청양 | 123 | 1.2%
밤(밤(일반 | 119 | 1.2%
Other values (401) | 8622 | 83.8%

Most occurring characters

Value | Count | Frequency (%)
( | 14099 | 15.2%
) | 14099 | 15.2%
 | 3521 | 3.8%
 | 3471 | 3.7%
 | 2881 | 3.1%
 | 2873 | 3.1%
 | 2462 | 2.7%
 | 2170 | 2.3%
 | 1333 | 1.4%
 | 1285 | 1.4%
Other values (285) | 44705 | 48.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 64252 | 69.2%
Open Punctuation | 14099 | 15.2%
Close Punctuation | 14099 | 15.2%
Space Separator | 284 | 0.3%
Other Punctuation | 144 | 0.2%
Decimal Number | 21 | < 0.1%

Most frequent character per category

Other Letter

Value | Count | Frequency (%)
 | 3521 | 5.5%
 | 3471 | 5.4%
 | 2881 | 4.5%
 | 2873 | 4.5%
 | 2462 | 3.8%
 | 2170 | 3.4%
 | 1333 | 2.1%
 | 1285 | 2.0%
 | 1261 | 2.0%
 | 1112 | 1.7%
Other values (279) | 41883 | 65.2%

Decimal Number

Value | Count | Frequency (%)
1 | 14 | 66.7%
8 | 7 | 33.3%

Open Punctuation

Value | Count | Frequency (%)
( | 14099 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 14099 | 100.0%

Space Separator

Value | Count | Frequency (%)
(space) | 284 | 100.0%

Other Punctuation

Value | Count | Frequency (%)
, | 144 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 64252 | 69.2%
Common | 28647 | 30.8%

Most frequent character per script

Hangul

Value | Count | Frequency (%)
 | 3521 | 5.5%
 | 3471 | 5.4%
 | 2881 | 4.5%
 | 2873 | 4.5%
 | 2462 | 3.8%
 | 2170 | 3.4%
 | 1333 | 2.1%
 | 1285 | 2.0%
 | 1261 | 2.0%
 | 1112 | 1.7%
Other values (279) | 41883 | 65.2%

Common

Value | Count | Frequency (%)
( | 14099 | 49.2%
) | 14099 | 49.2%
(space) | 284 | 1.0%
, | 144 | 0.5%
1 | 14 | < 0.1%
8 | 7 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 64252 | 69.2%
ASCII | 28647 | 30.8%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
( | 14099 | 49.2%
) | 14099 | 49.2%
(space) | 284 | 1.0%
, | 144 | 0.5%
1 | 14 | < 0.1%
8 | 7 | < 0.1%

Hangul

Value | Count | Frequency (%)
 | 3521 | 5.5%
 | 3471 | 5.4%
 | 2881 | 4.5%
 | 2873 | 4.5%
 | 2462 | 3.8%
 | 2170 | 3.4%
 | 1333 | 2.1%
 | 1285 | 2.0%
 | 1261 | 2.0%
 | 1112 | 1.7%
Other values (279) | 41883 | 65.2%

등급
Categorical

IMBALANCE 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
특(1등 | 6376
상(2등 | 2552
보통(3 | 456
4등 | 192
9등(등 | 184
Other values (5) | 240

Length

Max length: 17
Median length: 16
Mean length: 16.0337
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 특(1등
2nd row: 특(1등
3rd row: 특(1등
4th row: 특(1등
5th row: 특(1등

Common Values

Value | Count | Frequency (%)
특(1등 | 6376 | 63.8%
상(2등 | 2552 | 25.5%
보통(3 | 456 | 4.6%
4등 | 192 | 1.9%
9등(등 | 184 | 1.8%
없음 | 95 | 0.9%
5등 | 46 | 0.5%
8등 | 42 | 0.4%
6등 | 38 | 0.4%
7등 | 19 | 0.2%
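
The 54.4% imbalance reported for 등급 is consistent with an entropy-based score, 1 − H / log2(k), where H is the Shannon entropy of the value counts and k is the number of distinct values. That this is the exact formula the profiler uses is an assumption; the sketch below only shows that it reproduces the reported figure from the counts listed here.

```python
import math

# Counts per 등급 value, copied from the Common Values table.
counts = [6376, 2552, 456, 192, 184, 95, 46, 42, 38, 19]

def imbalance_score(counts):
    """Return 1 - H/log2(k), where H is the Shannon entropy in bits."""
    if len(counts) < 2:
        return 0.0  # a single class has no entropy to normalise
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(len(counts))

score = imbalance_score(counts)
print(f"{score:.1%}")
```

A score of 0 would mean all ten grades occur equally often; values near 1 mean a single grade dominates.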

Length

Histogram of lengths of the category


단량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.470575
Minimum: 0.01
Maximum: 136
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.5
Q1: 3
Median: 5
Q3: 10
95-th percentile: 15
Maximum: 136
Range: 135.99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.0756718
Coefficient of variation (CV): 0.78442361
Kurtosis: 60.981697
Mean: 6.470575
Median Absolute Deviation (MAD): 3
Skewness: 3.5279631
Sum: 64705.75
Variance: 25.762444
Monotonicity: Not monotonic
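
Several of these statistics are derived from one another. A short check using the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum − minimum, IQR = Q3 − Q1), with the 단량 values copied from the report:

```python
# Values copied from the 단량 statistics.
mean, std = 6.470575, 5.0756718
minimum, maximum = 0.01, 136
q1, q3 = 3, 10

cv = std / mean                  # coefficient of variation
variance = std ** 2
value_range = maximum - minimum
iqr = q3 - q1

print(round(cv, 4), round(variance, 2), round(value_range, 2), iqr)
```

The results agree with the reported CV (0.78442361), variance (25.762444), range (135.99), and IQR (7) up to rounding.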
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
10.0 | 2128 | 21.3%
4.0 | 2051 | 20.5%
2.0 | 1118 | 11.2%
5.0 | 804 | 8.0%
8.0 | 734 | 7.3%
1.0 | 493 | 4.9%
15.0 | 319 | 3.2%
3.0 | 310 | 3.1%
20.0 | 284 | 2.8%
0.5 | 248 | 2.5%
Other values (77) | 1511 | 15.1%

Minimum 10 values

Value | Count | Frequency (%)
0.01 | 22 | 0.2%
0.02 | 1 | < 0.1%
0.05 | 33 | 0.3%
0.06 | 8 | 0.1%
0.1 | 15 | 0.1%
0.12 | 3 | < 0.1%
0.15 | 4 | < 0.1%
0.16 | 16 | 0.2%
0.2 | 66 | 0.7%
0.25 | 6 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
136.0 | 1 | < 0.1%
102.0 | 1 | < 0.1%
85.0 | 1 | < 0.1%
40.0 | 3 | < 0.1%
34.0 | 1 | < 0.1%
25.0 | 5 | 0.1%
21.0 | 2 | < 0.1%
20.0 | 284 | 2.8%
19.0 | 1 | < 0.1%
18.0 | 76 | 0.8%

단위
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
kg | 10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: kg
2nd row: kg
3rd row: kg
4th row: kg
5th row: kg

Common Values

Value | Count | Frequency (%)
kg | 10000 | 100.0%

Length

Histogram of lengths of the category


평균가
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 4423
Distinct (%): 44.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22467.7
Minimum: 100
Maximum: 40008000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100
5-th percentile: 1493.8
Q1: 5691
Median: 11977.5
Q3: 22000
95-th percentile: 55118.85
Maximum: 40008000
Range: 40007900
Interquartile range (IQR): 16309

Descriptive statistics

Standard deviation: 401038.11
Coefficient of variation (CV): 17.84954
Kurtosis: 9886.4886
Mean: 22467.7
Median Absolute Deviation (MAD): 7434
Skewness: 99.155033
Sum: 2.24677 × 10^8
Variance: 1.6083157 × 10^11
Monotonicity: Not monotonic
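
The very large skewness and kurtosis go together with a maximum (40008000) that is far above the 95-th percentile (55118.85). The sketch below shows how a single extreme value drives moment-based skewness. The "typical" prices are hypothetical toy values, only the extreme value is the reported maximum, and the uncorrected formula g1 = m3 / m2^1.5 is an assumption that may differ from the bias-adjusted estimator the profiler uses.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical toy prices, symmetric around 10000; not taken from the dataset.
typical = [5000, 8000, 10000, 12000, 15000]
# Same list plus the Maximum reported for 평균가.
with_outlier = typical + [40008000]

print(skewness(typical), skewness(with_outlier))
```

The symmetric list has zero skewness, while adding the one extreme value makes it strongly positive.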
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
10000 | 208 | 2.1%
8000 | 193 | 1.9%
4000 | 181 | 1.8%
5000 | 162 | 1.6%
12000 | 159 | 1.6%
3000 | 152 | 1.5%
15000 | 149 | 1.5%
13000 | 149 | 1.5%
6000 | 142 | 1.4%
7000 | 136 | 1.4%
Other values (4413) | 8369 | 83.7%

Minimum 10 values

Value | Count | Frequency (%)
100 | 1 | < 0.1%
150 | 1 | < 0.1%
200 | 4 | < 0.1%
250 | 1 | < 0.1%
276 | 1 | < 0.1%
300 | 22 | 0.2%
336 | 1 | < 0.1%
350 | 9 | 0.1%
375 | 1 | < 0.1%
400 | 9 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
40008000 | 1 | < 0.1%
1417000 | 1 | < 0.1%
1062500 | 1 | < 0.1%
604200 | 1 | < 0.1%
560037 | 1 | < 0.1%
448177 | 1 | < 0.1%
406200 | 3 | < 0.1%
354200 | 1 | < 0.1%
240000 | 8 | 0.1%
215000 | 2 | < 0.1%

Interactions


Correlations

 | 등급 | 단량 | 평균가
등급 | 1.000 | 0.075 | 0.000
단량 | 0.075 | 1.000 | 0.000
평균가 | 0.000 | 0.000 | 1.000

 | 단량 | 평균가 | 등급
단량 | 1.000 | 0.593 | 0.039
평균가 | 0.593 | 1.000 | 0.000
등급 | 0.039 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

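index | 품목 | 등급 | 단량 | 단위 | 평균가
11030 | 옥수수(미백) | 특(1등 | 8.0 | kg | 2000
2894 | 깻잎(기타) | 특(1등 | 4.0 | kg | 9800
11875 | 참다래(키위)(키위(수입)) | 특(1등 | 5.7 | kg | 52938
2406 | 근대(근대(일반)) | 특(1등 | 4.0 | kg | 4027
5458 | 무(가을무) | 특(1등 | 20.0 | kg | 12573
12192 | 칼리플라워(꽃양배추)(칼리플라워(일반)) | 특(1등 | 8.0 | kg | 17000
9021 | 시금치(시금치(일반)) | 특(1등 | 0.35 | kg | 2200
7292 | 사과(기꾸8) | 상(2등 | 10.0 | kg | 45714
8889 | 시금치(시금치(일반)) | 특(1등 | 4.0 | kg | 9107
1066 | 강낭콩(줄콩) | 특(1등 | 10.0 | kg | 22800

index | 품목 | 등급 | 단량 | 단위 | 평균가
9099 | 실파(실파(일반)) | 특(1등 | 1.0 | kg | 1000
8012 | 새송이(새송이(일반)) | 특(1등 | 4.0 | kg | 12690
5659 | 무(다발무) | 특(1등 | 5.0 | kg | 2838
3014 | 깻잎(깻잎(일반)) | 특(1등 | 3.0 | kg | 10286
8874 | 시금치(시금치(일반)) | 상(2등 | 4.0 | kg | 10000
2213 | 곡물제조(두부) | 특(1등 | 0.5 | kg | 1230
11280 | 적채(적채(일반)) | 특(1등 | 10.0 | kg | 9400
13490 | 포도(샤인마스캇) | 상(2등 | 2.0 | kg | 7227
13154 | 파프리카(파프리카(일반)) | 상(2등 | 5.0 | kg | 17432
15158 | 풋고추(청초(일반)) | 특(1등 | 4.0 | kg | 3000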

Duplicate rows

Most frequently occurring

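index | 품목 | 등급 | 단량 | 단위 | 평균가 | # duplicates
58 | 곡물제조(두부) | 특(1등 | 0.5 | kg | 1230 | 18
66 | 곡물제조(순두부) | 특(1등 | 16.0 | kg | 17800 | 18
225 | 미역(줄기미역) | 특(1등 | 7.5 | kg | 11000 | 18
63 | 곡물제조(두부) | 특(1등 | 7.0 | kg | 7500 | 17
196 | 무청(건무청) | 특(1등 | 10.0 | kg | 20000 | 17
476 | 콩나물(콩나물(일반)) | 특(1등 | 5.0 | kg | 7500 | 17
67 | 곡물제조(연두부) | 특(1등 | 12.0 | kg | 17800 | 16
222 | 미역(줄기미역) | 특(1등 | 5.5 | kg | 8000 | 16
324 | 숙주나물(숙주나물(일반)) | 특(1등 | 3.5 | kg | 4500 | 16
31 | 고구마순(생고구마순) | 특(1등 | 2.0 | kg | 4000 | 15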
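
A duplicate-row table of this kind can be approximated by counting identical rows. The sketch below uses a small hypothetical sample built from row values that appear in this report. It is not the dataset itself and does not necessarily match how the profiler defines "# duplicates".

```python
from collections import Counter

# Hypothetical mini-sample: rows are (품목, 등급, 단량, 단위, 평균가) tuples.
rows = [
    ("곡물제조(두부)", "특(1등", 0.5, "kg", 1230),
    ("곡물제조(두부)", "특(1등", 0.5, "kg", 1230),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("곡물제조(두부)", "특(1등", 0.5, "kg", 1230),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("무(가을무)", "특(1등", 20.0, "kg", 12573),
]

occurrences = Counter(rows)
# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in occurrences.most_common() if n > 1]
for row, n in duplicates:
    print(n, row)
```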