Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 661
Duplicate rows (%): 6.6%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
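
The two derived figures in this table can be checked from the raw counts. The sketch below is a minimal check, not the profiler's own code. It assumes the duplicate percentage is duplicate rows divided by observations, and that the average record size is total memory divided by observations with 1 KiB = 1024 bytes. The variable names are illustrative.

```python
# Values copied from the "Dataset statistics" table.
n_observations = 10_000
n_duplicate_rows = 661
total_memory_kib = 488.3

# Assumption: percentage = duplicate rows / observations.
duplicate_pct = n_duplicate_rows / n_observations * 100

# Assumption: 1 KiB = 1024 bytes, averaged over all observations.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures shown in the table.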

Variable types

Text: 1
Categorical: 2
Numeric: 2

Dataset

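Description: Monthly auction price data for the Namchon Agricultural Products Wholesale Market in Incheon Metropolitan City. The columns are item (품목), grade (등급), unit quantity (단량), unit (단위), and average price (평균가).
Author: Incheon Metropolitan City (인천광역시)
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15051664&srcSe=7661IVAWM27C61E190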

Alerts

단위 has constant value "kg" (Constant)
Dataset has 661 (6.6%) duplicate rows (Duplicates)
단량 is highly overall correlated with 평균가 (High correlation)
평균가 is highly overall correlated with 단량 (High correlation)
등급 is highly imbalanced (54.7%) (Imbalance)

Reproduction

Analysis started: 2024-05-17 21:41:55.548217
Analysis finished: 2024-05-17 21:41:57.942175
Duration: 2.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

품목
Text

Distinct: 407
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 17
Mean length: 9.2852
Min length: 5

Characters and Unicode

Total characters: 92852
Distinct characters: 298
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
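
As an illustration of these character properties, the sketch below classifies characters by Unicode general category using Python's standard `unicodedata` module. It is a minimal example and not the profiler's implementation. The input is only the five sample values listed for this variable, so the proportions it prints will not match the full-column figures. The `CATEGORY_NAMES` mapping and the function name are illustrative choices.

```python
import unicodedata
from collections import Counter

# Sample values of 품목 copied from this report's "Sample" list.
samples = [
    "포도(샤인마스캇)",
    "가지(가지(일반))",
    "기타(엽경채류(기타))",
    "동부(동부(일반))",
    "아스파라거스(녹색)",
]

# Readable names for the general-category codes reported for this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        # Normalize to NFC so each Hangul syllable is a single code point.
        text = unicodedata.normalize("NFC", value)
        counts.update(unicodedata.category(ch) for ch in text)
    return counts

counts = category_counts(samples)
total = sum(counts.values())
for code, n in counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}: {n} ({n / total:.1%})")
```

Hangul syllables fall under "Other Letter" and the parentheses under "Open Punctuation" and "Close Punctuation", which is why those three categories dominate the tables for this variable.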

Unique

Unique: 55
Unique (%): 0.5%

Sample

1st row: 포도(샤인마스캇)
2nd row: 가지(가지(일반))
3rd row: 기타(엽경채류(기타))
4th row: 동부(동부(일반))
5th row: 아스파라거스(녹색)

Value  Count  Frequency (%)
표고버섯(생표고  254  2.5%
오이(백다다기  206  2.0%
기타(엽경채류(기타  181  1.8%
수박(수박(일반)(꼭지절단  169  1.6%
표고버섯(표고버섯(일반  145  1.4%
시금치(시금치(일반  145  1.4%
풋고추(청양  130  1.3%
가지(가지(일반  130  1.3%
호박(애호박  119  1.2%
새송이(새송이(일반  108  1.1%
Other values (401)  8682  84.5%

Most occurring characters

Value  Count  Frequency (%)
(  14033  15.1%
)  14033  15.1%
[?]  3478  3.7%
[?]  3433  3.7%
[?]  3025  3.3%
[?]  2955  3.2%
[?]  2459  2.6%
[?]  2152  2.3%
[?]  1366  1.5%
[?]  1366  1.5%
Other values (288)  44552  48.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  64353  69.3%
Open Punctuation  14033  15.1%
Close Punctuation  14033  15.1%
Space Separator  269  0.3%
Other Punctuation  141  0.2%
Decimal Number  23  < 0.1%

Most frequent character per category

Other Letter

Value  Count  Frequency (%)
[?]  3478  5.4%
[?]  3433  5.3%
[?]  3025  4.7%
[?]  2955  4.6%
[?]  2459  3.8%
[?]  2152  3.3%
[?]  1366  2.1%
[?]  1366  2.1%
[?]  1230  1.9%
[?]  1100  1.7%
Other values (282)  41789  64.9%

Decimal Number

Value  Count  Frequency (%)
1  19  82.6%
8  4  17.4%

Open Punctuation

Value  Count  Frequency (%)
(  14033  100.0%

Close Punctuation

Value  Count  Frequency (%)
)  14033  100.0%

Space Separator

Value  Count  Frequency (%)
(space)  269  100.0%

Other Punctuation

Value  Count  Frequency (%)
,  141  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  64353  69.3%
Common  28499  30.7%

Most frequent character per script

Hangul

Value  Count  Frequency (%)
[?]  3478  5.4%
[?]  3433  5.3%
[?]  3025  4.7%
[?]  2955  4.6%
[?]  2459  3.8%
[?]  2152  3.3%
[?]  1366  2.1%
[?]  1366  2.1%
[?]  1230  1.9%
[?]  1100  1.7%
Other values (282)  41789  64.9%

Common

Value  Count  Frequency (%)
(  14033  49.2%
)  14033  49.2%
(space)  269  0.9%
,  141  0.5%
1  19  0.1%
8  4  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  64353  69.3%
ASCII  28499  30.7%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
(  14033  49.2%
)  14033  49.2%
(space)  269  0.9%
,  141  0.5%
1  19  0.1%
8  4  < 0.1%

Hangul

Value  Count  Frequency (%)
[?]  3478  5.4%
[?]  3433  5.3%
[?]  3025  4.7%
[?]  2955  4.6%
[?]  2459  3.8%
[?]  2152  3.3%
[?]  1366  2.1%
[?]  1366  2.1%
[?]  1230  1.9%
[?]  1100  1.7%
Other values (282)  41789  64.9%

등급
Categorical

IMBALANCE 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
특(1등: 6410
상(2등: 2537
보통(3: 452
9등(등: 188
4등: 178
Other values (5): 235

Length

Max length: 17
Median length: 16
Mean length: 16.0323
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 특(1등
2nd row: 상(2등
3rd row: 특(1등
4th row: 특(1등
5th row: 특(1등

Common Values

Value  Count  Frequency (%)
특(1등  6410  64.1%
상(2등  2537  25.4%
보통(3  452  4.5%
9등(등  188  1.9%
4등  178  1.8%
없음  90  0.9%
8등  48  0.5%
5등  40  0.4%
6등  36  0.4%
7등  21  0.2%
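
The 54.7% imbalance figure reported for 등급 is consistent with an entropy-based score: one minus the Shannon entropy of the class frequencies, divided by the maximum possible entropy for the same number of classes. The sketch below reproduces the figure from the counts in this table. That the profiler computes the score exactly this way is an assumption here, and the function name is illustrative.

```python
import math

# Counts copied from the "Common Values" table for 등급.
grade_counts = [6410, 2537, 452, 188, 178, 90, 48, 40, 36, 21]

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    The score is 0 for a perfectly uniform split and approaches 1 as a
    single class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

score = imbalance_score(grade_counts)
print(f"Imbalance: {score:.1%}")
```

With the ten counts above, the score comes out at about 0.547, which matches the alert.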

Length

Histogram of lengths of the category


단량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 84
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.456265
Minimum: 0.01
Maximum: 136
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.5
Q1: 3
Median: 5
Q3: 10
95-th percentile: 16
Maximum: 136
Range: 135.99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.0334465
Coefficient of variation (CV): 0.77962205
Kurtosis: 56.35695
Mean: 6.456265
Median Absolute Deviation (MAD): 3
Skewness: 3.333624
Sum: 64562.65
Variance: 25.335584
Monotonicity: Not monotonic
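
Several of the statistics reported for 단량 are simple functions of the others. The sketch below recomputes them from the reported values as a consistency check, using the standard definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations). It does not use the underlying data, and the variable names are illustrative.

```python
# Values copied from the 단량 statistics in this report.
n_observations = 10_000
mean = 6.456265
std = 5.0334465
minimum, maximum = 0.01, 136
q1, q3 = 3, 10

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n_observations     # Sum

print(f"Range: {value_range:.2f}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.5f}")
print(f"Sum: {total:.2f}")
```

Each recomputed value agrees with the reported one to the precision shown in the report.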
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
10.0  2109  21.1%
4.0  2056  20.6%
2.0  1113  11.1%
5.0  815  8.2%
8.0  766  7.7%
1.0  463  4.6%
3.0  308  3.1%
15.0  300  3.0%
20.0  297  3.0%
0.5  258  2.6%
Other values (74)  1515  15.2%

Minimum 10 values

Value  Count  Frequency (%)
0.01  20  0.2%
0.05  36  0.4%
0.06  8  0.1%
0.1  17  0.2%
0.12  4  < 0.1%
0.15  4  < 0.1%
0.16  15  0.1%
0.2  78  0.8%
0.25  5  0.1%
0.3  37  0.4%

Maximum 10 values

Value  Count  Frequency (%)
136.0  1  < 0.1%
85.0  2  < 0.1%
51.0  1  < 0.1%
40.0  1  < 0.1%
34.0  1  < 0.1%
25.0  3  < 0.1%
20.0  297  3.0%
18.0  78  0.8%
17.5  1  < 0.1%
17.0  47  0.5%

단위
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
kg: 10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: kg
2nd row: kg
3rd row: kg
4th row: kg
5th row: kg

Common Values

Value  Count  Frequency (%)
kg  10000  100.0%

Length

Histogram of lengths of the category


평균가
Real number (ℝ)

HIGH CORRELATION 

Distinct: 4418
Distinct (%): 44.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18109.831
Minimum: 100
Maximum: 1417000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100
5-th percentile: 1490.95
Q1: 5765.75
Median: 12000
Q3: 22000
95-th percentile: 55000
Maximum: 1417000
Range: 1416900
Interquartile range (IQR): 16234.25

Descriptive statistics

Standard deviation: 27259.564
Coefficient of variation (CV): 1.5052357
Kurtosis: 741.28906
Mean: 18109.831
Median Absolute Deviation (MAD): 7269.5
Skewness: 17.626135
Sum: 1.8109831 × 10^8
Variance: 7.4308381 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
10000  222  2.2%
8000  185  1.8%
4000  167  1.7%
5000  163  1.6%
13000  163  1.6%
15000  160  1.6%
12000  157  1.6%
6000  152  1.5%
11000  142  1.4%
3000  140  1.4%
Other values (4408)  8349  83.5%

Minimum 10 values

Value  Count  Frequency (%)
100  1  < 0.1%
150  1  < 0.1%
200  5  0.1%
276  1  < 0.1%
300  23  0.2%
336  1  < 0.1%
350  9  0.1%
375  1  < 0.1%
400  13  0.1%
424  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
1417000  1  < 0.1%
560037  1  < 0.1%
531200  1  < 0.1%
448177  1  < 0.1%
406200  1  < 0.1%
354200  1  < 0.1%
240000  6  0.1%
215000  1  < 0.1%
200000  3  < 0.1%
195000  9  0.1%

Interactions

[Interaction plots; images not recovered]

Correlations

        등급     단량     평균가
등급    1.000   0.074   0.020
단량    0.074   1.000   0.853
평균가  0.020   0.853   1.000

        단량     평균가   등급
단량    1.000   0.586   0.039
평균가  0.586   1.000   0.008
등급    0.039   0.008   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  품목  등급  단량  단위  평균가
13543  포도(샤인마스캇)  특(1등  4.0  kg  19094
69  가지(가지(일반))  상(2등  8.0  kg  11843
2610  기타(엽경채류(기타))  특(1등  2.0  kg  5136
4414  동부(동부(일반))  특(1등  4.0  kg  14722
9279  아스파라거스(녹색)  특(1등  1.0  kg  8000
3022  깻잎(깻잎(일반))  특(1등  2.0  kg  13199
4905  로메인(로메인(일반))  상(2등  2.0  kg  13000
2831  기타식품(기타)  상(2등  1.0  kg  8000
1745  고들빼기(고들빼기(일반))  상(2등  5.0  kg  48750
4239  더덕(더덕(일반))  보통(3  1.0  kg  10625

Last rows

Index  품목  등급  단량  단위  평균가
8010  새송이(새송이(일반))  상(2등  2.0  kg  3000
6663  배추(기타)  특(1등  8.0  kg  6310
8950  시금치(시금치(일반))  상(2등  0.4  kg  1773
5737  무순(무순(일반))  특(1등  0.06  kg  350
8780  순무(순무(일반))  특(1등  5.0  kg  3190
6491  방울토마토(대추방울)  특(1등  3.0  kg  23014
6727  배추(쌈배추)  상(2등  10.0  kg  8000
7971  새발나물(새발나물(일반))  특(1등  4.0  kg  4635
7601  사과(홍로)  상(2등  10.0  kg  30370
1339  고구마(밤고구마)  4등  10.0  kg  5000

Duplicate rows

Most frequently occurring

Index  품목  등급  단량  단위  평균가  # duplicates
69  곡물제조(두부)  특(1등  7.0  kg  7500  18
236  미역(줄기미역)  특(1등  7.5  kg  11000  17
76  곡물제조(연두부)  특(1등  12.0  kg  17800  16
171  마늘(깐마늘 남도)  특(1등  0.2  kg  1000  16
320  숙주나물(숙주나물(일반))  특(1등  3.5  kg  4500  16
68  곡물제조(두부)  특(1등  3.0  kg  5300  15
103  꼬시래기(꼬시래기(일반))  특(1등  8.0  kg  10500  15
233  미역(줄기미역)  특(1등  5.5  kg  8000  15
365  어묵,어분,어비(기타)  특(1등  15.0  kg  72000  15
494  콩나물(콩나물(일반))  특(1등  5.0  kg  7500  15
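
A table like the one above can be built by counting how often each full row occurs and keeping the rows that occur more than once. The sketch below shows one way to do this with the Python standard library. The mini-dataset is hypothetical: the field values mirror rows in this report, but the repetition counts are made up for illustration. The report does not state whether its total of 661 counts distinct duplicated rows or surplus copies, so the sketch only lists the groups.

```python
from collections import Counter

# Hypothetical mini-dataset: each row is (품목, 등급, 단량, 단위, 평균가).
# Field values mirror rows in this report; repetition counts are made up.
rows = [
    ("곡물제조(두부)", "특(1등", 7.0, "kg", 7500),
    ("곡물제조(두부)", "특(1등", 7.0, "kg", 7500),
    ("곡물제조(두부)", "특(1등", 7.0, "kg", 7500),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("사과(홍로)", "상(2등", 10.0, "kg", 30370),
]

def duplicate_groups(rows):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(rows)
    duplicated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(duplicated, key=lambda item: item[1], reverse=True)

groups = duplicate_groups(rows)
for row, n in groups:
    print(row, n)
```

Rows must match on every column to be grouped together, which is why the two 곡물제조(두부) entries in the report's table, with different 단량 and 평균가 values, appear as separate groups.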