Overview

Dataset statistics

Number of variables: 11
Number of observations: 1040
Missing cells: 803
Missing cells (%): 7.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 91.5 KiB
Average record size in memory: 90.1 B
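
The derived figures in this block follow from the raw counts. The sketch below recomputes them, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB = 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 11
n_observations = 1040
missing_cells = 803
total_size_kib = 91.5

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values reported here (7.0% and 90.1 B).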

Variable types

Numeric: 2
Text: 3
Categorical: 6

Dataset

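Description: Data on the artifacts held by the Sangju Museum (상주박물관), providing fields such as artifact name (유물명), main quantity (주수량), period (시대), genre (장르), material (재질), and size (크기).
Author: Sangju-si, Gyeongsangbuk-do (경상북도 상주시)
URL: https://www.data.go.kr/data/3049752/fileData.do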

Alerts

데이터기준일 (data reference date) has constant value "" [Constant]
번호 (number) is highly overall correlated with 출토지/소장자 (excavation site/holder) and 1 other field [High correlation]
주수량 (main quantity) is highly overall correlated with 출토지/소장자 [High correlation]
시대 (period) is highly overall correlated with 장르 (genre) and 1 other field [High correlation]
장르 is highly overall correlated with 시대 and 3 other fields [High correlation]
재질 (material) is highly overall correlated with 장르 and 1 other field [High correlation]
출토지/소장자 is highly overall correlated with 번호 and 5 other fields [High correlation]
문화재지정 (cultural heritage designation) is highly overall correlated with 번호 and 2 other fields [High correlation]
시대 is highly imbalanced (53.6%) [Imbalance]
재질 is highly imbalanced (64.2%) [Imbalance]
출토지/소장자 is highly imbalanced (81.7%) [Imbalance]
유물설명 (artifact description) has 791 (76.1%) missing values [Missing]
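
The imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of the category counts. This is an assumption inferred from the numbers in this report, not something the report states. The sketch below applies that formula to the 시대 and 재질 counts listed under each variable's common values; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Assumption: a column with a single class is scored 0.
    """
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Counts read from the report; the "Other values" rows are expanded into
# single-occurrence classes, matching the reported Distinct/Unique figures.
period_counts = [659, 200, 93, 52, 9, 8, 6, 6, 3, 2, 1, 1]   # 시대, 12 classes
material_counts = [822, 75, 60, 44, 13, 11, 6, 3, 3, 2, 1]    # 재질, 11 classes

period_score = imbalance_score(period_counts)
material_score = imbalance_score(material_counts)
print(f"시대: {period_score:.1%}")
print(f"재질: {material_score:.1%}")
```

Under this assumption the two scores round to the 53.6% and 64.2% shown in the alerts.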

Reproduction

Analysis started: 2023-12-12 11:01:18.795001
Analysis finished: 2023-12-12 11:01:21.703132
Duration: 2.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

번호 (number)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1038
Distinct (%): 100.0%
Missing: 2
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 519.5
Minimum: 1
Maximum: 1038
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 52.85
Q1: 260.25
Median: 519.5
Q3: 778.75
95-th percentile: 986.15
Maximum: 1038
Range: 1037
Interquartile range (IQR): 518.5

Descriptive statistics

Standard deviation: 299.78909
Coefficient of variation (CV): 0.57707236
Kurtosis: -1.2
Mean: 519.5
Median Absolute Deviation (MAD): 259.5
Skewness: 0
Sum: 539241
Variance: 89873.5
Monotonicity: Strictly increasing
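
These statistics are what a plain running index would produce. The sketch below assumes the 1038 non-missing values are exactly the integers 1 to 1038 (consistent with 1038 distinct values, minimum 1, maximum 1038, and strictly increasing order) and recomputes the figures with the standard library. The `percentile` helper is illustrative and uses linear interpolation, which is an assumption about how the report computes quantiles.

```python
import statistics

# Assumption: the non-missing values are the integers 1..1038.
values = list(range(1, 1039))

def percentile(sorted_values, q):
    """Percentile by linear interpolation between neighbours, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)   # sample variance (divides by n - 1)
std = statistics.stdev(values)
mad = statistics.median(abs(v - median) for v in values)

print("sum:", sum(values))
print("mean:", mean, "variance:", variance, "std:", round(std, 5))
print("5%:", round(percentile(values, 0.05), 2),
      "Q1:", round(percentile(values, 0.25), 2),
      "Q3:", round(percentile(values, 0.75), 2),
      "95%:", round(percentile(values, 0.95), 2))
print("MAD:", mad)
```

Under that assumption every recomputed value agrees with the report, which suggests 번호 carries no information beyond row order.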
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
715 | 1 | 0.1%
685 | 1 | 0.1%
686 | 1 | 0.1%
687 | 1 | 0.1%
688 | 1 | 0.1%
689 | 1 | 0.1%
690 | 1 | 0.1%
691 | 1 | 0.1%
692 | 1 | 0.1%
693 | 1 | 0.1%
Other values (1028) | 1028 | 98.8%
(Missing) | 2 | 0.2%
Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%
Maximum 10 values
Value | Count | Frequency (%)
1038 | 1 | 0.1%
1037 | 1 | 0.1%
1036 | 1 | 0.1%
1035 | 1 | 0.1%
1034 | 1 | 0.1%
1033 | 1 | 0.1%
1032 | 1 | 0.1%
1031 | 1 | 0.1%
1030 | 1 | 0.1%
1029 | 1 | 0.1%
Distinct: 650
Distinct (%): 62.6%
Missing: 2
Missing (%): 0.2%
Memory size: 8.3 KiB

Length

Max length: 33
Median length: 24
Mean length: 6.0635838
Min length: 1

Characters and Unicode

Total characters: 6294
Distinct characters: 620
Distinct categories: 10
Distinct scripts: 5
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
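
As a rough illustration of those character properties, the sketch below tallies the Unicode general category of each character in one of this variable's sample values using Python's standard `unicodedata` module. It is not the report's own implementation: `category_counts` is a hypothetical helper, and the mapping only covers the ten categories this variable reports.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator", "Nd": "Decimal Number",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Sm": "Math Symbol",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation", "So": "Other Symbol",
    "Ll": "Lowercase Letter",
}

def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lo', 'Nd')."""
    # NFC keeps Hangul syllables as single precomposed code points.
    normalized = unicodedata.normalize("NFC", text)
    return Counter(unicodedata.category(ch) for ch in normalized)

sample = "후집 後集 (1) ~(5)"  # the 2nd sample row of this variable
counts = category_counts(sample)
for code, n in counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}: {n}")
```

Hangul syllables and Han ideographs both fall under Other Letter, which is why that category dominates the character tables for this variable.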

Unique

Unique: 553
Unique (%): 53.3%

Sample

1st row: 휘찬려사 彙纂麗史 (1)~(23)
2nd row: 후집 後集 (1) ~(5)
3rd row: 효자공실록부양리 공문집 孝子公實錄附陽里 公文集 (1)~(2)
4th row: 함창향교교지 咸昌鄕校校誌
5th row: 학용요의변정록 學庸要義卞正錄
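Words
Value | Count | Frequency (%)
간찰 (letter) | 175 | 10.1%
고신 (appointment certificate) | 77 | 4.4%
준호구 (household register certificate) | 71 | 4.1%
김영기 | 34 | 2.0%
영남지도 (map of Yeongnam) | 34 | 2.0%
교지 (royal edict) | 29 | 1.7%
 | 26 | 1.5%
지형도 (topographic map) | 24 | 1.4%
시문집 (collected poetry and prose) | 18 | 1.0%
1950년대 (1950s) | 16 | 0.9%
Other values (816) | 1236 | 71.0%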

Most occurring characters

ValueCountFrequency (%)
702
 
11.2%
185
 
2.9%
178
 
2.8%
150
 
2.4%
135
 
2.1%
) 119
 
1.9%
( 119
 
1.9%
112
 
1.8%
104
 
1.7%
97
 
1.5%
Other values (610) 4393
69.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 5067 | 80.5%
Space Separator | 702 | 11.2%
Decimal Number | 206 | 3.3%
Close Punctuation | 119 | 1.9%
Open Punctuation | 119 | 1.9%
Math Symbol | 32 | 0.5%
Dash Punctuation | 26 | 0.4%
Other Punctuation | 13 | 0.2%
Other Symbol | 7 | 0.1%
Lowercase Letter | 3 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
185
 
3.7%
178
 
3.5%
150
 
3.0%
135
 
2.7%
112
 
2.2%
104
 
2.1%
97
 
1.9%
96
 
1.9%
93
 
1.8%
92
 
1.8%
Other values (587) 3825
75.5%
Decimal Number
ValueCountFrequency (%)
1 79
38.3%
9 28
 
13.6%
0 27
 
13.1%
5 27
 
13.1%
2 17
 
8.3%
3 10
 
4.9%
4 8
 
3.9%
7 6
 
2.9%
6 3
 
1.5%
8 1
 
0.5%
Other Punctuation
ValueCountFrequency (%)
, 6
46.2%
? 4
30.8%
· 2
 
15.4%
/ 1
 
7.7%
Lowercase Letter
ValueCountFrequency (%)
x 1
33.3%
k 1
33.3%
g 1
33.3%
Space Separator
ValueCountFrequency (%)
702
100.0%
Close Punctuation
ValueCountFrequency (%)
) 119
100.0%
Open Punctuation
ValueCountFrequency (%)
( 119
100.0%
Math Symbol
ValueCountFrequency (%)
~ 32
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 26
100.0%
Other Symbol
ValueCountFrequency (%)
7
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4610 | 73.2%
Common | 1224 | 19.4%
Han | 456 | 7.2%
Latin | 3 | < 0.1%
Katakana | 1 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
185
 
4.0%
178
 
3.9%
150
 
3.3%
135
 
2.9%
112
 
2.4%
104
 
2.3%
97
 
2.1%
96
 
2.1%
93
 
2.0%
92
 
2.0%
Other values (355) 3368
73.1%
Han
ValueCountFrequency (%)
14
 
3.1%
13
 
2.9%
9
 
2.0%
9
 
2.0%
8
 
1.8%
7
 
1.5%
7
 
1.5%
7
 
1.5%
7
 
1.5%
7
 
1.5%
Other values (221) 368
80.7%
Common
ValueCountFrequency (%)
702
57.4%
) 119
 
9.7%
( 119
 
9.7%
1 79
 
6.5%
~ 32
 
2.6%
9 28
 
2.3%
0 27
 
2.2%
5 27
 
2.2%
- 26
 
2.1%
2 17
 
1.4%
Other values (10) 48
 
3.9%
Latin
ValueCountFrequency (%)
x 1
33.3%
k 1
33.3%
g 1
33.3%
Katakana
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4610 | 73.2%
ASCII | 1218 | 19.4%
CJK | 446 | 7.1%
CJK Compat Ideographs | 10 | 0.2%
Geometric Shapes | 7 | 0.1%
None | 2 | < 0.1%
Katakana | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
702
57.6%
) 119
 
9.8%
( 119
 
9.8%
1 79
 
6.5%
~ 32
 
2.6%
9 28
 
2.3%
0 27
 
2.2%
5 27
 
2.2%
- 26
 
2.1%
2 17
 
1.4%
Other values (11) 42
 
3.4%
Hangul
ValueCountFrequency (%)
185
 
4.0%
178
 
3.9%
150
 
3.3%
135
 
2.9%
112
 
2.4%
104
 
2.3%
97
 
2.1%
96
 
2.1%
93
 
2.0%
92
 
2.0%
Other values (355) 3368
73.1%
CJK
ValueCountFrequency (%)
14
 
3.1%
13
 
2.9%
9
 
2.0%
9
 
2.0%
8
 
1.8%
7
 
1.6%
7
 
1.6%
7
 
1.6%
7
 
1.6%
7
 
1.6%
Other values (216) 358
80.3%
Geometric Shapes
ValueCountFrequency (%)
7
100.0%
CJK Compat Ideographs
ValueCountFrequency (%)
5
50.0%
2
 
20.0%
1
 
10.0%
1
 
10.0%
1
 
10.0%
None
ValueCountFrequency (%)
· 2
100.0%
Katakana
ValueCountFrequency (%)
1
100.0%

주수량 (main quantity)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 20
Distinct (%): 1.9%
Missing: 2
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6252408
Minimum: 1
Maximum: 47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 4
Maximum: 47
Range: 46
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.8996756
Coefficient of variation (CV): 1.7841513
Kurtosis: 109.22685
Mean: 1.6252408
Median Absolute Deviation (MAD): 0
Skewness: 8.9992367
Sum: 1687
Variance: 8.4081183
Monotonicity: Not monotonic
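
Several of these figures are linked by simple identities: the mean is the sum divided by the non-missing count, the variance is the squared standard deviation, and the CV is the standard deviation divided by the mean. The sketch below checks them from the reported numbers; it uses only values printed in this report.

```python
# Values copied from the report for 주수량.
n_observations = 1040
n_missing = 2
total = 1687
std = 2.8996756

count = n_observations - n_missing   # non-missing rows
mean = total / count
variance = std ** 2
cv = std / mean

print(f"mean: {mean:.7f}")
print(f"variance: {variance:.7f}")
print(f"CV: {cv:.7f}")
```

A mean of about 1.63 next to a median of 1 and a maximum of 47 matches the strong right skew reported (skewness of about 9.0).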
Histogram with fixed size bins (bins=20)
Common values
Value | Count | Frequency (%)
1 | 899 | 86.4%
2 | 52 | 5.0%
3 | 25 | 2.4%
4 | 14 | 1.3%
5 | 14 | 1.3%
14 | 4 | 0.4%
16 | 4 | 0.4%
7 | 4 | 0.4%
12 | 4 | 0.4%
6 | 3 | 0.3%
Other values (10) | 15 | 1.4%
Minimum 10 values
Value | Count | Frequency (%)
1 | 899 | 86.4%
2 | 52 | 5.0%
3 | 25 | 2.4%
4 | 14 | 1.3%
5 | 14 | 1.3%
6 | 3 | 0.3%
7 | 4 | 0.4%
8 | 2 | 0.2%
9 | 1 | 0.1%
10 | 2 | 0.2%
Maximum 10 values
Value | Count | Frequency (%)
47 | 1 | 0.1%
43 | 1 | 0.1%
25 | 1 | 0.1%
23 | 1 | 0.1%
17 | 1 | 0.1%
16 | 4 | 0.4%
15 | 2 | 0.2%
14 | 4 | 0.4%
13 | 3 | 0.3%
12 | 4 | 0.4%

시대 (period)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 12
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.3 KiB
조선 (Joseon): 659
근/현대 (modern/contemporary): 200
기타 (other): 93
일제강점기 (Japanese colonial period): 52
대한제국 (Korean Empire): 9
Other values (7): 27

Length

Max length: 5
Median length: 2
Mean length: 2.5769231
Min length: 1

Unique

Unique: 2
Unique (%): 0.2%

Sample

1st row: 기타
2nd row: 기타
3rd row: 기타
4th row: 기타
5th row: 기타

Common Values

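Value | Count | Frequency (%)
조선 (Joseon) | 659 | 63.4%
근/현대 (modern/contemporary) | 200 | 19.2%
기타 (other) | 93 | 8.9%
일제강점기 (Japanese colonial period) | 52 | 5.0%
대한제국 (Korean Empire) | 9 | 0.9%
광복이후 (post-liberation) | 8 | 0.8%
삼국 (Three Kingdoms) | 6 | 0.6%
고려 (Goryeo) | 6 | 0.6%
<NA> | 3 | 0.3%
통일신라 (Unified Silla) | 2 | 0.2%
Other values (2) | 2 | 0.2%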

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
조선 | 659 | 63.4%
근/현대 | 200 | 19.2%
기타 | 93 | 8.9%
일제강점기 | 52 | 5.0%
대한제국 | 9 | 0.9%
광복이후 | 8 | 0.8%
삼국 | 6 | 0.6%
고려 | 6 | 0.6%
na | 3 | 0.3%
통일신라 | 2 | 0.2%
Other values (2) | 2 | 0.2%

장르 (genre)
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 8.3 KiB
고문서 (old documents): 551
민속품 (folk items): 214
고서 (old books): 143
서화 (calligraphy and painting): 63
공예 (crafts): 49
Other values (4): 20

Length

Max length: 4
Median length: 3
Mean length: 2.7413462
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 고서
2nd row: 고서
3rd row: 고서
4th row: 고서
5th row: 고서

Common Values

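Value | Count | Frequency (%)
고문서 (old documents) | 551 | 53.0%
민속품 (folk items) | 214 | 20.6%
고서 (old books) | 143 | 13.8%
서화 (calligraphy and painting) | 63 | 6.1%
공예 (crafts) | 49 | 4.7%
기타 (other) | 13 | 1.2%
<NA> | 3 | 0.3%
건축 (architecture) | 3 | 0.3%
조선 (Joseon) | 1 | 0.1%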

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
고문서 | 551 | 53.0%
민속품 | 214 | 20.6%
고서 | 143 | 13.8%
서화 | 63 | 6.1%
공예 | 49 | 4.7%
기타 | 13 | 1.2%
na | 3 | 0.3%
건축 | 3 | 0.3%
조선 | 1 | 0.1%

재질 (material)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 11
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.3 KiB
822
금속 (metal): 75
목재 (wood): 60
도자기 (ceramics): 44
사직 (woven textile): 13
Other values (6): 26

Length

Max length: 4
Median length: 1
Mean length: 1.2576923
Min length: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

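Value | Count | Frequency (%)
 | 822 | 79.0%
금속 (metal) | 75 | 7.2%
목재 (wood) | 60 | 5.8%
도자기 (ceramics) | 44 | 4.2%
사직 (woven textile) | 13 | 1.2%
토제 (earthenware) | 11 | 1.1%
기타 (other) | 6 | 0.6%
석재 (stone) | 3 | 0.3%
<NA> | 3 | 0.3%
유리 (glass) | 2 | 0.2%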

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
 | 822 | 79.0%
금속 | 75 | 7.2%
목재 | 60 | 5.8%
도자기 | 44 | 4.2%
사직 | 13 | 1.2%
토제 | 11 | 1.1%
기타 | 6 | 0.6%
석재 | 3 | 0.3%
na | 3 | 0.3%
유리 | 2 | 0.2%

크기 (size)
Text

Distinct: 970
Distinct (%): 93.8%
Missing: 6
Missing (%): 0.6%
Memory size: 8.3 KiB

Length

Max length: 72
Median length: 11
Mean length: 11.846228
Min length: 3

Characters and Unicode

Total characters: 12249
Distinct characters: 113
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 928
Unique (%): 89.7%

Sample

1st row: 27.6×19.8cm
2nd row: 28.3×19.2cm
3rd row: 25.9×18.6cm
4th row: 26.5 × 19.0cm
5th row: 29.1 × 18.5cm
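
The sample values mix spacing styles, and the character counts for this variable show both "×" (925 occurrences) and the Latin letter "x" (123) used as the separator. The sketch below is one possible way to pull a width and height pair out of such strings; `parse_dimensions` and its regular expression are hypothetical, and they deliberately ignore labelled single measurements such as 높이 (height) or 지름 (diameter).

```python
import re

_NUMBER = r"(\d+(?:\.\d+)?)"
# Two numbers separated by "×" or "x", with optional spaces and an optional "cm".
_DIMENSIONS = re.compile(_NUMBER + r"\s*[×xX]\s*" + _NUMBER + r"\s*(?:cm)?")

def parse_dimensions(text):
    """Return the first (a, b) pair written as 'a×b' in text, or None."""
    match = _DIMENSIONS.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

for raw in ["27.6×19.8cm", "26.5 × 19.0cm", "56.1x46.8cm", "14.1×9.1"]:
    print(raw, "->", parse_dimensions(raw))
```

Values with three dimensions or free-text descriptions would need extra handling before this column could be treated as numeric.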
Words
Value | Count | Frequency (%)
× | 15 | 1.2%
높이 (height) | 14 | 1.1%
1 | 12 | 0.9%
길이 (length) | 8 | 0.6%
14.1×9.1 | 7 | 0.5%
2 | 7 | 0.5%
78.5×98.5cm | 7 | 0.5%
전체길이 (overall length) | 7 | 0.5%
56.1x46.8cm | 6 | 0.5%
지름 (diameter) | 6 | 0.5%
Other values (1106) | 1205 | 93.1%

Most occurring characters

Value | Count | Frequency (%)
. | 1961 | 16.0%
m | 1110 | 9.1%
c | 1109 | 9.1%
× | 925 | 7.6%
5 | 863 | 7.0%
2 | 854 | 7.0%
1 | 816 | 6.7%
3 | 708 | 5.8%
0 | 631 | 5.2%
4 | 604 | 4.9%
Other values (103) | 2668 | 21.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 6194 | 50.6%
Lowercase Letter | 2342 | 19.1%
Other Punctuation | 2007 | 16.4%
Math Symbol | 925 | 7.6%
Other Letter | 406 | 3.3%
Space Separator | 260 | 2.1%
Close Punctuation | 75 | 0.6%
Open Punctuation | 32 | 0.3%
Dash Punctuation | 8 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
57
 
14.0%
27
 
6.7%
23
 
5.7%
20
 
4.9%
19
 
4.7%
18
 
4.4%
14
 
3.4%
14
 
3.4%
13
 
3.2%
12
 
3.0%
Other values (80) 189
46.6%
Decimal Number
ValueCountFrequency (%)
5 863
13.9%
2 854
13.8%
1 816
13.2%
3 708
11.4%
0 631
10.2%
4 604
9.8%
7 475
7.7%
8 467
7.5%
6 427
6.9%
9 349
5.6%
Other Punctuation
ValueCountFrequency (%)
. 1961
97.7%
, 45
 
2.2%
: 1
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
m 1110
47.4%
c 1109
47.4%
x 123
 
5.3%
Close Punctuation
ValueCountFrequency (%)
) 73
97.3%
] 2
 
2.7%
Open Punctuation
ValueCountFrequency (%)
( 30
93.8%
[ 2
 
6.2%
Math Symbol
ValueCountFrequency (%)
× 925
100.0%
Space Separator
ValueCountFrequency (%)
260
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 8
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 9501 | 77.6%
Latin | 2342 | 19.1%
Hangul | 406 | 3.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
57
 
14.0%
27
 
6.7%
23
 
5.7%
20
 
4.9%
19
 
4.7%
18
 
4.4%
14
 
3.4%
14
 
3.4%
13
 
3.2%
12
 
3.0%
Other values (80) 189
46.6%
Common
ValueCountFrequency (%)
. 1961
20.6%
× 925
9.7%
5 863
9.1%
2 854
9.0%
1 816
8.6%
3 708
 
7.5%
0 631
 
6.6%
4 604
 
6.4%
7 475
 
5.0%
8 467
 
4.9%
Other values (10) 1197
12.6%
Latin
ValueCountFrequency (%)
m 1110
47.4%
c 1109
47.4%
x 123
 
5.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 10918 | 89.1%
None | 925 | 7.6%
Hangul | 406 | 3.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 1961
18.0%
m 1110
10.2%
c 1109
10.2%
5 863
7.9%
2 854
7.8%
1 816
7.5%
3 708
 
6.5%
0 631
 
5.8%
4 604
 
5.5%
7 475
 
4.4%
Other values (12) 1787
16.4%
None
ValueCountFrequency (%)
× 925
100.0%
Hangul
ValueCountFrequency (%)
57
 
14.0%
27
 
6.7%
23
 
5.7%
20
 
4.9%
19
 
4.7%
18
 
4.4%
14
 
3.4%
14
 
3.4%
13
 
3.2%
12
 
3.0%
Other values (80) 189
46.6%

출토지/소장자 (excavation site/holder)
Categorical

HIGH CORRELATION  IMBALANCE 

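Distinct: 17
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 8.3 KiB
<NA>: 921
조용중 기증 (donated by 조용중): 68
경북 상주시 모서면 호음리 일원 (Hoeum-ri area, Moseo-myeon, Sangju-si, Gyeongbuk): 29
이상무 기증 (donated by 이상무): 4
김행일 기증 (donated by 김행일): 4
Other values (12): 14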

Length

Max length: 17
Median length: 4
Mean length: 4.5711538
Min length: 4

Unique

Unique: 10
Unique (%): 1.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

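Value | Count | Frequency (%)
<NA> | 921 | 88.6%
조용중 기증 (donated by 조용중) | 68 | 6.5%
경북 상주시 모서면 호음리 일원 (Hoeum-ri area, Moseo-myeon, Sangju-si, Gyeongbuk) | 29 | 2.8%
이상무 기증 (donated by 이상무) | 4 | 0.4%
김행일 기증 (donated by 김행일) | 4 | 0.4%
정춘목 기증 (donated by 정춘목) | 2 | 0.2%
경북 상주시 개운동 일원 (Gaeun-dong area, Sangju-si, Gyeongbuk) | 2 | 0.2%
권기순 기증 (donated by 권기순) | 1 | 0.1%
김주진 기증 (donated by 김주진) | 1 | 0.1%
김경락 기증 (donated by 김경락) | 1 | 0.1%
Other values (7) | 7 | 0.7%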

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
na | 921 | 73.4%
기증 (donated) | 86 | 6.9%
조용중 | 68 | 5.4%
경북 | 33 | 2.6%
상주시 | 33 | 2.6%
일원 (area) | 33 | 2.6%
모서면 | 29 | 2.3%
호음리 | 29 | 2.3%
이상무 | 4 | 0.3%
김행일 | 4 | 0.3%
Other values (13) | 15 | 1.2%

문화재지정 (cultural heritage designation)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.3 KiB
<NA>: 793
X: 179
보물 1004호 (Treasure No. 1004): 61
보물 1003호 (Treasure No. 1003): 7

Length

Max length: 8
Median length: 4
Mean length: 3.7451923
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 793 | 76.2%
X | 179 | 17.2%
보물 1004호 (Treasure No. 1004) | 61 | 5.9%
보물 1003호 (Treasure No. 1003) | 7 | 0.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value | Count | Frequency (%)
na | 793 | 71.6%
x | 179 | 16.2%
보물 (Treasure) | 68 | 6.1%
1004호 (No. 1004) | 61 | 5.5%
1003호 (No. 1003) | 7 | 0.6%

유물설명 (artifact description)
Text

MISSING 

Distinct: 244
Distinct (%): 98.0%
Missing: 791
Missing (%): 76.1%
Memory size: 8.3 KiB