Overview

Dataset statistics

Number of variables: 10
Number of observations: 737
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 60.6 KiB
Average record size in memory: 84.2 B
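
As a quick consistency check, the average record size follows from the total memory footprint divided by the number of observations. The sketch below assumes that 1 KiB means 1024 bytes and uses the rounded figures shown above, so it can only agree to the displayed precision.

```python
# Values copied from the dataset statistics above (rounded as displayed).
total_size_kib = 60.6
n_observations = 737

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```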

Variable types

Numeric: 3
Categorical: 7

Dataset

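Description: Provides information for analyzing the status of applicants for the national midwife licensing examination (연도 year, 직종 occupation, 회차 exam round, 성별 sex, 연령대 age group, 응시지역 exam region, 졸업여부 graduation status, 합격여부 pass/fail result, 학교소재지 school location) in a form that cannot identify individuals.
URL: https://www.data.go.kr/data/15060458/fileData.do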

Alerts

직종 has constant value "" [Constant]
성별 has constant value "" [Constant]
응시지역 has constant value "" [Constant]
연도 is highly overall correlated with 회차 and 1 other field [High correlation]
회차 is highly overall correlated with 연도 and 1 other field [High correlation]
일련번호 is highly overall correlated with 연도 and 2 other fields [High correlation]
졸업여부 is highly overall correlated with 일련번호 [High correlation]
졸업여부 is highly imbalanced (63.6%) [Imbalance]
합격여부 is highly imbalanced (82.1%) [Imbalance]
일련번호 has unique values [Unique]
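
The two imbalance percentages are numerically consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below illustrates that reading; it is an assumption about how the figure is derived, not code taken from the profiling library. `imbalance_score` is a hypothetical helper, and the counts come from the Common Values tables of 졸업여부 and 합격여부 later in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k): H is the Shannon entropy of the counts,
    k the number of categories. 0 means balanced, values near 1 mean
    that one category dominates. Hypothetical helper for illustration."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumed convention for a single category
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


graduation_counts = [658, 60, 19]  # 졸업여부
result_counts = [707, 19, 11]      # 합격여부

print(f"{imbalance_score(graduation_counts):.1%}")
print(f"{imbalance_score(result_counts):.1%}")
```

Under this assumption the two calls reproduce the 63.6% and 82.1% shown in the alerts.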

Reproduction

Analysis started: 2023-12-12 15:39:19.014686
Analysis finished: 2023-12-12 15:39:21.190819
Duration: 2.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
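
The reported duration can be checked against the two timestamps. This is a minimal sketch that assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2023-12-12 15:39:19.014686")
finished = datetime.fromisoformat("2023-12-12 15:39:21.190819")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```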

Variables

연도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 24
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2007.1072
Minimum: 2000
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 2000
5th percentile: 2000
Q1: 2002
Median: 2004
Q3: 2012
95th percentile: 2020
Maximum: 2023
Range: 23
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 6.5271727
Coefficient of variation (CV): 0.00325203
Kurtosis: -0.44201116
Mean: 2007.1072
Median Absolute Deviation (MAD): 3
Skewness: 0.8733187
Sum: 1479238
Variance: 42.603984
Monotonicity: Increasing

[Figure: histogram with fixed size bins (bins=24)]

Most frequent values

Value              Count  Frequency (%)
2001                  78  10.6%
2002                  76  10.3%
2000                  75  10.2%
2003                  75  10.2%
2004                  65   8.8%
2005                  38   5.2%
2009                  37   5.0%
2007                  27   3.7%
2008                  27   3.7%
2006                  26   3.5%
Other values (14)    213  28.9%

Smallest 10 values

Value  Count  Frequency (%)
2000      75  10.2%
2001      78  10.6%
2002      76  10.3%
2003      75  10.2%
2004      65   8.8%
2005      38   5.2%
2006      26   3.5%
2007      27   3.7%
2008      27   3.7%
2009      37   5.0%

Largest 10 values

Value  Count  Frequency (%)
2023      10   1.4%
2022      12   1.6%
2021      12   1.6%
2020      13   1.8%
2019      14   1.9%
2018      21   2.8%
2017      16   2.2%
2016      19   2.6%
2015      19   2.6%
2014      17   2.3%
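
Several of the 연도 statistics are derived from one another, which makes them easy to cross-check. The sketch below uses only the rounded values printed in this section and the standard definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared); small differences in the last digits are expected from rounding.

```python
# Values copied from the 연도 statistics above (rounded as displayed).
mean, std, variance = 2007.1072, 6.5271727, 42.603984
minimum, maximum, q1, q3 = 2000, 2023, 2002, 2012

value_range = maximum - minimum  # reported Range
iqr = q3 - q1                    # reported interquartile range
cv = std / mean                  # reported coefficient of variation
variance_from_std = std ** 2     # should match the reported variance

print(value_range, iqr, round(cv, 8), round(variance_from_std, 4))
```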

직종
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
조산사: 737

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 조산사
2nd row: 조산사
3rd row: 조산사
4th row: 조산사
5th row: 조산사

Common Values

Value   Count  Frequency (%)
조산사  737    100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
조산사  737    100.0%

회차
Real number (ℝ)

HIGH CORRELATION 

Distinct: 24
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.107191
Minimum: 11
Maximum: 34
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 11
5th percentile: 11
Q1: 13
Median: 15
Q3: 23
95th percentile: 31
Maximum: 34
Range: 23
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 6.5271727
Coefficient of variation (CV): 0.36047406
Kurtosis: -0.44201116
Mean: 18.107191
Median Absolute Deviation (MAD): 3
Skewness: 0.8733187
Sum: 13345
Variance: 42.603984
Monotonicity: Increasing

[Figure: histogram with fixed size bins (bins=24)]

Most frequent values

Value              Count  Frequency (%)
12                    78  10.6%
13                    76  10.3%
11                    75  10.2%
14                    75  10.2%
15                    65   8.8%
16                    38   5.2%
20                    37   5.0%
18                    27   3.7%
19                    27   3.7%
17                    26   3.5%
Other values (14)    213  28.9%

Smallest 10 values

Value  Count  Frequency (%)
11        75  10.2%
12        78  10.6%
13        76  10.3%
14        75  10.2%
15        65   8.8%
16        38   5.2%
17        26   3.5%
18        27   3.7%
19        27   3.7%
20        37   5.0%

Largest 10 values

Value  Count  Frequency (%)
34        10   1.4%
33        12   1.6%
32        12   1.6%
31        13   1.8%
30        14   1.9%
29        21   2.8%
28        16   2.2%
27        19   2.6%
26        19   2.6%
25        17   2.3%
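
The 회차 statistics mirror those of 연도: the same standard deviation, skewness, kurtosis and frequency counts, shifted by a constant. The sketch below checks whether the summaries are consistent with a fixed offset between the two columns. This is an inference from aggregate values only; the report does not confirm that every row pairs up this way.

```python
# Summary values copied from the 연도 and 회차 sections of this report.
n_observations = 737
year = {"min": 2000, "max": 2023, "mean": 2007.1072, "sum": 1479238}
exam_round = {"min": 11, "max": 34, "mean": 18.107191, "sum": 13345}

# Candidate constant offset between the two columns.
offset = year["min"] - exam_round["min"]
same_offset_at_max = (year["max"] - exam_round["max"]) == offset

# If every row is shifted by the same offset, the sums differ by n * offset
# and the means differ by exactly the offset (up to display rounding).
sum_gap_matches = (year["sum"] - exam_round["sum"]) == n_observations * offset
mean_gap = year["mean"] - exam_round["mean"]

print(offset, same_offset_at_max, sum_gap_matches, round(mean_gap, 4))
```

All three checks agree with an offset of 1989, which would also explain the high-correlation alert for this pair of variables.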

일련번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 737
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 369
Minimum: 1
Maximum: 737
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 37.8
Q1: 185
Median: 369
Q3: 553
95th percentile: 700.2
Maximum: 737
Range: 736
Interquartile range (IQR): 368

Descriptive statistics

Standard deviation: 212.89786
Coefficient of variation (CV): 0.57695898
Kurtosis: -1.2
Mean: 369
Median Absolute Deviation (MAD): 184
Skewness: 0
Sum: 271953
Variance: 45325.5
Monotonicity: Strictly increasing

[Figure: histogram with fixed size bins (bins=50)]

Most frequent values

Value               Count  Frequency (%)
1                       1   0.1%
496                     1   0.1%
487                     1   0.1%
488                     1   0.1%
489                     1   0.1%
490                     1   0.1%
491                     1   0.1%
492                     1   0.1%
493                     1   0.1%
494                     1   0.1%
Other values (727)    727  98.6%

Smallest 10 values

Value  Count  Frequency (%)
1          1  0.1%
2          1  0.1%
3          1  0.1%
4          1  0.1%
5          1  0.1%
6          1  0.1%
7          1  0.1%
8          1  0.1%
9          1  0.1%
10         1  0.1%

Largest 10 values

Value  Count  Frequency (%)
737        1  0.1%
736        1  0.1%
735        1  0.1%
734        1  0.1%
733        1  0.1%
732        1  0.1%
731        1  0.1%
730        1  0.1%
729        1  0.1%
728        1  0.1%
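
The 일련번호 statistics are what one would expect from a plain row counter. The sketch below assumes the column is exactly the integers 1 through 737 (consistent with the minimum, maximum, distinct count and strictly increasing order reported above) and recomputes the main figures with the Python standard library. `percentile` is a hypothetical helper that uses linear interpolation between neighboring ranks.

```python
import statistics

# Assumption: 일련번호 is exactly 1, 2, ..., 737.
values = list(range(1, 738))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)


def percentile(sorted_values, q):
    """Percentile for q in [0, 1] by linear interpolation (hypothetical helper)."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


p05 = percentile(values, 0.05)
p95 = percentile(values, 0.95)

print(total, mean, variance, round(std, 5))
print(round(p05, 1), round(p95, 1))
```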

성별
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
(blank): 737

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (blank)
2nd row: (blank)
3rd row: (blank)
4th row: (blank)
5th row: (blank)

Common Values

Value    Count  Frequency (%)
(blank)  737    100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value    Count  Frequency (%)
(blank)  737    100.0%

연령대
Categorical

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
20: 551
30: 148
40: 29
50: 9

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20
2nd row: 20
3rd row: 20
4th row: 20
5th row: 20

Common Values

Value  Count  Frequency (%)
20       551  74.8%
30       148  20.1%
40        29   3.9%
50         9   1.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
20       551  74.8%
30       148  20.1%
40        29   3.9%
50         9   1.2%

응시지역
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
서울특별시: 737

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울특별시
2nd row: 서울특별시
3rd row: 서울특별시
4th row: 서울특별시
5th row: 서울특별시

Common Values

Value       Count  Frequency (%)
서울특별시  737    100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value       Count  Frequency (%)
서울특별시  737    100.0%

졸업여부
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
졸업: 658
졸업예정: 60
(blank): 19

Length

Max length: 4
Median length: 2
Mean length: 2.1370421
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 졸업예정
2nd row: (blank)
3rd row: 졸업예정
4th row: 졸업예정
5th row: 졸업예정

Common Values

Value     Count  Frequency (%)
졸업      658    89.3%
졸업예정  60     8.1%
(blank)   19     2.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value     Count  Frequency (%)
졸업      658    91.6%
졸업예정  60     8.4%
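
The percentages in the plot table differ from those in the Common Values table (91.6% against 89.3% for 졸업). The numbers are consistent with the plot leaving out the 19 rows whose value is blank, so that its shares are computed over 718 rows instead of 737. The sketch below shows both calculations; the exclusion of the blank category is an inference from the arithmetic, not something the report states.

```python
# Counts from the Common Values table; "" stands in for the blank category.
counts = {"졸업": 658, "졸업예정": 60, "": 19}

total_all = sum(counts.values())
labeled = {label: n for label, n in counts.items() if label}
total_labeled = sum(labeled.values())

# Shares over all rows (Common Values table).
share_all = {label: round(100 * n / total_all, 1) for label, n in counts.items()}
# Shares over rows with a non-blank label (plot table, assumed).
share_labeled = {label: round(100 * n / total_labeled, 1) for label, n in labeled.items()}

print(share_all)
print(share_labeled)
```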

합격여부
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
합격: 707
불합격: 19
결시: 11

Length

Max length: 3
Median length: 2
Mean length: 2.0257802
Min length: 2
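
The mean length is the count-weighted average of the label lengths in characters. A minimal sketch, using the three category counts listed for 합격여부 in this section:

```python
# Category counts for 합격여부, as listed in this section.
counts = {"합격": 707, "불합격": 19, "결시": 11}

total = sum(counts.values())
# Weight each label's character length by how often the label occurs.
mean_length = sum(len(label) * n for label, n in counts.items()) / total

print(round(mean_length, 7))
```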

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 합격
2nd row: 합격
3rd row: 합격
4th row: 합격
5th row: 합격

Common Values

Value   Count  Frequency (%)
합격    707    95.9%
불합격  19     2.6%
결시    11     1.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
합격    707    95.9%
불합격  19     2.6%
결시    11     1.5%

학교소재지
Categorical

Distinct: 12
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
부산광역시: 410
기타: 104
경상북도: 74
대전광역시: 74
서울특별시: 29
Other values (7): 46

Length

Max length: 6
Median length: 5
Mean length: 4.4301221
Min length: 2

Unique

Unique: 3
Unique (%): 0.4%

Sample

1st row: 경상북도
2nd row: 대구광역시
3rd row: 경상북도
4th row: 기타
5th row: 경상북도

Common Values

Value             Count  Frequency (%)
부산광역시        410    55.6%
기타              104    14.1%
경상북도          74     10.0%
대전광역시        74     10.0%
서울특별시        29     3.9%
대구광역시        28     3.8%
일본              9      1.2%
<NA>              4      0.5%
우즈베키스탄      2      0.3%
충청남도          1      0.1%
Other values (2)  2      0.3%

Length

[Figure: histogram of lengths of the category]

Value             Count  Frequency (%)
부산광역시        410    55.6%
기타              104    14.1%
경상북도          74     10.0%
대전광역시        74     10.0%
서울특별시        29     3.9%
대구광역시        28     3.8%
일본              9      1.2%
na                4      0.5%
우즈베키스탄      2      0.3%
충청남도          1      0.1%
Other values (2)  2      0.3%

Interactions

[Figure: interaction plots for the numeric variables]