Overview

Dataset statistics

Number of variables: 12
Number of observations: 100
Missing cells: 209
Missing cells (%): 17.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.7 KiB
Average record size in memory: 99.3 B
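
The missing-cell percentage can be checked by hand from the figures in this report. The sketch below is only a consistency check, not the profiler's own code; the variable names are illustrative, and the per-column counts are copied from the Alerts list.

```python
# Figures copied from the Dataset statistics table.
n_variables = 12
n_observations = 100
missing_cells = 209

# Per-column missing counts copied from the Alerts list (columns not listed
# there are assumed to have no missing values).
missing_by_column = {
    "eng_lang_hotel_nm": 45,
    "kor_lang_hotel_nm": 54,
    "rn_adres": 2,
    "lo": 5,
    "la": 5,
    "tel_no": 98,
}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(sum(missing_by_column.values()))  # matches the reported 209
print(f"{missing_pct:.1f}%")            # matches the reported 17.4%
```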

Variable types

Categorical: 6
Text: 4
Numeric: 2

Alerts

eng_lang_area_nm has constant value "" (Constant)
kor_lang_area_nm has constant value "" (Constant)
jan_lang_area_nm has constant value "" (Constant)
chg_lang_area_nm has constant value "" (Constant)
BASE_YMD has constant value "" (Constant)
se_nm is highly imbalanced (91.9%) (Imbalance)
eng_lang_hotel_nm has 45 (45.0%) missing values (Missing)
kor_lang_hotel_nm has 54 (54.0%) missing values (Missing)
rn_adres has 2 (2.0%) missing values (Missing)
lo has 5 (5.0%) missing values (Missing)
la has 5 (5.0%) missing values (Missing)
tel_no has 98 (98.0%) missing values (Missing)
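
The 91.9% imbalance figure for se_nm is consistent with a score of one minus the normalised Shannon entropy of the class counts. That formula is an assumption about how the profiler defines imbalance, and the function name below is hypothetical; the sketch only shows that the counts reported for se_nm (99 and 1) reproduce the alert value under that assumption.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for class counts (assumed definition).

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumption: a single class is not scored as imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts for se_nm as reported in its Common Values table.
score = imbalance_score([99, 1])
print(f"{score:.1%}")
```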

Reproduction

Analysis started: 2023-12-10 09:58:22.506236
Analysis finished: 2023-12-10 09:58:24.513675
Duration: 2.01 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

se_nm
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
호텔  99
에어비앤비  1

Length

Max length: 5
Median length: 2
Mean length: 2.03
Min length: 2
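
These length statistics follow directly from the two category values and their counts. The sketch below recomputes them as a consistency check; the counts are copied from this report, and the NFC normalisation step is an assumption about how the strings are stored.

```python
import unicodedata
from statistics import mean, median

# Value counts for se_nm, copied from the report.
value_counts = {"호텔": 99, "에어비앤비": 1}

# Expand to one length per row; normalise to NFC so that each Hangul
# syllable counts as a single character (assumed storage form).
lengths = [
    len(unicodedata.normalize("NFC", value))
    for value, count in value_counts.items()
    for _ in range(count)
]

print(max(lengths), median(lengths), mean(lengths), min(lengths))
```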

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 호텔
2nd row: 호텔
3rd row: 호텔
4th row: 호텔
5th row: 호텔

Common Values

Value  Count  Frequency (%)
호텔  99  99.0%
에어비앤비  1  1.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts repeat the Common Values table.]

eng_lang_hotel_nm
Text

MISSING 

Distinct: 55
Distinct (%): 100.0%
Missing: 45
Missing (%): 45.0%
Memory size: 932.0 B

Length

Max length: 50
Median length: 38
Mean length: 29.854545
Min length: 10

Characters and Unicode

Total characters: 1642
Distinct characters: 72
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
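
As an illustration of these character properties, the sketch below counts Unicode general categories for one of the sample values of this variable, using Python's standard `unicodedata` module. It is not the profiler's implementation; the mapping from two-letter codes to the names used in this report covers only the categories that appear here.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes and the names used in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

# One of the sample rows shown for eng_lang_hotel_nm.
sample = "1805 Condo D'Savoy Homestay"

# Count how many characters fall into each general category.
categories = Counter(unicodedata.category(ch) for ch in sample)

for code, count in categories.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Running the same count over every non-missing value of a column would give a table of the kind shown under "Most occurring categories".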

Unique

Unique: 55
Unique (%): 100.0%

Sample

1st row: @12Haven- Stunning Seaside Luxury Villa. Sleeps 12
2nd row: 1000 miles
3rd row: 15 minutes to Kuala Lumpur City Centre
4th row: 1-5 pax 5mins IOI Mall LRT Cozy Apartment Puchong
5th row: 1805 Condo D'Savoy Homestay

Value  Count  Frequency (%)
(value not recoverable)  10  3.5%
suites  7  2.4%
apartment  6  2.1%
2br  6  2.1%
hotel  5  1.7%
homestay  5  1.7%
klcc  5  1.7%
pax  5  1.7%
lrt  4  1.4%
city  4  1.4%
Other values (169)  232  80.3%

Most occurring characters

Value  Count  Frequency (%)
(space)  234  14.3%
e  121  7.4%
a  96  5.8%
t  87  5.3%
o  85  5.2%
i  71  4.3%
n  71  4.3%
r  58  3.5%
s  57  3.5%
u  46  2.8%
Other values (62)  716  43.6%

Most occurring categories

Value  Count  Frequency (%)
Lowercase Letter  966  58.8%
Uppercase Letter  307  18.7%
Space Separator  234  14.3%
Decimal Number  95  5.8%
Other Punctuation  21  1.3%
Dash Punctuation  12  0.7%
Math Symbol  3  0.2%
Close Punctuation  2  0.1%
Open Punctuation  2  0.1%

Most frequent character per category

Lowercase Letter

Value  Count  Frequency (%)
e  121  12.5%
a  96  9.9%
t  87  9.0%
o  85  8.8%
i  71  7.3%
n  71  7.3%
r  58  6.0%
s  57  5.9%
u  46  4.8%
l  45  4.7%
Other values (16)  229  23.7%

Uppercase Letter

Value  Count  Frequency (%)
C  34  11.1%
S  33  10.7%
A  33  10.7%
H  25  8.1%
L  25  8.1%
B  22  7.2%
R  18  5.9%
K  15  4.9%
P  12  3.9%
I  12  3.9%
Other values (14)  78  25.4%

Decimal Number

Value  Count  Frequency (%)
2  16  16.8%
1  15  15.8%
5  13  13.7%
8  11  11.6%
3  9  9.5%
7  7  7.4%
0  7  7.4%
6  7  7.4%
4  6  6.3%
9  4  4.2%

Other Punctuation

Value  Count  Frequency (%)
@  8  38.1%
*  4  19.0%
,  3  14.3%
'  3  14.3%
/  1  4.8%
&  1  4.8%
.  1  4.8%

Space Separator

Value  Count  Frequency (%)
(space)  234  100.0%

Dash Punctuation

Value  Count  Frequency (%)
-  12  100.0%

Math Symbol

Value  Count  Frequency (%)
|  3  100.0%

Close Punctuation

Value  Count  Frequency (%)
]  2  100.0%

Open Punctuation

Value  Count  Frequency (%)
[  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  1273  77.5%
Common  369  22.5%

Most frequent character per script

Latin

Value  Count  Frequency (%)
e  121  9.5%
a  96  7.5%
t  87  6.8%
o  85  6.7%
i  71  5.6%
n  71  5.6%
r  58  4.6%
s  57  4.5%
u  46  3.6%
l  45  3.5%
Other values (40)  536  42.1%

Common

Value  Count  Frequency (%)
(space)  234  63.4%
2  16  4.3%
1  15  4.1%
5  13  3.5%
-  12  3.3%
8  11  3.0%
3  9  2.4%
@  8  2.2%
7  7  1.9%
0  7  1.9%
Other values (12)  37  10.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1642  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
(space)  234  14.3%
e  121  7.4%
a  96  5.8%
t  87  5.3%
o  85  5.2%
i  71  4.3%
n  71  4.3%
r  58  3.5%
s  57  3.5%
u  46  2.8%
Other values (62)  716  43.6%

kor_lang_hotel_nm
Text

MISSING 

Distinct: 45
Distinct (%): 97.8%
Missing: 54
Missing (%): 54.0%
Memory size: 932.0 B

Length

Max length: 35
Median length: 19
Mean length: 12.173913
Min length: 3

Characters and Unicode

Total characters: 560
Distinct characters: 135
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 95.7%
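
"Distinct" and "Unique" measure different things here: the figures are consistent with Distinct counting the different non-missing values and Unique counting the values that occur exactly once, both as a share of the 46 non-missing rows (100 observations minus 54 missing). The sketch below shows the distinction on a made-up list; the function name and toy values are hypothetical, and only the counts in the final lines come from this report.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy data, not taken from the dataset: one value repeats, one is missing.
toy = ["A Hotel", "B Hotel", "B Hotel", None, "C Motel"]
print(distinct_and_unique(toy))  # 3 distinct values, 2 of them unique

# Percentages for kor_lang_hotel_nm, from the counts in this report.
non_missing = 100 - 54
distinct_pct = 100 * 45 / non_missing
unique_pct = 100 * 44 / non_missing
print(f"{distinct_pct:.1f}% {unique_pct:.1f}%")
```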

Sample

1st row: 1 다마이 레지던스 - 더 럭셔리 3 베드룸 스위트 앳 KLCC
2nd row: 1 데이 카 호텔
3rd row: 1 리바란 호텔
4th row: 1 바론 모텔
5th row: 1 보니오 타워 B 서비스 콘도

Value  Count  Frequency (%)
호텔  28  15.0%
1  10  5.3%
(value not recoverable)  9  4.8%
1st  6  3.2%
(value not recoverable)  6  3.2%
부티크  4  2.1%
7  4  2.1%
알람  3  1.6%
게스트하우스  3  1.6%
(value not recoverable)  3  1.6%
Other values (95)  111  59.4%

Most occurring characters

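Value  Count  Frequency (%)
(space)  141  25.2%
1  32  5.7%
(character not recoverable)  30  5.4%
(character not recoverable)  29  5.2%
(character not recoverable)  18  3.2%
(character not recoverable)  10  1.8%
(character not recoverable)  10  1.8%
(character not recoverable)  10  1.8%
(character not recoverable)  9  1.6%
3  8  1.4%
Other values (125)  263  47.0%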

Most occurring categories

Value  Count  Frequency (%)
Other Letter  317  56.6%
Space Separator  141  25.2%
Decimal Number  71  12.7%
Lowercase Letter  12  2.1%
Uppercase Letter  12  2.1%
Other Punctuation  4  0.7%
Dash Punctuation  3  0.5%

Most frequent character per category

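Other Letter

Value  Count  Frequency (%)
(character not recoverable)  30  9.5%
(character not recoverable)  29  9.1%
(character not recoverable)  18  5.7%
(character not recoverable)  10  3.2%
(character not recoverable)  10  3.2%
(character not recoverable)  10  3.2%
(character not recoverable)  9  2.8%
(character not recoverable)  6  1.9%
(character not recoverable)  6  1.9%
(character not recoverable)  6  1.9%
Other values (104)  183  57.7%

Decimal Number

Value  Count  Frequency (%)
1  32  45.1%
3  8  11.3%
0  7  9.9%
7  6  8.5%
9  5  7.0%
8  4  5.6%
6  3  4.2%
2  3  4.2%
5  3  4.2%

Uppercase Letter

Value  Count  Frequency (%)
A  4  33.3%
C  2  16.7%
B  2  16.7%
F  1  8.3%
K  1  8.3%
L  1  8.3%
G  1  8.3%

Lowercase Letter

Value  Count  Frequency (%)
t  6  50.0%
s  6  50.0%

Space Separator

Value  Count  Frequency (%)
(space)  141  100.0%

Other Punctuation

Value  Count  Frequency (%)
@  4  100.0%

Dash Punctuation

Value  Count  Frequency (%)
-  3  100.0%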

Most occurring scripts

Value  Count  Frequency (%)
Hangul  317  56.6%
Common  219  39.1%
Latin  24  4.3%

Most frequent character per script

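Hangul

Value  Count  Frequency (%)
(character not recoverable)  30  9.5%
(character not recoverable)  29  9.1%
(character not recoverable)  18  5.7%
(character not recoverable)  10  3.2%
(character not recoverable)  10  3.2%
(character not recoverable)  10  3.2%
(character not recoverable)  9  2.8%
(character not recoverable)  6  1.9%
(character not recoverable)  6  1.9%
(character not recoverable)  6  1.9%
Other values (104)  183  57.7%

Common

Value  Count  Frequency (%)
(space)  141  64.4%
1  32  14.6%
3  8  3.7%
0  7  3.2%
7  6  2.7%
9  5  2.3%
@  4  1.8%
8  4  1.8%
6  3  1.4%
-  3  1.4%
Other values (2)  6  2.7%

Latin

Value  Count  Frequency (%)
t  6  25.0%
s  6  25.0%
A  4  16.7%
C  2  8.3%
B  2  8.3%
F  1  4.2%
K  1  4.2%
L  1  4.2%
G  1  4.2%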

Most occurring blocks

Value  Count  Frequency (%)
Hangul  317  56.6%
ASCII  243  43.4%
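
A Unicode block is a contiguous range of code points, so block membership can be decided from the code point alone. The sketch below classifies the characters of one sample value of this variable. It assumes that the report's "ASCII" label corresponds to U+0000 to U+007F (the Basic Latin block) and that "Hangul" corresponds to the Hangul Syllables block, U+AC00 to U+D7AF; the `block_of` helper is hypothetical and covers only these two ranges.

```python
from collections import Counter

# (label used in this report, first code point, last code point)
# Assumed mapping: "ASCII" = Basic Latin, "Hangul" = Hangul Syllables.
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),
    ("Hangul", 0xAC00, 0xD7AF),
]


def block_of(ch):
    """Return the block label for a character, or "Other" if not covered."""
    code_point = ord(ch)
    for label, first, last in BLOCKS:
        if first <= code_point <= last:
            return label
    return "Other"


# One of the sample rows shown for kor_lang_hotel_nm.
sample = "1 바론 모텔"
block_counts = Counter(block_of(ch) for ch in sample)
print(dict(block_counts))
```

The digit and the two spaces fall in the ASCII range and the four syllables in the Hangul range, which is the same split the table reports for the whole column.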

Most frequent character per block

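ASCII

Value  Count  Frequency (%)
(space)  141  58.0%
1  32  13.2%
3  8  3.3%
0  7  2.9%
t  6  2.5%
7  6  2.5%
s  6  2.5%
9  5  2.1%
@  4  1.6%
8  4  1.6%
Other values (11)  24  9.9%

Hangul

Value  Count  Frequency (%)
(character not recoverable)  30  9.5%
(character not recoverable)  29  9.1%
(character not recoverable)  18  5.7%
(character not recoverable)  10  3.2%
(character not recoverable)  10  3.2%
(character not recoverable)  10  3.2%
(character not recoverable)  9  2.8%
(character not recoverable)  6  1.9%
(character not recoverable)  6  1.9%
(character not recoverable)  6  1.9%
Other values (104)  183  57.7%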

eng_lang_area_nm
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
말레이시아  100

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 말레이시아
2nd row: 말레이시아
3rd row: 말레이시아
4th row: 말레이시아
5th row: 말레이시아

Common Values

Value  Count  Frequency (%)
말레이시아  100  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts repeat the Common Values table.]

kor_lang_area_nm
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
マレーシア  100

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: マレーシア
2nd row: マレーシア
3rd row: マレーシア
4th row: マレーシア
5th row: マレーシア

Common Values

Value  Count  Frequency (%)
マレーシア  100  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts repeat the Common Values table.]

jan_lang_area_nm
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
馬來西亞  100

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 馬來西亞
2nd row: 馬來西亞
3rd row: 馬來西亞
4th row: 馬來西亞
5th row: 馬來西亞

Common Values

Value  Count  Frequency (%)
馬來西亞  100  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts repeat the Common Values table.]

chg_lang_area_nm
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
馬來西亞  100

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 馬來西亞
2nd row: 馬來西亞
3rd row: 馬來西亞
4th row: 馬來西亞
5th row: 馬來西亞

Common Values

Value  Count  Frequency (%)
馬來西亞  100  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts repeat the Common Values table.]

rn_adres
Text

MISSING 

Distinct: 94
Distinct (%): 95.9%
Missing: 2
Missing (%): 2.0%
Memory size: 932.0 B