Here are the top 51 keywords, compared against a reference corpus called something like English TenTen 2013 (which I don't really know much about):
- perfect interface: appears NOWHERE in the reference corpus. Hahaha
- nice dress
- dear friend
- teenager
- goad
- aerate
- ire
- anger
- adolescence
- cuz
- immature
- fury
- ripple
- wrath
- poem
- shatter
- cruel
- Czech
- drip
- ache
- dough
- pity
- grin
- Danny
- friendship
- horrible
- breeze
- dumb
- prototype
- nasty
- silly
- mad
- loop
- whisper
- dare
- rage
- stain
- lawn
- shine
- dear
- sigh
- gonna
- awake
- math
- beneath
- terrible
- flaw
- endless
- stranger
- pile
- underneath
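Here is the underlying data: each item's keyness score (Score), its raw frequency in the text I analysed (Freq), and its raw frequency in the reference corpus (Ref_freq):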
Item | Score | Freq | Ref_freq |
Teenager | 248.83 | 3 | 4165 |
ire | 197.21 | 4 | 22476 |
Sh__t | 196.61 | 2 | 1 |
Tautology | 196.38 | 2 | 28 |
Gimmee | 195.61 | 2 | 117 |
hummock | 183.19 | 2 | 1666 |
Challis | 181.63 | 2 | 1875 |
tussock | 180.21 | 2 | 2070 |
goad | 179.78 | 3 | 14494 |
Brontë | 179.06 | 2 | 2229 |
aerate | 171.99 | 3 | 16181 |
Hatred | 168.24 | 2 | 3834 |
Anger | 159.69 | 3 | 19177 |
invisibly | 142.16 | 2 | 8706 |
immaturity | 131.16 | 2 | 11343 |
Poem | 120.95 | 2 | 14220 |
hee | 100.06 | 2 | 21932 |
Nightmare | 99.82 | 2 | 22042 |
schmautological | 98.81 | 1 | 0 |
hummmmmmmmmmmmmmmmmm | 98.81 | 1 | 0 |
crankeny | 98.81 | 1 | 0 |
Unfriendship | 98.81 | 1 | 0 |
Tautologocial | 98.81 | 1 | 0 |
doorlessly | 98.8 | 1 | 1 |
Yuckiness | 98.8 | 1 | 1 |
Blushworthy | 98.8 | 1 | 1 |
yalping | 98.8 | 1 | 3 |
shruggery | 98.8 | 1 | 3 |
muggery | 98.79 | 1 | 5 |
Vapidity | 98.78 | 1 | 7 |
Bashfulness | 98.78 | 1 | 7 |
repiece | 98.77 | 1 | 8 |
needfinding | 98.77 | 1 | 8 |
wion | 98.76 | 1 | 12 |
Willfulness | 98.75 | 1 | 14 |
Amiability | 98.75 | 1 | 14 |
SNARKY | 98.74 | 1 | 16 |
Quirkiness | 98.74 | 1 | 16 |
Licentiousness | 98.73 | 1 | 18 |
Joviality | 98.73 | 1 | 18 |
Clinking | 98.73 | 1 | 18 |
Naiveté | 98.72 | 1 | 21 |
NCMO | 98.68 | 1 | 29 |
Xylophones | 98.68 | 1 | 30 |
Reasonability | 98.67 | 1 | 32 |
Scheiss | 98.61 | 1 | 47 |
Zombification | 98.46 | 1 | 81 |
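A lot of rows in that table tie at exactly 98.81. Here is a minimal Python sketch of one keyness formula that would produce those ties, assuming the tool computes something like the "simple maths" score used by Sketch Engine: frequency per million in the analysed text plus a smoothing constant N (default 1), divided by frequency per million in the reference corpus plus N. FOCUS_SIZE is my own back-calculation from the 98.81 ties, not a number the tool reported, and REF_SIZE is a made-up placeholder.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to a frequency per million tokens."""
    return count * 1_000_000 / corpus_size


def simple_maths_score(focus_count: int, focus_size: int,
                       ref_count: int, ref_size: int, n: float = 1.0) -> float:
    """Keyness as (fpm_focus + n) / (fpm_ref + n).

    The smoothing constant n avoids dividing by zero when a word never
    occurs in the reference corpus.
    """
    fpm_focus = per_million(focus_count, focus_size)
    fpm_ref = per_million(ref_count, ref_size)
    return (fpm_focus + n) / (fpm_ref + n)


# Assumed sizes (hypothetical): FOCUS_SIZE is back-calculated from the
# 98.81 ties; REF_SIZE is a placeholder that has no effect when ref_count is 0.
FOCUS_SIZE = 10_224
REF_SIZE = 20_000_000_000

# Freq values from the tables; both words have Ref_freq 0 (Tautology's 0
# comes from the 2018 comparison).
for word, freq in [("schmautological", 1), ("Tautology", 2)]:
    score = simple_maths_score(freq, FOCUS_SIZE, 0, REF_SIZE)
    print(f"{word}: {score:.2f}")
# schmautological: 98.81
# Tautology: 196.62
```

Under that formula, any word that appears once in the text and never in the reference corpus lands on the same score, and a second occurrence roughly doubles it. That would explain why the one-off words cluster at 98.81 and the two-occurrence words sit near 196.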
Here is the same comparison against a different reference corpus, this one built from internet text collected in 2018:
Item | Score | Freq | Ref_freq |
Teenager | 245.72 | 3 | 168 |
aerate | 235.74 | 3 | 211 |
Tautology | 196.62 | 2 | 0 |
Sh__t | 196.62 | 2 | 0 |
Gimmee | 196.62 | 2 | 0 |
hummock | 194.33 | 2 | 10 |
cuz | 193.66 | 3 | 441 |
tussock | 191.64 | 2 | 22 |
Anger | 190.55 | 3 | 462 |
hee | 185.25 | 2 | 52 |
Challis | 180.83 | 2 | 74 |
Hatred | 178.31 | 2 | 87 |
invisibly | 174.76 | 2 | 106 |
Brontë | 174.4 | 2 | 108 |
goad | 169.69 | 3 | 623 |
Poem | 164.74 | 2 | 164 |
immaturity | 144.46 | 2 | 306 |
poem | 142.56 | 16 | 8462 |
immature | 127.93 | 3 | 1103 |
adolescence | 126.76 | 3 | 1121 |
grin | 119.49 | 6 | 3322 |
ache | 118.89 | 5 | 2646 |
fright | 114.02 | 2 | 614 |
ire | 106.56 | 4 | 2272 |
ho | 104.57 | 2 | 746 |
Austen | 101.39 | 2 | 796 |
crappy | 100.9 | 2 | 804 |
yalping | 98.81 | 1 | 0 |
shruggery | 98.81 | 1 | 0 |
schmautological | 98.81 | 1 | 0 |
repiece | 98.81 | 1 | 0 |
needfinding | 98.81 | 1 | 0 |
muggery | 98.81 | 1 | 0 |
hummmmmmmmmmmmmmmmmm | 98.81 | 1 | 0 |
doorlessly | 98.81 | 1 | 0 |
crankeny | 98.81 | 1 | 0 |
cinchy | 98.81 | 1 | 0 |
Yuckiness | 98.81 | 1 | 0 |
Xylophones | 98.81 | 1 | 0 |
Willfulness | 98.81 | 1 | 0 |
Vapidity | 98.81 | 1 | 0 |
Unfriendship | 98.81 | 1 | 0 |
Tautologocial | 98.81 | 1 | 0 |
SNARKY | 98.81 | 1 | 0 |
Quirkiness | 98.81 | 1 | 0 |
Naiveté | 98.81 | 1 | 0 |
NCMO | 98.81 | 1 | 0 |
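Notice that the keywords change when you change the reference corpus: Teenager stays on top, but ire, for instance, drops from second place in the 2013 table to 24th in the 2018 one. Also, something went amiss with the diacritical marks. Hmm.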
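To make the difference concrete, here is a small Python sketch that compares just the top 10 rows of each table. The scores are copied from the tables above; the variable and function names are my own.

```python
# Top 10 rows (item -> score) copied from the two tables.
top10_2013 = {
    "Teenager": 248.83, "ire": 197.21, "Sh__t": 196.61, "Tautology": 196.38,
    "Gimmee": 195.61, "hummock": 183.19, "Challis": 181.63, "tussock": 180.21,
    "goad": 179.78, "Brontë": 179.06,
}
top10_2018 = {
    "Teenager": 245.72, "aerate": 235.74, "Tautology": 196.62, "Sh__t": 196.62,
    "Gimmee": 196.62, "hummock": 194.33, "cuz": 193.66, "tussock": 191.64,
    "Anger": 190.55, "hee": 185.25,
}


def ranked(scores: dict[str, float]) -> list[str]:
    """Items from highest to lowest score; ties keep their table order."""
    return sorted(scores, key=scores.get, reverse=True)


# Which items make one top 10 but not the other.
only_2013 = sorted(set(top10_2013) - set(top10_2018))
only_2018 = sorted(set(top10_2018) - set(top10_2013))
print("only in 2013 top 10:", only_2013)  # ['Brontë', 'Challis', 'goad', 'ire']
print("only in 2018 top 10:", only_2018)  # ['Anger', 'aerate', 'cuz', 'hee']

# How the shared items move between the two rankings.
rank_2013 = {item: i + 1 for i, item in enumerate(ranked(top10_2013))}
rank_2018 = {item: i + 1 for i, item in enumerate(ranked(top10_2018))}
for item in sorted(set(rank_2013) & set(rank_2018)):
    print(f"{item}: #{rank_2013[item]} -> #{rank_2018[item]}")
```

Four of the ten slots turn over, and among the shared items only Sh__t and Tautology swap places.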
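I need to go get some other things done, but this is fascinating. I have to remember, though, that raw frequency is not the ultimate measure of everything: most of these items occur only once or twice in the text, so a single extra occurrence can move a word a long way up or down the list. Still, it is interesting to see how the keywords vary. Maybe things changed online between 2013 and 2018, although the two reference corpora are also different collections of text, so some of the shift may come from that alone. It is fun.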