Thursday, January 31, 2019

Kates of Wrath Keywords

I used Sketch Engine to create a list of the top 50 or so keywords (and multi-word chunks) from the book of poetry I wrote and published in December, the Kates of Wrath. First I uploaded the text of my book as a corpus - a very teeny tiny pile of words, only like 8k or so.

Here are the top 51 keywords compared to the reference corpus, called something like English TenTen 2013 (enTenTen13), which I don't really know much about.

  1. perfect interface - appears NOWHERE in the reference corpus. Hahaha
  2. nice dress
  3. dear friend
  4. teenager
  5. goad
  6. aerate
  7. ire
  8. anger
  9. adolescence
  10. cuz
  11. immature
  12. fury
  13. ripple
  14. wrath
  15. poem
  16. shatter
  17. cruel
  18. Czech
  19. drip
  20. ache
  21. dough
  22. pity
  23. grin
  24. Danny
  25. friendship
  26. horrible
  27. breeze
  28. dumb
  29. prototype
  30. nasty
  31. silly
  32. mad 
  33. loop
  34. whisper
  35. dare
  36. rage
  37. stain
  38. lawn
  39. shine
  40. dear
  41. sigh
  42. gonna
  43. awake
  44. math
  45. beneath
  46. terrible
  47. flaw
  48. endless
  49. stranger
  50. pile
  51. underneath

Here is the raw frequency data:

Item Score Freq Ref_freq
Teenager 248.83 3 4165
ire 197.21 4 22476
Sh__t 196.61 2 1
Tautology 196.38 2 28
Gimmee 195.61 2 117
hummock 183.19 2 1666
Challis 181.63 2 1875
tussock 180.21 2 2070
goad 179.78 3 14494
Brontë 179.06 2 2229
aerate 171.99 3 16181
Hatred 168.24 2 3834
Anger 159.69 3 19177
invisibly 142.16 2 8706
immaturity 131.16 2 11343
Poem 120.95 2 14220
hee 100.06 2 21932
Nightmare 99.82 2 22042
schmautological 98.81 1 0
hummmmmmmmmmmmmmmmmm 98.81 1 0
crankeny 98.81 1 0
Unfriendship 98.81 1 0
Tautologocial 98.81 1 0
doorlessly 98.8 1 1
Yuckiness 98.8 1 1
Blushworthy 98.8 1 1
yalping 98.8 1 3
shruggery 98.8 1 3
muggery 98.79 1 5
Vapidity 98.78 1 7
Bashfulness 98.78 1 7
repiece 98.77 1 8
needfinding 98.77 1 8
wion 98.76 1 12
Willfulness 98.75 1 14
Amiability 98.75 1 14
SNARKY 98.74 1 16
Quirkiness 98.74 1 16
Licentiousness 98.73 1 18
Joviality 98.73 1 18
Clinking 98.73 1 18
Naiveté 98.72 1 21
NCMO 98.68 1 29
Xylophones 98.68 1 30
Reasonability 98.67 1 32
Scheiss 98.61 1 47
Zombification 98.46 1 81

Here is the data as compared to a different corpus taken from the internet in 2018.

Item Score Freq Ref_freq
Teenager 245.72 3 168
aerate 235.74 3 211
Tautology 196.62 2 0
Sh__t 196.62 2 0
Gimmee 196.62 2 0
hummock 194.33 2 10
cuz 193.66 3 441
tussock 191.64 2 22
Anger 190.55 3 462
hee 185.25 2 52
Challis 180.83 2 74
Hatred 178.31 2 87
invisibly 174.76 2 106
Brontë 174.4 2 108
goad 169.69 3 623
Poem 164.74 2 164
immaturity 144.46 2 306
poem 142.56 16 8462
immature 127.93 3 1103
adolescence 126.76 3 1121
grin 119.49 6 3322
ache 118.89 5 2646
fright 114.02 2 614
ire 106.56 4 2272
ho 104.57 2 746
Austen 101.39 2 796
crappy 100.9 2 804
yalping 98.81 1 0
shruggery 98.81 1 0
schmautological 98.81 1 0
repiece 98.81 1 0
needfinding 98.81 1 0
muggery 98.81 1 0
hummmmmmmmmmmmmmmmmm 98.81 1 0
doorlessly 98.81 1 0
crankeny 98.81 1 0
cinchy 98.81 1 0
Yuckiness 98.81 1 0
Xylophones 98.81 1 0
Willfulness 98.81 1 0
Vapidity 98.81 1 0
Unfriendship 98.81 1 0
Tautologocial 98.81 1 0
SNARKY 98.81 1 0
Quirkiness 98.81 1 0
Naiveté 98.81 1 0
NCMO 98.81 1 0

Notice that the keywords are different when you change the reference corpus. Also, something went amiss with the diacritical marks. Hmm. 
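
For the curious, here is a rough Python sketch of how scores like these can be reproduced. It assumes Sketch Engine's "simple maths" keyness formula with the smoothing value set to 1, which seems to match the numbers above. The corpus sizes are not reported anywhere in this post: they are guesses back-calculated from the scores in the tables (about 10,224 tokens for the book, roughly 22.7 billion for the 2013 corpus, roughly 850 million for the 2018 one). The function and constant names are made up for illustration.

```python
def simple_maths_score(focus_freq, focus_size, ref_freq, ref_size, n=1.0):
    """Keyness as (focus per-million + n) / (reference per-million + n)."""
    focus_fpm = focus_freq / focus_size * 1_000_000
    ref_fpm = ref_freq / ref_size * 1_000_000
    return (focus_fpm + n) / (ref_fpm + n)


# ASSUMED sizes, back-calculated from the tables, not reported by the tool.
FOCUS_SIZE = 10_224              # tokens in the book
REF_SIZE_2013 = 22_730_000_000   # tokens in the 2013 reference corpus
REF_SIZE_2018 = 847_500_000      # tokens in the 2018 reference corpus

# "ire": 4 hits in the book, 22476 in the 2013 corpus, 2272 in the 2018 one.
ire_2013 = simple_maths_score(4, FOCUS_SIZE, 22476, REF_SIZE_2013)
ire_2018 = simple_maths_score(4, FOCUS_SIZE, 2272, REF_SIZE_2018)
print(round(ire_2013, 2), round(ire_2018, 2))

# A made-up word: 1 hit in the book, 0 in the reference corpus.
print(round(simple_maths_score(1, FOCUS_SIZE, 0, REF_SIZE_2013), 2))
```

Under these assumptions "ire" comes out near 197 against the 2013 corpus and near 107 against the 2018 one. It has the same 4 hits in the book both times, but the word is relatively more common per million words in the 2018 corpus, so it stands out less there. That alone is enough to reshuffle the ranking, even if nothing about the word itself changed online.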

I need to go get some other things done. But this is fascinating. I have to remember, though, that raw frequency is not the ultimate measure of everything. It is, however, interesting to see how the keywords vary. Maybe things changed online between 2013 and 2018. It is fun.
