Saturday, January 26, 2019

CL ideas in the middle of the night Jan 25-26 2019 Edition

This is probably not going to be particularly coherent, but I didn't want to forget, and this seemed like the most logical, findable place to put it. Plus, the teeeeeny handful of people who are likely to read this would probably find it mildly funny or interesting.

Which reminds me of my blogging goals a decade ago:

1. Be interesting
2. Don't be boring

Later followed by:

3. Brevity

Totally, miserably failed there. But you know, a decade later I've decided that it's okay that I'm not brief.

Ideas for corpora:

Kate Corpus

What if I could find all of the things I've ever written on a computer and put them into a giant pile?

X-person corpus

This idea could be of more general interest if it were applied more broadly. What if you could advertise a service that - no, I mean, what if you could actually create a service that downloaded all* (*not possible) of your text input history? As with family history and genealogy, you would probably suddenly generate a ton of interest by shifting the idea away from theory and applying it to individuals.

People are really mostly interested in themselves.

We are a bunch of narcissists looking in mirrors in this depressing postmodern online world. This entire post will prove that point at least ten times. But in my defense, it's stream-of-consciousness writing after only five hours of sleep.

How can I sleep? This is really, really interesting. I have to write about it in order to not explode.

But I guess even though creating the Kate Corpus is pretty stupid because it's rather egocentric, it's also ethical. I give myself full permission.

So yeah. Where is my typing and how do I find it?

First computer: Family 386, totally gone. Unrecoverable. Did it even have internet?

Second computer: Compaq something or other. When I was 10, my dad bought me a computer, and I learned to type 85 wpm. I spent hours playing...in MS Word. And just playing. Or watching my brother play games, because I was not that good at finger-eye coordination. Or recording my sister Sarah and me talking. I've long since mourned the deletion of those sound files. Maybe my dad knows where the backup to that thing is. I have printed copies of two or three texts I wrote during that time period.

Cuz this entire corpus is clearly about diachronic change. Fun fact: I haven't decided whether or not that is a word I can bear to include in my lexicon. It is just so...snobby. "Change over time" is just fine and not ridiculously arrogant. Or potentially part of the lyrics to Where in Time Is Carmen Sandiego? See the above comment about computer games as a child.

This computer lasted me a long time. I don't think it was usually connected to the internet.

I remember when it, or one very similar, was. Dial-up. My first texts online: definitely AOL Instant Messenger (AIM). Is that chat history even recoverable?! Didn't Google Hangouts or its predecessor subsume most AIM accounts? My username is or was katidid99. It must have been 1999. I was 12 or 13 years old. This seems like an important source of text to include in this corpus. Hmm. How do I recover it?

Find a way to open that account. Find a way to download the chat history, if it exists. Clean the text so it's only my half of the conversations. Is it ethical to include that? I'm... going with yes?
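A minimal sketch of the "only my half" cleaning step, assuming the chat history comes out as plain text with one message per line in a `[timestamp] screenname: message` shape. That format is a guess, since the real export format is unknown, and `my_half` is a hypothetical helper name.

```python
import re

# Hypothetical log format (an assumption): "[optional timestamp] screenname: message".
# The bracketed timestamp may itself contain colons, so it is matched separately.
LINE = re.compile(r"^(?:\[[^\]]*\]\s*)?(?P<who>[^:]+):\s?(?P<text>.*)$")


def my_half(log_lines, my_names):
    """Keep only the text of messages sent under one of my screen names."""
    mine = {name.lower() for name in my_names}
    kept = []
    for line in log_lines:
        match = LINE.match(line)
        # Lines with no "name:" prefix (e.g. "X has signed off.") never match.
        if match and match.group("who").strip().lower() in mine:
            kept.append(match.group("text"))
    return kept
```

For example, `my_half(["[10:32 PM] katidid99: hey", "[10:33 PM] SomeBuddy: hi!"], ["katidid99"])` keeps only `"hey"`. Messages that wrap onto a second line without a name prefix would be dropped, so a real log would probably need a smarter parser.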

BYU desktop. I think this had a new hard drive but otherwise all the old parts from that Compaq? Wow. Computers lasted much longer back then. These days it's like 2-3ish years before Danny, the sole computer purchaser in our house, upgrades it (usually piecemeal). And he doesn't get rid of the old one. #computergraveyard

I was blogging by this point, so at least I have the texts from that time period. I already found and gathered the 25 posts from my LiveJournal (username kvasicek). I even correctly guessed the password. Ha! Such useless knowledge.

I would love to find my school essays. They definitely exist.

Oh wow, I wonder if my computer programs exist. That would... that would be really embarrassing. I took CS 100 and CS 142. The former was surprisingly, fascinatingly philosophical. It led me to watch a four(?)-hour documentary about the history of computers in the library's media center, which proved immensely valuable to my general education. I can vividly remember some of the lectures. I remember the moment I realized just how incredible computers combined with human creative powers really are. I also felt very frustrated that I was even in that class. Although it was a perfect level for me (I learned basic HTML - oh sheesh... memories of dating Nate the programmer at that time and writing him an extremely stupid and horribly embarrassing message in sidewalk chalk...in HTML...on the sidewalk by the dorms. That is... something like classic Kate right there. Ughh. There's good reason for me to dislike all past iterations of myself. Adorkable.), I got the sense that the real computer nerds totally skipped that class and intro to Java.

Which I took. And really struggled. And that's the class where I might have some code for this corpus.

I remember considering switching majors to CS. I remember having no idea what I wanted to study, and the anxiety of not being able to decide. Everything was interesting. I took so many random classes. Ultimately CS lost for two reasons. 1. It was a very long major. Too many credit hours meant not enough flexibility to take language classes; I wanted space in my schedule to take ASL even though I never considered majoring in it. 2. The experience of CS 142 was about as good as it could possibly be, but it was pretty miserable for me. I was one of three girls in a class of 100 geeky, socially awkward white guys. I felt so out of place. We sat at these computers in what felt like a dungeon. Silent. I hated how that felt. It was depressing. I only succeeded in that class because of Nate's extreme patience. Actually, I got an A-, so I think I must have managed to pass the exams on my own. But it was hard. So many times I'd try to solve a problem head on, and then the TA or Nate would say, "Why not do it like this?" Like, I'd be struggling to build a bridge, and they'd flip the world upside down so that suddenly I was in the place I wanted to be. I don't know if I have the mind of a computer programmer.

Long dumb tangent.

I would really like to find my English essays from that time.

What if... backtracking... my high school email was recoverable? Oh! Hmm.

BYU laptop - I think somewhere there's a backup. It was a brick that got bricked. A brick-brick. Hmmm. I blogged from that thing in Jordan the first time I went. I think.

So hard to remember.

I used Danny's old laptop after we were first married. I am sure he backed it up somewhere.

The next laptop is now almost useless, a piece of junk, in our basement. It's been wiped. Text from it might be backed up.

My computer - where are the files? I don't know. Hmm.

So that's mostly the hard-drive-only stuff for the Kate Corpus: finding .doc files that may or may not exist.
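One way to start the hard-drive hunt, sketched in Python under the assumption that an old drive or backup can be mounted as an ordinary folder: walk every directory and list anything with a document-like extension. The extension set is an assumption, and `find_documents` is a hypothetical helper name.

```python
import os
from pathlib import Path

# Assumed set of document extensions worth checking; extend as needed.
TEXT_EXTENSIONS = {".doc", ".docx", ".txt", ".rtf", ".wpd"}


def find_documents(root, extensions=TEXT_EXTENSIONS):
    """Recursively list files under root whose extension suggests a text document."""
    found = []
    # os.walk silently skips directories it cannot read (onerror defaults to None),
    # which helps on old or partially damaged backups.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if Path(name).suffix.lower() in extensions:
                found.append(Path(dirpath) / name)
    return sorted(found)
```

Matching on the lowercased suffix catches old all-caps names like `ESSAY.DOC`. Files that were renamed or saved without an extension would be missed.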

Blogs: (comparatively easy to find, not gonna list)

Chat histories
The only significant ones are AIM, Google whatever, and Facebook whatever

Email
Kvasicek at Williston - long shot
Kvasicek at Hotmail - recoverable?
Kate@byu - greatest email address ever. I don't think I can recover this, but maybe a forensic CL person in the BYU law program can. Apparently that is the main focus of their program, or at least the direction it's going. Hmm.
Kvasicek at Gmail - I have this account for sure
Kalzbeta at Gmail
dannyandkate at Gmail
Czechoutyourancestors at Gmail

Skype - oh dang. I've been trying to recover that for weeks. If only I could talk to a real person. That is how I recovered my Gmail account: knowing someone who works for Google and begging him.

Hmm.

What else what else.

Comments? I don't remember all the places I've ever commented online. I remember a couple. I remember the moment I decided Feminist Mormon Housewives was not a blog I found interesting anymore. It had to do with an exchange with a troll in the comments, at least partially. Must've been 2008. I was in the studio apartment in Cambridge, basically twiddling my thumbs while Danny did his internship. Hello, birth of my intense interest in genealogy.

Oh, what about that? Finding my FamilySearch comment history? Hmm.

Then of course there's my current hard drive.

Oh crap, I need to figure out how to timestamp this stuff. How do I do that?! Somewhere in the title?
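One common answer to the timestamp question is to put an ISO 8601 date (YYYY-MM-DD) at the front of each filename, because names in that form sort chronologically as plain text. A minimal sketch; the `date_source_title.txt` naming scheme and the helper names `dated_name` and `modified_date` are assumptions, not any standard.

```python
import re
from datetime import datetime
from pathlib import Path


def dated_name(when, source, title):
    """Build a filename like 2019-01-26_blog_cl-ideas.txt.

    ISO 8601 dates sort chronologically as plain strings,
    so a sorted directory listing doubles as a timeline.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    return f"{when:%Y-%m-%d}_{source}_{slug}.txt"


def modified_date(path):
    """Fallback timestamp for recovered files: the file's last-modified time.

    Only an approximation: copying or restoring a backup can reset it.
    """
    return datetime.fromtimestamp(Path(path).stat().st_mtime)
```

Keeping the source (blog, email, AIM) in the name as well makes it easy to filter one kind of text later without opening the files.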

Okay today's project:

Danny and I will write a script in Python that harvests my Blogger posts one at a time as text files and puts the date in each filename. So far I have just copied and pasted 113 posts into Notepad; it was easier than automating it. But I'm going to want to automate it. You can download the blog as an XML file, but the corpus-building software I'm using wants .txt files, and I don't want the extra XML tags.
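A rough sketch of such a harvesting script, assuming the Blogger backup is an Atom XML file in which each post is an `<entry>` with `<published>`, `<title>`, and HTML-escaped `<content>` elements, and in which posts are told apart from comments and settings by a `kind#post` category term. Those details are assumptions about the export format and worth checking against an actual file first; `export_posts` and `html_to_text` are hypothetical names. It uses only the standard library, dropping both the XML tags and the HTML markup inside each post.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from pathlib import Path

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: the export marks blog posts with this "kind" category term.
POST_KIND = "http://schemas.google.com/blogger/2008/kind#post"


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Treat line breaks and paragraph/div starts as newlines.
        if tag in ("br", "p", "div"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    text = "".join(parser.parts)
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def export_posts(xml_path, out_dir):
    """Write each post as YYYY-MM-DD_title.txt and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    root = ET.parse(xml_path).getroot()
    written = []
    for entry in root.iter(ATOM + "entry"):
        kinds = {c.get("term") for c in entry.findall(ATOM + "category")}
        if POST_KIND not in kinds:
            continue  # skip comments, settings, and template entries
        date = entry.findtext(ATOM + "published", "")[:10]  # YYYY-MM-DD
        title = entry.findtext(ATOM + "title", "") or "untitled"
        # ElementTree already unescapes the HTML stored in <content>.
        body = html_to_text(entry.findtext(ATOM + "content", ""))
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
        path = out / f"{date}_{slug}.txt"
        path.write_text(f"{title}\n\n{body}\n", encoding="utf-8")
        written.append(path)
    return written
```

Two posts with the same date and title would overwrite each other; appending a counter to the filename would avoid that.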

Crap. I should probably go spend some time reading more about corpus building before investing a ton of time into this.

Hmm.

What if there were a Kate-reading Corpus? Like, a giant pile of words which I know I've read? As big as the Kate Corpus itself will be, the Kate-reading Corpus would be unfathomably bigger and more complicated (or impossible) to collect, especially if the goal were to measure diachronic change. What would you even do with such a corpus?!

Hmmmm.

If automatic handwritten text recognition really works, I have a literal bookshelf of journals that could be fed through it to create a more comprehensive Kate Corpus. That would include my earliest writing, from when I was 5 and 6. So, hmmm.

I need to play around with that Transkribus tool. I need to do that anyway; I'm presenting on it at the CGSI conference in October. Hmm.

How will I ever sleep. This is too exciting.

