I'm looking for a way to sample lists of probabilities.
7 29 Jan 2015 21:41 by u/gunfourhigher
I am working on a text synthesis program that reads in text and makes markov chains, but I need to weight the chains. Instead of having nodes that just have a list of words ex:(cat, dog) after "a", I need to have something like (cat:30, dog:70) where if I sample the letter a a hundred times, I get ~30ish cats and ~70ish dogs.
I am sure that this problem has been solved efficiently, or at least more so than what I am about to write. I am thinking of having a lists of 100 elements, and having the ratios of dogs to cats match the elements, and sampling the entire list randomly. I could then increase the size of the list later to account for granularity, but since I am going to be dealing with thousands of words, this isnt the best option as far as memory goes.
I hope someone understands what I'm trying to say!
It's worth noting that I can write whatever backend to this that I need, I'm not worried about details. I'm not trying to get the code written for me, I just need to know what this is called, and where I can find out more about it. I don't want to appear like someone trying to bamboozle you into doing my homework!
6 comments
1 u/Atko 29 Jan 2015 21:55
This reminds me of a school task I worked on with MapReduce. Perhaps MapReduce might be helpful?
1 u/gunfourhigher [OP] 29 Jan 2015 23:24
No, not really. I'm using Ruby for this project, and the solution will be pure ruby. I've run across probability vectors since my initial post, and I haven't had much time to look, but I think that's what I need.
1 u/tabularassa 30 Jan 2015 03:25
Problem description is not too clear
1 u/seiyria 30 Jan 2015 22:18
I did this in CoffeeScript. Let me explain the algorithm:
note, this assumes positive probabilities (but really, who uses negative probabilities)
a sample run could be below:
Let me know how this works out, or if you have questions.
1 u/gunfourhigher [OP] 31 Jan 2015 06:53
That's basically a probability vector, but instead of using 1.0, you're running your own limit and adjusting it on the fly. Thanks for sharing though, I hadn't even thought about doing it that way!
0 u/seiyria 31 Jan 2015 17:14
No problem. Sorry I couldn't be more helpful.