Randomizing Text File StreamReader/Writer

guest2013-1 · Mar 30, 2010

I'm trying to wrap my head around (and can't figure out) how I'd approach this.

I need to randomize a text file (datafeed). So instead of it being alphabetical it would be random... no idea even how to start with this. Any help?

WiT8litZ · Mar 31, 2010

1) Import the lines into iTunes as new songs

2) Make a new playlist and call it what you want

3) Import the lines (now songs) into the new playlist

4) Hit shuffle (this is where the magic happens)

5) Export the playlist to xml

6) Parse the xml to whatever you want

crazy_cat · Mar 31, 2010

am i missing something?

Keeper · Mar 31, 2010

Not a .NET expert or anything, but surely in any language you can load the txt file into a listbox, randomize the lines, and then save it?

lol @ WiT8litZ - I think he wants his code to do it, not manually through "iTunes"

Jabberwocky · Mar 31, 2010

Make a macro in excel

import the text file into excel
column B =rand()
sort column B
export to a text file
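
The same "random column, then sort" idea can be done in code without Excel. Below is a minimal sketch in Python, used here only for illustration since the thread never settles on a language. The function name `shuffle_by_random_key` and the `seed` parameter are made up for this example. It assumes all lines fit in memory, which may not hold for very large feeds.

```python
import random

def shuffle_by_random_key(lines, seed=None):
    """Pair every line with a random key (like column B = rand()),
    sort by that key, then drop the key again."""
    rng = random.Random(seed)
    keyed = [(rng.random(), line) for line in lines]
    keyed.sort(key=lambda pair: pair[0])  # sort on the random key only
    return [line for _, line in keyed]

lines = ["Line 1", "Line 2", "Line 3", "Line 4"]
shuffled = shuffle_by_random_key(lines, seed=42)
print(shuffled)
```

Sorting costs more than a plain shuffle, so this is mostly useful when a sort tool is already at hand.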

sn3rd · Mar 31, 2010

How "random" do you need? Do you mean pseudorandom?

Is this for obfuscation / encryption? Or something else? i.e. Do you want it to appear random, or BE random?

guest2013-1 · Mar 31, 2010

Line 1
Line 2
Line 3
Line 4

becomes

Line 3
Line 1
Line 2
Line 4

Or any other ordering.

This is an application that will be distributed. So not everyone would have excel or iTunes. WTF would I write something I can do myself in 5 minutes?

Loading the file into a Listbox/randomizing it won't be an option because the files can get quite big and the key here would be speed.

dequadin · Mar 31, 2010

There is a standard algorithm for generating a random permutation of a finite set.

Fisher–Yates shuffle

BTW this is what Microsoft failed at with the browser selection screen.
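
For reference, a minimal in-memory sketch of the Fisher–Yates shuffle in Python (chosen only for illustration; the function name `fisher_yates_shuffle` and the optional `rng` parameter are assumptions of this example, not anything from the thread). It walks the list from the end and swaps each element with a randomly chosen element at or before it.

```python
import random

def fisher_yates_shuffle(items, rng=None):
    """Shuffle `items` in place and return it."""
    if rng is None:
        rng = random.Random()
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both ends, so j may equal i
        items[i], items[j] = items[j], items[i]
    return items

lines = ["Line 1", "Line 2", "Line 3", "Line 4"]
result = fisher_yates_shuffle(list(lines))
print(result)
```

Python's built-in `random.shuffle` does the same job; the loop is spelled out here only to show the algorithm.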

Nod · Mar 31, 2010

In php, you have this: http://www.php.net/manual/en/function.shuffle.php
Might be able to use some of the methods shown in the comments section.

Keeper · Mar 31, 2010

AcidRaZor said:
Loading the file into a Listbox/randomizing it won't be an option because the files can get quite big and the key here would be speed.

if speed is VERY important, how about loading line 1 of the text file, and putting it into another textfile, but INTO a random line. then line 2,3,4,etc...

for example:
at iteration 500 in the loop, it would load line 500 of the text file, but write it into line 123 of the second text file?
at iteration 501 in the loop, it would load line 501 of the text file, but write it into line 43 of the second text file.

I'm sure it would be quite fast and there will be no problems like "the txt file is too big to add to a listbox" - I've done stuff similar to this and it is *quite fast* (faster than a listbox too, and more stable, with no limitations)

dequadin · Mar 31, 2010

Keeper said:
if speed is VERY important, how about loading line 1 of the text file, and putting it into another textfile, but INTO a random line. then line 2,3,4,etc...

What if you don't know how many lines are in the original source? What if there are only two lines, and you write the first into line 1000. Now you have a 1000 line text file with loads of wasted space, you can't do a second pass to cleanup the empty lines because who's to say there aren't blank lines in the source?

Doing this in memory will always be faster, you are doubling the disk IO (which is the slowest part), especially since Acid doesn't specify *what* he wants to do with the text once it's randomized.

@Acid how big is quite big? 50MB? 1GB?

guest2013-1 · Mar 31, 2010

The file size can get up to 2 GB depending on how many lines it contains. Once it's randomized I write it to a new file and then split it up into pieces

dequadin · Mar 31, 2010

AcidRaZor said:
The file size can get up to 2 GB depending on how many lines it contains. Once it's randomized I write it to a new file and then split it up into pieces

So reading in the entire file is not an option.

Do you need the whole file randomized and the pieces, or could you just output the pieces? Max size of each piece?

Keeper · Mar 31, 2010

dequadin said:
What if you don't know how many lines are in the original source? What if there are only two lines, and you write the first into line 1000. Now you have a 1000 line text file with loads of wasted space, you can't do a second pass to cleanup the empty lines because who's to say there aren't blank lines in the source?

Doing this in memory will always be faster, you are doubling the disk IO (which is the slowest part), especially since Acid doesn't specify *what* he wants to do with the text once it's randomized.

@Acid how big is quite big? 50MB? 1GB?

Easy peasy lemon squeezy, dequadin.... let me tell you.

dequadin said:
What if you don't know how many lines are in the original source?

easy.
most languages have a command that counts the number of lines in a text file *instantly*
mine is EOF (End of File) which returns a value with the amount of lines.

dequadin said:
What if there are only two lines, and you write the first into line 1000.

easy.
1st iteration it will random between 0 and 1 - so line 1.
2nd iteration it will random between 0 and 2 - so it can randomize before, OR, after - so it will still be random even if there are only a *few* lines
55th iteration it will random between 0 and 55
1055th iteration it will random between 0 and 1055

So, It won't write into line 1000 if it hasn't reached the 1000th iteration - you use the iteration as the max value it can random to.

dequadin said:
Now you have a 1000 line text file with loads of wasted space, you can't do a second pass to cleanup the empty lines because who's to say there aren't blank lines in the source?

no second pass needed as explained

dequadin said:
Doing this in memory will always be faster, you are doubling the disk IO (which is the slowest part), especially since Acid doesn't specify *what* he wants to do with the text once it's randomized.

well loading 2GB into memory might take long too, and what if the user only has 512mb ram?
Memory is usually better, but the disk based one will work on any PC - even if it only has 256mb ram

herbertk · Mar 31, 2010

all depends how random you wanna go... you can create a bunch of string variables and just concatenate them randomly... but it won't be true random...

what language are you doing this in?

Keeper · Mar 31, 2010

Acid, here is code for you in "Semi-coding-semi-English" - I have no idea what language you will be using..

use a command to count how many lines the file has (e.g. 200 lines detected)
For i = 1 to LineAmount (e.g. 1 to 200)
    copy line i of the file as a string (e.g. line 55)
    randomvalue = generate a random number with a max of i (e.g. 37)
    output to file, into line randomvalue, and move lines down (do not replace)
Next i
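
The loop above can be sketched in memory as follows. This is only an illustration in Python, with a list standing in for the second text file; the name `insertion_shuffle` and the `rng` parameter are hypothetical. Each new line is inserted at a random slot among the lines already written, and everything after that slot moves down.

```python
import random

def insertion_shuffle(lines, rng=None):
    """Insert line i at a random position 0..i of the output (0-based)."""
    if rng is None:
        rng = random.Random()
    out = []
    for i, line in enumerate(lines):
        # `out` holds i lines at this point, so there are i + 1 valid slots.
        pos = rng.randint(0, i)
        out.insert(pos, line)  # shifts every later line down by one
    return out

lines = ["Line 1", "Line 2", "Line 3", "Line 4"]
result = insertion_shuffle(lines)
print(result)
```

Drawing the slot uniformly from 0..i should keep every ordering equally likely. The expensive part is the shifting: `list.insert` is linear in the list length, and doing the same "move lines down" inside a file on disk means rewriting everything after the insertion point.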

HavocXphere · Mar 31, 2010

1. Find out how many lines there are.
2. Jump to random line & copy to second file.
3. Remove copied line from first file completely.
4. Goto step 1 until lines = 0

Files can be opened in read&write mode so you should be able to do 2 & 3 in one go.
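
A variation on "jump to a random line and copy it" that avoids deleting anything from the first file: one pass records the byte offset where each line starts, the offsets are shuffled, and a second pass seeks to each offset and copies that line. This is a sketch under assumptions, not a tested solution for 2 GB feeds; the function name `shuffle_file_by_offsets` and the file names are made up for the example, and Python is used only for illustration. The index costs one integer per line, and the random seeks may be slow on a spinning disk.

```python
import os
import random
import tempfile

def shuffle_file_by_offsets(src_path, dst_path, rng=None):
    """Write the lines of src_path to dst_path in random order.
    Returns the number of lines copied."""
    if rng is None:
        rng = random.Random()

    # Pass 1: remember where every line starts (this also counts the lines).
    offsets = []
    with open(src_path, "rb") as src:
        pos = 0
        for line in src:
            offsets.append(pos)
            pos += len(line)

    rng.shuffle(offsets)  # only the offsets are shuffled in memory

    # Pass 2: jump to each offset in shuffled order and copy that line.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for off in offsets:
            src.seek(off)
            line = src.readline()
            if not line.endswith(b"\n"):
                line += b"\n"  # the last source line may lack a newline
            dst.write(line)
    return len(offsets)

# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "feed.txt")
    dst_path = os.path.join(tmp, "feed_shuffled.txt")
    with open(src_path, "wb") as f:
        f.write(b"Line 1\nLine 2\nLine 3\nLine 4")
    count = shuffle_file_by_offsets(src_path, dst_path)
    with open(dst_path, "rb") as f:
        result = f.read().splitlines()
print(count, result)
```

Because the source file is never modified, there is no need to shrink it after each copy, and blank lines in the source are preserved like any other line.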

Keeper · Mar 31, 2010

yeah, that could work too Havoc - but it might be slower because it needs to delete from file A too now - BUT, IMO, it will be fast enough also.

You can actually combine the two to make it even more random hehe!
grab random line string from fileA, insert into random line into fileB, delete line out of FileA

herbertk · Mar 31, 2010

OK, I misunderstood it... both approaches above look like they will suit your needs; Keeper's will probably be quicker. Just put the file in an array, access the specific lines and add them to a new array, or work with 3 arrays: one for the random line numbers, one for the original lines of the file, and one new array for the output...

dequadin · Mar 31, 2010

Keeper said:
most languages have a command that counts the number of lines in a text file *instantly*
mine is EOF (End of File) which returns a value with the amount of lines.

Crap, you have to parse the entire file to count the lines, next...

Keeper said:
1st iteration it will random between 0 and 1 - so line 1.
2nd iteration it will random between 0 and 2 - so it can randomize before, OR, after - so it will still be random even if there are only a *few* lines
55th iteration it will random between 0 and 55
1055th iteration it will random between 0 and 1055

So, It won't write into line 1000 if it hasn't reached the 1000th iteration - you use the iteration as the max value it can random to.

1. This will give you a terrible distribution.
2. You'll most likely have to cater for collisions, since you're effectively scaling your random output, i.e. on the first iteration (between 0 and 1) say it picks 1, then on the second iteration (between 0 and 2) say it picks 1 again.

EDIT: I see you handle point 2 by "move lines down". This will take very long when you need to insert a line in position 10 and then move 1.5 GB of text "down one line".

Keeper said:
well loading 2GB into memory might take long too, and what if the user only has 512mb ram?
Memory is usually better, but the disk based one will work on any PC - even if it only has 256mb ram

See my comment after Acid had given us the file size where I say it's impractical.
