Randomizing Text File StreamReader/Writer

guest2013-1

guest
Joined
Aug 22, 2003
Messages
19,800
I'm trying to wrap my head around (and can't figure out) how I'd approach this.

I need to randomize a text file (datafeed). So instead of it being alphabetical it would be random... no idea even how to start with this. Any help?
 

WiT8litZ

Senior Member
Joined
Nov 26, 2005
Messages
951
1) Import the lines into iTunes as new songs

2) Make a new playlist and call it what you want

3) Import the lines (now songs) into the new playlist

4) Hit shuffle (this is where the magic happens)

5) Export the playlist to xml

6) Parse the xml to whatever you want
 

Keeper

Honorary Master
Joined
Mar 29, 2008
Messages
23,624
I'm not a .NET expert or anything, but surely in any language you can load the text file into a listbox, randomize the lines, and then save it?

lol @ WiT8litZ - I think he wants his code to do it, not to do it manually through iTunes :D
 

Jabberwocky

Expert Member
Joined
Aug 8, 2008
Messages
3,615
Make a macro in excel

import the text file into excel
column B =rand()
sort column B
export to a text file
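
The spreadsheet recipe above (random number per row, then sort on it) translates directly to code. A minimal Python sketch, assuming the lines fit in memory; the function name `shuffle_by_random_key` and the `seed` parameter are made up for illustration and are not from the thread.

```python
import random

def shuffle_by_random_key(lines, seed=None):
    """Return the lines in random order by sorting on one random key per line."""
    rng = random.Random(seed)
    # Pair each line with its own random number (the "column B" of the recipe).
    keyed = [(rng.random(), line) for line in lines]
    # Sorting on the random key alone puts the lines in random order.
    keyed.sort(key=lambda pair: pair[0])
    return [line for _, line in keyed]

print(shuffle_by_random_key(["Line 1", "Line 2", "Line 3", "Line 4"]))
```

Passing a `seed` only makes the result repeatable for testing; leave it out for a different order on every run.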
 

sn3rd

Expert Member
Joined
Jan 18, 2008
Messages
4,305
How "random" do you need? Do you mean pseudorandom?

Is this for obfuscation / encryption? Or something else? i.e. Do you want it to appear random, or BE random?
 

guest2013-1

guest
Joined
Aug 22, 2003
Messages
19,800
Line 1
Line 2
Line 3
Line 4

becomes

Line 3
Line 1
Line 2
Line 4

Or any other combination.

This is an application that will be distributed, so not everyone will have Excel or iTunes. Why would I write an app for something I can do by hand in 5 minutes?

Loading the file into a Listbox/randomizing it won't be an option because the files can get quite big and the key here would be speed.
 

Keeper

Honorary Master
Joined
Mar 29, 2008
Messages
23,624
Loading the file into a Listbox/randomizing it won't be an option because the files can get quite big and the key here would be speed.

If speed is VERY important, how about reading line 1 of the text file and writing it into another text file, but at a random line? Then do the same for lines 2, 3, 4, etc.

For example:
At iteration 500 of the loop, it would read line 500 of the source file but write it to line 123 of the second file.
At iteration 501, it would read line 501 of the source file but write it to line 43 of the second file.


I'm sure it would be quite fast, and there would be no problems like "the txt file is too big to add to a listbox". I've done stuff similar to this and it is *quite fast* (faster than a listbox too, more stable, and without its limitations).
 

dequadin

Expert Member
Joined
May 9, 2008
Messages
1,434
If speed is VERY important, how about reading line 1 of the text file and writing it into another text file, but at a random line? Then do the same for lines 2, 3, 4, etc.

What if you don't know how many lines are in the original source? What if there are only two lines, and you write the first into line 1000? Now you have a 1000 line text file with loads of wasted space, and you can't do a second pass to clean up the empty lines because who's to say there aren't blank lines in the source?

Doing this in memory will always be faster: with a second file you are doubling the disk I/O (which is the slowest part). Acid also doesn't specify *what* he wants to do with the text once it's randomized.

@Acid how big is quite big? 50MB? 1GB?
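
For files that do fit in RAM, the in-memory route mentioned in this post is only a few lines. A hedged Python sketch; `shuffle_file_in_memory` and its parameters are hypothetical names, and UTF-8 is an assumption about the datafeed's encoding.

```python
import random

def shuffle_file_in_memory(src_path, dst_path, seed=None):
    """Read every line, shuffle the list, write it out. Needs RAM for the whole file."""
    with open(src_path, "r", encoding="utf-8") as src:
        # splitlines() keeps blank lines as empty strings, so they survive the shuffle.
        lines = src.read().splitlines()
    # Uniform in-place shuffle of the list of lines.
    random.Random(seed).shuffle(lines)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines:
            dst.write(line + "\n")
```

This does one sequential read and one sequential write, which is why it tends to win when memory allows. It stops being an option once the file is larger than available RAM.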
 

guest2013-1

guest
Joined
Aug 22, 2003
Messages
19,800
The file size can get up to 2 GB, depending on how many lines it contains. Once it's randomized, I write it to a new file and then split it up into pieces.
 

dequadin

Expert Member
Joined
May 9, 2008
Messages
1,434
The file size can get up to 2 GB, depending on how many lines it contains. Once it's randomized, I write it to a new file and then split it up into pieces.

So reading in the entire file is not an option.

Do you need the whole randomized file as well as the pieces, or could you just output the pieces? What is the maximum size of each piece?
 

Keeper

Honorary Master
Joined
Mar 29, 2008
Messages
23,624
What if you don't know how many lines are in the original source? What if there are only two lines, and you write the first into line 1000? Now you have a 1000 line text file with loads of wasted space, and you can't do a second pass to clean up the empty lines because who's to say there aren't blank lines in the source?

Doing this in memory will always be faster: with a second file you are doubling the disk I/O (which is the slowest part). Acid also doesn't specify *what* he wants to do with the text once it's randomized.

@Acid how big is quite big? 50MB? 1GB?

Easy peasy lemon squeezy, dequadin... let me tell you.



What if you don't know how many lines are in the original source?
Easy.
Most languages have a command that counts the number of lines in a text file *instantly*.
Mine is EOF (End of File), which returns the number of lines.



What if there are only two lines, and you write the first into line 1000?

Easy.
On the 1st iteration it picks a random number between 0 and 1, so line 1.
On the 2nd iteration it picks between 0 and 2, so the line can land before OR after the first one. It will still be random even if there are only a *few* lines.
On the 55th iteration it picks between 0 and 55.
On the 1055th iteration it picks between 0 and 1055.

So it won't write into line 1000 if it hasn't reached the 1000th iteration: you use the iteration number as the maximum value the random number can take.



Now you have a 1000 line text file with loads of wasted space, and you can't do a second pass to clean up the empty lines because who's to say there aren't blank lines in the source?

No second pass is needed, as explained.

Doing this in memory will always be faster: with a second file you are doubling the disk I/O (which is the slowest part). Acid also doesn't specify *what* he wants to do with the text once it's randomized.

Well, loading 2 GB into memory might take long too, and what if the user only has 512 MB of RAM?
Memory is usually better, but the disk-based approach will work on any PC, even one with only 256 MB of RAM.
 

herbertk

Expert Member
Joined
Oct 27, 2009
Messages
2,777
It all depends on how random you want to go... you could create a bunch of string variables and just concatenate them randomly, but it won't be truly random...

What language are you doing this in?
 

Keeper

Honorary Master
Joined
Mar 29, 2008
Messages
23,624
Acid, here is code for you in "Semi-coding-semi-English" - I have no idea what language you will be using..


Use a command to count how many lines the file has (e.g. 200 lines detected)
For i = 1 to LineAmount (e.g. 1 to 200)
    copy the string on line i of the file (e.g. line 55)
    randomvalue = generate a random number with a max of i (e.g. 37)
    write the string to the output file at line randomvalue, moving the existing lines down (do not replace)
Next i
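
The pseudocode above can be sketched in Python as follows. This is an in-memory stand-in, assuming a list takes the place of the output file; `insertion_shuffle` and `seed` are hypothetical names, and positions are counted from 0 rather than 1.

```python
import random

def insertion_shuffle(lines, seed=None):
    """Insert each line at a random position among the slots available so far."""
    rng = random.Random(seed)
    out = []
    for line in lines:
        # With len(out) lines already placed there are len(out) + 1 slots:
        # before the first line, between two lines, or after the last one.
        pos = rng.randint(0, len(out))  # randint includes both endpoints
        out.insert(pos, line)           # later lines move down, nothing is replaced
    return out

print(insertion_shuffle(["Line 1", "Line 2", "Line 3", "Line 4"]))
```

Each `insert` shifts everything after `pos`, so the total work grows roughly with the square of the line count. In a list that shifting is a memory move; in a text file it means rewriting the tail of the file on every insert.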
 

HavocXphere

Honorary Master
Joined
Oct 19, 2007
Messages
33,155
1. Find out how many lines there are.
2. Jump to random line & copy to second file.
3. Remove copied line from first file completely.
4. Goto step 1 until lines = 0

Files can be opened in read&write mode so you should be able to do 2 & 3 in one go.
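
The pick-and-remove steps above can be sketched in Python like this, again with a list standing in for the first file. The swap-with-last trick is an addition that is not in the post: it avoids closing the gap left by each removed line. `pick_and_remove_shuffle` and `seed` are hypothetical names.

```python
import random

def pick_and_remove_shuffle(lines, seed=None):
    """Repeatedly pick a random remaining line, copy it out, and remove it."""
    rng = random.Random(seed)
    remaining = list(lines)  # copy, so the caller's list is left untouched
    out = []
    while remaining:                       # step 4: loop until no lines are left
        i = rng.randrange(len(remaining))  # steps 1-2: jump to a random line
        # Assumption, not from the post: swap the chosen line to the end so
        # that removing it is a cheap pop instead of shifting every later line.
        remaining[i], remaining[-1] = remaining[-1], remaining[i]
        out.append(remaining.pop())        # steps 2-3: copy it out and remove it
    return out

print(pick_and_remove_shuffle(["Line 1", "Line 2", "Line 3", "Line 4"]))
```

Done literally on a text file, step 3 is the expensive part, because deleting a line from the middle of a file means rewriting everything after it.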
 

Keeper

Honorary Master
Joined
Mar 29, 2008
Messages
23,624
Yeah, that could work too, Havoc. It might be slower because it now has to delete from file A as well, but IMO it will still be fast enough.


You can actually combine the two to make it even more random, hehe!
Grab a random line from file A, insert it at a random line in file B, then delete the line from file A.
 

herbertk

Expert Member
Joined
Oct 27, 2009
Messages
2,777
OK, I misunderstood it... both of the above look like they will suit your needs; Keeper's will probably be quicker. Just put the file in an array, access the specific lines, and add them to a new array. Or work with 3 arrays: one for the random line numbers, one for the original lines of the file, and one new array...
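
One way to read the "array of random line numbers" idea for a file too big for memory: keep only the byte offset where each line starts, shuffle those offsets, then seek to each line in turn. A hedged Python sketch; `shuffle_file_by_offsets` is a hypothetical name, and whether the random seeks are fast enough on a 2 GB feed is untested here.

```python
import random

def shuffle_file_by_offsets(src_path, dst_path, seed=None):
    """Shuffle a file's lines while holding only one byte offset per line in memory."""
    offsets = []
    with open(src_path, "rb") as src:
        # Pass 1: record the byte position where each line starts.
        pos = 0
        for raw in src:
            offsets.append(pos)
            pos += len(raw)
        # Shuffle the line positions, not the lines themselves.
        random.Random(seed).shuffle(offsets)
        # Pass 2: visit the lines in shuffled order and copy each one out.
        with open(dst_path, "wb") as dst:
            for off in offsets:
                src.seek(off)
                line = src.readline()
                if not line.endswith(b"\n"):
                    line += b"\n"  # the last source line may lack a newline
                dst.write(line)
```

Memory use is one integer per line instead of the whole file. The cost moves to the second pass, which reads the source in random order, so it will be much slower on a spinning disk than a sequential read.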
 

dequadin

Expert Member
Joined
May 9, 2008
Messages
1,434
Most languages have a command that counts the number of lines in a text file *instantly*.
Mine is EOF (End of File), which returns the number of lines.

Crap, you have to parse the entire file to count the lines, next...
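
To make that concrete, here is a minimal Python sketch of a line count. It has to read every byte to find the newlines, so the time grows with the file size. `count_lines` and `chunk_size` are hypothetical names.

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Count lines by scanning the whole file for newline bytes, chunk by chunk."""
    count = 0
    last = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last = chunk[-1:]  # remember the final byte seen so far
    # A final line without a trailing newline still counts as a line.
    if last and last != b"\n":
        count += 1
    return count
```

Reading in chunks keeps memory use flat regardless of file size, but there is no shortcut around touching the whole file.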

On the 1st iteration it picks a random number between 0 and 1, so line 1.
On the 2nd iteration it picks between 0 and 2, so the line can land before OR after the first one. It will still be random even if there are only a *few* lines.
On the 55th iteration it picks between 0 and 55.
On the 1055th iteration it picks between 0 and 1055.

So it won't write into line 1000 if it hasn't reached the 1000th iteration: you use the iteration number as the maximum value the random number can take.

  1. This will give you a terrible distribution
  2. You'll most likely have to cater for collisions, since you're effectively scaling your random output. For example, say the first iteration (between 0 and 1) picks 1, and the second iteration (between 0 and 2) picks 1 again.

EDIT: I see you handle point 2 by "move lines down". This will take very long when you need to insert a line at position 10 and then move 1.5 GB of text "down one line".

Well, loading 2 GB into memory might take long too, and what if the user only has 512 MB of RAM?
Memory is usually better, but the disk-based approach will work on any PC, even one with only 256 MB of RAM.

See my comment after Acid had given us the file size where I say it's impractical.
 