Need help parsing a simple string in PHP

zamrg (Senior Member, joined Oct 19, 2005, Cape Town)
The application receives this response from an API.

For example:
firstname=Hello;surname=World;address=My Address Here etc etc

I can't simply split/explode the string on ";" because any of the values can contain a ; itself.
 
You will need to walk through the string, either with regular expressions or an old-school "for loop". In pseudocode: everything up to the first "=" is your field name, so mark that position. Then walk forward until you hit a ";" and mark it as the (provisional) end of the field value. Keep walking: if you hit another ";" first, move the end-of-value marker there, because the earlier ";" was part of the value. If you hit a "=", you know the text since the last ";" is the next field name.

Have fun.
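A minimal PHP sketch of that walk, assuming field names never contain ";" or "=" and values never contain "=" (parse_by_walking is a hypothetical name). A ";" is only treated as the end of a value once an "=" turns up after it with no further ";" in between:

```php
<?php
// Sketch of the character walk described above.
// Assumptions: field names contain no ';' or '=', values contain no '='.
// parse_by_walking is a hypothetical helper name.
function parse_by_walking(string $str): array
{
    $result   = [];
    $key      = null; // name of the field whose value we are reading
    $valStart = 0;    // offset where the current value begins
    $lastSemi = -1;   // offset of the most recent ';' seen

    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === ';') {
            // Provisional end of the current value; it may still turn out
            // to be part of the value if no '=' follows before the next ';'.
            $lastSemi = $i;
        } elseif ($ch === '=') {
            // The text between the last ';' and this '=' is a field name.
            $nameStart = $lastSemi + 1;
            if ($key !== null) {
                // So the previous value really ended at that last ';'.
                $result[$key] = substr($str, $valStart, $lastSemi - $valStart);
            }
            $key      = substr($str, $nameStart, $i - $nameStart);
            $valStart = $i + 1;
        }
    }
    if ($key !== null) {
        // The final value runs to the end of the string.
        $result[$key] = substr($str, $valStart);
    }
    return $result;
}
```

For example, parse_by_walking("firstname=Hello;world;surname=World") should give firstname => "Hello;world" and surname => "World".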
 
OK, this was fun (just taking a break from some other PHP code I'm writing). I would rather have done it with regex (more geeky and so much cooler), but here is a crude version using explode():

PHP:
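<?php

$str = "firstname=Hello;world;surname=World;address=My Address; Here; etc etc";

$field = "";
$list = array();

foreach (explode(";", $str) as $x) {
  // do we have a valid field separator?
  if (strpos($x, "=") === false)  // no: this piece belongs to the previous value
    $field .= ';' . $x;
  else
    $field = $x; // yes: start a new name=value field

  // split on the first "=" only, and skip anything before the first field name
  $tmp = explode("=", $field, 2);
  if (count($tmp) == 2)
    $list[$tmp[0]] = $tmp[1];
}

print_r($list);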

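EDIT: This was just a quick slap. You may want to extend this code to make the single-field case explicit. (Strictly, explode() returns a one-element array when there is no ";", so the loop above already copes; is_array() on its result is always true here, so it can't be used as the test.)

e.g.

if (strpos($str, ";") !== false) {
  foreach (explode(";", $str) as $x) {
    ....
  }
} else {
  $tmp = explode("=", $str, 2);
  $list[$tmp[0]] = $tmp[1];
}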
 
Oh and the output is:

LOCAL: chrisb@rhino [ 16:04:58 ][ ~ ]
> ./play.php
Array
(
    [firstname] => Hello;world
    [surname] => World
    [address] => My Address; Here; etc etc
)
 

... may as well have linked me to google

thanks cbrunsdonza

I originally tried parsing it character by character, but it was too painful a solution

I tried a regex split, something like (([a-z0-9]+)=(.*?);)+ but I'm evidently not too clued up on regexes :) and I kept battling with the (.*?) stopping at the first ; even when that ; belonged to the value.

I landed up using this; not sure how efficient it is but it works.

Code:
$line = 'id=1;firstname=First Name;surname=Surname;Address=Try;To;Break;Me';

$results = array();
$segments = explode(';', $line);
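$last_key = null;

foreach ($segments as $segment)
{
	// a segment that starts with "key=" begins a new field
	if (preg_match('/^([a-z0-9]+)=(.*)$/i', $segment, $matches))
	{
		$last_key = $matches[1];

		$results[$last_key] = $matches[2];
	}
	elseif ($last_key !== null)
	{
		// otherwise the ; was part of the previous value, so put it back
		$results[$last_key] .=  ';' . $segment;
	}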
}
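For the regex route mentioned above, one possible fix (a sketch; parse_with_lookahead is a hypothetical name) is to keep the lazy (.*?) but add a lookahead so it may only stop where a ";" is followed by another key and "=", or at the end of the string. Like the loop above, it assumes alphanumeric keys and would still split a value that contains something like ";foo=":

```php
<?php
// Sketch: regex version of the explode()/preg_match() loop.
// Assumption: keys are alphanumeric, as in the /^([a-z0-9]+)=/i check.
// parse_with_lookahead is a hypothetical helper name.
function parse_with_lookahead(string $line): array
{
    // (.*?) is lazy, so it grows one character at a time and stops at the
    // first point where the lookahead sees ";<key>=" or the end of input.
    preg_match_all('/([a-z0-9]+)=(.*?)(?=;[a-z0-9]+=|$)/i', $line, $matches);

    // $matches[1] holds the keys and $matches[2] the values, in order.
    return array_combine($matches[1], $matches[2]);
}
```

On the test string from the post above this should yield the same four fields as the loop, with Address => "Try;To;Break;Me".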
 
That's fair enough. So long as it works, it's good enough to start. It depends on how many times you will have to run that query in a given period of time.

If you ever need to delve more into this, these are the functions you can generally use for this kind of thing:

preg_match() - Perform a regular expression match
stristr() - Case-insensitive strstr
strpos() - Find position of first occurrence of a string
strrchr() - Find the last occurrence of a character in a string
substr() - Return part of a string

PHP's docs are perfectly adequate in explaining and demonstrating how to do this.
 
I would:
1 - Build a list of all keys.
2 - Find all keys and split on key.
3 - Solve for key value after the =.
4 - Strip trailing ;

Keys are:
firstname=
surname=
address=
...

Add an error handler for keys that appear more than once.

Other approaches would lead to problems when values contain "=" or ";".
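The key-list steps above could be sketched like this, assuming the caller can pass every possible key up front (parse_with_known_keys is a hypothetical name). Splitting only on a ";" that is directly followed by a known key and "=" leaves ";" and "=" inside values intact, and a repeated key raises an exception as suggested:

```php
<?php
// Sketch of the "build a list of all keys" approach.
// Assumptions: every key that can appear is passed in $keys, and keys
// themselves contain no '='. parse_with_known_keys is a hypothetical name.
function parse_with_known_keys(string $str, array $keys): array
{
    // Step 4 (done up front): drop a single trailing ';' if present.
    if (substr($str, -1) === ';') {
        $str = substr($str, 0, -1);
    }

    // Steps 1-2: build an alternation like firstname|surname|address
    // (each key escaped) and split only on a ';' followed by "<key>=".
    $escaped = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keys);
    $parts = preg_split('/;(?=(?:' . implode('|', $escaped) . ')=)/', $str);

    $result = [];
    foreach ($parts as $part) {
        // Step 3: the value is everything after the first '='.
        $pair = explode('=', $part, 2);
        if (count($pair) !== 2 || !in_array($pair[0], $keys, true)) {
            throw new RuntimeException("Segment does not start with a known key: $part");
        }
        [$key, $value] = $pair;
        // Error handling for keys that appear more than once.
        if (array_key_exists($key, $result)) {
            throw new RuntimeException("Duplicate key: $key");
        }
        $result[$key] = $value;
    }
    return $result;
}
```

The catch, as pointed out in the replies, is that it only works when the keys are known in advance, and a value containing ";<known key>=" would still be split.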
 
If you wanted to do it without regex, you could just as easily use substr(). My personal philosophy is: if you can do it without regex, do it. Use regex as a last resort, not a first.
 
Re the build-a-list-of-keys suggestion:

The problem with this approach is that I don't always know the keys beforehand, so the function has to be generic; and like you said, the parser would easily break if a value contained "key=".

ye, I changed my solution to use substr instead.
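The substr() version itself isn't shown in the thread. Purely as an illustration, the explode() loop posted earlier could swap preg_match() for strpos()/substr(), with ctype_alnum() standing in for the [a-z0-9]+ check (an assumption: ctype is a standard extension but can be disabled at build time). parse_with_substr is a hypothetical name:

```php
<?php
// Illustrative sketch only: the explode() loop with the regex check replaced
// by strpos()/substr()/ctype_alnum(). parse_with_substr is hypothetical.
// Assumption: keys are non-empty and alphanumeric, like /^([a-z0-9]+)=/i.
function parse_with_substr(string $line): array
{
    $results  = [];
    $last_key = null;

    foreach (explode(';', $line) as $segment) {
        $eq  = strpos($segment, '=');
        $key = ($eq === false) ? '' : substr($segment, 0, $eq);

        if ($key !== '' && ctype_alnum($key)) {
            // Segment starts a new "key=value" pair.
            $last_key = $key;
            $results[$last_key] = substr($segment, $eq + 1);
        } elseif ($last_key !== null) {
            // Continuation of the previous value: put the ';' back.
            $results[$last_key] .= ';' . $segment;
        }
    }
    return $results;
}
```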
 
This assumes lines end with a ';'.
It allows a ';', but not a '=', in the variable name or value.
It would make things easier if there were some kind of escape character.

PHP:
$string = 'var;1=value1;2;var2=value2;var3=value;3;';
preg_match_all('/([^;][^=]+)=([^=]+)(?=;)/', $string, $matches);
$result = array_combine($matches[1], $matches[2]);
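Two caveats with the pattern above: [^;][^=]+ needs at least two characters before the '=', so a one-character key such as a=1 is silently skipped, and the (?=;) lookahead means the last value is only captured if the string ends with ';'. A variant sketch (parse_lookahead is a hypothetical name) that loosens the key to [^;=][^=]* and appends a ';' when one is missing:

```php
<?php
// Variant of the lookahead pattern. Assumptions (not from the thread):
// the key pattern is loosened from [^;][^=]+ to [^;=][^=]* so one-character
// keys match, and a ';' is appended if the input doesn't end with one.
// parse_lookahead is a hypothetical helper name.
function parse_lookahead(string $string): array
{
    if (substr($string, -1) !== ';') {
        $string .= ';';
    }
    // Key: first char is not ';' or '=', then anything up to the '='.
    // Value: greedy run of non-'=' chars that must be followed by ';', so it
    // backtracks to the last ';' before the next key's '='.
    preg_match_all('/([^;=][^=]*)=([^=]+)(?=;)/', $string, $matches);
    return array_combine($matches[1], $matches[2]);
}
```

Values still can't contain '=', and (as with the original) empty values are not matched.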
 
Just remember that CPU cycles are cheap, so rather write a few extra lines of code if it makes more sense. We no longer need to code like we did back in the '80s, when we spent 99% of our time optimising a single line.
 
This is very true, although I was not programming in the 80's :D
 