Thread: note 61866 added to function.split




note 61866 added to function.split
user name
2006-02-14 08:15:49
Some corrections to robin-at-teddyb's CSV splitting
function.  Recall that the point of this is to properly
implement a split() function that handles data exported to
CSV, where data containing commas gets quote-delimited.

* Problem 1: As jh-at-junetz pointed out, the +1 in robin's
nonquoted splitting command mistakenly adds an extra element
to the resulting array.
* Problem 2: If consecutive fields are quote-delimited, the
remaining "separator" between them only contains
one delimiter and no actual fields - so an extra element
gets added to the parsed array.
* Problem 3: When double-quotes appear in a spreadsheet
exported to CSV, they get escaped by doubling them, e.g. a
data field reading
    this is a test of a "special" case
gets written to CSV as
    "this is a test of a ""special"" case"
These quotes are also interpreted as top-level delimiters and
(mistakenly) add extra array elements to the output.
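
To make Problems 2 and 3 concrete, here is a small sketch of what the top-level explode() on the quote character produces. The sample line is made up for illustration and is not from the original note; the separator is padded onto both ends the same way the function below does it.

```php
<?php
// Illustration only: this sample line is an assumption, not data from the note.
$line = '"a, b","c",d,"say ""hi"""';

// Top-level split on the quote character, with the separator added to
// both ends of the input.
$pieces = explode('"', ',' . $line . ',');

// Even indexes lie outside quotes, odd indexes inside them.
foreach ($pieces as $i => $piece) {
    printf("%d %s [%s]\n", $i, ($i % 2) ? 'quoted  ' : 'unquoted', $piece);
}
```

On this input the split yields 11 pieces. Piece 2 is the lone separator between two quoted fields (Problem 2), and pieces 6 to 9 come from the doubled quotes around hi (Problem 3).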

I have hacked a conversion of "" to a single
quote ( ' ), but a truly clever preg_split for the
top-level splitter (instead of the explode) might preserve
the original doubled "s without bugging up the
top-level parsing.  i.e., a smarter man than I could solve
the problem rather than avoiding it by replacing the bad
data.
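
One possible way to keep the doubled quotes intact, sketched here as a character scanner rather than the preg_split mentioned above. The function name csv_fields() is hypothetical, it assumes a single-character separator, and unlike the function below it does not trim unquoted fields. Treat it as a rough sketch, not a drop-in replacement.

```php
<?php
// Hypothetical helper, not the note's quotesplit(): scans the line one
// character at a time and treats "" inside a quoted field as one literal quote.
// Assumes $sep is a single character.
function csv_fields($line, $sep = ',')
{
    $fields = array();
    $field = '';
    $inQuotes = false;
    $len = strlen($line);

    for ($i = 0; $i < $len; $i++) {
        $ch = $line[$i];
        if ($inQuotes) {
            if ($ch === '"') {
                if ($i + 1 < $len && $line[$i + 1] === '"') {
                    $field .= '"';      // doubled quote: keep one, skip the other
                    $i++;
                } else {
                    $inQuotes = false;  // closing quote
                }
            } else {
                $field .= $ch;          // anything else inside quotes is data
            }
        } elseif ($ch === '"') {
            $inQuotes = true;           // opening quote
        } elseif ($ch === $sep) {
            $fields[] = $field;         // separator outside quotes ends the field
            $field = '';
        } else {
            $field .= $ch;
        }
    }
    $fields[] = $field;                 // the last field has no trailing separator

    return $fields;
}

print_r(csv_fields('"a, b","c",d,"say ""hi"""'));
```

Because the scanner tracks whether it is inside quotes, a doubled quote never reaches the top-level splitting logic, so there is no need to replace it with a single quote first.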

(current) Solution:

<?php

function quotesplit( $splitter=',', $s, $restore_quotes=0 ) {
	// hack because i'm a bad programmer - replace doubled "s with a '
	$s = str_replace('""', "'", $s);

	//First step is to split it up into the bits that are surrounded by quotes
	//and the bits that aren't. Adding the delimiter to the ends simplifies
	//the logic further down
	$getstrings = explode('"', $splitter.$s.$splitter);

	//$instring toggles so we know if we are in a quoted string or not
	$delimlen = strlen($splitter);
	$instring = 0;

	while (list($arg, $val) = each($getstrings)) {
		if ($instring==1) {
			if( $restore_quotes ) {
				//Add the whole string, untouched to the previous value in the array
				$result[count($result)-1] = $result[count($result)-1].'"'.$val.'"';
			} else {
				//Add the whole string, untouched to the array
				$result[] = $val;
			}
			$instring = 0;
		} else {
			// check that we have data between multiple $splitter delimiters
			if ((strlen($val)-$delimlen-$delimlen) >= 1) {

				//Break up the string according to the delimiter character
				//Each string has extraneous delimiters around it (inc the ones we added
				//above), so they need to be stripped off
				$temparray = split($splitter, substr($val, $delimlen, strlen($val)-$delimlen-$delimlen ) );

				while(list($iarg, $ival) = each($temparray)) {
					$result[] = trim($ival);
				}
			}
			// else, the next element needing parsing is a quoted string and the comma
			// here is just a single separator and contains no data, so skip it

			$instring = 1;
		}
	}

	return $result;
}

?>
----
Server IP: 64.71.164.2
Probable Submitter: 160.39.165.72
----
Manual Page -- http://www.php.net/manual/en/function.split.php

-- 
PHP Notes Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

note 61866 modified in function.split by didou
user name
2007-01-07 19:27:44
Some corrections to robin-at-teddyb's CSV splitting
function.  Recall that the point of this is to properly
implement a split() function that handles data exported to
CSV, where data containing commas gets quote-delimited.

* Problem 1: As jh-at-junetz pointed out, the +1 in robin's
nonquoted splitting command mistakenly adds an extra element
to the resulting array.
* Problem 2: If consecutive fields are quote-delimited, the
remaining "separator" between them only contains
one delimiter and no actual fields - so an extra element
gets added to the parsed array.
* Problem 3: When double-quotes appear in a spreadsheet
exported to CSV, they get escaped by doubling them, e.g. a
data field reading
    this is a test of a "special" case
gets written to CSV as
    "this is a test of a ""special"" case"
These quotes are also interpreted as top-level delimiters and
(mistakenly) add extra array elements to the output.

I have hacked a conversion of "" to a single quote
( ' ), but a truly clever preg_split for the top-level
splitter (instead of the explode) might preserve the
original doubled "s without bugging up the top-level
parsing.  i.e., a smarter man than I could solve the problem
rather than avoiding it by replacing the bad data.

(current) Solution:

<?php

function quotesplit( $splitter=',', $s, $restore_quotes=0 )
{
	// hack because i'm a bad programmer - replace doubled "s with a '
	$s = str_replace('""', "'", $s);

	//First step is to split it up into the bits that are surrounded by quotes
	//and the bits that aren't. Adding the delimiter to the ends simplifies
	//the logic further down
	$getstrings = explode('"', $splitter.$s.$splitter);

	//$instring toggles so we know if we are in a quoted string or not
	$delimlen = strlen($splitter);
	$instring = 0;

	while (list($arg, $val) = each($getstrings)) {
		if ($instring==1) {
			if( $restore_quotes ) {
				//Add the whole string, untouched to the previous value in the array
				$result[count($result)-1] = $result[count($result)-1].'"'.$val.'"';
			} else {
				//Add the whole string, untouched to the array
				$result[] = $val;
			}
			$instring = 0;
		} else {
			// check that we have data between multiple $splitter delimiters
			if ((strlen($val)-$delimlen) >= 1) {

				//Break up the string according to the delimiter character
				//Each string has extraneous delimiters around it (inc the ones we added
				//above), so they need to be stripped off
				$temparray = split($splitter, substr($val, $delimlen, strlen($val)-$delimlen-$delimlen ) );

				while(list($iarg, $ival) = each($temparray)) {
					$result[] = trim($ival);
				}
			}
			// else, the next element needing parsing is a quoted string and the comma
			// here is just a single separator and contains no data, so skip it

			$instring = 1;
		}
	}

	return $result;
}

?>
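
The only code difference between this revision and the note as first posted is the length check on unquoted segments. The sketch below compares the two checks on the kind of segments that the explode() on the quote character leaves between quoted fields. It assumes a one-character separator, and the two sample segments are made up for illustration.

```php
<?php
// Assumed example: a one-character separator, and two unquoted segments of the
// kind found between quoted fields: ',' from "a","b" and ',,' from "a",,"b".
$delimlen = 1;
$checks = array();

foreach (array(',', ',,') as $val) {
    $checks[$val] = array(
        // check as first posted: both surrounding separators are subtracted
        'old' => (strlen($val) - $delimlen - $delimlen) >= 1,
        // check in this revision: only one separator is subtracted
        'new' => (strlen($val) - $delimlen) >= 1,
    );
}

var_export($checks);
```

Both checks skip the lone ',' segment. Only the revised check lets ',,' through, which suggests the edit was meant to keep an empty field that sits between two quoted fields instead of dropping it.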

http://php.net/manual/en/function.split.php
