Some corrections to robin-at-teddyb's CSV splitting
function. Recall that the point of this is to properly
implement a split() function that handles data exported to
CSV, where data containing commas gets quote-delimited.
* Problem 1: As jh-at-junetz pointed out, the +1 in robin's
nonquoted splitting command mistakenly adds an extra element
to the resulting array.
* Problem 2: If consecutive fields are quote-delimited, the
remaining "separator" between them only contains
one delimiter and no actual fields - so an extra element
gets added to the parsed array.
* Problem 3: When double-quotes appear in a spreadsheet
exported to CSV, they get escaped by doubling them, e.g. a
data field reading "this is a test of a "special" case"
gets written to CSV as
"this is a test of a ""special"" case".
These doubled quotes are also interpreted as top-level
delimiters and (mistakenly) add extra array elements to the
output.
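To make Problem 3 concrete, here is a small sketch of what the top-level explode() sees when a quoted field contains doubled quotes. The sample line and the variable names are made up for illustration; they are not taken from robin's function.

```php
<?php
// Hypothetical sample line: the middle field contains doubled quotes.
$line = 'a,"say ""hi"" now",b';

// Same top-level split the function uses: pad with the delimiter, cut on ".
$pieces = explode('"', ',' . $line . ',');

// Pieces at odd indexes are treated as quoted strings, so the one quoted
// field shows up as three separate quoted pieces: "say ", "hi", " now".
print_r($pieces);

// With the doubled quotes replaced by ' first, the field stays in one piece.
$hacked = explode('"', ',' . str_replace('""', "'", $line) . ',');
print_r($hacked);
```

One side effect of the replacement worth knowing about: an empty quoted field (written as "" in the CSV) is also turned into a lone ' by the same str_replace.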
I have hacked a conversion of "" to a single quote
( ' ), but a truly clever preg_split for the top-level
splitter (instead of the explode) might preserve the
original doubled "s without bugging up the top-level
parsing. i.e., a smarter man than I could solve the problem
rather than avoiding it by replacing the bad data.
(current) Solution:
<?php
// $splitter has no default here: a default placed before the required $s
// could never be used, and PHP 8 deprecates that ordering.
function quotesplit( $splitter, $s, $restore_quotes=0 )
{
    // hack because i'm a bad programmer - replace doubled "s with a '
    $s = str_replace('""', "'", $s);

    // First step is to split it up into the bits that are surrounded by
    // quotes and the bits that aren't. Adding the delimiter to the ends
    // simplifies the logic further down
    $getstrings = explode('"', $splitter.$s.$splitter);

    $delimlen = strlen($splitter);
    $result = array();
    // $instring toggles so we know if we are in a quoted string or not
    $instring = 0;

    // foreach replaces each(), which was removed in PHP 8
    foreach ($getstrings as $val) {
        if ($instring==1) {
            if( $restore_quotes ) {
                // The delimiters around a quoted string are stripped below,
                // so it is a field of its own: add it with its quotes restored
                $result[] = '"'.$val.'"';
            } else {
                // Add the whole string, untouched, to the array
                $result[] = $val;
            }
            $instring = 0;
        } else {
            // check that we have data between multiple $splitter delimiters
            if ((strlen($val)-$delimlen) >= 1) {
                // Break up the string according to the delimiter.
                // Each string has extraneous delimiters around it (inc the
                // ones we added above), so they need to be stripped off.
                // explode() replaces split(), which was removed in PHP 7
                $temparray = explode($splitter, substr($val, $delimlen, strlen($val)-$delimlen-$delimlen ) );
                foreach ($temparray as $ival) {
                    $result[] = trim($ival);
                }
            }
            // else, the next element needing parsing is a quoted string and
            // the comma here is just a single separator and contains no
            // data, so skip it
            $instring = 1;
        }
    }
    return $result;
}
?>
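For anyone who wants to keep the doubled "s instead of replacing them, one possible approach is a character-by-character scan rather than a preg_split. The following is only a sketch under a few assumptions: the function name csv_split_line is made up, the delimiter is a single character, and the type declarations need PHP 7 or later. It is not a drop-in replacement for quotesplit() (it does not trim unquoted fields and has no $restore_quotes option).

```php
<?php
// Hypothetical helper, not part of the note above: splits one CSV line and
// treats "" inside a quoted field as one literal double quote.
function csv_split_line(string $line, string $delim = ','): array
{
    $fields = [];
    $field = '';
    $inQuotes = false;
    $len = strlen($line);

    for ($i = 0; $i < $len; $i++) {
        $ch = $line[$i];
        if ($inQuotes) {
            if ($ch === '"') {
                if ($i + 1 < $len && $line[$i + 1] === '"') {
                    $field .= '"';      // doubled quote: keep one literal "
                    $i++;               // and skip its partner
                } else {
                    $inQuotes = false;  // closing quote of the field
                }
            } else {
                $field .= $ch;          // delimiters inside quotes are data
            }
        } elseif ($ch === '"') {
            $inQuotes = true;           // opening quote of a field
        } elseif ($ch === $delim) {
            $fields[] = $field;         // top-level delimiter ends the field
            $field = '';
        } else {
            $field .= $ch;
        }
    }
    $fields[] = $field;                 // last field has no trailing delimiter
    return $fields;
}

print_r(csv_split_line('a,"this is a test of a ""special"" case","b,c",,d'));
```

Because the scan tracks whether it is inside quotes, consecutive quoted fields and empty fields need no special casing. PHP 5.3 and later also ship str_getcsv(), which may make a hand-rolled splitter unnecessary.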
--was--
The previous version of this note was identical to the one above except
for the check on the unquoted segment, which subtracted the delimiter
length twice:
if ((strlen($val)-$delimlen-$delimlen) >= 1) {
With that check, an empty field between two quoted fields (as in
"a",,"b") was skipped.
http://php.net/manual/en/function.split.php
--
PHP Notes Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php