Thread: Created: (HADOOP-1995) Path can not handle a file name that contains a back slash




Created: (HADOOP-1995) Path can not handle a file name that contains a back slash
2007-10-04 11:40:51
Path can not handle a file name that contains a back slash
----------------------------------------------------------

                 Key: HADOOP-1995
                 URL: https://issues.apache.org/jira/browse/HADOOP-1995
             Project: Hadoop
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.14.1
            Reporter: Hairong Kuang
             Fix For: 0.16.0


When normalizing a path name, Path incorrectly converts a
back slash to a path separator even if the path name is of
the unix style. This prohibits a glob from using a back slash
to escape a special character. A fix is to make path
normalization file system dependent.
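
To make the report concrete, here is a minimal sketch of the failure
mode. It is not the actual Path source: the class and method names are
hypothetical, and it assumes normalization amounts to an unconditional
replacement of every back slash.

```java
// Hypothetical sketch; not the actual org.apache.hadoop.fs.Path code.
class NaiveNormalize {
    // Assumption: every back slash is turned into a path separator,
    // regardless of the style of the path name.
    static String normalize(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        // A unix-style glob that escapes '[' so it is matched literally.
        String glob = "/data/\\[2007]";
        // The escape is lost: the back slash becomes a separator and
        // '[' is left to be read as the start of a character class.
        System.out.println(normalize(glob)); // prints /data//[2007]
    }
}
```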

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.


Commented: (HADOOP-1995) Path can not handle a file name that contains a back slash
2007-10-04 12:44:50
    [ https://issues.apache.org/jira/browse/HADOOP-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532489 ]

Doug Cutting commented on HADOOP-1995:
--------------------------------------

> A fix is to make path normalization file system dependent.

First, there's a technical problem: normalization is
currently done in Path's constructor, when the FileSystem
is unknown.  But, even so, I'm not sure that will solve
it.

By this you mean that a local path that contains backslashes
will have them converted by Path's constructor.  So that
"\[bar,baz]" will be parsed as
"/[bar,baz]", while an HDFS path like
"\[bar,baz]" will be parsed as
"\[bar,baz]", so that the '[' is unavailable for
globbing.  But then applications which run on both unix and
Windows and use both the local fs and HDFS will have to
pass in different kinds of path strings, no?

Not all paths come from a FileSystem implementation, some
come from environment variables, config files, constant
strings in user code, etc.  Thus we must be able to handle
Windows file names passed to the Path constructor that have
not undergone special escaping, e.g., C:\foo\bar should be
parsed as c:/foo/bar.  We've tried other approaches and
they've not worked well.

This is a hard problem to handle well:

http://www.cygwin.com/ml/cygwin/1999-06/msg00213.html

Perhaps we need to expect some Path-related things to be
broken on Windows, but make those be rarely used things. 
Windows paths that contain '[' or ']' simply might not work
correctly when passed to listPaths unless the user is
careful to insert escapes: we will not attempt to insert
such escapes automatically.  We would only translate '\' to
'/' when running on Windows, and only then when it's not
immediately followed by another backslash.  This will mean
that a directory whose name starts with a glob character
will not work correctly on Windows unless the developer
manually inserts appropriate escapes, but that globs will
work correctly on Windows.  My assumption is that
directories beginning with glob characters are much more
rare than uses of glob characters for globbing.  Could that
work?
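
The rule proposed above could be sketched roughly as follows. This is
only an illustration: the class name, the method, and the onWindows
flag are hypothetical, and the comment does not say how the second
backslash of a pair should be treated, so the sketch assumes a pair is
copied through unchanged.

```java
// Hypothetical sketch of the proposed rule; not Hadoop's actual code.
class ProposedNormalize {
    // Translate '\' to '/' only on Windows, and only when the backslash
    // is not immediately followed by another backslash.
    static String normalize(String path, boolean onWindows) {
        if (!onWindows) {
            return path; // unix-style names keep '\' as a glob escape
        }
        StringBuilder out = new StringBuilder(path.length());
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            boolean pair = c == '\\'
                && i + 1 < path.length()
                && path.charAt(i + 1) == '\\';
            if (pair) {
                // Assumption: a doubled backslash is left alone so the
                // developer can still write an escape by hand.
                out.append("\\\\");
                i += 2;
            } else {
                out.append(c == '\\' ? '/' : c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("C:\\foo\\bar", true));    // C:/foo/bar
        System.out.println(normalize("/data/\\[2007]", false)); // unchanged
    }
}
```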




Commented: (HADOOP-1995) Path can not handle a file name that contains a back slash
2007-10-04 16:42:51
    [ https://issues.apache.org/jira/browse/HADOOP-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532531 ]

Owen O'Malley commented on HADOOP-1995:
---------------------------------------

I would vote that all paths are URIs and thus must use
"/" as the separator on all operating systems and
file systems. I would push the flip from "/" to
"\" into the local file system when running on
Windows. I don't know what would break, but I think the gain
in consistency would be worth it.
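
As a rough illustration of this suggestion (a sketch only; the class
and method names are hypothetical and this is not how any Hadoop
FileSystem is actually written), the separator flip would happen only
at the boundary where a "/"-separated path is handed to the local
file system:

```java
import java.io.File;

// Hypothetical sketch: paths use '/' everywhere inside the framework,
// and only the local file system converts to the native separator.
class LocalBoundary {
    // Convert a '/'-separated path to the form the local OS expects.
    static String toLocal(String path, char localSeparator) {
        return localSeparator == '/'
            ? path
            : path.replace('/', localSeparator);
    }

    public static void main(String[] args) {
        // File.separatorChar is '\' on Windows and '/' on unix.
        System.out.println(toLocal("foo/bar", File.separatorChar));
    }
}
```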



Commented: (HADOOP-1995) Path can not handle a file name that contains a back slash
2007-10-04 17:28:50
    [ https://issues.apache.org/jira/browse/HADOOP-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532540 ]

Doug Cutting commented on HADOOP-1995:
--------------------------------------

> I would vote that all paths are uris and thus must use
> "/" as the separator on all operating systems and
> file systems.

That would certainly be nice, and we try to do that as much
as possible.  Paths are always normalized this way.  But if
we start rejecting paths with backslashes, or interpreting
backslashes as quotations, Hadoop on windows will start
exploding all over the place, with no easy central place to
fix things.

> I would push the flip from "/" to
> "\" in the local file system when running on
> windows.

As I mentioned above, not all paths come from a FileSystem
impl so we can't depend on this happening before we see a
path, and folks process paths in os-independent code,
traversing directories, so delaying it until the filesystem
sees the path won't work either.  I've tried the high road,
and it seems impassable.  There are also back-compatibility
constraints: we don't want to break user code, and a lot of
user code processes paths.

I think cygwin is a good analogy.  Cygwin tries to use unix
syntax and, at the same time, support windows paths from,
e.g., environment variables.  For the most part it works,
but there are a few edge cases where things don't work quite
the same, as in the email I cited above.  We need to
minimize those edge cases to rare situations and have a
ready workaround.  But we may not be able to easily
eliminate them.

You're welcome to try the high road yourself.  I've
already spent more hours than I care to trying to get Hadoop
paths to work transparently across Windows and linux.  The
current solution is not arbitrary, but the result of lots of
trial and error.




