List Info

Thread: RE: gdb_realpath: dealing with ./ and ../




RE: gdb_realpath: dealing with ./ and ../
user name
2008-01-03 11:01:16

> -----Original Message-----
> From: Daniel Jacobowitz [mailto:drowfalse.org]
> Sent: January 3, 2008 11:47 AM
> To: Aleksandar Ristovski
> Cc: gdbsourceware.org; Ryan Mansfield
> Subject: Re: gdb_realpath: dealing with ./ and ../
> 
> 
> We should not call realpath on files which are not
known to exist on
> the system running GDB.  If we do that somewhere, it's
a bug.  

I am not sure it is a bug. At the time of gdb_realpath call,
gdb can not
know whether the path exists or not, it takes the path
information from
debug info. Second, it is perfectly normal situation where
sources are not
available at all, and paths mentioned in debug_info do not
physically exist
on the system.

>Your
> patch added several calls of that sort.

No, my patch only checked if realpath returns NULL in which
case instead of
returning filename as-is, it normalizes it. 

I also added gdb_realpath call in create_subfile to create
canonical
pathname for a new subfile, and to canonicalize path we are
looking for.

But I do not claim this is a perfect solution.

> 
> > > > When our cross-compiler generates
binary, it stores relative path in
> > > > .debug_line section (relative to
compilation dir), i.e. '..'.
> > >
> > > What's in .debug_info?  Also, what version of
GDB?
> >
> >
> > In my case:
> >      DW_AT_name        :
> >
c:/QNXTau/eclipse/ide-4.5-workspace/testManagedCC/main.cc>
;~~~~~$
> >      DW_AT_comp_dir    :
> >
c:/QNXTau/eclipse/ide-4.5-workspace/testManagedCC/Debug>~
~~~~
> 
> Well, that's why.  If DW_AT_name was main.cc, things
would have
> worked.  That's what GCC generates for me.  You have
debug info which
> refers to the same file using two different pathnames.
> 
> Perhaps we can forcibly associate the compilation unit
with the first
> entry in the file name table, if they have the same
basename and no
> other file in the line table matches the CU better.

Would this not introduce a bit of guessing? Consider a case
where there are
several files with the same basename but different paths
(possibly specified
using relative path elements, i.e. different pathnames like
in my case). In
this case none of the files with the same base name would
perfectly fit and
picking the first one will likely not give the correct
answer.

> 
> --
> Daniel Jacobowitz
> CodeSourcery

Re: gdb_realpath: dealing with ./ and ../
country flaguser name
United States
2008-01-03 11:08:46
On Thu, Jan 03, 2008 at 12:01:16PM -0500, Aleksandar
Ristovski wrote:
> > We should not call realpath on files which are not
known to exist on
> > the system running GDB.  If we do that somewhere,
it's a bug.  
> 
> I am not sure it is a bug. At the time of gdb_realpath
call, gdb can not
> know whether the path exists or not, it takes the path
information from
> debug info. Second, it is perfectly normal situation
where sources are not
> available at all, and paths mentioned in debug_info do
not physically exist
> on the system.

Paths from debug info are adjusted before we treat them as
local
paths.  e.g. set substitute-path.

> > Well, that's why.  If DW_AT_name was main.cc,
things would have
> > worked.  That's what GCC generates for me.  You
have debug info which
> > refers to the same file using two different
pathnames.
> > 
> > Perhaps we can forcibly associate the compilation
unit with the first
> > entry in the file name table, if they have the
same basename and no
> > other file in the line table matches the CU
better.
> 
> Would this not introduce a bit of guessing? Consider a
case where there are
> several files with the same basename but different
paths (possibly specified
> using relative path elements, i.e. different pathnames
like in my case). In
> this case none of the files with the same base name
would perfectly fit and
> picking the first one will likely not give the correct
answer.

It's a guess, but a good one.  99.9% of the time, a C file
does not
include another C file with the same basename.  If the
compiler is
going to generate debug info which refers to the same file
by two
different names, I don't see what else we can do.

An alternative would be to do some canonicalization - not
gdb_realpath, which accesses the filesystem, but just
string
manipulation - on the subfile names iff nothing matches the
main
file.  You could remove the ".." there.

-- 
Daniel Jacobowitz
CodeSourcery

Re: gdb_realpath: dealing with ./ and ../
country flaguser name
United States
2008-01-07 08:32:12
> An alternative would be to do some canonicalization -
not
> gdb_realpath, which accesses the filesystem, but just
string
> manipulation - on the subfile names iff nothing matches
the main
> file.  You could remove the ".." there.

Aleksandar liked this option best, but I think I liked your
first
suggestion more:

> > Would this not introduce a bit of guessing?
Consider a case where there are
> > several files with the same basename but different
paths (possibly specified
> > using relative path elements, i.e. different
pathnames like in my case). In
> > this case none of the files with the same base
name would perfectly fit and
> > picking the first one will likely not give the
correct answer.
> 
> It's a guess, but a good one.  99.9% of the time, a C
file does not
> include another C file with the same basename.  If the
compiler is
> going to generate debug info which refers to the same
file by two
> different names, I don't see what else we can do.

I looked at the whole discussion and at the patches, and
they seem
pretty simple in the idea, but pretty complex in the
implementation.

If I undertand the problem correctly, the problem is
matching our
main.cc entry in the linetable back to the main unit subfile
where
the name&comp_unit don't exactly match character for
character. Right?

Going further with the idea that 99.9% of the time, one file
does not
depend on another file with the same name, then how about
doing the
matching using basenames?  As a starting point, I propose
the following
idea: Modify start_subfile so that:

  1. file name is reduced to a basename only
     If there is any path information inside name (such as
".." or
     "/foo/bar/"), it is appended to the dir name.
As a special case,
     if name was an absolute path, then we ignore the
dirname and
     use the dirname from the filename only.

  2. subfile matching is done using the basename only.

I am attaching a patch that illustrates roughly what I have
in mind
(not compiled, not tested).  The great advantage (if it
works ,
is that no rewriting is necessary.

The reason why I agree that the changes should be done
inside
start-subfiles is that we don't want to have to handle this
sort of
problem with every debug info reader.  This was, the
handling is
centralized. It also looks a lot simpler than the patches I
have seen
that touch the DWARF reader.

The one thing that this might break, however, is when the
user
provides a relative path in the break command:

   (gdb) break bar/main.c

As a matter of fact, this is already broken if DW_AT_name is
main.c
and not bar/main.c. So I'm pretty sure that it'll break it
more.
But the good news is that it would be easy to fix it: Modify
the
matching to concat the dirname and basename and do the match
with
that.

Does this sound like it would work?

-- 
Joel

  
Re: gdb_realpath: dealing with ./ and ../
user name
2008-01-07 11:00:23
On Jan 7, 2008 6:32 AM, Joel Brobecker <brobeckeradacore.com> wrote:
> The reason why I agree that the changes should be done
inside
> start-subfiles is that we don't want to have to handle
this sort of
> problem with every debug info reader.  This was, the
handling is
> centralized. It also looks a lot simpler than the
patches I have seen
> that touch the DWARF reader.

To what extent is the dwarf patch more complex because it
handles the
possibility of multiple matches with the basename of the
main source
file.  Assume there's only one file with the basename of the
main file
name and things become a lot simpler.  See
http://sourceware.org/ml/gdb-patches/2008-01/msg00090
.html.
One could rewrite that patch to keep dwarf2_start_subfile,
but one
would have to pass an additional parameter so it would know
if it's
dealing with the main source file.  The patch as submitted
just
reorganizes things so that other solutions(/heuristics) are
possible
without major reworking of the code (the patch itself had to
do some
reworking, but once that's done it's done (in the problem
space being
discussed)).  Plus it simplifies all call sites to
dwarf2_start_subfile/start_subfile.  Previously, each call
site had to
process fe->dir_index, and there are three of them.

Handling the issue in each debug info reader is an
important
consideration and may or may not be a problem.  One would
need to
examine to what extent the issue exists in the other
readers, and to
what extent start_subfile can solve it and still be
debug-format
independent without being more complex.

> The one thing that this might break, however, is when
the user
> provides a relative path in the break command:
>
>    (gdb) break bar/main.c
>
> As a matter of fact, this is already broken if
DW_AT_name is main.c
> and not bar/main.c. So I'm pretty sure that it'll break
it more.
> But the good news is that it would be easy to fix it:
Modify the
> matching to concat the dirname and basename and do the
match with
> that.
>
> Does this sound like it would work?

It seems like directory names need to be included in the
file name
comparison by start_subfile to some extent, otherwise
headers with the
same basename will match each other.

e.g.

a/foo.h:
int afoo () { return 0; }

b/foo.h:
int bfoo () { return 0; }

foo.c:
#include "a/foo.h"
#include "b/foo.h"
int main () { return afoo () + bfoo (); }

Re: gdb_realpath: dealing with ./ and ../
country flaguser name
United States
2008-01-07 23:45:18
Doug,

> > Does this sound like it would work?
> 
> It seems like directory names need to be included in
the file name
> comparison by start_subfile to some extent, otherwise
headers with the
> same basename will match each other.

Yes, it very much looks like I've been pretty naive in my
first
proposal . I have
thought this over some more, and it looks like
indeed manual rewriting of the paths will be needed if we
want to
be able to handle all the situations currently discussed. So
here is
a new proposal that adds the dirname matching using a new
function that
will do the dirname matching first unmodified, and then
"normalized".

Now, back to your comment regarding changing the DWARF
reader:

> To what extent is the dwarf patch more complex because
it handles the
> possibility of multiple matches with the basename of
the main source
> file.  Assume there's only one file with the basename
of the main file
> name and things become a lot simpler.  See
> http://sourceware.org/ml/gdb-patches/2008-01/msg00090
.html.

Maybe I shouldn't have talked about complexity as this may
be just me
needing time to understand the purpose of your patch. So I
withdraw
that comment.

> Handling the issue in each debug info reader is an
important
> consideration and may or may not be a problem.  One
would need to
> examine to what extent the issue exists in the other
readers, and to
> what extent start_subfile can solve it and still be
debug-format
> independent without being more complex.

My suggestion has two ideas behind it:

    I reallly think that the wrapper around start_subfile
that adjusts
    NAME and DIRNAME so that NAME is always a basename is a
good step
    forward, because it reduces the number of combinations
we need to
    handle during the matching phase later on.  We don't
have to handle
    the case where NAME is a full path, or a relative path
of a basename.
    
    Second, I still believe at this point that the problem
should be
    handled outside of the debug info reader.  I know that
at least
    stabs should have the same issues as DWARF. It would be
very nice
    to have the problems fixed in both cases without having
to duplicate
    the algorithms we're developing.

> One could rewrite that patch to keep
dwarf2_start_subfile, but one
> would have to pass an additional parameter so it would
know if it's
> dealing with the main source file.  The patch as
submitted just
> reorganizes things so that other solutions(/heuristics)
are possible
> without major reworking of the code (the patch itself
had to do some
> reworking, but once that's done it's done (in the
problem space being
> discussed)).  Plus it simplifies all call sites to
> dwarf2_start_subfile/start_subfile.  Previously, each
call site had to
> process fe->dir_index, and there are three of them.

I think that the reorganization will not be necessary.  I'd
like to
make the subfile layer work in a way that the debug info
reader would
only have to pass the name and dirname as-is, and be
confident that
it'll just work.

-- 
Joel

  
Re: gdb_realpath: dealing with ./ and ../
user name
2008-01-08 13:54:14
On Jan 7, 2008 9:45 PM, Joel Brobecker <brobeckeradacore.com> wrote:
> I have thought this over some more, and it looks like
> indeed manual rewriting of the paths will be needed if
we want to
> be able to handle all the situations currently
discussed.

I think this point is still open for discussion, i.e. do we
want to
try to handle "all" the situations currently
discussed.  One can solve
the 99.9% case without resorting to path rewriting, one just
needs to
know that one particular path is the path of the main source
file.  If
one can know this (e.g. by an extra argument to
start_subfile, or
because the debug info readers have enough context to know
it), then
one can first do a full path comparison and if that fails
and the file
is the main file then do a basename comparison.

In order to make progress I think we need to decide on
whether we are
going to try heuristics (like path normalization) to handle
the 0.1%
case.  The 0.1% case being multiple files with the same base
name as
the main file.  [For completeness sake, there may be more
cases so the
math might not add up.]  And if so we also need to decide
whether
heuristics (like path normalization) will be applied to all
files, or
just the main file.  The proposed patch applies
normalization to all
files, the dwarf patch applies it to just files that are
potentially
the main file and then only if there are multiple files with
the same
basename as the main file.

Of course, we also need to decide on whether we're going to
handle
this problem at all.   Producers
of debug info should follow rules
more strictly, but I think consumers of debug info should be
a bit lax
if it's easy to do and reasonably useful.  [One can still
flag
violations with complaints so producers get fixed - that's
one thing
that should be added to any patch that goes in.  The dwarf
spec is
silent on this issue but I gather the intent is that paths
must be
consistent - this may get fixed in dwarf v4.]

> Maybe I shouldn't have talked about complexity as this
may be just me
> needing time to understand the purpose of your patch.
So I withdraw
> that comment.

Also remember that that patch also adds some simplification:
All three
call sites of dwarf2_start_subfile currently process
fe->dir_index.
One could move that into dwarf2_start_subfile of course, but
there is
still the issue of knowing a file is the main file.

> My suggestion has two ideas behind it:
>
>     I reallly think that the wrapper around
start_subfile that adjusts
>     NAME and DIRNAME so that NAME is always a basename
is a good step
>     forward, because it reduces the number of
combinations we need to
>     handle during the matching phase later on.  We
don't have to handle
>     the case where NAME is a full path, or a relative
path of a basename.

Or one could instead store basename and fullname in struct
subfile
(where "fullname" is the complete path that's
available).  This way
the comparison loop in start_subfile doesn't have to keep
prepending
directories, and can go back to the simple loop it use to
be.  [I
still have an open question regarding why DIRNAME isn't
prepended to
NAME before going into the comparison loop:
http://sourceware.org/ml/gdb-patches/2008-01/msg00091.
html]

> I think that the reorganization will not be necessary. 
I'd like to
> make the subfile layer work in a way that the debug
info reader would
> only have to pass the name and dirname as-is, and be
confident that
> it'll just work.

If one wants to handle this, and one wants to handle it in
start_subfile, then I think start_subfile needs to know when
it is
passed the main file so that it can punt to heuristics only
in this
case.  start_symtab is presumably a good place to tell
start_subfile
"this is the main file".  Either that or maybe use
last_source_file.

[1-6]

about | contact  Other archives ( Real Estate discussion Medical topics )