Thread: new security doc using object-capabilities




new security doc using object-capabilities
user name
2006-09-07 03:38:07
Hi Brett,

Here are some comments on your proposal.  Sorry this took so long.
I apologize if any of these comments are out of date (but also look
forward to your answers to some of the questions, as they'll help
me understand some more of the details of your proposal).  Thanks!

> Introduction
> ///////////////////////////////////////
[...]
> Throughout this document several terms are going to be used.  A
> "sandboxed interpreter" is one where the built-in namespace is not the
> same as that of an interpreter whose built-ins were unaltered, which
> is called an "unprotected interpreter".

Is this a definition or an implementation choice?  As in, are you
defining "sandboxed" to mean "with altered built-ins" or just
"restricted in some way", and does the above mean to imply that
altering the built-ins is what triggers other kinds of restrictions
(as it did in Python's old restricted execution mode)?

> A "bare interpreter" is one where the built-in namespace has been
> stripped down to the bare minimum needed to run any form of basic Python
> program.  This means that all atomic types (i.e., syntactically
> supported types), ``object``, and the exceptions provided by the
> ``exceptions`` module are considered in the built-in namespace.  There
> have also been no imports executed in the interpreter.

Is a "bare interpreter" just one example of a sandboxed interpreter,
or are all sandboxed interpreters in your design initially bare (i.e.
"sandboxed" = "bare" + zero or more granted authorities)?

> The "security domain" is the boundary at which security is cared
> about.  For this discussion, it is the interpreter.

It might be clearer to say (if i understand correctly) "Each interpreter
is a separate security domain."

Many interpreters can run within a single operating system process,
right?  Could you say a bit about what sort of concurrency model you
have in mind?  How would this interact (if at all) with use of the
existing threading functionality?

> The "powerbox" is the thing that possesses the ultimate power in the
> system.  In our case it is the Python process.

This could also be the application process, right?

> Rationale
> ///////////////////////////////////////
[...]
> For instance, think of an application that supports a plug-in system
> with Python as the language used for writing plug-ins.  You do not
> want to have to examine every plug-in you download to make sure that
> it does not alter your filesystem if you can help it.  With a proper
> security model and implementation in place this hindrance of having
> to examine all code you execute should be alleviated.

I'm glad to have this use case set out early in the document, so the
reader can keep it in mind as an example while reading about the model.

> Approaches to Security
> ///////////////////////////////////////
>
> There are essentially two types of security: who-I-am
> (permissions-based) security and what-I-have (authority-based)
> security.

As Mark Miller mentioned in another message, your descriptions of
"who-I-am" security and "what-I-have" security make sense, but
they don't correspond to "permission" vs. "authority".  They
correspond to "identity-based" vs. "authority-based" security.

> Difficulties in Python for Object-Capabilities
> //////////////////////////////////////////////
[...]
> Three key requirements for providing a proper perimeter defence are
> private namespaces, immutable shared state across domains, and
> unforgeable references.

Nice summary.

> Problem of No Private Namespace
> ===============================
[...]
> The Python language has no such thing as a private namespace.

Don't local scopes count as private namespaces?  It seems clear
that they aren't designed with the intention of being exposed,
unlike other namespaces in Python.

> It also makes providing security at the object level using
> object-capabilities non-existent in pure Python code.

I don't think this is necessarily the case.  No Python code i've
ever seen expects to be able to invade the local scopes of other
functions, so you could use them as private namespaces.  There
are two ways i've seen to invade local scopes:

    (a) Use gc.get_referents to get back from a cell object
        to its contents.

    (b) Compare the cell object to another cell object, thereby
        causing __eq__ to be invoked to compare the contents of
        the cells.

So you could protect local scopes by prohibiting these or by
simply turning off access to func_closure.  It's clear that hardly
any code depends on these introspection features, so it would be
reasonable to turn them off in a sandboxed interpreter.  (It seems
you would have to turn off some introspection features anyway in
order to have reliable import guards.)
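
Route (a) can be sketched in a few lines.  This is only an illustration
under assumptions: it targets a current CPython, where func_closure is
spelled __closure__, and the make_counter helper is made up for the
example.

```python
import gc


def make_counter():
    """Return a function whose state lives only in its enclosing scope."""
    count = [0]  # intended to be private to increment()

    def increment():
        count[0] += 1
        return count[0]

    return increment


counter = make_counter()
counter()  # the hidden list is now [1]

# Route (a): the cell object is reachable from the function, and
# gc.get_referents() returns the objects that the cell refers to.
cell = counter.__closure__[0]
leaked = gc.get_referents(cell)[0]
leaked[0] = 100  # tamper with the "private" state from outside

result = counter()
print(result)  # 101
```

Denying sandboxed code both the closure attribute and the gc module
would close this particular route.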

> Problem of Mutable Shared State
> ===============================
[...]
> Regardless, sharing of state that can be influenced by another
> interpreter is not safe for object-capabilities.

Yup.

> Threat Model
> ///////////////////////////////////////

Good to see this specified here.  I like the way you've broken this
down.

> * An interpreter cannot gain abilities the Python process possesses
>   without explicitly being given those abilities.

It would be good to enumerate which abilities you're referring to in
this item.  For example, a bare interpreter should be able to allocate
memory and call most of the built-in functions, but should not be able
to open network connections.

> * An interpreter cannot influence another interpreter directly at the
>   Python level without explicitly allowing it.

You mean, without some other entity explicitly allowing it, right?
What would that other entity be -- presumably the interpreter that
spawned both of these sub-interpreters?

> * An interpreter cannot use operating system resources without being
>   explicitly given those resources.

Okay.

> * A bare Python interpreter is always trusted.

What does "trusted" mean in the above?

> * Python bytecode is always distrusted.
> * Pure Python source code is always safe on its own.

It would be helpful to clarify "safe" here.  I assume by "safe" you
mean that the Python source code can express whatever it wants,
including potentially dangerous activities, but when run in a bare
or sandboxed interpreter it cannot have harmful effects.  But then
in what sense does the "safety" have to do with the Python source code
rather than the restrictions on the interpreter?

Would it be correct to say:
  + We want to guarantee that Python source code cannot violate
    the restrictions in a restricted or bare interpreter.
  + We do not prevent arbitrary Python bytecode from violating
    these restrictions, and assume that it can.

>     + Malicious abilities are derived from C extension modules,
>       built-in modules, and unsafe types implemented in C, not from
>       pure Python source.

By "malicious" do you just mean "anything that isn't accessible to
a bare interpreter"?

> * A sub-interpreter started by another interpreter does not inherit
>   any state.

Do you envision a tree of interpreters and sub-interpreters?  Can the
levels of spawning get arbitrarily deep?

If i am visualizing your model correctly, maybe it would be useful to
introduce the term "parent", where each interpreter has as its parent
either the Python process or another interpreter.  Then you could say
that each interpreter acquires authority only by explicit granting from
its parent.  Then i have another question: can an interpreter acquire
authorities only when it is started, or can it acquire them while it is
running, and how?

> Implementation
> ///////////////////////////////////////
>
> Guiding Principles
> ========================
>
> To begin, the Python process garners all power as the powerbox.  It is
> up to the process to initially hand out access to resources and
> abilities to interpreters.  This might take the form of an interpreter
> with all abilities granted (i.e., a standard interpreter as launched
> when you execute Python), which then creates sub-interpreters with
> sandboxed abilities.  Another alternative is only creating
> interpreters with sandboxed abilities (i.e., Python being embedded in
> an application that only uses sandboxed interpreters).

This sounds like part of your design to me.  It might help to have
this earlier in the document (maybe even with an example diagram of a
tree of interpreters).

> All security measures should never have to ask who an interpreter is.
> This means that what abilities an interpreter has should not be stored
> at the interpreter level when the security can use a proxy to protect
> a resource.  This means that while supporting a memory cap can
> have a per-interpreter setting that is checked (because access to the
> operating system's memory allocator is not supported at the program
> level), protecting files and imports should not have such a per-interpreter
> protection at such a low level (because those can have extension
> module proxies to provide the security).

It might be good to declare two categories of resources -- those
protected by object hiding and those protected by a per-interpreter
setting -- and make lists.

> Backwards-compatibility will not be a hindrance upon the design or
> implementation of the security model.  Because the security model will
> inherently remove resources and abilities that existing code expects,
> it is not reasonable to expect existing code to work in a sandboxed
> interpreter.

You might qualify the last statement a bit.  For example, a Python
implementation of a pure algorithm (e.g. string processing, data
compression, etc.) would still work in a sandboxed interpreter.

> Keeping Python "pythonic" is required for all design decisions.

As Lawrence Oluyede also mentioned, it would be helpful to say a
little more about what "pythonic" means.

> Restricting what is in the built-in namespace and the safe-guarding
> the interpreter (which includes safe-guarding the built-in types) is
> where security will come from.

Sounds good.

> Abilities of a Standard Sandboxed Interpreter
> =============================================
>
[...]
> * You cannot open any files directly.
> * Importation
>     + You can import any pure Python module.
>     + You cannot import any Python bytecode module.
>     + You cannot import any C extension module.
>     + You cannot import any built-in module.
> * You cannot find out any information about the operating system you
>   are running on.
> * Only safe built-ins are provided.

This looks reasonable.  This is probably a good place to itemize
exactly which built-ins are considered safe.

> Imports
> -------
>
> A proxy for protecting imports will be provided.  This is done by
> setting the ``__import__()`` function in the built-in namespace of the
> sandboxed interpreter to a proxied version of the function.
>
> The planned proxy will take in a passed-in function to use for the
> import and a whitelist of C extension modules and built-in modules to
> allow importation of.

Presumably these are passed in to the proxy's constructor.

> If an import would lead to loading an extension
> or built-in module, it is checked against the whitelist and allowed
> to be imported based on that list.  All .pyc and .pyo files will not
> be imported.  All .py files will be imported.

I'm unclear about this.  Is the whitelist a list of module names only,
or of filenames with extensions?  Does the normal path-searching process
take place or can it be restricted in some way?  Would it simplify the
security analysis to have the whitelist be a dictionary that maps module
names to absolute pathnames?

If both the .py and .pyc are present, the normal import would find the
.pyc file; would the import proxy reject such an import or ignore it
and recompile the .py instead?
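
To make the questions concrete, here is one possible shape for such a
proxy.  It is a sketch under stated assumptions, not the proposal's
design: the whitelist holds top-level module names only, normal path
searching is left untouched, bytecode handling and relative imports are
ignored, and make_import_proxy is a made-up name.  It uses importlib from
current Python to tell native modules from pure Python source.

```python
import builtins
import importlib.machinery
import importlib.util


def make_import_proxy(real_import, whitelist):
    """Return an __import__ replacement that vets native modules.

    Hypothetical helper: only the top-level module name is classified,
    and relative imports are not handled.
    """
    allowed = frozenset(whitelist)

    def is_native(name):
        # Built-in modules report the origin "built-in"; C extension
        # modules are loaded by ExtensionFileLoader.
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ImportError("no module named %r" % name)
        return spec.origin == "built-in" or isinstance(
            spec.loader, importlib.machinery.ExtensionFileLoader
        )

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        top_level = name.partition(".")[0]
        if is_native(top_level) and top_level not in allowed:
            raise ImportError("import of %r is not whitelisted" % top_level)
        return real_import(name, globals, locals, fromlist, level)

    return guarded_import


guarded = make_import_proxy(builtins.__import__, whitelist={"math"})
json_module = guarded("json")  # pure Python source: allowed
math_module = guarded("math")  # native, but whitelisted: allowed
try:
    guarded("sys")             # built-in and not whitelisted
    blocked = False
except ImportError:
    blocked = True
print(blocked)  # True
```

Because the proxy is called directly here, imports performed inside the
imported modules still go through the ordinary __import__; a real guard
would have to be installed in the sandboxed interpreter's built-in
namespace, as the quoted text describes.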

> It must be warned that importing any C extension module is dangerous.

Right.

> Implementing Import in Python
> +++++++++++++++++++++++++++++
>
> To help facilitate in the exposure of more of what importation
> requires (and thus make implementing a proxy easier), the import
> machinery should be rewritten in Python.

This seems like a good idea.  Can you identify which minimum essential
pieces of the import machinery have to be written in C?

> Sanitizing Built-In Types
> -------------------------
[...]
> Constructors
> ++++++++++++
>
> Almost all of Python's built-in types
> contain a constructor that allows code to create a new instance of a
> type as long as you have the type itself.  Unfortunately this does not
> work in an object-capabilities system without either providing a proxy
> to the constructor or just turning it off.

The existence of the constructor isn't (by itself) the problem.
The problem is that both of the following are true:

    (a) From any object you can get its type object.
    (b) Using any type object you can construct a new instance.

So, you can control this either by hiding the type object, separating
the constructor from the type, or disabling the constructor.
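
A small sketch of (a) plus (b), followed by one way of disabling the
constructor.  LogFile, GuardedLogFile, open_log and the _MINT token are
all hypothetical names for the example, and the guard is illustrative
only: in unmodified Python the token can still be reached through
open_log's globals.

```python
class LogFile(object):
    """Stand-in for a resource type whose constructor grants access."""

    def __init__(self, path):
        self.path = path


# The sandboxed code is handed exactly one instance...
granted = LogFile("plugin.log")

# ...but (a) it can recover the type and (b) call the constructor itself.
forged = type(granted)("/etc/passwd")
print(forged.path)  # /etc/passwd


# Disabling the constructor: it only works with a token that the
# sandboxed code is never given.
_MINT = object()


class GuardedLogFile(object):
    def __init__(self, path, _token=None):
        if _token is not _MINT:
            raise TypeError("GuardedLogFile cannot be constructed directly")
        self.path = path


def open_log(path):
    """Factory held by the trusted side only."""
    return GuardedLogFile(path, _MINT)


guarded = open_log("plugin.log")
try:
    type(guarded)("/etc/passwd")
    escaped = True
except TypeError:
    escaped = False
print(escaped)  # False
```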

> Types whose constructors are considered dangerous are:
>
> * ``file``
>     + Will definitely use the ``open()`` built-in.
> * code objects
> * XXX sockets?
> * XXX type?
> * XXX

Looks good so far.  Not sure i see what's dangerous about 'type'.

> Filesystem Information
> ++++++++++++++++++++++
>
> When running code in a sandboxed interpreter, POLA suggests that you
> do not want to expose information about your environment on top of
> protecting its use.  This means that filesystem paths typically should
> not be exposed.  Unfortunately, Python exposes file paths all over the
> place:
>
> * Modules
>     + ``__file__`` attribute
> * Code objects
>     + ``co_filename`` attribute
> * Packages
>     + ``__path__`` attribute
> * XXX
>
> XXX how to expose safely?

It seems that in most cases, a single Python object is associated with
a single pathname.  If that's true in general, one solution would be
to provide an introspection function named 'getpath' or something
similar that would get the path associated with any object.  This
function might go in a module containing all the introspection functions,
so imports of that module could be easily restricted.

> Mutable Shared State
> ++++++++++++++++++++
>
> Because built-in types are shared between interpreters, they cannot
> expose any mutable shared state.  Unfortunately, as it stands, some
> do.  Below is a list of types that share some form of dangerous state,
> how they share it, and how to fix the problem:
>
> * ``object``
>     + ``__subclasses__()`` function
>         - Remove the function; never seen used in real-world code.
> * XXX

Okay, more to work out here. 
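
As an illustration of why ``__subclasses__()`` counts as shared state,
the sketch below lets one piece of code find a class defined by another
without ever being handed a reference to it.  Both "plugins" are
hypothetical and run in a single interpreter here; the assumption is that
the same lookup would work across interpreters that share ``object``.

```python
def plugin_a():
    """Pretend this runs in one security domain."""
    class PluginASecret(object):
        token = "s3cret"
    return PluginASecret  # kept alive by plugin A


def plugin_b():
    """Pretend this runs in another domain, with no reference to plugin A."""
    for cls in object.__subclasses__():
        if cls.__name__ == "PluginASecret":
            return cls.token
    return None


keep_alive = plugin_a()
found = plugin_b()
print(found)  # s3cret
```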

> Perimeter Defences Between a Created Interpreter and Its Creator
> ----------------------------------------------------------------
>
> The plan is to allow interpreters to instantiate sandboxed
> interpreters safely.  By using the creating interpreter's abilities to
> provide abilities to the created interpreter, you make sure there is
> no escalation in abilities.

Good.

> * ``__del__`` created in sandboxed interpreter but object is cleaned
>   up in unprotected interpreter.

How do you envision the launching of a sandboxed interpreter to look?
Could you sketch out some rough code examples?  Were you thinking of
something like:

    sys.spawn(code, dict)
        code: a string containing Python source code
        dict: the global namespace in which to run the code

If you allow the parent interpreter to pass mutable objects into the
child interpreter, then the parent and child can already communicate
via the object, so '__del__' is a moot issue.  Do you want to prevent
all communication between parent and child?  It's not obvious to me
why that would be necessary.

> * Using frames to walk the frame stack back to another interpreter.

Could you just disable introspection of the frame stack?
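
For reference, the kind of frame walking at issue looks roughly like
this.  The sketch stays within a single interpreter and relies on the
CPython-specific sys._getframe(); the function names and the "password"
local are invented for the example.

```python
import sys


def untrusted_callback():
    """Code that was only ever handed the right to be called."""
    caller = sys._getframe().f_back  # CPython-specific introspection
    return caller.f_locals.get("password")


def trusted_caller():
    password = "hunter2"  # never passed to the callback
    return untrusted_callback()


stolen = trusted_caller()
print(stolen)  # hunter2
```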

> Making the ``sys`` Module Safe
> ------------------------------
[...]
> This means that the ``sys`` module needs to have its safe information
> separated out from the unsafe settings.

Yes.

> XXX separate modules, ``sys.settings`` and ``sys.info``, or strip
> ``sys`` to settings and put info somewhere else?  Or provide a method
> that will create a faked sys module that has the safe values copied
> into it?

I think the last suggestion above would lead to confusion.  The two
groups should have two distinct names and it should be clear which
attribute goes with which group.

> Protecting I/O
> ++++++++++++++
>
> The ``print`` keyword and the built-ins ``raw_input()`` and
> ``input()`` use the values stored in ``sys.stdout`` and ``sys.stdin``.
> By exposing these attributes to the creating interpreter, one can set
> them to safe objects, such as instances of ``StringIO``.

Sounds good.
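
A minimal sketch of the idea, written for current Python (where
``print`` is a function and ``StringIO`` lives in ``io``) and run inside
one interpreter rather than across a parent and child.  The helper name
run_with_captured_output is an assumption of the example.

```python
import io
import sys


def run_with_captured_output(func):
    """Run func() with sys.stdout pointed at an in-memory buffer."""
    saved = sys.stdout
    buffer = io.StringIO()
    sys.stdout = buffer
    try:
        func()
    finally:
        sys.stdout = saved  # always restore the real stream
    return buffer.getvalue()


def plugin():
    print("hello from the plugin")


captured = run_with_captured_output(plugin)
```

After the call, ``captured`` holds the text the plugin printed, and
nothing reached the real standard output.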

> Safe Networking
> ---------------
>
> XXX proxy on socket module, modify open() to be the constructor, etc.

Lots more to think about here. 

> Protecting Memory Usage
> -----------------------
>
> To protect memory, low-level hooks into the memory allocator for
> Python are needed.  By hooking into the C API for memory allocation and
> deallocation a very rough running count of used memory can be kept.  This
> can be used to prevent sandboxed interpreters from using so much
> memory that it impacts the overall performance of the system.

Preventing denial-of-service is in general quite difficult, but i
applaud the attempt.  I agree with your decision to separate this
work from the rest of the security model.
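
The "running count" idea can be approximated from Python with the
standard tracemalloc module, which hooks the allocators in later Python
versions.  This is a loose analogy under assumptions, not the proposed
mechanism: the count is process-wide rather than per-interpreter, and the
check happens after the code has run instead of refusing the allocation.
run_with_memory_budget is a made-up helper.

```python
import tracemalloc


def run_with_memory_budget(func, limit_bytes):
    """Run func() and raise MemoryError if its peak usage exceeded the budget."""
    tracemalloc.start()
    try:
        func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    if peak > limit_bytes:
        raise MemoryError(
            "peak of %d bytes exceeded budget of %d" % (peak, limit_bytes)
        )
    return peak


def hungry():
    data = bytearray(2000000)  # roughly 2 MB
    return len(data)


peak = run_with_memory_budget(hungry, limit_bytes=50000000)
try:
    run_with_memory_budget(hungry, limit_bytes=1000)
    over_budget = False
except MemoryError:
    over_budget = True
```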


-- ?!ng
_______________________________________________
Python-Dev mailing list
Python-Devpython.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/nessto%40sharedlog.com
new security doc using object-capabilities
user name
2006-09-07 18:26:53


On 9/6/06, Ka-Ping Yee <python-devzesty.ca> wrote:
Hi Brett,

Here are some comments on your proposal.  Sorry this took so long.
I apologize if any of these comments are out of date (but also look
forward to your answers to some of the questions, as they'll help
me understand some more of the details of your proposal).  Thanks!

I think they are slightly outdated.  The latest version of the doc is in the bcannon-objcap branch and is named securing_python.txt ( http://svn.python.org/view/python/branches/bcannon-objcap/securing_python.txt ).

> Introduction
> ///////////////////////////////////////
[...]
> Throughout this document several terms are going to be used.  A
> "sandboxed interpreter" is one where the built-in namespace is not the
> same as that of an interpreter whose built-ins were unaltered, which
> is called an "unprotected interpreter".

Is this a definition or an implementation choice?  As in, are you
defining "sandboxed" to mean "with altered built-ins" or just
"restricted in some way", and does the above mean to imply that
altering the built-ins is what triggers other kinds of restrictions
(as it did in Python's old restricted execution mode)?

There is no "triggering" of other restrictions.  This is an implementation choice.  "Sandboxed" means "with altered built-ins".

> A "bare interpreter" is one where the built-in namespace has been
> stripped down to the bare minimum needed to run any form of basic Python
> program.  This means that all atomic types (i.e., syntactically
> supported types), ``object``, and the exceptions provided by the
> ``exceptions`` module are considered in the built-in namespace.  There
> have also been no imports executed in the interpreter.

Is a "bare interpreter" just one example of a sandboxed interpreter,
or are all sandboxed interpreters in your design initially bare (i.e.
"sandboxed" = "bare" + zero or more granted authorities)?

You build up from a bare interpreter by adding in authorities (e.g., providing a wrapped version of open()) to reach the level of security you want.
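
As an illustration of what "a wrapped version of open()" might look
like, here is a sketch of a read-only open() confined to one directory.
The names make_readonly_open and readonly_open are assumptions of the
example, it is written for current Python, and it does not address the
separate problem of file-type constructors discussed elsewhere in this
thread.

```python
import os
import tempfile


def make_readonly_open(root):
    """Return an open()-like function limited to reading files under root."""
    root = os.path.realpath(root)

    def readonly_open(relative_path, mode="r"):
        if mode not in ("r", "rb"):
            raise PermissionError("only read access is granted")
        # Resolve symlinks and ".." before checking containment.
        target = os.path.realpath(os.path.join(root, relative_path))
        if os.path.commonpath([root, target]) != root:
            raise PermissionError("path escapes the granted directory")
        return open(target, mode)

    return readonly_open


sandbox_dir = tempfile.mkdtemp()
with open(os.path.join(sandbox_dir, "notes.txt"), "w") as handle:
    handle.write("hello")

granted_open = make_readonly_open(sandbox_dir)
with granted_open("notes.txt") as handle:
    contents = handle.read()

try:
    granted_open("../outside.txt")
    escaped = True
except PermissionError:
    escaped = False
```

The sandboxed code would receive only ``granted_open``; the directory it
is bound to is fixed by whoever created the wrapper.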

> The "security domain" is the boundary at which security is cared
> about.  For this discussion, it is the interpreter.

It might be clearer to say (if i understand correctly) "Each interpreter
is a separate security domain."

Many interpreters can run within a single operating system process,
right?

Yes.

Could you say a bit about what sort of concurrency model you
have in mind?

None specifically.  Each new interpreter automatically runs in its own Python thread, so they have essentially the same concurrency as using the 'thread' module.

How would this interact (if at all) with use of the
existing threading functionality?

See above.

> The "powerbox" is the thing that possesses the ultimate power in the
> system.  In our case it is the Python process.

This could also be the application process, right?

If Python is embedded, yes.

> Rationale
> ///////////////////////////////////////
[...]
> For instance, think of an application that supports a plug-in system
> with Python as the language used for writing plug-ins.  You do not
> want to have to examine every plug-in you download to make sure that
> it does not alter your filesystem if you can help it.  With a proper
> security model and implementation in place this hindrance of having
> to examine all code you execute should be alleviated.

I'm glad to have this use case set out early in the document, so the
reader can keep it in mind as an example while reading about the model.

> Approaches to Security
> ///////////////////////////////////////
>
> There are essentially two types of security: who-I-am
> (permissions-based) security and what-I-have (authority-based)
> security.

As Mark Miller mentioned in another message, your descriptions of
"who-I-am" security and "what-I-have" security make sense, but
they don't correspond to "permission" vs. "authority".  They
correspond to "identity-based" vs. "authority-based" security.

Right.  This was fixed the day Mark and Alan Karp made the comment.

> Difficulties in Python for Object-Capabilities
> //////////////////////////////////////////////
[...]
> Three key requirements for providing a proper perimeter defence are
> private namespaces, immutable shared state across domains, and
> unforgeable references.

Nice summary.

> Problem of No Private Namespace
> ===============================
[...]
> The Python language has no such thing as a private namespace.

Don't local scopes count as private namespaces?  It seems clear
that they aren't designed with the intention of being exposed,
unlike other namespaces in Python.

Sort of.  But you can still get access to them if you have an execution frame, and they are not persistent.  Generators are worse since they store their execution frame with the generator itself, completely exposing the local namespace.
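
The generator case is easy to demonstrate.  In this sketch the
``countdown`` generator and its ``secret`` local are invented for
illustration; ``gi_frame`` and ``f_locals`` are the standard CPython
attributes.

```python
def countdown(start):
    secret = "s3cret"  # local to the generator body
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)
next(gen)  # run to the first yield so the locals exist

# The suspended frame travels with the generator object itself.
leaked = gen.gi_frame.f_locals["secret"]
print(leaked)  # s3cret
```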

> It also makes providing security at the object level using
> object-capabilities non-existent in pure Python code.

I don't think this is necessarily the case.  No Python code i've
ever seen expects to be able to invade the local scopes of other
functions, so you could use them as private namespaces.  There
are two ways i've seen to invade local scopes:

    (a) Use gc.get_referents to get back from a cell object
        to its contents.

    (b) Compare the cell object to another cell object, thereby
        causing __eq__ to be invoked to compare the contents of
        the cells.

Or the execution frame which is exposed directly on generators.

But regardless, the comment was meant to apply to Python as it stands, not that it couldn't possibly be tweaked somehow.

So you could protect local scopes by prohibiting these or by
simply turning off access to func_closure.  It's clear that hardly
any code depends on these introspection features, so it would be
reasonable to turn them off in a sandboxed interpreter.  (It seems
you would have to turn off some introspection features anyway in
order to have reliable import guards.)

Maybe this can be changed in the future, but this is more than I need at the moment so I am not going to go down that path right now.  But I added a quick mention of this.

> Problem of Mutable Shared State
> ===============================
[...]
> Regardless, sharing of state that can be influenced by another
> interpreter is not safe for object-capabilities.

Yup.

> Threat Model
> ///////////////////////////////////////

Good to see this specified here.  I like the way you've broken this
down.

The current version has more details per point than the one you read.

> * An interpreter cannot gain abilities the Python process possesses
>   without explicitly being given those abilities.

It would be good to enumerate which abilities you're referring to in
this item.  For example, a bare interpreter should be able to allocate
memory and call most of the built-in functions, but should not be able
to open network connections.

> * An interpreter cannot influence another interpreter directly at the
>   Python level without explicitly allowing it.

You mean, without some other entity explicitly allowing it, right?

Yep.

What would that other entity be -- presumably the interpreter that
spawned both of these sub-interpreters?

Sure.  You could stick something in the built-in namespace of the sub-interpreter to use for communicating.

> * An interpreter cannot use operating system resources without being
>   explicitly given those resources.

Okay.

> * A bare Python interpreter is always trusted.

What does "trusted" mean in the above?

It means that if Python source code can execute within a bare interpreter it is considered safe code.  This is covered in the new version of the doc.

> * Python bytecode is always distrusted.
> * Pure Python source code is always safe on its own.

It would be helpful to clarify "safe" here.  I assume by "safe" you
mean that the Python source code can express whatever it wants,
including potentially dangerous activities, but when run in a bare
or sandboxed interpreter it cannot have harmful effects.  But then
in what sense does the "safety" have to do with the Python source code
rather than the restrictions on the interpreter?

Would it be correct to say:
  + We want to guarantee that Python source code cannot violate
    the restrictions in a restricted or bare interpreter.
  + We do not prevent arbitrary Python bytecode from violating
    these restrictions, and assume that it can.

>     + Malicious abilities are derived from C extension modules,
>       built-in modules, and unsafe types implemented in C, not from
>       pure Python source.

By "malicious" do you just mean "anything that isn't accessible to
a bare interpreter"?

Anything that could harm the system or interpreter.

> * A sub-interpreter started by another interpreter does not inherit
>   any state.

Do you envision a tree of interpreters and sub-interpreters?  Can the
levels of spawning get arbitrarily deep?

Yes and yes.

If i am visualizing your model correctly, maybe it would be useful to
introduce the term "parent", where each interpreter has as its parent
either the Python process or another interpreter.  Then you could say
that each interpreter acquires authority only by explicit granting from
its parent.

You could, although there is no hierarchy at the implementation level.  But it works in terms of who has a reference to whom and who gives each interpreter their authority.

Then i have another question: can an interpreter acquire
authorities only when it is started, or can it acquire them while it is
running, and how?

Well, whatever you want to do through the built-in namespace.  So if you pass in a mutable object like a dict and add stuff to it on the fly, I don't see why you couldn't give new authorities on the fly.
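
A toy model of granting authority on the fly through a shared mutable
dict.  Everything here is hypothetical: the "child" is an ordinary
function standing in for code in a sub-interpreter, and the dict stands
in for an object placed in its built-in namespace.

```python
def make_child(namespace):
    """Hypothetical stand-in for code running in a sub-interpreter."""
    def child_task(message):
        log = namespace.get("log")  # looked up on every call
        if log is None:
            return "no authority"
        log(message)
        return "logged"
    return child_task


shared = {}   # mutable object the parent handed over at start-up
records = []
child = make_child(shared)

before = child("first")          # nothing granted yet
shared["log"] = records.append   # parent grants a new authority later
after = child("second")
del shared["log"]                # and can withdraw it again
revoked = child("third")
```

The same channel that lets the parent add an authority also lets it take
one away, since the child only ever sees what the dict currently holds.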

> Implementation
> ///////////////////////////////////////
>
> Guiding Principles
> ========================
>
> To begin, the Python process garners all power as the powerbox.  It is
> up to the process to initially hand out access to resources and
> abilities to interpreters.  This might take the form of an interpreter
> with all abilities granted (i.e., a standard interpreter as launched
> when you execute Python), which then creates sub-interpreters with
> sandboxed abilities.  Another alternative is only creating
> interpreters with sandboxed abilities (i.e., Python being embedded in
> an application that only uses sandboxed interpreters).

This sounds like part of your design to me.  It might help to have
this earlier in the document (maybe even with an example diagram of a
tree of interpreters).

Made Guiding Principles its own section and split off the bottom part of the section and put it under Implementation.

> All security measures should never have to ask who an interpreter is.
> This means that what abilities an interpreter has should not be stored
> at the interpreter level when the security can use a proxy to protect
> a resource.  This means that while supporting a memory cap can
> have a per-interpreter setting that is checked (because access to the
> operating system's memory allocator is not supported at the program
> level), protecting files and imports should not have such a per-interpreter
> protection at such a low level (because those can have extension
> module proxies to provide the security).

It might be good to declare two categories of resources -- those
protected by object hiding and those protected by a per-interpreter
setting -- and make lists.

That is rather unknown since I am constantly finding stuff that is global to the process rather than to the interpreter, so making the list seems premature.

> Backwards-compatibility will not be a hindrance upon the design or
> implementation of the security model.  Because the security model will
> inherently remove resources and abilities that existing code expects,
> it is not reasonable to expect existing code to work in a sandboxed
> interpreter.

You might qualify the last statement a bit.  For example, a Python
implementation of a pure algorithm (e.g. string processing, data
compression, etc.) would still work in a sandboxed interpreter.

I tossed in "all" to clarify.

> Keeping Python "pythonic" is required for all design decisions.

As Lawrence Oluyede also mentioned, it would be helpful to say a
little more about what "pythonic" means.

Done in the current version.

> Restricting what is in the built-in namespace and the safe-guarding
> the interpreter (which includes safe-guarding the built-in types) is
> where security will come from.

Sounds good.

> Abilities of a Standard Sandboxed Interpreter
> =============================================
>
[...]
> * You cannot open any files directly.
> * Importation
>     + You can import any pure Python module.
>     + You cannot import any Python bytecode module.
>     + You cannot import any C extension module.
>     + You cannot import any built-in module.
> * You cannot find out any information about the operating system you
>   are running on.
> * Only safe built-ins are provided.

This looks reasonable.  This is probably a good place to itemize
exactly which built-ins are considered safe.

> Imports
> -------
>
> A proxy for protecting imports will be provided.  This is done by
> setting the ``__import__()`` function in the built-in namespace of the
> sandboxed interpreter to a proxied version of the function.
>
> The planned proxy will take in a passed-in function to use for the
> import and a whitelist of C extension modules and built-in modules to
> allow importation of.

Presumably these are passed in to the proxy's constructor.

Current plan is to expose the built-in namespace, imported modules, and sys module dict when creating an Interpreter instance.

> If an import would lead to loading an extension
> or built-in module, it is checked against the whitelist and allowed
> to be imported based on that list.  All .pyc and .pyo file will not
> be imported.  All .py files will be imported.

I'm unclear about this.  Is the whitelist a list of module names only,
or of filenames with extensions?

Have not decided, but probably module name.

Does the normal path-searching process
take place or can it be restricted in some way?

Have not decided.

Would it simplify the
security analysis to have the whitelist be a dictionary that maps module
names to absolute pathnames?

Don't know.  Protecting imports is the last thing I am going to implement since it is the trickiest.

If both the .py and .pyc are present, the normal import would find the
.pyc file; would the import proxy reject such an import or ignore it
and recompile the .py instead?

Something along those lines.
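
As a rough illustration of the whitelist idea discussed above, here is a minimal sketch written against the modern `importlib` API, which postdates this thread. The names `make_import_proxy` and `guarded_import` are hypothetical, and the whitelist is assumed to hold module names only. Any module whose loader is not a plain source-file loader (extension, built-in, frozen, or sourceless bytecode) is refused unless it is whitelisted. The sketch does not stop the standard source loader from reusing cached bytecode, and it is not a security boundary on its own.

```python
import importlib.machinery
import importlib.util


def make_import_proxy(real_import, whitelist):
    """Return an __import__ replacement that filters non-source modules.

    Hypothetical helper: `whitelist` is assumed to be an iterable of
    module names that may be loaded even though they are not .py files.
    """
    allowed = frozenset(whitelist)

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        if level != 0:
            raise ImportError("relative imports are not handled in this sketch")
        # Check every package prefix so a parent package cannot slip through.
        parts = name.split(".")
        for i in range(1, len(parts) + 1):
            prefix = ".".join(parts[:i])
            try:
                spec = importlib.util.find_spec(prefix)
            except ValueError as exc:
                raise ImportError("cannot inspect %r" % prefix) from exc
            if spec is None:
                raise ImportError("no module named %r" % prefix)
            is_source = isinstance(
                spec.loader, importlib.machinery.SourceFileLoader
            )
            if not is_source and prefix not in allowed:
                raise ImportError("%r is not whitelisted" % prefix)
        return real_import(name, globals, locals, fromlist, level)

    return guarded_import
```

With a whitelist of `{"math"}`, `guarded_import("math")` succeeds while `guarded_import("sys")` raises `ImportError`, because `sys` is a built-in module that was not whitelisted.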

> It must be warned that importing any C extension module is dangerous.

Right.

> Implementing Import in Python
> +++++++++++++++++++++++++++++
>
> To help facilitate in the exposure of more of what importation
> requires (and thus make implementing a proxy easier), the import
> machinery should be rewritten in Python.

This seems like a good idea.  Can you identify which minimum essential
pieces of the import machinery have to be written in C?

Loading of C extensions, stating files, reading files, etc.  Pretty much anything that requires help from the OS.

> Sanitizing Built-In Types
> -------------------------
[...]
> Constructors
> ++++++++++++
>
> Almost all of Python's built-in types
> contain a constructor that allows code to create a new instance of a
> type as long as you have the type itself.  Unfortunately this does not
> work in an object-capabilities system without either providing a proxy
> to the constructor or just turning it off.

The existence of the constructor isn't (by itself) the problem.
The problem is that both of the following are true:

    (a) From any object you can get its type object.
    (b) Using any type object you can construct a new instance.

So, you can control this either by hiding the type object, separating
the constructor from the type, or disabling the constructor.

I separated the constructor or initializer (tp_new or tp_init) into a factory function.
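
The change described here happens at the C level (`tp_new` / `tp_init`), so it cannot be reproduced faithfully in pure Python. Purely as an analogy, the sketch below shows the shape of the idea: holding an instance no longer lets you build another one through `type(obj)(...)`, and only the separate factory function can construct. The names `make_guarded_type`, `Guarded`, and `factory` are hypothetical, and in pure Python the key can still be recovered through the factory's closure, so this illustrates the design rather than enforcing it.

```python
def make_guarded_type():
    """Return a factory for a type whose constructor refuses direct calls."""
    key = object()  # private token known only to the factory below

    class Guarded:
        __slots__ = ("value",)

        def __new__(cls, value, _key=None):
            # Reject construction unless the caller presents the token.
            if _key is not key:
                raise TypeError("instances must come from the factory")
            self = object.__new__(cls)
            self.value = value
            return self

    def factory(value):
        # The factory is the only holder of the token, so it is the
        # capability that must be granted to create new instances.
        return Guarded(value, _key=key)

    return factory
```

Given `make = make_guarded_type()` and `obj = make(5)`, calling `type(obj)(5)` raises `TypeError`, which corresponds to breaking condition (b) above while leaving (a) intact.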

> Types whose constructors are considered dangerous are:
>
> * ``file``
>     + Will definitely use the ``open()`` built-in.
> * code objects
> * XXX sockets?
> * XXX type?
> * XXX

Looks good so far.  Not sure i see what's dangerous about 'type'.

That's why it has the question mark.  =)

> Filesystem Information
> ++++++++++++++++++++++
>
> When running code in a sandboxed interpreter, POLA suggests that you
> do not want to expose information about your environment on top of
> protecting its use.  This means that filesystem paths typically should
> not be exposed.  Unfortunately, Python exposes file paths all over the
> place:
>
> * Modules
>     + ``__file__`` attribute
> * Code objects
>     + ``co_filename`` attribute
> * Packages
>     + ``__path__`` attribute
> * XXX
>
> XXX how to expose safely?

It seems that in most cases, a single Python object is associated with
a single pathname.  If that's true in general, one solution would be
to provide an introspection function named 'getpath' or something
similar that would get the path associated with any object.  This
function might go in a module containing all the introspection functions,
so imports of that module could be easily restricted.

That is the current thinking.
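
A minimal sketch of what such a 'getpath' function might look like, assuming it simply dispatches on the object kinds listed in the quoted section. The function body is an illustration only; in the proposed design the attributes themselves would be hidden and this function would live in a module whose import can be restricted.

```python
import types


def getpath(obj):
    """Return the filesystem path associated with obj, or None.

    Hypothetical helper covering modules, packages, functions, and code
    objects; other object kinds are treated as having no path.
    """
    if isinstance(obj, types.CodeType):
        return obj.co_filename
    if isinstance(obj, types.FunctionType):
        return obj.__code__.co_filename
    if isinstance(obj, types.ModuleType):
        # Packages also carry __path__, but __file__ is the single
        # pathname most callers would expect here.
        return getattr(obj, "__file__", None)
    return None
```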

> Mutable Shared State
> ++++++++++++++++++++
>
> Because built-in types are shared between interpreters, they cannot
> expose any mutable shared state.  Unfortunately, as it stands, some
> do.  Below is a list of types that share some form of dangerous state,
> how they share it, and how to fix the problem:
>
> * ``object``
>     + ``__subclasses__()`` function
>         - Remove the function; never seen used in real-world code.
> * XXX

Okay, more to work out here.

Possibly.  I might have to wait until I am much closer to being done to discover more places where mutable shared state is exposed in a bare interpreter because I have not been able to think of any more.
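
To make the `__subclasses__()` concern concrete, here is a small sketch (the helper name `find_subclass` is hypothetical) showing how code that holds nothing but `object` can walk the class tree and reach a class it was never handed. It runs within a single interpreter, so it only illustrates the kind of reachability the quoted section worries about.

```python
def find_subclass(name, root=object):
    """Search the class tree below root for a class with the given name."""
    seen = set()
    stack = [root]
    while stack:
        cls = stack.pop()
        if id(cls) in seen:
            continue
        seen.add(id(cls))
        if cls.__name__ == name:
            return cls
        # type.__subclasses__(cls) also works when cls is `type` itself.
        stack.extend(type.__subclasses__(cls))
    return None
```

Any class defined anywhere, for example `class HiddenHelper: pass`, is then reachable through `find_subclass("HiddenHelper")` without a reference to it ever being passed in.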

> Perimeter Defences Between a Created Interpreter and Its Creator
> ----------------------------------------------------------------
>
> The plan is to allow interpreters to instantiate sandboxed
> interpreters safely.  By using the creating interpreter's abilities to
> provide abilities to the created interpreter, you make sure there is
> no escalation in abilities.

Good.
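
One way to picture the "no escalation" rule is that the child's namespace can only be assembled from objects the parent already holds. The sketch below assumes abilities are represented as a plain dict of built-ins; `derive_builtins` is a hypothetical helper, not part of the proposal.

```python
def derive_builtins(parent_builtins, requested):
    """Build a child namespace from a subset of the parent's built-ins.

    Raises KeyError if the parent tries to grant something it does not
    itself hold, so the child can never end up with more than the parent.
    """
    missing = [name for name in requested if name not in parent_builtins]
    if missing:
        raise KeyError("parent cannot grant: %s" % ", ".join(missing))
    return {name: parent_builtins[name] for name in requested}
```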

> * ``__del__`` created in sandboxed interpreter but object is cleaned
>   up in unprotected interpreter.

How do you envision the launching of a sandboxed interpreter to look?
Could you sketch out some rough code examples?

>>> interp = interpreter.Interpreter()
>>> interp.builtins['open'] = wrapped_open()
>>> interp.sys_dict['path'] = []
>>> interp.exec("2 + 3")

Were you thinking of
something like:

    sys.spawn(code, dict)
        code: a string containing Python source code
        dict: the global namespace in which to run the code

If you allow the parent interpreter to pass mutable objects into the
child interpreter, then the parent and child can already communicate
via the object, so '__del__' is a moot issue.  Do you want to prevent
all communication between parent and child?  It's not obvious to me
why that would be necessary.

No, I don't, since there should be a secure way to allow that.  The __del__ worry came up from Guido pointing out you might be able to screw with it.  But if you pass in something implemented in C you should be okay.
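
The `wrapped_open()` call in the launch sketch above is not defined anywhere in the thread. As one possible reading, the following hypothetical `make_wrapped_open` returns a read-only `open` confined to a single directory. It is written in pure Python for illustration; as noted above, an object implemented in C would be the safer thing to pass in, since a Python closure exposes the real `open` to introspection.

```python
import builtins
import os


def make_wrapped_open(root):
    """Return an open()-like function limited to reading files under root."""
    root = os.path.realpath(root)

    def wrapped_open(path, mode="r"):
        # Refuse any mode that could create or modify a file.
        if any(flag in mode for flag in "wax+"):
            raise PermissionError("only read access is granted")
        # Resolve symlinks and '..' before checking containment.
        full = os.path.realpath(os.path.join(root, path))
        if os.path.commonpath([root, full]) != root:
            raise PermissionError("path escapes the granted directory")
        return builtins.open(full, mode)

    return wrapped_open
```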

> * Using frames to walk the frame stack back to another interpreter.

Could you just disable introspection of the frame stack?

If you don't allow importing of 'sys' then yes, and that is planned.  I just wanted to make sure I didn't forget this needs to be protected.

I do need to check what a generator's frame exposes, though.
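
For context, this is the kind of access the frame stack gives within a single interpreter. The sketch uses `sys._getframe`, a CPython-specific function, to read a caller's local variable, and shows that a generator object reaches a frame (and through it the defining module's globals) without importing `sys` at all. The function names are hypothetical, and whether frames can cross interpreter boundaries is exactly the open question here, which this sketch does not answer.

```python
import sys


def read_caller_local(name):
    """Return a local variable from the calling function's frame."""
    caller = sys._getframe(1)  # CPython-specific frame access
    return caller.f_locals.get(name)


def holder():
    secret = "not passed to anyone"
    # The callee receives only a name, yet can read the value.
    return read_caller_local("secret")


def generator_frame_globals(gen):
    """Return the globals dict reachable from a generator's frame."""
    return gen.gi_frame.f_globals
```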

> Making the ``sys`` Module Safe
> ------------------------------
[...]
> This means that the ``sys`` module needs to have its safe information
> separated out from the unsafe settings.

Yes.

> XXX separate modules, ``sys.settings`` and ``sys.info``, or strip
> ``sys`` to settings and put info somewhere else?  Or provide a method
> that will create a faked sys module that has the safe values copied
> into it?

I think the last suggestion above would lead to confusion.  The two
groups should have two distinct names and it should be clear which
attribute goes with which group.

This is also further complicated by the fact that some things are for the entire process while others are per interpreter.  Might have to separate things out even more.
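
A rough sketch of the two-group split, assuming the groups are exposed as two separate namespaces. Which attributes belong in which group is undecided in the thread; the `INFO_NAMES` and `SETTINGS_NAMES` tuples below are placeholders chosen only for illustration, and `split_sys` is a hypothetical helper.

```python
import sys
import types

# Assumed groupings, for illustration only.
INFO_NAMES = ("maxsize", "byteorder", "version_info")  # read-only facts
SETTINGS_NAMES = ("path", "stdin", "stdout")           # mutable settings


def split_sys():
    """Return (info, settings) namespaces built from the real sys module."""
    info = types.SimpleNamespace(
        **{name: getattr(sys, name) for name in INFO_NAMES}
    )
    settings = types.SimpleNamespace(
        **{name: getattr(sys, name) for name in SETTINGS_NAMES}
    )
    return info, settings
```

With distinct names for the two groups, a sandbox could be handed `info` without ever receiving `settings`.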

> Protecting I/O
> ++++++++++++++
>
> The ``print`` keyword and the built-ins ``raw_input()`` and
> ``input()`` use the values stored in ``sys.stdout`` and ``sys.stdin``.
> By exposing these attributes to the creating interpreter, one can set
> them to safe objects, such as instances of ``StringIO``.

Sounds good.
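
A minimal sketch of the redirection idea, written for modern Python where `print` is a function rather than a keyword. The helper name `run_with_captured_output` is hypothetical, and the restricted `__builtins__` dict is only there to keep the example small; `exec` with a trimmed namespace is not a sandbox.

```python
import io
import sys


def run_with_captured_output(source):
    """Run source with sys.stdout replaced by a StringIO; return its text."""
    buffer = io.StringIO()
    original = sys.stdout
    sys.stdout = buffer
    try:
        # print() looks up sys.stdout at call time, so it writes to buffer.
        exec(source, {"__builtins__": {"print": print}})
    finally:
        sys.stdout = original
    return buffer.getvalue()
```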

> Safe Networking
> ---------------
>
> XXX proxy on socket module, modify open() to be the constructor, etc.

Lots more to think about here.

Oh yeah.  =)

> Protecting Memory Usage
> -----------------------
>
> To protect memory, low-level hooks into the memory allocator for
> Python is needed.  By hooking into the C API for memory allocation and
> deallocation a very rough running count of used memory can kept.  This
> can be used to prevent sandboxed interpreters from using so much
> memory that it impacts the overall performance of the system.

Preventing denial-of-service is in general quite difficult, but i
applaud the attempt.  I agree with your decision to separate this

The memory tracking has a proof-of-concept done in the bcannon-sandboxing branch.  Not perfect, but it does show how one could go about accounting for every byte of data in terms of what it is basically used for.
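
The proof-of-concept mentioned here hooks the C allocator and is not shown in the thread. As a loose stand-in, the sketch below uses the standard `tracemalloc` module (Python 3.9+ for `reset_peak`), which likewise keeps a running count of allocated bytes. Unlike the proposed hooks it only observes: the cap is checked after the call finishes rather than enforced at allocation time. `run_with_memory_cap` and `MemoryCapExceeded` are hypothetical names.

```python
import tracemalloc


class MemoryCapExceeded(Exception):
    """Raised when a call's peak traced memory exceeds its cap."""


def run_with_memory_cap(func, cap_bytes):
    """Call func() and raise if its peak traced allocation exceeds cap_bytes."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    tracemalloc.reset_peak()
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        if started_here:
            tracemalloc.stop()
    used = peak - baseline
    if used > cap_bytes:
        raise MemoryCapExceeded("peak use of %d bytes exceeds cap" % used)
    return result
```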

-Brett