chiark / gitweb /
namespace: beef up read-only bind mount logic
[elogind.git] / man / systemd.exec.xml
index 8b7645c4d6993b66fc3dafc7ae8adfeefae1a11d..c419424d9d6d0603587dc71b8b8e31baabe8bbeb 100644 (file)
                                 <listitem><para>Controls the CPU
                                 affinity of the executed
                                 processes. Takes a space-separated
-                                list of CPU indexes. This option may
+                                list of CPU indices. This option may
                                 be specified more than once in which
                                 case the specificed CPU affinity masks
                                 are merged. If the empty string is
 
                                 <para>The files listed with this
                                 directive will be read shortly before
-                                the process is executed. Settings from
-                                these files override settings made
-                                with
+                                the process is executed (more
+                                specifically, after all
+                                processes from a previous unit state
+                                terminated. This means you can
+                                generate these files in one unit
+                                state, and read it with this option in
+                                the next). Settings from these files
+                                override settings made with
                                 <varname>Environment=</varname>. If
                                 the same variable is set twice from
                                 these files, the files will be read in
                         <varlistentry>
                                 <term><varname>StandardError=</varname></term>
                                 <listitem><para>Controls where file
-                                descriptor 2 (STDERR) of the executed
-                                processes is connected to. The
-                                available options are identical to
+                                descriptor 2 (STDERR) of the
+                                executed processes is connected to.
+                                The available options are identical to
                                 those of
                                 <varname>StandardOutput=</varname>,
                                 with one exception: if set to
                         <varlistentry>
                                 <term><varname>TTYPath=</varname></term>
                                 <listitem><para>Sets the terminal
-                                device node to use if STDIN, STDOUT,
-                                or STDERR are connected to a
+                                device node to use if standard input, output,
+                                or error are connected to a
                                 TTY (see above). Defaults to
                                 <filename>/dev/console</filename>.</para></listitem>
                         </varlistentry>
                                 for details.</para></listitem>
                         </varlistentry>
 
-                        <varlistentry>
-                                <term><varname>TCPWrapName=</varname></term>
-                                <listitem><para>If this is a
-                                socket-activated service, this sets the
-                                tcpwrap service name to check the
-                                permission for the current connection
-                                with. This is only useful in
-                                conjunction with socket-activated
-                                services, and stream sockets (TCP) in
-                                particular. It has no effect on other
-                                socket types (e.g. datagram/UDP) and
-                                on processes unrelated to socket-based
-                                activation. If the tcpwrap
-                                verification fails, daemon start-up
-                                will fail and the connection is
-                                terminated. See
-                                <citerefentry><refentrytitle>tcpd</refentrytitle><manvolnum>8</manvolnum></citerefentry>
-                                for details. Note that this option may
-                                be used to do access control checks
-                                only. Shell commands and commands
-                                described in
-                                <citerefentry><refentrytitle>hosts_options</refentrytitle><manvolnum>5</manvolnum></citerefentry>
-                                are not supported.</para></listitem>
-                        </varlistentry>
-
                         <varlistentry>
                                 <term><varname>CapabilityBoundingSet=</varname></term>
 
                                 capability sets as documented in
                                 <citerefentry><refentrytitle>cap_from_text</refentrytitle><manvolnum>3</manvolnum></citerefentry>.
                                 Note that these capability sets are
-                                usually influenced by the capabilities
+                                usually influenced (and filtered) by the capabilities
                                 attached to the executed file. Due to
                                 that
                                 <varname>CapabilityBoundingSet=</varname>
                                 <term><varname>ReadOnlyDirectories=</varname></term>
                                 <term><varname>InaccessibleDirectories=</varname></term>
 
-                                <listitem><para>Sets up a new
-                                file system namespace for executed
+                                <listitem><para>Sets up a new file
+                                system namespace for executed
                                 processes. These options may be used
                                 to limit access a process might have
                                 to the main file system
                                 processes inside the namespace. Note
                                 that restricting access with these
                                 options does not extend to submounts
-                                of a directory. You must list
-                                submounts separately in these settings
-                                to ensure the same limited
-                                access. These options may be specified
+                                of a directory that are created later
+                                on. These options may be specified
                                 more than once in which case all
                                 directories listed will have limited
                                 access from within the namespace. If
                                 the empty string is assigned to this
-                                option, the specific list is reset, and
-                                all prior assignments have no
+                                option, the specific list is reset,
+                                and all prior assignments have no
                                 effect.</para>
                                 <para>Paths in
                                 <varname>ReadOnlyDirectories=</varname>
                                 may be prefixed with
                                 <literal>-</literal>, in which case
                                 they will be ignored when they do not
-                                exist.</para></listitem>
+                                exist. Note that using this
+                                setting will disconnect propagation of
+                                mounts from the service to the host
+                                (propagation in the opposite direction
+                                continues to work). This means that
+                                this setting may not be used for
+                                services which shall be able to
+                                install mount points in the main mount
+                                namespace.</para></listitem>
                         </varlistentry>
 
                         <varlistentry>
                                 processes via
                                 <filename>/tmp</filename> or
                                 <filename>/var/tmp</filename>
-                                impossible. All temporary data created
-                                by service will be removed after
-                                the service is stopped. Defaults to
-                                false. Note that it is possible to run
-                                two or more units within the same
-                                private <filename>/tmp</filename> and
+                                impossible. If this is enabled, all
+                                temporary files created by a service
+                                in these directories will be removed
+                                after the service is stopped. Defaults
+                                to false. It is possible to run two or
+                                more units within the same private
+                                <filename>/tmp</filename> and
                                 <filename>/var/tmp</filename>
                                 namespace by using the
                                 <varname>JoinsNamespaceOf=</varname>
                                 directive, see
                                 <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>
-                                for details.</para></listitem>
+                                for details. Note that using this
+                                setting will disconnect propagation of
+                                mounts from the service to the host
+                                (propagation in the opposite direction
+                                continues to work). This means that
+                                this setting may not be used for
+                                services which shall be able to install
+                                mount points in the main mount
+                                namespace.</para></listitem>
+                        </varlistentry>
+
+                        <varlistentry>
+                                <term><varname>PrivateDevices=</varname></term>
+
+                                <listitem><para>Takes a boolean
+                                argument. If true, sets up a new /dev
+                                namespace for the executed processes
+                                and only adds API pseudo devices such
+                                as <filename>/dev/null</filename>,
+                                <filename>/dev/zero</filename> or
+                                <filename>/dev/random</filename> (as
+                                well as the pseudo TTY subsystem) to
+                                it, but no physical devices such as
+                                <filename>/dev/sda</filename>. This is
+                                useful to securely turn off physical
+                                device access by the executed
+                                process. Defaults to false. Enabling
+                                this option will also remove
+                                <constant>CAP_MKNOD</constant> from
+                                the capability bounding set for the
+                                unit (see above), and set
+                                <varname>DevicePolicy=closed</varname>
+                                (see
+                                <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
+                                for details). Note that using this
+                                setting will disconnect propagation of
+                                mounts from the service to the host
+                                (propagation in the opposite direction
+                                continues to work). This means that
+                                this setting may not be used for
+                                services which shall be able to
+                                install mount points in the main mount
+                                namespace.</para></listitem>
                         </varlistentry>
 
                         <varlistentry>
                                 available to the executed process.
                                 This is useful to securely turn off
                                 network access by the executed
-                                process. Defaults to false. Note that
-                                it is possible to run two or more
-                                units within the same private network
+                                process. Defaults to false. It is
+                                possible to run two or more units
+                                within the same private network
                                 namespace by using the
                                 <varname>JoinsNamespaceOf=</varname>
                                 directive, see
                                 <citerefentry><refentrytitle>systemd.unit</refentrytitle><manvolnum>5</manvolnum></citerefentry>
-                                for details.</para></listitem>
+                                for details. Note that this option
+                                will disconnect all socket families
+                                from the host, this includes
+                                AF_NETLINK and AF_UNIX. The latter has
+                                the effect that AF_UNIX sockets in the
+                                abstract socket namespace will become
+                                unavailable to the processes (however,
+                                those located in the file system will
+                                continue to be
+                                accessible).</para></listitem>
                         </varlistentry>
 
                         <varlistentry>
-                                <term><varname>PrivateDevices=</varname></term>
+                                <term><varname>ProtectSystem=</varname></term>
 
                                 <listitem><para>Takes a boolean
-                                argument. If true, sets up a new /dev
-                                namespace for the executed processes
-                                and only adds API pseudo devices such
-                                as <filename>/dev/null</filename>,
-                                <filename>/dev/zero</filename> or
-                                <filename>/dev/random</filename> to
-                                it, but no physical devices such as
-                                <filename>/dev/sda</filename>. This is
-                                useful to securely turn off physical
-                                device access by the executed
-                                process. Defaults to
-                                false.</para></listitem>
+                                argument or
+                                <literal>full</literal>. If true,
+                                mounts the <filename>/usr</filename>
+                                directory read-only for processes
+                                invoked by this unit. If set to
+                                <literal>full</literal> the
+                                <filename>/etc</filename> is mounted
+                                read-only, too. This setting ensures
+                                that any modification of the vendor
+                                supplied operating system (and
+                                optionally its configuration) is
+                                prohibited for the service. It is
+                                recommended to enable this setting for
+                                all long-running services, unless they
+                                are involved with system updates or
+                                need to modify the operating system in
+                                other ways. Note however, that
+                                processes retaining the CAP_SYS_ADMIN
+                                capability can undo the effect of this
+                                setting. This setting is hence
+                                particularly useful for daemons which
+                                have this capability removed, for
+                                example with
+                                <varname>CapabilityBoundingSet=</varname>. Defaults
+                                to off.</para></listitem>
+                        </varlistentry>
+
+                        <varlistentry>
+                                <term><varname>ProtectHome=</varname></term>
+
+                                <listitem><para>Takes a boolean
+                                argument or
+                                <literal>read-only</literal>. If true,
+                                the directories
+                                <filename>/home</filename> and
+                                <filename>/run/user</filename> are
+                                made inaccessible and empty for
+                                processes invoked by this unit. If set
+                                to <literal>read-only</literal> the
+                                two directores are made read-only
+                                instead. It is recommended to enable
+                                this setting for all long-running
+                                services (in particular network-facing
+                                ones), to ensure they cannot get access
+                                to private user data, unless the
+                                services actually require access to
+                                the user's private data. Note however,
+                                that processes retaining the
+                                CAP_SYS_ADMIN capability can undo the
+                                effect of this setting. This setting
+                                is hence particularly useful for
+                                daemons which have this capability
+                                removed, for example with
+                                <varname>CapabilityBoundingSet=</varname>. Defaults
+                                to off.</para></listitem>
                         </varlistentry>
 
                         <varlistentry>
                                 <option>shared</option>,
                                 <option>slave</option> or
                                 <option>private</option>, which
-                                control whether the file system
-                                namespace set up for this unit's
-                                processes will receive or propagate
-                                new mounts. See
+                                control whether mounts in the file
+                                system namespace set up for this
+                                unit's processes will receive or
+                                propagate mounts or unmounts. See
                                 <citerefentry><refentrytitle>mount</refentrytitle><manvolnum>2</manvolnum></citerefentry>
-                                for details. Default to
-                                <option>shared</option>.</para></listitem>
+                                for details. Defaults to
+                                <option>shared</option>. Use
+                                <option>shared</option> to ensure that
+                                mounts and unmounts are propagated
+                                from the host to the container and
+                                vice versa. Use <option>slave</option>
+                                to run processes so that none of their
+                                mounts and unmounts will propagate to
+                                the host. Use <option>private</option>
+                                to also ensure that no mounts and
+                                unmounts from the host will propagate
+                                into the unit processes'
+                                namespace. Note that
+                                <option>slave</option> means that file
+                                systems mounted on the host might stay
+                                mounted continously in the unit's
+                                namespace, and thus keep the device
+                                busy. Note that the file system
+                                namespace related options
+                                (<varname>PrivateTmp=</varname>,
+                                <varname>PrivateDevices=</varname>,
+                                <varname>ReadOnlySystem=</varname>,
+                                <varname>ProtectedHome=</varname>,
+                                <varname>ReadOnlyDirectories=</varname>,
+                                <varname>InaccessibleDirectories=</varname>
+                                and
+                                <varname>ReadWriteDirectories=</varname>)
+                                require that mount and unmount
+                                propagation from the unit's file
+                                system namespace is disabled, and
+                                hence downgrade
+                                <option>shared</option> to
+                                <option>slave</option>.
+                                </para></listitem>
                         </varlistentry>
 
                         <varlistentry>
                                 for details.</para></listitem>
                         </varlistentry>
 
+                        <varlistentry>
+                                <term><varname>AppArmorProfile=</varname></term>
+
+                                <listitem><para>Takes a profile name as argument.
+                                The process executed by the unit will switch to
+                                this profile when started. Profiles must already
+                                be loaded in the kernel, or the unit will fail.
+                                This result in a non operation if AppArmor is not
+                                enabled. If prefixed by <literal>-</literal>, all errors
+                                will be ignored.
+                                </para></listitem>
+                        </varlistentry>
+
                         <varlistentry>
                                 <term><varname>IgnoreSIGPIPE=</varname></term>
 
                         <varlistentry>
                                 <term><varname>SystemCallFilter=</varname></term>
 
-                                <listitem><para>Takes a space-separated
-                                list of system call
+                                <listitem><para>Takes a
+                                space-separated list of system call
                                 names. If this setting is used, all
                                 system calls executed by the unit
                                 processes except for the listed ones
                                 the effect is inverted: only the
                                 listed system calls will result in
                                 immediate process termination
-                                (blacklisting). If this option is used,
+                                (blacklisting). If running in user
+                                mode and this option is used,
                                 <varname>NoNewPrivileges=yes</varname>
-                                is implied. This feature makes use of
-                                the Secure Computing Mode 2 interfaces
-                                of the kernel ('seccomp filtering')
-                                and is useful for enforcing a minimal
+                                is implied. This feature makes use of the
+                                Secure Computing Mode 2 interfaces of
+                                the kernel ('seccomp filtering') and
+                                is useful for enforcing a minimal
                                 sandboxing environment. Note that the
                                 <function>execve</function>,
                                 <function>rt_sigreturn</function>,
 
                                 <para>If you specify both types of
                                 this option (i.e. whitelisting and
-                                blacklisting) the first encountered
+                                blacklisting), the first encountered
                                 will take precedence and will dictate
                                 the default action (termination or
                                 approval of a system call). Then the
                                 add or delete the listed system calls
                                 from the set of the filtered system
                                 calls, depending of its type and the
-                                default action (e.g. You have started
+                                default action. (For example, if you have started
                                 with a whitelisting of
                                 <function>read</function> and
-                                <function>write</function> and right
+                                <function>write</function>, and right
                                 after it add a blacklisting of
                                 <function>write</function>, then
                                 <function>write</function> will be
-                                removed from the set).
+                                removed from the set.)
                                 </para></listitem>
-
-                                <para>Note that setting
-                                <varname>SystemCallFilter=</varname>
-                                implies a
-                                <varname>SystemCallArchitectures=</varname>
-                                setting of <literal>native</literal>
-                                (see below), unless that option is
-                                configured otherwise.</para>
                         </varlistentry>
 
                         <varlistentry>
                                 is triggered, instead of terminating
                                 the process immediately. Takes an
                                 error name such as
-                                <literal>EPERM</literal>,
-                                <literal>EACCES</literal> or
-                                <literal>EUCLEAN</literal>. When this
+                                <constant>EPERM</constant>,
+                                <constant>EACCES</constant> or
+                                <constant>EUCLEAN</constant>. When this
                                 setting is not used, or when the empty
-                                string is assigned the process will be
+                                string is assigned, the process will be
                                 terminated immediately when the filter
                                 is triggered.</para></listitem>
                         </varlistentry>
                                 identifiers to include in the system
                                 call filter. The known architecture
                                 identifiers are
-                                <literal>x86</literal>,
-                                <literal>x86-64</literal>,
-                                <literal>x32</literal>,
-                                <literal>arm</literal> as well as the
-                                special identifier
-                                <literal>native</literal>. Only system
-                                calls of the specified architectures
-                                will be permitted to processes of this
-                                unit. This is an effective way to
-                                disable compatibility with non-native
-                                architectures for processes, for
-                                example to prohibit execution of 32bit
-                                x86 binaries on 64bit x86-64
-                                systems. The special
-                                <literal>native</literal> identifier
+                                <constant>x86</constant>,
+                                <constant>x86-64</constant>,
+                                <constant>x32</constant>,
+                                <constant>arm</constant> as well as
+                                the special identifier
+                                <constant>native</constant>. Only
+                                system calls of the specified
+                                architectures will be permitted to
+                                processes of this unit. This is an
+                                effective way to disable compatibility
+                                with non-native architectures for
+                                processes, for example to prohibit
+                                execution of 32-bit x86 binaries on
+                                64-bit x86-64 systems. The special
+                                <constant>native</constant> identifier
                                 implicitly maps to the native
                                 architecture of the system (or more
                                 strictly: to the architecture the
-                                system manager is compiled for). Note
-                                that setting this option to a
-                                non-empty list implies that
-                                <literal>native</literal> is included
-                                too. By default this option is set to
-                                the empty list, i.e. no architecture
-                                system call filtering is applied. Note
-                                that configuring a system call filter
-                                with
-                                <varname>SystemCallFilter=</varname>
-                                (above) implies a
-                                <literal>native</literal> architecture
-                                list, unless configured
-                                otherwise.</para></listitem>
+                                system manager is compiled for). If
+                                running in user mode and this option
+                                is used,
+                                <varname>NoNewPrivileges=yes</varname>
+                                is implied. Note that setting this
+                                option to a non-empty list implies
+                                that <constant>native</constant> is
+                                included too. By default, this option
+                                is set to the empty list, i.e. no
+                                architecture system call filtering is
+                                applied.</para></listitem>
+                        </varlistentry>
+
+                        <varlistentry>
+                                <term><varname>RestrictAddressFamilies=</varname></term>
+
+                                <listitem><para>Restricts the set of
+                                socket address families accessible to
+                                the processes of this unit. Takes a
+                                space-separated list of address family
+                                names to whitelist, such as
+                                <constant>AF_UNIX</constant>,
+                                <constant>AF_INET</constant> or
+                                <constant>AF_INET6</constant>. When
+                                prefixed with <constant>~</constant>
+                                the listed address families will be
+                                applied as blacklist, otherwise as
+                                whitelist. Note that this restricts
+                                access to the
+                                <citerefentry><refentrytitle>socket</refentrytitle><manvolnum>2</manvolnum></citerefentry>
+                                system call only. Sockets passed into
+                                the process by other means (for
+                                example, by using socket activation
+                                with socket units, see
+                                <citerefentry><refentrytitle>systemd.socket</refentrytitle><manvolnum>5</manvolnum></citerefentry>)
+                                are unaffected. Also, sockets created
+                                with <function>socketpair()</function>
+                                (which creates connected AF_UNIX
+                                sockets only) are unaffected. Note
+                                that this option has no effect on
+                                32-bit x86 and is ignored (but works
+                                correctly on x86-64). If running in user
+                                mode and this option is used,
+                                <varname>NoNewPrivileges=yes</varname>
+                                is implied. By default, no
+                                restriction applies, all address
+                                families are accessible to
+                                processes. If assigned the empty
+                                string, any previous list changes are
+                                undone.</para>
+
+                                <para>Use this option to limit
+                                exposure of processes to remote
+                                systems, in particular via exotic
+                                network protocols. Note that in most
+                                cases, the local
+                                <constant>AF_UNIX</constant> address
+                                family should be included in the
+                                configured whitelist as it is
+                                frequently used for local
+                                communication, including for
+                                <citerefentry><refentrytitle>syslog</refentrytitle><manvolnum>2</manvolnum></citerefentry>
+                                logging.</para></listitem>
+                        </varlistentry>
+
+                        <varlistentry>
+                                <term><varname>Personality=</varname></term>
+
+                                <listitem><para>Controls which
+                                kernel architecture
+                                <citerefentry><refentrytitle>uname</refentrytitle><manvolnum>2</manvolnum></citerefentry>
+                                shall report, when invoked by unit
+                                processes. Takes one of
+                                <constant>x86</constant> and
+                                <constant>x86-64</constant>. This is
+                                useful when running 32-bit services on
+                                a 64-bit host system. If not specified,
+                                the personality is left unmodified and
+                                thus reflects the personality of the
+                                host system's
+                                kernel.</para></listitem>
+                        </varlistentry>
+
+                        <varlistentry>
+                                <term><varname>RuntimeDirectory=</varname></term>
+                                <term><varname>RuntimeDirectoryMode=</varname></term>
+
+                                <listitem><para>Takes a list of
+                                directory names. If set, one or more
+                                directories by the specified names
+                                will be created below
+                                <filename>/run</filename> (for system
+                                services) or below
+                                <varname>$XDG_RUNTIME_DIR</varname>
+                                (for user services) when the unit is
+                                started, and removed when the unit is
+                                stopped. The directories will have the
+                                access mode specified in
+                                <varname>RuntimeDirectoryMode=</varname>,
+                                and will be owned by the user and
+                                group specified in
+                                <varname>User=</varname> and
+                                <varname>Group=</varname>. Use this to
+                                manage one or more runtime directories
+                                of the unit and bind their lifetime to
+                                the daemon runtime. The specified
+                                directory names must be relative, and
+                                may not include a
+                                <literal>/</literal>, i.e. must refer
+                                to simple directories to create or
+                                remove. This is particularly useful
+                                for unprivileged daemons that cannot
+                                create runtime directories in
+                                <filename>/run</filename> due to lack
+                                of privileges, and to make sure the
+                                runtime directory is cleaned up
+                                automatically after use. For runtime
+                                directories that require more complex
+                                or different configuration or lifetime
+                                guarantees, please consider using
+                                <citerefentry><refentrytitle>tmpfiles.d</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para></listitem>
                         </varlistentry>
 
                 </variablelist>
                                 tty.</para></listitem>
                         </varlistentry>
 
+                        <varlistentry>
+                                <term><varname>$MAINPID</varname></term>
+
+                                <listitem><para>The PID of the units
+                                main process if it is known. This is
+                                only set for control processes as
+                                invoked by
+                                <varname>ExecReload=</varname> and
+                                similar.  </para></listitem>
+                        </varlistentry>
+
                         <varlistentry>
                                 <term><varname>$MANAGERPID</varname></term>
 
                 <varname>systemd.setenv=</varname> (see
                 <citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>). Additional
                 variables may also be set through PAM,
-                c.f. <citerefentry><refentrytitle>pam_env</refentrytitle><manvolnum>8</manvolnum></citerefentry>.</para>
+                cf. <citerefentry><refentrytitle>pam_env</refentrytitle><manvolnum>8</manvolnum></citerefentry>.</para>
         </refsect1>
 
         <refsect1>
                           <citerefentry><refentrytitle>systemd.kill</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
                           <citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
                           <citerefentry><refentrytitle>systemd.directives</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
+                          <citerefentry><refentrytitle>tmpfiles.d</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
                           <citerefentry><refentrytitle>exec</refentrytitle><manvolnum>3</manvolnum></citerefentry>
                   </para>
         </refsect1>