se-files.txt revision 13883:f44e21d3aaa7
Copyright (c) 2015-Present Advanced Micro Devices, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met: redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer;
redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution;
neither the name of the copyright holders nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Authors: Brandon Potter

===============================================================================

This file exists to educate users and notify them that some filesystem open
system calls may have been redirected by system call emulation mode
(henceforth se-mode).

To provide background, system calls to open files with SYS_OPEN (man 2 open)
inside se-mode will resolve by pass-through to glibc calls (man 3 open) on the
host machine. The host machine will open the file on behalf of the simulator.
Subsequently, se-mode acts as a shim for file access to the opened file. By
utilizing the host machine, se-mode gains quite a bit of utility without
needing to implement an actual filesystem.

A scenario for using normal files might be `/bin/cat $HOME/my_data_file`
as the simulated application (and option). The simulator leverages the host
file system to provide access to my_data_file in this case. Several things
happen inside the simulator:
    1) The cat command will open $HOME/my_data_file by invoking the open
system call (SYS_OPEN). In se-mode, SYS_OPEN is trapped by the simulator and
the syscall_emul.hh:openImpl implementation is provided as a drop-in
replacement for what normally occurs inside a real operating system.
    2) The openImpl code will pass through several path checks and realize
that the file needs to be handled in the 'normal' case where se-mode utilizes
the host filesystem.
    3) The openImpl code will use the glibc open library call on
$HOME/my_data_file after normalizing invocation options.
    4) If the file successfully opens, se-mode will record the file descriptor
returned from the glibc open and provide a translated file descriptor to the
application. (If glibc's file descriptor were passed back to the application
directly, it would be noticeable that the application runtime environment
was wonky. The gem5.{opt,debug,fast} process needs to open files for its own
purposes, so from the simulated application's perspective the file descriptors
would appear out-of-order and arbitrary. They should appear in-order with the
lowest available file descriptor assigned on calls to SYS_OPEN. So, se-mode
adds a level of indirection to resolve this problem.)
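
The file descriptor indirection in step 4 can be sketched in a few lines of
Python. This is only an illustration of the idea, not the gem5 implementation:
the FdTable class and the emulated_open/emulated_close helpers are
hypothetical names, and the table starts empty for simplicity (a real process
would already have stdin, stdout and stderr allocated).

```python
import os


class FdTable:
    """Maps simulated (application-visible) descriptors to host descriptors."""

    def __init__(self):
        # Index = descriptor seen by the simulated application.
        # Value = descriptor returned by the host, or None for a free slot.
        self._host_fds = []

    def allocate(self, host_fd):
        """Record host_fd and return the lowest free simulated descriptor."""
        for sim_fd, entry in enumerate(self._host_fds):
            if entry is None:
                self._host_fds[sim_fd] = host_fd
                return sim_fd
        self._host_fds.append(host_fd)
        return len(self._host_fds) - 1

    def host_fd(self, sim_fd):
        """Translate a simulated descriptor back to the host descriptor."""
        return self._host_fds[sim_fd]

    def release(self, sim_fd):
        """Free a simulated descriptor and return the host descriptor."""
        host_fd = self._host_fds[sim_fd]
        self._host_fds[sim_fd] = None
        return host_fd


def emulated_open(table, path, flags=os.O_RDONLY):
    """Open path on the host, but hand back a translated descriptor."""
    host_fd = os.open(path, flags)  # pass-through to the host
    return table.allocate(host_fd)


def emulated_close(table, sim_fd):
    """Close the host descriptor behind a simulated descriptor."""
    os.close(table.release(sim_fd))
```

Whatever numbers the host hands out, the application only ever sees the
indices of the table, so a descriptor that is closed and reopened comes back
with the same low number it would get on a real system.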

However, there are files which users might not want to open on the host
machine; providing file access and/or file visibility to the simulated
application may not make sense in these cases. Historically, these files
have been handled by os-specific code in se-mode. The os-specific
implementation has been referred to as 'special files'. Examples of
special file implementations include /proc/meminfo and /etc/passwd. (See
src/kern/linux/linux.cc for more details.)

A scenario for using special files might be running `/bin/cat /proc/meminfo`
as the simulated application (and option). Several things will happen inside
the simulator:
    1) The cat command will open the /proc/meminfo file by invoking the open
system call (SYS_OPEN). In se-mode, SYS_OPEN is trapped by the simulator and
the syscall_emul.hh:openImpl implementation is provided as a drop-in
replacement for what normally occurs inside a real operating system.
    2) The openImpl code checks to see if /proc/meminfo matches a special
file. When it notices the match, it invokes code to generate a replacement
file rather than open the file on the host machine. (As it turns out, opening
the host's version of /proc/meminfo will resolve to the gem5 executable, which
is probably not what the application intended.)
    3) The generated file is provided a file descriptor (which itself has
special handling to preserve the illusion that the application is not running
inside a simulator under weird conditions). The file descriptor is passed
back to the application and it can subsequently use the file descriptor to
access the redirected /proc/meminfo file.

Regarding special files, a subtle but important point is that these files
are generated dynamically during simulation (in C++ code). Certain files,
such as /proc/meminfo, depend on the application state inside the simulator to
have valid contents.
With some files, you generally cannot anticipate what
file contents should be before the application actually tries to inspect the
contents. These types of files should all be handled using the special files
method.

As an aside, users might also want to restrict the contents of a file to
prevent non-determinism in the simulation. (This is another case for special
handling of files.) It can be annoying to try to generate statistics for your
new hardware widget (which of course will improve performance by some
non-trivial percentage) when variance in the statistics is caused by
randomness of file contents. A specific example which comes to mind is
reading the contents of /dev/random. Ideally, se-mode should introduce no
non-determinism. However, that is difficult (if not impossible) to achieve in
practice for every application thrown at the simulator.

In addition to special files, there is another method to handle filesystem
redirection. Instead of dynamically generating a file and providing it to
the application, it is possible to pregenerate files on the host filesystem
and redirect open calls to the pregenerated files. This is achieved by
capturing the paths provided by the application's SYS_OPEN and modifying the
path before issuing the pass-through call to the host filesystem glibc open.
The name for this feature is 'faux filesystem' (henceforth faux-fs).

With faux-fs, users can add paths on the command line (via --chroot) or by
modifying their configuration file to use the RedirectPath class. These
paths take the form of original_path-->set_of_modified_paths. For instance,
/proc/cpuinfo might be redirected to /usr/local/gem5_fs/cpuinfo __OR__
/home/me/gem5_folder/cpuinfo __OR__ /nonsensical_name/foo_bar, etc. The
matching pattern and directory/file-structure is controlled by the user. The
pattern match hits on the first available file which actually exists on the
host machine.
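
The first-match rule described above can be sketched as follows. This is a
rough Python illustration under stated assumptions, not the simulator's code:
redirect_path and the layout of the redirects mapping are hypothetical, and
falling back to the original path when no candidate exists is an assumption
made for this sketch (the text above does not say what happens in that case).

```python
import os


def redirect_path(path, redirects):
    """Return the host path that should be opened for `path`.

    `redirects` maps an application-visible path (or directory prefix) to an
    ordered list of candidate host paths. The first candidate that exists on
    the host wins. If nothing matches or no candidate exists, the original
    path is returned unchanged (an assumption of this sketch).
    """
    for app_prefix, host_prefixes in redirects.items():
        is_exact = path == app_prefix
        is_below = path.startswith(app_prefix.rstrip("/") + "/")
        if not (is_exact or is_below):
            continue
        # Part of the path below the matched prefix; empty for an exact match.
        suffix = path[len(app_prefix):].lstrip("/")
        for host_prefix in host_prefixes:
            if suffix:
                candidate = os.path.join(host_prefix, suffix)
            else:
                candidate = host_prefix
            if os.path.exists(candidate):
                return candidate
    return path
```

With a mapping such as {"/proc/cpuinfo": ["/usr/local/gem5_fs/cpuinfo",
"/home/me/gem5_folder/cpuinfo"]}, the second entry is only used when the first
one does not exist on the host.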

As another subtle point, the faux-fs handling is fixed at simulator
configuration time. The path redirection becomes static after configuration
and the Python generated files in simout/fs/.. also exist after configuration.
The faux-fs mechanism is __NOT__ suitable for files such as /proc/meminfo
since those types of files rely on runtime application characteristics.

Currently, faux-fs is set up to create a few files on behalf of the average
user. These files are all stuffed into the simout directory under a 'fs'
folder. By default, the path is $gem5_dir/m5out/fs. These files are all
hardcoded in the configuration since it is unlikely that an application wants
to see the host version of the files. At the time of writing, the list can be
viewed in configs/example/se.py by searching for RedirectPath. Most of
the faux-fs Python generated files depend on simulator configuration (e.g.
number of cores, caches, nodes, etc.). Sophisticated runtimes might query
these files for hardware information in certain applications (e.g.
applications using MPI or ROCm since these runtimes utilize libnuma.so).

Of note, dynamically linked executables will open shared object files in the
same manner as normal files. It is possible and maybe even preferable to
utilize the faux-fs to create a platform independent way of running
applications in se-mode. Users can stuff all the shared libraries into a
folder and commit the folder as part of their repository state. The chroot
option can be made to point to the shared library folder (for each library)
and these libraries will be redirected away from host libraries. This can help
to alleviate environment problems between machines.

If there is any confusion on path redirection, the system call debug traces
can be used to emit information regarding path redirection.
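
To illustrate why the Python generated faux-fs files are tied to configuration
time, here is a minimal sketch of pregenerating one such file from the
configured core count. It is not the code in configs/example/se.py: the
function name generate_faux_cpu_files and the choice of the 'online' CPU list
as the example file are assumptions made for illustration.

```python
import os


def generate_faux_cpu_files(outdir, num_cpus):
    """Write a faux 'online' CPU list under <outdir>/fs and return its path.

    The file mirrors the Linux sysfs format for online CPUs: "0" for a single
    CPU, otherwise a range such as "0-3".
    """
    cpu_dir = os.path.join(outdir, "fs", "sys", "devices", "system", "cpu")
    os.makedirs(cpu_dir, exist_ok=True)
    online = os.path.join(cpu_dir, "online")
    with open(online, "w") as f:
        if num_cpus == 1:
            f.write("0\n")
        else:
            f.write("0-%d\n" % (num_cpus - 1))
    return online
```

Because the contents are computed once from the configuration, they cannot
track anything that changes while the application runs, which is the reason
files like /proc/meminfo stay with the special files method.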