www.pilppa.org Git - linux-2.6-omap-h63xx.git / commitdiff
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
author     Linus Torvalds <torvalds@linux-foundation.org>
Tue, 29 Jan 2008 11:48:03 +0000 (22:48 +1100)
committer  Linus Torvalds <torvalds@linux-foundation.org>
Tue, 29 Jan 2008 11:48:03 +0000 (22:48 +1100)
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (68 commits)
  [MIPS] remove Documentation/mips/GT64120.README
  [MIPS] Malta: remaining bits of the board support code cleanup
  [MIPS] Malta: make the helper function static
  [MIPS] Malta: fix braces at single statement blocks
  [MIPS] Malta, Atlas: move an extern function declaration to the header file
  [MIPS] Malta: Use C89 style for comments
  [MIPS] Malta: else should follow close brace in malta_int.c
  [MIPS] Malta: remove a superfluous comment
  [MIPS] Malta: include <linux/cpu.h> instead of <asm/cpu.h>
  [MIPS] Malta, Atlas, Sead: remove an extern from .c files
  [MIPS] Malta: fix oversized lines in malta_int.c
  [MIPS] Malta: remove a dead function declaration
  [MIPS] Malta: use tabs not spaces
  [MIPS] Malta: set up the screen info in a separate function
  [MIPS] Malta: check the PCI clock frequency in a separate function
  [MIPS] Malta: use the KERN_ facility level in printk()
  [MIPS] Malta: use Linux kernel style for structure initialization
  [MIPS]: constify function pointer tables
  [MIPS] compat: handle argument endianess of sys32_(f)truncate64 with merge_64
  [MIPS] Cobalt 64-bits kernels can be safely unmarked experimental
  ...

161 files changed:
.gitignore
Documentation/filesystems/ext4.txt
Documentation/filesystems/proc.txt
Documentation/kbuild/kconfig-language.txt
Makefile
arch/alpha/kernel/vmlinux.lds.S
arch/alpha/lib/dec_and_lock.c
arch/arm/kernel/vmlinux.lds.S
arch/arm/mach-imx/Makefile
arch/arm/mach-netx/Makefile
arch/avr32/kernel/vmlinux.lds.S
arch/blackfin/kernel/vmlinux.lds.S
arch/cris/arch-v10/vmlinux.lds.S
arch/cris/arch-v32/boot/compressed/Makefile
arch/cris/arch-v32/vmlinux.lds.S
arch/frv/boot/Makefile
arch/frv/kernel/gdb-stub.c
arch/frv/kernel/vmlinux.lds.S
arch/h8300/kernel/vmlinux.lds.S
arch/ia64/kernel/vmlinux.lds.S
arch/m32r/kernel/vmlinux.lds.S
arch/m68k/kernel/vmlinux-std.lds
arch/m68k/kernel/vmlinux-sun3.lds
arch/m68knommu/kernel/vmlinux.lds.S
arch/mips/kernel/vmlinux.lds.S
arch/mips/tx4927/common/Makefile
arch/mips/tx4938/common/Makefile
arch/mips/tx4938/toshiba_rbtx4938/Makefile
arch/parisc/kernel/vmlinux.lds.S
arch/powerpc/boot/Makefile
arch/powerpc/kernel/sysfs.c
arch/powerpc/kernel/vmlinux.lds.S
arch/powerpc/oprofile/op_model_power4.c
arch/ppc/kernel/vmlinux.lds.S
arch/s390/kernel/vmlinux.lds.S
arch/sh/kernel/vmlinux_32.lds.S
arch/sh/kernel/vmlinux_64.lds.S
arch/sparc/kernel/vmlinux.lds.S
arch/sparc64/kernel/unaligned.c
arch/sparc64/kernel/vmlinux.lds.S
arch/um/include/init.h
arch/um/kernel/dyn.lds.S
arch/um/kernel/uml.lds.S
arch/v850/kernel/vmlinux.lds.S
arch/x86/kernel/vmlinux_32.lds.S
arch/x86/kernel/vmlinux_64.lds.S
arch/xtensa/kernel/vmlinux.lds.S
arch/xtensa/mm/Makefile
arch/xtensa/platform-iss/Makefile
drivers/base/power/Makefile
drivers/infiniband/hw/cxgb3/Makefile
drivers/rapidio/rio.h
fs/Kconfig
fs/afs/dir.c
fs/afs/inode.c
fs/buffer.c
fs/compat_ioctl.c
fs/ext2/super.c
fs/ext3/super.c
fs/ext4/Makefile
fs/ext4/balloc.c
fs/ext4/dir.c
fs/ext4/extents.c
fs/ext4/file.c
fs/ext4/group.h
fs/ext4/ialloc.c
fs/ext4/inode.c
fs/ext4/ioctl.c
fs/ext4/mballoc.c [new file with mode: 0644]
fs/ext4/migrate.c [new file with mode: 0644]
fs/ext4/namei.c
fs/ext4/resize.c
fs/ext4/super.c
fs/ext4/xattr.c
fs/inode.c
fs/jbd2/checkpoint.c
fs/jbd2/commit.c
fs/jbd2/journal.c
fs/jbd2/recovery.c
fs/jbd2/revoke.c
fs/jbd2/transaction.c
fs/ocfs2/cluster/sys.c
fs/read_write.c
fs/smbfs/Makefile
include/asm-arm/bitops.h
include/asm-avr32/setup.h
include/asm-generic/bitops/ext2-non-atomic.h
include/asm-generic/bitops/le.h
include/asm-generic/vmlinux.lds.h
include/asm-ia64/gcc_intrin.h
include/asm-m68k/bitops.h
include/asm-m68knommu/bitops.h
include/asm-powerpc/bitops.h
include/asm-s390/bitops.h
include/asm-sh/machvec.h
include/asm-sh/thread_info.h
include/asm-x86/thread_info_32.h
include/linux/Kbuild
include/linux/buffer_head.h
include/linux/compiler-gcc3.h
include/linux/compiler-gcc4.h
include/linux/compiler.h
include/linux/elfnote.h
include/linux/ext4_fs.h
include/linux/ext4_fs_extents.h
include/linux/ext4_fs_i.h
include/linux/ext4_fs_sb.h
include/linux/fs.h
include/linux/init.h
include/linux/jbd2.h
include/linux/module.h
include/linux/moduleparam.h
include/linux/pci.h
init/Kconfig
kernel/extable.c
kernel/kallsyms.c
kernel/module.c
kernel/params.c
lib/Kconfig.debug
lib/find_next_bit.c
scripts/Makefile.build
scripts/Makefile.lib
scripts/Makefile.modinst
scripts/Makefile.modpost
scripts/basic/docproc.c
scripts/decodecode
scripts/gcc-version.sh
scripts/genksyms/genksyms.c
scripts/kconfig/Makefile
scripts/kconfig/POTFILES.in
scripts/kconfig/conf.c
scripts/kconfig/confdata.c
scripts/kconfig/expr.c
scripts/kconfig/expr.h
scripts/kconfig/gconf.c
scripts/kconfig/lex.zconf.c_shipped
scripts/kconfig/lkc.h
scripts/kconfig/lxdialog/check-lxdialog.sh
scripts/kconfig/lxdialog/checklist.c
scripts/kconfig/lxdialog/dialog.h
scripts/kconfig/lxdialog/inputbox.c
scripts/kconfig/lxdialog/menubox.c
scripts/kconfig/lxdialog/textbox.c
scripts/kconfig/lxdialog/util.c
scripts/kconfig/lxdialog/yesno.c
scripts/kconfig/mconf.c
scripts/kconfig/menu.c
scripts/kconfig/qconf.cc
scripts/kconfig/symbol.c
scripts/kconfig/util.c
scripts/kconfig/zconf.gperf
scripts/kconfig/zconf.hash.c_shipped
scripts/kconfig/zconf.l
scripts/kernel-doc
scripts/mkmakefile
scripts/mod/modpost.c
scripts/mod/modpost.h
scripts/package/Makefile
scripts/package/buildtar
scripts/patch-kernel
scripts/setlocalversion

index 8d14531846b95bfa3564b58ccfb7913a034323b8..8363e48cdcdc67bad8834a08ec8dd57b8cb41635 100644 (file)
@@ -17,6 +17,7 @@
 *.i
 *.lst
 *.symtypes
+*.order
 
 #
 # Top-level generic files
index 6a4adcae9f9a699d624a0bb114f8e0d5f86161c5..560f88dc7090dadcfd7c2db4fc6821c64b8da3fe 100644 (file)
@@ -86,9 +86,21 @@ Alex is working on a new set of patches right now.
 When mounting an ext4 filesystem, the following options are accepted:
 (*) == default
 
-extents                        ext4 will use extents to address file data.  The
+extents                (*)     ext4 will use extents to address file data.  The
                        file system will no longer be mountable by ext3.
 
+noextents              ext4 will not use extents for newly created files
+
+journal_checksum       Enable checksumming of the journal transactions.
+                       This will allow the recovery code in e2fsck and the
+                       kernel to detect corruption in the journal.  It is a
+                       compatible change and will be ignored by older kernels.
+
+journal_async_commit   Commit block can be written to disk without waiting
+                       for descriptor blocks. If enabled, older kernels cannot
+                       mount the device. This will enable 'journal_checksum'
+                       internally.
+
 journal=update         Update the ext4 file system's journal to the current
                        format.
 
@@ -196,6 +208,12 @@ nobh                       (a) cache disk block mapping information
                        "nobh" option tries to avoid associating buffer
                        heads (supported only for "writeback" mode).
 
+mballoc                (*)     Use the multiple block allocator for block allocation
+nomballoc              Disable the multiple block allocator for block allocation.
+stripe=n               Number of filesystem blocks that mballoc will try
+                       to use for allocation size and alignment. For RAID5/6
+                       systems this should be the number of data
+                       disks *  RAID chunk size in file system blocks.
 
 Data Mode
 ---------
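
Example usage of the mount options documented in the hunk above; the device
and mount point are made up for the illustration, and anything not listed
falls back to the defaults marked (*):

$ mount -t ext4 -o journal_checksum,stripe=32 /dev/sdb1 /mnt/data
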
index dec99455321fdf86018ca5b2b62936103fd73031..4413a2d4646fc4fd5305cb60e06a57bd5243bd16 100644 (file)
@@ -857,6 +857,45 @@ CPUs.
 The   "procs_blocked" line gives  the  number of  processes currently blocked,
 waiting for I/O to complete.
 
+1.9 Ext4 file system parameters
+-------------------------------
+Ext4 file systems have one directory per partition under /proc/fs/ext4/
+# ls /proc/fs/ext4/hdc/
+group_prealloc  max_to_scan  mb_groups  mb_history  min_to_scan  order2_req
+stats  stream_req
+
+mb_groups:
+This file gives details of the multiblock allocator buddy cache of free blocks
+
+mb_history:
+Multiblock allocation history.
+
+stats:
+This file indicates whether the multiblock allocator should start collecting
+statistics. The statistics are shown during unmount
+
+group_prealloc:
+The multiblock allocator normalizes the block allocation request to
+group_prealloc filesystem blocks if no stripe value is set.
+The stripe value can be specified at mount time or during mke2fs.
+
+max_to_scan:
+The maximum number of found extents the multiblock allocator may examine
+
+min_to_scan:
+The minimum number of extents the multiblock allocator must examine
+
+order2_req:
+The multiblock allocator uses a 2^N (buddy) search only for requests greater
+than or equal to order2_req. The request size is specified in file system
+blocks. A value of 2 means the buddy search is used only for requests of
+4 blocks or more.
+
+stream_req:
+Files smaller than stream_req are served by the stream allocator, whose
+purpose is to pack requests as close to each other as possible to
+produce smooth I/O traffic. A value of 16 means that files smaller than 16
+filesystem blocks will use group-based preallocation.
 
 ------------------------------------------------------------------------------
 Summary
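
The tunables described above are ordinary files under /proc/fs/ext4/<partition>/
and, as the descriptions imply, most of them can be read and written from the
shell. Reusing the hdc example from the text (the partition name will differ
on other systems):

# cat /proc/fs/ext4/hdc/stream_req
# echo 32 > /proc/fs/ext4/hdc/stream_req
# echo 1 > /proc/fs/ext4/hdc/stats
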
index 616043a6da99a0afd945e3cb5acb93b26bbb3f17..649cb87998900e235afe6b83c5e9f10077be2085 100644 (file)
@@ -24,7 +24,7 @@ visible if its parent entry is also visible.
 Menu entries
 ------------
 
-Most entries define a config option, all other entries help to organize
+Most entries define a config option; all other entries help to organize
 them. A single configuration option is defined like this:
 
 config MODVERSIONS
@@ -50,7 +50,7 @@ applicable everywhere (see syntax).
 
 - type definition: "bool"/"tristate"/"string"/"hex"/"int"
   Every config option must have a type. There are only two basic types:
-  tristate and string, the other types are based on these two. The type
+  tristate and string; the other types are based on these two. The type
   definition optionally accepts an input prompt, so these two examples
   are equivalent:
 
@@ -108,7 +108,7 @@ applicable everywhere (see syntax).
        equal to 'y' without visiting the dependencies. So abusing
        select you are able to select a symbol FOO even if FOO depends
        on BAR that is not set. In general use select only for
-       non-visible symbols (no promts anywhere) and for symbols with
+       non-visible symbols (no prompts anywhere) and for symbols with
        no dependencies. That will limit the usefulness but on the
        other hand avoid the illegal configurations all over. kconfig
        should one day warn about such things.
@@ -127,6 +127,27 @@ applicable everywhere (see syntax).
   used to help visually separate configuration logic from help within
   the file as an aid to developers.
 
+- misc options: "option" <symbol>[=<value>]
+  Various less common options can be defined via this option syntax,
+  which can modify the behaviour of the menu entry and its config
+  symbol. These options are currently possible:
+
+  - "defconfig_list"
+    This declares a list of default entries which can be used when
+    looking for the default configuration (which is used when the main
+    .config doesn't exist yet).
+
+  - "modules"
+    This declares the symbol to be used as the MODULES symbol, which
+    enables the third modular state for all config symbols.
+
+  - "env"=<value>
+    This imports the environment variable into Kconfig. It behaves like
+    a default, except that the value comes from the environment; this
+    also means that the behaviour when mixing it with normal defaults is
+    undefined at this point. The symbol is currently not exported back
+    to the build environment (if this is desired, it can be done via
+    another symbol).
 
 Menu dependencies
 -----------------
@@ -162,9 +183,9 @@ An expression can have a value of 'n', 'm' or 'y' (or 0, 1, 2
 respectively for calculations). A menu entry becomes visible when its
 expression evaluates to 'm' or 'y'.
 
-There are two types of symbols: constant and nonconstant symbols.
-Nonconstant symbols are the most common ones and are defined with the
-'config' statement. Nonconstant symbols consist entirely of alphanumeric
+There are two types of symbols: constant and non-constant symbols.
+Non-constant symbols are the most common ones and are defined with the
+'config' statement. Non-constant symbols consist entirely of alphanumeric
 characters or underscores.
 Constant symbols are only part of expressions. Constant symbols are
 always surrounded by single or double quotes. Within the quote, any
@@ -301,3 +322,81 @@ mainmenu:
 
 This sets the config program's title bar if the config program chooses
 to use it.
+
+
+Kconfig hints
+-------------
+This is a collection of Kconfig tips, most of which aren't obvious at
+first glance and most of which have become idioms in several Kconfig
+files.
+
+Adding common features and making the usage configurable
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It is a common idiom to implement a feature/functionality that is
+relevant for some architectures but not all.
+The recommended way to do so is to use a config variable named HAVE_*
+that is defined in a common Kconfig file and selected by the relevant
+architectures.
+An example is the generic IOMAP functionality.
+
+In lib/Kconfig we would see:
+
+# Generic IOMAP is used to ...
+config HAVE_GENERIC_IOMAP
+
+config GENERIC_IOMAP
+       depends on HAVE_GENERIC_IOMAP && FOO
+
+And in lib/Makefile we would see:
+obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
+
+For each architecture using the generic IOMAP functionality we would see:
+
+config X86
+       select ...
+       select HAVE_GENERIC_IOMAP
+       select ...
+
+Note: we use the existing config option and avoid creating a new
+config variable to select HAVE_GENERIC_IOMAP.
+
+Note: the internal config variable HAVE_GENERIC_IOMAP is introduced to
+overcome the limitation of select, which will force a config option to
+'y' no matter what its dependencies are.
+The dependencies are moved to the symbol GENERIC_IOMAP, so we avoid the
+situation where select forces a symbol to 'y'.
+
+Build as module only
+~~~~~~~~~~~~~~~~~~~~
+To restrict a component build to module-only, qualify its config symbol
+with "depends on m".  E.g.:
+
+config FOO
+       depends on BAR && m
+
+limits FOO to module (=m) or disabled (=n).
+
+
+Build limited by a third config symbol which may be =y or =m
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A common idiom that we see (and sometimes have problems with) is this:
+
+When option C in B (module or subsystem) uses interfaces from A (module
+or subsystem), and both A and B are tristate (could be =y or =m if they
+were independent of each other, but they aren't), then we need to limit
+C such that it cannot be built statically if A is built as a loadable
+module.  (C already depends on B, so there is no dependency issue to
+take care of here.)
+
+If A is linked statically into the kernel image, C can be built
+statically or as loadable module(s).  However, if A is built as loadable
+module(s), then C must be restricted to loadable module(s) also.  This
+can be expressed in kconfig language as:
+
+config C
+       depends on A = y || A = B
+
+or for real examples, use this command in a kernel tree:
+
+$ find . -name Kconfig\* | xargs grep -ns "depends on.*=.*||.*=" | grep -v orig
+
index 6d419f67939c29f28a2b003272ce70d9881bc769..0f84c742ed0e1c8363d27121fb2ce563fb582bc7 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -520,6 +520,11 @@ KBUILD_CFLAGS      += -g
 KBUILD_AFLAGS  += -gdwarf-2
 endif
 
+# We trigger additional mismatches with less inlining
+ifdef CONFIG_DEBUG_SECTION_MISMATCH
+KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
+endif
+
 # Force gcc to behave correctly even for buggy distributions
 KBUILD_CFLAGS         += $(call cc-option, -fno-stack-protector)
 
@@ -793,7 +798,7 @@ define rule_vmlinux-modpost
 endef
 
 # vmlinux image - including updated kernel symbols
-vmlinux: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) $(kallsyms.o) vmlinux.o FORCE
+vmlinux: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) vmlinux.o $(kallsyms.o) FORCE
 ifdef CONFIG_HEADERS_CHECK
        $(Q)$(MAKE) -f $(srctree)/Makefile headers_check
 endif
@@ -804,7 +809,9 @@ endif
        $(call if_changed_rule,vmlinux__)
        $(Q)rm -f .old_version
 
-vmlinux.o: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) $(kallsyms.o) FORCE
+# build vmlinux.o first to catch section mismatch errors early
+$(kallsyms.o): vmlinux.o
+vmlinux.o: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) FORCE
        $(call if_changed_rule,vmlinux-modpost)
 
 # The actual objects are generated when descending, 
@@ -1021,9 +1028,14 @@ ifdef CONFIG_MODULES
 all: modules
 
 #      Build modules
+#
+#      A module can be listed more than once in obj-m resulting in
+#      duplicate lines in modules.order files.  Those are removed
+#      using awk while concatenating to the final file.
 
 PHONY += modules
 modules: $(vmlinux-dirs) $(if $(KBUILD_BUILTIN),vmlinux)
+       $(Q)$(AWK) '!x[$$0]++' $(vmlinux-dirs:%=$(objtree)/%/modules.order) > $(objtree)/modules.order
        @echo '  Building modules, stage 2.';
        $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
 
@@ -1051,6 +1063,7 @@ _modinst_:
                rm -f $(MODLIB)/build ; \
                ln -s $(objtree) $(MODLIB)/build ; \
        fi
+       @cp -f $(objtree)/modules.order $(MODLIB)/
        $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modinst
 
 # This depmod is only for convenience to give the initial
@@ -1110,7 +1123,7 @@ clean: archclean $(clean-dirs)
        @find . $(RCS_FIND_IGNORE) \
                \( -name '*.[oas]' -o -name '*.ko' -o -name '.*.cmd' \
                -o -name '.*.d' -o -name '.*.tmp' -o -name '*.mod.c' \
-               -o -name '*.symtypes' \) \
+               -o -name '*.symtypes' -o -name 'modules.order' \) \
                -type f -print | xargs rm -f
 
 # mrproper - Delete all generated files, including .config
@@ -1175,7 +1188,7 @@ help:
        @echo  '  dir/            - Build all files in dir and below'
        @echo  '  dir/file.[ois]  - Build specified target only'
        @echo  '  dir/file.ko     - Build module including final link'
-       @echo  '  rpm             - Build a kernel as an RPM package'
+       @echo  '  prepare         - Set up for building external modules'
        @echo  '  tags/TAGS       - Generate tags file for editors'
        @echo  '  cscope          - Generate cscope index'
        @echo  '  kernelrelease   - Output the release version string'
@@ -1188,6 +1201,8 @@ help:
        @echo  'Static analysers'
        @echo  '  checkstack      - Generate a list of stack hogs'
        @echo  '  namespacecheck  - Name space analysis on compiled kernel'
+       @echo  '  versioncheck    - Sanity check on version.h usage'
+       @echo  '  includecheck    - Check for duplicate included header files'
        @echo  '  export_report   - List the usages of all exported symbols'
        @if [ -r $(srctree)/include/asm-$(SRCARCH)/Kbuild ]; then \
         echo  '  headers_check   - Sanity check on exported headers'; \
@@ -1371,6 +1386,7 @@ define xtags
        if $1 --version 2>&1 | grep -iq exuberant; then \
            $(all-sources) | xargs $1 -a \
                -I __initdata,__exitdata,__acquires,__releases \
+               -I __read_mostly,____cacheline_aligned,____cacheline_aligned_in_smp,____cacheline_internodealigned_in_smp \
                -I EXPORT_SYMBOL,EXPORT_SYMBOL_GPL \
                --extra=+f --c-kinds=+px \
                --regex-asm='/^ENTRY\(([^)]*)\).*/\1/'; \
@@ -1428,12 +1444,12 @@ tags: FORCE
 includecheck:
        find * $(RCS_FIND_IGNORE) \
                -name '*.[hcS]' -type f -print | sort \
-               | xargs $(PERL) -w scripts/checkincludes.pl
+               | xargs $(PERL) -w $(srctree)/scripts/checkincludes.pl
 
 versioncheck:
        find * $(RCS_FIND_IGNORE) \
                -name '*.[hcS]' -type f -print | sort \
-               | xargs $(PERL) -w scripts/checkversion.pl
+               | xargs $(PERL) -w $(srctree)/scripts/checkversion.pl
 
 namespacecheck:
        $(PERL) $(srctree)/scripts/namespace.pl
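
The awk expression added to the modules rule above is a standard
first-occurrence filter: it prints each line of the concatenated
modules.order fragments only the first time it is seen, which removes the
duplicates mentioned in the comment. Run stand-alone with made-up module
names, it behaves like this (the $ is doubled in the Makefile only because
make requires it):

$ printf 'foo.ko\nbar.ko\nfoo.ko\n' | awk '!x[$0]++'
foo.ko
bar.ko
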
index 55c05b511f4c060d5a8cab8f7140d6f07563bd0f..f13249be17c50736e23686158854799164582aeb 100644 (file)
@@ -46,11 +46,11 @@ SECTIONS
        __init_begin = .;
        .init.text : {
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
        .init.data : {
-               *(.init.data)
+               INIT_DATA
        }
 
        . = ALIGN(16);
@@ -136,8 +136,8 @@ SECTIONS
 
        /* Sections to be discarded */
        /DISCARD/ : {
-               *(.exit.text)
-               *(.exit.data)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
        }
 
index 6ae2500a9d9e8dec21fd4fb05302ef1c6dae11ca..0f5520d2f45fc41d72d2d333fb291c50517bfa3c 100644 (file)
@@ -30,8 +30,7 @@ _atomic_dec_and_lock:                         \n\
        .previous                               \n\
        .end _atomic_dec_and_lock");
 
-static int __attribute_used__
-atomic_dec_and_lock_1(atomic_t *atomic, spinlock_t *lock)
+static int __used atomic_dec_and_lock_1(atomic_t *atomic, spinlock_t *lock)
 {
        /* Slow path */
        spin_lock(lock);
index 30f732c7fdb505b9762b6d12f4fbc6ebf5ebb5ae..4898bdcfe7dd638b5c4d9962f3932244a1249a38 100644 (file)
@@ -30,7 +30,7 @@ SECTIONS
        }
 
        .init : {                       /* Init code and data           */
-                       *(.init.text)
+                       INIT_TEXT
                _einittext = .;
                __proc_info_begin = .;
                        *(.proc.info.init)
@@ -70,15 +70,15 @@ SECTIONS
                __per_cpu_end = .;
 #ifndef CONFIG_XIP_KERNEL
                __init_begin = _stext;
-               *(.init.data)
+               INIT_DATA
                . = ALIGN(4096);
                __init_end = .;
 #endif
        }
 
        /DISCARD/ : {                   /* Exit code and data           */
-               *(.exit.text)
-               *(.exit.data)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
 #ifndef CONFIG_MMU
                *(.fixup)
@@ -130,7 +130,7 @@ SECTIONS
 #ifdef CONFIG_XIP_KERNEL
                . = ALIGN(4096);
                __init_begin = .;
-               *(.init.data)
+               INIT_DATA
                . = ALIGN(4096);
                __init_end = .;
 #endif
index 02272aa36e90c69e433b465ab799f08574478a5d..88d5e61a2e13d6f85a90d77d8219c0f11474392d 100644 (file)
@@ -1,9 +1,6 @@
 #
 # Makefile for the linux kernel.
 #
-# Note! Dependencies are done automagically by 'make dep', which also
-# removes any old dependencies. DON'T put your own dependencies here
-# unless it's something special (ie not a .c file).
 
 # Object file lists.
 
index 18785ff37657f71a272d33e2b5057ec371ae1e3d..7ce4ba9eb242b1ec13855d918d625368d5d7220b 100644 (file)
@@ -1,9 +1,6 @@
 #
 # Makefile for the linux kernel.
 #
-# Note! Dependencies are done automagically by 'make dep', which also
-# removes any old dependencies. DON'T put your own dependencies here
-# unless it's something special (ie not a .c file).
 
 # Object file lists.
 
index 11f08e35a2eb36df6ee235549a0f9779f3120f91..481cfd40c0539e85297ed33366ec579041d95680 100644 (file)
@@ -27,19 +27,19 @@ SECTIONS
                __init_begin = .;
                        _sinittext = .;
                        *(.text.reset)
-                       *(.init.text)
+                       INIT_TEXT
                        /*
                         * .exit.text is discarded at runtime, not
                         * link time, to deal with references from
                         * __bug_table
                         */
-                       *(.exit.text)
+                       EXIT_TEXT
                        _einittext = .;
                . = ALIGN(4);
                __tagtable_begin = .;
                        *(.taglist.init)
                __tagtable_end = .;
-                       *(.init.data)
+                       INIT_DATA
                . = ALIGN(16);
                __setup_start = .;
                        *(.init.setup)
@@ -135,7 +135,7 @@ SECTIONS
         * thrown away, as cleanup code is never called unless it's a module.
         */
        /DISCARD/               : {
-               *(.exit.data)
+               EXIT_DATA
                *(.exitcall.exit)
        }
 
index 9b75bc83c71fac9847cd8e0f578183dbfb87115c..858722421b40db4f693dcdc33122e60b47b8801f 100644 (file)
@@ -91,13 +91,13 @@ SECTIONS
        {
                . = ALIGN(PAGE_SIZE);
                __sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                __einittext = .;
        }
        .init.data :
        {
                . = ALIGN(16);
-               *(.init.data)
+               INIT_DATA
        }
        .init.setup :
        {
@@ -198,8 +198,8 @@ SECTIONS
 
        /DISCARD/ :
        {
-               *(.exit.text)
-               *(.exit.data)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
        }
 }
index 97a7876ed6819061fec0a9be042f60be1d260d77..93c9f0ea286b8f23d5445f154f7e783a298b86c2 100644 (file)
@@ -57,10 +57,10 @@ SECTIONS
        __init_begin = .;
        .init.text : { 
                   _sinittext = .;
-                  *(.init.text)
+                  INIT_TEXT
                   _einittext = .;
        }
-       .init.data : { *(.init.data) }
+       .init.data : { INIT_DATA }
        . = ALIGN(16);
        __setup_start = .;
        .init.setup : { *(.init.setup) }
@@ -109,8 +109,8 @@ SECTIONS
 
        /* Sections to be discarded */
        /DISCARD/ : {
-               *(.text.exit)
-               *(.data.exit)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
         }
 
index 9f77eda914ba7448a9f38d515876566d0f338f29..609692f9d5eb424bd906abdb9248701938b188be 100644 (file)
@@ -7,7 +7,7 @@
 target = $(target_compressed_dir)
 src    = $(src_compressed_dir)
 
-CC = gcc-cris -mlinux -march=v32 -I $(TOPDIR)/include
+CC = gcc-cris -mlinux -march=v32 $(LINUXINCLUDE)
 CFLAGS = -O2
 LD = gcc-cris -mlinux -march=v32 -nostdlib
 OBJCOPY = objcopy-cris
index b076c134c0bbd797d9c28940a651523abadf3a63..fead8c59ea63bfcf727091a774982b55b91d131a 100644 (file)
@@ -61,10 +61,10 @@ SECTIONS
        __init_begin = .;
        .init.text : {
                   _sinittext = .;
-                  *(.init.text)
+                  INIT_TEXT
                   _einittext = .;
        }
-       .init.data : { *(.init.data) }
+       .init.data : { INIT_DATA }
        . = ALIGN(16);
        __setup_start = .;
        .init.setup : { *(.init.setup) }
@@ -124,8 +124,8 @@ SECTIONS
 
        /* Sections to be discarded */
        /DISCARD/ : {
-               *(.text.exit)
-               *(.data.exit)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
         }
 
index dc6f03824423c76f378c1cc384671e70e2b1503b..6ae3254da01976b6fdaa374f588c3c58a081058c 100644 (file)
@@ -10,7 +10,7 @@
 
 targets := Image zImage bootpImage
 
-SYSTEM =$(TOPDIR)/$(LINUX)
+SYSTEM =$(LINUX)
 
 ZTEXTADDR       = 0x02080000
 PARAMS_PHYS     = 0x0207c000
@@ -45,7 +45,7 @@ zImage:       $(CONFIGURE) compressed/$(LINUX)
 bootpImage: bootp/bootp
        $(OBJCOPY) -O binary -R .note -R .comment -S bootp/bootp $@
 
-compressed/$(LINUX): $(TOPDIR)/$(LINUX) dep
+compressed/$(LINUX): $(LINUX) dep
        @$(MAKE) -C compressed $(LINUX)
 
 bootp/bootp: zImage initrd
@@ -59,10 +59,10 @@ initrd:
 # installation
 #
 install: $(CONFIGURE) Image
-       sh ./install.sh $(KERNELRELEASE) Image $(TOPDIR)/System.map "$(INSTALL_PATH)"
+       sh ./install.sh $(KERNELRELEASE) Image System.map "$(INSTALL_PATH)"
 
 zinstall: $(CONFIGURE) zImage
-       sh ./install.sh $(KERNELRELEASE) zImage $(TOPDIR)/System.map "$(INSTALL_PATH)"
+       sh ./install.sh $(KERNELRELEASE) zImage System.map "$(INSTALL_PATH)"
 
 #
 # miscellany
index e89cad1192a99e5949e12f01020f76a7053671dd..48a0393e7cee1cd0e14fb15a7c20bbee9008fde3 100644 (file)
@@ -87,7 +87,7 @@
  *  Example:
  *    $ cd ~/linux
  *    $ make menuconfig <go to "Kernel Hacking" and turn on remote debugging>
- *    $ make dep; make vmlinux
+ *    $ make vmlinux
  *
  *  Step 3:
  *  Download the kernel to the remote target and start
index a17a81d58bf69386638f89b66a01548e22a4b61e..f42b328b1dd0475e1af868308cdc000c89d40942 100644 (file)
@@ -28,14 +28,14 @@ SECTIONS
   .init.text : {
        *(.text.head)
 #ifndef CONFIG_DEBUG_INFO
-       *(.init.text)
-       *(.exit.text)
-       *(.exit.data)
+       INIT_TEXT
+       EXIT_TEXT
+       EXIT_DATA
        *(.exitcall.exit)
 #endif
   }
   _einittext = .;
-  .init.data : { *(.init.data) }
+  .init.data : { INIT_DATA }
 
   . = ALIGN(8);
   __setup_start = .;
@@ -106,8 +106,8 @@ SECTIONS
        LOCK_TEXT
 #ifdef CONFIG_DEBUG_INFO
        *(
-       .init.text
-       .exit.text
+       INIT_TEXT
+       EXIT_TEXT
        .exitcall.exit
        )
 #endif
@@ -138,7 +138,7 @@ SECTIONS
   .data : {                    /* Data */
        DATA_DATA
        *(.data.*)
-       *(.exit.data)
+       EXIT_DATA
        CONSTRUCTORS
        }
 
index a2e72d495551cd370454556898a9bf687fa869aa..43a87b9085b6b449cb7fbfb64bfb5d715551185c 100644 (file)
@@ -110,9 +110,9 @@ SECTIONS
        . = ALIGN(0x4) ;
        ___init_begin = .;
        __sinittext = .; 
-               *(.init.text)
+               INIT_TEXT
        __einittext = .; 
-               *(.init.data)
+               INIT_DATA
        . = ALIGN(0x4) ;
        ___setup_start = .;
                *(.init.setup)
@@ -124,8 +124,8 @@ SECTIONS
        ___con_initcall_start = .;
                *(.con_initcall.init)
        ___con_initcall_end = .;
-               *(.exit.text)
-               *(.exit.data)
+               EXIT_TEXT
+               EXIT_DATA
 #if defined(CONFIG_BLK_DEV_INITRD)
                . = ALIGN(4);
        ___initramfs_start = .;
index 757e419ebcf81f0ea1da02916e82d7302dc8395f..80622acc95deea98c83f5e2ce97cc11cc4bc883a 100644 (file)
@@ -27,8 +27,8 @@ SECTIONS
 {
   /* Sections to be discarded */
   /DISCARD/ : {
-       *(.exit.text)
-       *(.exit.data)
+       EXIT_TEXT
+       EXIT_DATA
        *(.exitcall.exit)
        *(.IA_64.unwind.exit.text)
        *(.IA_64.unwind_info.exit.text)
@@ -119,12 +119,12 @@ SECTIONS
   .init.text : AT(ADDR(.init.text) - LOAD_OFFSET)
        {
          _sinittext = .;
-         *(.init.text)
+         INIT_TEXT
          _einittext = .;
        }
 
   .init.data : AT(ADDR(.init.data) - LOAD_OFFSET)
-       { *(.init.data) }
+       { INIT_DATA }
 
 #ifdef CONFIG_BLK_DEV_INITRD
   .init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET)
index 942a8c7a44174b0c010d88c10694523ec93c248f..41b07854fcc609c5d12c2254e1e09a807f1e9417 100644 (file)
@@ -76,10 +76,10 @@ SECTIONS
   __init_begin = .;
   .init.text : {
        _sinittext = .;
-       *(.init.text)
+       INIT_TEXT
        _einittext = .;
   }
-  .init.data : { *(.init.data) }
+  .init.data : { INIT_DATA }
   . = ALIGN(16);
   __setup_start = .;
   .init.setup : { *(.init.setup) }
@@ -100,8 +100,8 @@ SECTIONS
   .altinstr_replacement : { *(.altinstr_replacement) }
   /* .exit.text is discard at runtime, not link time, to deal with references
      from .altinstructions and .eh_frame */
-  .exit.text : { *(.exit.text) }
-  .exit.data : { *(.exit.data) }
+  .exit.text : { EXIT_TEXT }
+  .exit.data : { EXIT_DATA }
 
 #ifdef CONFIG_BLK_DEV_INITRD
   . = ALIGN(4096);
@@ -124,8 +124,8 @@ SECTIONS
 
   /* Sections to be discarded */
   /DISCARD/ : {
-       *(.exit.text)
-       *(.exit.data)
+       EXIT_TEXT
+       EXIT_DATA
        *(.exitcall.exit)
        }
 
index 59fe285865ec050b958545a94007568d94c7f707..7537cc5e61592ac87c0af127850b214d6d56f977 100644 (file)
@@ -45,10 +45,10 @@ SECTIONS
   __init_begin = .;
   .init.text : {
        _sinittext = .;
-       *(.init.text)
+       INIT_TEXT
        _einittext = .;
   }
-  .init.data : { *(.init.data) }
+  .init.data : { INIT_DATA }
   . = ALIGN(16);
   __setup_start = .;
   .init.setup : { *(.init.setup) }
@@ -82,8 +82,8 @@ SECTIONS
 
   /* Sections to be discarded */
   /DISCARD/ : {
-       *(.exit.text)
-       *(.exit.data)
+       EXIT_TEXT
+       EXIT_DATA
        *(.exitcall.exit)
        }
 
index 4adffefb5c48c673b5f349a6c0142b4a10a26ada..cdc313e7c299a66c2d7700526f00bcd0463142cf 100644 (file)
@@ -38,10 +38,10 @@ SECTIONS
 __init_begin = .;
        .init.text : {
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
-       .init.data : { *(.init.data) }
+       .init.data : { INIT_DATA }
        . = ALIGN(16);
        __setup_start = .;
        .init.setup : { *(.init.setup) }
@@ -77,8 +77,8 @@ __init_begin = .;
 
   /* Sections to be discarded */
   /DISCARD/ : {
-       *(.exit.text)
-       *(.exit.data)
+       EXIT_TEXT
+       EXIT_DATA
        *(.exitcall.exit)
        }
 
index 07a0055602f447b4466442c7bedb125d3a149e7f..b44edb08e21276f3dce89e8b938c4e6a5252596c 100644 (file)
@@ -143,9 +143,9 @@ SECTIONS {
                . = ALIGN(4096);
                __init_begin = .;
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
-               *(.init.data)
+               INIT_DATA
                . = ALIGN(16);
                __setup_start = .;
                *(.init.setup)
@@ -170,8 +170,8 @@ SECTIONS {
        } > INIT
 
        /DISCARD/ : {
-               *(.exit.text)
-               *(.exit.data)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
        }
 
index 5fc2398bdb76673ac8ebf92cbd21db0a5d34b7b9..b5470ceb418b301847f385d72e306eff0b1bf6dc 100644 (file)
@@ -114,11 +114,11 @@ SECTIONS
        __init_begin = .;
        .init.text : {
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
        .init.data : {
-               *(.init.data)
+               INIT_DATA
        }
        . = ALIGN(16);
        .init.setup : {
@@ -144,10 +144,10 @@ SECTIONS
         * references from .rodata
         */
        .exit.text : {
-               *(.exit.text)
+               EXIT_TEXT
        }
        .exit.data : {
-               *(.exit.data)
+               EXIT_DATA
        }
 #if defined(CONFIG_BLK_DEV_INITRD)
        . = ALIGN(_PAGE_SIZE);
index e4a5e464b376614139f0119ba8bbc39a0c57f757..a7fe76a64964db66d66ab960c7a96b5c0d1ca3f9 100644 (file)
@@ -1,10 +1,6 @@
 #
 # Makefile for common code for Toshiba TX4927 based systems
 #
-# Note! Dependencies are done automagically by 'make dep', which also
-# removes any old dependencies. DON'T put your own dependencies here
-# unless it's something special (ie not a .c file).
-#
 
 obj-y  += tx4927_prom.o tx4927_irq.o
 
index c5c6ceaa71ca55ebfbfb0f3063599991942fc627..56aa1ed1ee0c947cd8419a349d1476d6921ea728 100644 (file)
@@ -1,10 +1,6 @@
 #
 # Makefile for common code for Toshiba TX4927 based systems
 #
-# Note! Dependencies are done automagically by 'make dep', which also
-# removes any old dependencies. DON'T put your own dependencies here
-# unless it's something special (ie not a .c file).
-#
 
 obj-y  += prom.o irq.o
 obj-$(CONFIG_KGDB) += dbgio.o
index 675bb1c3e40c72cc4776b906c05dd2abb8840904..2316dd7dd1bdf91ba62e0034a4a3b322ef54698c 100644 (file)
@@ -1,10 +1,6 @@
 #
 # Makefile for common code for Toshiba TX4927 based systems
 #
-# Note! Dependencies are done automagically by 'make dep', which also
-# removes any old dependencies. DON'T put your own dependencies here
-# unless it's something special (ie not a .c file).
-#
 
 obj-y  += prom.o setup.o irq.o spi_eeprom.o
 
index 40d0ff9b81ab685acfd88dd5ca1a04158a25b1dd..50b4a3a25d0af70dbfb59134aae12940cbd03f12 100644 (file)
@@ -172,11 +172,11 @@ SECTIONS
        __init_begin = .;
        .init.text : { 
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
        .init.data : {
-               *(.init.data)
+               INIT_DATA
        }
        . = ALIGN(16);
        .init.setup : {
@@ -215,10 +215,10 @@ SECTIONS
         *  from .altinstructions and .eh_frame
         */
        .exit.text : {
-               *(.exit.text)
+               EXIT_TEXT
        }
        .exit.data : {
-               *(.exit.data)
+               EXIT_DATA
        }
 #ifdef CONFIG_BLK_DEV_INITRD
        . = ALIGN(PAGE_SIZE);
index 18e32719d0ed4544202bae47cc9c1361e6116649..4b1d98b8135ef5db6da95654bf4f430e4704e4f3 100644 (file)
@@ -65,7 +65,7 @@ obj-wlib := $(addsuffix .o, $(basename $(addprefix $(obj)/, $(src-wlib))))
 obj-plat := $(addsuffix .o, $(basename $(addprefix $(obj)/, $(src-plat))))
 
 quiet_cmd_copy_zlib = COPY    $@
-      cmd_copy_zlib = sed "s@__attribute_used__@@;s@<linux/\([^>]*\).*@\"\1\"@" $< > $@
+      cmd_copy_zlib = sed "s@__used@@;s@<linux/\([^>]*\).*@\"\1\"@" $< > $@
 
 quiet_cmd_copy_zlibheader = COPY    $@
       cmd_copy_zlibheader = sed "s@<linux/\([^>]*\).*@\"\1\"@" $< > $@
index 25d9a96484ddeb914ff0d7fe63aa1b0c3baf7e8e..c8127f832df0f6d28d8efb0ee3a2097f8933dcce 100644 (file)
@@ -158,7 +158,7 @@ static ssize_t show_##NAME(struct sys_device *dev, char *buf) \
        unsigned long val = run_on_cpu(cpu->sysdev.id, read_##NAME, 0); \
        return sprintf(buf, "%lx\n", val); \
 } \
-static ssize_t __attribute_used__ \
+static ssize_t __used \
        store_##NAME(struct sys_device *dev, const char *buf, size_t count) \
 { \
        struct cpu *cpu = container_of(dev, struct cpu, sysdev); \
index f66fa5d966b0d75ebecf99d53ac4723e8f2622ce..0afb9e31d2a008a79f1b5a4d2ff70d8dc798a458 100644 (file)
@@ -23,7 +23,7 @@ SECTIONS
        /* Sections to be discarded. */
        /DISCARD/ : {
        *(.exitcall.exit)
-       *(.exit.data)
+       EXIT_DATA
        }
 
        . = KERNELBASE;
@@ -76,17 +76,19 @@ SECTIONS
 
        .init.text : {
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
 
        /* .exit.text is discarded at runtime, not link time,
         * to deal with references from __bug_table
         */
-       .exit.text : { *(.exit.text) }
+       .exit.text : {
+               EXIT_TEXT
+       }
 
        .init.data : {
-               *(.init.data);
+               INIT_DATA
                __vtop_table_begin = .;
                *(.vtop_fixup);
                __vtop_table_end = .;
index cddc250a6a5cf13325556175cbbed3473c61ec02..446a8bbb847b0f2dcd679ac74eaece531cee4476 100644 (file)
@@ -172,15 +172,15 @@ static void power4_stop(void)
 }
 
 /* Fake functions used by canonicalize_pc */
-static void __attribute_used__ hypervisor_bucket(void)
+static void __used hypervisor_bucket(void)
 {
 }
 
-static void __attribute_used__ rtas_bucket(void)
+static void __used rtas_bucket(void)
 {
 }
 
-static void __attribute_used__ kernel_unknown_bucket(void)
+static void __used kernel_unknown_bucket(void)
 {
 }
 
index 98c1212674f6e76c2953c5a9a8a5344bb6b94ca1..52b64fcbdfc50d493a76bef799400e2365cdd15e 100644 (file)
@@ -97,14 +97,14 @@ SECTIONS
   __init_begin = .;
   .init.text : {
        _sinittext = .;
-       *(.init.text)
+       INIT_TEXT
        _einittext = .;
   }
   /* .exit.text is discarded at runtime, not link time,
      to deal with references from __bug_table */
-  .exit.text : { *(.exit.text) }
+  .exit.text : { EXIT_TEXT }
   .init.data : {
-    *(.init.data);
+    INIT_DATA
     __vtop_table_begin = .;
     *(.vtop_fixup);
     __vtop_table_end = .;
@@ -164,6 +164,6 @@ SECTIONS
   /* Sections to be discarded. */
   /DISCARD/ : {
     *(.exitcall.exit)
-    *(.exit.data)
+    EXIT_DATA
   }
 }
index 936159199346520387f895c7fec5f6ffcba33345..7d43c3cd3ef35880b5b0fd5a5fbb08b33a51e605 100644 (file)
@@ -97,7 +97,7 @@ SECTIONS
        __init_begin = .;
        .init.text : {
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
        /*
@@ -105,11 +105,11 @@ SECTIONS
         * to deal with references from __bug_table
        */
        .exit.text : {
-               *(.exit.text)
+               EXIT_TEXT
        }
 
        .init.data : {
-               *(.init.data)
+               INIT_DATA
        }
        . = ALIGN(0x100);
        .init.setup : {
@@ -156,7 +156,7 @@ SECTIONS
 
        /* Sections to be discarded */
        /DISCARD/ : {
-               *(.exit.data)
+               EXIT_DATA
                *(.exitcall.exit)
        }
 
index d549fac6d3e7dc5cb1692122ad890d670c1a4b9a..c7113786ecd4c1158c461128880cfef650a31857 100644 (file)
@@ -84,9 +84,9 @@ SECTIONS
        . = ALIGN(PAGE_SIZE);           /* Init code and data */
        __init_begin = .;
        _sinittext = .;
-       .init.text : { *(.init.text) }
+       .init.text : { INIT_TEXT }
        _einittext = .;
-       .init.data : { *(.init.data) }
+       .init.data : { INIT_DATA }
 
        . = ALIGN(16);
        __setup_start = .;
@@ -122,8 +122,8 @@ SECTIONS
         * .exit.text is discarded at runtime, not link time, to deal with
         * references from __bug_table
         */
-       .exit.text : { *(.exit.text) }
-       .exit.data : { *(.exit.data) }
+       .exit.text : { EXIT_TEXT }
+       .exit.data : { EXIT_DATA }
 
        . = ALIGN(PAGE_SIZE);
        .bss : {
index 2fd0f74014846ac5c920ea58fcbc3c124a6ae66a..3f1bd6392bb343c8699c309c5d98da640ec839d3 100644 (file)
@@ -96,9 +96,9 @@ SECTIONS
        . = ALIGN(PAGE_SIZE);           /* Init code and data */
        __init_begin = .;
        _sinittext = .;
-       .init.text : C_PHYS(.init.text) { *(.init.text) }
+       .init.text : C_PHYS(.init.text) { INIT_TEXT }
        _einittext = .;
-       .init.data : C_PHYS(.init.data) { *(.init.data) }
+       .init.data : C_PHYS(.init.data) { INIT_DATA }
        . = ALIGN(L1_CACHE_BYTES);      /* Better if Cache Line aligned */
        __setup_start = .;
        .init.setup : C_PHYS(.init.setup) { *(.init.setup) }
@@ -134,8 +134,8 @@ SECTIONS
         * .exit.text is discarded at runtime, not link time, to deal with
         * references from __bug_table
         */
-       .exit.text : C_PHYS(.exit.text) { *(.exit.text) }
-       .exit.data : C_PHYS(.exit.data) { *(.exit.data) }
+       .exit.text : C_PHYS(.exit.text) { EXIT_TEXT }
+       .exit.data : C_PHYS(.exit.data) { EXIT_DATA }
 
        . = ALIGN(PAGE_SIZE);
        .bss : C_PHYS(.bss) {
index a8b4200f9cc379e0bc6319e6f2c786ae3df2f4b6..216147d6e61f6882d8050c161d356c4bd6f15bec 100644 (file)
@@ -48,12 +48,12 @@ SECTIONS
        __init_begin = .;
        .init.text : {
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
        __init_text_end = .;
        .init.data : {
-               *(.init.data)
+               INIT_DATA
        }
        . = ALIGN(16);
        .init.setup : {
@@ -102,8 +102,8 @@ SECTIONS
        _end = . ;
        PROVIDE (end = .);
        /DISCARD/ : {
-               *(.exit.text)
-               *(.exit.data)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
        }
 
index 953be816fa2564b6692a7f3a25d884337958f017..dc7bf1b6321ceb89fbe658cce07f08f7df513808 100644 (file)
@@ -175,7 +175,7 @@ unsigned long compute_effective_address(struct pt_regs *regs,
 }
 
 /* This is just to make gcc think die_if_kernel does return... */
-static void __attribute_used__ unaligned_panic(char *str, struct pt_regs *regs)
+static void __used unaligned_panic(char *str, struct pt_regs *regs)
 {
        die_if_kernel(str, regs);
 }
index 9fcd503bc04ad5574488379dc43787cf4cabfec3..01f809617e5e708df97764794de91b3a6bad8c81 100644 (file)
@@ -56,11 +56,11 @@ SECTIONS
        .init.text : {
                __init_begin = .;
                _sinittext = .;
-               *(.init.text)
+               INIT_TEXT
                _einittext = .;
        }
        .init.data : {
-               *(.init.data)
+               INIT_DATA
        }
        . = ALIGN(16);
        .init.setup : {
@@ -137,8 +137,8 @@ SECTIONS
        PROVIDE (end = .);
 
        /DISCARD/ : {
-               *(.exit.text)
-               *(.exit.data)
+               EXIT_TEXT
+               EXIT_DATA
                *(.exitcall.exit)
        }
 
index d4de7c0120ced888209122f47424d9754c1ed50f..cebc6cae91903442235e8f4b4e782b71a5feb5c9 100644 (file)
@@ -42,15 +42,15 @@ typedef void (*exitcall_t)(void);
 
 /* These are for everybody (although not all archs will actually
    discard it in modules) */
-#define __init         __attribute__ ((__section__ (".init.text")))
-#define __initdata     __attribute__ ((__section__ (".init.data")))
-#define __exitdata     __attribute__ ((__section__(".exit.data")))
-#define __exit_call    __attribute_used__ __attribute__ ((__section__ (".exitcall.exit")))
+#define __init         __section(.init.text)
+#define __initdata     __section(.init.data)
+#define __exitdata     __section(.exit.data)
+#define __exit_call    __used __section(.exitcall.exit)
 
 #ifdef MODULE
-#define __exit         __attribute__ ((__section__(".exit.text")))
+#define __exit         __section(.exit.text)
 #else
-#define __exit         __attribute_used__ __attribute__ ((__section__(".exit.text")))
+#define __exit         __used __section(.exit.text)
 #endif
 
 #endif
@@ -103,16 +103,16 @@ extern struct uml_param __uml_setup_start, __uml_setup_end;
  * Mark functions and data as being only used at initialization
  * or exit time.
  */
-#define __uml_init_setup       __attribute_used__ __attribute__ ((__section__ (".uml.setup.init")))
-#define __uml_setup_help       __attribute_used__ __attribute__ ((__section__ (".uml.help.init")))
-#define __uml_init_call                __attribute_used__ __attribute__ ((__section__ (".uml.initcall.init")))
-#define __uml_postsetup_call   __attribute_used__ __attribute__ ((__section__ (".uml.postsetup.init")))
-#define __uml_exit_call                __attribute_used__ __attribute__ ((__section__ (".uml.exitcall.exit")))
+#define __uml_init_setup       __used __section(.uml.setup.init)
+#define __uml_setup_help       __used __section(.uml.help.init)
+#define __uml_init_call                __used __section(.uml.initcall.init)
+#define __uml_postsetup_call   __used __section(.uml.postsetup.init)
+#define __uml_exit_call                __used __section(.uml.exitcall.exit)
 
 #ifndef __KERNEL__
 
 #define __define_initcall(level,fn) \
-       static initcall_t __initcall_##fn __attribute_used__ \
+       static initcall_t __initcall_##fn __used \
        __attribute__((__section__(".initcall" level ".init"))) = fn
 
 /* Userspace initcalls shouldn't depend on anything in the kernel, so we'll
@@ -122,7 +122,7 @@ extern struct uml_param __uml_setup_start, __uml_setup_end;
 
 #define __exitcall(fn) static exitcall_t __exitcall_##fn __exit_call = fn
 
-#define __init_call    __attribute_used__ __attribute__ ((__section__ (".initcall.init")))
+#define __init_call    __used __section(.initcall.init)
 
 #endif
 
index 3866f4960f04519de02a1ab4c6cdff3452d29ffc..26090b7f323ecf3806f0a86668250ebad253bc6f 100644 (file)
@@ -17,7 +17,7 @@ SECTIONS
   __init_begin = .;
   .init.text : {
        _sinittext = .;
-       *(.init.text)
+       INIT_TEXT
        _einittext = .;
   }
 
@@ -84,7 +84,7 @@ SECTIONS
 
   #include "asm/common.lds.S"
 
-  init.data : { *(.init.data) }
+  init.data : { INIT_DATA }
 
   /* Ensure the __preinit_array_start label is properly aligned.  We
      could instead move the label definition inside the section, but
index 13df191e2b41e6b66fe6c3668b5daa689dada71a..5828c1d54505f7fa0603c86a1b819fd52e528e7a 100644 (file)
@@ -23,7 +23,7 @@ SECTIONS
   __init_begin = .;
   .init.text : {
        _sinittext = .;
-       *(.init.text)
+       INIT_TEXT
        _einittext = .;
   }
   . = ALIGN(4096);
@@ -48,7 +48,7 @@ SECTIONS
 
   #include "asm/common.lds.S"
 
-  init.data : { *(init.data) }
+  init.data : { INIT_DATA }
   .data    :
   {
     . = ALIGN(KERNEL_STACK_SIZE);              /* init_task */
index 6172599b4ce2aab5e3fe3551cb351b44ecd5ff0c..d08cd1d27f27a77cfdf5375bd14d54534acf27ea 100644 (file)
 #define DATA_CONTENTS                                                        \
                __sdata = . ;                                                 \
                DATA_DATA                                                     \
-                       *(.exit.data)   /* 2.5 convention */                  \
+                       EXIT_DATA       /* 2.5 convention */                  \
                        *(.data.exit)   /* 2.4 convention */                  \
                . = ALIGN (16) ;                                              \
                *(.data.cacheline_aligned)                                    \
                . = ALIGN (4096) ;                                            \
                __init_start = . ;                                            \
                        __sinittext = .;                                      \
-                       *(.init.text)   /* 2.5 convention */                  \
+                       INIT_TEXT       /* 2.5 convention */                  \
                        __einittext = .;                                      \
-                       *(.init.data)                                         \
+                       INIT_DATA                                             \
                        *(.text.init)   /* 2.4 convention */                  \
                        *(.data.init)                                         \
                INITCALL_CONTENTS                                             \
 #define ROMK_INIT_RAM_CONTENTS                                               \
                . = ALIGN (4096) ;                                            \
                __init_start = . ;                                            \
-                       *(.init.data)   /* 2.5 convention */                  \
+                       INIT_DATA       /* 2.5 convention */                  \
                        *(.data.init)   /* 2.4 convention */                  \
                __init_end = . ;                                              \
                . = ALIGN (4096) ;
    should go into ROM.  */     
 #define ROMK_INIT_ROM_CONTENTS                                               \
                        _sinittext = .;                                       \
-                       *(.init.text)   /* 2.5 convention */                  \
+                       INIT_TEXT       /* 2.5 convention */                  \
                        _einittext = .;                                       \
                        *(.text.init)   /* 2.4 convention */                  \
                INITCALL_CONTENTS                                             \
index 7d72cce0052946c93446eaa279c0d476da80af51..84c913f38f980b621e6549ccca86f3a01a8666af 100644 (file)
@@ -131,10 +131,12 @@ SECTIONS
   .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
        __init_begin = .;
        _sinittext = .;
-       *(.init.text)
+       INIT_TEXT
        _einittext = .;
   }
-  .init.data : AT(ADDR(.init.data) - LOAD_OFFSET) { *(.init.data) }
+  .init.data : AT(ADDR(.init.data) - LOAD_OFFSET) {
+       INIT_DATA
+  }
   . = ALIGN(16);
   .init.setup : AT(ADDR(.init.setup) - LOAD_OFFSET) {
        __setup_start = .;
@@ -169,8 +171,12 @@ SECTIONS
   }
   /* .exit.text is discard at runtime, not link time, to deal with references
      from .altinstructions and .eh_frame */
-  .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
-  .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
+  .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
+       EXIT_TEXT
+  }
+  .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) {
+       EXIT_DATA
+  }
 #if defined(CONFIG_BLK_DEV_INITRD)
   . = ALIGN(4096);
   .init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET) {
index ba8ea97abd219359d6e7c68541e375dbe97f5c7a..ea5386944e67e75da4e88712272ed251cad93761 100644 (file)
@@ -155,12 +155,15 @@ SECTIONS
   __init_begin = .;
   .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
        _sinittext = .;
-       *(.init.text)
+       INIT_TEXT
        _einittext = .;
   }
-  __initdata_begin = .;
-  .init.data : AT(ADDR(.init.data) - LOAD_OFFSET) { *(.init.data) }
-  __initdata_end = .;
+  .init.data : AT(ADDR(.init.data) - LOAD_OFFSET) {
+       __initdata_begin = .;
+       INIT_DATA
+       __initdata_end = .;
+   }
+
   . = ALIGN(16);
   __setup_start = .;
   .init.setup : AT(ADDR(.init.setup) - LOAD_OFFSET) { *(.init.setup) }
@@ -187,8 +190,12 @@ SECTIONS
   }
   /* .exit.text is discard at runtime, not link time, to deal with references
      from .altinstructions and .eh_frame */
-  .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
-  .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
+  .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
+       EXIT_TEXT
+  }
+  .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) {
+       EXIT_DATA
+  }
 
 /* vdso blob that is mapped into user space */
   vdso_start = . ;
index ac4ed52034dbfb44c00564e9af7020750f0a75aa..7d0f55a4982d41ba997e4ec557807d4a768fde23 100644 (file)
@@ -136,13 +136,13 @@ SECTIONS
   __init_begin = .;
   .init.text : {
        _sinittext = .;
-       *(.init.literal) *(.init.text)
+       *(.init.literal) INIT_TEXT
        _einittext = .;
   }
 
   .init.data :
   {
-    *(.init.data)
+    INIT_DATA
     . = ALIGN(0x4);
     __tagtable_begin = .;
     *(.taglist)
@@ -278,8 +278,9 @@ SECTIONS
   /* Sections to be discarded */
   /DISCARD/ :
   {
-       *(.exit.literal .exit.text)
-       *(.exit.data)
+       *(.exit.literal)
+       EXIT_TEXT
+       EXIT_DATA
         *(.exitcall.exit)
   }
 
index 10aec22a8f98b2478fcc7c8d01b8b55bce88c035..64e304a2f884e4395fdff94d46da28d717a6c79b 100644 (file)
@@ -1,9 +1,5 @@
 #
 # Makefile for the Linux/Xtensa-specific parts of the memory manager.
 #
-# Note! Dependencies are done automagically by 'make dep', which also
-# removes any old dependencies. DON'T put your own dependencies here
-# unless it's something special (ie not a .c file).
-#
 
 obj-y   := init.o fault.o tlb.o misc.o cache.o
index 5b394e9620e5c70a6b104bbc818c23b039711eca..af96e314d71fa7a228abcb371ff862246410f9c9 100644 (file)
@@ -3,11 +3,6 @@
 # Makefile for the Xtensa Instruction Set Simulator (ISS)
 # "prom monitor" library routines under Linux.
 #
-# Note! Dependencies are done automagically by 'make dep', which also
-# removes any old dependencies. DON'T put your own dependencies here
-# unless it's something special (ie not a .c file).
-#
-# Note 2! The CFLAGS definitions are in the main makefile...
 
 obj-y                  = io.o console.o setup.o network.o
 
index 06a86fe6a78d77a1c2d0a3b7e84003456c430933..de28dfd3b96c6ff4db97ed85320f017a8542e799 100644 (file)
@@ -2,9 +2,5 @@ obj-$(CONFIG_PM)        += sysfs.o
 obj-$(CONFIG_PM_SLEEP) += main.o
 obj-$(CONFIG_PM_TRACE) += trace.o
 
-ifeq ($(CONFIG_DEBUG_DRIVER),y)
-EXTRA_CFLAGS += -DDEBUG
-endif
-ifeq ($(CONFIG_PM_VERBOSE),y)
-EXTRA_CFLAGS += -DDEBUG
-endif
+ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
+ccflags-$(CONFIG_PM_VERBOSE)   += -DDEBUG
index 36b98989b15ecaf79274a547a0733a42859fb0d5..7e7b5a66f042ad096b9960cdb6e6ebf66e759d6a 100644 (file)
@@ -1,5 +1,4 @@
-EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/cxgb3 \
-               -I$(TOPDIR)/drivers/infiniband/hw/cxgb3/core
+EXTRA_CFLAGS += -Idrivers/net/cxgb3
 
 obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o
 
index b242cee656e77e5bf1a57069e15db0517b389ee2..80e3f03b5041e0348f766b14caca2fa601efa82f 100644 (file)
@@ -31,8 +31,8 @@ extern struct rio_route_ops __end_rio_route_ops[];
 
 /* Helpers internal to the RIO core code */
 #define DECLARE_RIO_ROUTE_SECTION(section, vid, did, add_hook, get_hook)  \
-        static struct rio_route_ops __rio_route_ops __attribute_used__   \
-               __attribute__((__section__(#section))) = { vid, did, add_hook, get_hook };
+       static struct rio_route_ops __rio_route_ops __used   \
+       __section(section)= { vid, did, add_hook, get_hook };
 
 /**
  * DECLARE_RIO_ROUTE_OPS - Registers switch routing operations
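
The rio.h change is an attribute-spelling cleanup (__attribute_used__ becomes __used, the open-coded __attribute__((__section__())) becomes __section()), but the construct it touches is the classic "register by dropping a struct into a named section" pattern: every DECLARE_RIO_ROUTE_SECTION() instance lands in one section, and the core walks the entries between start/end markers such as the __end_rio_route_ops declared above. Below is a userspace sketch of the same pattern, assuming an ELF target where GNU ld synthesizes __start_<section>/__stop_<section> symbols for orphan sections; struct demo_ops and DECLARE_DEMO_OPS are stand-ins, not the RIO API.

/* Registration via a named section, walked through linker-provided
 * start/stop symbols. */
#include <stdio.h>

struct demo_ops {
        int vid;
        int did;
};

#define DECLARE_DEMO_OPS(name, vid, did)                                \
        static const struct demo_ops name __attribute__((__used__))    \
        __attribute__((__section__("demo_ops"))) = { vid, did }

DECLARE_DEMO_OPS(ops_a, 0x0038, 0x0210);
DECLARE_DEMO_OPS(ops_b, 0x0038, 0x0211);

extern const struct demo_ops __start_demo_ops[];
extern const struct demo_ops __stop_demo_ops[];

int main(void)
{
        const struct demo_ops *p;

        /* iterate every entry the linker collected into the section */
        for (p = __start_demo_ops; p < __stop_demo_ops; p++)
                printf("vid=0x%04x did=0x%04x\n", p->vid, p->did);
        return 0;
}
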
index 9656139d2e990ce8c11e6ed34a2b61182da332ff..219ec06a8c7e2cd057a28898c4827776a1abf0c3 100644 (file)
@@ -236,6 +236,7 @@ config JBD_DEBUG
 
 config JBD2
        tristate
+       select CRC32
        help
          This is a generic journaling layer for block devices that support
          both 32-bit and 64-bit block numbers.  It is currently used by
index 33fe39ad4e0327a91e925c10ad54629c5f7c763a..0cc3597c11971380716d10083eee9fa7660f82dd 100644 (file)
@@ -546,11 +546,11 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
        dentry->d_op = &afs_fs_dentry_operations;
 
        d_add(dentry, inode);
-       _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%lu }",
+       _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%llu }",
               fid.vnode,
               fid.unique,
               dentry->d_inode->i_ino,
-              dentry->d_inode->i_version);
+              (unsigned long long)dentry->d_inode->i_version);
 
        return NULL;
 }
@@ -630,9 +630,10 @@ static int afs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
                 * been deleted and replaced, and the original vnode ID has
                 * been reused */
                if (fid.unique != vnode->fid.unique) {
-                       _debug("%s: file deleted (uq %u -> %u I:%lu)",
+                       _debug("%s: file deleted (uq %u -> %u I:%llu)",
                               dentry->d_name.name, fid.unique,
-                              vnode->fid.unique, dentry->d_inode->i_version);
+                              vnode->fid.unique,
+                              (unsigned long long)dentry->d_inode->i_version);
                        spin_lock(&vnode->lock);
                        set_bit(AFS_VNODE_DELETED, &vnode->flags);
                        spin_unlock(&vnode->lock);
index d196840127c6bdf7174defa94fc7184c25e2ad92..84750c8e9f95be1b07ba30ed7ab4778cf1ae9599 100644 (file)
@@ -301,7 +301,8 @@ int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
 
        inode = dentry->d_inode;
 
-       _enter("{ ino=%lu v=%lu }", inode->i_ino, inode->i_version);
+       _enter("{ ino=%lu v=%llu }", inode->i_ino,
+               (unsigned long long)inode->i_version);
 
        generic_fillattr(inode, stat);
        return 0;
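
The afs hunks only touch debug format strings: i_version is a 64-bit value, so it is printed through %llu with an explicit unsigned long long cast, the one pairing that stays correct whether the underlying 64-bit type is unsigned long or unsigned long long on a given build. A trivial standalone illustration of the idiom:

/* Print a 64-bit value portably: cast to unsigned long long, use %llu. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t i_version = 0x0123456789abcdefULL;

        printf("v=%llu\n", (unsigned long long)i_version);
        return 0;
}
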
index 7249e014819e1a0432621d1d2576fd259f27632d..456c9ab7705b0fce1f7299ee455c26614330fb34 100644 (file)
@@ -3213,6 +3213,50 @@ static int buffer_cpu_notify(struct notifier_block *self,
        return NOTIFY_OK;
 }
 
+/**
+ * bh_uptodate_or_lock: Test whether the buffer is uptodate
+ * @bh: struct buffer_head
+ *
+ * Return true if the buffer is up-to-date and false,
+ * with the buffer locked, if not.
+ */
+int bh_uptodate_or_lock(struct buffer_head *bh)
+{
+       if (!buffer_uptodate(bh)) {
+               lock_buffer(bh);
+               if (!buffer_uptodate(bh))
+                       return 0;
+               unlock_buffer(bh);
+       }
+       return 1;
+}
+EXPORT_SYMBOL(bh_uptodate_or_lock);
+
+/**
+ * bh_submit_read: Submit a locked buffer for reading
+ * @bh: struct buffer_head
+ *
+ * Returns zero on success and -EIO on error.
+ */
+int bh_submit_read(struct buffer_head *bh)
+{
+       BUG_ON(!buffer_locked(bh));
+
+       if (buffer_uptodate(bh)) {
+               unlock_buffer(bh);
+               return 0;
+       }
+
+       get_bh(bh);
+       bh->b_end_io = end_buffer_read_sync;
+       submit_bh(READ, bh);
+       wait_on_buffer(bh);
+       if (buffer_uptodate(bh))
+               return 0;
+       return -EIO;
+}
+EXPORT_SYMBOL(bh_submit_read);
+
 void __init buffer_init(void)
 {
        int nrpages;
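
The two helpers added above factor out a common buffer-cache pattern: test the uptodate bit, lock and re-test, and only then submit a synchronous read; read_block_bitmap() further down in this series is converted to exactly that sequence. The mock below reproduces only the calling convention in userspace; struct mock_bh and the stubs are stand-ins, not kernel API.

/* Userspace mock of the bh_uptodate_or_lock()/bh_submit_read() flow. */
#include <stdio.h>

struct mock_bh {
        int uptodate;
        int locked;
};

static void lock_bh(struct mock_bh *bh)   { bh->locked = 1; }
static void unlock_bh(struct mock_bh *bh) { bh->locked = 0; }

/* 1 if already up to date; otherwise 0 with the buffer left locked */
static int mock_uptodate_or_lock(struct mock_bh *bh)
{
        if (!bh->uptodate) {
                lock_bh(bh);
                if (!bh->uptodate)
                        return 0;
                unlock_bh(bh);
        }
        return 1;
}

/* caller holds the lock; pretend to read the block, then unlock */
static int mock_submit_read(struct mock_bh *bh)
{
        if (bh->uptodate) {
                unlock_bh(bh);
                return 0;
        }
        bh->uptodate = 1;       /* stands in for the completed I/O */
        unlock_bh(bh);
        return 0;
}

int main(void)
{
        struct mock_bh bh = { 0, 0 };

        if (!mock_uptodate_or_lock(&bh) && mock_submit_read(&bh) < 0)
                return 1;
        printf("uptodate=%d locked=%d\n", bh.uptodate, bh.locked);
        return 0;
}
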
index da8cb3b3592c96ade1614ab8e553b5e16597b42a..ffdc022cae64adb62b5108d259f08fa3057b1423 100644 (file)
@@ -1376,7 +1376,7 @@ static int do_atm_ioctl(unsigned int fd, unsigned int cmd32, unsigned long arg)
         return -EINVAL;
 }
 
-static __attribute_used__ int 
+static __used int
 ret_einval(unsigned int fd, unsigned int cmd, unsigned long arg)
 {
        return -EINVAL;
index 154e25f13d772235e7c7d242a0b5e6fdbf977cd7..6abaf75163f0b17b1cbfb9b8b6e7ad563930bfdc 100644 (file)
@@ -680,11 +680,31 @@ static int ext2_check_descriptors (struct super_block * sb)
 static loff_t ext2_max_size(int bits)
 {
        loff_t res = EXT2_NDIR_BLOCKS;
-       /* This constant is calculated to be the largest file size for a
-        * dense, 4k-blocksize file such that the total number of
+       int meta_blocks;
+       loff_t upper_limit;
+
+       /* This is calculated to be the largest file size for a
+        * dense file such that the total number of
         * sectors in the file, including data and all indirect blocks,
-        * does not exceed 2^32. */
-       const loff_t upper_limit = 0x1ff7fffd000LL;
+        * does not exceed 2^32 -1
+        * __u32 i_blocks representing the total number of
+        * 512 bytes blocks of the file
+        */
+       upper_limit = (1LL << 32) - 1;
+
+       /* total blocks in file system block size */
+       upper_limit >>= (bits - 9);
+
+
+       /* indirect blocks */
+       meta_blocks = 1;
+       /* double indirect blocks */
+       meta_blocks += 1 + (1LL << (bits-2));
+       /* triple indirect blocks */
+       meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2)));
+
+       upper_limit -= meta_blocks;
+       upper_limit <<= bits;
 
        res += 1LL << (bits-2);
        res += 1LL << (2*(bits-2));
@@ -692,6 +712,10 @@ static loff_t ext2_max_size(int bits)
        res <<= bits;
        if (res > upper_limit)
                res = upper_limit;
+
+       if (res > MAX_LFS_FILESIZE)
+               res = MAX_LFS_FILESIZE;
+
        return res;
 }
 
index cb14de1502c35783fd89d11f4b362cd28faaf91c..f3675cc630e97a7c5ab42251193050872957c5c1 100644 (file)
@@ -1436,11 +1436,31 @@ static void ext3_orphan_cleanup (struct super_block * sb,
 static loff_t ext3_max_size(int bits)
 {
        loff_t res = EXT3_NDIR_BLOCKS;
-       /* This constant is calculated to be the largest file size for a
-        * dense, 4k-blocksize file such that the total number of
+       int meta_blocks;
+       loff_t upper_limit;
+
+       /* This is calculated to be the largest file size for a
+        * dense file such that the total number of
         * sectors in the file, including data and all indirect blocks,
-        * does not exceed 2^32. */
-       const loff_t upper_limit = 0x1ff7fffd000LL;
+        * does not exceed 2^32 -1
+        * __u32 i_blocks representing the total number of
+        * 512 bytes blocks of the file
+        */
+       upper_limit = (1LL << 32) - 1;
+
+       /* total blocks in file system block size */
+       upper_limit >>= (bits - 9);
+
+
+       /* indirect blocks */
+       meta_blocks = 1;
+       /* double indirect blocks */
+       meta_blocks += 1 + (1LL << (bits-2));
+       /* triple indirect blocks */
+       meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2)));
+
+       upper_limit -= meta_blocks;
+       upper_limit <<= bits;
 
        res += 1LL << (bits-2);
        res += 1LL << (2*(bits-2));
@@ -1448,6 +1468,10 @@ static loff_t ext3_max_size(int bits)
        res <<= bits;
        if (res > upper_limit)
                res = upper_limit;
+
+       if (res > MAX_LFS_FILESIZE)
+               res = MAX_LFS_FILESIZE;
+
        return res;
 }
 
index ae6e7e502ac9c0d5585370ba0c051410ab766ad8..ac6fa8ca0a2f1b7b3a99f03accce58f6f666e209 100644 (file)
@@ -6,7 +6,7 @@ obj-$(CONFIG_EXT4DEV_FS) += ext4dev.o
 
 ext4dev-y      := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
                   ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
-                  ext4_jbd2.o
+                  ext4_jbd2.o migrate.o mballoc.o
 
 ext4dev-$(CONFIG_EXT4DEV_FS_XATTR)     += xattr.o xattr_user.o xattr_trusted.o
 ext4dev-$(CONFIG_EXT4DEV_FS_POSIX_ACL) += acl.o
index 71ee95e534fdcb6c7128aa843accea7b9a082eec..ac75ea953d83fbfe3d3919783f85c963d905c264 100644 (file)
@@ -29,7 +29,7 @@
  * Calculate the block group number and offset, given a block number
  */
 void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
-               unsigned long *blockgrpp, ext4_grpblk_t *offsetp)
+               ext4_group_t *blockgrpp, ext4_grpblk_t *offsetp)
 {
        struct ext4_super_block *es = EXT4_SB(sb)->s_es;
        ext4_grpblk_t offset;
@@ -46,7 +46,7 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
 /* Initializes an uninitialized block bitmap if given, and returns the
  * number of blocks free in the group. */
 unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
-                               int block_group, struct ext4_group_desc *gdp)
+                ext4_group_t block_group, struct ext4_group_desc *gdp)
 {
        unsigned long start;
        int bit, bit_max;
@@ -60,7 +60,7 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
                 * essentially implementing a per-group read-only flag. */
                if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
                        ext4_error(sb, __FUNCTION__,
-                                  "Checksum bad for group %u\n", block_group);
+                                 "Checksum bad for group %lu\n", block_group);
                        gdp->bg_free_blocks_count = 0;
                        gdp->bg_free_inodes_count = 0;
                        gdp->bg_itable_unused = 0;
@@ -153,7 +153,7 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
  *                     group descriptor
  */
 struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
-                                            unsigned int block_group,
+                                            ext4_group_t block_group,
                                             struct buffer_head ** bh)
 {
        unsigned long group_desc;
@@ -164,7 +164,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
        if (block_group >= sbi->s_groups_count) {
                ext4_error (sb, "ext4_get_group_desc",
                            "block_group >= groups_count - "
-                           "block_group = %d, groups_count = %lu",
+                           "block_group = %lu, groups_count = %lu",
                            block_group, sbi->s_groups_count);
 
                return NULL;
@@ -176,7 +176,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
        if (!sbi->s_group_desc[group_desc]) {
                ext4_error (sb, "ext4_get_group_desc",
                            "Group descriptor not loaded - "
-                           "block_group = %d, group_desc = %lu, desc = %lu",
+                           "block_group = %lu, group_desc = %lu, desc = %lu",
                             block_group, group_desc, offset);
                return NULL;
        }
@@ -189,18 +189,70 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
        return desc;
 }
 
+static int ext4_valid_block_bitmap(struct super_block *sb,
+                                       struct ext4_group_desc *desc,
+                                       unsigned int block_group,
+                                       struct buffer_head *bh)
+{
+       ext4_grpblk_t offset;
+       ext4_grpblk_t next_zero_bit;
+       ext4_fsblk_t bitmap_blk;
+       ext4_fsblk_t group_first_block;
+
+       if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
+               /* with FLEX_BG, the inode/block bitmaps and itable
+                * blocks may not be in the group at all
+                * so the bitmap validation will be skipped for those groups
+                * or it has to also read the block group where the bitmaps
+                * are located to verify they are set.
+                */
+               return 1;
+       }
+       group_first_block = ext4_group_first_block_no(sb, block_group);
+
+       /* check whether block bitmap block number is set */
+       bitmap_blk = ext4_block_bitmap(sb, desc);
+       offset = bitmap_blk - group_first_block;
+       if (!ext4_test_bit(offset, bh->b_data))
+               /* bad block bitmap */
+               goto err_out;
+
+       /* check whether the inode bitmap block number is set */
+       bitmap_blk = ext4_inode_bitmap(sb, desc);
+       offset = bitmap_blk - group_first_block;
+       if (!ext4_test_bit(offset, bh->b_data))
+               /* bad block bitmap */
+               goto err_out;
+
+       /* check whether the inode table block number is set */
+       bitmap_blk = ext4_inode_table(sb, desc);
+       offset = bitmap_blk - group_first_block;
+       next_zero_bit = ext4_find_next_zero_bit(bh->b_data,
+                               offset + EXT4_SB(sb)->s_itb_per_group,
+                               offset);
+       if (next_zero_bit >= offset + EXT4_SB(sb)->s_itb_per_group)
+               /* good bitmap for inode tables */
+               return 1;
+
+err_out:
+       ext4_error(sb, __FUNCTION__,
+                       "Invalid block bitmap - "
+                       "block_group = %d, block = %llu",
+                       block_group, bitmap_blk);
+       return 0;
+}
 /**
  * read_block_bitmap()
  * @sb:                        super block
  * @block_group:       given block group
  *
- * Read the bitmap for a given block_group, reading into the specified
- * slot in the superblock's bitmap cache.
+ * Read the bitmap for a given block_group, and validate the
+ * bits for block/inode/inode tables are set in the bitmaps
  *
  * Return buffer_head on success or NULL in case of failure.
  */
 struct buffer_head *
-read_block_bitmap(struct super_block *sb, unsigned int block_group)
+read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 {
        struct ext4_group_desc * desc;
        struct buffer_head * bh = NULL;
@@ -210,25 +262,36 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
        if (!desc)
                return NULL;
        bitmap_blk = ext4_block_bitmap(sb, desc);
+       bh = sb_getblk(sb, bitmap_blk);
+       if (unlikely(!bh)) {
+               ext4_error(sb, __FUNCTION__,
+                           "Cannot read block bitmap - "
+                           "block_group = %d, block_bitmap = %llu",
+                           (int)block_group, (unsigned long long)bitmap_blk);
+               return NULL;
+       }
+       if (bh_uptodate_or_lock(bh))
+               return bh;
+
        if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
-               bh = sb_getblk(sb, bitmap_blk);
-               if (!buffer_uptodate(bh)) {
-                       lock_buffer(bh);
-                       if (!buffer_uptodate(bh)) {
-                               ext4_init_block_bitmap(sb, bh, block_group,
-                                                      desc);
-                               set_buffer_uptodate(bh);
-                       }
-                       unlock_buffer(bh);
-               }
-       } else {
-               bh = sb_bread(sb, bitmap_blk);
+               ext4_init_block_bitmap(sb, bh, block_group, desc);
+               set_buffer_uptodate(bh);
+               unlock_buffer(bh);
+               return bh;
        }
-       if (!bh)
-               ext4_error (sb, __FUNCTION__,
+       if (bh_submit_read(bh) < 0) {
+               put_bh(bh);
+               ext4_error(sb, __FUNCTION__,
                            "Cannot read block bitmap - "
                            "block_group = %d, block_bitmap = %llu",
-                           block_group, bitmap_blk);
+                           (int)block_group, (unsigned long long)bitmap_blk);
+               return NULL;
+       }
+       if (!ext4_valid_block_bitmap(sb, desc, block_group, bh)) {
+               put_bh(bh);
+               return NULL;
+       }
+
        return bh;
 }
 /*
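
ext4_valid_block_bitmap(), added in the hunk above, sanity-checks a freshly read block bitmap before it is used: within the group's own bitmap, the bits for the block bitmap block, the inode bitmap block and every inode-table block must be set (FLEX_BG groups are exempt because their metadata may live elsewhere). The sketch below reproduces those three checks over a plain byte array; test_bit/set_bit/find_next_zero_bit here are simplified local helpers, not the kernel bitops.

/* The three validity checks from ext4_valid_block_bitmap(), in userspace. */
#include <stdio.h>
#include <string.h>

static int test_bit(const unsigned char *map, unsigned int nr)
{
        return (map[nr / 8] >> (nr % 8)) & 1;
}

static void set_bit(unsigned char *map, unsigned int nr)
{
        map[nr / 8] |= 1u << (nr % 8);
}

static unsigned int find_next_zero_bit(const unsigned char *map,
                                       unsigned int size, unsigned int start)
{
        unsigned int nr;

        for (nr = start; nr < size; nr++)
                if (!test_bit(map, nr))
                        return nr;
        return size;
}

int main(void)
{
        unsigned char bitmap[64];
        unsigned int block_bitmap = 3, inode_bitmap = 4;
        unsigned int itable = 5, itb_per_group = 16, i;

        memset(bitmap, 0, sizeof(bitmap));
        set_bit(bitmap, block_bitmap);          /* as mkfs would */
        set_bit(bitmap, inode_bitmap);
        for (i = 0; i < itb_per_group; i++)
                set_bit(bitmap, itable + i);

        if (!test_bit(bitmap, block_bitmap) ||
            !test_bit(bitmap, inode_bitmap) ||
            find_next_zero_bit(bitmap, itable + itb_per_group, itable) <
                                itable + itb_per_group)
                printf("invalid block bitmap\n");
        else
                printf("bitmap looks sane\n");
        return 0;
}
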
@@ -320,7 +383,7 @@ restart:
  */
 static int
 goal_in_my_reservation(struct ext4_reserve_window *rsv, ext4_grpblk_t grp_goal,
-                       unsigned int group, struct super_block * sb)
+                       ext4_group_t group, struct super_block *sb)
 {
        ext4_fsblk_t group_first_block, group_last_block;
 
@@ -463,7 +526,7 @@ static inline int rsv_is_empty(struct ext4_reserve_window *rsv)
  * when setting the reservation window size through ioctl before the file
  * is open for write (needs block allocation).
  *
- * Needs truncate_mutex protection prior to call this function.
+ * Needs down_write(i_data_sem) protection prior to call this function.
  */
 void ext4_init_block_alloc_info(struct inode *inode)
 {
@@ -514,6 +577,8 @@ void ext4_discard_reservation(struct inode *inode)
        struct ext4_reserve_window_node *rsv;
        spinlock_t *rsv_lock = &EXT4_SB(inode->i_sb)->s_rsv_window_lock;
 
+       ext4_mb_discard_inode_preallocations(inode);
+
        if (!block_i)
                return;
 
@@ -540,7 +605,7 @@ void ext4_free_blocks_sb(handle_t *handle, struct super_block *sb,
 {
        struct buffer_head *bitmap_bh = NULL;
        struct buffer_head *gd_bh;
-       unsigned long block_group;
+       ext4_group_t block_group;
        ext4_grpblk_t bit;
        unsigned long i;
        unsigned long overflow;
@@ -587,11 +652,13 @@ do_more:
            in_range(ext4_inode_bitmap(sb, desc), block, count) ||
            in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
            in_range(block + count - 1, ext4_inode_table(sb, desc),
-                    sbi->s_itb_per_group))
+                    sbi->s_itb_per_group)) {
                ext4_error (sb, "ext4_free_blocks",
                            "Freeing blocks in system zones - "
                            "Block = %llu, count = %lu",
                            block, count);
+               goto error_return;
+       }
 
        /*
         * We are about to start releasing blocks in the bitmap,
@@ -720,19 +787,29 @@ error_return:
  * @inode:             inode
  * @block:             start physical block to free
  * @count:             number of blocks to count
+ * @metadata:          Are these metadata blocks
  */
 void ext4_free_blocks(handle_t *handle, struct inode *inode,
-                       ext4_fsblk_t block, unsigned long count)
+                       ext4_fsblk_t block, unsigned long count,
+                       int metadata)
 {
        struct super_block * sb;
        unsigned long dquot_freed_blocks;
 
+       /* this isn't the right place to decide whether block is metadata
+        * inode.c/extents.c knows better, but for safety ... */
+       if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode) ||
+                       ext4_should_journal_data(inode))
+               metadata = 1;
+
        sb = inode->i_sb;
-       if (!sb) {
-               printk ("ext4_free_blocks: nonexistent device");
-               return;
-       }
-       ext4_free_blocks_sb(handle, sb, block, count, &dquot_freed_blocks);
+
+       if (!test_opt(sb, MBALLOC) || !EXT4_SB(sb)->s_group_info)
+               ext4_free_blocks_sb(handle, sb, block, count,
+                                               &dquot_freed_blocks);
+       else
+               ext4_mb_free_blocks(handle, inode, block, count,
+                                               metadata, &dquot_freed_blocks);
        if (dquot_freed_blocks)
                DQUOT_FREE_BLOCK(inode, dquot_freed_blocks);
        return;
@@ -920,9 +997,10 @@ claim_block(spinlock_t *lock, ext4_grpblk_t block, struct buffer_head *bh)
  * ext4_journal_release_buffer(), else we'll run out of credits.
  */
 static ext4_grpblk_t
-ext4_try_to_allocate(struct super_block *sb, handle_t *handle, int group,
-                       struct buffer_head *bitmap_bh, ext4_grpblk_t grp_goal,
-                       unsigned long *count, struct ext4_reserve_window *my_rsv)
+ext4_try_to_allocate(struct super_block *sb, handle_t *handle,
+                       ext4_group_t group, struct buffer_head *bitmap_bh,
+                       ext4_grpblk_t grp_goal, unsigned long *count,
+                       struct ext4_reserve_window *my_rsv)
 {
        ext4_fsblk_t group_first_block;
        ext4_grpblk_t start, end;
@@ -1156,7 +1234,7 @@ static int find_next_reservable_window(
  */
 static int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv,
                ext4_grpblk_t grp_goal, struct super_block *sb,
-               unsigned int group, struct buffer_head *bitmap_bh)
+               ext4_group_t group, struct buffer_head *bitmap_bh)
 {
        struct ext4_reserve_window_node *search_head;
        ext4_fsblk_t group_first_block, group_end_block, start_block;
@@ -1354,7 +1432,7 @@ static void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv,
  */
 static ext4_grpblk_t
 ext4_try_to_allocate_with_rsv(struct super_block *sb, handle_t *handle,
-                       unsigned int group, struct buffer_head *bitmap_bh,
+                       ext4_group_t group, struct buffer_head *bitmap_bh,
                        ext4_grpblk_t grp_goal,
                        struct ext4_reserve_window_node * my_rsv,
                        unsigned long *count, int *errp)
@@ -1510,7 +1588,7 @@ int ext4_should_retry_alloc(struct super_block *sb, int *retries)
 }
 
 /**
- * ext4_new_blocks() -- core block(s) allocation function
+ * ext4_new_blocks_old() -- core block(s) allocation function
  * @handle:            handle to this transaction
  * @inode:             file inode
  * @goal:              given target block(filesystem wide)
@@ -1523,17 +1601,17 @@ int ext4_should_retry_alloc(struct super_block *sb, int *retries)
  * any specific goal block.
  *
  */
-ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
+ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
                        ext4_fsblk_t goal, unsigned long *count, int *errp)
 {
        struct buffer_head *bitmap_bh = NULL;
        struct buffer_head *gdp_bh;
-       unsigned long group_no;
-       int goal_group;
+       ext4_group_t group_no;
+       ext4_group_t goal_group;
        ext4_grpblk_t grp_target_blk;   /* blockgroup relative goal block */
        ext4_grpblk_t grp_alloc_blk;    /* blockgroup-relative allocated block*/
        ext4_fsblk_t ret_block;         /* filesyetem-wide allocated block */
-       int bgi;                        /* blockgroup iteration index */
+       ext4_group_t bgi;                       /* blockgroup iteration index */
        int fatal = 0, err;
        int performed_allocation = 0;
        ext4_grpblk_t free_blocks;      /* number of free blocks in a group */
@@ -1544,10 +1622,7 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
        struct ext4_reserve_window_node *my_rsv = NULL;
        struct ext4_block_alloc_info *block_i;
        unsigned short windowsz = 0;
-#ifdef EXT4FS_DEBUG
-       static int goal_hits, goal_attempts;
-#endif
-       unsigned long ngroups;
+       ext4_group_t ngroups;
        unsigned long num = *count;
 
        *errp = -ENOSPC;
@@ -1567,7 +1642,7 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 
        sbi = EXT4_SB(sb);
        es = EXT4_SB(sb)->s_es;
-       ext4_debug("goal=%lu.\n", goal);
+       ext4_debug("goal=%llu.\n", goal);
        /*
         * Allocate a block from reservation only when
         * filesystem is mounted with reservation(default,-o reservation), and
@@ -1677,7 +1752,7 @@ retry_alloc:
 
 allocated:
 
-       ext4_debug("using block group %d(%d)\n",
+       ext4_debug("using block group %lu(%d)\n",
                        group_no, gdp->bg_free_blocks_count);
 
        BUFFER_TRACE(gdp_bh, "get_write_access");
@@ -1692,11 +1767,13 @@ allocated:
            in_range(ret_block, ext4_inode_table(sb, gdp),
                     EXT4_SB(sb)->s_itb_per_group) ||
            in_range(ret_block + num - 1, ext4_inode_table(sb, gdp),
-                    EXT4_SB(sb)->s_itb_per_group))
+                    EXT4_SB(sb)->s_itb_per_group)) {
                ext4_error(sb, "ext4_new_block",
                            "Allocating block in system zone - "
                            "blocks from %llu, length %lu",
                             ret_block, num);
+               goto out;
+       }
 
        performed_allocation = 1;
 
@@ -1743,9 +1820,6 @@ allocated:
         * list of some description.  We don't know in advance whether
         * the caller wants to use it as metadata or data.
         */
-       ext4_debug("allocating block %lu. Goal hits %d of %d.\n",
-                       ret_block, goal_hits, goal_attempts);
-
        spin_lock(sb_bgl_lock(sbi, group_no));
        if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
                gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
@@ -1787,13 +1861,46 @@ out:
 }
 
 ext4_fsblk_t ext4_new_block(handle_t *handle, struct inode *inode,
-                       ext4_fsblk_t goal, int *errp)
+               ext4_fsblk_t goal, int *errp)
 {
-       unsigned long count = 1;
+       struct ext4_allocation_request ar;
+       ext4_fsblk_t ret;
 
-       return ext4_new_blocks(handle, inode, goal, &count, errp);
+       if (!test_opt(inode->i_sb, MBALLOC)) {
+               unsigned long count = 1;
+               ret = ext4_new_blocks_old(handle, inode, goal, &count, errp);
+               return ret;
+       }
+
+       memset(&ar, 0, sizeof(ar));
+       ar.inode = inode;
+       ar.goal = goal;
+       ar.len = 1;
+       ret = ext4_mb_new_blocks(handle, &ar, errp);
+       return ret;
+}
+
+ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
+               ext4_fsblk_t goal, unsigned long *count, int *errp)
+{
+       struct ext4_allocation_request ar;
+       ext4_fsblk_t ret;
+
+       if (!test_opt(inode->i_sb, MBALLOC)) {
+               ret = ext4_new_blocks_old(handle, inode, goal, count, errp);
+               return ret;
+       }
+
+       memset(&ar, 0, sizeof(ar));
+       ar.inode = inode;
+       ar.goal = goal;
+       ar.len = *count;
+       ret = ext4_mb_new_blocks(handle, &ar, errp);
+       *count = ar.len;
+       return ret;
 }
 
+
 /**
  * ext4_count_free_blocks() -- count filesystem free blocks
  * @sb:                superblock
@@ -1804,8 +1911,8 @@ ext4_fsblk_t ext4_count_free_blocks(struct super_block *sb)
 {
        ext4_fsblk_t desc_count;
        struct ext4_group_desc *gdp;
-       int i;
-       unsigned long ngroups = EXT4_SB(sb)->s_groups_count;
+       ext4_group_t i;
+       ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count;
 #ifdef EXT4FS_DEBUG
        struct ext4_super_block *es;
        ext4_fsblk_t bitmap_count;
@@ -1829,14 +1936,14 @@ ext4_fsblk_t ext4_count_free_blocks(struct super_block *sb)
                        continue;
 
                x = ext4_count_free(bitmap_bh, sb->s_blocksize);
-               printk("group %d: stored = %d, counted = %lu\n",
+               printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n",
                        i, le16_to_cpu(gdp->bg_free_blocks_count), x);
                bitmap_count += x;
        }
        brelse(bitmap_bh);
        printk("ext4_count_free_blocks: stored = %llu"
                ", computed = %llu, %llu\n",
-              EXT4_FREE_BLOCKS_COUNT(es),
+               ext4_free_blocks_count(es),
                desc_count, bitmap_count);
        return bitmap_count;
 #else
@@ -1853,7 +1960,7 @@ ext4_fsblk_t ext4_count_free_blocks(struct super_block *sb)
 #endif
 }
 
-static inline int test_root(int a, int b)
+static inline int test_root(ext4_group_t a, int b)
 {
        int num = b;
 
@@ -1862,7 +1969,7 @@ static inline int test_root(int a, int b)
        return num == a;
 }
 
-static int ext4_group_sparse(int group)
+static int ext4_group_sparse(ext4_group_t group)
 {
        if (group <= 1)
                return 1;
@@ -1880,7 +1987,7 @@ static int ext4_group_sparse(int group)
  *     Return the number of blocks used by the superblock (primary or backup)
  *     in this group.  Currently this will be only 0 or 1.
  */
-int ext4_bg_has_super(struct super_block *sb, int group)
+int ext4_bg_has_super(struct super_block *sb, ext4_group_t group)
 {
        if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
                                EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER) &&
@@ -1889,18 +1996,20 @@ int ext4_bg_has_super(struct super_block *sb, int group)
        return 1;
 }
 
-static unsigned long ext4_bg_num_gdb_meta(struct super_block *sb, int group)
+static unsigned long ext4_bg_num_gdb_meta(struct super_block *sb,
+                                       ext4_group_t group)
 {
        unsigned long metagroup = group / EXT4_DESC_PER_BLOCK(sb);
-       unsigned long first = metagroup * EXT4_DESC_PER_BLOCK(sb);
-       unsigned long last = first + EXT4_DESC_PER_BLOCK(sb) - 1;
+       ext4_group_t first = metagroup * EXT4_DESC_PER_BLOCK(sb);
+       ext4_group_t last = first + EXT4_DESC_PER_BLOCK(sb) - 1;
 
        if (group == first || group == first + 1 || group == last)
                return 1;
        return 0;
 }
 
-static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb, int group)
+static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb,
+                                       ext4_group_t group)
 {
        if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
                                EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER) &&
@@ -1918,7 +2027,7 @@ static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb, int group)
  *     (primary or backup) in this group.  In the future there may be a
  *     different number of descriptor blocks in each group.
  */
-unsigned long ext4_bg_num_gdb(struct super_block *sb, int group)
+unsigned long ext4_bg_num_gdb(struct super_block *sb, ext4_group_t group)
 {
        unsigned long first_meta_bg =
                        le32_to_cpu(EXT4_SB(sb)->s_es->s_first_meta_bg);
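
test_root() and ext4_group_sparse() above only change their group argument to ext4_group_t; the policy they implement is unchanged: with the sparse_super feature, backup superblocks and group descriptors are kept only in groups 0 and 1 and in groups whose number is a power of 3, 5 or 7, which is what ext4_bg_has_super() reports. A simplified standalone rendering (group_t stands in for ext4_group_t):

/* Which block groups carry a backup superblock under sparse_super. */
#include <stdio.h>

typedef unsigned long group_t;          /* stand-in for ext4_group_t */

static int test_root(group_t a, int b)
{
        long long num = b;

        while (num < (long long)a)
                num *= b;
        return num == (long long)a;
}

static int group_sparse(group_t group)
{
        if (group <= 1)
                return 1;
        return test_root(group, 3) || test_root(group, 5) ||
               test_root(group, 7);
}

int main(void)
{
        group_t g;

        for (g = 0; g < 100; g++)
                if (group_sparse(g))
                        printf("group %lu has a backup superblock\n", g);
        return 0;
}
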
index f612bef983158f4a82c3d2a4b1211a94e713019a..33888bb58144fc1660d4106dbcbd689b560d2aa0 100644 (file)
@@ -67,7 +67,7 @@ int ext4_check_dir_entry (const char * function, struct inode * dir,
                          unsigned long offset)
 {
        const char * error_msg = NULL;
-       const int rlen = le16_to_cpu(de->rec_len);
+       const int rlen = ext4_rec_len_from_disk(de->rec_len);
 
        if (rlen < EXT4_DIR_REC_LEN(1))
                error_msg = "rec_len is smaller than minimal";
@@ -124,7 +124,7 @@ static int ext4_readdir(struct file * filp,
        offset = filp->f_pos & (sb->s_blocksize - 1);
 
        while (!error && !stored && filp->f_pos < inode->i_size) {
-               unsigned long blk = filp->f_pos >> EXT4_BLOCK_SIZE_BITS(sb);
+               ext4_lblk_t blk = filp->f_pos >> EXT4_BLOCK_SIZE_BITS(sb);
                struct buffer_head map_bh;
                struct buffer_head *bh = NULL;
 
@@ -172,10 +172,10 @@ revalidate:
                                 * least that it is non-zero.  A
                                 * failure will be detected in the
                                 * dirent test below. */
-                               if (le16_to_cpu(de->rec_len) <
-                                               EXT4_DIR_REC_LEN(1))
+                               if (ext4_rec_len_from_disk(de->rec_len)
+                                               < EXT4_DIR_REC_LEN(1))
                                        break;
-                               i += le16_to_cpu(de->rec_len);
+                               i += ext4_rec_len_from_disk(de->rec_len);
                        }
                        offset = i;
                        filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -197,7 +197,7 @@ revalidate:
                                ret = stored;
                                goto out;
                        }
-                       offset += le16_to_cpu(de->rec_len);
+                       offset += ext4_rec_len_from_disk(de->rec_len);
                        if (le32_to_cpu(de->inode)) {
                                /* We might block in the next section
                                 * if the data destination is
@@ -219,7 +219,7 @@ revalidate:
                                        goto revalidate;
                                stored ++;
                        }
-                       filp->f_pos += le16_to_cpu(de->rec_len);
+                       filp->f_pos += ext4_rec_len_from_disk(de->rec_len);
                }
                offset = 0;
                brelse (bh);
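
The readdir hunks above swap raw le16_to_cpu(de->rec_len) for ext4_rec_len_from_disk(), which hides how the record length is encoded on disk (the encoding needs special handling once the block size reaches 64KB); the walk itself is unchanged: move through the directory block by adding each entry's rec_len, bailing out if a length is implausibly small. A simplified standalone walk follows; struct dirent_disk and rec_len_from_disk() are cut-down stand-ins for struct ext4_dir_entry_2 and the real helper.

/* Walk directory entries chained by rec_len, as ext4_readdir() does. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

struct dirent_disk {
        uint32_t inode;
        uint16_t rec_len;       /* bytes from this entry to the next */
        uint8_t  name_len;
        uint8_t  file_type;
        char     name[8];
};

static unsigned int rec_len_from_disk(uint16_t dlen)
{
        return dlen;    /* the real helper also handles the 64KB encoding */
}

int main(void)
{
        unsigned char block[64] __attribute__((aligned(8)));
        struct dirent_disk *de;
        unsigned int offset = 0;

        memset(block, 0, sizeof(block));
        de = (struct dirent_disk *)block;               /* "."  */
        de->inode = 2; de->name_len = 1; de->rec_len = 16;
        memcpy(de->name, ".", 1);
        de = (struct dirent_disk *)(block + 16);        /* ".." */
        de->inode = 2; de->name_len = 2;
        de->rec_len = sizeof(block) - 16;               /* last entry */
        memcpy(de->name, "..", 2);

        while (offset < sizeof(block)) {
                de = (struct dirent_disk *)(block + offset);
                if (rec_len_from_disk(de->rec_len) < sizeof(*de))
                        break;                          /* corrupt entry */
                if (de->inode)
                        printf("%.*s\n", de->name_len, de->name);
                offset += rec_len_from_disk(de->rec_len);
        }
        return 0;
}
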
index 85287742f2ae487562be3a8ac39e40774a5de9d3..bc7081f1fbe80dd5add381c5746b1bbc74e4883a 100644 (file)
@@ -61,7 +61,7 @@ static ext4_fsblk_t ext_pblock(struct ext4_extent *ex)
  * idx_pblock:
  * combine low and high parts of a leaf physical block number into ext4_fsblk_t
  */
-static ext4_fsblk_t idx_pblock(struct ext4_extent_idx *ix)
+ext4_fsblk_t idx_pblock(struct ext4_extent_idx *ix)
 {
        ext4_fsblk_t block;
 
@@ -75,7 +75,7 @@ static ext4_fsblk_t idx_pblock(struct ext4_extent_idx *ix)
  * stores a large physical block number into an extent struct,
  * breaking it into parts
  */
-static void ext4_ext_store_pblock(struct ext4_extent *ex, ext4_fsblk_t pb)
+void ext4_ext_store_pblock(struct ext4_extent *ex, ext4_fsblk_t pb)
 {
        ex->ee_start_lo = cpu_to_le32((unsigned long) (pb & 0xffffffff));
        ex->ee_start_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0xffff);
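
idx_pblock() and ext4_ext_store_pblock() lose their static qualifier so the combine/split helpers can be used outside extents.c; the layout they deal with stores a 48-bit physical block number as a 32-bit low half (ee_start_lo) plus a 16-bit high half (ee_start_hi). A host-endian sketch of the round trip (struct demo_extent is a stand-in; the real helpers additionally go through le32/le16 conversion):

/* Split and recombine a 48-bit physical block number, extent-style. */
#include <stdio.h>
#include <stdint.h>

struct demo_extent {
        uint32_t ee_start_lo;   /* low 32 bits of the physical block */
        uint16_t ee_start_hi;   /* high 16 bits of the physical block */
};

static void store_pblock(struct demo_extent *ex, uint64_t pb)
{
        ex->ee_start_lo = (uint32_t)(pb & 0xffffffff);
        ex->ee_start_hi = (uint16_t)(((pb >> 31) >> 1) & 0xffff);
}

static uint64_t pblock(const struct demo_extent *ex)
{
        uint64_t block = ex->ee_start_lo;

        block |= ((uint64_t)ex->ee_start_hi << 31) << 1;
        return block;
}

int main(void)
{
        struct demo_extent ex;
        uint64_t pb = 0x123456789abcULL;        /* a 48-bit block number */

        store_pblock(&ex, pb);
        printf("lo=0x%08x hi=0x%04x -> 0x%llx\n", ex.ee_start_lo,
               ex.ee_start_hi, (unsigned long long)pblock(&ex));
        return 0;
}
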
@@ -144,7 +144,7 @@ static int ext4_ext_dirty(handle_t *handle, struct inode *inode,
 
 static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
                              struct ext4_ext_path *path,
-                             ext4_fsblk_t block)
+                             ext4_lblk_t block)
 {
        struct ext4_inode_info *ei = EXT4_I(inode);
        ext4_fsblk_t bg_start;
@@ -367,13 +367,14 @@ static void ext4_ext_drop_refs(struct ext4_ext_path *path)
  * the header must be checked before calling this
  */
 static void
-ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int block)
+ext4_ext_binsearch_idx(struct inode *inode,
+                       struct ext4_ext_path *path, ext4_lblk_t block)
 {
        struct ext4_extent_header *eh = path->p_hdr;
        struct ext4_extent_idx *r, *l, *m;
 
 
-       ext_debug("binsearch for %d(idx):  ", block);
+       ext_debug("binsearch for %u(idx):  ", block);
 
        l = EXT_FIRST_INDEX(eh) + 1;
        r = EXT_LAST_INDEX(eh);
@@ -425,7 +426,8 @@ ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int bloc
  * the header must be checked before calling this
  */
 static void
-ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block)
+ext4_ext_binsearch(struct inode *inode,
+               struct ext4_ext_path *path, ext4_lblk_t block)
 {
        struct ext4_extent_header *eh = path->p_hdr;
        struct ext4_extent *r, *l, *m;
@@ -438,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block)
                return;
        }
 
-       ext_debug("binsearch for %d:  ", block);
+       ext_debug("binsearch for %u:  ", block);
 
        l = EXT_FIRST_EXTENT(eh) + 1;
        r = EXT_LAST_EXTENT(eh);
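
ext4_ext_binsearch() and ext4_ext_binsearch_idx() now take an ext4_lblk_t, but the algorithm is unchanged: within one tree node, binary-search for the rightmost entry whose first logical block does not exceed the target. A standalone rendering over a plain sorted array (struct demo_extent and find_extent() are stand-ins; like the kernel code, it returns the first entry when the target precedes every extent and leaves the coverage check to the caller):

/* Rightmost entry with ee_block <= target, via binary search. */
#include <stdio.h>
#include <stdint.h>

struct demo_extent {
        uint32_t ee_block;      /* first logical block covered */
        uint16_t ee_len;
};

static const struct demo_extent *find_extent(const struct demo_extent *eh,
                                             int entries, uint32_t block)
{
        const struct demo_extent *l = eh + 1;
        const struct demo_extent *r = eh + entries - 1;
        const struct demo_extent *m;

        while (l <= r) {
                m = l + (r - l) / 2;
                if (block < m->ee_block)
                        r = m - 1;
                else
                        l = m + 1;
        }
        return l - 1;
}

int main(void)
{
        const struct demo_extent map[] = {
                { 0, 8 }, { 16, 4 }, { 40, 12 }, { 100, 1 },
        };
        const uint32_t targets[] = { 5, 17, 39, 120 };
        unsigned int i;

        for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
                const struct demo_extent *ex = find_extent(map, 4, targets[i]);

                printf("block %u -> extent at %u (len %u)\n",
                       targets[i], ex->ee_block, ex->ee_len);
        }
        return 0;
}
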
@@ -494,7 +496,8 @@ int ext4_ext_tree_init(handle_t *handle, struct inode *inode)
 }
 
 struct ext4_ext_path *
-ext4_ext_find_extent(struct inode *inode, int block, struct ext4_ext_path *path)
+ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
+                                       struct ext4_ext_path *path)
 {
        struct ext4_extent_header *eh;
        struct buffer_head *bh;
@@ -763,7 +766,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
        while (k--) {
                oldblock = newblock;
                newblock = ablocks[--a];
-               bh = sb_getblk(inode->i_sb, (ext4_fsblk_t)newblock);
+               bh = sb_getblk(inode->i_sb, newblock);
                if (!bh) {
                        err = -EIO;
                        goto cleanup;
@@ -783,9 +786,8 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
                fidx->ei_block = border;
                ext4_idx_store_pblock(fidx, oldblock);
 
-               ext_debug("int.index at %d (block %llu): %lu -> %llu\n", i,
-                               newblock, (unsigned long) le32_to_cpu(border),
-                               oldblock);
+               ext_debug("int.index at %d (block %llu): %u -> %llu\n",
+                               i, newblock, le32_to_cpu(border), oldblock);
                /* copy indexes */
                m = 0;
                path[i].p_idx++;
@@ -851,7 +853,7 @@ cleanup:
                for (i = 0; i < depth; i++) {
                        if (!ablocks[i])
                                continue;
-                       ext4_free_blocks(handle, inode, ablocks[i], 1);
+                       ext4_free_blocks(handle, inode, ablocks[i], 1, 1);
                }
        }
        kfree(ablocks);
@@ -979,8 +981,8 @@ repeat:
                /* refill path */
                ext4_ext_drop_refs(path);
                path = ext4_ext_find_extent(inode,
-                                           le32_to_cpu(newext->ee_block),
-                                           path);
+                                   (ext4_lblk_t)le32_to_cpu(newext->ee_block),
+                                   path);
                if (IS_ERR(path))
                        err = PTR_ERR(path);
        } else {
@@ -992,8 +994,8 @@ repeat:
                /* refill path */
                ext4_ext_drop_refs(path);
                path = ext4_ext_find_extent(inode,
-                                           le32_to_cpu(newext->ee_block),
-                                           path);
+                                  (ext4_lblk_t)le32_to_cpu(newext->ee_block),
+                                   path);
                if (IS_ERR(path)) {
                        err = PTR_ERR(path);
                        goto out;
@@ -1014,6 +1016,150 @@ out:
        return err;
 }
 
+/*
+ * search the closest allocated block to the left for *logical
+ * and returns it at @logical + its physical address at @phys
+ * if *logical is the smallest allocated block, the function
+ * returns 0 at @phys
+ * return value contains 0 (success) or error code
+ */
+int
+ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
+                       ext4_lblk_t *logical, ext4_fsblk_t *phys)
+{
+       struct ext4_extent_idx *ix;
+       struct ext4_extent *ex;
+       int depth, ee_len;
+
+       BUG_ON(path == NULL);
+       depth = path->p_depth;
+       *phys = 0;
+
+       if (depth == 0 && path->p_ext == NULL)
+               return 0;
+
+       /* usually extent in the path covers blocks smaller
+        * than *logical, but it can be that extent is the
+        * first one in the file */
+
+       ex = path[depth].p_ext;
+       ee_len = ext4_ext_get_actual_len(ex);
+       if (*logical < le32_to_cpu(ex->ee_block)) {
+               BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+               while (--depth >= 0) {
+                       ix = path[depth].p_idx;
+                       BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+               }
+               return 0;
+       }
+
+       BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
+
+       *logical = le32_to_cpu(ex->ee_block) + ee_len - 1;
+       *phys = ext_pblock(ex) + ee_len - 1;
+       return 0;
+}
+
+/*
+ * search the closest allocated block to the right for *logical
+ * and returns it at @logical + its physical address at @phys
+ * if *logical is the smallest allocated block, the function
+ * returns 0 at @phys
+ * return value contains 0 (success) or error code
+ */
+int
+ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
+                       ext4_lblk_t *logical, ext4_fsblk_t *phys)
+{
+       struct buffer_head *bh = NULL;
+       struct ext4_extent_header *eh;
+       struct ext4_extent_idx *ix;
+       struct ext4_extent *ex;
+       ext4_fsblk_t block;
+       int depth, ee_len;
+
+       BUG_ON(path == NULL);
+       depth = path->p_depth;
+       *phys = 0;
+
+       if (depth == 0 && path->p_ext == NULL)
+               return 0;
+
+       /* usually extent in the path covers blocks smaller
+        * than *logical, but it can be that extent is the
+        * first one in the file */
+
+       ex = path[depth].p_ext;
+       ee_len = ext4_ext_get_actual_len(ex);
+       if (*logical < le32_to_cpu(ex->ee_block)) {
+               BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+               while (--depth >= 0) {
+                       ix = path[depth].p_idx;
+                       BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+               }
+               *logical = le32_to_cpu(ex->ee_block);
+               *phys = ext_pblock(ex);
+               return 0;
+       }
+
+       BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
+
+       if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) {
+               /* next allocated block in this leaf */
+               ex++;
+               *logical = le32_to_cpu(ex->ee_block);
+               *phys = ext_pblock(ex);
+               return 0;
+       }
+
+       /* go up and search for index to the right */
+       while (--depth >= 0) {
+               ix = path[depth].p_idx;
+               if (ix != EXT_LAST_INDEX(path[depth].p_hdr))
+                       break;
+       }
+
+       if (depth < 0) {
+               /* we've gone up to the root and
+                * found no index to the right */
+               return 0;
+       }
+
+       /* we've found index to the right, let's
+        * follow it and find the closest allocated
+        * block to the right */
+       ix++;
+       block = idx_pblock(ix);
+       while (++depth < path->p_depth) {
+               bh = sb_bread(inode->i_sb, block);
+               if (bh == NULL)
+                       return -EIO;
+               eh = ext_block_hdr(bh);
+               if (ext4_ext_check_header(inode, eh, depth)) {
+                       put_bh(bh);
+                       return -EIO;
+               }
+               ix = EXT_FIRST_INDEX(eh);
+               block = idx_pblock(ix);
+               put_bh(bh);
+       }
+
+       bh = sb_bread(inode->i_sb, block);
+       if (bh == NULL)
+               return -EIO;
+       eh = ext_block_hdr(bh);
+       if (ext4_ext_check_header(inode, eh, path->p_depth - depth)) {
+               put_bh(bh);
+               return -EIO;
+       }
+       ex = EXT_FIRST_EXTENT(eh);
+       *logical = le32_to_cpu(ex->ee_block);
+       *phys = ext_pblock(ex);
+       put_bh(bh);
+       return 0;
+
+}
+
 /*
  * ext4_ext_next_allocated_block:
  * returns allocated block in subsequent extent or EXT_MAX_BLOCK.
@@ -1021,7 +1167,7 @@ out:
  * allocated block. Thus, index entries have to be consistent
  * with leaves.
  */
-static unsigned long
+static ext4_lblk_t
 ext4_ext_next_allocated_block(struct ext4_ext_path *path)
 {
        int depth;
@@ -1054,7 +1200,7 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
  * ext4_ext_next_leaf_block:
  * returns first allocated block from next leaf or EXT_MAX_BLOCK
  */
-static unsigned ext4_ext_next_leaf_block(struct inode *inode,
+static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
                                        struct ext4_ext_path *path)
 {
        int depth;
@@ -1072,7 +1218,8 @@ static unsigned ext4_ext_next_leaf_block(struct inode *inode,
        while (depth >= 0) {
                if (path[depth].p_idx !=
                                EXT_LAST_INDEX(path[depth].p_hdr))
-                 return le32_to_cpu(path[depth].p_idx[1].ei_block);
+                       return (ext4_lblk_t)
+                               le32_to_cpu(path[depth].p_idx[1].ei_block);
                depth--;
        }
 
@@ -1085,7 +1232,7 @@ static unsigned ext4_ext_next_leaf_block(struct inode *inode,
  * then we have to correct all indexes above.
  * TODO: do we need to correct tree in all cases?
  */
-int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,
+static int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,
                                struct ext4_ext_path *path)
 {
        struct ext4_extent_header *eh;
@@ -1171,7 +1318,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
        if (ext1_ee_len + ext2_ee_len > max_len)
                return 0;
 #ifdef AGGRESSIVE_TEST
-       if (le16_to_cpu(ex1->ee_len) >= 4)
+       if (ext1_ee_len >= 4)
                return 0;
 #endif
 
@@ -1239,7 +1386,7 @@ unsigned int ext4_ext_check_overlap(struct inode *inode,
                                    struct ext4_extent *newext,
                                    struct ext4_ext_path *path)
 {
-       unsigned long b1, b2;
+       ext4_lblk_t b1, b2;
        unsigned int depth, len1;
        unsigned int ret = 0;
 
@@ -1260,7 +1407,7 @@ unsigned int ext4_ext_check_overlap(struct inode *inode,
                        goto out;
        }
 
-       /* check for wrap through zero */
+       /* check for wrap through zero on extent logical start block*/
        if (b1 + len1 < b1) {
                len1 = EXT_MAX_BLOCK - b1;
                newext->ee_len = cpu_to_le16(len1);
@@ -1290,7 +1437,8 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
        struct ext4_extent *ex, *fex;
        struct ext4_extent *nearex; /* nearest extent */
        struct ext4_ext_path *npath = NULL;
-       int depth, len, err, next;
+       int depth, len, err;
+       ext4_lblk_t next;
        unsigned uninitialized = 0;
 
        BUG_ON(ext4_ext_get_actual_len(newext) == 0);
@@ -1435,114 +1583,8 @@ cleanup:
        return err;
 }
 
-int ext4_ext_walk_space(struct inode *inode, unsigned long block,
-                       unsigned long num, ext_prepare_callback func,
-                       void *cbdata)
-{
-       struct ext4_ext_path *path = NULL;
-       struct ext4_ext_cache cbex;
-       struct ext4_extent *ex;
-       unsigned long next, start = 0, end = 0;
-       unsigned long last = block + num;
-       int depth, exists, err = 0;
-
-       BUG_ON(func == NULL);
-       BUG_ON(inode == NULL);
-
-       while (block < last && block != EXT_MAX_BLOCK) {
-               num = last - block;
-               /* find extent for this block */
-               path = ext4_ext_find_extent(inode, block, path);
-               if (IS_ERR(path)) {
-                       err = PTR_ERR(path);
-                       path = NULL;
-                       break;
-               }
-
-               depth = ext_depth(inode);
-               BUG_ON(path[depth].p_hdr == NULL);
-               ex = path[depth].p_ext;
-               next = ext4_ext_next_allocated_block(path);
-
-               exists = 0;
-               if (!ex) {
-                       /* there is no extent yet, so try to allocate
-                        * all requested space */
-                       start = block;
-                       end = block + num;
-               } else if (le32_to_cpu(ex->ee_block) > block) {
-                       /* need to allocate space before found extent */
-                       start = block;
-                       end = le32_to_cpu(ex->ee_block);
-                       if (block + num < end)
-                               end = block + num;
-               } else if (block >= le32_to_cpu(ex->ee_block)
-                                       + ext4_ext_get_actual_len(ex)) {
-                       /* need to allocate space after found extent */
-                       start = block;
-                       end = block + num;
-                       if (end >= next)
-                               end = next;
-               } else if (block >= le32_to_cpu(ex->ee_block)) {
-                       /*
-                        * some part of requested space is covered
-                        * by found extent
-                        */
-                       start = block;
-                       end = le32_to_cpu(ex->ee_block)
-                               + ext4_ext_get_actual_len(ex);
-                       if (block + num < end)
-                               end = block + num;
-                       exists = 1;
-               } else {
-                       BUG();
-               }
-               BUG_ON(end <= start);
-
-               if (!exists) {
-                       cbex.ec_block = start;
-                       cbex.ec_len = end - start;
-                       cbex.ec_start = 0;
-                       cbex.ec_type = EXT4_EXT_CACHE_GAP;
-               } else {
-                       cbex.ec_block = le32_to_cpu(ex->ee_block);
-                       cbex.ec_len = ext4_ext_get_actual_len(ex);
-                       cbex.ec_start = ext_pblock(ex);
-                       cbex.ec_type = EXT4_EXT_CACHE_EXTENT;
-               }
-
-               BUG_ON(cbex.ec_len == 0);
-               err = func(inode, path, &cbex, cbdata);
-               ext4_ext_drop_refs(path);
-
-               if (err < 0)
-                       break;
-               if (err == EXT_REPEAT)
-                       continue;
-               else if (err == EXT_BREAK) {
-                       err = 0;
-                       break;
-               }
-
-               if (ext_depth(inode) != depth) {
-                       /* depth was changed. we have to realloc path */
-                       kfree(path);
-                       path = NULL;
-               }
-
-               block = cbex.ec_block + cbex.ec_len;
-       }
-
-       if (path) {
-               ext4_ext_drop_refs(path);
-               kfree(path);
-       }
-
-       return err;
-}
-
 static void
-ext4_ext_put_in_cache(struct inode *inode, __u32 block,
+ext4_ext_put_in_cache(struct inode *inode, ext4_lblk_t block,
                        __u32 len, ext4_fsblk_t start, int type)
 {
        struct ext4_ext_cache *cex;
@@ -1561,10 +1603,11 @@ ext4_ext_put_in_cache(struct inode *inode, __u32 block,
  */
 static void
 ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
-                               unsigned long block)
+                               ext4_lblk_t block)
 {
        int depth = ext_depth(inode);
-       unsigned long lblock, len;
+       unsigned long len;
+       ext4_lblk_t lblock;
        struct ext4_extent *ex;
 
        ex = path[depth].p_ext;
@@ -1576,32 +1619,34 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
        } else if (block < le32_to_cpu(ex->ee_block)) {
                lblock = block;
                len = le32_to_cpu(ex->ee_block) - block;
-               ext_debug("cache gap(before): %lu [%lu:%lu]",
-                               (unsigned long) block,
-                               (unsigned long) le32_to_cpu(ex->ee_block),
-                               (unsigned long) ext4_ext_get_actual_len(ex));
+               ext_debug("cache gap(before): %u [%u:%u]",
+                               block,
+                               le32_to_cpu(ex->ee_block),
+                                ext4_ext_get_actual_len(ex));
        } else if (block >= le32_to_cpu(ex->ee_block)
                        + ext4_ext_get_actual_len(ex)) {
+               ext4_lblk_t next;
                lblock = le32_to_cpu(ex->ee_block)
                        + ext4_ext_get_actual_len(ex);
-               len = ext4_ext_next_allocated_block(path);
-               ext_debug("cache gap(after): [%lu:%lu] %lu",
-                               (unsigned long) le32_to_cpu(ex->ee_block),
-                               (unsigned long) ext4_ext_get_actual_len(ex),
-                               (unsigned long) block);
-               BUG_ON(len == lblock);
-               len = len - lblock;
+
+               next = ext4_ext_next_allocated_block(path);
+               ext_debug("cache gap(after): [%u:%u] %u",
+                               le32_to_cpu(ex->ee_block),
+                               ext4_ext_get_actual_len(ex),
+                               block);
+               BUG_ON(next == lblock);
+               len = next - lblock;
        } else {
                lblock = len = 0;
                BUG();
        }
 
-       ext_debug(" -> %lu:%lu\n", (unsigned long) lblock, len);
+       ext_debug(" -> %u:%lu\n", lblock, len);
        ext4_ext_put_in_cache(inode, lblock, len, 0, EXT4_EXT_CACHE_GAP);
 }
 
 static int
-ext4_ext_in_cache(struct inode *inode, unsigned long block,
+ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block,
                        struct ext4_extent *ex)
 {
        struct ext4_ext_cache *cex;
@@ -1618,11 +1663,9 @@ ext4_ext_in_cache(struct inode *inode, unsigned long block,
                ex->ee_block = cpu_to_le32(cex->ec_block);
                ext4_ext_store_pblock(ex, cex->ec_start);
                ex->ee_len = cpu_to_le16(cex->ec_len);
-               ext_debug("%lu cached by %lu:%lu:%llu\n",
-                               (unsigned long) block,
-                               (unsigned long) cex->ec_block,
-                               (unsigned long) cex->ec_len,
-                               cex->ec_start);
+               ext_debug("%u cached by %u:%u:%llu\n",
+                               block,
+                               cex->ec_block, cex->ec_len, cex->ec_start);
                return cex->ec_type;
        }
 
@@ -1636,7 +1679,7 @@ ext4_ext_in_cache(struct inode *inode, unsigned long block,
  * It's used in truncate case only, thus all requests are for
  * last index in the block only.
  */
-int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
+static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
                        struct ext4_ext_path *path)
 {
        struct buffer_head *bh;
@@ -1657,7 +1700,7 @@ int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
        ext_debug("index is empty, remove it, free block %llu\n", leaf);
        bh = sb_find_get_block(inode->i_sb, leaf);
        ext4_forget(handle, 1, inode, bh, leaf);
-       ext4_free_blocks(handle, inode, leaf, 1);
+       ext4_free_blocks(handle, inode, leaf, 1, 1);
        return err;
 }
 
@@ -1666,7 +1709,7 @@ int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
  * This routine returns max. credits that the extent tree can consume.
  * It should be OK for low-performance paths like ->writepage()
  * To allow many writing processes to fit into a single transaction,
- * the caller should calculate credits under truncate_mutex and
+ * the caller should calculate credits under i_data_sem and
  * pass the actual path.
  */
 int ext4_ext_calc_credits_for_insert(struct inode *inode,
@@ -1714,12 +1757,14 @@ int ext4_ext_calc_credits_for_insert(struct inode *inode,
 
 static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
                                struct ext4_extent *ex,
-                               unsigned long from, unsigned long to)
+                               ext4_lblk_t from, ext4_lblk_t to)
 {
        struct buffer_head *bh;
        unsigned short ee_len =  ext4_ext_get_actual_len(ex);
-       int i;
+       int i, metadata = 0;
 
+       if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
+               metadata = 1;
 #ifdef EXTENTS_STATS
        {
                struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -1738,42 +1783,45 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
        if (from >= le32_to_cpu(ex->ee_block)
            && to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
                /* tail removal */
-               unsigned long num;
+               ext4_lblk_t num;
                ext4_fsblk_t start;
+
                num = le32_to_cpu(ex->ee_block) + ee_len - from;
                start = ext_pblock(ex) + ee_len - num;
-               ext_debug("free last %lu blocks starting %llu\n", num, start);
+               ext_debug("free last %u blocks starting %llu\n", num, start);
                for (i = 0; i < num; i++) {
                        bh = sb_find_get_block(inode->i_sb, start + i);
                        ext4_forget(handle, 0, inode, bh, start + i);
                }
-               ext4_free_blocks(handle, inode, start, num);
+               ext4_free_blocks(handle, inode, start, num, metadata);
        } else if (from == le32_to_cpu(ex->ee_block)
                   && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
-               printk("strange request: removal %lu-%lu from %u:%u\n",
+               printk(KERN_INFO "strange request: removal %u-%u from %u:%u\n",
                        from, to, le32_to_cpu(ex->ee_block), ee_len);
        } else {
-               printk("strange request: removal(2) %lu-%lu from %u:%u\n",
-                       from, to, le32_to_cpu(ex->ee_block), ee_len);
+               printk(KERN_INFO "strange request: removal(2) "
+                               "%u-%u from %u:%u\n",
+                               from, to, le32_to_cpu(ex->ee_block), ee_len);
        }
        return 0;
 }
 
 static int
 ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
-               struct ext4_ext_path *path, unsigned long start)
+               struct ext4_ext_path *path, ext4_lblk_t start)
 {
        int err = 0, correct_index = 0;
        int depth = ext_depth(inode), credits;
        struct ext4_extent_header *eh;
-       unsigned a, b, block, num;
-       unsigned long ex_ee_block;
+       ext4_lblk_t a, b, block;
+       unsigned num;
+       ext4_lblk_t ex_ee_block;
        unsigned short ex_ee_len;
        unsigned uninitialized = 0;
        struct ext4_extent *ex;
 
        /* the header must be checked already in ext4_ext_remove_space() */
-       ext_debug("truncate since %lu in leaf\n", start);
+       ext_debug("truncate since %u in leaf\n", start);
        if (!path[depth].p_hdr)
                path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
        eh = path[depth].p_hdr;
@@ -1904,7 +1952,7 @@ ext4_ext_more_to_rm(struct ext4_ext_path *path)
        return 1;
 }
 
-int ext4_ext_remove_space(struct inode *inode, unsigned long start)
+static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start)
 {
        struct super_block *sb = inode->i_sb;
        int depth = ext_depth(inode);
@@ -1912,7 +1960,7 @@ int ext4_ext_remove_space(struct inode *inode, unsigned long start)
        handle_t *handle;
        int i = 0, err = 0;
 
-       ext_debug("truncate since %lu\n", start);
+       ext_debug("truncate since %u\n", start);
 
        /* probably first extent we're gonna free will be last in block */
        handle = ext4_journal_start(inode, depth + 1);
@@ -2094,17 +2142,19 @@ void ext4_ext_release(struct super_block *sb)
  *   b> Splits in two extents: Write is happening at either end of the extent
  *   c> Splits in three extents: Somone is writing in middle of the extent
  */
-int ext4_ext_convert_to_initialized(handle_t *handle, struct inode *inode,
-                                       struct ext4_ext_path *path,
-                                       ext4_fsblk_t iblock,
-                                       unsigned long max_blocks)
+static int ext4_ext_convert_to_initialized(handle_t *handle,
+                                               struct inode *inode,
+                                               struct ext4_ext_path *path,
+                                               ext4_lblk_t iblock,
+                                               unsigned long max_blocks)
 {
        struct ext4_extent *ex, newex;
        struct ext4_extent *ex1 = NULL;
        struct ext4_extent *ex2 = NULL;
        struct ext4_extent *ex3 = NULL;
        struct ext4_extent_header *eh;
-       unsigned int allocated, ee_block, ee_len, depth;
+       ext4_lblk_t ee_block;
+       unsigned int allocated, ee_len, depth;
        ext4_fsblk_t newblock;
        int err = 0;
        int ret = 0;
@@ -2225,8 +2275,13 @@ out:
        return err ? err : allocated;
 }
 
+/*
+ * Need to be called with
+ * down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
+ * (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
+ */
 int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
-                       ext4_fsblk_t iblock,
+                       ext4_lblk_t iblock,
                        unsigned long max_blocks, struct buffer_head *bh_result,
                        int create, int extend_disksize)
 {
@@ -2236,11 +2291,11 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
        ext4_fsblk_t goal, newblock;
        int err = 0, depth, ret;
        unsigned long allocated = 0;
+       struct ext4_allocation_request ar;
 
        __clear_bit(BH_New, &bh_result->b_state);
-       ext_debug("blocks %d/%lu requested for inode %u\n", (int) iblock,
-                       max_blocks, (unsigned) inode->i_ino);
-       mutex_lock(&EXT4_I(inode)->truncate_mutex);
+       ext_debug("blocks %u/%lu requested for inode %u\n",
+                       iblock, max_blocks, inode->i_ino);
 
        /* check in cache */
        goal = ext4_ext_in_cache(inode, iblock, &newex);
@@ -2260,7 +2315,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
                                   - le32_to_cpu(newex.ee_block)
                                   + ext_pblock(&newex);
                        /* number of remaining blocks in the extent */
-                       allocated = le16_to_cpu(newex.ee_len) -
+                       allocated = ext4_ext_get_actual_len(&newex) -
                                        (iblock - le32_to_cpu(newex.ee_block));
                        goto out;
                } else {
@@ -2288,7 +2343,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 
        ex = path[depth].p_ext;
        if (ex) {
-               unsigned long ee_block = le32_to_cpu(ex->ee_block);
+               ext4_lblk_t ee_block = le32_to_cpu(ex->ee_block);
                ext4_fsblk_t ee_start = ext_pblock(ex);
                unsigned short ee_len;
 
@@ -2302,7 +2357,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
                        newblock = iblock - ee_block + ee_start;
                        /* number of remaining blocks in the extent */
                        allocated = ee_len - (iblock - ee_block);
-                       ext_debug("%d fit into %lu:%d -> %llu\n", (int) iblock,
+                       ext_debug("%u fit into %lu:%d -> %llu\n", iblock,
                                        ee_block, ee_len, newblock);
 
                        /* Do not put uninitialized extent in the cache */
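When the lookup above lands inside an already-allocated extent, the physical block and the count of usable blocks fall out of two subtractions. A minimal standalone sketch of that arithmetic, with made-up extent values (logical blocks 100..149 mapped at physical 8000, queried at logical block 120):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t ee_block = 100;    /* logical start of the extent (assumed) */
        uint16_t ee_len   = 50;     /* extent length in blocks */
        uint64_t ee_start = 8000;   /* physical start of the extent */
        uint32_t iblock   = 120;    /* logical block being looked up */

        if (iblock >= ee_block && iblock < ee_block + ee_len) {
                uint64_t newblock  = iblock - ee_block + ee_start;
                uint32_t allocated = ee_len - (iblock - ee_block);
                printf("%u maps to %llu, %u blocks usable from here\n",
                       iblock, (unsigned long long)newblock, allocated);
        }
        return 0;
}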
@@ -2320,9 +2375,10 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
                        ret = ext4_ext_convert_to_initialized(handle, inode,
                                                                path, iblock,
                                                                max_blocks);
-                       if (ret <= 0)
+                       if (ret <= 0) {
+                               err = ret;
                                goto out2;
-                       else
+                       } else
                                allocated = ret;
                        goto outnew;
                }
@@ -2347,8 +2403,15 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
        if (S_ISREG(inode->i_mode) && (!EXT4_I(inode)->i_block_alloc_info))
                ext4_init_block_alloc_info(inode);
 
-       /* allocate new block */
-       goal = ext4_ext_find_goal(inode, path, iblock);
+       /* find neighbour allocated blocks */
+       ar.lleft = iblock;
+       err = ext4_ext_search_left(inode, path, &ar.lleft, &ar.pleft);
+       if (err)
+               goto out2;
+       ar.lright = iblock;
+       err = ext4_ext_search_right(inode, path, &ar.lright, &ar.pright);
+       if (err)
+               goto out2;
 
        /*
         * See if request is beyond maximum number of blocks we can have in
@@ -2368,10 +2431,21 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
        newex.ee_len = cpu_to_le16(max_blocks);
        err = ext4_ext_check_overlap(inode, &newex, path);
        if (err)
-               allocated = le16_to_cpu(newex.ee_len);
+               allocated = ext4_ext_get_actual_len(&newex);
        else
                allocated = max_blocks;
-       newblock = ext4_new_blocks(handle, inode, goal, &allocated, &err);
+
+       /* allocate new block */
+       ar.inode = inode;
+       ar.goal = ext4_ext_find_goal(inode, path, iblock);
+       ar.logical = iblock;
+       ar.len = allocated;
+       if (S_ISREG(inode->i_mode))
+               ar.flags = EXT4_MB_HINT_DATA;
+       else
+               /* disable in-core preallocation for non-regular files */
+               ar.flags = 0;
+       newblock = ext4_mb_new_blocks(handle, &ar, &err);
        if (!newblock)
                goto out2;
        ext_debug("allocate new block: goal %llu, found %llu/%lu\n",
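With the switch to ext4_mb_new_blocks(), the extent path now describes the allocation as a request: the logical block, the desired length, the physical goal, the nearest allocated neighbours found by ext4_ext_search_left()/ext4_ext_search_right(), and a hint flag that enables in-core preallocation only for regular files. The sketch below assembles such a request using a simplified stand-in struct; the field names follow the hunk above, but the struct definition and the values are illustrative, not the kernel's.

#include <stdio.h>
#include <stdint.h>

/* Simplified stand-in for struct ext4_allocation_request (illustrative). */
struct alloc_request {
        uint32_t logical;          /* logical block being mapped (ar.logical) */
        uint64_t goal;             /* preferred physical block (ar.goal)      */
        uint32_t len;              /* number of blocks wanted (ar.len)        */
        uint32_t lleft, lright;    /* logical neighbours (ar.lleft/lright)    */
        uint64_t pleft, pright;    /* physical neighbours (ar.pleft/pright)   */
        unsigned int flags;        /* allocation hint, e.g. "data" blocks     */
};

#define HINT_DATA 0x1              /* stand-in for EXT4_MB_HINT_DATA */

int main(void)
{
        struct alloc_request ar = {
                .logical = 1000,
                .goal    = 40960,
                .len     = 8,
                .lleft   = 999,  .pleft  = 40959,
                .lright  = 1200, .pright = 41200,
                .flags   = HINT_DATA,   /* regular file: preallocation allowed */
        };

        printf("request: %u blocks at logical %u, goal %llu\n",
               ar.len, ar.logical, (unsigned long long)ar.goal);
        printf("neighbours: left %u@%llu, right %u@%llu\n",
               ar.lleft, (unsigned long long)ar.pleft,
               ar.lright, (unsigned long long)ar.pright);
        return 0;
}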
@@ -2379,14 +2453,17 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 
        /* try to insert new extent into found leaf and return */
        ext4_ext_store_pblock(&newex, newblock);
-       newex.ee_len = cpu_to_le16(allocated);
+       newex.ee_len = cpu_to_le16(ar.len);
        if (create == EXT4_CREATE_UNINITIALIZED_EXT)  /* Mark uninitialized */
                ext4_ext_mark_uninitialized(&newex);
        err = ext4_ext_insert_extent(handle, inode, path, &newex);
        if (err) {
                /* free data blocks we just allocated */
+               /* not a good idea to call discard here directly,
+                * but otherwise we'd need to call it every free() */
+               ext4_mb_discard_inode_preallocations(inode);
                ext4_free_blocks(handle, inode, ext_pblock(&newex),
-                                       le16_to_cpu(newex.ee_len));
+                                       ext4_ext_get_actual_len(&newex), 0);
                goto out2;
        }
 
@@ -2395,6 +2472,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 
        /* previous routine could use block we allocated */
        newblock = ext_pblock(&newex);
+       allocated = ext4_ext_get_actual_len(&newex);
 outnew:
        __set_bit(BH_New, &bh_result->b_state);
 
@@ -2414,8 +2492,6 @@ out2:
                ext4_ext_drop_refs(path);
                kfree(path);
        }
-       mutex_unlock(&EXT4_I(inode)->truncate_mutex);
-
        return err ? err : allocated;
 }
 
@@ -2423,7 +2499,7 @@ void ext4_ext_truncate(struct inode * inode, struct page *page)
 {
        struct address_space *mapping = inode->i_mapping;
        struct super_block *sb = inode->i_sb;
-       unsigned long last_block;
+       ext4_lblk_t last_block;
        handle_t *handle;
        int err = 0;
 
@@ -2445,9 +2521,11 @@ void ext4_ext_truncate(struct inode * inode, struct page *page)
        if (page)
                ext4_block_truncate_page(handle, page, mapping, inode->i_size);
 
-       mutex_lock(&EXT4_I(inode)->truncate_mutex);
+       down_write(&EXT4_I(inode)->i_data_sem);
        ext4_ext_invalidate_cache(inode);
 
+       ext4_mb_discard_inode_preallocations(inode);
+
        /*
         * TODO: optimization is possible here.
         * Probably we need not scan at all,
@@ -2481,7 +2559,7 @@ out_stop:
        if (inode->i_nlink)
                ext4_orphan_del(handle, inode);
 
-       mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+       up_write(&EXT4_I(inode)->i_data_sem);
        ext4_journal_stop(handle);
 }
 
@@ -2516,7 +2594,8 @@ int ext4_ext_writepage_trans_blocks(struct inode *inode, int num)
 long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
 {
        handle_t *handle;
-       ext4_fsblk_t block, max_blocks;
+       ext4_lblk_t block;
+       unsigned long max_blocks;
        ext4_fsblk_t nblocks = 0;
        int ret = 0;
        int ret2 = 0;
@@ -2544,6 +2623,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
         * modify 1 super block, 1 block bitmap and 1 group descriptor.
         */
        credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3;
+       down_write((&EXT4_I(inode)->i_data_sem));
 retry:
        while (ret >= 0 && ret < max_blocks) {
                block = block + ret;
@@ -2557,12 +2637,12 @@ retry:
                ret = ext4_ext_get_blocks(handle, inode, block,
                                          max_blocks, &map_bh,
                                          EXT4_CREATE_UNINITIALIZED_EXT, 0);
-               WARN_ON(!ret);
-               if (!ret) {
+               WARN_ON(ret <= 0);
+               if (ret <= 0) {
                        ext4_error(inode->i_sb, "ext4_fallocate",
-                                  "ext4_ext_get_blocks returned 0! inode#%lu"
-                                  ", block=%llu, max_blocks=%llu",
-                                  inode->i_ino, block, max_blocks);
+                                   "ext4_ext_get_blocks returned error: "
+                                   "inode#%lu, block=%u, max_blocks=%lu",
+                                   inode->i_ino, block, max_blocks);
                        ret = -EIO;
                        ext4_mark_inode_dirty(handle, inode);
                        ret2 = ext4_journal_stop(handle);
@@ -2600,6 +2680,7 @@ retry:
        if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
                goto retry;
 
+       up_write((&EXT4_I(inode)->i_data_sem));
        /*
         * Time to update the file size.
         * Update only when preallocation was requested beyond the file size.
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 1a81cd66d63b2b2371e4c35025f004657ca66adf..ac35ec58db55c9e9fe4b4d3effde74274b614b25 100644 (file)
@@ -37,9 +37,9 @@ static int ext4_release_file (struct inode * inode, struct file * filp)
        if ((filp->f_mode & FMODE_WRITE) &&
                        (atomic_read(&inode->i_writecount) == 1))
        {
-               mutex_lock(&EXT4_I(inode)->truncate_mutex);
+               down_write(&EXT4_I(inode)->i_data_sem);
                ext4_discard_reservation(inode);
-               mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+               up_write(&EXT4_I(inode)->i_data_sem);
        }
        if (is_dx(inode) && filp->private_data)
                ext4_htree_free_dir_info(filp->private_data);
@@ -56,8 +56,25 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
        ssize_t ret;
        int err;
 
-       ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+       /*
+        * If we have encountered a bitmap-format file, the size limit
+        * is smaller than s_maxbytes, which is for extent-mapped files.
+        */
+
+       if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) {
+               struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+               size_t length = iov_length(iov, nr_segs);
 
+               if (pos > sbi->s_bitmap_maxbytes)
+                       return -EFBIG;
+
+               if (pos + length > sbi->s_bitmap_maxbytes) {
+                       nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
+                                             sbi->s_bitmap_maxbytes - pos);
+               }
+       }
+
+       ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
        /*
         * Skip flushing if there was an error, or if nothing was written.
         */
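Block-mapped (non-extent) inodes have a lower size ceiling (s_bitmap_maxbytes) than extent-mapped ones, so the write path above rejects writes that start beyond that ceiling and shortens writes that would cross it. A standalone sketch of the same clamp with a hypothetical limit; it mimics only the pos/length handling, not the kernel iovec machinery:

#include <stdio.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative: return the number of bytes that may be written,
 * or -EFBIG if the write starts past the bitmap-file size limit. */
static int64_t clamp_write(uint64_t pos, uint64_t len, uint64_t maxbytes)
{
        if (pos > maxbytes)
                return -EFBIG;
        if (pos + len > maxbytes)
                len = maxbytes - pos;   /* shorten, as iov_shorten() does */
        return (int64_t)len;
}

int main(void)
{
        uint64_t maxbytes = 1ULL << 41;   /* hypothetical limit */

        printf("%lld\n", (long long)clamp_write(maxbytes - 100, 4096, maxbytes));
        printf("%lld\n", (long long)clamp_write(maxbytes + 1, 4096, maxbytes));
        return 0;
}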
diff --git a/fs/ext4/group.h b/fs/ext4/group.h
index 1577910bb58b4135d2bbfbb025c67037cd25785a..7eb0604e7eea8f76973175fc3ee6b247a79171cb 100644 (file)
@@ -14,14 +14,16 @@ extern __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 group,
 extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group,
                                       struct ext4_group_desc *gdp);
 struct buffer_head *read_block_bitmap(struct super_block *sb,
-                                     unsigned int block_group);
+                                     ext4_group_t block_group);
 extern unsigned ext4_init_block_bitmap(struct super_block *sb,
-                                      struct buffer_head *bh, int group,
+                                      struct buffer_head *bh,
+                                      ext4_group_t group,
                                       struct ext4_group_desc *desc);
 #define ext4_free_blocks_after_init(sb, group, desc)                   \
                ext4_init_block_bitmap(sb, NULL, group, desc)
 extern unsigned ext4_init_inode_bitmap(struct super_block *sb,
-                                      struct buffer_head *bh, int group,
+                                      struct buffer_head *bh,
+                                      ext4_group_t group,
                                       struct ext4_group_desc *desc);
 extern void mark_bitmap_end(int start_bit, int end_bit, char *bitmap);
 #endif /* _LINUX_EXT4_GROUP_H */
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index c61f37fd3f05e4d72061791d79600024c858df97..575b5215c8084f3613484555afb0f5727a404826 100644 (file)
@@ -64,8 +64,8 @@ void mark_bitmap_end(int start_bit, int end_bit, char *bitmap)
 }
 
 /* Initializes an uninitialized inode bitmap */
-unsigned ext4_init_inode_bitmap(struct super_block *sb,
-                               struct buffer_head *bh, int block_group,
+unsigned ext4_init_inode_bitmap(struct super_block *sb, struct buffer_head *bh,
+                               ext4_group_t block_group,
                                struct ext4_group_desc *gdp)
 {
        struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -75,7 +75,7 @@ unsigned ext4_init_inode_bitmap(struct super_block *sb,
        /* If checksum is bad mark all blocks and inodes use to prevent
         * allocation, essentially implementing a per-group read-only flag. */
        if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
-               ext4_error(sb, __FUNCTION__, "Checksum bad for group %u\n",
+               ext4_error(sb, __FUNCTION__, "Checksum bad for group %lu\n",
                           block_group);
                gdp->bg_free_blocks_count = 0;
                gdp->bg_free_inodes_count = 0;
@@ -98,7 +98,7 @@ unsigned ext4_init_inode_bitmap(struct super_block *sb,
  * Return buffer_head of bitmap on success or NULL.
  */
 static struct buffer_head *
-read_inode_bitmap(struct super_block * sb, unsigned long block_group)
+read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 {
        struct ext4_group_desc *desc;
        struct buffer_head *bh = NULL;
@@ -152,7 +152,7 @@ void ext4_free_inode (handle_t *handle, struct inode * inode)
        unsigned long ino;
        struct buffer_head *bitmap_bh = NULL;
        struct buffer_head *bh2;
-       unsigned long block_group;
+       ext4_group_t block_group;
        unsigned long bit;
        struct ext4_group_desc * gdp;
        struct ext4_super_block * es;
@@ -260,12 +260,14 @@ error_return:
  * For other inodes, search forward from the parent directory's block
  * group to find a free inode.
  */
-static int find_group_dir(struct super_block *sb, struct inode *parent)
+static int find_group_dir(struct super_block *sb, struct inode *parent,
+                               ext4_group_t *best_group)
 {
-       int ngroups = EXT4_SB(sb)->s_groups_count;
+       ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count;
        unsigned int freei, avefreei;
        struct ext4_group_desc *desc, *best_desc = NULL;
-       int group, best_group = -1;
+       ext4_group_t group;
+       int ret = -1;
 
        freei = percpu_counter_read_positive(&EXT4_SB(sb)->s_freeinodes_counter);
        avefreei = freei / ngroups;
@@ -279,11 +281,12 @@ static int find_group_dir(struct super_block *sb, struct inode *parent)
                if (!best_desc ||
                    (le16_to_cpu(desc->bg_free_blocks_count) >
                     le16_to_cpu(best_desc->bg_free_blocks_count))) {
-                       best_group = group;
+                       *best_group = group;
                        best_desc = desc;
+                       ret = 0;
                }
        }
-       return best_group;
+       return ret;
 }
 
 /*
@@ -314,12 +317,13 @@ static int find_group_dir(struct super_block *sb, struct inode *parent)
 #define INODE_COST 64
 #define BLOCK_COST 256
 
-static int find_group_orlov(struct super_block *sb, struct inode *parent)
+static int find_group_orlov(struct super_block *sb, struct inode *parent,
+                               ext4_group_t *group)
 {
-       int parent_group = EXT4_I(parent)->i_block_group;
+       ext4_group_t parent_group = EXT4_I(parent)->i_block_group;
        struct ext4_sb_info *sbi = EXT4_SB(sb);
        struct ext4_super_block *es = sbi->s_es;
-       int ngroups = sbi->s_groups_count;
+       ext4_group_t ngroups = sbi->s_groups_count;
        int inodes_per_group = EXT4_INODES_PER_GROUP(sb);
        unsigned int freei, avefreei;
        ext4_fsblk_t freeb, avefreeb;
@@ -327,7 +331,7 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent)
        unsigned int ndirs;
        int max_debt, max_dirs, min_inodes;
        ext4_grpblk_t min_blocks;
-       int group = -1, i;
+       ext4_group_t i;
        struct ext4_group_desc *desc;
 
        freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
@@ -340,13 +344,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent)
        if ((parent == sb->s_root->d_inode) ||
            (EXT4_I(parent)->i_flags & EXT4_TOPDIR_FL)) {
                int best_ndir = inodes_per_group;
-               int best_group = -1;
+               ext4_group_t grp;
+               int ret = -1;
 
-               get_random_bytes(&group, sizeof(group));
-               parent_group = (unsigned)group % ngroups;
+               get_random_bytes(&grp, sizeof(grp));
+               parent_group = (unsigned)grp % ngroups;
                for (i = 0; i < ngroups; i++) {
-                       group = (parent_group + i) % ngroups;
-                       desc = ext4_get_group_desc (sb, group, NULL);
+                       grp = (parent_group + i) % ngroups;
+                       desc = ext4_get_group_desc(sb, grp, NULL);
                        if (!desc || !desc->bg_free_inodes_count)
                                continue;
                        if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
@@ -355,11 +360,12 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent)
                                continue;
                        if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb)
                                continue;
-                       best_group = group;
+                       *group = grp;
+                       ret = 0;
                        best_ndir = le16_to_cpu(desc->bg_used_dirs_count);
                }
-               if (best_group >= 0)
-                       return best_group;
+               if (ret == 0)
+                       return ret;
                goto fallback;
        }
 
@@ -380,8 +386,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent)
                max_debt = 1;
 
        for (i = 0; i < ngroups; i++) {
-               group = (parent_group + i) % ngroups;
-               desc = ext4_get_group_desc (sb, group, NULL);
+               *group = (parent_group + i) % ngroups;
+               desc = ext4_get_group_desc(sb, *group, NULL);
                if (!desc || !desc->bg_free_inodes_count)
                        continue;
                if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs)
@@ -390,17 +396,16 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent)
                        continue;
                if (le16_to_cpu(desc->bg_free_blocks_count) < min_blocks)
                        continue;
-               return group;
+               return 0;
        }
 
 fallback:
        for (i = 0; i < ngroups; i++) {
-               group = (parent_group + i) % ngroups;
-               desc = ext4_get_group_desc (sb, group, NULL);
-               if (!desc || !desc->bg_free_inodes_count)
-                       continue;
-               if (le16_to_cpu(desc->bg_free_inodes_count) >= avefreei)
-                       return group;
+               *group = (parent_group + i) % ngroups;
+               desc = ext4_get_group_desc(sb, *group, NULL);
+               if (desc && desc->bg_free_inodes_count &&
+                       le16_to_cpu(desc->bg_free_inodes_count) >= avefreei)
+                       return 0;
        }
 
        if (avefreei) {
@@ -415,21 +420,22 @@ fallback:
        return -1;
 }
 
-static int find_group_other(struct super_block *sb, struct inode *parent)
+static int find_group_other(struct super_block *sb, struct inode *parent,
+                               ext4_group_t *group)
 {
-       int parent_group = EXT4_I(parent)->i_block_group;
-       int ngroups = EXT4_SB(sb)->s_groups_count;
+       ext4_group_t parent_group = EXT4_I(parent)->i_block_group;
+       ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count;
        struct ext4_group_desc *desc;
-       int group, i;
+       ext4_group_t i;
 
        /*
         * Try to place the inode in its parent directory
         */
-       group = parent_group;
-       desc = ext4_get_group_desc (sb, group, NULL);
+       *group = parent_group;
+       desc = ext4_get_group_desc(sb, *group, NULL);
        if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
                        le16_to_cpu(desc->bg_free_blocks_count))
-               return group;
+               return 0;
 
        /*
         * We're going to place this inode in a different blockgroup from its
@@ -440,33 +446,33 @@ static int find_group_other(struct super_block *sb, struct inode *parent)
         *
         * So add our directory's i_ino into the starting point for the hash.
         */
-       group = (group + parent->i_ino) % ngroups;
+       *group = (*group + parent->i_ino) % ngroups;
 
        /*
         * Use a quadratic hash to find a group with a free inode and some free
         * blocks.
         */
        for (i = 1; i < ngroups; i <<= 1) {
-               group += i;
-               if (group >= ngroups)
-                       group -= ngroups;
-               desc = ext4_get_group_desc (sb, group, NULL);
+               *group += i;
+               if (*group >= ngroups)
+                       *group -= ngroups;
+               desc = ext4_get_group_desc(sb, *group, NULL);
                if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
                                le16_to_cpu(desc->bg_free_blocks_count))
-                       return group;
+                       return 0;
        }
 
        /*
         * That failed: try linear search for a free inode, even if that group
         * has no free blocks.
         */
-       group = parent_group;
+       *group = parent_group;
        for (i = 0; i < ngroups; i++) {
-               if (++group >= ngroups)
-                       group = 0;
-               desc = ext4_get_group_desc (sb, group, NULL);
+               if (++*group >= ngroups)
+                       *group = 0;
+               desc = ext4_get_group_desc(sb, *group, NULL);
                if (desc && le16_to_cpu(desc->bg_free_inodes_count))
-                       return group;
+                       return 0;
        }
 
        return -1;
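The middle loop of find_group_other() above probes block groups with a quadratic hash: starting from the hashed parent group it advances by 1, then 2, then 4, then 8 groups (cumulative offsets 1, 3, 7, 15, ...), wrapping modulo the group count, before the final fallback to a plain linear scan. A small sketch that prints the probe order for a hypothetical group count:

#include <stdio.h>

int main(void)
{
        unsigned ngroups = 16;          /* hypothetical number of block groups */
        unsigned start = 5;             /* hashed starting group */
        unsigned group = start;

        printf("start at group %u\n", start);
        for (unsigned i = 1; i < ngroups; i <<= 1) {
                group += i;             /* advance by a doubling step */
                if (group >= ngroups)
                        group -= ngroups;
                printf("advance by %u -> probe group %u\n", i, group);
        }
        return 0;
}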
@@ -487,16 +493,17 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode * dir, int mode)
        struct super_block *sb;
        struct buffer_head *bitmap_bh = NULL;
        struct buffer_head *bh2;
-       int group;
+       ext4_group_t group = 0;
        unsigned long ino = 0;
        struct inode * inode;
        struct ext4_group_desc * gdp = NULL;
        struct ext4_super_block * es;
        struct ext4_inode_info *ei;
        struct ext4_sb_info *sbi;
-       int err = 0;
+       int ret2, err = 0;
        struct inode *ret;
-       int i, free = 0;
+       ext4_group_t i;
+       int free = 0;
 
        /* Cannot create files in a deleted directory */
        if (!dir || !dir->i_nlink)
@@ -512,14 +519,14 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode * dir, int mode)
        es = sbi->s_es;
        if (S_ISDIR(mode)) {
                if (test_opt (sb, OLDALLOC))
-                       group = find_group_dir(sb, dir);
+                       ret2 = find_group_dir(sb, dir, &group);
                else
-                       group = find_group_orlov(sb, dir);
+                       ret2 = find_group_orlov(sb, dir, &group);
        } else
-               group = find_group_other(sb, dir);
+               ret2 = find_group_other(sb, dir, &group);
 
        err = -ENOSPC;
-       if (group == -1)
+       if (ret2 == -1)
                goto out;
 
        for (i = 0; i < sbi->s_groups_count; i++) {
@@ -583,7 +590,7 @@ got:
            ino > EXT4_INODES_PER_GROUP(sb)) {
                ext4_error(sb, __FUNCTION__,
                           "reserved inode or inode > inodes count - "
-                          "block_group = %d, inode=%lu", group,
+                          "block_group = %lu, inode=%lu", group,
                           ino + group * EXT4_INODES_PER_GROUP(sb));
                err = -EIO;
                goto fail;
@@ -702,7 +709,6 @@ got:
        if (!S_ISDIR(mode))
                ei->i_flags &= ~EXT4_DIRSYNC_FL;
        ei->i_file_acl = 0;
-       ei->i_dir_acl = 0;
        ei->i_dtime = 0;
        ei->i_block_alloc_info = NULL;
        ei->i_block_group = group;
@@ -741,13 +747,10 @@ got:
        if (test_opt(sb, EXTENTS)) {
                EXT4_I(inode)->i_flags |= EXT4_EXTENTS_FL;
                ext4_ext_tree_init(handle, inode);
-               if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
-                       err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
-                       if (err) goto fail;
-                       EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS);
-                       BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "call ext4_journal_dirty_metadata");
-                       err = ext4_journal_dirty_metadata(handle, EXT4_SB(sb)->s_sbh);
-               }
+               err = ext4_update_incompat_feature(handle, sb,
+                                               EXT4_FEATURE_INCOMPAT_EXTENTS);
+               if (err)
+                       goto fail;
        }
 
        ext4_debug("allocating inode %lu\n", inode->i_ino);
@@ -777,7 +780,7 @@ fail_drop:
 struct inode *ext4_orphan_get(struct super_block *sb, unsigned long ino)
 {
        unsigned long max_ino = le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count);
-       unsigned long block_group;
+       ext4_group_t block_group;
        int bit;
        struct buffer_head *bitmap_bh = NULL;
        struct inode *inode = NULL;
@@ -833,7 +836,7 @@ unsigned long ext4_count_free_inodes (struct super_block * sb)
 {
        unsigned long desc_count;
        struct ext4_group_desc *gdp;
-       int i;
+       ext4_group_t i;
 #ifdef EXT4FS_DEBUG
        struct ext4_super_block *es;
        unsigned long bitmap_count, x;
@@ -854,7 +857,7 @@ unsigned long ext4_count_free_inodes (struct super_block * sb)
                        continue;
 
                x = ext4_count_free(bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8);
-               printk("group %d: stored = %d, counted = %lu\n",
+               printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n",
                        i, le16_to_cpu(gdp->bg_free_inodes_count), x);
                bitmap_count += x;
        }
@@ -879,7 +882,7 @@ unsigned long ext4_count_free_inodes (struct super_block * sb)
 unsigned long ext4_count_dirs (struct super_block * sb)
 {
        unsigned long count = 0;
-       int i;
+       ext4_group_t i;
 
        for (i = 0; i < EXT4_SB(sb)->s_groups_count; i++) {
                struct ext4_group_desc *gdp = ext4_get_group_desc (sb, i, NULL);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5489703d95738ffb075ed7cb275aab94eb4d74e6..bb717cbb749c822265bccdd8159de7af456ad745 100644 (file)
@@ -105,7 +105,7 @@ int ext4_forget(handle_t *handle, int is_metadata, struct inode *inode,
  */
 static unsigned long blocks_for_truncate(struct inode *inode)
 {
-       unsigned long needed;
+       ext4_lblk_t needed;
 
        needed = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9);
 
@@ -243,13 +243,6 @@ static inline void add_chain(Indirect *p, struct buffer_head *bh, __le32 *v)
        p->bh = bh;
 }
 
-static int verify_chain(Indirect *from, Indirect *to)
-{
-       while (from <= to && from->key == *from->p)
-               from++;
-       return (from > to);
-}
-
 /**
  *     ext4_block_to_path - parse the block number into array of offsets
  *     @inode: inode in question (we are only interested in its superblock)
@@ -282,7 +275,8 @@ static int verify_chain(Indirect *from, Indirect *to)
  */
 
 static int ext4_block_to_path(struct inode *inode,
-                       long i_block, int offsets[4], int *boundary)
+                       ext4_lblk_t i_block,
+                       ext4_lblk_t offsets[4], int *boundary)
 {
        int ptrs = EXT4_ADDR_PER_BLOCK(inode->i_sb);
        int ptrs_bits = EXT4_ADDR_PER_BLOCK_BITS(inode->i_sb);
@@ -313,7 +307,10 @@ static int ext4_block_to_path(struct inode *inode,
                offsets[n++] = i_block & (ptrs - 1);
                final = ptrs;
        } else {
-               ext4_warning(inode->i_sb, "ext4_block_to_path", "block > big");
+               ext4_warning(inode->i_sb, "ext4_block_to_path",
+                               "block %lu > max",
+                               i_block + direct_blocks +
+                               indirect_blocks + double_blocks);
        }
        if (boundary)
                *boundary = final - 1 - (i_block & (ptrs - 1));
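ext4_block_to_path() converts a logical block number into up to four indices: a direct slot, then one offset per level of indirection; the hunk above only changes the index type and the overflow warning. As a rough worked example under common assumptions (4 KiB blocks, 12 direct slots, 1024 pointers per indirect block), logical block 500 is past the 12 direct slots and lands at single-indirect offset 500 - 12 = 488. The sketch below mirrors that arithmetic; the constants are assumptions, not taken from this patch.

#include <stdio.h>

int main(void)
{
        unsigned long block = 500;      /* logical block to map (example) */
        unsigned long ptrs = 1024;      /* pointers per 4 KiB indirect block */
        unsigned long direct = 12;      /* direct slots in the inode */

        if (block < direct) {
                printf("direct slot %lu\n", block);
        } else if ((block -= direct) < ptrs) {
                printf("single indirect, offset %lu\n", block);
        } else if ((block -= ptrs) < ptrs * ptrs) {
                printf("double indirect, offsets %lu / %lu\n",
                       block / ptrs, block % ptrs);
        } else {
                block -= ptrs * ptrs;
                printf("triple indirect, offsets %lu / %lu / %lu\n",
                       block / (ptrs * ptrs), (block / ptrs) % ptrs,
                       block % ptrs);
        }
        return 0;
}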
@@ -344,12 +341,14 @@ static int ext4_block_to_path(struct inode *inode,
  *             (pointer to last triple returned, *@err == 0)
  *     or when it gets an IO error reading an indirect block
  *             (ditto, *@err == -EIO)
- *     or when it notices that chain had been changed while it was reading
- *             (ditto, *@err == -EAGAIN)
  *     or when it reads all @depth-1 indirect blocks successfully and finds
  *     the whole chain, all way to the data (returns %NULL, *err == 0).
+ *
+ *      Need to be called with
+ *      down_read(&EXT4_I(inode)->i_data_sem)
  */
-static Indirect *ext4_get_branch(struct inode *inode, int depth, int *offsets,
+static Indirect *ext4_get_branch(struct inode *inode, int depth,
+                                ext4_lblk_t  *offsets,
                                 Indirect chain[4], int *err)
 {
        struct super_block *sb = inode->i_sb;
@@ -365,9 +364,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth, int *offsets,
                bh = sb_bread(sb, le32_to_cpu(p->key));
                if (!bh)
                        goto failure;
-               /* Reader: pointers */
-               if (!verify_chain(chain, p))
-                       goto changed;
                add_chain(++p, bh, (__le32*)bh->b_data + *++offsets);
                /* Reader: end */
                if (!p->key)
@@ -375,10 +371,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth, int *offsets,
        }
        return NULL;
 
-changed:
-       brelse(bh);
-       *err = -EAGAIN;
-       goto no_block;
 failure:
        *err = -EIO;
 no_block:
@@ -445,7 +437,7 @@ static ext4_fsblk_t ext4_find_near(struct inode *inode, Indirect *ind)
  *     stores it in *@goal and returns zero.
  */
 
-static ext4_fsblk_t ext4_find_goal(struct inode *inode, long block,
+static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
                Indirect chain[4], Indirect *partial)
 {
        struct ext4_block_alloc_info *block_i;
@@ -559,7 +551,7 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
        return ret;
 failed_out:
        for (i = 0; i <index; i++)
-               ext4_free_blocks(handle, inode, new_blocks[i], 1);
+               ext4_free_blocks(handle, inode, new_blocks[i], 1, 0);
        return ret;
 }
 
@@ -590,7 +582,7 @@ failed_out:
  */
 static int ext4_alloc_branch(handle_t *handle, struct inode *inode,
                        int indirect_blks, int *blks, ext4_fsblk_t goal,
-                       int *offsets, Indirect *branch)
+                       ext4_lblk_t *offsets, Indirect *branch)
 {
        int blocksize = inode->i_sb->s_blocksize;
        int i, n = 0;
@@ -658,9 +650,9 @@ failed:
                ext4_journal_forget(handle, branch[i].bh);
        }
        for (i = 0; i <indirect_blks; i++)
-               ext4_free_blocks(handle, inode, new_blocks[i], 1);
+               ext4_free_blocks(handle, inode, new_blocks[i], 1, 0);
 
-       ext4_free_blocks(handle, inode, new_blocks[i], num);
+       ext4_free_blocks(handle, inode, new_blocks[i], num, 0);
 
        return err;
 }
@@ -680,7 +672,7 @@ failed:
  * chain to new block and return 0.
  */
 static int ext4_splice_branch(handle_t *handle, struct inode *inode,
-                       long block, Indirect *where, int num, int blks)
+                       ext4_lblk_t block, Indirect *where, int num, int blks)
 {
        int i;
        int err = 0;
@@ -757,9 +749,10 @@ err_out:
        for (i = 1; i <= num; i++) {
                BUFFER_TRACE(where[i].bh, "call jbd2_journal_forget");
                ext4_journal_forget(handle, where[i].bh);
-               ext4_free_blocks(handle,inode,le32_to_cpu(where[i-1].key),1);
+               ext4_free_blocks(handle, inode,
+                                       le32_to_cpu(where[i-1].key), 1, 0);
        }
-       ext4_free_blocks(handle, inode, le32_to_cpu(where[num].key), blks);
+       ext4_free_blocks(handle, inode, le32_to_cpu(where[num].key), blks, 0);
 
        return err;
 }
@@ -782,14 +775,19 @@ err_out:
  * return > 0, # of blocks mapped or allocated.
  * return = 0, if plain lookup failed.
  * return < 0, error case.
+ *
+ *
+ * Need to be called with
+ * down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
+ * (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
  */
 int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
-               sector_t iblock, unsigned long maxblocks,
+               ext4_lblk_t iblock, unsigned long maxblocks,
                struct buffer_head *bh_result,
                int create, int extend_disksize)
 {
        int err = -EIO;
-       int offsets[4];
+       ext4_lblk_t offsets[4];
        Indirect chain[4];
        Indirect *partial;
        ext4_fsblk_t goal;
@@ -803,7 +801,8 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
 
        J_ASSERT(!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL));
        J_ASSERT(handle != NULL || create == 0);
-       depth = ext4_block_to_path(inode,iblock,offsets,&blocks_to_boundary);
+       depth = ext4_block_to_path(inode, iblock, offsets,
+                                       &blocks_to_boundary);
 
        if (depth == 0)
                goto out;
@@ -819,18 +818,6 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
                while (count < maxblocks && count <= blocks_to_boundary) {
                        ext4_fsblk_t blk;
 
-                       if (!verify_chain(chain, partial)) {
-                               /*
-                                * Indirect block might be removed by
-                                * truncate while we were reading it.
-                                * Handling of that case: forget what we've
-                                * got now. Flag the err as EAGAIN, so it
-                                * will reread.
-                                */
-                               err = -EAGAIN;
-                               count = 0;
-                               break;
-                       }
                        blk = le32_to_cpu(*(chain[depth-1].p + count));
 
                        if (blk == first_block + count)
@@ -838,44 +825,13 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
                        else
                                break;
                }
-               if (err != -EAGAIN)
-                       goto got_it;
+               goto got_it;
        }
 
        /* Next simple case - plain lookup or failed read of indirect block */
        if (!create || err == -EIO)
                goto cleanup;
 
-       mutex_lock(&ei->truncate_mutex);
-
-       /*
-        * If the indirect block is missing while we are reading
-        * the chain(ext4_get_branch() returns -EAGAIN err), or
-        * if the chain has been changed after we grab the semaphore,
-        * (either because another process truncated this branch, or
-        * another get_block allocated this branch) re-grab the chain to see if
-        * the request block has been allocated or not.
-        *
-        * Since we already block the truncate/other get_block
-        * at this point, we will have the current copy of the chain when we
-        * splice the branch into the tree.
-        */
-       if (err == -EAGAIN || !verify_chain(chain, partial)) {
-               while (partial > chain) {
-                       brelse(partial->bh);
-                       partial--;
-               }
-               partial = ext4_get_branch(inode, depth, offsets, chain, &err);
-               if (!partial) {
-                       count++;
-                       mutex_unlock(&ei->truncate_mutex);
-                       if (err)
-                               goto cleanup;
-                       clear_buffer_new(bh_result);
-                       goto got_it;
-               }
-       }
-
        /*
         * Okay, we need to do block allocation.  Lazily initialize the block
         * allocation info here if necessary
@@ -911,13 +867,12 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
                err = ext4_splice_branch(handle, inode, iblock,
                                        partial, indirect_blks, count);
        /*
-        * i_disksize growing is protected by truncate_mutex.  Don't forget to
+        * i_disksize growing is protected by i_data_sem.  Don't forget to
         * protect it if you're about to implement concurrent
         * ext4_get_block() -bzzz
        */
        if (!err && extend_disksize && inode->i_size > ei->i_disksize)
                ei->i_disksize = inode->i_size;
-       mutex_unlock(&ei->truncate_mutex);
        if (err)
                goto cleanup;
 
@@ -942,6 +897,47 @@ out:
 
 #define DIO_CREDITS (EXT4_RESERVE_TRANS_BLOCKS + 32)
 
+int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
+                       unsigned long max_blocks, struct buffer_head *bh,
+                       int create, int extend_disksize)
+{
+       int retval;
+       /*
+        * Try to see if we can get  the block without requesting
+        * for new file system block.
+        */
+       down_read((&EXT4_I(inode)->i_data_sem));
+       if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+               retval =  ext4_ext_get_blocks(handle, inode, block, max_blocks,
+                               bh, 0, 0);
+       } else {
+               retval = ext4_get_blocks_handle(handle,
+                               inode, block, max_blocks, bh, 0, 0);
+       }
+       up_read((&EXT4_I(inode)->i_data_sem));
+       if (!create || (retval > 0))
+               return retval;
+
+       /*
+        * We need to allocate new blocks which will result
+        * in i_data update
+        */
+       down_write((&EXT4_I(inode)->i_data_sem));
+       /*
+        * We need to check for EXT4 here because migrate
+        * could have changed the inode type in between
+        */
+       if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+               retval =  ext4_ext_get_blocks(handle, inode, block, max_blocks,
+                               bh, create, extend_disksize);
+       } else {
+               retval = ext4_get_blocks_handle(handle, inode, block,
+                               max_blocks, bh, create, extend_disksize);
+       }
+       up_write((&EXT4_I(inode)->i_data_sem));
+       return retval;
+}
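The new ext4_get_blocks_wrap() tries the lookup under a read lock and only retries under the write lock when blocks actually need to be allocated, re-checking the inode's extent flag afterwards because the mapping format may have changed (e.g. by migration) while no lock was held. A generic pthread sketch of the same "optimistic read, re-check under write" pattern; it uses a plain rwlock and a toy lookup, not the ext4 API:

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
static int mapped;              /* toy state: is the block already mapped? */

/* Illustrative pattern only: look up under the read lock; if nothing is
 * mapped and the caller wants allocation, drop the lock, take it for
 * writing, and re-check the state before allocating. */
static int get_block(int create)
{
        int found;

        pthread_rwlock_rdlock(&lock);
        found = mapped;
        pthread_rwlock_unlock(&lock);
        if (found || !create)
                return found;

        pthread_rwlock_wrlock(&lock);
        if (!mapped)            /* state may have changed in between */
                mapped = 1;     /* "allocate" */
        found = mapped;
        pthread_rwlock_unlock(&lock);
        return found;
}

int main(void)
{
        printf("lookup only: %d\n", get_block(0));
        printf("create:      %d\n", get_block(1));
        printf("lookup only: %d\n", get_block(0));
        return 0;
}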
+
 static int ext4_get_block(struct inode *inode, sector_t iblock,
                        struct buffer_head *bh_result, int create)
 {
@@ -996,7 +992,7 @@ get_block:
  * `handle' can be NULL if create is zero
  */
 struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode,
-                               long block, int create, int *errp)
+                               ext4_lblk_t block, int create, int *errp)
 {
        struct buffer_head dummy;
        int fatal = 0, err;
@@ -1063,7 +1059,7 @@ err:
 }
 
 struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
-                              int block, int create, int *err)
+                              ext4_lblk_t block, int create, int *err)
 {
        struct buffer_head * bh;
 
@@ -1446,7 +1442,7 @@ static int jbd2_journal_dirty_data_fn(handle_t *handle, struct buffer_head *bh)
  *     ext4_file_write() -> generic_file_write() -> __alloc_pages() -> ...
  *
  * Same applies to ext4_get_block().  We will deadlock on various things like
- * lock_journal and i_truncate_mutex.
+ * lock_journal and i_data_sem
  *
  * Setting PF_MEMALLOC here doesn't work - too many internal memory
  * allocations fail.
@@ -1828,7 +1824,8 @@ int ext4_block_truncate_page(handle_t *handle, struct page *page,
 {
        ext4_fsblk_t index = from >> PAGE_CACHE_SHIFT;
        unsigned offset = from & (PAGE_CACHE_SIZE-1);
-       unsigned blocksize, iblock, length, pos;
+       unsigned blocksize, length, pos;
+       ext4_lblk_t iblock;
        struct inode *inode = mapping->host;
        struct buffer_head *bh;
        int err = 0;
@@ -1964,7 +1961,7 @@ static inline int all_zeroes(__le32 *p, __le32 *q)
  *                     (no partially truncated stuff there).  */
 
 static Indirect *ext4_find_shared(struct inode *inode, int depth,
-                       int offsets[4], Indirect chain[4], __le32 *top)
+                       ext4_lblk_t offsets[4], Indirect chain[4], __le32 *top)
 {
        Indirect *partial, *p;
        int k, err;
@@ -2048,15 +2045,15 @@ static void ext4_clear_blocks(handle_t *handle, struct inode *inode,
        for (p = first; p < last; p++) {
                u32 nr = le32_to_cpu(*p);
                if (nr) {
-                       struct buffer_head *bh;
+                       struct buffer_head *tbh;
 
                        *p = 0;
-                       bh = sb_find_get_block(inode->i_sb, nr);
-                       ext4_forget(handle, 0, inode, bh, nr);
+                       tbh = sb_find_get_block(inode->i_sb, nr);
+                       ext4_forget(handle, 0, inode, tbh, nr);
                }
        }
 
-       ext4_free_blocks(handle, inode, block_to_free, count);
+       ext4_free_blocks(handle, inode, block_to_free, count, 0);
 }
 
 /**
@@ -2229,7 +2226,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
                                ext4_journal_test_restart(handle, inode);
                        }
 
-                       ext4_free_blocks(handle, inode, nr, 1);
+                       ext4_free_blocks(handle, inode, nr, 1, 1);
 
                        if (parent_bh) {
                                /*
@@ -2289,12 +2286,12 @@ void ext4_truncate(struct inode *inode)
        __le32 *i_data = ei->i_data;
        int addr_per_block = EXT4_ADDR_PER_BLOCK(inode->i_sb);
        struct address_space *mapping = inode->i_mapping;
-       int offsets[4];
+       ext4_lblk_t offsets[4];
        Indirect chain[4];
        Indirect *partial;
        __le32 nr = 0;
        int n;
-       long last_block;
+       ext4_lblk_t last_block;
        unsigned blocksize = inode->i_sb->s_blocksize;
        struct page *page;
 
@@ -2320,8 +2317,10 @@ void ext4_truncate(struct inode *inode)
                        return;
        }
 
-       if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
-               return ext4_ext_truncate(inode, page);
+       if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+               ext4_ext_truncate(inode, page);
+               return;
+       }
 
        handle = start_transaction(inode);
        if (IS_ERR(handle)) {
@@ -2369,7 +2368,7 @@ void ext4_truncate(struct inode *inode)
         * From here we block out all ext4_get_block() callers who want to
         * modify the block allocation tree.
         */
-       mutex_lock(&ei->truncate_mutex);
+       down_write(&ei->i_data_sem);
 
        if (n == 1) {           /* direct blocks */
                ext4_free_data(handle, inode, NULL, i_data+offsets[0],
@@ -2433,7 +2432,7 @@ do_indirects:
 
        ext4_discard_reservation(inode);
 
-       mutex_unlock(&ei->truncate_mutex);
+       up_write(&ei->i_data_sem);
        inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
        ext4_mark_inode_dirty(handle, inode);
 
@@ -2460,7 +2459,8 @@ out_stop:
 static ext4_fsblk_t ext4_get_inode_block(struct super_block *sb,
                unsigned long ino, struct ext4_iloc *iloc)
 {
-       unsigned long desc, group_desc, block_group;
+       unsigned long desc, group_desc;
+       ext4_group_t block_group;
        unsigned long offset;
        ext4_fsblk_t block;
        struct buffer_head *bh;
@@ -2547,7 +2547,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
                        struct ext4_group_desc *desc;
                        int inodes_per_buffer;
                        int inode_offset, i;
-                       int block_group;
+                       ext4_group_t block_group;
                        int start;
 
                        block_group = (inode->i_ino - 1) /
@@ -2660,6 +2660,28 @@ void ext4_get_inode_flags(struct ext4_inode_info *ei)
        if (flags & S_DIRSYNC)
                ei->i_flags |= EXT4_DIRSYNC_FL;
 }
+static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
+                                       struct ext4_inode_info *ei)
+{
+       blkcnt_t i_blocks ;
+       struct inode *inode = &(ei->vfs_inode);
+       struct super_block *sb = inode->i_sb;
+
+       if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+                               EXT4_FEATURE_RO_COMPAT_HUGE_FILE)) {
+               /* we are using combined 48 bit field */
+               i_blocks = ((u64)le16_to_cpu(raw_inode->i_blocks_high)) << 32 |
+                                       le32_to_cpu(raw_inode->i_blocks_lo);
+               if (ei->i_flags & EXT4_HUGE_FILE_FL) {
+                       /* i_blocks represent file system block size */
+                       return i_blocks  << (inode->i_blkbits - 9);
+               } else {
+                       return i_blocks;
+               }
+       } else {
+               return le32_to_cpu(raw_inode->i_blocks_lo);
+       }
+}
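With the HUGE_FILE read-only compat feature, i_blocks is decoded from a 48-bit on-disk value split across i_blocks_lo (low 32 bits) and i_blocks_high (next 16 bits); if the inode also carries EXT4_HUGE_FILE_FL, that count is in filesystem blocks and is converted back to 512-byte units by shifting left by (blkbits - 9). A small arithmetic sketch of the decode, with made-up field values:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t i_blocks_lo   = 0x00000010;  /* hypothetical on-disk values */
        uint16_t i_blocks_high = 0x0001;
        int huge_file_fl = 1;                 /* EXT4_HUGE_FILE_FL set */
        unsigned blkbits = 12;                /* 4 KiB filesystem blocks */

        uint64_t i_blocks = ((uint64_t)i_blocks_high << 32) | i_blocks_lo;

        if (huge_file_fl) {
                /* stored in filesystem blocks: convert to 512-byte units */
                i_blocks <<= (blkbits - 9);
        }
        printf("i_blocks (512-byte units) = %llu\n",
               (unsigned long long)i_blocks);
        return 0;
}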
 
 void ext4_read_inode(struct inode * inode)
 {
@@ -2687,7 +2709,6 @@ void ext4_read_inode(struct inode * inode)
                inode->i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
        }
        inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);
-       inode->i_size = le32_to_cpu(raw_inode->i_size);
 
        ei->i_state = 0;
        ei->i_dir_start_lookup = 0;
@@ -2709,19 +2730,15 @@ void ext4_read_inode(struct inode * inode)
                 * recovery code: that's fine, we're about to complete
                 * the process of deleting those. */
        }
-       inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
        ei->i_flags = le32_to_cpu(raw_inode->i_flags);
-       ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl);
+       inode->i_blocks = ext4_inode_blocks(raw_inode, ei);
+       ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo);
        if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
-           cpu_to_le32(EXT4_OS_HURD))
+           cpu_to_le32(EXT4_OS_HURD)) {
                ei->i_file_acl |=
                        ((__u64)le16_to_cpu(raw_inode->i_file_acl_high)) << 32;
-       if (!S_ISREG(inode->i_mode)) {
-               ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl);
-       } else {
-               inode->i_size |=
-                       ((__u64)le32_to_cpu(raw_inode->i_size_high)) << 32;
        }
+       inode->i_size = ext4_isize(raw_inode);
        ei->i_disksize = inode->i_size;
        inode->i_generation = le32_to_cpu(raw_inode->i_generation);
        ei->i_block_group = iloc.block_group;
@@ -2765,6 +2782,13 @@ void ext4_read_inode(struct inode * inode)
        EXT4_INODE_GET_XTIME(i_atime, inode, raw_inode);
        EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
 
+       inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+       if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+               if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+                       inode->i_version |=
+                       (__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
+       }
+
        if (S_ISREG(inode->i_mode)) {
                inode->i_op = &ext4_file_inode_operations;
                inode->i_fop = &ext4_file_operations;
@@ -2797,6 +2821,55 @@ bad_inode:
        return;
 }
 
+static int ext4_inode_blocks_set(handle_t *handle,
+                               struct ext4_inode *raw_inode,
+                               struct ext4_inode_info *ei)
+{
+       struct inode *inode = &(ei->vfs_inode);
+       u64 i_blocks = inode->i_blocks;
+       struct super_block *sb = inode->i_sb;
+       int err = 0;
+
+       if (i_blocks <= ~0U) {
+               /*
+                * i_blocks can be represented in a 32 bit variable
+                * as multiple of 512 bytes
+                */
+               raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
+               raw_inode->i_blocks_high = 0;
+               ei->i_flags &= ~EXT4_HUGE_FILE_FL;
+       } else if (i_blocks <= 0xffffffffffffULL) {
+               /*
+                * i_blocks can be represented in a 48 bit variable
+                * as multiple of 512 bytes
+                */
+               err = ext4_update_rocompat_feature(handle, sb,
+                                           EXT4_FEATURE_RO_COMPAT_HUGE_FILE);
+               if (err)
+                       goto  err_out;
+               /* i_block is stored in the split  48 bit fields */
+               raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
+               raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+               ei->i_flags &= ~EXT4_HUGE_FILE_FL;
+       } else {
+               /*
+                * i_blocks should be represented in a 48 bit variable
+                * as multiple of  file system block size
+                */
+               err = ext4_update_rocompat_feature(handle, sb,
+                                           EXT4_FEATURE_RO_COMPAT_HUGE_FILE);
+               if (err)
+                       goto  err_out;
+               ei->i_flags |= EXT4_HUGE_FILE_FL;
+               /* i_block is stored in file system block size */
+               i_blocks = i_blocks >> (inode->i_blkbits - 9);
+               raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
+               raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+       }
+err_out:
+       return err;
+}
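The writer side above picks one of three encodings: a count that fits in 32 bits goes into i_blocks_lo alone; anything up to 2^48 - 1 sectors is split across the lo/high fields (and turns on the HUGE_FILE feature); larger counts are first converted from 512-byte units to filesystem blocks and stored with EXT4_HUGE_FILE_FL set. A sketch of the same three-way decision with the thresholds written out; the values fed to it are illustrative:

#include <stdio.h>
#include <stdint.h>

static void encode_i_blocks(uint64_t i_blocks, unsigned blkbits)
{
        if (i_blocks <= 0xffffffffULL) {
                /* fits in 32 bits of 512-byte units, no huge-file flag */
                printf("32-bit: lo=%llu, high=0\n",
                       (unsigned long long)i_blocks);
        } else if (i_blocks <= 0xffffffffffffULL) {
                /* 48-bit, still in 512-byte units; HUGE_FILE feature needed */
                printf("48-bit sectors: lo=%llu, high=%llu\n",
                       (unsigned long long)(i_blocks & 0xffffffffULL),
                       (unsigned long long)(i_blocks >> 32));
        } else {
                /* too big even for 48 bits of sectors: store in fs blocks
                 * and mark the inode with the huge-file flag */
                uint64_t fsblocks = i_blocks >> (blkbits - 9);
                printf("48-bit fs blocks (HUGE_FILE_FL): lo=%llu, high=%llu\n",
                       (unsigned long long)(fsblocks & 0xffffffffULL),
                       (unsigned long long)(fsblocks >> 32));
        }
}

int main(void)
{
        encode_i_blocks(1000ULL, 12);
        encode_i_blocks(1ULL << 40, 12);
        encode_i_blocks(1ULL << 50, 12);
        return 0;
}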
+
 /*
  * Post the struct inode info into an on-disk inode location in the
  * buffer-cache.  This gobbles the caller's reference to the
@@ -2845,47 +2918,42 @@ static int ext4_do_update_inode(handle_t *handle,
                raw_inode->i_gid_high = 0;
        }
        raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
-       raw_inode->i_size = cpu_to_le32(ei->i_disksize);
 
        EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
        EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
        EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
        EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
 
-       raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
+       if (ext4_inode_blocks_set(handle, raw_inode, ei))
+               goto out_brelse;
        raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
        raw_inode->i_flags = cpu_to_le32(ei->i_flags);
        if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
            cpu_to_le32(EXT4_OS_HURD))
                raw_inode->i_file_acl_high =
                        cpu_to_le16(ei->i_file_acl >> 32);
-       raw_inode->i_file_acl = cpu_to_le32(ei->i_file_acl);
-       if (!S_ISREG(inode->i_mode)) {
-               raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
-       } else {
-               raw_inode->i_size_high =
-                       cpu_to_le32(ei->i_disksize >> 32);
-               if (ei->i_disksize > 0x7fffffffULL) {
-                       struct super_block *sb = inode->i_sb;
-                       if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
-                                       EXT4_FEATURE_RO_COMPAT_LARGE_FILE) ||
-                           EXT4_SB(sb)->s_es->s_rev_level ==
-                                       cpu_to_le32(EXT4_GOOD_OLD_REV)) {
-                              /* If this is the first large file
-                               * created, add a flag to the superblock.
-                               */
-                               err = ext4_journal_get_write_access(handle,
-                                               EXT4_SB(sb)->s_sbh);
-                               if (err)
-                                       goto out_brelse;
-                               ext4_update_dynamic_rev(sb);
-                               EXT4_SET_RO_COMPAT_FEATURE(sb,
+       raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
+       ext4_isize_set(raw_inode, ei->i_disksize);
+       if (ei->i_disksize > 0x7fffffffULL) {
+               struct super_block *sb = inode->i_sb;
+               if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+                               EXT4_FEATURE_RO_COMPAT_LARGE_FILE) ||
+                               EXT4_SB(sb)->s_es->s_rev_level ==
+                               cpu_to_le32(EXT4_GOOD_OLD_REV)) {
+                       /* If this is the first large file
+                        * created, add a flag to the superblock.
+                        */
+                       err = ext4_journal_get_write_access(handle,
+                                       EXT4_SB(sb)->s_sbh);
+                       if (err)
+                               goto out_brelse;
+                       ext4_update_dynamic_rev(sb);
+                       EXT4_SET_RO_COMPAT_FEATURE(sb,
                                        EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
-                               sb->s_dirt = 1;
-                               handle->h_sync = 1;
-                               err = ext4_journal_dirty_metadata(handle,
-                                               EXT4_SB(sb)->s_sbh);
-                       }
+                       sb->s_dirt = 1;
+                       handle->h_sync = 1;
+                       err = ext4_journal_dirty_metadata(handle,
+                                       EXT4_SB(sb)->s_sbh);
                }
        }
        raw_inode->i_generation = cpu_to_le32(inode->i_generation);
@@ -2903,8 +2971,14 @@ static int ext4_do_update_inode(handle_t *handle,
        } else for (block = 0; block < EXT4_N_BLOCKS; block++)
                raw_inode->i_block[block] = ei->i_data[block];
 
-       if (ei->i_extra_isize)
+       raw_inode->i_disk_version = cpu_to_le32(inode->i_version);
+       if (ei->i_extra_isize) {
+               if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+                       raw_inode->i_version_hi =
+                       cpu_to_le32(inode->i_version >> 32);
                raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
+       }
+
 
        BUFFER_TRACE(bh, "call ext4_journal_dirty_metadata");
        rc = ext4_journal_dirty_metadata(handle, bh);
@@ -3024,6 +3098,17 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
                ext4_journal_stop(handle);
        }
 
+       if (attr->ia_valid & ATTR_SIZE) {
+               if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) {
+                       struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+
+                       if (attr->ia_size > sbi->s_bitmap_maxbytes) {
+                               error = -EFBIG;
+                               goto err_out;
+                       }
+               }
+       }
+
        if (S_ISREG(inode->i_mode) &&
            attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
                handle_t *handle;
@@ -3120,6 +3205,9 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 {
        int err = 0;
 
+       if (test_opt(inode->i_sb, I_VERSION))
+               inode_inc_iversion(inode);
+
        /* the do_update_inode consumes one bh->b_count */
        get_bh(iloc->bh);
 
@@ -3158,8 +3246,10 @@ ext4_reserve_inode_write(handle_t *handle, struct inode *inode,
  * Expand an inode by new_extra_isize bytes.
  * Returns 0 on success or negative error number on failure.
  */
-int ext4_expand_extra_isize(struct inode *inode, unsigned int new_extra_isize,
-                       struct ext4_iloc iloc, handle_t *handle)
+static int ext4_expand_extra_isize(struct inode *inode,
+                                  unsigned int new_extra_isize,
+                                  struct ext4_iloc iloc,
+                                  handle_t *handle)
 {
        struct ext4_inode *raw_inode;
        struct ext4_xattr_ibody_header *header;
index e7f894bdb4202359974088ba5c1f14585a43fc03..2ed7c37f897e79f0b9c6d6af3a5c2aad79fa7f59 100644 (file)
@@ -199,7 +199,7 @@ flags_err:
                 * need to allocate reservation structure for this inode
                 * before set the window size
                 */
-               mutex_lock(&ei->truncate_mutex);
+               down_write(&ei->i_data_sem);
                if (!ei->i_block_alloc_info)
                        ext4_init_block_alloc_info(inode);
 
@@ -207,7 +207,7 @@ flags_err:
                        struct ext4_reserve_window_node *rsv = &ei->i_block_alloc_info->rsv_window_node;
                        rsv->rsv_goal_size = rsv_window_size;
                }
-               mutex_unlock(&ei->truncate_mutex);
+               up_write(&ei->i_data_sem);
                return 0;
        }
        case EXT4_IOC_GROUP_EXTEND: {
@@ -254,6 +254,9 @@ flags_err:
                return err;
        }
 
+       case EXT4_IOC_MIGRATE:
+               return ext4_ext_migrate(inode, filp, cmd, arg);
+
        default:
                return -ENOTTY;
        }
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
new file mode 100644 (file)
index 0000000..76e5fed
--- /dev/null
@@ -0,0 +1,4552 @@
+/*
+ * Copyright (c) 2003-2006, Cluster File Systems, Inc, info@clusterfs.com
+ * Written by Alex Tomas <alex@clusterfs.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307, USA.
+ */
+
+
+/*
+ * mballoc.c contains the multiblock allocation routines
+ */
+
+#include <linux/time.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/ext4_jbd2.h>
+#include <linux/ext4_fs.h>
+#include <linux/quotaops.h>
+#include <linux/buffer_head.h>
+#include <linux/module.h>
+#include <linux/swap.h>
+#include <linux/proc_fs.h>
+#include <linux/pagemap.h>
+#include <linux/seq_file.h>
+#include <linux/version.h>
+#include "group.h"
+
+/*
+ * MUSTDO:
+ *   - test ext4_ext_search_left() and ext4_ext_search_right()
+ *   - search for metadata in few groups
+ *
+ * TODO v4:
+ *   - normalization should take into account whether file is still open
+ *   - discard preallocations if no free space left (policy?)
+ *   - don't normalize tails
+ *   - quota
+ *   - reservation for superuser
+ *
+ * TODO v3:
+ *   - bitmap read-ahead (proposed by Oleg Drokin aka green)
+ *   - track min/max extents in each group for better group selection
+ *   - mb_mark_used() may allocate chunk right after splitting buddy
+ *   - tree of groups sorted by number of free blocks
+ *   - error handling
+ */
+
+/*
+ * The allocation request involves a request for multiple blocks near to
+ * the specified goal block value.
+ *
+ * During the initialization phase of the allocator we decide to use group
+ * preallocation or inode preallocation depending on the size of the file.
+ * The size of the file could be the resulting file size we would have
+ * after allocation, or the current file size, whichever is larger. If the
+ * size is less than sbi->s_mb_stream_request we select group
+ * preallocation. The default value of s_mb_stream_request is 16
+ * blocks. This can also be tuned via
+ * /proc/fs/ext4/<partition>/stream_req. The value is represented in terms
+ * of number of blocks.
+ *
+ * The main motivation for having small files use group preallocation is to
+ * ensure that small files are kept close together on disk.
+ *
+ * In the first stage the allocator looks at the inode prealloc list,
+ * ext4_inode_info->i_prealloc_list, which contains the list of prealloc
+ * spaces for this particular inode. An inode prealloc space is represented as:
+ *
+ * pa_lstart -> the logical start block for this prealloc space
+ * pa_pstart -> the physical start block for this prealloc space
+ * pa_len    -> length of this prealloc space
+ * pa_free   -> free space available in this prealloc space
+ *
+ * The inode preallocation space is used by looking at the _logical_ start
+ * block. Only if the logical file block falls within the range of a prealloc
+ * space do we consume that particular prealloc space. This makes sure
+ * that we have contiguous physical blocks representing the file blocks.
+ *
+ * The important thing to be noted about inode prealloc space is that
+ * we don't modify the values associated with the inode prealloc space
+ * except pa_free.
+ *
+ * If we are not able to find blocks in the inode prealloc space, and we
+ * have the group allocation flag set, then we look at the locality group
+ * prealloc space. These are per-CPU prealloc lists represented as
+ *
+ * ext4_sb_info.s_locality_groups[smp_processor_id()]
+ *
+ * The reason for having a per-CPU locality group is to reduce the contention
+ * between CPUs. It is possible to get scheduled at this point.
+ *
+ * The locality group prealloc space is used by checking whether we have
+ * enough free space (pa_free) within the prealloc space.
+ *
+ * If we can't allocate blocks via inode prealloc and/or locality group
+ * prealloc then we look at the buddy cache. The buddy cache is represented
+ * by ext4_sb_info.s_buddy_cache (struct inode) whose file offsets get
+ * mapped to the buddy and bitmap information of the different
+ * groups. The buddy information is attached to the buddy cache inode so
+ * that we can access it through the page cache. The information for
+ * each group is loaded via ext4_mb_load_buddy.  It consists of the
+ * block bitmap and the buddy information, stored in the
+ * inode as:
+ *
+ *  {                        page                        }
+ *  [ group 0 buddy][ group 0 bitmap] [group 1][ group 1]...
+ *
+ *
+ * one block each for bitmap and buddy information.  So for each group we
+ * take up 2 blocks. A page can contain blocks_per_page (PAGE_CACHE_SIZE /
+ * blocksize) blocks, so it can hold information for groups_per_page
+ * groups, which is blocks_per_page/2.
+ *
+ * The buddy cache inode is not stored on disk. The inode is thrown
+ * away when the filesystem is unmounted.
+ *
+ * We look for 'count' blocks in the buddy cache. If we are able to
+ * locate that many free blocks we return with additional information
+ * regarding the rest of the contiguous physical blocks available.
+ *
+ * Before allocating blocks via the buddy cache we normalize the request
+ * blocks. This ensures we ask for more blocks than we need. The extra
+ * blocks that we get after allocation are added to the respective prealloc
+ * list. In case of inode preallocation we follow a list of heuristics
+ * based on file size. This can be found in ext4_mb_normalize_request. If
+ * we are doing a group prealloc we try to normalize the request to
+ * sbi->s_mb_group_prealloc. The default value of s_mb_group_prealloc is
+ * 512 blocks. This can be tuned via
+ * /proc/fs/ext4/<partition>/group_prealloc. The value is represented in
+ * terms of number of blocks. If we have mounted the file system with the
+ * -o stripe=<value> option the group prealloc request is normalized to the
+ * stripe value (sbi->s_stripe).
+ *
+ * The regular allocator (using the buddy cache) supports a few tunables.
+ *
+ * /proc/fs/ext4/<partition>/min_to_scan
+ * /proc/fs/ext4/<partition>/max_to_scan
+ * /proc/fs/ext4/<partition>/order2_req
+ *
+ * The regular allocator uses a buddy scan only if the request length is a
+ * power of 2 blocks and the order of allocation is >= sbi->s_mb_order2_reqs.
+ * The value of s_mb_order2_reqs can be tuned via
+ * /proc/fs/ext4/<partition>/order2_req.  If the request length is equal to
+ * the stripe size (sbi->s_stripe), we try to search for contiguous blocks
+ * of stripe size. This should result in better allocation on a RAID setup.
+ * If not, we search in the specific group using the bitmap for the best
+ * extents. The tunables min_to_scan and max_to_scan control the behaviour
+ * here. min_to_scan indicates how long mballoc __must__ look for a best
+ * extent and max_to_scan indicates how long mballoc __can__ look for a
+ * best extent among the found extents. Searching for blocks starts with
+ * the group specified as the goal value in the allocation context via
+ * ac_g_ex. Each group is first checked on the criteria of whether it
+ * can be used for allocation. ext4_mb_good_group explains how the groups
+ * are checked.
+ *
+ * Both prealloc spaces get populated as above. So for the first
+ * request we will hit the buddy cache, which will result in this prealloc
+ * space getting filled. The prealloc space is then later used for
+ * subsequent requests.
+ */
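A minimal stand-alone sketch of the size-based choice described above; the helper name and the 16-block threshold are illustrative assumptions for the example (the real code consults sbi->s_mb_stream_request):

#include <stdio.h>

enum prealloc_kind { INODE_PA, GROUP_PA };

static enum prealloc_kind choose_prealloc(unsigned long cur_blocks,
                                          unsigned long result_blocks,
                                          unsigned long stream_request)
{
        /* "size of the file" = max(current size, size after allocation) */
        unsigned long size = cur_blocks > result_blocks ? cur_blocks
                                                        : result_blocks;

        /* small files go to the per-CPU locality group preallocation */
        return size < stream_request ? GROUP_PA : INODE_PA;
}

int main(void)
{
        printf("8-block file  -> %s\n",
               choose_prealloc(8, 8, 16) == GROUP_PA ? "group PA" : "inode PA");
        printf("64-block file -> %s\n",
               choose_prealloc(32, 64, 16) == GROUP_PA ? "group PA" : "inode PA");
        return 0;
}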
+
+/*
+ * mballoc operates on the following data:
+ *  - on-disk bitmap
+ *  - in-core buddy (actually includes buddy and bitmap)
+ *  - preallocation descriptors (PAs)
+ *
+ * there are two types of preallocations:
+ *  - inode
+ *    assigned to a specific inode and can be used for this inode only.
+ *    it describes a part of the inode's space preallocated to specific
+ *    physical blocks. any block from that preallocation can be used
+ *    independently. the descriptor just tracks the number of blocks left
+ *    unused. so, before taking some block from the descriptor, one must
+ *    make sure the corresponding logical block isn't allocated yet. this
+ *    also means that freeing any block within the descriptor's range
+ *    must discard all preallocated blocks.
+ *  - locality group
+ *    assigned to specific locality group which does not translate to
+ *    permanent set of inodes: inode can join and leave group. space
+ *    from this type of preallocation can be used for any inode. thus
+ *    it's consumed from the beginning to the end.
+ *
+ * relation between them can be expressed as:
+ *    in-core buddy = on-disk bitmap + preallocation descriptors
+ *
+ * this means the blocks mballoc considers used are:
+ *  - allocated blocks (persistent)
+ *  - preallocated blocks (non-persistent)
+ *
+ * consistency in mballoc world means that at any time a block is either
+ * free or used in ALL structures. notice: "any time" should not be read
+ * literally -- time is discrete and delimited by locks.
+ *
+ *  to keep it simple, we don't use block numbers, instead we count number of
+ *  blocks: how many blocks marked used/free in on-disk bitmap, buddy and PA.
+ *
+ * all operations can be expressed as:
+ *  - init buddy:                      buddy = on-disk + PAs
+ *  - new PA:                          buddy += N; PA = N
+ *  - use inode PA:                    on-disk += N; PA -= N
+ *  - discard inode PA                 buddy -= on-disk - PA; PA = 0
+ *  - use locality group PA            on-disk += N; PA -= N
+ *  - discard locality group PA                buddy -= PA; PA = 0
+ *  note: 'buddy -= on-disk - PA' is used to show that on-disk bitmap
+ *        is used in real operation because we can't know actual used
+ *        bits from PA, only from on-disk bitmap
+ *
+ * if we follow this strict logic, then all operations above should be atomic.
+ * given some of them can block, we'd have to use something like semaphores
+ * killing performance on high-end SMP hardware. let's try to relax it using
+ * the following knowledge:
+ *  1) if buddy is referenced, it's already initialized
+ *  2) while block is used in buddy and the buddy is referenced,
+ *     nobody can re-allocate that block
+ *  3) we work on bitmaps and '+' actually means 'set bits'. if the on-disk
+ *     bitmap has a bit set and a PA claims the same block, it's OK. IOW, one
+ *     can set a bit in the on-disk bitmap if the buddy has the same bit set
+ *     and/or a PA covers the corresponding block
+ *
+ * so, now we're building a concurrency table:
+ *  - init buddy vs.
+ *    - new PA
+ *      blocks for PA are allocated in the buddy, buddy must be referenced
+ *      until PA is linked to allocation group to avoid concurrent buddy init
+ *    - use inode PA
+ *      we need to make sure that either on-disk bitmap or PA has uptodate data
+ *      given (3) we care that PA-=N operation doesn't interfere with init
+ *    - discard inode PA
+ *      the simplest way would be to have buddy initialized by the discard
+ *    - use locality group PA
+ *      again PA-=N must be serialized with init
+ *    - discard locality group PA
+ *      the simplest way would be to have buddy initialized by the discard
+ *  - new PA vs.
+ *    - use inode PA
+ *      i_data_sem serializes them
+ *    - discard inode PA
+ *      discard process must wait until PA isn't used by another process
+ *    - use locality group PA
+ *      some mutex should serialize them
+ *    - discard locality group PA
+ *      discard process must wait until PA isn't used by another process
+ *  - use inode PA
+ *    - use inode PA
+ *      i_data_sem or another mutex should serialize them
+ *    - discard inode PA
+ *      discard process must wait until PA isn't used by another process
+ *    - use locality group PA
+ *      nothing wrong here -- they're different PAs covering different blocks
+ *    - discard locality group PA
+ *      discard process must wait until PA isn't used by another process
+ *
+ * now we're ready to draw a few consequences:
+ *  - while a PA is referenced, no discard is possible
+ *  - a PA stays referenced until its blocks are marked in the on-disk bitmap
+ *  - PA changes only after on-disk bitmap
+ *  - discard must not compete with init. either init is done before
+ *    any discard or they're serialized somehow
+ *  - buddy init as sum of on-disk bitmap and PAs is done atomically
+ *
+ * a special case arises when we've used a PA to emptiness. there is no need
+ * to modify the buddy in this case, but we should take care of concurrent init
+ *
+ */
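A toy block-counting model of the relations listed above, using the comment's own simplification of counting blocks rather than tracking them individually; the numbers are made up:

#include <assert.h>
#include <stdio.h>

int main(void)
{
        int ondisk = 100, pa = 0, buddy = 0;

        buddy = ondisk + pa;            /* init buddy: buddy = on-disk + PAs  */
        buddy += 8; pa = 8;             /* new PA of 8 blocks                 */
        ondisk += 3; pa -= 3;           /* use inode PA: 3 blocks hit on-disk */
        buddy -= pa; pa = 0;            /* discard PA: free its unused blocks */

        assert(buddy == ondisk + pa);   /* the consistency relation above     */
        printf("on-disk=%d pa=%d buddy=%d\n", ondisk, pa, buddy);
        return 0;
}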
+
+/*
+ * Logic in a few words:
+ *
+ *  - allocation:
+ *    load group
+ *    find blocks
+ *    mark bits in on-disk bitmap
+ *    release group
+ *
+ *  - use preallocation:
+ *    find proper PA (per-inode or group)
+ *    load group
+ *    mark bits in on-disk bitmap
+ *    release group
+ *    release PA
+ *
+ *  - free:
+ *    load group
+ *    mark bits in on-disk bitmap
+ *    release group
+ *
+ *  - discard preallocations in group:
+ *    mark PAs deleted
+ *    move them onto local list
+ *    load on-disk bitmap
+ *    load group
+ *    remove PA from object (inode or locality group)
+ *    mark free blocks in-core
+ *
+ *  - discard inode's preallocations:
+ */
+
+/*
+ * Locking rules
+ *
+ * Locks:
+ *  - bitlock on a group       (group)
+ *  - object (inode/locality)  (object)
+ *  - per-pa lock              (pa)
+ *
+ * Paths:
+ *  - new pa
+ *    object
+ *    group
+ *
+ *  - find and use pa:
+ *    pa
+ *
+ *  - release consumed pa:
+ *    pa
+ *    group
+ *    object
+ *
+ *  - generate in-core bitmap:
+ *    group
+ *        pa
+ *
+ *  - discard all for given object (inode, locality group):
+ *    object
+ *        pa
+ *    group
+ *
+ *  - discard all for given group:
+ *    group
+ *        pa
+ *    group
+ *        object
+ *
+ */
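A minimal userspace sketch of the "new pa" ordering from the table above, with pthread mutexes standing in for the object and group locks; the names are illustrative, not ext4 code:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t group_lock  = PTHREAD_MUTEX_INITIALIZER;

static void new_pa(void)
{
        pthread_mutex_lock(&object_lock);       /* object (inode/locality) */
        pthread_mutex_lock(&group_lock);        /* then the group bitlock  */
        printf("link PA to the object list and to the group list\n");
        pthread_mutex_unlock(&group_lock);
        pthread_mutex_unlock(&object_lock);
}

int main(void)
{
        new_pa();
        return 0;
}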
+
+/*
+ * with AGGRESSIVE_CHECK the allocator runs consistency checks over its
+ * structures. these checks slow things down a lot
+ */
+#define AGGRESSIVE_CHECK__
+
+/*
+ * with DOUBLE_CHECK defined mballoc creates persistent in-core
+ * bitmaps, maintains and uses them to check for double allocations
+ */
+#define DOUBLE_CHECK__
+
+/*
+ * with MB_DEBUG defined mb_debug() prints debugging messages
+ */
+#define MB_DEBUG__
+#ifdef MB_DEBUG
+#define mb_debug(fmt, a...)    printk(fmt, ##a)
+#else
+#define mb_debug(fmt, a...)
+#endif
+
+/*
+ * with EXT4_MB_HISTORY mballoc stores last N allocations in memory
+ * and you can monitor it in /proc/fs/ext4/<dev>/mb_history
+ */
+#define EXT4_MB_HISTORY
+#define EXT4_MB_HISTORY_ALLOC          1       /* allocation */
+#define EXT4_MB_HISTORY_PREALLOC       2       /* preallocated blocks used */
+#define EXT4_MB_HISTORY_DISCARD                4       /* preallocation discarded */
+#define EXT4_MB_HISTORY_FREE           8       /* free */
+
+#define EXT4_MB_HISTORY_DEFAULT                (EXT4_MB_HISTORY_ALLOC | \
+                                        EXT4_MB_HISTORY_PREALLOC)
+
+/*
+ * How long mballoc can look for a best extent (in found extents)
+ */
+#define MB_DEFAULT_MAX_TO_SCAN         200
+
+/*
+ * How long mballoc must look for a best extent
+ */
+#define MB_DEFAULT_MIN_TO_SCAN         10
+
+/*
+ * How many groups mballoc will scan looking for the best chunk
+ */
+#define MB_DEFAULT_MAX_GROUPS_TO_SCAN  5
+
+/*
+ * with 'ext4_mb_stats' allocator will collect stats that will be
+ * shown at umount. The collecting costs though!
+ */
+#define MB_DEFAULT_STATS               1
+
+/*
+ * files smaller than MB_DEFAULT_STREAM_THRESHOLD are served
+ * by the stream allocator, whose purpose is to pack requests
+ * as close to each other as possible to produce smooth I/O traffic.
+ * We use locality group prealloc space for stream requests.
+ * This can be tuned via /proc/fs/ext4/<partition>/stream_req.
+ */
+#define MB_DEFAULT_STREAM_THRESHOLD    16      /* 64K */
+
+/*
+ * for which requests use 2^N search using buddies
+ */
+#define MB_DEFAULT_ORDER2_REQS         2
+
+/*
+ * default group prealloc size 512 blocks
+ */
+#define MB_DEFAULT_GROUP_PREALLOC      512
+
+static struct kmem_cache *ext4_pspace_cachep;
+
+#ifdef EXT4_BB_MAX_BLOCKS
+#undef EXT4_BB_MAX_BLOCKS
+#endif
+#define EXT4_BB_MAX_BLOCKS     30
+
+struct ext4_free_metadata {
+       ext4_group_t group;
+       unsigned short num;
+       ext4_grpblk_t  blocks[EXT4_BB_MAX_BLOCKS];
+       struct list_head list;
+};
+
+struct ext4_group_info {
+       unsigned long   bb_state;
+       unsigned long   bb_tid;
+       struct ext4_free_metadata *bb_md_cur;
+       unsigned short  bb_first_free;
+       unsigned short  bb_free;
+       unsigned short  bb_fragments;
+       struct          list_head bb_prealloc_list;
+#ifdef DOUBLE_CHECK
+       void            *bb_bitmap;
+#endif
+       unsigned short  bb_counters[];
+};
+
+#define EXT4_GROUP_INFO_NEED_INIT_BIT  0
+#define EXT4_GROUP_INFO_LOCKED_BIT     1
+
+#define EXT4_MB_GRP_NEED_INIT(grp)     \
+       (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+
+
+struct ext4_prealloc_space {
+       struct list_head        pa_inode_list;
+       struct list_head        pa_group_list;
+       union {
+               struct list_head pa_tmp_list;
+               struct rcu_head pa_rcu;
+       } u;
+       spinlock_t              pa_lock;
+       atomic_t                pa_count;
+       unsigned                pa_deleted;
+       ext4_fsblk_t            pa_pstart;      /* phys. block */
+       ext4_lblk_t             pa_lstart;      /* log. block */
+       unsigned short          pa_len;         /* len of preallocated chunk */
+       unsigned short          pa_free;        /* how many blocks are free */
+       unsigned short          pa_linear;      /* consumed in one direction
+                                                * strictly, for grp prealloc */
+       spinlock_t              *pa_obj_lock;
+       struct inode            *pa_inode;      /* hack, for history only */
+};
+
+
+struct ext4_free_extent {
+       ext4_lblk_t fe_logical;
+       ext4_grpblk_t fe_start;
+       ext4_group_t fe_group;
+       int fe_len;
+};
+
+/*
+ * Locality group:
+ *   we try to group all related changes together
+ *   so that writeback can flush/allocate them together as well
+ */
+struct ext4_locality_group {
+       /* for allocator */
+       struct mutex            lg_mutex;       /* to serialize allocates */
+       struct list_head        lg_prealloc_list;/* list of preallocations */
+       spinlock_t              lg_prealloc_lock;
+};
+
+struct ext4_allocation_context {
+       struct inode *ac_inode;
+       struct super_block *ac_sb;
+
+       /* original request */
+       struct ext4_free_extent ac_o_ex;
+
+       /* goal request (after normalization) */
+       struct ext4_free_extent ac_g_ex;
+
+       /* the best found extent */
+       struct ext4_free_extent ac_b_ex;
+
+       /* copy of the best found extent taken before preallocation efforts */
+       struct ext4_free_extent ac_f_ex;
+
+       /* number of iterations done. we have to track to limit searching */
+       unsigned long ac_ex_scanned;
+       __u16 ac_groups_scanned;
+       __u16 ac_found;
+       __u16 ac_tail;
+       __u16 ac_buddy;
+       __u16 ac_flags;         /* allocation hints */
+       __u8 ac_status;
+       __u8 ac_criteria;
+       __u8 ac_repeats;
+       __u8 ac_2order;         /* if request is to allocate 2^N blocks and
+                                * N > 0, the field stores N, otherwise 0 */
+       __u8 ac_op;             /* operation, for history only */
+       struct page *ac_bitmap_page;
+       struct page *ac_buddy_page;
+       struct ext4_prealloc_space *ac_pa;
+       struct ext4_locality_group *ac_lg;
+};
+
+#define AC_STATUS_CONTINUE     1
+#define AC_STATUS_FOUND                2
+#define AC_STATUS_BREAK                3
+
+struct ext4_mb_history {
+       struct ext4_free_extent orig;   /* orig allocation */
+       struct ext4_free_extent goal;   /* goal allocation */
+       struct ext4_free_extent result; /* result allocation */
+       unsigned pid;
+       unsigned ino;
+       __u16 found;    /* how many extents have been found */
+       __u16 groups;   /* how many groups have been scanned */
+       __u16 tail;     /* what tail broke some buddy */
+       __u16 buddy;    /* buddy the tail ^^^ broke */
+       __u16 flags;
+       __u8 cr:3;      /* which phase the result extent was found at */
+       __u8 op:4;
+       __u8 merged:1;
+};
+
+struct ext4_buddy {
+       struct page *bd_buddy_page;
+       void *bd_buddy;
+       struct page *bd_bitmap_page;
+       void *bd_bitmap;
+       struct ext4_group_info *bd_info;
+       struct super_block *bd_sb;
+       __u16 bd_blkbits;
+       ext4_group_t bd_group;
+};
+#define EXT4_MB_BITMAP(e4b)    ((e4b)->bd_bitmap)
+#define EXT4_MB_BUDDY(e4b)     ((e4b)->bd_buddy)
+
+#ifndef EXT4_MB_HISTORY
+static inline void ext4_mb_store_history(struct ext4_allocation_context *ac)
+{
+       return;
+}
+#else
+static void ext4_mb_store_history(struct ext4_allocation_context *ac);
+#endif
+
+#define in_range(b, first, len)        ((b) >= (first) && (b) <= (first) + (len) - 1)
+
+static struct proc_dir_entry *proc_root_ext4;
+struct buffer_head *read_block_bitmap(struct super_block *, ext4_group_t);
+ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
+                       ext4_fsblk_t goal, unsigned long *count, int *errp);
+
+static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
+                                       ext4_group_t group);
+static void ext4_mb_poll_new_transaction(struct super_block *, handle_t *);
+static void ext4_mb_free_committed_blocks(struct super_block *);
+static void ext4_mb_return_to_preallocation(struct inode *inode,
+                                       struct ext4_buddy *e4b, sector_t block,
+                                       int count);
+static void ext4_mb_put_pa(struct ext4_allocation_context *,
+                       struct super_block *, struct ext4_prealloc_space *pa);
+static int ext4_mb_init_per_dev_proc(struct super_block *sb);
+static int ext4_mb_destroy_per_dev_proc(struct super_block *sb);
+
+
+static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
+{
+       struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
+
+       bit_spin_lock(EXT4_GROUP_INFO_LOCKED_BIT, &(grinfo->bb_state));
+}
+
+static inline void ext4_unlock_group(struct super_block *sb,
+                                       ext4_group_t group)
+{
+       struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
+
+       bit_spin_unlock(EXT4_GROUP_INFO_LOCKED_BIT, &(grinfo->bb_state));
+}
+
+static inline int ext4_is_group_locked(struct super_block *sb,
+                                       ext4_group_t group)
+{
+       struct ext4_group_info *grinfo = ext4_get_group_info(sb, group);
+
+       return bit_spin_is_locked(EXT4_GROUP_INFO_LOCKED_BIT,
+                                               &(grinfo->bb_state));
+}
+
+static ext4_fsblk_t ext4_grp_offs_to_block(struct super_block *sb,
+                                       struct ext4_free_extent *fex)
+{
+       ext4_fsblk_t block;
+
+       block = (ext4_fsblk_t) fex->fe_group * EXT4_BLOCKS_PER_GROUP(sb)
+                       + fex->fe_start
+                       + le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
+       return block;
+}
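A quick numeric check of the mapping computed above (block = group * blocks_per_group + offset + first data block), with assumed geometry of 32768 blocks per group and s_first_data_block = 0, as with 4K blocks:

#include <stdio.h>

int main(void)
{
        unsigned long long blocks_per_group = 32768, first_data_block = 0;
        unsigned long group = 3;
        unsigned offset = 100;

        /* group 3, offset 100 -> physical block 98404 */
        printf("block = %llu\n",
               group * blocks_per_group + offset + first_data_block);
        return 0;
}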
+
+#if BITS_PER_LONG == 64
+#define mb_correct_addr_and_bit(bit, addr)             \
+{                                                      \
+       bit += ((unsigned long) addr & 7UL) << 3;       \
+       addr = (void *) ((unsigned long) addr & ~7UL);  \
+}
+#elif BITS_PER_LONG == 32
+#define mb_correct_addr_and_bit(bit, addr)             \
+{                                                      \
+       bit += ((unsigned long) addr & 3UL) << 3;       \
+       addr = (void *) ((unsigned long) addr & ~3UL);  \
+}
+#else
+#error "unsupported BITS_PER_LONG"
+#endif
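A stand-alone illustration (plain C, not kernel code) of what the 64-bit variant above does: the byte misalignment of the address is folded into the bit index and the pointer is aligned down to an 8-byte boundary:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        static uint64_t words[2];               /* guarantees 8-byte alignment */
        unsigned char *base = (unsigned char *)words;
        void *addr = base + 3;                  /* deliberately misaligned     */
        int bit = 5;                            /* bit 5 relative to base + 3  */

        bit += ((uintptr_t)addr & 7) << 3;      /* 5 + 3 * 8 = 29              */
        addr = (void *)((uintptr_t)addr & ~(uintptr_t)7);

        printf("aligned back to base: %s, corrected bit: %d\n",
               addr == (void *)base ? "yes" : "no", bit);
        return 0;
}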
+
+static inline int mb_test_bit(int bit, void *addr)
+{
+       /*
+        * ext4_test_bit on architectures like powerpc
+        * needs an unsigned long aligned address
+        */
+       mb_correct_addr_and_bit(bit, addr);
+       return ext4_test_bit(bit, addr);
+}
+
+static inline void mb_set_bit(int bit, void *addr)
+{
+       mb_correct_addr_and_bit(bit, addr);
+       ext4_set_bit(bit, addr);
+}
+
+static inline void mb_set_bit_atomic(spinlock_t *lock, int bit, void *addr)
+{
+       mb_correct_addr_and_bit(bit, addr);
+       ext4_set_bit_atomic(lock, bit, addr);
+}
+
+static inline void mb_clear_bit(int bit, void *addr)
+{
+       mb_correct_addr_and_bit(bit, addr);
+       ext4_clear_bit(bit, addr);
+}
+
+static inline void mb_clear_bit_atomic(spinlock_t *lock, int bit, void *addr)
+{
+       mb_correct_addr_and_bit(bit, addr);
+       ext4_clear_bit_atomic(lock, bit, addr);
+}
+
+static void *mb_find_buddy(struct ext4_buddy *e4b, int order, int *max)
+{
+       char *bb;
+
+       /* FIXME!! is this needed */
+       BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b));
+       BUG_ON(max == NULL);
+
+       if (order > e4b->bd_blkbits + 1) {
+               *max = 0;
+               return NULL;
+       }
+
+       /* at order 0 we see each particular block */
+       *max = 1 << (e4b->bd_blkbits + 3);
+       if (order == 0)
+               return EXT4_MB_BITMAP(e4b);
+
+       bb = EXT4_MB_BUDDY(e4b) + EXT4_SB(e4b->bd_sb)->s_mb_offsets[order];
+       *max = EXT4_SB(e4b->bd_sb)->s_mb_maxs[order];
+
+       return bb;
+}
+
+#ifdef DOUBLE_CHECK
+static void mb_free_blocks_double(struct inode *inode, struct ext4_buddy *e4b,
+                          int first, int count)
+{
+       int i;
+       struct super_block *sb = e4b->bd_sb;
+
+       if (unlikely(e4b->bd_info->bb_bitmap == NULL))
+               return;
+       BUG_ON(!ext4_is_group_locked(sb, e4b->bd_group));
+       for (i = 0; i < count; i++) {
+               if (!mb_test_bit(first + i, e4b->bd_info->bb_bitmap)) {
+                       ext4_fsblk_t blocknr;
+                       blocknr = e4b->bd_group * EXT4_BLOCKS_PER_GROUP(sb);
+                       blocknr += first + i;
+                       blocknr +=
+                           le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
+
+                       ext4_error(sb, __FUNCTION__, "double-free of inode"
+                                  " %lu's block %llu(bit %u in group %lu)\n",
+                                  inode ? inode->i_ino : 0, blocknr,
+                                  first + i, e4b->bd_group);
+               }
+               mb_clear_bit(first + i, e4b->bd_info->bb_bitmap);
+       }
+}
+
+static void mb_mark_used_double(struct ext4_buddy *e4b, int first, int count)
+{
+       int i;
+
+       if (unlikely(e4b->bd_info->bb_bitmap == NULL))
+               return;
+       BUG_ON(!ext4_is_group_locked(e4b->bd_sb, e4b->bd_group));
+       for (i = 0; i < count; i++) {
+               BUG_ON(mb_test_bit(first + i, e4b->bd_info->bb_bitmap));
+               mb_set_bit(first + i, e4b->bd_info->bb_bitmap);
+       }
+}
+
+static void mb_cmp_bitmaps(struct ext4_buddy *e4b, void *bitmap)
+{
+       if (memcmp(e4b->bd_info->bb_bitmap, bitmap, e4b->bd_sb->s_blocksize)) {
+               unsigned char *b1, *b2;
+               int i;
+               b1 = (unsigned char *) e4b->bd_info->bb_bitmap;
+               b2 = (unsigned char *) bitmap;
+               for (i = 0; i < e4b->bd_sb->s_blocksize; i++) {
+                       if (b1[i] != b2[i]) {
+                               printk("corruption in group %lu at byte %u(%u):"
+                                      " %x in copy != %x on disk/prealloc\n",
+                                       e4b->bd_group, i, i * 8, b1[i], b2[i]);
+                               BUG();
+                       }
+               }
+       }
+}
+
+#else
+static inline void mb_free_blocks_double(struct inode *inode,
+                               struct ext4_buddy *e4b, int first, int count)
+{
+       return;
+}
+static inline void mb_mark_used_double(struct ext4_buddy *e4b,
+                                               int first, int count)
+{
+       return;
+}
+static inline void mb_cmp_bitmaps(struct ext4_buddy *e4b, void *bitmap)
+{
+       return;
+}
+#endif
+
+#ifdef AGGRESSIVE_CHECK
+
+#define MB_CHECK_ASSERT(assert)                                                \
+do {                                                                   \
+       if (!(assert)) {                                                \
+               printk(KERN_EMERG                                       \
+                       "Assertion failure in %s() at %s:%d: \"%s\"\n", \
+                       function, file, line, # assert);                \
+               BUG();                                                  \
+       }                                                               \
+} while (0)
+
+static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
+                               const char *function, int line)
+{
+       struct super_block *sb = e4b->bd_sb;
+       int order = e4b->bd_blkbits + 1;
+       int max;
+       int max2;
+       int i;
+       int j;
+       int k;
+       int count;
+       struct ext4_group_info *grp;
+       int fragments = 0;
+       int fstart;
+       struct list_head *cur;
+       void *buddy;
+       void *buddy2;
+
+       if (!test_opt(sb, MBALLOC))
+               return 0;
+
+       {
+               static int mb_check_counter;
+               if (mb_check_counter++ % 100 != 0)
+                       return 0;
+       }
+
+       while (order > 1) {
+               buddy = mb_find_buddy(e4b, order, &max);
+               MB_CHECK_ASSERT(buddy);
+               buddy2 = mb_find_buddy(e4b, order - 1, &max2);
+               MB_CHECK_ASSERT(buddy2);
+               MB_CHECK_ASSERT(buddy != buddy2);
+               MB_CHECK_ASSERT(max * 2 == max2);
+
+               count = 0;
+               for (i = 0; i < max; i++) {
+
+                       if (mb_test_bit(i, buddy)) {
+                               /* only single bit in buddy2 may be 1 */
+                               if (!mb_test_bit(i << 1, buddy2)) {
+                                       MB_CHECK_ASSERT(
+                                               mb_test_bit((i<<1)+1, buddy2));
+                               } else if (!mb_test_bit((i << 1) + 1, buddy2)) {
+                                       MB_CHECK_ASSERT(
+                                               mb_test_bit(i << 1, buddy2));
+                               }
+                               continue;
+                       }
+
+                       /* both bits in buddy2 must be 0 */
+                       MB_CHECK_ASSERT(mb_test_bit(i << 1, buddy2));
+                       MB_CHECK_ASSERT(mb_test_bit((i << 1) + 1, buddy2));
+
+                       for (j = 0; j < (1 << order); j++) {
+                               k = (i * (1 << order)) + j;
+                               MB_CHECK_ASSERT(
+                                       !mb_test_bit(k, EXT4_MB_BITMAP(e4b)));
+                       }
+                       count++;
+               }
+               MB_CHECK_ASSERT(e4b->bd_info->bb_counters[order] == count);
+               order--;
+       }
+
+       fstart = -1;
+       buddy = mb_find_buddy(e4b, 0, &max);
+       for (i = 0; i < max; i++) {
+               if (!mb_test_bit(i, buddy)) {
+                       MB_CHECK_ASSERT(i >= e4b->bd_info->bb_first_free);
+                       if (fstart == -1) {
+                               fragments++;
+                               fstart = i;
+                       }
+                       continue;
+               }
+               fstart = -1;
+               /* check used bits only */
+               for (j = 0; j < e4b->bd_blkbits + 1; j++) {
+                       buddy2 = mb_find_buddy(e4b, j, &max2);
+                       k = i >> j;
+                       MB_CHECK_ASSERT(k < max2);
+                       MB_CHECK_ASSERT(mb_test_bit(k, buddy2));
+               }
+       }
+       MB_CHECK_ASSERT(!EXT4_MB_GRP_NEED_INIT(e4b->bd_info));
+       MB_CHECK_ASSERT(e4b->bd_info->bb_fragments == fragments);
+
+       grp = ext4_get_group_info(sb, e4b->bd_group);
+       buddy = mb_find_buddy(e4b, 0, &max);
+       list_for_each(cur, &grp->bb_prealloc_list) {
+               ext4_group_t groupnr;
+               struct ext4_prealloc_space *pa;
+               pa = list_entry(cur, struct ext4_prealloc_space, group_list);
+               ext4_get_group_no_and_offset(sb, pa->pstart, &groupnr, &k);
+               MB_CHECK_ASSERT(groupnr == e4b->bd_group);
+               for (i = 0; i < pa->len; i++)
+                       MB_CHECK_ASSERT(mb_test_bit(k + i, buddy));
+       }
+       return 0;
+}
+#undef MB_CHECK_ASSERT
+#define mb_check_buddy(e4b) __mb_check_buddy(e4b,      \
+                                       __FILE__, __FUNCTION__, __LINE__)
+#else
+#define mb_check_buddy(e4b)
+#endif
+
+/* FIXME!! need more doc */
+static void ext4_mb_mark_free_simple(struct super_block *sb,
+                               void *buddy, unsigned first, int len,
+                                       struct ext4_group_info *grp)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       unsigned short min;
+       unsigned short max;
+       unsigned short chunk;
+       unsigned short border;
+
+       BUG_ON(len >= EXT4_BLOCKS_PER_GROUP(sb));
+
+       border = 2 << sb->s_blocksize_bits;
+
+       while (len > 0) {
+               /* find how many blocks can be covered since this position */
+               max = ffs(first | border) - 1;
+
+               /* find how many blocks of power 2 we need to mark */
+               min = fls(len) - 1;
+
+               if (max < min)
+                       min = max;
+               chunk = 1 << min;
+
+               /* mark multiblock chunks only */
+               grp->bb_counters[min]++;
+               if (min > 0)
+                       mb_clear_bit(first >> min,
+                                    buddy + sbi->s_mb_offsets[min]);
+
+               len -= chunk;
+               first += chunk;
+       }
+}
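A stand-alone sketch of the chunking loop above: ffs() bounds the chunk order by the alignment of 'first', fls() by the remaining length, and the run is consumed in power-of-two pieces. The 'border' cap and the sample numbers are made up for the example, and a small fls() substitute is defined since userspace has none:

#include <stdio.h>
#include <strings.h>    /* ffs() */

static int fls_int(int x)               /* minimal fls() stand-in */
{
        int r = 0;
        while (x) { r++; x >>= 1; }
        return r;
}

int main(void)
{
        unsigned first = 5, len = 23, border = 1 << 13;

        while (len > 0) {
                int max = ffs(first | border) - 1;  /* order allowed by alignment */
                int min = fls_int(len) - 1;         /* order allowed by length    */
                unsigned chunk;

                if (max < min)
                        min = max;
                chunk = 1U << min;
                printf("chunk of %2u blocks at %u (order %d)\n", chunk, first, min);
                len -= chunk;
                first += chunk;
        }
        return 0;
}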
+
+static void ext4_mb_generate_buddy(struct super_block *sb,
+                               void *buddy, void *bitmap, ext4_group_t group)
+{
+       struct ext4_group_info *grp = ext4_get_group_info(sb, group);
+       unsigned short max = EXT4_BLOCKS_PER_GROUP(sb);
+       unsigned short i = 0;
+       unsigned short first;
+       unsigned short len;
+       unsigned free = 0;
+       unsigned fragments = 0;
+       unsigned long long period = get_cycles();
+
+       /* initialize buddy from bitmap which is aggregation
+        * of on-disk bitmap and preallocations */
+       i = ext4_find_next_zero_bit(bitmap, max, 0);
+       grp->bb_first_free = i;
+       while (i < max) {
+               fragments++;
+               first = i;
+               i = ext4_find_next_bit(bitmap, max, i);
+               len = i - first;
+               free += len;
+               if (len > 1)
+                       ext4_mb_mark_free_simple(sb, buddy, first, len, grp);
+               else
+                       grp->bb_counters[0]++;
+               if (i < max)
+                       i = ext4_find_next_zero_bit(bitmap, max, i);
+       }
+       grp->bb_fragments = fragments;
+
+       if (free != grp->bb_free) {
+               printk(KERN_DEBUG
+                       "EXT4-fs: group %lu: %u blocks in bitmap, %u in gd\n",
+                       group, free, grp->bb_free);
+               grp->bb_free = free;
+       }
+
+       clear_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
+
+       period = get_cycles() - period;
+       spin_lock(&EXT4_SB(sb)->s_bal_lock);
+       EXT4_SB(sb)->s_mb_buddies_generated++;
+       EXT4_SB(sb)->s_mb_generation_time += period;
+       spin_unlock(&EXT4_SB(sb)->s_bal_lock);
+}
+
+/* The buddy information is attached to the buddy cache inode
+ * for convenience. The information for each group
+ * is loaded via ext4_mb_load_buddy and consists of the
+ * block bitmap and the buddy information, which are
+ * stored in the inode as
+ *
+ * {                        page                        }
+ * [ group 0 buddy][ group 0 bitmap] [group 1][ group 1]...
+ *
+ *
+ * one block each for bitmap and buddy information.
+ * So for each group we take up 2 blocks. A page can
+ * contain blocks_per_page (PAGE_CACHE_SIZE / blocksize)  blocks.
+ * So it can have information regarding groups_per_page which
+ * is blocks_per_page/2
+ */
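A hypothetical helper mirroring this layout as ext4_mb_load_buddy() below uses it: group g's bitmap lives at block 2*g of the buddy cache inode and its buddy at block 2*g+1, so the page number and in-page offset follow from blocks_per_page. The 4K page size is an assumption for the example:

#include <stdio.h>

#define PAGE_SIZE_ASSUMED 4096          /* assumption for this sketch */

static void locate(unsigned long group, unsigned blocksize)
{
        unsigned blocks_per_page = PAGE_SIZE_ASSUMED / blocksize;
        unsigned long bitmap_blk = group * 2;       /* even block: bitmap */
        unsigned long buddy_blk  = group * 2 + 1;   /* odd block: buddy   */

        printf("group %lu: bitmap page %lu off %lu, buddy page %lu off %lu\n",
               group,
               bitmap_blk / blocks_per_page, bitmap_blk % blocks_per_page,
               buddy_blk  / blocks_per_page, buddy_blk  % blocks_per_page);
}

int main(void)
{
        locate(0, 1024);    /* 4 blocks per page: groups 0 and 1 share page 0 */
        locate(3, 1024);
        locate(7, 4096);    /* 1 block per page: each block on its own page   */
        return 0;
}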
+
+static int ext4_mb_init_cache(struct page *page, char *incore)
+{
+       int blocksize;
+       int blocks_per_page;
+       int groups_per_page;
+       int err = 0;
+       int i;
+       ext4_group_t first_group;
+       int first_block;
+       struct super_block *sb;
+       struct buffer_head *bhs;
+       struct buffer_head **bh;
+       struct inode *inode;
+       char *data;
+       char *bitmap;
+
+       mb_debug("init page %lu\n", page->index);
+
+       inode = page->mapping->host;
+       sb = inode->i_sb;
+       blocksize = 1 << inode->i_blkbits;
+       blocks_per_page = PAGE_CACHE_SIZE / blocksize;
+
+       groups_per_page = blocks_per_page >> 1;
+       if (groups_per_page == 0)
+               groups_per_page = 1;
+
+       /* allocate buffer_heads to read bitmaps */
+       if (groups_per_page > 1) {
+               err = -ENOMEM;
+               i = sizeof(struct buffer_head *) * groups_per_page;
+               bh = kzalloc(i, GFP_NOFS);
+               if (bh == NULL)
+                       goto out;
+       } else
+               bh = &bhs;
+
+       first_group = page->index * blocks_per_page / 2;
+
+       /* read all groups the page covers into the cache */
+       for (i = 0; i < groups_per_page; i++) {
+               struct ext4_group_desc *desc;
+
+               if (first_group + i >= EXT4_SB(sb)->s_groups_count)
+                       break;
+
+               err = -EIO;
+               desc = ext4_get_group_desc(sb, first_group + i, NULL);
+               if (desc == NULL)
+                       goto out;
+
+               err = -ENOMEM;
+               bh[i] = sb_getblk(sb, ext4_block_bitmap(sb, desc));
+               if (bh[i] == NULL)
+                       goto out;
+
+               if (bh_uptodate_or_lock(bh[i]))
+                       continue;
+
+               if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
+                       ext4_init_block_bitmap(sb, bh[i],
+                                               first_group + i, desc);
+                       set_buffer_uptodate(bh[i]);
+                       unlock_buffer(bh[i]);
+                       continue;
+               }
+               get_bh(bh[i]);
+               bh[i]->b_end_io = end_buffer_read_sync;
+               submit_bh(READ, bh[i]);
+               mb_debug("read bitmap for group %lu\n", first_group + i);
+       }
+
+       /* wait for I/O completion */
+       for (i = 0; i < groups_per_page && bh[i]; i++)
+               wait_on_buffer(bh[i]);
+
+       err = -EIO;
+       for (i = 0; i < groups_per_page && bh[i]; i++)
+               if (!buffer_uptodate(bh[i]))
+                       goto out;
+
+       first_block = page->index * blocks_per_page;
+       for (i = 0; i < blocks_per_page; i++) {
+               int group;
+               struct ext4_group_info *grinfo;
+
+               group = (first_block + i) >> 1;
+               if (group >= EXT4_SB(sb)->s_groups_count)
+                       break;
+
+               /*
+                * data carries information regarding this
+                * particular group in the format specified
+                * above
+                *
+                */
+               data = page_address(page) + (i * blocksize);
+               bitmap = bh[group - first_group]->b_data;
+
+               /*
+                * We place the buddy block and bitmap block
+                * close together
+                */
+               if ((first_block + i) & 1) {
+                       /* this is block of buddy */
+                       BUG_ON(incore == NULL);
+                       mb_debug("put buddy for group %u in page %lu/%x\n",
+                               group, page->index, i * blocksize);
+                       memset(data, 0xff, blocksize);
+                       grinfo = ext4_get_group_info(sb, group);
+                       grinfo->bb_fragments = 0;
+                       memset(grinfo->bb_counters, 0,
+                              sizeof(unsigned short)*(sb->s_blocksize_bits+2));
+                       /*
+                        * incore got set to the group block bitmap below
+                        */
+                       ext4_mb_generate_buddy(sb, data, incore, group);
+                       incore = NULL;
+               } else {
+                       /* this is block of bitmap */
+                       BUG_ON(incore != NULL);
+                       mb_debug("put bitmap for group %u in page %lu/%x\n",
+                               group, page->index, i * blocksize);
+
+                       /* see comments in ext4_mb_put_pa() */
+                       ext4_lock_group(sb, group);
+                       memcpy(data, bitmap, blocksize);
+
+                       /* mark all preallocated blks used in in-core bitmap */
+                       ext4_mb_generate_from_pa(sb, data, group);
+                       ext4_unlock_group(sb, group);
+
+                       /* set incore so that the buddy information can be
+                        * generated using this
+                        */
+                       incore = data;
+               }
+       }
+       SetPageUptodate(page);
+
+out:
+       if (bh) {
+               for (i = 0; i < groups_per_page && bh[i]; i++)
+                       brelse(bh[i]);
+               if (bh != &bhs)
+                       kfree(bh);
+       }
+       return err;
+}
+
+static int ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
+               struct ext4_buddy *e4b)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       struct inode *inode = sbi->s_buddy_cache;
+       int blocks_per_page;
+       int block;
+       int pnum;
+       int poff;
+       struct page *page;
+
+       mb_debug("load group %lu\n", group);
+
+       blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
+
+       e4b->bd_blkbits = sb->s_blocksize_bits;
+       e4b->bd_info = ext4_get_group_info(sb, group);
+       e4b->bd_sb = sb;
+       e4b->bd_group = group;
+       e4b->bd_buddy_page = NULL;
+       e4b->bd_bitmap_page = NULL;
+
+       /*
+        * the buddy cache inode stores the block bitmap
+        * and buddy information in consecutive blocks.
+        * So for each group we need two blocks.
+        */
+       block = group * 2;
+       pnum = block / blocks_per_page;
+       poff = block % blocks_per_page;
+
+       /* we could use find_or_create_page(), but it locks the page,
+        * which we'd like to avoid in the fast path ... */
+       page = find_get_page(inode->i_mapping, pnum);
+       if (page == NULL || !PageUptodate(page)) {
+               if (page)
+                       page_cache_release(page);
+               page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+               if (page) {
+                       BUG_ON(page->mapping != inode->i_mapping);
+                       if (!PageUptodate(page)) {
+                               ext4_mb_init_cache(page, NULL);
+                               mb_cmp_bitmaps(e4b, page_address(page) +
+                                              (poff * sb->s_blocksize));
+                       }
+                       unlock_page(page);
+               }
+       }
+       if (page == NULL || !PageUptodate(page))
+               goto err;
+       e4b->bd_bitmap_page = page;
+       e4b->bd_bitmap = page_address(page) + (poff * sb->s_blocksize);
+       mark_page_accessed(page);
+
+       block++;
+       pnum = block / blocks_per_page;
+       poff = block % blocks_per_page;
+
+       page = find_get_page(inode->i_mapping, pnum);
+       if (page == NULL || !PageUptodate(page)) {
+               if (page)
+                       page_cache_release(page);
+               page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+               if (page) {
+                       BUG_ON(page->mapping != inode->i_mapping);
+                       if (!PageUptodate(page))
+                               ext4_mb_init_cache(page, e4b->bd_bitmap);
+
+                       unlock_page(page);
+               }
+       }
+       if (page == NULL || !PageUptodate(page))
+               goto err;
+       e4b->bd_buddy_page = page;
+       e4b->bd_buddy = page_address(page) + (poff * sb->s_blocksize);
+       mark_page_accessed(page);
+
+       BUG_ON(e4b->bd_bitmap_page == NULL);
+       BUG_ON(e4b->bd_buddy_page == NULL);
+
+       return 0;
+
+err:
+       if (e4b->bd_bitmap_page)
+               page_cache_release(e4b->bd_bitmap_page);
+       if (e4b->bd_buddy_page)
+               page_cache_release(e4b->bd_buddy_page);
+       e4b->bd_buddy = NULL;
+       e4b->bd_bitmap = NULL;
+       return -EIO;
+}
+
+static void ext4_mb_release_desc(struct ext4_buddy *e4b)
+{
+       if (e4b->bd_bitmap_page)
+               page_cache_release(e4b->bd_bitmap_page);
+       if (e4b->bd_buddy_page)
+               page_cache_release(e4b->bd_buddy_page);
+}
+
+
+static int mb_find_order_for_block(struct ext4_buddy *e4b, int block)
+{
+       int order = 1;
+       void *bb;
+
+       BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b));
+       BUG_ON(block >= (1 << (e4b->bd_blkbits + 3)));
+
+       bb = EXT4_MB_BUDDY(e4b);
+       while (order <= e4b->bd_blkbits + 1) {
+               block = block >> 1;
+               if (!mb_test_bit(block, bb)) {
+                       /* this block is part of buddy of order 'order' */
+                       return order;
+               }
+               bb += 1 << (e4b->bd_blkbits - order);
+               order++;
+       }
+       return 0;
+}
+
+static void mb_clear_bits(spinlock_t *lock, void *bm, int cur, int len)
+{
+       __u32 *addr;
+
+       len = cur + len;
+       while (cur < len) {
+               if ((cur & 31) == 0 && (len - cur) >= 32) {
+                       /* fast path: clear whole word at once */
+                       addr = bm + (cur >> 3);
+                       *addr = 0;
+                       cur += 32;
+                       continue;
+               }
+               mb_clear_bit_atomic(lock, cur, bm);
+               cur++;
+       }
+}
+
+static void mb_set_bits(spinlock_t *lock, void *bm, int cur, int len)
+{
+       __u32 *addr;
+
+       len = cur + len;
+       while (cur < len) {
+               if ((cur & 31) == 0 && (len - cur) >= 32) {
+                       /* fast path: set whole word at once */
+                       addr = bm + (cur >> 3);
+                       *addr = 0xffffffff;
+                       cur += 32;
+                       continue;
+               }
+               mb_set_bit_atomic(lock, cur, bm);
+               cur++;
+       }
+}
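A userspace sketch of the same fast/slow split used by mb_set_bits() above: 32-bit-aligned runs are filled with one word store, stragglers one bit at a time. Plain little-endian-style bit operations stand in for ext4's atomic helpers here:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static void set_bits(void *bm, int cur, int len)
{
        uint32_t *addr;

        len = cur + len;
        while (cur < len) {
                if ((cur & 31) == 0 && (len - cur) >= 32) {
                        /* fast path: set a whole 32-bit word at once */
                        addr = (uint32_t *)((uint8_t *)bm + (cur >> 3));
                        *addr = 0xffffffff;
                        cur += 32;
                        continue;
                }
                /* slow path: one bit at a time (non-atomic in this sketch) */
                ((uint8_t *)bm)[cur >> 3] |= (uint8_t)(1u << (cur & 7));
                cur++;
        }
}

int main(void)
{
        uint32_t words[4];
        uint8_t *bitmap = (uint8_t *)words;
        int i;

        memset(words, 0, sizeof(words));
        set_bits(bitmap, 5, 70);                /* marks bits 5..74 */
        for (i = 0; i < 16; i++)
                printf("%02x ", bitmap[i]);
        printf("\n");
        return 0;
}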
+
+static int mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
+                         int first, int count)
+{
+       int block = 0;
+       int max = 0;
+       int order;
+       void *buddy;
+       void *buddy2;
+       struct super_block *sb = e4b->bd_sb;
+
+       BUG_ON(first + count > (sb->s_blocksize << 3));
+       BUG_ON(!ext4_is_group_locked(sb, e4b->bd_group));
+       mb_check_buddy(e4b);
+       mb_free_blocks_double(inode, e4b, first, count);
+
+       e4b->bd_info->bb_free += count;
+       if (first < e4b->bd_info->bb_first_free)
+               e4b->bd_info->bb_first_free = first;
+
+       /* let's maintain fragments counter */
+       if (first != 0)
+               block = !mb_test_bit(first - 1, EXT4_MB_BITMAP(e4b));
+       if (first + count < EXT4_SB(sb)->s_mb_maxs[0])
+               max = !mb_test_bit(first + count, EXT4_MB_BITMAP(e4b));
+       if (block && max)
+               e4b->bd_info->bb_fragments--;
+       else if (!block && !max)
+               e4b->bd_info->bb_fragments++;
+
+       /* let's maintain buddy itself */
+       while (count-- > 0) {
+               block = first++;
+               order = 0;
+
+               if (!mb_test_bit(block, EXT4_MB_BITMAP(e4b))) {
+                       ext4_fsblk_t blocknr;
+                       blocknr = e4b->bd_group * EXT4_BLOCKS_PER_GROUP(sb);
+                       blocknr += block;
+                       blocknr +=
+                           le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
+
+                       ext4_error(sb, __FUNCTION__, "double-free of inode"
+                                  " %lu's block %llu(bit %u in group %lu)\n",
+                                  inode ? inode->i_ino : 0, blocknr, block,
+                                  e4b->bd_group);
+               }
+               mb_clear_bit(block, EXT4_MB_BITMAP(e4b));
+               e4b->bd_info->bb_counters[order]++;
+
+               /* start of the buddy */
+               buddy = mb_find_buddy(e4b, order, &max);
+
+               do {
+                       block &= ~1UL;
+                       if (mb_test_bit(block, buddy) ||
+                                       mb_test_bit(block + 1, buddy))
+                               break;
+
+                       /* both the buddies are free, try to coalesce them */
+                       buddy2 = mb_find_buddy(e4b, order + 1, &max);
+
+                       if (!buddy2)
+                               break;
+
+                       if (order > 0) {
+                               /* for special purposes, we don't set
+                                * free bits in bitmap */
+                               mb_set_bit(block, buddy);
+                               mb_set_bit(block + 1, buddy);
+                       }
+                       e4b->bd_info->bb_counters[order]--;
+                       e4b->bd_info->bb_counters[order]--;
+
+                       block = block >> 1;
+                       order++;
+                       e4b->bd_info->bb_counters[order]++;
+
+                       mb_clear_bit(block, buddy2);
+                       buddy = buddy2;
+               } while (1);
+       }
+       mb_check_buddy(e4b);
+
+       return 0;
+}
+
+static int mb_find_extent(struct ext4_buddy *e4b, int order, int block,
+                               int needed, struct ext4_free_extent *ex)
+{
+       int next = block;
+       int max;
+       int ord;
+       void *buddy;
+
+       BUG_ON(!ext4_is_group_locked(e4b->bd_sb, e4b->bd_group));
+       BUG_ON(ex == NULL);
+
+       buddy = mb_find_buddy(e4b, order, &max);
+       BUG_ON(buddy == NULL);
+       BUG_ON(block >= max);
+       if (mb_test_bit(block, buddy)) {
+               ex->fe_len = 0;
+               ex->fe_start = 0;
+               ex->fe_group = 0;
+               return 0;
+       }
+
+       /* FIXME drop order completely ? */
+       if (likely(order == 0)) {
+               /* find actual order */
+               order = mb_find_order_for_block(e4b, block);
+               block = block >> order;
+       }
+
+       ex->fe_len = 1 << order;
+       ex->fe_start = block << order;
+       ex->fe_group = e4b->bd_group;
+
+       /* calc difference from given start */
+       next = next - ex->fe_start;
+       ex->fe_len -= next;
+       ex->fe_start += next;
+
+       while (needed > ex->fe_len &&
+              (buddy = mb_find_buddy(e4b, order, &max))) {
+
+               if (block + 1 >= max)
+                       break;
+
+               next = (block + 1) * (1 << order);
+               if (mb_test_bit(next, EXT4_MB_BITMAP(e4b)))
+                       break;
+
+               ord = mb_find_order_for_block(e4b, next);
+
+               order = ord;
+               block = next >> order;
+               ex->fe_len += 1 << order;
+       }
+
+       BUG_ON(ex->fe_start + ex->fe_len > (1 << (e4b->bd_blkbits + 3)));
+       return ex->fe_len;
+}
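+
+/*
+ * A minimal sketch (illustrative, not from this file) of what the returned
+ * ext4_free_extent describes: fe_len free blocks starting at group-relative
+ * block fe_start inside group fe_group.  The conversion to an absolute block
+ * number assumes the same layout used elsewhere above (group number times
+ * blocks-per-group, plus the start, plus the first data block); the
+ * sketch_* names are illustrative only.
+ */
+struct sketch_extent {
+       unsigned long group;
+       unsigned int start;
+       unsigned int len;
+};
+
+static inline unsigned long long
+sketch_extent_first_block(struct sketch_extent ex,
+                         unsigned long blocks_per_group,
+                         unsigned long first_data_block)
+{
+       return (unsigned long long)ex.group * blocks_per_group +
+              ex.start + first_data_block;
+}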
+
+static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
+{
+       int ord;
+       int mlen = 0;
+       int max = 0;
+       int cur;
+       int start = ex->fe_start;
+       int len = ex->fe_len;
+       unsigned ret = 0;
+       int len0 = len;
+       void *buddy;
+
+       BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
+       BUG_ON(e4b->bd_group != ex->fe_group);
+       BUG_ON(!ext4_is_group_locked(e4b->bd_sb, e4b->bd_group));
+       mb_check_buddy(e4b);
+       mb_mark_used_double(e4b, start, len);
+
+       e4b->bd_info->bb_free -= len;
+       if (e4b->bd_info->bb_first_free == start)
+               e4b->bd_info->bb_first_free += len;
+
+       /* let's maintain fragments counter */
+       if (start != 0)
+               mlen = !mb_test_bit(start - 1, EXT4_MB_BITMAP(e4b));
+       if (start + len < EXT4_SB(e4b->bd_sb)->s_mb_maxs[0])
+               max = !mb_test_bit(start + len, EXT4_MB_BITMAP(e4b));
+       if (mlen && max)
+               e4b->bd_info->bb_fragments++;
+       else if (!mlen && !max)
+               e4b->bd_info->bb_fragments--;
+
+       /* let's maintain buddy itself */
+       while (len) {
+               ord = mb_find_order_for_block(e4b, start);
+
+               if (((start >> ord) << ord) == start && len >= (1 << ord)) {
+                       /* the whole chunk may be allocated at once! */
+                       mlen = 1 << ord;
+                       buddy = mb_find_buddy(e4b, ord, &max);
+                       BUG_ON((start >> ord) >= max);
+                       mb_set_bit(start >> ord, buddy);
+                       e4b->bd_info->bb_counters[ord]--;
+                       start += mlen;
+                       len -= mlen;
+                       BUG_ON(len < 0);
+                       continue;
+               }
+
+               /* store for history */
+               if (ret == 0)
+                       ret = len | (ord << 16);
+
+               /* we have to split large buddy */
+               BUG_ON(ord <= 0);
+               buddy = mb_find_buddy(e4b, ord, &max);
+               mb_set_bit(start >> ord, buddy);
+               e4b->bd_info->bb_counters[ord]--;
+
+               ord--;
+               cur = (start >> ord) & ~1U;
+               buddy = mb_find_buddy(e4b, ord, &max);
+               mb_clear_bit(cur, buddy);
+               mb_clear_bit(cur + 1, buddy);
+               e4b->bd_info->bb_counters[ord]++;
+               e4b->bd_info->bb_counters[ord]++;
+       }
+
+       mb_set_bits(sb_bgl_lock(EXT4_SB(e4b->bd_sb), ex->fe_group),
+                       EXT4_MB_BITMAP(e4b), ex->fe_start, len0);
+       mb_check_buddy(e4b);
+
+       return ret;
+}
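+
+/*
+ * A minimal sketch of the packed value returned above: the low 16 bits carry
+ * the length that was left when the first buddy split happened (the "tail"),
+ * the high bits carry the order that had to be split.
+ * ext4_mb_use_best_found() unpacks it into ac_tail and ac_buddy the same way.
+ * The sketch_* helpers are illustrative only.
+ */
+static inline unsigned int sketch_pack_split(int len, int order)
+{
+       return (unsigned int)len | ((unsigned int)order << 16);
+}
+
+static inline int sketch_split_tail(unsigned int packed)
+{
+       return packed & 0xffff;         /* remaining length at first split */
+}
+
+static inline int sketch_split_order(unsigned int packed)
+{
+       return packed >> 16;            /* order that was split */
+}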
+
+/*
+ * Must be called under group lock!
+ */
+static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
+                                       struct ext4_buddy *e4b)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+       int ret;
+
+       BUG_ON(ac->ac_b_ex.fe_group != e4b->bd_group);
+       BUG_ON(ac->ac_status == AC_STATUS_FOUND);
+
+       ac->ac_b_ex.fe_len = min(ac->ac_b_ex.fe_len, ac->ac_g_ex.fe_len);
+       ac->ac_b_ex.fe_logical = ac->ac_g_ex.fe_logical;
+       ret = mb_mark_used(e4b, &ac->ac_b_ex);
+
+       /* preallocation can change ac_b_ex, thus we store the actually
+        * allocated blocks for history */
+       ac->ac_f_ex = ac->ac_b_ex;
+
+       ac->ac_status = AC_STATUS_FOUND;
+       ac->ac_tail = ret & 0xffff;
+       ac->ac_buddy = ret >> 16;
+
+       /* FIXME: why do we need to hold these page references here? */
+       ac->ac_bitmap_page = e4b->bd_bitmap_page;
+       get_page(ac->ac_bitmap_page);
+       ac->ac_buddy_page = e4b->bd_buddy_page;
+       get_page(ac->ac_buddy_page);
+
+       /* store last allocated for subsequent stream allocation */
+       if ((ac->ac_flags & EXT4_MB_HINT_DATA)) {
+               spin_lock(&sbi->s_md_lock);
+               sbi->s_mb_last_group = ac->ac_f_ex.fe_group;
+               sbi->s_mb_last_start = ac->ac_f_ex.fe_start;
+               spin_unlock(&sbi->s_md_lock);
+       }
+}
+
+/*
+ * regular allocator, for general purposes allocation
+ */
+
+static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
+                                       struct ext4_buddy *e4b,
+                                       int finish_group)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+       struct ext4_free_extent *bex = &ac->ac_b_ex;
+       struct ext4_free_extent *gex = &ac->ac_g_ex;
+       struct ext4_free_extent ex;
+       int max;
+
+       /*
+        * We don't want to scan for a whole year
+        */
+       if (ac->ac_found > sbi->s_mb_max_to_scan &&
+                       !(ac->ac_flags & EXT4_MB_HINT_FIRST)) {
+               ac->ac_status = AC_STATUS_BREAK;
+               return;
+       }
+
+       /*
+        * Haven't found a good chunk so far; let's continue
+        */
+       if (bex->fe_len < gex->fe_len)
+               return;
+
+       if ((finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
+                       && bex->fe_group == e4b->bd_group) {
+               /* recheck chunk's availability - we don't know
+                * when it was found (within this lock-unlock
+                * period or not) */
+               max = mb_find_extent(e4b, 0, bex->fe_start, gex->fe_len, &ex);
+               if (max >= gex->fe_len) {
+                       ext4_mb_use_best_found(ac, e4b);
+                       return;
+               }
+       }
+}
+
+/*
+ * The routine checks whether the found extent is good enough. If it is,
+ * the extent gets marked used and a flag is set in the context to stop
+ * scanning. Otherwise, the extent is compared with the previously found
+ * extent and, if the new one is better, it is stored in the context.
+ * Later, the best found extent will be used if mballoc can't find a
+ * good enough extent.
+ *
+ * FIXME: real allocation policy is to be designed yet!
+ */
+static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
+                                       struct ext4_free_extent *ex,
+                                       struct ext4_buddy *e4b)
+{
+       struct ext4_free_extent *bex = &ac->ac_b_ex;
+       struct ext4_free_extent *gex = &ac->ac_g_ex;
+
+       BUG_ON(ex->fe_len <= 0);
+       BUG_ON(ex->fe_len >= EXT4_BLOCKS_PER_GROUP(ac->ac_sb));
+       BUG_ON(ex->fe_start >= EXT4_BLOCKS_PER_GROUP(ac->ac_sb));
+       BUG_ON(ac->ac_status != AC_STATUS_CONTINUE);
+
+       ac->ac_found++;
+
+       /*
+        * The special case - take what you catch first
+        */
+       if (unlikely(ac->ac_flags & EXT4_MB_HINT_FIRST)) {
+               *bex = *ex;
+               ext4_mb_use_best_found(ac, e4b);
+               return;
+       }
+
+       /*
+        * Let's check whether the chunk is good enough
+        */
+       if (ex->fe_len == gex->fe_len) {
+               *bex = *ex;
+               ext4_mb_use_best_found(ac, e4b);
+               return;
+       }
+
+       /*
+        * If this is the first found extent, just store it in the context
+        */
+       if (bex->fe_len == 0) {
+               *bex = *ex;
+               return;
+       }
+
+       /*
+        * If the newly found extent is better, store it in the context
+        */
+       if (bex->fe_len < gex->fe_len) {
+               /* if the request isn't satisfied, any found extent
+                * larger than the previous best one is better */
+               if (ex->fe_len > bex->fe_len)
+                       *bex = *ex;
+       } else if (ex->fe_len > gex->fe_len) {
+               /* if the request is satisfied, then we try to find
+                * an extent that still satisfies the request, but is
+                * smaller than the previous one */
+               if (ex->fe_len < bex->fe_len)
+                       *bex = *ex;
+       }
+
+       ext4_mb_check_limits(ac, e4b, 0);
+}
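+
+/*
+ * A minimal sketch of the comparison policy above, reduced to lengths only:
+ * an exact fit always wins; while the goal is not yet met a bigger extent is
+ * better; once the goal is met the smallest extent that still meets it is
+ * better.  The sketch_* name is illustrative only.
+ */
+static int sketch_candidate_is_better(int cand_len, int best_len, int goal_len)
+{
+       if (best_len == 0)
+               return 1;                       /* nothing stored yet */
+       if (cand_len == goal_len)
+               return 1;                       /* exact fit always wins */
+       if (best_len < goal_len)
+               return cand_len > best_len;     /* goal unmet: prefer bigger */
+       return cand_len > goal_len && cand_len < best_len; /* goal met: shrink */
+}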
+
+static int ext4_mb_try_best_found(struct ext4_allocation_context *ac,
+                                       struct ext4_buddy *e4b)
+{
+       struct ext4_free_extent ex = ac->ac_b_ex;
+       ext4_group_t group = ex.fe_group;
+       int max;
+       int err;
+
+       BUG_ON(ex.fe_len <= 0);
+       err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
+       if (err)
+               return err;
+
+       ext4_lock_group(ac->ac_sb, group);
+       max = mb_find_extent(e4b, 0, ex.fe_start, ex.fe_len, &ex);
+
+       if (max > 0) {
+               ac->ac_b_ex = ex;
+               ext4_mb_use_best_found(ac, e4b);
+       }
+
+       ext4_unlock_group(ac->ac_sb, group);
+       ext4_mb_release_desc(e4b);
+
+       return 0;
+}
+
+static int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
+                               struct ext4_buddy *e4b)
+{
+       ext4_group_t group = ac->ac_g_ex.fe_group;
+       int max;
+       int err;
+       struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+       struct ext4_super_block *es = sbi->s_es;
+       struct ext4_free_extent ex;
+
+       if (!(ac->ac_flags & EXT4_MB_HINT_TRY_GOAL))
+               return 0;
+
+       err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
+       if (err)
+               return err;
+
+       ext4_lock_group(ac->ac_sb, group);
+       max = mb_find_extent(e4b, 0, ac->ac_g_ex.fe_start,
+                            ac->ac_g_ex.fe_len, &ex);
+
+       if (max >= ac->ac_g_ex.fe_len && ac->ac_g_ex.fe_len == sbi->s_stripe) {
+               ext4_fsblk_t start;
+
+               start = (e4b->bd_group * EXT4_BLOCKS_PER_GROUP(ac->ac_sb)) +
+                       ex.fe_start + le32_to_cpu(es->s_first_data_block);
+               /* use do_div to get remainder (would be 64-bit modulo) */
+               if (do_div(start, sbi->s_stripe) == 0) {
+                       ac->ac_found++;
+                       ac->ac_b_ex = ex;
+                       ext4_mb_use_best_found(ac, e4b);
+               }
+       } else if (max >= ac->ac_g_ex.fe_len) {
+               BUG_ON(ex.fe_len <= 0);
+               BUG_ON(ex.fe_group != ac->ac_g_ex.fe_group);
+               BUG_ON(ex.fe_start != ac->ac_g_ex.fe_start);
+               ac->ac_found++;
+               ac->ac_b_ex = ex;
+               ext4_mb_use_best_found(ac, e4b);
+       } else if (max > 0 && (ac->ac_flags & EXT4_MB_HINT_MERGE)) {
+               /* Sometimes, caller may want to merge even small
+                * number of blocks to an existing extent */
+               BUG_ON(ex.fe_len <= 0);
+               BUG_ON(ex.fe_group != ac->ac_g_ex.fe_group);
+               BUG_ON(ex.fe_start != ac->ac_g_ex.fe_start);
+               ac->ac_found++;
+               ac->ac_b_ex = ex;
+               ext4_mb_use_best_found(ac, e4b);
+       }
+       ext4_unlock_group(ac->ac_sb, group);
+       ext4_mb_release_desc(e4b);
+
+       return 0;
+}
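+
+/*
+ * A minimal sketch of the stripe-alignment test above: the goal hit is taken
+ * as-is only when the absolute block number is a multiple of the stripe
+ * width (do_div() performs the 64-bit division and yields the remainder).
+ * Plain C modulo stands in for do_div() here; sketch_* is illustrative only.
+ */
+static inline int sketch_is_stripe_aligned(unsigned long long block,
+                                          unsigned int stripe)
+{
+       return stripe != 0 && (block % stripe) == 0;
+}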
+
+/*
+ * The routine scans buddy structures (not the bitmap!) from the given
+ * order up to the max order and tries to find a big enough chunk to
+ * satisfy the request
+ */
+static void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
+                                       struct ext4_buddy *e4b)
+{
+       struct super_block *sb = ac->ac_sb;
+       struct ext4_group_info *grp = e4b->bd_info;
+       void *buddy;
+       int i;
+       int k;
+       int max;
+
+       BUG_ON(ac->ac_2order <= 0);
+       for (i = ac->ac_2order; i <= sb->s_blocksize_bits + 1; i++) {
+               if (grp->bb_counters[i] == 0)
+                       continue;
+
+               buddy = mb_find_buddy(e4b, i, &max);
+               BUG_ON(buddy == NULL);
+
+               k = ext4_find_next_zero_bit(buddy, max, 0);
+               BUG_ON(k >= max);
+
+               ac->ac_found++;
+
+               ac->ac_b_ex.fe_len = 1 << i;
+               ac->ac_b_ex.fe_start = k << i;
+               ac->ac_b_ex.fe_group = e4b->bd_group;
+
+               ext4_mb_use_best_found(ac, e4b);
+
+               BUG_ON(ac->ac_b_ex.fe_len != ac->ac_g_ex.fe_len);
+
+               if (EXT4_SB(sb)->s_mb_stats)
+                       atomic_inc(&EXT4_SB(sb)->s_bal_2orders);
+
+               break;
+       }
+}
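+
+/*
+ * A minimal sketch of how a hit in the order-i buddy bitmap above maps back
+ * to group-relative blocks: a zero bit k at order i describes a free chunk
+ * of 1 << i blocks starting at block k << i.  sketch_* is illustrative only.
+ */
+static inline void sketch_order_hit_to_blocks(int k, int order,
+                                             int *start, int *len)
+{
+       *start = k << order;            /* first block of the free chunk */
+       *len = 1 << order;              /* chunk length in blocks */
+}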
+
+/*
+ * The routine scans the group and measures all found extents.
+ * In order to optimize scanning, the caller must pass the number of
+ * free blocks in the group, so the routine knows the upper limit.
+ */
+static void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
+                                       struct ext4_buddy *e4b)
+{
+       struct super_block *sb = ac->ac_sb;
+       void *bitmap = EXT4_MB_BITMAP(e4b);
+       struct ext4_free_extent ex;
+       int i;
+       int free;
+
+       free = e4b->bd_info->bb_free;
+       BUG_ON(free <= 0);
+
+       i = e4b->bd_info->bb_first_free;
+
+       while (free && ac->ac_status == AC_STATUS_CONTINUE) {
+               i = ext4_find_next_zero_bit(bitmap,
+                                               EXT4_BLOCKS_PER_GROUP(sb), i);
+               if (i >= EXT4_BLOCKS_PER_GROUP(sb)) {
+                       BUG_ON(free != 0);
+                       break;
+               }
+
+               mb_find_extent(e4b, 0, i, ac->ac_g_ex.fe_len, &ex);
+               BUG_ON(ex.fe_len <= 0);
+               BUG_ON(free < ex.fe_len);
+
+               ext4_mb_measure_extent(ac, &ex, e4b);
+
+               i += ex.fe_len;
+               free -= ex.fe_len;
+       }
+
+       ext4_mb_check_limits(ac, e4b, 1);
+}
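+
+/*
+ * A minimal sketch of the scan pattern above over a plain byte-addressed
+ * bitmap: walk to the next zero bit, treat the whole zero run as one free
+ * extent, then jump past it.  sketch_* is illustrative only.
+ */
+static int sketch_count_free_runs(const unsigned char *bitmap, int nbits)
+{
+       int i = 0;
+       int runs = 0;
+
+       while (i < nbits) {
+               if (bitmap[i / 8] & (1 << (i % 8))) {
+                       i++;                    /* bit set: block in use */
+                       continue;
+               }
+               runs++;                         /* start of a free extent */
+               while (i < nbits && !(bitmap[i / 8] & (1 << (i % 8))))
+                       i++;                    /* skip the whole free run */
+       }
+       return runs;
+}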
+
+/*
+ * This is a special case for storage like RAID5: for stripe-size
+ * requests we try to find stripe-aligned chunks.
+ * XXX should do so at least for multiples of the stripe size as well
+ */
+static void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
+                                struct ext4_buddy *e4b)
+{
+       struct super_block *sb = ac->ac_sb;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       void *bitmap = EXT4_MB_BITMAP(e4b);
+       struct ext4_free_extent ex;
+       ext4_fsblk_t first_group_block;
+       ext4_fsblk_t a;
+       ext4_grpblk_t i;
+       int max;
+
+       BUG_ON(sbi->s_stripe == 0);
+
+       /* find first stripe-aligned block in group */
+       first_group_block = e4b->bd_group * EXT4_BLOCKS_PER_GROUP(sb)
+               + le32_to_cpu(sbi->s_es->s_first_data_block);
+       a = first_group_block + sbi->s_stripe - 1;
+       do_div(a, sbi->s_stripe);
+       i = (a * sbi->s_stripe) - first_group_block;
+
+       while (i < EXT4_BLOCKS_PER_GROUP(sb)) {
+               if (!mb_test_bit(i, bitmap)) {
+                       max = mb_find_extent(e4b, 0, i, sbi->s_stripe, &ex);
+                       if (max >= sbi->s_stripe) {
+                               ac->ac_found++;
+                               ac->ac_b_ex = ex;
+                               ext4_mb_use_best_found(ac, e4b);
+                               break;
+                       }
+               }
+               i += sbi->s_stripe;
+       }
+}
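+
+/*
+ * A minimal sketch of the round-up above that finds the first stripe-aligned
+ * block inside a group, with plain C arithmetic in place of do_div();
+ * sketch_* is illustrative only.
+ */
+static inline unsigned long
+sketch_first_aligned_offset(unsigned long long first_group_block,
+                           unsigned int stripe)
+{
+       unsigned long long aligned;
+
+       /* round first_group_block up to the next multiple of the stripe */
+       aligned = (first_group_block + stripe - 1) / stripe * stripe;
+       return (unsigned long)(aligned - first_group_block);
+}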
+
+static int ext4_mb_good_group(struct ext4_allocation_context *ac,
+                               ext4_group_t group, int cr)
+{
+       unsigned free, fragments;
+       unsigned i, bits;
+       struct ext4_group_desc *desc;
+       struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group);
+
+       BUG_ON(cr < 0 || cr >= 4);
+       BUG_ON(EXT4_MB_GRP_NEED_INIT(grp));
+
+       free = grp->bb_free;
+       fragments = grp->bb_fragments;
+       if (free == 0)
+               return 0;
+       if (fragments == 0)
+               return 0;
+
+       switch (cr) {
+       case 0:
+               BUG_ON(ac->ac_2order == 0);
+               /* If this group is uninitialized, skip it initially */
+               desc = ext4_get_group_desc(ac->ac_sb, group, NULL);
+               if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
+                       return 0;
+
+               bits = ac->ac_sb->s_blocksize_bits + 1;
+               for (i = ac->ac_2order; i <= bits; i++)
+                       if (grp->bb_counters[i] > 0)
+                               return 1;
+               break;
+       case 1:
+               if ((free / fragments) >= ac->ac_g_ex.fe_len)
+                       return 1;
+               break;
+       case 2:
+               if (free >= ac->ac_g_ex.fe_len)
+                       return 1;
+               break;
+       case 3:
+               return 1;
+       default:
+               BUG();
+       }
+
+       return 0;
+}
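+
+/*
+ * A minimal sketch of the four criteria levels above, over plain integers:
+ * cr 0 needs a free buddy chunk of at least the requested power-of-two
+ * order, cr 1 wants the average fragment to be large enough, cr 2 only
+ * needs enough free blocks in total, cr 3 takes any group with free blocks.
+ * sketch_* is illustrative only.
+ */
+static int sketch_group_is_good(int cr, unsigned int free,
+                               unsigned int fragments,
+                               unsigned int goal_len, int have_order_chunk)
+{
+       if (free == 0 || fragments == 0)
+               return 0;
+       switch (cr) {
+       case 0:
+               return have_order_chunk;
+       case 1:
+               return (free / fragments) >= goal_len;
+       case 2:
+               return free >= goal_len;
+       default:
+               return 1;               /* cr 3: anything with free blocks */
+       }
+}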
+
+static int ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
+{
+       ext4_group_t group;
+       ext4_group_t i;
+       int cr;
+       int err = 0;
+       int bsbits;
+       struct ext4_sb_info *sbi;
+       struct super_block *sb;
+       struct ext4_buddy e4b;
+       loff_t size, isize;
+
+       sb = ac->ac_sb;
+       sbi = EXT4_SB(sb);
+       BUG_ON(ac->ac_status == AC_STATUS_FOUND);
+
+       /* first, try the goal */
+       err = ext4_mb_find_by_goal(ac, &e4b);
+       if (err || ac->ac_status == AC_STATUS_FOUND)
+               goto out;
+
+       if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
+               goto out;
+
+       /*
+        * ac->ac_2order is set only if the fe_len is a power of 2;
+        * if ac_2order is set we also set the criteria to 0 so that we
+        * try exact allocation using the buddy.
+        */
+       i = fls(ac->ac_g_ex.fe_len);
+       ac->ac_2order = 0;
+       /*
+        * We search using buddy data only if the order of the request
+        * is greater than or equal to sbi->s_mb_order2_reqs.
+        * You can tune it via /proc/fs/ext4/<partition>/order2_req
+        */
+       if (i >= sbi->s_mb_order2_reqs) {
+               /*
+                * This should tell if fe_len is exactly a power of 2
+                */
+               if ((ac->ac_g_ex.fe_len & (~(1 << (i - 1)))) == 0)
+                       ac->ac_2order = i - 1;
+       }
+
+       bsbits = ac->ac_sb->s_blocksize_bits;
+       /* if stream allocation is enabled, use global goal */
+       size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
+       isize = i_size_read(ac->ac_inode) >> bsbits;
+       if (size < isize)
+               size = isize;
+
+       if (size < sbi->s_mb_stream_request &&
+                       (ac->ac_flags & EXT4_MB_HINT_DATA)) {
+               /* TBD: may be hot point */
+               spin_lock(&sbi->s_md_lock);
+               ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
+               ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
+               spin_unlock(&sbi->s_md_lock);
+       }
+
+       /* the search for the right group starts from the goal value specified */
+       group = ac->ac_g_ex.fe_group;
+
+       /* Let's just scan groups to find more or less suitable blocks */
+       cr = ac->ac_2order ? 0 : 1;
+       /*
+        * cr == 0 tries to get an exact allocation,
+        * cr == 3 tries to get anything
+        */
+repeat:
+       for (; cr < 4 && ac->ac_status == AC_STATUS_CONTINUE; cr++) {
+               ac->ac_criteria = cr;
+               for (i = 0; i < EXT4_SB(sb)->s_groups_count; group++, i++) {
+                       struct ext4_group_info *grp;
+                       struct ext4_group_desc *desc;
+
+                       if (group == EXT4_SB(sb)->s_groups_count)
+                               group = 0;
+
+                       /* quick check to skip empty groups */
+                       grp = ext4_get_group_info(ac->ac_sb, group);
+                       if (grp->bb_free == 0)
+                               continue;
+
+                       /*
+                        * if the group is already initialized, we check whether
+                        * it is a good group and, if not, we don't load the buddy
+                        */
+                       if (EXT4_MB_GRP_NEED_INIT(grp)) {
+                               /*
+                                * we need full data about the group
+                                * to make a good selection
+                                */
+                               err = ext4_mb_load_buddy(sb, group, &e4b);
+                               if (err)
+                                       goto out;
+                               ext4_mb_release_desc(&e4b);
+                       }
+
+                       /*
+                        * If the particular group doesn't satisfy our
+                        * criteria we continue with the next group
+                        */
+                       if (!ext4_mb_good_group(ac, group, cr))
+                               continue;
+
+                       err = ext4_mb_load_buddy(sb, group, &e4b);
+                       if (err)
+                               goto out;
+
+                       ext4_lock_group(sb, group);
+                       if (!ext4_mb_good_group(ac, group, cr)) {
+                               /* someone did allocation from this group */
+                               ext4_unlock_group(sb, group);
+                               ext4_mb_release_desc(&e4b);
+                               continue;
+                       }
+
+                       ac->ac_groups_scanned++;
+                       desc = ext4_get_group_desc(sb, group, NULL);
+                       if (cr == 0 || (desc->bg_flags &
+                                       cpu_to_le16(EXT4_BG_BLOCK_UNINIT) &&
+                                       ac->ac_2order != 0))
+                               ext4_mb_simple_scan_group(ac, &e4b);
+                       else if (cr == 1 &&
+                                       ac->ac_g_ex.fe_len == sbi->s_stripe)
+                               ext4_mb_scan_aligned(ac, &e4b);
+                       else
+                               ext4_mb_complex_scan_group(ac, &e4b);
+
+                       ext4_unlock_group(sb, group);
+                       ext4_mb_release_desc(&e4b);
+
+                       if (ac->ac_status != AC_STATUS_CONTINUE)
+                               break;
+               }
+       }
+
+       if (ac->ac_b_ex.fe_len > 0 && ac->ac_status != AC_STATUS_FOUND &&
+           !(ac->ac_flags & EXT4_MB_HINT_FIRST)) {
+               /*
+                * We've been searching too long. Let's try to allocate
+                * the best chunk we've found so far
+                */
+
+               ext4_mb_try_best_found(ac, &e4b);
+               if (ac->ac_status != AC_STATUS_FOUND) {
+                       /*
+                        * Someone luckier has already allocated it.
+                        * The only thing we can do is just take the first
+                        * found block(s)
+                       printk(KERN_DEBUG "EXT4-fs: someone won our chunk\n");
+                        */
+                       ac->ac_b_ex.fe_group = 0;
+                       ac->ac_b_ex.fe_start = 0;
+                       ac->ac_b_ex.fe_len = 0;
+                       ac->ac_status = AC_STATUS_CONTINUE;
+                       ac->ac_flags |= EXT4_MB_HINT_FIRST;
+                       cr = 3;
+                       atomic_inc(&sbi->s_mb_lost_chunks);
+                       goto repeat;
+               }
+       }
+out:
+       return err;
+}
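+
+/*
+ * A minimal sketch of the order-2 request detection used above: a request is
+ * treated as an "order-2" (power-of-two) request only when its length has
+ * exactly one bit set, and the stored order is the position of that bit
+ * (fls(len) - 1).  A portable loop stands in for fls(); the threshold check
+ * against s_mb_order2_reqs is done separately.  sketch_* is illustrative only.
+ */
+static inline int sketch_request_order(unsigned int len)
+{
+       int order = 0;
+
+       if (len == 0 || (len & (len - 1)) != 0)
+               return -1;              /* not a power of two */
+       while ((1U << order) < len)
+               order++;
+       return order;                   /* len == 1 << order */
+}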
+
+#ifdef EXT4_MB_HISTORY
+struct ext4_mb_proc_session {
+       struct ext4_mb_history *history;
+       struct super_block *sb;
+       int start;
+       int max;
+};
+
+static void *ext4_mb_history_skip_empty(struct ext4_mb_proc_session *s,
+                                       struct ext4_mb_history *hs,
+                                       int first)
+{
+       if (hs == s->history + s->max)
+               hs = s->history;
+       if (!first && hs == s->history + s->start)
+               return NULL;
+       while (hs->orig.fe_len == 0) {
+               hs++;
+               if (hs == s->history + s->max)
+                       hs = s->history;
+               if (hs == s->history + s->start)
+                       return NULL;
+       }
+       return hs;
+}
+
+static void *ext4_mb_seq_history_start(struct seq_file *seq, loff_t *pos)
+{
+       struct ext4_mb_proc_session *s = seq->private;
+       struct ext4_mb_history *hs;
+       int l = *pos;
+
+       if (l == 0)
+               return SEQ_START_TOKEN;
+       hs = ext4_mb_history_skip_empty(s, s->history + s->start, 1);
+       if (!hs)
+               return NULL;
+       while (--l && (hs = ext4_mb_history_skip_empty(s, ++hs, 0)) != NULL);
+       return hs;
+}
+
+static void *ext4_mb_seq_history_next(struct seq_file *seq, void *v,
+                                     loff_t *pos)
+{
+       struct ext4_mb_proc_session *s = seq->private;
+       struct ext4_mb_history *hs = v;
+
+       ++*pos;
+       if (v == SEQ_START_TOKEN)
+               return ext4_mb_history_skip_empty(s, s->history + s->start, 1);
+       else
+               return ext4_mb_history_skip_empty(s, ++hs, 0);
+}
+
+static int ext4_mb_seq_history_show(struct seq_file *seq, void *v)
+{
+       char buf[25], buf2[25], buf3[25], *fmt;
+       struct ext4_mb_history *hs = v;
+
+       if (v == SEQ_START_TOKEN) {
+               seq_printf(seq, "%-5s %-8s %-23s %-23s %-23s %-5s "
+                               "%-5s %-2s %-5s %-5s %-5s %-6s\n",
+                         "pid", "inode", "original", "goal", "result", "found",
+                          "grps", "cr", "flags", "merge", "tail", "broken");
+               return 0;
+       }
+
+       if (hs->op == EXT4_MB_HISTORY_ALLOC) {
+               fmt = "%-5u %-8u %-23s %-23s %-23s %-5u %-5u %-2u "
+                       "%-5u %-5s %-5u %-6u\n";
+               sprintf(buf2, "%lu/%d/%u@%u", hs->result.fe_group,
+                       hs->result.fe_start, hs->result.fe_len,
+                       hs->result.fe_logical);
+               sprintf(buf, "%lu/%d/%u@%u", hs->orig.fe_group,
+                       hs->orig.fe_start, hs->orig.fe_len,
+                       hs->orig.fe_logical);
+               sprintf(buf3, "%lu/%d/%u@%u", hs->goal.fe_group,
+                       hs->goal.fe_start, hs->goal.fe_len,
+                       hs->goal.fe_logical);
+               seq_printf(seq, fmt, hs->pid, hs->ino, buf, buf3, buf2,
+                               hs->found, hs->groups, hs->cr, hs->flags,
+                               hs->merged ? "M" : "", hs->tail,
+                               hs->buddy ? 1 << hs->buddy : 0);
+       } else if (hs->op == EXT4_MB_HISTORY_PREALLOC) {
+               fmt = "%-5u %-8u %-23s %-23s %-23s\n";
+               sprintf(buf2, "%lu/%d/%u@%u", hs->result.fe_group,
+                       hs->result.fe_start, hs->result.fe_len,
+                       hs->result.fe_logical);
+               sprintf(buf, "%lu/%d/%u@%u", hs->orig.fe_group,
+                       hs->orig.fe_start, hs->orig.fe_len,
+                       hs->orig.fe_logical);
+               seq_printf(seq, fmt, hs->pid, hs->ino, buf, "", buf2);
+       } else if (hs->op == EXT4_MB_HISTORY_DISCARD) {
+               sprintf(buf2, "%lu/%d/%u", hs->result.fe_group,
+                       hs->result.fe_start, hs->result.fe_len);
+               seq_printf(seq, "%-5u %-8u %-23s discard\n",
+                               hs->pid, hs->ino, buf2);
+       } else if (hs->op == EXT4_MB_HISTORY_FREE) {
+               sprintf(buf2, "%lu/%d/%u", hs->result.fe_group,
+                       hs->result.fe_start, hs->result.fe_len);
+               seq_printf(seq, "%-5u %-8u %-23s free\n",
+                               hs->pid, hs->ino, buf2);
+       }
+       return 0;
+}
+
+static void ext4_mb_seq_history_stop(struct seq_file *seq, void *v)
+{
+}
+
+static struct seq_operations ext4_mb_seq_history_ops = {
+       .start  = ext4_mb_seq_history_start,
+       .next   = ext4_mb_seq_history_next,
+       .stop   = ext4_mb_seq_history_stop,
+       .show   = ext4_mb_seq_history_show,
+};
+
+static int ext4_mb_seq_history_open(struct inode *inode, struct file *file)
+{
+       struct super_block *sb = PDE(inode)->data;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       struct ext4_mb_proc_session *s;
+       int rc;
+       int size;
+
+       s = kmalloc(sizeof(*s), GFP_KERNEL);
+       if (s == NULL)
+               return -ENOMEM;
+       s->sb = sb;
+       size = sizeof(struct ext4_mb_history) * sbi->s_mb_history_max;
+       s->history = kmalloc(size, GFP_KERNEL);
+       if (s->history == NULL) {
+               kfree(s);
+               return -ENOMEM;
+       }
+
+       spin_lock(&sbi->s_mb_history_lock);
+       memcpy(s->history, sbi->s_mb_history, size);
+       s->max = sbi->s_mb_history_max;
+       s->start = sbi->s_mb_history_cur % s->max;
+       spin_unlock(&sbi->s_mb_history_lock);
+
+       rc = seq_open(file, &ext4_mb_seq_history_ops);
+       if (rc == 0) {
+               struct seq_file *m = (struct seq_file *)file->private_data;
+               m->private = s;
+       } else {
+               kfree(s->history);
+               kfree(s);
+       }
+       return rc;
+
+}
+
+static int ext4_mb_seq_history_release(struct inode *inode, struct file *file)
+{
+       struct seq_file *seq = (struct seq_file *)file->private_data;
+       struct ext4_mb_proc_session *s = seq->private;
+       kfree(s->history);
+       kfree(s);
+       return seq_release(inode, file);
+}
+
+static ssize_t ext4_mb_seq_history_write(struct file *file,
+                               const char __user *buffer,
+                               size_t count, loff_t *ppos)
+{
+       struct seq_file *seq = (struct seq_file *)file->private_data;
+       struct ext4_mb_proc_session *s = seq->private;
+       struct super_block *sb = s->sb;
+       char str[32];
+       int value;
+
+       if (count >= sizeof(str)) {
+               printk(KERN_ERR "EXT4-fs: %s string too long, max %u bytes\n",
+                               "mb_history", (int)sizeof(str));
+               return -EOVERFLOW;
+       }
+
+       if (copy_from_user(str, buffer, count))
+               return -EFAULT;
+
+       value = simple_strtol(str, NULL, 0);
+       if (value < 0)
+               return -ERANGE;
+       EXT4_SB(sb)->s_mb_history_filter = value;
+
+       return count;
+}
+
+static struct file_operations ext4_mb_seq_history_fops = {
+       .owner          = THIS_MODULE,
+       .open           = ext4_mb_seq_history_open,
+       .read           = seq_read,
+       .write          = ext4_mb_seq_history_write,
+       .llseek         = seq_lseek,
+       .release        = ext4_mb_seq_history_release,
+};
+
+static void *ext4_mb_seq_groups_start(struct seq_file *seq, loff_t *pos)
+{
+       struct super_block *sb = seq->private;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       ext4_group_t group;
+
+       if (*pos < 0 || *pos >= sbi->s_groups_count)
+               return NULL;
+
+       group = *pos + 1;
+       return (void *) group;
+}
+
+static void *ext4_mb_seq_groups_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+       struct super_block *sb = seq->private;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       ext4_group_t group;
+
+       ++*pos;
+       if (*pos < 0 || *pos >= sbi->s_groups_count)
+               return NULL;
+       group = *pos + 1;
+       return (void *) group;
+}
+
+static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v)
+{
+       struct super_block *sb = seq->private;
+       long group = (long) v;
+       int i;
+       int err;
+       struct ext4_buddy e4b;
+       struct sg {
+               struct ext4_group_info info;
+               unsigned short counters[16];
+       } sg;
+
+       group--;
+       if (group == 0)
+               seq_printf(seq, "#%-5s: %-5s %-5s %-5s "
+                               "[ %-5s %-5s %-5s %-5s %-5s %-5s %-5s "
+                                 "%-5s %-5s %-5s %-5s %-5s %-5s %-5s ]\n",
+                          "group", "free", "frags", "first",
+                          "2^0", "2^1", "2^2", "2^3", "2^4", "2^5", "2^6",
+                          "2^7", "2^8", "2^9", "2^10", "2^11", "2^12", "2^13");
+
+       i = (sb->s_blocksize_bits + 2) * sizeof(sg.info.bb_counters[0]) +
+               sizeof(struct ext4_group_info);
+       err = ext4_mb_load_buddy(sb, group, &e4b);
+       if (err) {
+               seq_printf(seq, "#%-5lu: I/O error\n", group);
+               return 0;
+       }
+       ext4_lock_group(sb, group);
+       memcpy(&sg, ext4_get_group_info(sb, group), i);
+       ext4_unlock_group(sb, group);
+       ext4_mb_release_desc(&e4b);
+
+       seq_printf(seq, "#%-5lu: %-5u %-5u %-5u [", group, sg.info.bb_free,
+                       sg.info.bb_fragments, sg.info.bb_first_free);
+       for (i = 0; i <= 13; i++)
+               seq_printf(seq, " %-5u", i <= sb->s_blocksize_bits + 1 ?
+                               sg.info.bb_counters[i] : 0);
+       seq_printf(seq, " ]\n");
+
+       return 0;
+}
+
+static void ext4_mb_seq_groups_stop(struct seq_file *seq, void *v)
+{
+}
+
+static struct seq_operations ext4_mb_seq_groups_ops = {
+       .start  = ext4_mb_seq_groups_start,
+       .next   = ext4_mb_seq_groups_next,
+       .stop   = ext4_mb_seq_groups_stop,
+       .show   = ext4_mb_seq_groups_show,
+};
+
+static int ext4_mb_seq_groups_open(struct inode *inode, struct file *file)
+{
+       struct super_block *sb = PDE(inode)->data;
+       int rc;
+
+       rc = seq_open(file, &ext4_mb_seq_groups_ops);
+       if (rc == 0) {
+               struct seq_file *m = (struct seq_file *)file->private_data;
+               m->private = sb;
+       }
+       return rc;
+
+}
+
+static struct file_operations ext4_mb_seq_groups_fops = {
+       .owner          = THIS_MODULE,
+       .open           = ext4_mb_seq_groups_open,
+       .read           = seq_read,
+       .llseek         = seq_lseek,
+       .release        = seq_release,
+};
+
+static void ext4_mb_history_release(struct super_block *sb)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+       remove_proc_entry("mb_groups", sbi->s_mb_proc);
+       remove_proc_entry("mb_history", sbi->s_mb_proc);
+
+       kfree(sbi->s_mb_history);
+}
+
+static void ext4_mb_history_init(struct super_block *sb)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       int i;
+
+       if (sbi->s_mb_proc != NULL) {
+               struct proc_dir_entry *p;
+               p = create_proc_entry("mb_history", S_IRUGO, sbi->s_mb_proc);
+               if (p) {
+                       p->proc_fops = &ext4_mb_seq_history_fops;
+                       p->data = sb;
+               }
+               p = create_proc_entry("mb_groups", S_IRUGO, sbi->s_mb_proc);
+               if (p) {
+                       p->proc_fops = &ext4_mb_seq_groups_fops;
+                       p->data = sb;
+               }
+       }
+
+       sbi->s_mb_history_max = 1000;
+       sbi->s_mb_history_cur = 0;
+       spin_lock_init(&sbi->s_mb_history_lock);
+       i = sbi->s_mb_history_max * sizeof(struct ext4_mb_history);
+       sbi->s_mb_history = kmalloc(i, GFP_KERNEL);
+       if (likely(sbi->s_mb_history != NULL))
+               memset(sbi->s_mb_history, 0, i);
+       /* if we can't allocate the history, then we simply won't use it */
+}
+
+static void ext4_mb_store_history(struct ext4_allocation_context *ac)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+       struct ext4_mb_history h;
+
+       if (unlikely(sbi->s_mb_history == NULL))
+               return;
+
+       if (!(ac->ac_op & sbi->s_mb_history_filter))
+               return;
+
+       h.op = ac->ac_op;
+       h.pid = current->pid;
+       h.ino = ac->ac_inode ? ac->ac_inode->i_ino : 0;
+       h.orig = ac->ac_o_ex;
+       h.result = ac->ac_b_ex;
+       h.flags = ac->ac_flags;
+       h.found = ac->ac_found;
+       h.groups = ac->ac_groups_scanned;
+       h.cr = ac->ac_criteria;
+       h.tail = ac->ac_tail;
+       h.buddy = ac->ac_buddy;
+       h.merged = 0;
+       if (ac->ac_op == EXT4_MB_HISTORY_ALLOC) {
+               if (ac->ac_g_ex.fe_start == ac->ac_b_ex.fe_start &&
+                               ac->ac_g_ex.fe_group == ac->ac_b_ex.fe_group)
+                       h.merged = 1;
+               h.goal = ac->ac_g_ex;
+               h.result = ac->ac_f_ex;
+       }
+
+       spin_lock(&sbi->s_mb_history_lock);
+       memcpy(sbi->s_mb_history + sbi->s_mb_history_cur, &h, sizeof(h));
+       if (++sbi->s_mb_history_cur >= sbi->s_mb_history_max)
+               sbi->s_mb_history_cur = 0;
+       spin_unlock(&sbi->s_mb_history_lock);
+}
+
+#else
+#define ext4_mb_history_release(sb)
+#define ext4_mb_history_init(sb)
+#endif
+
+static int ext4_mb_init_backend(struct super_block *sb)
+{
+       ext4_group_t i;
+       int j, len, metalen;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       int num_meta_group_infos =
+               (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) >>
+                       EXT4_DESC_PER_BLOCK_BITS(sb);
+       struct ext4_group_info **meta_group_info;
+
+       /* An 8TB filesystem with 64-bit pointers requires a 4096 byte
+        * kmalloc. A 128kb malloc should suffice for a 256TB filesystem.
+        * So a two level scheme suffices for now. */
+       sbi->s_group_info = kmalloc(sizeof(*sbi->s_group_info) *
+                                   num_meta_group_infos, GFP_KERNEL);
+       if (sbi->s_group_info == NULL) {
+               printk(KERN_ERR "EXT4-fs: can't allocate buddy meta group\n");
+               return -ENOMEM;
+       }
+       sbi->s_buddy_cache = new_inode(sb);
+       if (sbi->s_buddy_cache == NULL) {
+               printk(KERN_ERR "EXT4-fs: can't get new inode\n");
+               goto err_freesgi;
+       }
+       EXT4_I(sbi->s_buddy_cache)->i_disksize = 0;
+
+       metalen = sizeof(*meta_group_info) << EXT4_DESC_PER_BLOCK_BITS(sb);
+       for (i = 0; i < num_meta_group_infos; i++) {
+               if ((i + 1) == num_meta_group_infos)
+                       metalen = sizeof(*meta_group_info) *
+                               (sbi->s_groups_count -
+                                       (i << EXT4_DESC_PER_BLOCK_BITS(sb)));
+               meta_group_info = kmalloc(metalen, GFP_KERNEL);
+               if (meta_group_info == NULL) {
+                       printk(KERN_ERR "EXT4-fs: can't allocate mem for a "
+                              "buddy group\n");
+                       goto err_freemeta;
+               }
+               sbi->s_group_info[i] = meta_group_info;
+       }
+
+       /*
+        * calculate the needed size. If the bb_counters size changes,
+        * don't forget to update ext4_mb_generate_buddy()
+        */
+       len = sizeof(struct ext4_group_info);
+       len += sizeof(unsigned short) * (sb->s_blocksize_bits + 2);
+       for (i = 0; i < sbi->s_groups_count; i++) {
+               struct ext4_group_desc *desc;
+
+               meta_group_info =
+                       sbi->s_group_info[i >> EXT4_DESC_PER_BLOCK_BITS(sb)];
+               j = i & (EXT4_DESC_PER_BLOCK(sb) - 1);
+
+               meta_group_info[j] = kzalloc(len, GFP_KERNEL);
+               if (meta_group_info[j] == NULL) {
+                       printk(KERN_ERR "EXT4-fs: can't allocate buddy mem\n");
+                       i--;
+                       goto err_freebuddy;
+               }
+               desc = ext4_get_group_desc(sb, i, NULL);
+               if (desc == NULL) {
+                       printk(KERN_ERR
+                               "EXT4-fs: can't read descriptor %lu\n", i);
+                       goto err_freebuddy;
+               }
+               memset(meta_group_info[j], 0, len);
+               set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,
+                       &(meta_group_info[j]->bb_state));
+
+               /*
+                * initialize bb_free to be able to skip
+                * empty groups without initialization
+                */
+               if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
+                       meta_group_info[j]->bb_free =
+                               ext4_free_blocks_after_init(sb, i, desc);
+               } else {
+                       meta_group_info[j]->bb_free =
+                               le16_to_cpu(desc->bg_free_blocks_count);
+               }
+
+               INIT_LIST_HEAD(&meta_group_info[j]->bb_prealloc_list);
+
+#ifdef DOUBLE_CHECK
+               {
+                       struct buffer_head *bh;
+                       meta_group_info[j]->bb_bitmap =
+                               kmalloc(sb->s_blocksize, GFP_KERNEL);
+                       BUG_ON(meta_group_info[j]->bb_bitmap == NULL);
+                       bh = read_block_bitmap(sb, i);
+                       BUG_ON(bh == NULL);
+                       memcpy(meta_group_info[j]->bb_bitmap, bh->b_data,
+                                       sb->s_blocksize);
+                       put_bh(bh);
+               }
+#endif
+
+       }
+
+       return 0;
+
+err_freebuddy:
+       /* i is unsigned, so stop once group 0 has been freed and i wraps */
+       while (i < sbi->s_groups_count) {
+               kfree(ext4_get_group_info(sb, i));
+               i--;
+       }
+       i = num_meta_group_infos;
+err_freemeta:
+       while (i-- > 0)
+               kfree(sbi->s_group_info[i]);
+       iput(sbi->s_buddy_cache);
+err_freesgi:
+       kfree(sbi->s_group_info);
+       return -ENOMEM;
+}
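+
+/*
+ * A minimal sketch of the two-level lookup the table built above supports:
+ * the upper bits of the group number select a block of pointers, the lower
+ * bits select the slot inside it, mirroring what ext4_get_group_info() has
+ * to do.  sketch_* is illustrative only.
+ */
+static inline void *sketch_group_info_lookup(void ***meta, unsigned long group,
+                                            unsigned int per_block_bits)
+{
+       unsigned long idx = group >> per_block_bits;
+       unsigned long slot = group & ((1UL << per_block_bits) - 1);
+
+       return meta[idx][slot];
+}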
+
+int ext4_mb_init(struct super_block *sb, int needs_recovery)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       unsigned i;
+       unsigned offset;
+       unsigned max;
+
+       if (!test_opt(sb, MBALLOC))
+               return 0;
+
+       i = (sb->s_blocksize_bits + 2) * sizeof(unsigned short);
+
+       sbi->s_mb_offsets = kmalloc(i, GFP_KERNEL);
+       if (sbi->s_mb_offsets == NULL) {
+               clear_opt(sbi->s_mount_opt, MBALLOC);
+               return -ENOMEM;
+       }
+       sbi->s_mb_maxs = kmalloc(i, GFP_KERNEL);
+       if (sbi->s_mb_maxs == NULL) {
+               clear_opt(sbi->s_mount_opt, MBALLOC);
+               kfree(sbi->s_mb_offsets);
+               return -ENOMEM;
+       }
+
+       /* order 0 is regular bitmap */
+       sbi->s_mb_maxs[0] = sb->s_blocksize << 3;
+       sbi->s_mb_offsets[0] = 0;
+
+       i = 1;
+       offset = 0;
+       max = sb->s_blocksize << 2;
+       do {
+               sbi->s_mb_offsets[i] = offset;
+               sbi->s_mb_maxs[i] = max;
+               offset += 1 << (sb->s_blocksize_bits - i);
+               max = max >> 1;
+               i++;
+       } while (i <= sb->s_blocksize_bits + 1);
+
+       /* init file for buddy data */
+       i = ext4_mb_init_backend(sb);
+       if (i) {
+               clear_opt(sbi->s_mount_opt, MBALLOC);
+               kfree(sbi->s_mb_offsets);
+               kfree(sbi->s_mb_maxs);
+               return i;
+       }
+
+       spin_lock_init(&sbi->s_md_lock);
+       INIT_LIST_HEAD(&sbi->s_active_transaction);
+       INIT_LIST_HEAD(&sbi->s_closed_transaction);
+       INIT_LIST_HEAD(&sbi->s_committed_transaction);
+       spin_lock_init(&sbi->s_bal_lock);
+
+       sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN;
+       sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN;
+       sbi->s_mb_stats = MB_DEFAULT_STATS;
+       sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD;
+       sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS;
+       sbi->s_mb_history_filter = EXT4_MB_HISTORY_DEFAULT;
+       sbi->s_mb_group_prealloc = MB_DEFAULT_GROUP_PREALLOC;
+
+       i = sizeof(struct ext4_locality_group) * NR_CPUS;
+       sbi->s_locality_groups = kmalloc(i, GFP_KERNEL);
+       if (sbi->s_locality_groups == NULL) {
+               clear_opt(sbi->s_mount_opt, MBALLOC);
+               kfree(sbi->s_mb_offsets);
+               kfree(sbi->s_mb_maxs);
+               return -ENOMEM;
+       }
+       for (i = 0; i < NR_CPUS; i++) {
+               struct ext4_locality_group *lg;
+               lg = &sbi->s_locality_groups[i];
+               mutex_init(&lg->lg_mutex);
+               INIT_LIST_HEAD(&lg->lg_prealloc_list);
+               spin_lock_init(&lg->lg_prealloc_lock);
+       }
+
+       ext4_mb_init_per_dev_proc(sb);
+       ext4_mb_history_init(sb);
+
+       printk(KERN_INFO "EXT4-fs: mballoc enabled\n");
+       return 0;
+}
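+
+/*
+ * A minimal sketch of the per-order table filled in above, for a 4KB block
+ * size (s_blocksize_bits == 12): order 0 is the on-disk block bitmap with
+ * blocksize * 8 bits, order 1 starts at byte offset 0 of the buddy block
+ * with half as many bits, and every following order gets a region half the
+ * size of the previous one.  sketch_* is illustrative only.
+ */
+static void sketch_fill_buddy_table(unsigned int blocksize_bits,
+                                   unsigned int *offsets, unsigned int *maxs)
+{
+       unsigned int i = 1;
+       unsigned int offset = 0;
+       unsigned int max = 1U << (blocksize_bits + 2);
+
+       offsets[0] = 0;
+       maxs[0] = 1U << (blocksize_bits + 3);   /* bits in one block */
+       do {
+               offsets[i] = offset;            /* byte offset of this order */
+               maxs[i] = max;                  /* bits used by this order */
+               offset += 1U << (blocksize_bits - i);
+               max >>= 1;
+               i++;
+       } while (i <= blocksize_bits + 1);
+}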
+
+/* needs to be called with the ext4 group lock held (ext4_lock_group) */
+static void ext4_mb_cleanup_pa(struct ext4_group_info *grp)
+{
+       struct ext4_prealloc_space *pa;
+       struct list_head *cur, *tmp;
+       int count = 0;
+
+       list_for_each_safe(cur, tmp, &grp->bb_prealloc_list) {
+               pa = list_entry(cur, struct ext4_prealloc_space, pa_group_list);
+               list_del(&pa->pa_group_list);
+               count++;
+               kfree(pa);
+       }
+       if (count)
+               mb_debug("mballoc: %u PAs left\n", count);
+
+}
+
+int ext4_mb_release(struct super_block *sb)
+{
+       ext4_group_t i;
+       int num_meta_group_infos;
+       struct ext4_group_info *grinfo;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+       if (!test_opt(sb, MBALLOC))
+               return 0;
+
+       /* release freed, non-committed blocks */
+       spin_lock(&sbi->s_md_lock);
+       list_splice_init(&sbi->s_closed_transaction,
+                       &sbi->s_committed_transaction);
+       list_splice_init(&sbi->s_active_transaction,
+                       &sbi->s_committed_transaction);
+       spin_unlock(&sbi->s_md_lock);
+       ext4_mb_free_committed_blocks(sb);
+
+       if (sbi->s_group_info) {
+               for (i = 0; i < sbi->s_groups_count; i++) {
+                       grinfo = ext4_get_group_info(sb, i);
+#ifdef DOUBLE_CHECK
+                       kfree(grinfo->bb_bitmap);
+#endif
+                       ext4_lock_group(sb, i);
+                       ext4_mb_cleanup_pa(grinfo);
+                       ext4_unlock_group(sb, i);
+                       kfree(grinfo);
+               }
+               num_meta_group_infos = (sbi->s_groups_count +
+                               EXT4_DESC_PER_BLOCK(sb) - 1) >>
+                       EXT4_DESC_PER_BLOCK_BITS(sb);
+               for (i = 0; i < num_meta_group_infos; i++)
+                       kfree(sbi->s_group_info[i]);
+               kfree(sbi->s_group_info);
+       }
+       kfree(sbi->s_mb_offsets);
+       kfree(sbi->s_mb_maxs);
+       if (sbi->s_buddy_cache)
+               iput(sbi->s_buddy_cache);
+       if (sbi->s_mb_stats) {
+               printk(KERN_INFO
+                      "EXT4-fs: mballoc: %u blocks %u reqs (%u success)\n",
+                               atomic_read(&sbi->s_bal_allocated),
+                               atomic_read(&sbi->s_bal_reqs),
+                               atomic_read(&sbi->s_bal_success));
+               printk(KERN_INFO
+                     "EXT4-fs: mballoc: %u extents scanned, %u goal hits, "
+                               "%u 2^N hits, %u breaks, %u lost\n",
+                               atomic_read(&sbi->s_bal_ex_scanned),
+                               atomic_read(&sbi->s_bal_goals),
+                               atomic_read(&sbi->s_bal_2orders),
+                               atomic_read(&sbi->s_bal_breaks),
+                               atomic_read(&sbi->s_mb_lost_chunks));
+               printk(KERN_INFO
+                      "EXT4-fs: mballoc: %lu generated and it took %Lu\n",
+                               sbi->s_mb_buddies_generated++,
+                               sbi->s_mb_generation_time);
+               printk(KERN_INFO
+                      "EXT4-fs: mballoc: %u preallocated, %u discarded\n",
+                               atomic_read(&sbi->s_mb_preallocated),
+                               atomic_read(&sbi->s_mb_discarded));
+       }
+
+       kfree(sbi->s_locality_groups);
+
+       ext4_mb_history_release(sb);
+       ext4_mb_destroy_per_dev_proc(sb);
+
+       return 0;
+}
+
+static void ext4_mb_free_committed_blocks(struct super_block *sb)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       int err;
+       int i;
+       int count = 0;
+       int count2 = 0;
+       struct ext4_free_metadata *md;
+       struct ext4_buddy e4b;
+
+       if (list_empty(&sbi->s_committed_transaction))
+               return;
+
+       /* there are committed blocks still to be freed */
+       do {
+               /* get next array of blocks */
+               md = NULL;
+               spin_lock(&sbi->s_md_lock);
+               if (!list_empty(&sbi->s_committed_transaction)) {
+                       md = list_entry(sbi->s_committed_transaction.next,
+                                       struct ext4_free_metadata, list);
+                       list_del(&md->list);
+               }
+               spin_unlock(&sbi->s_md_lock);
+
+               if (md == NULL)
+                       break;
+
+               mb_debug("gonna free %u blocks in group %lu (0x%p):",
+                               md->num, md->group, md);
+
+               err = ext4_mb_load_buddy(sb, md->group, &e4b);
+               /* we expect to find an existing buddy because it's pinned */
+               BUG_ON(err != 0);
+
+               /* there are blocks to put into the buddy to make them really free */
+               count += md->num;
+               count2++;
+               ext4_lock_group(sb, md->group);
+               for (i = 0; i < md->num; i++) {
+                       mb_debug(" %u", md->blocks[i]);
+                       err = mb_free_blocks(NULL, &e4b, md->blocks[i], 1);
+                       BUG_ON(err != 0);
+               }
+               mb_debug("\n");
+               ext4_unlock_group(sb, md->group);
+
+               /* balance refcounts from ext4_mb_free_metadata() */
+               page_cache_release(e4b.bd_buddy_page);
+               page_cache_release(e4b.bd_bitmap_page);
+
+               kfree(md);
+               ext4_mb_release_desc(&e4b);
+
+       } while (md);
+
+       mb_debug("freed %u blocks in %u structures\n", count, count2);
+}
+
+#define EXT4_ROOT                      "ext4"
+#define EXT4_MB_STATS_NAME             "stats"
+#define EXT4_MB_MAX_TO_SCAN_NAME       "max_to_scan"
+#define EXT4_MB_MIN_TO_SCAN_NAME       "min_to_scan"
+#define EXT4_MB_ORDER2_REQ             "order2_req"
+#define EXT4_MB_STREAM_REQ             "stream_req"
+#define EXT4_MB_GROUP_PREALLOC         "group_prealloc"
+
+
+
+#define MB_PROC_VALUE_READ(name)                               \
+static int ext4_mb_read_##name(char *page, char **start,       \
+               off_t off, int count, int *eof, void *data)     \
+{                                                              \
+       struct ext4_sb_info *sbi = data;                        \
+       int len;                                                \
+       *eof = 1;                                               \
+       if (off != 0)                                           \
+               return 0;                                       \
+       len = sprintf(page, "%ld\n", sbi->s_mb_##name);         \
+       *start = page;                                          \
+       return len;                                             \
+}
+
+#define MB_PROC_VALUE_WRITE(name)                              \
+static int ext4_mb_write_##name(struct file *file,             \
+               const char __user *buf, unsigned long cnt, void *data)  \
+{                                                              \
+       struct ext4_sb_info *sbi = data;                        \
+       char str[32];                                           \
+       long value;                                             \
+       if (cnt >= sizeof(str))                                 \
+               return -EINVAL;                                 \
+       if (copy_from_user(str, buf, cnt))                      \
+               return -EFAULT;                                 \
+       value = simple_strtol(str, NULL, 0);                    \
+       if (value <= 0)                                         \
+               return -ERANGE;                                 \
+       sbi->s_mb_##name = value;                               \
+       return cnt;                                             \
+}
+
+MB_PROC_VALUE_READ(stats);
+MB_PROC_VALUE_WRITE(stats);
+MB_PROC_VALUE_READ(max_to_scan);
+MB_PROC_VALUE_WRITE(max_to_scan);
+MB_PROC_VALUE_READ(min_to_scan);
+MB_PROC_VALUE_WRITE(min_to_scan);
+MB_PROC_VALUE_READ(order2_reqs);
+MB_PROC_VALUE_WRITE(order2_reqs);
+MB_PROC_VALUE_READ(stream_request);
+MB_PROC_VALUE_WRITE(stream_request);
+MB_PROC_VALUE_READ(group_prealloc);
+MB_PROC_VALUE_WRITE(group_prealloc);
+
+#define        MB_PROC_HANDLER(name, var)                                      \
+do {                                                                   \
+       proc = create_proc_entry(name, mode, sbi->s_mb_proc);           \
+       if (proc == NULL) {                                             \
+               printk(KERN_ERR "EXT4-fs: can't create %s\n", name);    \
+               goto err_out;                                           \
+       }                                                               \
+       proc->data = sbi;                                               \
+       proc->read_proc  = ext4_mb_read_##var ;                         \
+       proc->write_proc = ext4_mb_write_##var;                         \
+} while (0)
+
+static int ext4_mb_init_per_dev_proc(struct super_block *sb)
+{
+       mode_t mode = S_IFREG | S_IRUGO | S_IWUSR;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       struct proc_dir_entry *proc;
+       char devname[64];
+
+       snprintf(devname, sizeof(devname) - 1, "%s",
+               bdevname(sb->s_bdev, devname));
+       sbi->s_mb_proc = proc_mkdir(devname, proc_root_ext4);
+
+       MB_PROC_HANDLER(EXT4_MB_STATS_NAME, stats);
+       MB_PROC_HANDLER(EXT4_MB_MAX_TO_SCAN_NAME, max_to_scan);
+       MB_PROC_HANDLER(EXT4_MB_MIN_TO_SCAN_NAME, min_to_scan);
+       MB_PROC_HANDLER(EXT4_MB_ORDER2_REQ, order2_reqs);
+       MB_PROC_HANDLER(EXT4_MB_STREAM_REQ, stream_request);
+       MB_PROC_HANDLER(EXT4_MB_GROUP_PREALLOC, group_prealloc);
+
+       return 0;
+
+err_out:
+       printk(KERN_ERR "EXT4-fs: Unable to create %s\n", devname);
+       remove_proc_entry(EXT4_MB_GROUP_PREALLOC, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_STREAM_REQ, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_ORDER2_REQ, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_MIN_TO_SCAN_NAME, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_MAX_TO_SCAN_NAME, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_STATS_NAME, sbi->s_mb_proc);
+       remove_proc_entry(devname, proc_root_ext4);
+       sbi->s_mb_proc = NULL;
+
+       return -ENOMEM;
+}
+
+static int ext4_mb_destroy_per_dev_proc(struct super_block *sb)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       char devname[64];
+
+       if (sbi->s_mb_proc == NULL)
+               return -EINVAL;
+
+       snprintf(devname, sizeof(devname) - 1, "%s",
+               bdevname(sb->s_bdev, devname));
+       remove_proc_entry(EXT4_MB_GROUP_PREALLOC, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_STREAM_REQ, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_ORDER2_REQ, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_MIN_TO_SCAN_NAME, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_MAX_TO_SCAN_NAME, sbi->s_mb_proc);
+       remove_proc_entry(EXT4_MB_STATS_NAME, sbi->s_mb_proc);
+       remove_proc_entry(devname, proc_root_ext4);
+
+       return 0;
+}
+
+int __init init_ext4_mballoc(void)
+{
+       ext4_pspace_cachep =
+               kmem_cache_create("ext4_prealloc_space",
+                                    sizeof(struct ext4_prealloc_space),
+                                    0, SLAB_RECLAIM_ACCOUNT, NULL);
+       if (ext4_pspace_cachep == NULL)
+               return -ENOMEM;
+
+#ifdef CONFIG_PROC_FS
+       proc_root_ext4 = proc_mkdir(EXT4_ROOT, proc_root_fs);
+       if (proc_root_ext4 == NULL)
+               printk(KERN_ERR "EXT4-fs: Unable to create %s\n", EXT4_ROOT);
+#endif
+
+       return 0;
+}
+
+void exit_ext4_mballoc(void)
+{
+       /* XXX: synchronize_rcu(); */
+       kmem_cache_destroy(ext4_pspace_cachep);
+#ifdef CONFIG_PROC_FS
+       remove_proc_entry(EXT4_ROOT, proc_root_fs);
+#endif
+}
+
+
+/*
+ * Check quota and mark the chosen space (ac->ac_b_ex) non-free in bitmaps
+ * Returns 0 on success or an error code
+ */
+static int ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
+                               handle_t *handle)
+{
+       struct buffer_head *bitmap_bh = NULL;
+       struct ext4_super_block *es;
+       struct ext4_group_desc *gdp;
+       struct buffer_head *gdp_bh;
+       struct ext4_sb_info *sbi;
+       struct super_block *sb;
+       ext4_fsblk_t block;
+       int err;
+
+       BUG_ON(ac->ac_status != AC_STATUS_FOUND);
+       BUG_ON(ac->ac_b_ex.fe_len <= 0);
+
+       sb = ac->ac_sb;
+       sbi = EXT4_SB(sb);
+       es = sbi->s_es;
+
+       err = -EIO;
+       bitmap_bh = read_block_bitmap(sb, ac->ac_b_ex.fe_group);
+       if (!bitmap_bh)
+               goto out_err;
+
+       err = ext4_journal_get_write_access(handle, bitmap_bh);
+       if (err)
+               goto out_err;
+
+       err = -EIO;
+       gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, &gdp_bh);
+       if (!gdp)
+               goto out_err;
+
+       ext4_debug("using block group %lu(%d)\n", ac->ac_b_ex.fe_group,
+                       gdp->bg_free_blocks_count);
+
+       err = ext4_journal_get_write_access(handle, gdp_bh);
+       if (err)
+               goto out_err;
+
+       block = ac->ac_b_ex.fe_group * EXT4_BLOCKS_PER_GROUP(sb)
+               + ac->ac_b_ex.fe_start
+               + le32_to_cpu(es->s_first_data_block);
+
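+       /* paranoia: the start of the allocation must not land on this
+        * group's block bitmap, inode bitmap or inode table */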
+       if (block == ext4_block_bitmap(sb, gdp) ||
+                       block == ext4_inode_bitmap(sb, gdp) ||
+                       in_range(block, ext4_inode_table(sb, gdp),
+                               EXT4_SB(sb)->s_itb_per_group)) {
+
+               ext4_error(sb, __FUNCTION__,
+                          "Allocating block in system zone - block = %llu",
+                          block);
+       }
+#ifdef AGGRESSIVE_CHECK
+       {
+               int i;
+               for (i = 0; i < ac->ac_b_ex.fe_len; i++) {
+                       BUG_ON(mb_test_bit(ac->ac_b_ex.fe_start + i,
+                                               bitmap_bh->b_data));
+               }
+       }
+#endif
+       mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
+                               ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
+
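+       /* update the group descriptor under the per-group lock; if the group
+        * was never initialized on disk, compute its free block count first
+        * and clear the BLOCK_UNINIT flag before subtracting what we used */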
+       spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
+       if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
+               gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
+               gdp->bg_free_blocks_count =
+                       cpu_to_le16(ext4_free_blocks_after_init(sb,
+                                               ac->ac_b_ex.fe_group,
+                                               gdp));
+       }
+       gdp->bg_free_blocks_count =
+               cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)
+                               - ac->ac_b_ex.fe_len);
+       gdp->bg_checksum = ext4_group_desc_csum(sbi, ac->ac_b_ex.fe_group, gdp);
+       spin_unlock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
+       percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
+
+       err = ext4_journal_dirty_metadata(handle, bitmap_bh);
+       if (err)
+               goto out_err;
+       err = ext4_journal_dirty_metadata(handle, gdp_bh);
+
+out_err:
+       sb->s_dirt = 1;
+       brelse(bitmap_bh);
+       return err;
+}
+
+/*
+ * here we normalize the request for a locality group
+ * Group requests are normalized to the s_stripe size if it is set via the
+ * mount option. If not, we set it to s_mb_group_prealloc, which can be
+ * configured via /proc/fs/ext4/<partition>/group_prealloc
+ *
+ * XXX: should we try to preallocate more than the group has now?
+ */
+static void ext4_mb_normalize_group_request(struct ext4_allocation_context *ac)
+{
+       struct super_block *sb = ac->ac_sb;
+       struct ext4_locality_group *lg = ac->ac_lg;
+
+       BUG_ON(lg == NULL);
+       if (EXT4_SB(sb)->s_stripe)
+               ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_stripe;
+       else
+               ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
+       mb_debug("#%u: goal %lu blocks for locality group\n",
+               current->pid, ac->ac_g_ex.fe_len);
+}
+
+/*
+ * Normalization means making request better in terms of
+ * size and alignment
+ */
+static void ext4_mb_normalize_request(struct ext4_allocation_context *ac,
+                               struct ext4_allocation_request *ar)
+{
+       int bsbits, max;
+       ext4_lblk_t end;
+       struct list_head *cur;
+       loff_t size, orig_size, start_off;
+       ext4_lblk_t start, orig_start;
+       struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
+
+       /* only normalize data requests, metadata requests
+          do not need preallocation */
+       if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
+               return;
+
+       /* sometimes the caller may want exact blocks */
+       if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
+               return;
+
+       /* caller may indicate that preallocation isn't
+        * required (it's a tail, for example) */
+       if (ac->ac_flags & EXT4_MB_HINT_NOPREALLOC)
+               return;
+
+       if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC) {
+               ext4_mb_normalize_group_request(ac);
+               return;
+       }
+
+       bsbits = ac->ac_sb->s_blocksize_bits;
+
+       /* first, let's learn the actual file size
+        * assuming the current request is allocated */
+       size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
+       size = size << bsbits;
+       if (size < i_size_read(ac->ac_inode))
+               size = i_size_read(ac->ac_inode);
+
+       /* max available blocks in a free group */
+       max = EXT4_BLOCKS_PER_GROUP(ac->ac_sb) - 1 - 1 -
+                               EXT4_SB(ac->ac_sb)->s_itb_per_group;
+
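+/* true if the request is no bigger than @size, or if a group (@max blocks)
+ * could not hold @size anyway */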
+#define NRL_CHECK_SIZE(req, size, max, bits)   \
+               (req <= (size) || max <= ((size) >> bits))
+
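+       /* round the predicted file size up to the next bucket; for larger
+        * files the preallocation window start (start_off) is also aligned
+        * to a 1MB, 4MB or 8MB boundary of the logical offset */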
+       /* first, try to predict filesize */
+       /* XXX: should this table be tunable? */
+       start_off = 0;
+       if (size <= 16 * 1024) {
+               size = 16 * 1024;
+       } else if (size <= 32 * 1024) {
+               size = 32 * 1024;
+       } else if (size <= 64 * 1024) {
+               size = 64 * 1024;
+       } else if (size <= 128 * 1024) {
+               size = 128 * 1024;
+       } else if (size <= 256 * 1024) {
+               size = 256 * 1024;
+       } else if (size <= 512 * 1024) {
+               size = 512 * 1024;
+       } else if (size <= 1024 * 1024) {
+               size = 1024 * 1024;
+       } else if (NRL_CHECK_SIZE(size, 4 * 1024 * 1024, max, bsbits)) {
+               start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
+                                               (20 - bsbits)) << 20;
+               size = 1024 * 1024;
+       } else if (NRL_CHECK_SIZE(size, 8 * 1024 * 1024, max, bsbits)) {
+               start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
+                                                       (22 - bsbits)) << 22;
+               size = 4 * 1024 * 1024;
+       } else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
+                                       (8<<20)>>bsbits, max, bsbits)) {
+               start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
+                                                       (23 - bsbits)) << 23;
+               size = 8 * 1024 * 1024;
+       } else {
+               start_off = (loff_t)ac->ac_o_ex.fe_logical << bsbits;
+               size      = ac->ac_o_ex.fe_len << bsbits;
+       }
+       orig_size = size = size >> bsbits;
+       orig_start = start = start_off >> bsbits;
+
+       /* don't cover already allocated blocks in selected range */
+       if (ar->pleft && start <= ar->lleft) {
+               size -= ar->lleft + 1 - start;
+               start = ar->lleft + 1;
+       }
+       if (ar->pright && start + size - 1 >= ar->lright)
+               size -= start + size - ar->lright;
+
+       end = start + size;
+
+       /* check we don't cross already preallocated blocks */
+       rcu_read_lock();
+       list_for_each_rcu(cur, &ei->i_prealloc_list) {
+               struct ext4_prealloc_space *pa;
+               unsigned long pa_end;
+
+               pa = list_entry(cur, struct ext4_prealloc_space, pa_inode_list);
+
+               if (pa->pa_deleted)
+                       continue;
+               spin_lock(&pa->pa_lock);
+               if (pa->pa_deleted) {
+                       spin_unlock(&pa->pa_lock);
+                       continue;
+               }
+
+               pa_end = pa->pa_lstart + pa->pa_len;
+
+               /* PA must not overlap original request */
+               BUG_ON(!(ac->ac_o_ex.fe_logical >= pa_end ||
+                       ac->ac_o_ex.fe_logical < pa->pa_lstart));
+
+               /* skip PAs that the normalized request doesn't overlap with */
+               if (pa->pa_lstart >= end) {
+                       spin_unlock(&pa->pa_lock);
+                       continue;
+               }
+               if (pa_end <= start) {
+                       spin_unlock(&pa->pa_lock);
+                       continue;
+               }
+               BUG_ON(pa->pa_lstart <= start && pa_end >= end);
+
+               if (pa_end <= ac->ac_o_ex.fe_logical) {
+                       BUG_ON(pa_end < start);
+                       start = pa_end;
+               }
+
+               if (pa->pa_lstart > ac->ac_o_ex.fe_logical) {
+                       BUG_ON(pa->pa_lstart > end);
+                       end = pa->pa_lstart;
+               }
+               spin_unlock(&pa->pa_lock);
+       }
+       rcu_read_unlock();
+       size = end - start;
+
+       /* XXX: extra loop to check we really don't overlap preallocations */
+       rcu_read_lock();
+       list_for_each_rcu(cur, &ei->i_prealloc_list) {
+               struct ext4_prealloc_space *pa;
+               unsigned long pa_end;
+               pa = list_entry(cur, struct ext4_prealloc_space, pa_inode_list);
+               spin_lock(&pa->pa_lock);
+               if (pa->pa_deleted == 0) {
+                       pa_end = pa->pa_lstart + pa->pa_len;
+                       BUG_ON(!(start >= pa_end || end <= pa->pa_lstart));
+               }
+               spin_unlock(&pa->pa_lock);
+       }
+       rcu_read_unlock();
+
+       if (start + size <= ac->ac_o_ex.fe_logical &&
+                       start > ac->ac_o_ex.fe_logical) {
+               printk(KERN_ERR "start %lu, size %lu, fe_logical %lu\n",
+                       (unsigned long) start, (unsigned long) size,
+                       (unsigned long) ac->ac_o_ex.fe_logical);
+       }
+       BUG_ON(start + size <= ac->ac_o_ex.fe_logical &&
+                       start > ac->ac_o_ex.fe_logical);
+       BUG_ON(size <= 0 || size >= EXT4_BLOCKS_PER_GROUP(ac->ac_sb));
+
+       /* now prepare goal request */
+
+       /* XXX: is it better to align blocks with respect to logical
+        * placement or to satisfy a big request as is */
+       ac->ac_g_ex.fe_logical = start;
+       ac->ac_g_ex.fe_len = size;
+
+       /* define goal start in order to merge */
+       if (ar->pright && (ar->lright == (start + size))) {
+               /* merge to the right */
+               ext4_get_group_no_and_offset(ac->ac_sb, ar->pright - size,
+                                               &ac->ac_f_ex.fe_group,
+                                               &ac->ac_f_ex.fe_start);
+               ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
+       }
+       if (ar->pleft && (ar->lleft + 1 == start)) {
+               /* merge to the left */
+               ext4_get_group_no_and_offset(ac->ac_sb, ar->pleft + 1,
+                                               &ac->ac_f_ex.fe_group,
+                                               &ac->ac_f_ex.fe_start);
+               ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
+       }
+
+       mb_debug("goal: %u(was %u) blocks at %u\n", (unsigned) size,
+               (unsigned) orig_size, (unsigned) start);
+}
+
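+/*
+ * Update the per-filesystem mballoc statistics (requests, blocks allocated,
+ * goal hits, extents scanned, scan breaks) for a finished allocation and
+ * record it in the mb history.
+ */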
+static void ext4_mb_collect_stats(struct ext4_allocation_context *ac)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+
+       if (sbi->s_mb_stats && ac->ac_g_ex.fe_len > 1) {
+               atomic_inc(&sbi->s_bal_reqs);
+               atomic_add(ac->ac_b_ex.fe_len, &sbi->s_bal_allocated);
+               if (ac->ac_o_ex.fe_len >= ac->ac_g_ex.fe_len)
+                       atomic_inc(&sbi->s_bal_success);
+               atomic_add(ac->ac_found, &sbi->s_bal_ex_scanned);
+               if (ac->ac_g_ex.fe_start == ac->ac_b_ex.fe_start &&
+                               ac->ac_g_ex.fe_group == ac->ac_b_ex.fe_group)
+                       atomic_inc(&sbi->s_bal_goals);
+               if (ac->ac_found > sbi->s_mb_max_to_scan)
+                       atomic_inc(&sbi->s_bal_breaks);
+       }
+
+       ext4_mb_store_history(ac);
+}
+
+/*
+ * use blocks preallocated to inode
+ */
+static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
+                               struct ext4_prealloc_space *pa)
+{
+       ext4_fsblk_t start;
+       ext4_fsblk_t end;
+       int len;
+
+       /* found preallocated blocks, use them */
+       start = pa->pa_pstart + (ac->ac_o_ex.fe_logical - pa->pa_lstart);
+       end = min(pa->pa_pstart + pa->pa_len, start + ac->ac_o_ex.fe_len);
+       len = end - start;
+       ext4_get_group_no_and_offset(ac->ac_sb, start, &ac->ac_b_ex.fe_group,
+                                       &ac->ac_b_ex.fe_start);
+       ac->ac_b_ex.fe_len = len;
+       ac->ac_status = AC_STATUS_FOUND;
+       ac->ac_pa = pa;
+
+       BUG_ON(start < pa->pa_pstart);
+       BUG_ON(start + len > pa->pa_pstart + pa->pa_len);
+       BUG_ON(pa->pa_free < len);
+       pa->pa_free -= len;
+
+       mb_debug("use %llu/%lu from inode pa %p\n", start, len, pa);
+}
+
+/*
+ * use blocks preallocated to locality group
+ */
+static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
+                               struct ext4_prealloc_space *pa)
+{
+       unsigned len = ac->ac_o_ex.fe_len;
+
+       ext4_get_group_no_and_offset(ac->ac_sb, pa->pa_pstart,
+                                       &ac->ac_b_ex.fe_group,
+                                       &ac->ac_b_ex.fe_start);
+       ac->ac_b_ex.fe_len = len;
+       ac->ac_status = AC_STATUS_FOUND;
+       ac->ac_pa = pa;
+
+       /* we don't correct pa_pstart or pa_len here to avoid a
+        * possible race when the group is being loaded concurrently;
+        * instead we correct the pa later, after blocks are marked
+        * in the on-disk bitmap -- see ext4_mb_release_context() */
+       /*
+        * FIXME!! but the other CPUs can look at this particular
+        * pa and think that it has enough free blocks if we
+        * don't update pa_free here, right?
+        */
+       mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
+}
+
+/*
+ * search goal blocks in preallocated space
+ */
+static int ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
+{
+       struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
+       struct ext4_locality_group *lg;
+       struct ext4_prealloc_space *pa;
+       struct list_head *cur;
+
+       /* only data can be preallocated */
+       if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
+               return 0;
+
+       /* first, try per-file preallocation */
+       rcu_read_lock();
+       list_for_each_rcu(cur, &ei->i_prealloc_list) {
+               pa = list_entry(cur, struct ext4_prealloc_space, pa_inode_list);
+
+               /* all fields in this condition don't change,
+                * so we can skip locking for them */
+               if (ac->ac_o_ex.fe_logical < pa->pa_lstart ||
+                       ac->ac_o_ex.fe_logical >= pa->pa_lstart + pa->pa_len)
+                       continue;
+
+               /* found preallocated blocks, use them */
+               spin_lock(&pa->pa_lock);
+               if (pa->pa_deleted == 0 && pa->pa_free) {
+                       atomic_inc(&pa->pa_count);
+                       ext4_mb_use_inode_pa(ac, pa);
+                       spin_unlock(&pa->pa_lock);
+                       ac->ac_criteria = 10;
+                       rcu_read_unlock();
+                       return 1;
+               }
+               spin_unlock(&pa->pa_lock);
+       }
+       rcu_read_unlock();
+
+       /* can we use group allocation? */
+       if (!(ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC))
+               return 0;
+
+       /* inode may have no locality group for some reason */
+       lg = ac->ac_lg;
+       if (lg == NULL)
+               return 0;
+
+       rcu_read_lock();
+       list_for_each_rcu(cur, &lg->lg_prealloc_list) {
+               pa = list_entry(cur, struct ext4_prealloc_space, pa_inode_list);
+               spin_lock(&pa->pa_lock);
+               if (pa->pa_deleted == 0 && pa->pa_free >= ac->ac_o_ex.fe_len) {
+                       atomic_inc(&pa->pa_count);
+                       ext4_mb_use_group_pa(ac, pa);
+                       spin_unlock(&pa->pa_lock);
+                       ac->ac_criteria = 20;
+                       rcu_read_unlock();
+                       return 1;
+               }
+               spin_unlock(&pa->pa_lock);
+       }
+       rcu_read_unlock();
+
+       return 0;
+}
+
+/*
+ * the function goes through all preallocations in this group and marks them
+ * used in the in-core bitmap. the buddy must be generated from this bitmap
+ * Must be called with the ext4 group lock held (ext4_lock_group)
+ */
+static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
+                                       ext4_group_t group)
+{
+       struct ext4_group_info *grp = ext4_get_group_info(sb, group);
+       struct ext4_prealloc_space *pa;
+       struct list_head *cur;
+       ext4_group_t groupnr;
+       ext4_grpblk_t start;
+       int preallocated = 0;
+       int count = 0;
+       int len;
+
+       /* all forms of preallocation discard first load the group,
+        * so the only competing code is preallocation use.
+        * we don't need any locking here
+        * notice we do NOT ignore preallocations with pa_deleted
+        * set, otherwise we could leave used blocks available for
+        * allocation in the buddy when a concurrent ext4_mb_put_pa()
+        * is dropping a preallocation
+        */
+       list_for_each(cur, &grp->bb_prealloc_list) {
+               pa = list_entry(cur, struct ext4_prealloc_space, pa_group_list);
+               spin_lock(&pa->pa_lock);
+               ext4_get_group_no_and_offset(sb, pa->pa_pstart,
+                                            &groupnr, &start);
+               len = pa->pa_len;
+               spin_unlock(&pa->pa_lock);
+               if (unlikely(len == 0))
+                       continue;
+               BUG_ON(groupnr != group);
+               mb_set_bits(sb_bgl_lock(EXT4_SB(sb), group),
+                                               bitmap, start, len);
+               preallocated += len;
+               count++;
+       }
+       mb_debug("preallocated %u for group %lu\n", preallocated, group);
+}
+
+static void ext4_mb_pa_callback(struct rcu_head *head)
+{
+       struct ext4_prealloc_space *pa;
+       pa = container_of(head, struct ext4_prealloc_space, u.pa_rcu);
+       kmem_cache_free(ext4_pspace_cachep, pa);
+}
+
+/*
+ * drops a reference to preallocated space descriptor
+ * if this was the last reference and the space is consumed
+ */
+static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
+                       struct super_block *sb, struct ext4_prealloc_space *pa)
+{
+       unsigned long grp;
+
+       if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0)
+               return;
+
+       /* in this short window concurrent discard can set pa_deleted */
+       spin_lock(&pa->pa_lock);
+       if (pa->pa_deleted == 1) {
+               spin_unlock(&pa->pa_lock);
+               return;
+       }
+
+       pa->pa_deleted = 1;
+       spin_unlock(&pa->pa_lock);
+
+       /* -1 is to protect from crossing allocation group */
+       ext4_get_group_no_and_offset(sb, pa->pa_pstart - 1, &grp, NULL);
+
+       /*
+        * possible race:
+        *
+        *  P1 (buddy init)                     P2 (regular allocation)
+        *                                      find block B in PA
+        *  copy on-disk bitmap to buddy
+        *                                      mark B in on-disk bitmap
+        *                                      drop PA from group
+        *  mark all PAs in buddy
+        *
+        * thus, P1 initializes buddy with B available. to prevent this
+        * we make "copy" and "mark all PAs" atomic and serialize "drop PA"
+        * against that pair
+        */
+       ext4_lock_group(sb, grp);
+       list_del(&pa->pa_group_list);
+       ext4_unlock_group(sb, grp);
+
+       spin_lock(pa->pa_obj_lock);
+       list_del_rcu(&pa->pa_inode_list);
+       spin_unlock(pa->pa_obj_lock);
+
+       call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
+}
+
+/*
+ * creates new preallocated space for given inode
+ */
+static int ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
+{
+       struct super_block *sb = ac->ac_sb;
+       struct ext4_prealloc_space *pa;
+       struct ext4_group_info *grp;
+       struct ext4_inode_info *ei;
+
+       /* preallocate only when found space is larger than requested */
+       BUG_ON(ac->ac_o_ex.fe_len >= ac->ac_b_ex.fe_len);
+       BUG_ON(ac->ac_status != AC_STATUS_FOUND);
+       BUG_ON(!S_ISREG(ac->ac_inode->i_mode));
+
+       pa = kmem_cache_alloc(ext4_pspace_cachep, GFP_NOFS);
+       if (pa == NULL)
+               return -ENOMEM;
+
+       if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len) {
+               int winl;
+               int wins;
+               int win;
+               int offs;
+
+               /* we can't allocate as much as the normalizer wants,
+                * so the found space must get a proper lstart
+                * to cover the original request */
+               BUG_ON(ac->ac_g_ex.fe_logical > ac->ac_o_ex.fe_logical);
+               BUG_ON(ac->ac_g_ex.fe_len < ac->ac_o_ex.fe_len);
+
+               /* we're limited by the original request in that the
+                * logical block must be covered anyway;
+                * winl is the window we can move our chunk within */
+               winl = ac->ac_o_ex.fe_logical - ac->ac_g_ex.fe_logical;
+
+               /* also, we should cover whole original request */
+               wins = ac->ac_b_ex.fe_len - ac->ac_o_ex.fe_len;
+
+               /* the smallest one defines real window */
+               win = min(winl, wins);
+
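+               /* if possible, also keep the chunk's logical start aligned
+                * to a multiple of its own length */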
+               offs = ac->ac_o_ex.fe_logical % ac->ac_b_ex.fe_len;
+               if (offs && offs < win)
+                       win = offs;
+
+               ac->ac_b_ex.fe_logical = ac->ac_o_ex.fe_logical - win;
+               BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
+               BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
+       }
+
+       /* preallocation can change ac_b_ex, thus we store actually
+        * allocated blocks for history */
+       ac->ac_f_ex = ac->ac_b_ex;
+
+       pa->pa_lstart = ac->ac_b_ex.fe_logical;
+       pa->pa_pstart = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
+       pa->pa_len = ac->ac_b_ex.fe_len;
+       pa->pa_free = pa->pa_len;
+       atomic_set(&pa->pa_count, 1);
+       spin_lock_init(&pa->pa_lock);
+       pa->pa_deleted = 0;
+       pa->pa_linear = 0;
+
+       mb_debug("new inode pa %p: %llu/%u for %u\n", pa,
+                       pa->pa_pstart, pa->pa_len, pa->pa_lstart);
+
+       ext4_mb_use_inode_pa(ac, pa);
+       atomic_add(pa->pa_free, &EXT4_SB(sb)->s_mb_preallocated);
+
+       ei = EXT4_I(ac->ac_inode);
+       grp = ext4_get_group_info(sb, ac->ac_b_ex.fe_group);
+
+       pa->pa_obj_lock = &ei->i_prealloc_lock;
+       pa->pa_inode = ac->ac_inode;
+
+       ext4_lock_group(sb, ac->ac_b_ex.fe_group);
+       list_add(&pa->pa_group_list, &grp->bb_prealloc_list);
+       ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
+
+       spin_lock(pa->pa_obj_lock);
+       list_add_rcu(&pa->pa_inode_list, &ei->i_prealloc_list);
+       spin_unlock(pa->pa_obj_lock);
+
+       return 0;
+}
+
+/*
+ * creates new preallocated space for the locality group the inode belongs to
+ */
+static int ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
+{
+       struct super_block *sb = ac->ac_sb;
+       struct ext4_locality_group *lg;
+       struct ext4_prealloc_space *pa;
+       struct ext4_group_info *grp;
+
+       /* preallocate only when found space is larger than requested */
+       BUG_ON(ac->ac_o_ex.fe_len >= ac->ac_b_ex.fe_len);
+       BUG_ON(ac->ac_status != AC_STATUS_FOUND);
+       BUG_ON(!S_ISREG(ac->ac_inode->i_mode));
+
+       BUG_ON(ext4_pspace_cachep == NULL);
+       pa = kmem_cache_alloc(ext4_pspace_cachep, GFP_NOFS);
+       if (pa == NULL)
+               return -ENOMEM;
+
+       /* preallocation can change ac_b_ex, thus we store actually
+        * allocated blocks for history */
+       ac->ac_f_ex = ac->ac_b_ex;
+
+       pa->pa_pstart = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
+       pa->pa_lstart = pa->pa_pstart;
+       pa->pa_len = ac->ac_b_ex.fe_len;
+       pa->pa_free = pa->pa_len;
+       atomic_set(&pa->pa_count, 1);
+       spin_lock_init(&pa->pa_lock);
+       pa->pa_deleted = 0;
+       pa->pa_linear = 1;
+
+       mb_debug("new group pa %p: %llu/%u for %u\n", pa,
+                       pa->pa_pstart, pa->pa_len, pa->pa_lstart);
+
+       ext4_mb_use_group_pa(ac, pa);
+       atomic_add(pa->pa_free, &EXT4_SB(sb)->s_mb_preallocated);
+
+       grp = ext4_get_group_info(sb, ac->ac_b_ex.fe_group);
+       lg = ac->ac_lg;
+       BUG_ON(lg == NULL);
+
+       pa->pa_obj_lock = &lg->lg_prealloc_lock;
+       pa->pa_inode = NULL;
+
+       ext4_lock_group(sb, ac->ac_b_ex.fe_group);
+       list_add(&pa->pa_group_list, &grp->bb_prealloc_list);
+       ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
+
+       spin_lock(pa->pa_obj_lock);
+       list_add_tail_rcu(&pa->pa_inode_list, &lg->lg_prealloc_list);
+       spin_unlock(pa->pa_obj_lock);
+
+       return 0;
+}
+
+static int ext4_mb_new_preallocation(struct ext4_allocation_context *ac)
+{
+       int err;
+
+       if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC)
+               err = ext4_mb_new_group_pa(ac);
+       else
+               err = ext4_mb_new_inode_pa(ac);
+       return err;
+}
+
+/*
+ * finds all unused blocks in on-disk bitmap, frees them in
+ * in-core bitmap and buddy.
+ * @pa must be unlinked from inode and group lists, so that
+ * nobody else can find/use it.
+ * the caller MUST hold group/inode locks.
+ * TODO: optimize the case when there are no in-core structures yet
+ */
+static int ext4_mb_release_inode_pa(struct ext4_buddy *e4b,
+                               struct buffer_head *bitmap_bh,
+                               struct ext4_prealloc_space *pa)
+{
+       struct ext4_allocation_context ac;
+       struct super_block *sb = e4b->bd_sb;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       unsigned long end;
+       unsigned long next;
+       ext4_group_t group;
+       ext4_grpblk_t bit;
+       sector_t start;
+       int err = 0;
+       int free = 0;
+
+       BUG_ON(pa->pa_deleted == 0);
+       ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, &bit);
+       BUG_ON(group != e4b->bd_group && pa->pa_len != 0);
+       end = bit + pa->pa_len;
+
+       ac.ac_sb = sb;
+       ac.ac_inode = pa->pa_inode;
+       ac.ac_op = EXT4_MB_HISTORY_DISCARD;
+
+       while (bit < end) {
+               bit = ext4_find_next_zero_bit(bitmap_bh->b_data, end, bit);
+               if (bit >= end)
+                       break;
+               next = ext4_find_next_bit(bitmap_bh->b_data, end, bit);
+               if (next > end)
+                       next = end;
+               start = group * EXT4_BLOCKS_PER_GROUP(sb) + bit +
+                               le32_to_cpu(sbi->s_es->s_first_data_block);
+               mb_debug("    free preallocated %u/%u in group %u\n",
+                               (unsigned) start, (unsigned) next - bit,
+                               (unsigned) group);
+               free += next - bit;
+
+               ac.ac_b_ex.fe_group = group;
+               ac.ac_b_ex.fe_start = bit;
+               ac.ac_b_ex.fe_len = next - bit;
+               ac.ac_b_ex.fe_logical = 0;
+               ext4_mb_store_history(&ac);
+
+               mb_free_blocks(pa->pa_inode, e4b, bit, next - bit);
+               bit = next + 1;
+       }
+       if (free != pa->pa_free) {
+               printk(KERN_ERR "pa %p: logic %lu, phys. %lu, len %lu\n",
+                       pa, (unsigned long) pa->pa_lstart,
+                       (unsigned long) pa->pa_pstart,
+                       (unsigned long) pa->pa_len);
+               printk(KERN_ERR "free %u, pa_free %u\n", free, pa->pa_free);
+       }
+       BUG_ON(free != pa->pa_free);
+       atomic_add(free, &sbi->s_mb_discarded);
+
+       return err;
+}
+
+static int ext4_mb_release_group_pa(struct ext4_buddy *e4b,
+                               struct ext4_prealloc_space *pa)
+{
+       struct ext4_allocation_context ac;
+       struct super_block *sb = e4b->bd_sb;
+       ext4_group_t group;
+       ext4_grpblk_t bit;
+
+       ac.ac_op = EXT4_MB_HISTORY_DISCARD;
+
+       BUG_ON(pa->pa_deleted == 0);
+       ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, &bit);
+       BUG_ON(group != e4b->bd_group && pa->pa_len != 0);
+       mb_free_blocks(pa->pa_inode, e4b, bit, pa->pa_len);
+       atomic_add(pa->pa_len, &EXT4_SB(sb)->s_mb_discarded);
+
+       ac.ac_sb = sb;
+       ac.ac_inode = NULL;
+       ac.ac_b_ex.fe_group = group;
+       ac.ac_b_ex.fe_start = bit;
+       ac.ac_b_ex.fe_len = pa->pa_len;
+       ac.ac_b_ex.fe_logical = 0;
+       ext4_mb_store_history(&ac);
+
+       return 0;
+}
+
+/*
+ * releases all preallocations in given group
+ *
+ * first, we need to decide discard policy:
+ * - when do we discard
+ *   1) ENOSPC
+ * - how many do we discard
+ *   1) how many requested
+ */
+static int ext4_mb_discard_group_preallocations(struct super_block *sb,
+                                       ext4_group_t group, int needed)
+{
+       struct ext4_group_info *grp = ext4_get_group_info(sb, group);
+       struct buffer_head *bitmap_bh = NULL;
+       struct ext4_prealloc_space *pa, *tmp;
+       struct list_head list;
+       struct ext4_buddy e4b;
+       int err;
+       int busy = 0;
+       int free = 0;
+
+       mb_debug("discard preallocation for group %lu\n", group);
+
+       if (list_empty(&grp->bb_prealloc_list))
+               return 0;
+
+       bitmap_bh = read_block_bitmap(sb, group);
+       if (bitmap_bh == NULL) {
+               /* error handling here */
+               ext4_mb_release_desc(&e4b);
+               BUG_ON(bitmap_bh == NULL);
+       }
+
+       err = ext4_mb_load_buddy(sb, group, &e4b);
+       BUG_ON(err != 0); /* error handling here */
+
+       if (needed == 0)
+               needed = EXT4_BLOCKS_PER_GROUP(sb) + 1;
+
+       grp = ext4_get_group_info(sb, group);
+       INIT_LIST_HEAD(&list);
+
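+       /* walk the group's PA list: unused PAs are marked deleted and moved
+        * to a private list for freeing; PAs still referenced (pa_count != 0)
+        * are skipped and, if we still need blocks, retried below */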
+repeat:
+       ext4_lock_group(sb, group);
+       list_for_each_entry_safe(pa, tmp,
+                               &grp->bb_prealloc_list, pa_group_list) {
+               spin_lock(&pa->pa_lock);
+               if (atomic_read(&pa->pa_count)) {
+                       spin_unlock(&pa->pa_lock);
+                       busy = 1;
+                       continue;
+               }
+               if (pa->pa_deleted) {
+                       spin_unlock(&pa->pa_lock);
+                       continue;
+               }
+
+               /* seems this one can be freed ... */
+               pa->pa_deleted = 1;
+
+               /* we can trust pa_free ... */
+               free += pa->pa_free;
+
+               spin_unlock(&pa->pa_lock);
+
+               list_del(&pa->pa_group_list);
+               list_add(&pa->u.pa_tmp_list, &list);
+       }
+
+       /* if we still need more blocks and some PAs were used, try again */
+       if (free < needed && busy) {
+               busy = 0;
+               ext4_unlock_group(sb, group);
+               /*
+                * Yield the CPU here so that we don't get soft lockup
+                * in non preempt case.
+                */
+               yield();
+               goto repeat;
+       }
+
+       /* found anything to free? */
+       if (list_empty(&list)) {
+               BUG_ON(free != 0);
+               goto out;
+       }
+
+       /* now free all selected PAs */
+       list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
+
+               /* remove from object (inode or locality group) */
+               spin_lock(pa->pa_obj_lock);
+               list_del_rcu(&pa->pa_inode_list);
+               spin_unlock(pa->pa_obj_lock);
+
+               if (pa->pa_linear)
+                       ext4_mb_release_group_pa(&e4b, pa);
+               else
+                       ext4_mb_release_inode_pa(&e4b, bitmap_bh, pa);
+
+               list_del(&pa->u.pa_tmp_list);
+               call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
+       }
+
+out:
+       ext4_unlock_group(sb, group);
+       ext4_mb_release_desc(&e4b);
+       put_bh(bitmap_bh);
+       return free;
+}
+
+/*
+ * releases all non-used preallocated blocks for given inode
+ *
+ * It's important to discard preallocations under i_data_sem
+ * We don't want another block to be served from the prealloc
+ * space when we are discarding the inode prealloc space.
+ *
+ * FIXME!! Make sure it is valid at all the call sites
+ */
+void ext4_mb_discard_inode_preallocations(struct inode *inode)
+{
+       struct ext4_inode_info *ei = EXT4_I(inode);
+       struct super_block *sb = inode->i_sb;
+       struct buffer_head *bitmap_bh = NULL;
+       struct ext4_prealloc_space *pa, *tmp;
+       ext4_group_t group = 0;
+       struct list_head list;
+       struct ext4_buddy e4b;
+       int err;
+
+       if (!test_opt(sb, MBALLOC) || !S_ISREG(inode->i_mode)) {
+               /*BUG_ON(!list_empty(&ei->i_prealloc_list));*/
+               return;
+       }
+
+       mb_debug("discard preallocation for inode %lu\n", inode->i_ino);
+
+       INIT_LIST_HEAD(&list);
+
+repeat:
+       /* first, collect all pa's in the inode */
+       spin_lock(&ei->i_prealloc_lock);
+       while (!list_empty(&ei->i_prealloc_list)) {
+               pa = list_entry(ei->i_prealloc_list.next,
+                               struct ext4_prealloc_space, pa_inode_list);
+               BUG_ON(pa->pa_obj_lock != &ei->i_prealloc_lock);
+               spin_lock(&pa->pa_lock);
+               if (atomic_read(&pa->pa_count)) {
+                       /* this shouldn't happen often - nobody should
+                        * use preallocation while we're discarding it */
+                       spin_unlock(&pa->pa_lock);
+                       spin_unlock(&ei->i_prealloc_lock);
+                       printk(KERN_ERR "uh-oh! used pa while discarding\n");
+                       WARN_ON(1);
+                       schedule_timeout_uninterruptible(HZ);
+                       goto repeat;
+
+               }
+               if (pa->pa_deleted == 0) {
+                       pa->pa_deleted = 1;
+                       spin_unlock(&pa->pa_lock);
+                       list_del_rcu(&pa->pa_inode_list);
+                       list_add(&pa->u.pa_tmp_list, &list);
+                       continue;
+               }
+
+               /* someone is deleting pa right now */
+               spin_unlock(&pa->pa_lock);
+               spin_unlock(&ei->i_prealloc_lock);
+
+               /* we have to wait here because pa_deleted
+                * doesn't mean pa is already unlinked from
+                * the list. as we might be called from
+                * ->clear_inode() the inode will get freed
+                * and a concurrent thread which is unlinking
+                * the pa from the inode's list may access already
+                * freed memory, bad-bad-bad */
+
+               /* XXX: if this happens too often, we can
+                * add a flag to force wait only in case
+                * of ->clear_inode(), but not in case of
+                * regular truncate */
+               schedule_timeout_uninterruptible(HZ);
+               goto repeat;
+       }
+       spin_unlock(&ei->i_prealloc_lock);
+
+       list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
+               BUG_ON(pa->pa_linear != 0);
+               ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, NULL);
+
+               err = ext4_mb_load_buddy(sb, group, &e4b);
+               BUG_ON(err != 0); /* error handling here */
+
+               bitmap_bh = read_block_bitmap(sb, group);
+               if (bitmap_bh == NULL) {
+                       /* error handling here */
+                       ext4_mb_release_desc(&e4b);
+                       BUG_ON(bitmap_bh == NULL);
+               }
+
+               ext4_lock_group(sb, group);
+               list_del(&pa->pa_group_list);
+               ext4_mb_release_inode_pa(&e4b, bitmap_bh, pa);
+               ext4_unlock_group(sb, group);
+
+               ext4_mb_release_desc(&e4b);
+               put_bh(bitmap_bh);
+
+               list_del(&pa->u.pa_tmp_list);
+               call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
+       }
+}
+
+/*
+ * finds all preallocated spaces and returns blocks being freed to them
+ * if a preallocated space becomes full (no block is used from the space)
+ * then the function frees the space in the buddy
+ * XXX: at the moment, truncate (which is the only way to free blocks)
+ * discards all preallocations
+ */
+static void ext4_mb_return_to_preallocation(struct inode *inode,
+                                       struct ext4_buddy *e4b,
+                                       sector_t block, int count)
+{
+       BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list));
+}
+#ifdef MB_DEBUG
+static void ext4_mb_show_ac(struct ext4_allocation_context *ac)
+{
+       struct super_block *sb = ac->ac_sb;
+       ext4_group_t i;
+
+       printk(KERN_ERR "EXT4-fs: Can't allocate:"
+                       " Allocation context details:\n");
+       printk(KERN_ERR "EXT4-fs: status %d flags %d\n",
+                       ac->ac_status, ac->ac_flags);
+       printk(KERN_ERR "EXT4-fs: orig %lu/%lu/%lu@%lu, goal %lu/%lu/%lu@%lu, "
+                       "best %lu/%lu/%lu@%lu cr %d\n",
+                       (unsigned long)ac->ac_o_ex.fe_group,
+                       (unsigned long)ac->ac_o_ex.fe_start,
+                       (unsigned long)ac->ac_o_ex.fe_len,
+                       (unsigned long)ac->ac_o_ex.fe_logical,
+                       (unsigned long)ac->ac_g_ex.fe_group,
+                       (unsigned long)ac->ac_g_ex.fe_start,
+                       (unsigned long)ac->ac_g_ex.fe_len,
+                       (unsigned long)ac->ac_g_ex.fe_logical,
+                       (unsigned long)ac->ac_b_ex.fe_group,
+                       (unsigned long)ac->ac_b_ex.fe_start,
+                       (unsigned long)ac->ac_b_ex.fe_len,
+                       (unsigned long)ac->ac_b_ex.fe_logical,
+                       (int)ac->ac_criteria);
+       printk(KERN_ERR "EXT4-fs: %lu scanned, %d found\n", ac->ac_ex_scanned,
+               ac->ac_found);
+       printk(KERN_ERR "EXT4-fs: groups: \n");
+       for (i = 0; i < EXT4_SB(sb)->s_groups_count; i++) {
+               struct ext4_group_info *grp = ext4_get_group_info(sb, i);
+               struct ext4_prealloc_space *pa;
+               ext4_grpblk_t start;
+               struct list_head *cur;
+               ext4_lock_group(sb, i);
+               list_for_each(cur, &grp->bb_prealloc_list) {
+                       pa = list_entry(cur, struct ext4_prealloc_space,
+                                       pa_group_list);
+                       spin_lock(&pa->pa_lock);
+                       ext4_get_group_no_and_offset(sb, pa->pa_pstart,
+                                                    NULL, &start);
+                       spin_unlock(&pa->pa_lock);
+                       printk(KERN_ERR "PA:%lu:%d:%u \n", i,
+                                                       start, pa->pa_len);
+               }
+               ext4_unlock_group(sb, i);
+
+               if (grp->bb_free == 0)
+                       continue;
+               printk(KERN_ERR "%lu: %d/%d \n",
+                      i, grp->bb_free, grp->bb_fragments);
+       }
+       printk(KERN_ERR "\n");
+}
+#else
+static inline void ext4_mb_show_ac(struct ext4_allocation_context *ac)
+{
+       return;
+}
+#endif
+
+/*
+ * We use locality group preallocation for small files. The size of the
+ * file is determined by the current size or the resulting size after
+ * allocation, whichever is larger
+ *
+ * One can tune this size via /proc/fs/ext4/<partition>/stream_req
+ */
+static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+       int bsbits = ac->ac_sb->s_blocksize_bits;
+       loff_t size, isize;
+
+       if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
+               return;
+
+       size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
+       isize = i_size_read(ac->ac_inode) >> bsbits;
+       size = max(size, isize);
+
+       /* don't use group allocation for large files */
+       if (size >= sbi->s_mb_stream_request)
+               return;
+
+       if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
+               return;
+
+       BUG_ON(ac->ac_lg != NULL);
+       /*
+        * locality group prealloc space is per CPU. The reason for having
+        * a per-CPU locality group is to reduce the contention between block
+        * requests from multiple CPUs.
+        */
+       ac->ac_lg = &sbi->s_locality_groups[get_cpu()];
+       put_cpu();
+
+       /* we're going to use group allocation */
+       ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;
+
+       /* serialize all allocations in the group */
+       mutex_lock(&ac->ac_lg->lg_mutex);
+}
+
+static int ext4_mb_initialize_context(struct ext4_allocation_context *ac,
+                               struct ext4_allocation_request *ar)
+{
+       struct super_block *sb = ar->inode->i_sb;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       struct ext4_super_block *es = sbi->s_es;
+       ext4_group_t group;
+       unsigned long len;
+       unsigned long goal;
+       ext4_grpblk_t block;
+
+       /* we can't allocate > group size */
+       len = ar->len;
+
+       /* just a dirty hack to filter too big requests  */
+       if (len >= EXT4_BLOCKS_PER_GROUP(sb) - 10)
+               len = EXT4_BLOCKS_PER_GROUP(sb) - 10;
+
+       /* start searching from the goal */
+       goal = ar->goal;
+       if (goal < le32_to_cpu(es->s_first_data_block) ||
+                       goal >= ext4_blocks_count(es))
+               goal = le32_to_cpu(es->s_first_data_block);
+       ext4_get_group_no_and_offset(sb, goal, &group, &block);
+
+       /* set up allocation goals */
+       ac->ac_b_ex.fe_logical = ar->logical;
+       ac->ac_b_ex.fe_group = 0;
+       ac->ac_b_ex.fe_start = 0;
+       ac->ac_b_ex.fe_len = 0;
+       ac->ac_status = AC_STATUS_CONTINUE;
+       ac->ac_groups_scanned = 0;
+       ac->ac_ex_scanned = 0;
+       ac->ac_found = 0;
+       ac->ac_sb = sb;
+       ac->ac_inode = ar->inode;
+       ac->ac_o_ex.fe_logical = ar->logical;
+       ac->ac_o_ex.fe_group = group;
+       ac->ac_o_ex.fe_start = block;
+       ac->ac_o_ex.fe_len = len;
+       ac->ac_g_ex.fe_logical = ar->logical;
+       ac->ac_g_ex.fe_group = group;
+       ac->ac_g_ex.fe_start = block;
+       ac->ac_g_ex.fe_len = len;
+       ac->ac_f_ex.fe_len = 0;
+       ac->ac_flags = ar->flags;
+       ac->ac_2order = 0;
+       ac->ac_criteria = 0;
+       ac->ac_pa = NULL;
+       ac->ac_bitmap_page = NULL;
+       ac->ac_buddy_page = NULL;
+       ac->ac_lg = NULL;
+
+       /* we have to define the context: will we work with a file or a
+        * locality group. this is a policy, actually */
+       ext4_mb_group_or_file(ac);
+
+       mb_debug("init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, "
+                       "left: %u/%u, right %u/%u to %swritable\n",
+                       (unsigned) ar->len, (unsigned) ar->logical,
+                       (unsigned) ar->goal, ac->ac_flags, ac->ac_2order,
+                       (unsigned) ar->lleft, (unsigned) ar->pleft,
+                       (unsigned) ar->lright, (unsigned) ar->pright,
+                       atomic_read(&ar->inode->i_writecount) ? "" : "non-");
+       return 0;
+
+}
+
+/*
+ * release all resources we used in allocation
+ */
+static int ext4_mb_release_context(struct ext4_allocation_context *ac)
+{
+       if (ac->ac_pa) {
+               if (ac->ac_pa->pa_linear) {
+                       /* see comment in ext4_mb_use_group_pa() */
+                       spin_lock(&ac->ac_pa->pa_lock);
+                       ac->ac_pa->pa_pstart += ac->ac_b_ex.fe_len;
+                       ac->ac_pa->pa_lstart += ac->ac_b_ex.fe_len;
+                       ac->ac_pa->pa_free -= ac->ac_b_ex.fe_len;
+                       ac->ac_pa->pa_len -= ac->ac_b_ex.fe_len;
+                       spin_unlock(&ac->ac_pa->pa_lock);
+               }
+               ext4_mb_put_pa(ac, ac->ac_sb, ac->ac_pa);
+       }
+       if (ac->ac_bitmap_page)
+               page_cache_release(ac->ac_bitmap_page);
+       if (ac->ac_buddy_page)
+               page_cache_release(ac->ac_buddy_page);
+       if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC)
+               mutex_unlock(&ac->ac_lg->lg_mutex);
+       ext4_mb_collect_stats(ac);
+       return 0;
+}
+
+static int ext4_mb_discard_preallocations(struct super_block *sb, int needed)
+{
+       ext4_group_t i;
+       int ret;
+       int freed = 0;
+
+       for (i = 0; i < EXT4_SB(sb)->s_groups_count && needed > 0; i++) {
+               ret = ext4_mb_discard_group_preallocations(sb, i, needed);
+               freed += ret;
+               needed -= ret;
+       }
+
+       return freed;
+}
+
+/*
+ * Main entry point into mballoc to allocate blocks
+ * it tries to use preallocation first, then falls back
+ * to usual allocation
+ */
+ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
+                                struct ext4_allocation_request *ar, int *errp)
+{
+       struct ext4_allocation_context ac;
+       struct ext4_sb_info *sbi;
+       struct super_block *sb;
+       ext4_fsblk_t block = 0;
+       int freed;
+       int inquota;
+
+       sb = ar->inode->i_sb;
+       sbi = EXT4_SB(sb);
+
+       if (!test_opt(sb, MBALLOC)) {
+               block = ext4_new_blocks_old(handle, ar->inode, ar->goal,
+                                           &(ar->len), errp);
+               return block;
+       }
+
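+       /* reserve quota for the request; if the full length cannot be
+        * charged, shrink it one block at a time and disable preallocation */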
+       while (ar->len && DQUOT_ALLOC_BLOCK(ar->inode, ar->len)) {
+               ar->flags |= EXT4_MB_HINT_NOPREALLOC;
+               ar->len--;
+       }
+       if (ar->len == 0) {
+               *errp = -EDQUOT;
+               return 0;
+       }
+       inquota = ar->len;
+
+       ext4_mb_poll_new_transaction(sb, handle);
+
+       *errp = ext4_mb_initialize_context(&ac, ar);
+       if (*errp) {
+               ar->len = 0;
+               goto out;
+       }
+
+       ac.ac_op = EXT4_MB_HISTORY_PREALLOC;
+       if (!ext4_mb_use_preallocated(&ac)) {
+
+               ac.ac_op = EXT4_MB_HISTORY_ALLOC;
+               ext4_mb_normalize_request(&ac, ar);
+
+repeat:
+               /* allocate space in core */
+               ext4_mb_regular_allocator(&ac);
+
+               /* as we've just preallocated more space than the
+                * user originally requested, we store the allocated
+                * space in a special descriptor */
+               if (ac.ac_status == AC_STATUS_FOUND &&
+                               ac.ac_o_ex.fe_len < ac.ac_b_ex.fe_len)
+                       ext4_mb_new_preallocation(&ac);
+       }
+
+       if (likely(ac.ac_status == AC_STATUS_FOUND)) {
+               ext4_mb_mark_diskspace_used(&ac, handle);
+               *errp = 0;
+               block = ext4_grp_offs_to_block(sb, &ac.ac_b_ex);
+               ar->len = ac.ac_b_ex.fe_len;
+       } else {
+               freed  = ext4_mb_discard_preallocations(sb, ac.ac_o_ex.fe_len);
+               if (freed)
+                       goto repeat;
+               *errp = -ENOSPC;
+               ac.ac_b_ex.fe_len = 0;
+               ar->len = 0;
+               ext4_mb_show_ac(&ac);
+       }
+
+       ext4_mb_release_context(&ac);
+
+out:
+       if (ar->len < inquota)
+               DQUOT_FREE_BLOCK(ar->inode, inquota - ar->len);
+
+       return block;
+}
+
+static void ext4_mb_poll_new_transaction(struct super_block *sb,
+                                               handle_t *handle)
+{
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+       if (sbi->s_last_transaction == handle->h_transaction->t_tid)
+               return;
+
+       /* new transaction! time to close the last one and free blocks for
+        * the committed transaction. we know that only one transaction can be
+        * active, so the previous transaction may still be being logged, and
+        * the transaction before the previous one is known to be already
+        * logged. this means that now we may free blocks freed in all
+        * transactions before the previous one. hope I'm clear enough ... */
+
+       spin_lock(&sbi->s_md_lock);
+       if (sbi->s_last_transaction != handle->h_transaction->t_tid) {
+               mb_debug("new transaction %lu, old %lu\n",
+                               (unsigned long) handle->h_transaction->t_tid,
+                               (unsigned long) sbi->s_last_transaction);
+               list_splice_init(&sbi->s_closed_transaction,
+                               &sbi->s_committed_transaction);
+               list_splice_init(&sbi->s_active_transaction,
+                               &sbi->s_closed_transaction);
+               sbi->s_last_transaction = handle->h_transaction->t_tid;
+       }
+       spin_unlock(&sbi->s_md_lock);
+
+       ext4_mb_free_committed_blocks(sb);
+}
+
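+/*
+ * Remember blocks freed as metadata in the group's per-transaction container
+ * so they are not handed out again until the transaction commits; the buddy
+ * cache pages are pinned while such a container is active.
+ */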
+static int ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
+                         ext4_group_t group, ext4_grpblk_t block, int count)
+{
+       struct ext4_group_info *db = e4b->bd_info;
+       struct super_block *sb = e4b->bd_sb;
+       struct ext4_sb_info *sbi = EXT4_SB(sb);
+       struct ext4_free_metadata *md;
+       int i;
+
+       BUG_ON(e4b->bd_bitmap_page == NULL);
+       BUG_ON(e4b->bd_buddy_page == NULL);
+
+       ext4_lock_group(sb, group);
+       for (i = 0; i < count; i++) {
+               md = db->bb_md_cur;
+               if (md && db->bb_tid != handle->h_transaction->t_tid) {
+                       db->bb_md_cur = NULL;
+                       md = NULL;
+               }
+
+               if (md == NULL) {
+                       ext4_unlock_group(sb, group);
+                       md = kmalloc(sizeof(*md), GFP_NOFS);
+                       if (md == NULL)
+                               return -ENOMEM;
+                       md->num = 0;
+                       md->group = group;
+
+                       ext4_lock_group(sb, group);
+                       if (db->bb_md_cur == NULL) {
+                               spin_lock(&sbi->s_md_lock);
+                               list_add(&md->list, &sbi->s_active_transaction);
+                               spin_unlock(&sbi->s_md_lock);
+                               /* protect buddy cache from being freed,
+                                * otherwise we'll refresh it from
+                                * on-disk bitmap and lose not-yet-available
+                                * blocks */
+                               page_cache_get(e4b->bd_buddy_page);
+                               page_cache_get(e4b->bd_bitmap_page);
+                               db->bb_md_cur = md;
+                               db->bb_tid = handle->h_transaction->t_tid;
+                               mb_debug("new md 0x%p for group %lu\n",
+                                               md, md->group);
+                       } else {
+                               kfree(md);
+                               md = db->bb_md_cur;
+                       }
+               }
+
+               BUG_ON(md->num >= EXT4_BB_MAX_BLOCKS);
+               md->blocks[md->num] = block + i;
+               md->num++;
+               if (md->num == EXT4_BB_MAX_BLOCKS) {
+                       /* no more space, put full container on a sb's list */
+                       db->bb_md_cur = NULL;
+               }
+       }
+       ext4_unlock_group(sb, group);
+       return 0;
+}
+
+/*
+ * Main entry point into mballoc to free blocks
+ */
+void ext4_mb_free_blocks(handle_t *handle, struct inode *inode,
+                       unsigned long block, unsigned long count,
+                       int metadata, unsigned long *freed)
+{
+       struct buffer_head *bitmap_bh = NULL;
+       struct super_block *sb = inode->i_sb;
+       struct ext4_allocation_context ac;
+       struct ext4_group_desc *gdp;
+       struct ext4_super_block *es;
+       unsigned long overflow;
+       ext4_grpblk_t bit;
+       struct buffer_head *gd_bh;
+       ext4_group_t block_group;
+       struct ext4_sb_info *sbi;
+       struct ext4_buddy e4b;
+       int err = 0;
+       int ret;
+
+       *freed = 0;
+
+       ext4_mb_poll_new_transaction(sb, handle);
+
+       sbi = EXT4_SB(sb);
+       es = EXT4_SB(sb)->s_es;
+       if (block < le32_to_cpu(es->s_first_data_block) ||
+           block + count < block ||
+           block + count > ext4_blocks_count(es)) {
+               ext4_error(sb, __FUNCTION__,
+                           "Freeing blocks not in datazone - "
+                           "block = %lu, count = %lu", block, count);
+               goto error_return;
+       }
+
+       ext4_debug("freeing block %lu\n", block);
+
+       ac.ac_op = EXT4_MB_HISTORY_FREE;
+       ac.ac_inode = inode;
+       ac.ac_sb = sb;
+
+do_more:
+       overflow = 0;
+       ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
+
+       /*
+        * Check to see if we are freeing blocks across a group
+        * boundary.
+        */
+       if (bit + count > EXT4_BLOCKS_PER_GROUP(sb)) {
+               overflow = bit + count - EXT4_BLOCKS_PER_GROUP(sb);
+               count -= overflow;
+       }
+       bitmap_bh = read_block_bitmap(sb, block_group);
+       if (!bitmap_bh)
+               goto error_return;
+       gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
+       if (!gdp)
+               goto error_return;
+
+       if (in_range(ext4_block_bitmap(sb, gdp), block, count) ||
+           in_range(ext4_inode_bitmap(sb, gdp), block, count) ||
+           in_range(block, ext4_inode_table(sb, gdp),
+                     EXT4_SB(sb)->s_itb_per_group) ||
+           in_range(block + count - 1, ext4_inode_table(sb, gdp),
+                     EXT4_SB(sb)->s_itb_per_group)) {
+
+               ext4_error(sb, __FUNCTION__,
+                          "Freeing blocks in system zone - "
+                          "Block = %lu, count = %lu", block, count);
+       }
+
+       BUFFER_TRACE(bitmap_bh, "getting write access");
+       err = ext4_journal_get_write_access(handle, bitmap_bh);
+       if (err)
+               goto error_return;
+
+       /*
+        * We are about to modify some metadata.  Call the journal APIs
+        * to unshare ->b_data if a currently-committing transaction is
+        * using it
+        */
+       BUFFER_TRACE(gd_bh, "get_write_access");
+       err = ext4_journal_get_write_access(handle, gd_bh);
+       if (err)
+               goto error_return;
+
+       err = ext4_mb_load_buddy(sb, block_group, &e4b);
+       if (err)
+               goto error_return;
+
+#ifdef AGGRESSIVE_CHECK
+       {
+               int i;
+               for (i = 0; i < count; i++)
+                       BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
+       }
+#endif
+       mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data,
+                       bit, count);
+
+       /* We dirtied the bitmap block */
+       BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+       err = ext4_journal_dirty_metadata(handle, bitmap_bh);
+
+       ac.ac_b_ex.fe_group = block_group;
+       ac.ac_b_ex.fe_start = bit;
+       ac.ac_b_ex.fe_len = count;
+       ext4_mb_store_history(&ac);
+
+       if (metadata) {
+               /* Blocks being freed are metadata; they shouldn't
+                * be reused until this transaction is committed */
+               ext4_mb_free_metadata(handle, &e4b, block_group, bit, count);
+       } else {
+               ext4_lock_group(sb, block_group);
+               err = mb_free_blocks(inode, &e4b, bit, count);
+               ext4_mb_return_to_preallocation(inode, &e4b, block, count);
+               ext4_unlock_group(sb, block_group);
+               BUG_ON(err != 0);
+       }
+
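+       /* Update the group descriptor's free block count and checksum */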
+       spin_lock(sb_bgl_lock(sbi, block_group));
+       gdp->bg_free_blocks_count =
+               cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) + count);
+       gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
+       spin_unlock(sb_bgl_lock(sbi, block_group));
+       percpu_counter_add(&sbi->s_freeblocks_counter, count);
+
+       ext4_mb_release_desc(&e4b);
+
+       *freed += count;
+
+       /* And the group descriptor block */
+       BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
+       ret = ext4_journal_dirty_metadata(handle, gd_bh);
+       if (!err)
+               err = ret;
+
+       if (overflow && !err) {
+               block += count;
+               count = overflow;
+               put_bh(bitmap_bh);
+               goto do_more;
+       }
+       sb->s_dirt = 1;
+error_return:
+       brelse(bitmap_bh);
+       ext4_std_error(sb, err);
+       return;
+}
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
new file mode 100644
index 0000000..3ebc233
--- /dev/null
@@ -0,0 +1,560 @@
+/*
+ * Copyright IBM Corporation, 2007
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/ext4_jbd2.h>
+#include <linux/ext4_fs_extents.h>
+
+/*
+ * Details of a run of contiguous blocks that can be
+ * represented by a single extent
+ */
+struct list_blocks_struct {
+       ext4_lblk_t first_block, last_block;
+       ext4_fsblk_t first_pblock, last_pblock;
+};
+
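+/*
+ * Insert the range accumulated in lb as a single extent into the
+ * (temporary) inode, extending or restarting the journal as needed
+ * for credits.
+ */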
+static int finish_range(handle_t *handle, struct inode *inode,
+                               struct list_blocks_struct *lb)
+{
+       int retval = 0, needed;
+       struct ext4_extent newext;
+       struct ext4_ext_path *path;
+       if (lb->first_pblock == 0)
+               return 0;
+
+       /* Add the extent to the temp inode */
+       newext.ee_block = cpu_to_le32(lb->first_block);
+       newext.ee_len   = cpu_to_le16(lb->last_block - lb->first_block + 1);
+       ext4_ext_store_pblock(&newext, lb->first_pblock);
+       path = ext4_ext_find_extent(inode, lb->first_block, NULL);
+
+       if (IS_ERR(path)) {
+               retval = PTR_ERR(path);
+               goto err_out;
+       }
+
+       /*
+        * Calculate the credits needed to insert this extent.
+        * Since we are doing this in a loop we may accumulate extra
+        * credits, but below we try not to accumulate too many
+        * of them by restarting the journal.
+        */
+       needed = ext4_ext_calc_credits_for_insert(inode, path);
+
+       /*
+        * Make sure the credits we have accumulated are not too high.
+        */
+       if (needed && handle->h_buffer_credits >= EXT4_RESERVE_TRANS_BLOCKS) {
+               retval = ext4_journal_restart(handle, needed);
+               if (retval)
+                       goto err_out;
+       }
+       if (needed) {
+               retval = ext4_journal_extend(handle, needed);
+               if (retval != 0) {
+                       /*
+                        * If we cannot extend the journal, restart it.
+                        */
+                       retval = ext4_journal_restart(handle, needed);
+                       if (retval)
+                               goto err_out;
+               }
+       }
+       retval = ext4_ext_insert_extent(handle, inode, path, &newext);
+err_out:
+       lb->first_pblock = 0;
+       return retval;
+}
+
+static int update_extent_range(handle_t *handle, struct inode *inode,
+                               ext4_fsblk_t pblock, ext4_lblk_t blk_num,
+                               struct list_blocks_struct *lb)
+{
+       int retval;
+       /*
+        * See if we can add on to the existing range (if it exists)
+        */
+       if (lb->first_pblock &&
+               (lb->last_pblock+1 == pblock) &&
+               (lb->last_block+1 == blk_num)) {
+               lb->last_pblock = pblock;
+               lb->last_block = blk_num;
+               return 0;
+       }
+       /*
+        * Start a new range.
+        */
+       retval = finish_range(handle, inode, lb);
+       lb->first_pblock = lb->last_pblock = pblock;
+       lb->first_block = lb->last_block = blk_num;
+
+       return retval;
+}
+
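+/*
+ * Walk a single indirect block and fold each mapped block
+ * into the current contiguous range.
+ */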
+static int update_ind_extent_range(handle_t *handle, struct inode *inode,
+                                  ext4_fsblk_t pblock, ext4_lblk_t *blk_nump,
+                                  struct list_blocks_struct *lb)
+{
+       struct buffer_head *bh;
+       __le32 *i_data;
+       int i, retval = 0;
+       ext4_lblk_t blk_count = *blk_nump;
+       unsigned long max_entries = inode->i_sb->s_blocksize >> 2;
+
+       if (!pblock) {
+               /* Only update the file block number */
+               *blk_nump += max_entries;
+               return 0;
+       }
+
+       bh = sb_bread(inode->i_sb, pblock);
+       if (!bh)
+               return -EIO;
+
+       i_data = (__le32 *)bh->b_data;
+       for (i = 0; i < max_entries; i++, blk_count++) {
+               if (i_data[i]) {
+                       retval = update_extent_range(handle, inode,
+                                               le32_to_cpu(i_data[i]),
+                                               blk_count, lb);
+                       if (retval)
+                               break;
+               }
+       }
+
+       /* Update the file block number */
+       *blk_nump = blk_count;
+       put_bh(bh);
+       return retval;
+}
+
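+/*
+ * Walk a double indirect block, descending into each indirect
+ * block it references.
+ */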
+static int update_dind_extent_range(handle_t *handle, struct inode *inode,
+                                   ext4_fsblk_t pblock, ext4_lblk_t *blk_nump,
+                                   struct list_blocks_struct *lb)
+{
+       struct buffer_head *bh;
+       __le32 *i_data;
+       int i, retval = 0;
+       ext4_lblk_t blk_count = *blk_nump;
+       unsigned long max_entries = inode->i_sb->s_blocksize >> 2;
+
+       if (!pblock) {
+               /* Only update the file block number */
+               *blk_nump += max_entries * max_entries;
+               return 0;
+       }
+       bh = sb_bread(inode->i_sb, pblock);
+       if (!bh)
+               return -EIO;
+
+       i_data = (__le32 *)bh->b_data;
+       for (i = 0; i < max_entries; i++) {
+               if (i_data[i]) {
+                       retval = update_ind_extent_range(handle, inode,
+                                               le32_to_cpu(i_data[i]),
+                                               &blk_count, lb);
+                       if (retval)
+                               break;
+               } else {
+                       /* Only update the file block number */
+                       blk_count += max_entries;
+               }
+       }
+
+       /* Update the file block number */
+       *blk_nump = blk_count;
+       put_bh(bh);
+       return retval;
+}
+
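+/*
+ * Walk a triple indirect block, descending into each double
+ * indirect block it references.
+ */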
+static int update_tind_extent_range(handle_t *handle, struct inode *inode,
+                                    ext4_fsblk_t pblock, ext4_lblk_t *blk_nump,
+                                    struct list_blocks_struct *lb)
+{
+       struct buffer_head *bh;
+       __le32 *i_data;
+       int i, retval = 0;
+       ext4_lblk_t blk_count = *blk_nump;
+       unsigned long max_entries = inode->i_sb->s_blocksize >> 2;
+
+       if (!pblock) {
+               /* Only update the file block number */
+               *blk_nump += max_entries * max_entries * max_entries;
+               return 0;
+       }
+       bh = sb_bread(inode->i_sb, pblock);
+       if (!bh)
+               return -EIO;
+
+       i_data = (__le32 *)bh->b_data;
+       for (i = 0; i < max_entries; i++) {
+               if (i_data[i]) {
+                       retval = update_dind_extent_range(handle, inode,
+                                               le32_to_cpu(i_data[i]),
+                                               &blk_count, lb);
+                       if (retval)
+                               break;
+               } else
+                       /* Only update the file block number */
+                       blk_count += max_entries * max_entries;
+       }
+       /* Update the file block number */
+       *blk_nump = blk_count;
+       put_bh(bh);
+       return retval;
+}
+
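+/*
+ * Free every indirect block referenced by this double indirect
+ * block, then the double indirect block itself.
+ */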
+static int free_dind_blocks(handle_t *handle,
+                               struct inode *inode, __le32 i_data)
+{
+       int i;
+       __le32 *tmp_idata;
+       struct buffer_head *bh;
+       unsigned long max_entries = inode->i_sb->s_blocksize >> 2;
+
+       bh = sb_bread(inode->i_sb, le32_to_cpu(i_data));
+       if (!bh)
+               return -EIO;
+
+       tmp_idata = (__le32 *)bh->b_data;
+       for (i = 0; i < max_entries; i++) {
+               if (tmp_idata[i])
+                       ext4_free_blocks(handle, inode,
+                                       le32_to_cpu(tmp_idata[i]), 1, 1);
+       }
+       put_bh(bh);
+       ext4_free_blocks(handle, inode, le32_to_cpu(i_data), 1, 1);
+       return 0;
+}
+
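+/*
+ * Free the double indirect trees referenced by this triple
+ * indirect block, then the triple indirect block itself.
+ */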
+static int free_tind_blocks(handle_t *handle,
+                               struct inode *inode, __le32 i_data)
+{
+       int i, retval = 0;
+       __le32 *tmp_idata;
+       struct buffer_head *bh;
+       unsigned long max_entries = inode->i_sb->s_blocksize >> 2;
+
+       bh = sb_bread(inode->i_sb, le32_to_cpu(i_data));
+       if (!bh)
+               return -EIO;
+
+       tmp_idata = (__le32 *)bh->b_data;
+       for (i = 0; i < max_entries; i++) {
+               if (tmp_idata[i]) {
+                       retval = free_dind_blocks(handle,
+                                       inode, tmp_idata[i]);
+                       if (retval) {
+                               put_bh(bh);
+                               return retval;
+                       }
+               }
+       }
+       put_bh(bh);
+       ext4_free_blocks(handle, inode, le32_to_cpu(i_data), 1, 1);
+       return 0;
+}
+
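+/*
+ * Free the old indirect, double indirect and triple indirect
+ * metadata blocks of the inode being migrated.
+ */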
+static int free_ind_block(handle_t *handle, struct inode *inode)
+{
+       int retval;
+       struct ext4_inode_info *ei = EXT4_I(inode);
+
+       if (ei->i_data[EXT4_IND_BLOCK])
+               ext4_free_blocks(handle, inode,
+                               le32_to_cpu(ei->i_data[EXT4_IND_BLOCK]), 1, 1);
+
+       if (ei->i_data[EXT4_DIND_BLOCK]) {
+               retval = free_dind_blocks(handle, inode,
+                                               ei->i_data[EXT4_DIND_BLOCK]);
+               if (retval)
+                       return retval;
+       }
+
+       if (ei->i_data[EXT4_TIND_BLOCK]) {
+               retval = free_tind_blocks(handle, inode,
+                                               ei->i_data[EXT4_TIND_BLOCK]);
+               if (retval)
+                       return retval;
+       }
+       return 0;
+}
+
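+/*
+ * Switch the original inode over to the extent-mapped i_data
+ * built in tmp_inode, after freeing its old indirect metadata.
+ */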
+static int ext4_ext_swap_inode_data(handle_t *handle, struct inode *inode,
+                               struct inode *tmp_inode, int retval)
+{
+       struct ext4_inode_info *ei = EXT4_I(inode);
+       struct ext4_inode_info *tmp_ei = EXT4_I(tmp_inode);
+
+       retval = free_ind_block(handle, inode);
+       if (retval)
+               goto err_out;
+
+       /*
+        * One credit accounted for writing the
+        * i_data field of the original inode
+        */
+       retval = ext4_journal_extend(handle, 1);
+       if (retval != 0) {
+               retval = ext4_journal_restart(handle, 1);
+               if (retval)
+                       goto err_out;
+       }
+
+       /*
+        * We have the extent map built in the tmp inode.
+        * Now copy the i_data across.
+        */
+       ei->i_flags |= EXT4_EXTENTS_FL;
+       memcpy(ei->i_data, tmp_ei->i_data, sizeof(ei->i_data));
+
+       /*
+        * Update i_blocks with the new blocks that got
+        * allocated while adding extents for extent index
+        * blocks.
+        *
+        * While converting to extents we need not
+        * update the original inode's i_blocks for extent blocks
+        * via the quota APIs; the quota update already happened via tmp_inode.
+        */
+       spin_lock(&inode->i_lock);
+       inode->i_blocks += tmp_inode->i_blocks;
+       spin_unlock(&inode->i_lock);
+
+       ext4_mark_inode_dirty(handle, inode);
+err_out:
+       return retval;
+}
+
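+/*
+ * Recursively free the extent index blocks below ix; used to drop
+ * tmp_inode's extent metadata in the failure path.
+ */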
+static int free_ext_idx(handle_t *handle, struct inode *inode,
+                                       struct ext4_extent_idx *ix)
+{
+       int i, retval = 0;
+       ext4_fsblk_t block;
+       struct buffer_head *bh;
+       struct ext4_extent_header *eh;
+
+       block = idx_pblock(ix);
+       bh = sb_bread(inode->i_sb, block);
+       if (!bh)
+               return -EIO;
+
+       eh = (struct ext4_extent_header *)bh->b_data;
+       if (eh->eh_depth != 0) {
+               ix = EXT_FIRST_INDEX(eh);
+               for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ix++) {
+                       retval = free_ext_idx(handle, inode, ix);
+                       if (retval)
+                               break;
+               }
+       }
+       put_bh(bh);
+       ext4_free_blocks(handle, inode, block, 1, 1);
+       return retval;
+}
+
+/*
+ * Free only the extent metadata blocks.
+ */
+static int free_ext_block(handle_t *handle, struct inode *inode)
+{
+       int i, retval = 0;
+       struct ext4_inode_info *ei = EXT4_I(inode);
+       struct ext4_extent_header *eh = (struct ext4_extent_header *)ei->i_data;
+       struct ext4_extent_idx *ix;
+       if (eh->eh_depth == 0)
+               /*
+                * No extra blocks allocated for extent metadata
+                */
+               return 0;
+       ix = EXT_FIRST_INDEX(eh);
+       for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ix++) {
+               retval = free_ext_idx(handle, inode, ix);
+               if (retval)
+                       return retval;
+       }
+       return retval;
+}
+
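+/*
+ * Migrate an indirect-mapped inode to extents: build an extent tree
+ * in a temporary inode covering the same data blocks, then swap the
+ * i_data of the two inodes and free the old indirect metadata blocks.
+ */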
+int ext4_ext_migrate(struct inode *inode, struct file *filp,
+                               unsigned int cmd, unsigned long arg)
+{
+       handle_t *handle;
+       int retval = 0, i;
+       __le32 *i_data;
+       ext4_lblk_t blk_count = 0;
+       struct ext4_inode_info *ei;
+       struct inode *tmp_inode = NULL;
+       struct list_blocks_struct lb;
+       unsigned long max_entries;
+
+       if (!test_opt(inode->i_sb, EXTENTS))
+               /*
+                * If mounted with noextents we don't allow migration.
+                */
+               return -EINVAL;
+
+       if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+               return -EINVAL;
+
+       down_write(&EXT4_I(inode)->i_data_sem);
+       handle = ext4_journal_start(inode,
+                                       EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
+                                       EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+                                       2 * EXT4_QUOTA_INIT_BLOCKS(inode->i_sb)
+                                       + 1);
+       if (IS_ERR(handle)) {
+               retval = PTR_ERR(handle);
+               goto err_out;
+       }
+       tmp_inode = ext4_new_inode(handle,
+                               inode->i_sb->s_root->d_inode,
+                               S_IFREG);
+       if (IS_ERR(tmp_inode)) {
+               retval = -ENOMEM;
+               ext4_journal_stop(handle);
+               tmp_inode = NULL;
+               goto err_out;
+       }
+       i_size_write(tmp_inode, i_size_read(inode));
+       /*
+        * We don't want the inode to be reclaimed
+        * if we get interrupted in between. This tmp
+        * inode carries a reference to the data blocks
+        * of the original file, so we set i_nlink to
+        * zero only at the last stage, after switching
+        * the original file to extent format.
+        */
+       tmp_inode->i_nlink = 1;
+
+       ext4_ext_tree_init(handle, tmp_inode);
+       ext4_orphan_add(handle, tmp_inode);
+       ext4_journal_stop(handle);
+
+       ei = EXT4_I(inode);
+       i_data = ei->i_data;
+       memset(&lb, 0, sizeof(lb));
+
+       /* 32-bit block addresses, 4 bytes each */
+       max_entries = inode->i_sb->s_blocksize >> 2;
+
+       /*
+        * Start with one credit accounted for
+        * modifying the superblock.
+        *
+        * For the tmp_inode we have already committed the
+        * transaction that created the inode. Later, as and
+        * when we add extents, we extend the journal.
+        */
+       handle = ext4_journal_start(inode, 1);
+       for (i = 0; i < EXT4_NDIR_BLOCKS; i++, blk_count++) {
+               if (i_data[i]) {
+                       retval = update_extent_range(handle, tmp_inode,
+                                               le32_to_cpu(i_data[i]),
+                                               blk_count, &lb);
+                       if (retval)
+                               goto err_out;
+               }
+       }
+       if (i_data[EXT4_IND_BLOCK]) {
+               retval = update_ind_extent_range(handle, tmp_inode,
+                                       le32_to_cpu(i_data[EXT4_IND_BLOCK]),
+                                       &blk_count, &lb);
+               if (retval)
+                       goto err_out;
+       } else
+               blk_count += max_entries;
+       if (i_data[EXT4_DIND_BLOCK]) {
+               retval = update_dind_extent_range(handle, tmp_inode,
+                                       le32_to_cpu(i_data[EXT4_DIND_BLOCK]),
+                                       &blk_count, &lb);
+               if (retval)
+                       goto err_out;
+       } else
+               blk_count += max_entries * max_entries;
+       if (i_data[EXT4_TIND_BLOCK]) {
+               retval = update_tind_extent_range(handle, tmp_inode,
+                                       le32_to_cpu(i_data[EXT4_TIND_BLOCK]),
+                                       &blk_count, &lb);
+               if (retval)
+                       goto err_out;
+       }
+       /*
+        * Build the last extent
+        */
+       retval = finish_range(handle, tmp_inode, &lb);
+err_out:
+       /*
+        * We are either freeing extent information or indirect
+        * blocks. During this we touch the superblock, group descriptor
+        * and block bitmap. Later we mark the tmp_inode dirty
+        * via ext4_ext_tree_init, so allocate 4 credits.
+        * We may also update quota (user and group).
+        *
+        * FIXME!! we may be touching bitmaps in different block groups.
+        */
+       if (ext4_journal_extend(handle,
+                       4 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb)) != 0)
+               ext4_journal_restart(handle,
+                               4 + 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb));
+       if (retval)
+               /*
+                * Failure case: delete the extent information held by
+                * tmp_inode.
+                */
+               free_ext_block(handle, tmp_inode);
+       else
+               retval = ext4_ext_swap_inode_data(handle, inode,
+                                                       tmp_inode, retval);
+
+       /*
+        * Mark the tmp_inode as having size zero.
+        */
+       i_size_write(tmp_inode, 0);
+
+       /*
+        * Set the i_blocks count to zero
+        * so that ext4_delete_inode() does the
+        * right thing.
+        *
+        * We don't need to take the i_lock because
+        * the inode is not visible to user space.
+        */
+       tmp_inode->i_blocks = 0;
+
+       /* Reset the extent details */
+       ext4_ext_tree_init(handle, tmp_inode);
+
+       /*
+        * Set the i_nlink to zero so that
+        * generic_drop_inode really deletes the
+        * inode
+        */
+       tmp_inode->i_nlink = 0;
+
+       ext4_journal_stop(handle);
+
+       up_write(&EXT4_I(inode)->i_data_sem);
+
+       if (tmp_inode)
+               iput(tmp_inode);
+
+       return retval;
+}
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 94ee6f315dc17580c6de82f65d3f58735a29b870..67b6d8a1ceff3423df8741ec426e640a8c3e8a2a 100644
@@ -51,7 +51,7 @@
 
 static struct buffer_head *ext4_append(handle_t *handle,
                                        struct inode *inode,
-                                       u32 *block, int *err)
+                                       ext4_lblk_t *block, int *err)
 {
        struct buffer_head *bh;
 
@@ -144,8 +144,8 @@ struct dx_map_entry
        u16 size;
 };
 
-static inline unsigned dx_get_block (struct dx_entry *entry);
-static void dx_set_block (struct dx_entry *entry, unsigned value);
+static inline ext4_lblk_t dx_get_block(struct dx_entry *entry);
+static void dx_set_block(struct dx_entry *entry, ext4_lblk_t value);
 static inline unsigned dx_get_hash (struct dx_entry *entry);
 static void dx_set_hash (struct dx_entry *entry, unsigned value);
 static unsigned dx_get_count (struct dx_entry *entries);
@@ -166,7 +166,8 @@ static void dx_sort_map(struct dx_map_entry *map, unsigned count);
 static struct ext4_dir_entry_2 *dx_move_dirents (char *from, char *to,
                struct dx_map_entry *offsets, int count);
 static struct ext4_dir_entry_2* dx_pack_dirents (char *base, int size);
-static void dx_insert_block (struct dx_frame *frame, u32 hash, u32 block);
+static void dx_insert_block(struct dx_frame *frame,
+                                       u32 hash, ext4_lblk_t block);
 static int ext4_htree_next_block(struct inode *dir, __u32 hash,
                                 struct dx_frame *frame,
                                 struct dx_frame *frames,
@@ -181,12 +182,12 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
  * Mask them off for now.
  */
 
-static inline unsigned dx_get_block (struct dx_entry *entry)
+static inline ext4_lblk_t dx_get_block(struct dx_entry *entry)
 {
        return le32_to_cpu(entry->block) & 0x00ffffff;
 }
 
-static inline void dx_set_block (struct dx_entry *entry, unsigned value)
+static inline void dx_set_block(struct dx_entry *entry, ext4_lblk_t value)
 {
        entry->block = cpu_to_le32(value);
 }
@@ -243,8 +244,8 @@ static void dx_show_index (char * label, struct dx_entry *entries)
        int i, n = dx_get_count (entries);
        printk("%s index ", label);
        for (i = 0; i < n; i++) {
-               printk("%x->%u ", i? dx_get_hash(entries + i) :
-                               0, dx_get_block(entries + i));
+               printk("%x->%lu ", i? dx_get_hash(entries + i) :
+                               0, (unsigned long)dx_get_block(entries + i));
        }
        printk("\n");
 }
@@ -280,7 +281,7 @@ static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext4_dir_ent
                        space += EXT4_DIR_REC_LEN(de->name_len);
                        names++;
                }
-               de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+               de = ext4_next_entry(de);
        }
        printk("(%i)\n", names);
        return (struct stats) { names, space, 1 };
@@ -297,7 +298,8 @@ struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir,
        printk("%i indexed blocks...\n", count);
        for (i = 0; i < count; i++, entries++)
        {
-               u32 block = dx_get_block(entries), hash = i? dx_get_hash(entries): 0;
+               ext4_lblk_t block = dx_get_block(entries);
+               ext4_lblk_t hash  = i ? dx_get_hash(entries): 0;
                u32 range = i < count - 1? (dx_get_hash(entries + 1) - hash): ~hash;
                struct stats stats;
                printk("%s%3u:%03u hash %8x/%8x ",levels?"":"   ", i, block, hash, range);
@@ -551,7 +553,8 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash,
  */
 static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 *p)
 {
-       return (struct ext4_dir_entry_2 *)((char*)p + le16_to_cpu(p->rec_len));
+       return (struct ext4_dir_entry_2 *)((char *)p +
+               ext4_rec_len_from_disk(p->rec_len));
 }
 
 /*
@@ -560,7 +563,7 @@ static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 *
  * into the tree.  If there is an error it is returned in err.
  */
 static int htree_dirblock_to_tree(struct file *dir_file,
-                                 struct inode *dir, int block,
+                                 struct inode *dir, ext4_lblk_t block,
                                  struct dx_hash_info *hinfo,
                                  __u32 start_hash, __u32 start_minor_hash)
 {
@@ -568,7 +571,8 @@ static int htree_dirblock_to_tree(struct file *dir_file,
        struct ext4_dir_entry_2 *de, *top;
        int err, count = 0;
 
-       dxtrace(printk("In htree dirblock_to_tree: block %d\n", block));
+       dxtrace(printk(KERN_INFO "In htree dirblock_to_tree: block %lu\n",
+                                                       (unsigned long)block));
        if (!(bh = ext4_bread (NULL, dir, block, 0, &err)))
                return err;
 
@@ -620,9 +624,9 @@ int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
        struct ext4_dir_entry_2 *de;
        struct dx_frame frames[2], *frame;
        struct inode *dir;
-       int block, err;
+       ext4_lblk_t block;
        int count = 0;
-       int ret;
+       int ret, err;
        __u32 hashval;
 
        dxtrace(printk("In htree_fill_tree, start hash: %x:%x\n", start_hash,
@@ -720,7 +724,7 @@ static int dx_make_map (struct ext4_dir_entry_2 *de, int size,
                        cond_resched();
                }
                /* XXX: do we need to check rec_len == 0 case? -Chris */
-               de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+               de = ext4_next_entry(de);
        }
        return count;
 }
@@ -752,7 +756,7 @@ static void dx_sort_map (struct dx_map_entry *map, unsigned count)
        } while(more);
 }
 
-static void dx_insert_block(struct dx_frame *frame, u32 hash, u32 block)
+static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
 {
        struct dx_entry *entries = frame->entries;
        struct dx_entry *old = frame->at, *new = old + 1;
@@ -820,7 +824,7 @@ static inline int search_dirblock(struct buffer_head * bh,
                        return 1;
                }
                /* prevent looping on a bad block */
-               de_len = le16_to_cpu(de->rec_len);
+               de_len = ext4_rec_len_from_disk(de->rec_len);
                if (de_len <= 0)
                        return -1;
                offset += de_len;
@@ -847,23 +851,20 @@ static struct buffer_head * ext4_find_entry (struct dentry *dentry,
        struct super_block * sb;
        struct buffer_head * bh_use[NAMEI_RA_SIZE];
        struct buffer_head * bh, *ret = NULL;
-       unsigned long start, block, b;
+       ext4_lblk_t start, block, b;
        int ra_max = 0;         /* Number of bh's in the readahead
                                   buffer, bh_use[] */
        int ra_ptr = 0;         /* Current index into readahead
                                   buffer */
        int num = 0;
-       int nblocks, i, err;
+       ext4_lblk_t  nblocks;
+       int i, err;
        struct inode *dir = dentry->d_parent->d_inode;
        int namelen;
-       const u8 *name;
-       unsigned blocksize;
 
        *res_dir = NULL;
        sb = dir->i_sb;
-       blocksize = sb->s_blocksize;
        namelen = dentry->d_name.len;
-       name = dentry->d_name.name;
        if (namelen > EXT4_NAME_LEN)
                return NULL;
        if (is_dx(dir)) {
@@ -914,7 +915,8 @@ restart:
                if (!buffer_uptodate(bh)) {
                        /* read error, skip block & hope for the best */
                        ext4_error(sb, __FUNCTION__, "reading directory #%lu "
-                                  "offset %lu", dir->i_ino, block);
+                                  "offset %lu", dir->i_ino,
+                                  (unsigned long)block);
                        brelse(bh);
                        goto next;
                }
@@ -961,7 +963,7 @@ static struct buffer_head * ext4_dx_find_entry(struct dentry *dentry,
        struct dx_frame frames[2], *frame;
        struct ext4_dir_entry_2 *de, *top;
        struct buffer_head *bh;
-       unsigned long block;
+       ext4_lblk_t block;
        int retval;
        int namelen = dentry->d_name.len;
        const u8 *name = dentry->d_name.name;
@@ -1128,7 +1130,7 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count)
                rec_len = EXT4_DIR_REC_LEN(de->name_len);
                memcpy (to, de, rec_len);
                ((struct ext4_dir_entry_2 *) to)->rec_len =
-                               cpu_to_le16(rec_len);
+                               ext4_rec_len_to_disk(rec_len);
                de->inode = 0;
                map++;
                to += rec_len;
@@ -1147,13 +1149,12 @@ static struct ext4_dir_entry_2* dx_pack_dirents(char *base, int size)
 
        prev = to = de;
        while ((char*)de < base + size) {
-               next = (struct ext4_dir_entry_2 *) ((char *) de +
-                                                   le16_to_cpu(de->rec_len));
+               next = ext4_next_entry(de);
                if (de->inode && de->name_len) {
                        rec_len = EXT4_DIR_REC_LEN(de->name_len);
                        if (de > to)
                                memmove(to, de, rec_len);
-                       to->rec_len = cpu_to_le16(rec_len);
+                       to->rec_len = ext4_rec_len_to_disk(rec_len);
                        prev = to;
                        to = (struct ext4_dir_entry_2 *) (((char *) to) + rec_len);
                }
@@ -1174,7 +1175,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
        unsigned blocksize = dir->i_sb->s_blocksize;
        unsigned count, continued;
        struct buffer_head *bh2;
-       u32 newblock;
+       ext4_lblk_t newblock;
        u32 hash2;
        struct dx_map_entry *map;
        char *data1 = (*bh)->b_data, *data2;
@@ -1221,14 +1222,15 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
        split = count - move;
        hash2 = map[split].hash;
        continued = hash2 == map[split - 1].hash;
-       dxtrace(printk("Split block %i at %x, %i/%i\n",
-               dx_get_block(frame->at), hash2, split, count-split));
+       dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
+                       (unsigned long)dx_get_block(frame->at),
+                                       hash2, split, count-split));
 
        /* Fancy dance to stay within two buffers */
        de2 = dx_move_dirents(data1, data2, map + split, count - split);
        de = dx_pack_dirents(data1,blocksize);
-       de->rec_len = cpu_to_le16(data1 + blocksize - (char *) de);
-       de2->rec_len = cpu_to_le16(data2 + blocksize - (char *) de2);
+       de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de);
+       de2->rec_len = ext4_rec_len_to_disk(data2 + blocksize - (char *) de2);
        dxtrace(dx_show_leaf (hinfo, (struct ext4_dir_entry_2 *) data1, blocksize, 1));
        dxtrace(dx_show_leaf (hinfo, (struct ext4_dir_entry_2 *) data2, blocksize, 1));
 
@@ -1297,7 +1299,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
                                return -EEXIST;
                        }
                        nlen = EXT4_DIR_REC_LEN(de->name_len);
-                       rlen = le16_to_cpu(de->rec_len);
+                       rlen = ext4_rec_len_from_disk(de->rec_len);
                        if ((de->inode? rlen - nlen: rlen) >= reclen)
                                break;
                        de = (struct ext4_dir_entry_2 *)((char *)de + rlen);
@@ -1316,11 +1318,11 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
 
        /* By now the buffer is marked for journaling */
        nlen = EXT4_DIR_REC_LEN(de->name_len);
-       rlen = le16_to_cpu(de->rec_len);
+       rlen = ext4_rec_len_from_disk(de->rec_len);
        if (de->inode) {
                struct ext4_dir_entry_2 *de1 = (struct ext4_dir_entry_2 *)((char *)de + nlen);
-               de1->rec_len = cpu_to_le16(rlen - nlen);
-               de->rec_len = cpu_to_le16(nlen);
+               de1->rec_len = ext4_rec_len_to_disk(rlen - nlen);
+               de->rec_len = ext4_rec_len_to_disk(nlen);
                de = de1;
        }
        de->file_type = EXT4_FT_UNKNOWN;
@@ -1374,7 +1376,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
        int             retval;
        unsigned        blocksize;
        struct dx_hash_info hinfo;
-       u32             block;
+       ext4_lblk_t  block;
        struct fake_dirent *fde;
 
        blocksize =  dir->i_sb->s_blocksize;
@@ -1397,17 +1399,18 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 
        /* The 0th block becomes the root, move the dirents out */
        fde = &root->dotdot;
-       de = (struct ext4_dir_entry_2 *)((char *)fde + le16_to_cpu(fde->rec_len));
+       de = (struct ext4_dir_entry_2 *)((char *)fde +
+               ext4_rec_len_from_disk(fde->rec_len));
        len = ((char *) root) + blocksize - (char *) de;
        memcpy (data1, de, len);
        de = (struct ext4_dir_entry_2 *) data1;
        top = data1 + len;
-       while ((char *)(de2=(void*)de+le16_to_cpu(de->rec_len)) < top)
+       while ((char *)(de2 = ext4_next_entry(de)) < top)
                de = de2;
-       de->rec_len = cpu_to_le16(data1 + blocksize - (char *) de);
+       de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de);
        /* Initialize the root; the dot dirents already exist */
        de = (struct ext4_dir_entry_2 *) (&root->dotdot);
-       de->rec_len = cpu_to_le16(blocksize - EXT4_DIR_REC_LEN(2));
+       de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(2));
        memset (&root->info, 0, sizeof(root->info));
        root->info.info_length = sizeof(root->info);
        root->info.hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version;
@@ -1454,7 +1457,7 @@ static int ext4_add_entry (handle_t *handle, struct dentry *dentry,
        int     retval;
        int     dx_fallback=0;
        unsigned blocksize;
-       u32 block, blocks;
+       ext4_lblk_t block, blocks;
 
        sb = dir->i_sb;
        blocksize = sb->s_blocksize;
@@ -1487,7 +1490,7 @@ static int ext4_add_entry (handle_t *handle, struct dentry *dentry,
                return retval;
        de = (struct ext4_dir_entry_2 *) bh->b_data;
        de->inode = 0;
-       de->rec_len = cpu_to_le16(blocksize);
+       de->rec_len = ext4_rec_len_to_disk(blocksize);
        return add_dirent_to_buf(handle, dentry, inode, de, bh);
 }
 
@@ -1531,7 +1534,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
                       dx_get_count(entries), dx_get_limit(entries)));
        /* Need to split index? */
        if (dx_get_count(entries) == dx_get_limit(entries)) {
-               u32 newblock;
+               ext4_lblk_t newblock;
                unsigned icount = dx_get_count(entries);
                int levels = frame - frames;
                struct dx_entry *entries2;
@@ -1550,7 +1553,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
                        goto cleanup;
                node2 = (struct dx_node *)(bh2->b_data);
                entries2 = node2->entries;
-               node2->fake.rec_len = cpu_to_le16(sb->s_blocksize);
+               node2->fake.rec_len = ext4_rec_len_to_disk(sb->s_blocksize);
                node2->fake.inode = 0;
                BUFFER_TRACE(frame->bh, "get_write_access");
                err = ext4_journal_get_write_access(handle, frame->bh);
@@ -1648,9 +1651,9 @@ static int ext4_delete_entry (handle_t *handle,
                        BUFFER_TRACE(bh, "get_write_access");
                        ext4_journal_get_write_access(handle, bh);
                        if (pde)
-                               pde->rec_len =
-                                       cpu_to_le16(le16_to_cpu(pde->rec_len) +
-                                                   le16_to_cpu(de->rec_len));
+                               pde->rec_len = ext4_rec_len_to_disk(
+                                       ext4_rec_len_from_disk(pde->rec_len) +
+                                       ext4_rec_len_from_disk(de->rec_len));
                        else
                                de->inode = 0;
                        dir->i_version++;
@@ -1658,10 +1661,9 @@ static int ext4_delete_entry (handle_t *handle,
                        ext4_journal_dirty_metadata(handle, bh);
                        return 0;
                }
-               i += le16_to_cpu(de->rec_len);
+               i += ext4_rec_len_from_disk(de->rec_len);
                pde = de;
-               de = (struct ext4_dir_entry_2 *)
-                       ((char *) de + le16_to_cpu(de->rec_len));
+               de = ext4_next_entry(de);
        }
        return -ENOENT;
 }
@@ -1824,13 +1826,13 @@ retry:
        de = (struct ext4_dir_entry_2 *) dir_block->b_data;
        de->inode = cpu_to_le32(inode->i_ino);
        de->name_len = 1;
-       de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(de->name_len));
+       de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len));
        strcpy (de->name, ".");
        ext4_set_de_type(dir->i_sb, de, S_IFDIR);
-       de = (struct ext4_dir_entry_2 *)
-                       ((char *) de + le16_to_cpu(de->rec_len));
+       de = ext4_next_entry(de);
        de->inode = cpu_to_le32(dir->i_ino);
-       de->rec_len = cpu_to_le16(inode->i_sb->s_blocksize-EXT4_DIR_REC_LEN(1));
+       de->rec_len = ext4_rec_len_to_disk(inode->i_sb->s_blocksize -
+                                               EXT4_DIR_REC_LEN(1));
        de->name_len = 2;
        strcpy (de->name, "..");
        ext4_set_de_type(dir->i_sb, de, S_IFDIR);
@@ -1882,8 +1884,7 @@ static int empty_dir (struct inode * inode)
                return 1;
        }
        de = (struct ext4_dir_entry_2 *) bh->b_data;
-       de1 = (struct ext4_dir_entry_2 *)
-                       ((char *) de + le16_to_cpu(de->rec_len));
+       de1 = ext4_next_entry(de);
        if (le32_to_cpu(de->inode) != inode->i_ino ||
                        !le32_to_cpu(de1->inode) ||
                        strcmp (".", de->name) ||
@@ -1894,9 +1895,9 @@ static int empty_dir (struct inode * inode)
                brelse (bh);
                return 1;
        }
-       offset = le16_to_cpu(de->rec_len) + le16_to_cpu(de1->rec_len);
-       de = (struct ext4_dir_entry_2 *)
-                       ((char *) de1 + le16_to_cpu(de1->rec_len));
+       offset = ext4_rec_len_from_disk(de->rec_len) +
+                ext4_rec_len_from_disk(de1->rec_len);
+       de = ext4_next_entry(de1);
        while (offset < inode->i_size ) {
                if (!bh ||
                        (void *) de >= (void *) (bh->b_data+sb->s_blocksize)) {
@@ -1925,9 +1926,8 @@ static int empty_dir (struct inode * inode)
                        brelse (bh);
                        return 0;
                }
-               offset += le16_to_cpu(de->rec_len);
-               de = (struct ext4_dir_entry_2 *)
-                               ((char *) de + le16_to_cpu(de->rec_len));
+               offset += ext4_rec_len_from_disk(de->rec_len);
+               de = ext4_next_entry(de);
        }
        brelse (bh);
        return 1;
@@ -2282,8 +2282,7 @@ retry:
 }
 
 #define PARENT_INO(buffer) \
-       ((struct ext4_dir_entry_2 *) ((char *) buffer + \
-       le16_to_cpu(((struct ext4_dir_entry_2 *) buffer)->rec_len)))->inode
+       (ext4_next_entry((struct ext4_dir_entry_2 *)(buffer))->inode)
 
 /*
  * Anybody can rename anything with this: the permission checks are left to the
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index bd8a52bb39997576ab2d93e7f3c795262640503e..4fbba60816f461a2f08303ea306934f27ede641e 100644
@@ -28,7 +28,7 @@ static int verify_group_input(struct super_block *sb,
        struct ext4_super_block *es = sbi->s_es;
        ext4_fsblk_t start = ext4_blocks_count(es);
        ext4_fsblk_t end = start + input->blocks_count;
-       unsigned group = input->group;
+       ext4_group_t group = input->group;
        ext4_fsblk_t itend = input->inode_table + sbi->s_itb_per_group;
        unsigned overhead = ext4_bg_has_super(sb, group) ?
                (1 + ext4_bg_num_gdb(sb, group) +
@@ -206,7 +206,7 @@ static int setup_new_group_blocks(struct super_block *sb,
        }
 
        if (ext4_bg_has_super(sb, input->group)) {
-               ext4_debug("mark backup superblock %#04lx (+0)\n", start);
+               ext4_debug("mark backup superblock %#04llx (+0)\n", start);
                ext4_set_bit(0, bh->b_data);
        }
 
@@ -215,7 +215,7 @@ static int setup_new_group_blocks(struct super_block *sb,
             i < gdblocks; i++, block++, bit++) {
                struct buffer_head *gdb;
 
-               ext4_debug("update backup group %#04lx (+%d)\n", block, bit);
+               ext4_debug("update backup group %#04llx (+%d)\n", block, bit);
 
                if ((err = extend_or_restart_transaction(handle, 1, bh)))
                        goto exit_bh;
@@ -243,7 +243,7 @@ static int setup_new_group_blocks(struct super_block *sb,
             i < reserved_gdb; i++, block++, bit++) {
                struct buffer_head *gdb;
 
-               ext4_debug("clear reserved block %#04lx (+%d)\n", block, bit);
+               ext4_debug("clear reserved block %#04llx (+%d)\n", block, bit);
 
                if ((err = extend_or_restart_transaction(handle, 1, bh)))
                        goto exit_bh;
@@ -256,10 +256,10 @@ static int setup_new_group_blocks(struct super_block *sb,
                ext4_set_bit(bit, bh->b_data);
                brelse(gdb);
        }
-       ext4_debug("mark block bitmap %#04x (+%ld)\n", input->block_bitmap,
+       ext4_debug("mark block bitmap %#04llx (+%llu)\n", input->block_bitmap,
                   input->block_bitmap - start);
        ext4_set_bit(input->block_bitmap - start, bh->b_data);
-       ext4_debug("mark inode bitmap %#04x (+%ld)\n", input->inode_bitmap,
+       ext4_debug("mark inode bitmap %#04llx (+%llu)\n", input->inode_bitmap,
                   input->inode_bitmap - start);
        ext4_set_bit(input->inode_bitmap - start, bh->b_data);
 
@@ -268,7 +268,7 @@ static int setup_new_group_blocks(struct super_block *sb,
             i < sbi->s_itb_per_group; i++, bit++, block++) {
                struct buffer_head *it;
 
-               ext4_debug("clear inode block %#04lx (+%d)\n", block, bit);
+               ext4_debug("clear inode block %#04llx (+%d)\n", block, bit);
 
                if ((err = extend_or_restart_transaction(handle, 1, bh)))
                        goto exit_bh;
@@ -291,7 +291,7 @@ static int setup_new_group_blocks(struct super_block *sb,
        brelse(bh);
 
        /* Mark unused entries in inode bitmap used */
-       ext4_debug("clear inode bitmap %#04x (+%ld)\n",
+       ext4_debug("clear inode bitmap %#04llx (+%llu)\n",
                   input->inode_bitmap, input->inode_bitmap - start);
        if (IS_ERR(bh = bclean(handle, sb, input->inode_bitmap))) {
                err = PTR_ERR(bh);
@@ -357,7 +357,7 @@ static int verify_reserved_gdb(struct super_block *sb,
                               struct buffer_head *primary)
 {
        const ext4_fsblk_t blk = primary->b_blocknr;
-       const unsigned long end = EXT4_SB(sb)->s_groups_count;
+       const ext4_group_t end = EXT4_SB(sb)->s_groups_count;
        unsigned three = 1;
        unsigned five = 5;
        unsigned seven = 7;
@@ -656,12 +656,12 @@ static void update_backups(struct super_block *sb,
                           int blk_off, char *data, int size)
 {
        struct ext4_sb_info *sbi = EXT4_SB(sb);
-       const unsigned long last = sbi->s_groups_count;
+       const ext4_group_t last = sbi->s_groups_count;
        const int bpg = EXT4_BLOCKS_PER_GROUP(sb);
        unsigned three = 1;
        unsigned five = 5;
        unsigned seven = 7;
-       unsigned group;
+       ext4_group_t group;
        int rest = sb->s_blocksize - size;
        handle_t *handle;
        int err = 0, err2;
@@ -716,7 +716,7 @@ static void update_backups(struct super_block *sb,
 exit_err:
        if (err) {
                ext4_warning(sb, __FUNCTION__,
-                            "can't update backup for group %d (err %d), "
+                            "can't update backup for group %lu (err %d), "
                             "forcing fsck on next reboot", group, err);
                sbi->s_mount_state &= ~EXT4_VALID_FS;
                sbi->s_es->s_state &= cpu_to_le16(~EXT4_VALID_FS);
@@ -952,7 +952,7 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
                      ext4_fsblk_t n_blocks_count)
 {
        ext4_fsblk_t o_blocks_count;
-       unsigned long o_groups_count;
+       ext4_group_t o_groups_count;
        ext4_grpblk_t last;
        ext4_grpblk_t add;
        struct buffer_head * bh;
@@ -1054,7 +1054,7 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
        ext4_journal_dirty_metadata(handle, EXT4_SB(sb)->s_sbh);
        sb->s_dirt = 1;
        unlock_super(sb);
-       ext4_debug("freeing blocks %lu through %llu\n", o_blocks_count,
+       ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count,
                   o_blocks_count + add);
        ext4_free_blocks_sb(handle, sb, o_blocks_count, add, &freed_blocks);
        ext4_debug("freed blocks %llu through %llu\n", o_blocks_count,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1ca0f546c466193e46c519879e8329a18c1b1af1..055a0cd0168e5029ee87bb3ce8c55153fae3b171 100644
@@ -373,6 +373,66 @@ void ext4_update_dynamic_rev(struct super_block *sb)
         */
 }
 
+int ext4_update_compat_feature(handle_t *handle,
+                                       struct super_block *sb, __u32 compat)
+{
+       int err = 0;
+       if (!EXT4_HAS_COMPAT_FEATURE(sb, compat)) {
+               err = ext4_journal_get_write_access(handle,
+                               EXT4_SB(sb)->s_sbh);
+               if (err)
+                       return err;
+               EXT4_SET_COMPAT_FEATURE(sb, compat);
+               sb->s_dirt = 1;
+               handle->h_sync = 1;
+               BUFFER_TRACE(EXT4_SB(sb)->s_sbh,
+                                       "call ext4_journal_dirty_metadata");
+               err = ext4_journal_dirty_metadata(handle,
+                               EXT4_SB(sb)->s_sbh);
+       }
+       return err;
+}
+
+int ext4_update_rocompat_feature(handle_t *handle,
+                                       struct super_block *sb, __u32 rocompat)
+{
+       int err = 0;
+       if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, rocompat)) {
+               err = ext4_journal_get_write_access(handle,
+                               EXT4_SB(sb)->s_sbh);
+               if (err)
+                       return err;
+               EXT4_SET_RO_COMPAT_FEATURE(sb, rocompat);
+               sb->s_dirt = 1;
+               handle->h_sync = 1;
+               BUFFER_TRACE(EXT4_SB(sb)->s_sbh,
+                                       "call ext4_journal_dirty_metadata");
+               err = ext4_journal_dirty_metadata(handle,
+                               EXT4_SB(sb)->s_sbh);
+       }
+       return err;
+}
+
+int ext4_update_incompat_feature(handle_t *handle,
+                                       struct super_block *sb, __u32 incompat)
+{
+       int err = 0;
+       if (!EXT4_HAS_INCOMPAT_FEATURE(sb, incompat)) {
+               err = ext4_journal_get_write_access(handle,
+                               EXT4_SB(sb)->s_sbh);
+               if (err)
+                       return err;
+               EXT4_SET_INCOMPAT_FEATURE(sb, incompat);
+               sb->s_dirt = 1;
+               handle->h_sync = 1;
+               BUFFER_TRACE(EXT4_SB(sb)->s_sbh,
+                                       "call ext4_journal_dirty_metadata");
+               err = ext4_journal_dirty_metadata(handle,
+                               EXT4_SB(sb)->s_sbh);
+       }
+       return err;
+}
+
 /*
  * Open the external journal device
  */
@@ -443,6 +503,7 @@ static void ext4_put_super (struct super_block * sb)
        struct ext4_super_block *es = sbi->s_es;
        int i;
 
+       ext4_mb_release(sb);
        ext4_ext_release(sb);
        ext4_xattr_put_super(sb);
        jbd2_journal_destroy(sbi->s_journal);
@@ -509,6 +570,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
        ei->i_block_alloc_info = NULL;
        ei->vfs_inode.i_version = 1;
        memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
+       INIT_LIST_HEAD(&ei->i_prealloc_list);
+       spin_lock_init(&ei->i_prealloc_lock);
        return &ei->vfs_inode;
 }
 
@@ -533,7 +596,7 @@ static void init_once(struct kmem_cache *cachep, void *foo)
 #ifdef CONFIG_EXT4DEV_FS_XATTR
        init_rwsem(&ei->xattr_sem);
 #endif
-       mutex_init(&ei->truncate_mutex);
+       init_rwsem(&ei->i_data_sem);
        inode_init_once(&ei->vfs_inode);
 }
 
@@ -605,18 +668,20 @@ static inline void ext4_show_quota_options(struct seq_file *seq, struct super_bl
  */
 static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 {
+       int def_errors;
+       unsigned long def_mount_opts;
        struct super_block *sb = vfs->mnt_sb;
        struct ext4_sb_info *sbi = EXT4_SB(sb);
        struct ext4_super_block *es = sbi->s_es;
-       unsigned long def_mount_opts;
 
        def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
+       def_errors     = le16_to_cpu(es->s_errors);
 
        if (sbi->s_sb_block != 1)
                seq_printf(seq, ",sb=%llu", sbi->s_sb_block);
        if (test_opt(sb, MINIX_DF))
                seq_puts(seq, ",minixdf");
-       if (test_opt(sb, GRPID))
+       if (test_opt(sb, GRPID) && !(def_mount_opts & EXT4_DEFM_BSDGROUPS))
                seq_puts(seq, ",grpid");
        if (!test_opt(sb, GRPID) && (def_mount_opts & EXT4_DEFM_BSDGROUPS))
                seq_puts(seq, ",nogrpid");
@@ -628,34 +693,33 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
            le16_to_cpu(es->s_def_resgid) != EXT4_DEF_RESGID) {
                seq_printf(seq, ",resgid=%u", sbi->s_resgid);
        }
-       if (test_opt(sb, ERRORS_CONT)) {
-               int def_errors = le16_to_cpu(es->s_errors);
-
+       if (test_opt(sb, ERRORS_RO)) {
                if (def_errors == EXT4_ERRORS_PANIC ||
-                   def_errors == EXT4_ERRORS_RO) {
-                       seq_puts(seq, ",errors=continue");
+                   def_errors == EXT4_ERRORS_CONTINUE) {
+                       seq_puts(seq, ",errors=remount-ro");
                }
        }
-       if (test_opt(sb, ERRORS_RO))
-               seq_puts(seq, ",errors=remount-ro");
-       if (test_opt(sb, ERRORS_PANIC))
+       if (test_opt(sb, ERRORS_CONT) && def_errors != EXT4_ERRORS_CONTINUE)
+               seq_puts(seq, ",errors=continue");
+       if (test_opt(sb, ERRORS_PANIC) && def_errors != EXT4_ERRORS_PANIC)
                seq_puts(seq, ",errors=panic");
-       if (test_opt(sb, NO_UID32))
+       if (test_opt(sb, NO_UID32) && !(def_mount_opts & EXT4_DEFM_UID16))
                seq_puts(seq, ",nouid32");
-       if (test_opt(sb, DEBUG))
+       if (test_opt(sb, DEBUG) && !(def_mount_opts & EXT4_DEFM_DEBUG))
                seq_puts(seq, ",debug");
        if (test_opt(sb, OLDALLOC))
                seq_puts(seq, ",oldalloc");
-#ifdef CONFIG_EXT4_FS_XATTR
-       if (test_opt(sb, XATTR_USER))
+#ifdef CONFIG_EXT4DEV_FS_XATTR
+       if (test_opt(sb, XATTR_USER) &&
+               !(def_mount_opts & EXT4_DEFM_XATTR_USER))
                seq_puts(seq, ",user_xattr");
        if (!test_opt(sb, XATTR_USER) &&
            (def_mount_opts & EXT4_DEFM_XATTR_USER)) {
                seq_puts(seq, ",nouser_xattr");
        }
 #endif
-#ifdef CONFIG_EXT4_FS_POSIX_ACL
-       if (test_opt(sb, POSIX_ACL))
+#ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
+       if (test_opt(sb, POSIX_ACL) && !(def_mount_opts & EXT4_DEFM_ACL))
                seq_puts(seq, ",acl");
        if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT4_DEFM_ACL))
                seq_puts(seq, ",noacl");
@@ -672,7 +736,17 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
                seq_puts(seq, ",nobh");
        if (!test_opt(sb, EXTENTS))
                seq_puts(seq, ",noextents");
+       if (!test_opt(sb, MBALLOC))
+               seq_puts(seq, ",nomballoc");
+       if (test_opt(sb, I_VERSION))
+               seq_puts(seq, ",i_version");
 
+       if (sbi->s_stripe)
+               seq_printf(seq, ",stripe=%lu", sbi->s_stripe);
+       /*
+        * Journal mode can be enabled in different ways,
+        * so just print the value even if we didn't specify it.
+        */
        if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
                seq_puts(seq, ",data=journal");
        else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA)
@@ -681,7 +755,6 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
                seq_puts(seq, ",data=writeback");
 
        ext4_show_quota_options(seq, sb);
-
        return 0;
 }
 
@@ -809,11 +882,13 @@ enum {
        Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
        Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_bh,
        Opt_commit, Opt_journal_update, Opt_journal_inum, Opt_journal_dev,
+       Opt_journal_checksum, Opt_journal_async_commit,
        Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
        Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
        Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
        Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-       Opt_grpquota, Opt_extents, Opt_noextents,
+       Opt_grpquota, Opt_extents, Opt_noextents, Opt_i_version,
+       Opt_mballoc, Opt_nomballoc, Opt_stripe,
 };
 
 static match_table_t tokens = {
@@ -848,6 +923,8 @@ static match_table_t tokens = {
        {Opt_journal_update, "journal=update"},
        {Opt_journal_inum, "journal=%u"},
        {Opt_journal_dev, "journal_dev=%u"},
+       {Opt_journal_checksum, "journal_checksum"},
+       {Opt_journal_async_commit, "journal_async_commit"},
        {Opt_abort, "abort"},
        {Opt_data_journal, "data=journal"},
        {Opt_data_ordered, "data=ordered"},
@@ -865,6 +942,10 @@ static match_table_t tokens = {
        {Opt_barrier, "barrier=%u"},
        {Opt_extents, "extents"},
        {Opt_noextents, "noextents"},
+       {Opt_i_version, "i_version"},
+       {Opt_mballoc, "mballoc"},
+       {Opt_nomballoc, "nomballoc"},
+       {Opt_stripe, "stripe=%u"},
        {Opt_err, NULL},
        {Opt_resize, "resize"},
 };
@@ -1035,6 +1116,13 @@ static int parse_options (char *options, struct super_block *sb,
                                return 0;
                        *journal_devnum = option;
                        break;
+               case Opt_journal_checksum:
+                       set_opt(sbi->s_mount_opt, JOURNAL_CHECKSUM);
+                       break;
+               case Opt_journal_async_commit:
+                       set_opt(sbi->s_mount_opt, JOURNAL_ASYNC_COMMIT);
+                       set_opt(sbi->s_mount_opt, JOURNAL_CHECKSUM);
+                       break;
                case Opt_noload:
                        set_opt (sbi->s_mount_opt, NOLOAD);
                        break;
@@ -1203,6 +1291,23 @@ clear_qf_name:
                case Opt_noextents:
                        clear_opt (sbi->s_mount_opt, EXTENTS);
                        break;
+               case Opt_i_version:
+                       set_opt(sbi->s_mount_opt, I_VERSION);
+                       sb->s_flags |= MS_I_VERSION;
+                       break;
+               case Opt_mballoc:
+                       set_opt(sbi->s_mount_opt, MBALLOC);
+                       break;
+               case Opt_nomballoc:
+                       clear_opt(sbi->s_mount_opt, MBALLOC);
+                       break;
+               case Opt_stripe:
+                       if (match_int(&args[0], &option))
+                               return 0;
+                       if (option < 0)
+                               return 0;
+                       sbi->s_stripe = option;
+                       break;
                default:
                        printk (KERN_ERR
                                "EXT4-fs: Unrecognized mount option \"%s\" "
@@ -1364,7 +1469,7 @@ static int ext4_check_descriptors (struct super_block * sb)
        struct ext4_group_desc * gdp = NULL;
        int desc_block = 0;
        int flexbg_flag = 0;
-       int i;
+       ext4_group_t i;
 
        if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
                flexbg_flag = 1;
@@ -1386,7 +1491,7 @@ static int ext4_check_descriptors (struct super_block * sb)
                if (block_bitmap < first_block || block_bitmap > last_block)
                {
                        ext4_error (sb, "ext4_check_descriptors",
-                                   "Block bitmap for group %d"
+                                   "Block bitmap for group %lu"
                                    " not in group (block %llu)!",
                                    i, block_bitmap);
                        return 0;
@@ -1395,7 +1500,7 @@ static int ext4_check_descriptors (struct super_block * sb)
                if (inode_bitmap < first_block || inode_bitmap > last_block)
                {
                        ext4_error (sb, "ext4_check_descriptors",
-                                   "Inode bitmap for group %d"
+                                   "Inode bitmap for group %lu"
                                    " not in group (block %llu)!",
                                    i, inode_bitmap);
                        return 0;
@@ -1405,17 +1510,16 @@ static int ext4_check_descriptors (struct super_block * sb)
                    inode_table + sbi->s_itb_per_group - 1 > last_block)
                {
                        ext4_error (sb, "ext4_check_descriptors",
-                                   "Inode table for group %d"
+                                   "Inode table for group %lu"
                                    " not in group (block %llu)!",
                                    i, inode_table);
                        return 0;
                }
                if (!ext4_group_desc_csum_verify(sbi, i, gdp)) {
                        ext4_error(sb, __FUNCTION__,
-                                  "Checksum for group %d failed (%u!=%u)\n", i,
-                                  le16_to_cpu(ext4_group_desc_csum(sbi, i,
-                                                                   gdp)),
-                                  le16_to_cpu(gdp->bg_checksum));
+                                  "Checksum for group %lu failed (%u!=%u)\n",
+                                   i, le16_to_cpu(ext4_group_desc_csum(sbi, i,
+                                   gdp)), le16_to_cpu(gdp->bg_checksum));
                        return 0;
                }
                if (!flexbg_flag)
@@ -1429,7 +1533,6 @@ static int ext4_check_descriptors (struct super_block * sb)
        return 1;
 }
 
-
 /* ext4_orphan_cleanup() walks a singly-linked list of inodes (starting at
  * the superblock) which were deleted from all directories, but held open by
  * a process at the time of a crash.  We walk the list and try to delete these
@@ -1542,20 +1645,95 @@ static void ext4_orphan_cleanup (struct super_block * sb,
 #endif
        sb->s_flags = s_flags; /* Restore MS_RDONLY status */
 }
+/*
+ * Maximal extent format file size.
+ * Resulting logical blkno at s_maxbytes must fit in our on-disk
+ * extent format containers, within a sector_t, and within i_blocks
+ * in the vfs.  The ext4 inode has 48 bits of i_blocks in fsblock units,
+ * so that won't be a limiting factor.
+ *
+ * Note, this does *not* consider any metadata overhead for vfs i_blocks.
+ */
+static loff_t ext4_max_size(int blkbits)
+{
+       loff_t res;
+       loff_t upper_limit = MAX_LFS_FILESIZE;
+
+       /* small i_blocks in vfs inode? */
+       if (sizeof(blkcnt_t) < sizeof(u64)) {
+               /*
+                * Without CONFIG_LSF, the VFS inode's i_blocks counts
+                * 512-byte sectors in a 32-bit field
+                * (32 == sizeof(i_blocks) * 8).
+                */
+               upper_limit = (1LL << 32) - 1;
+
+               /* total blocks in file system block size */
+               upper_limit >>= (blkbits - 9);
+               upper_limit <<= blkbits;
+       }
+
+       /* 32-bit extent-start container, ee_block */
+       res = 1LL << 32;
+       res <<= blkbits;
+       res -= 1;
+
+       /* Sanity check against vm- & vfs- imposed limits */
+       if (res > upper_limit)
+               res = upper_limit;
+
+       return res;
+}
 
 /*
- * Maximal file size.  There is a direct, and {,double-,triple-}indirect
- * block limit, and also a limit of (2^32 - 1) 512-byte sectors in i_blocks.
- * We need to be 1 filesystem block less than the 2^32 sector limit.
+ * Maximal bitmap file size.  There is a direct, and {,double-,triple-}indirect
+ * block limit, and also a limit of (2^48 - 1) 512-byte sectors in i_blocks.
+ * We need to be 1 filesystem block less than the 2^48 sector limit.
  */
-static loff_t ext4_max_size(int bits)
+static loff_t ext4_max_bitmap_size(int bits)
 {
        loff_t res = EXT4_NDIR_BLOCKS;
-       /* This constant is calculated to be the largest file size for a
-        * dense, 4k-blocksize file such that the total number of
+       int meta_blocks;
+       loff_t upper_limit;
+       /* This is calculated to be the largest file size for a
+        * dense, bitmapped file such that the total number of
         * sectors in the file, including data and all indirect blocks,
-        * does not exceed 2^32. */
-       const loff_t upper_limit = 0x1ff7fffd000LL;
+        * does not exceed 2^48 - 1:
+        * __u32 i_blocks_lo and __u16 i_blocks_high represent the
+        * total number of 512-byte blocks of the file.
+        */
+
+       if (sizeof(blkcnt_t) < sizeof(u64)) {
+               /*
+                * Without CONFIG_LSF, the VFS inode's i_blocks counts
+                * 512-byte sectors in a 32-bit field
+                * (32 == sizeof(i_blocks) * 8).
+                */
+               upper_limit = (1LL << 32) - 1;
+
+               /* total blocks in file system block size */
+               upper_limit >>= (bits - 9);
+
+       } else {
+               /*
+                * The ext4 inode carries 48 bits of i_blocks.
+                * With EXT4_HUGE_FILE_FL set, i_blocks counts
+                * blocks of the filesystem block size rather
+                * than 512-byte sectors.
+                */
+               upper_limit = (1LL << 48) - 1;
+
+       }
+
+       /* indirect blocks */
+       meta_blocks = 1;
+       /* double indirect blocks */
+       meta_blocks += 1 + (1LL << (bits-2));
+       /* triple indirect blocks */
+       meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2)));
+
+       upper_limit -= meta_blocks;
+       upper_limit <<= bits;
 
        res += 1LL << (bits-2);
        res += 1LL << (2*(bits-2));
@@ -1563,6 +1741,10 @@ static loff_t ext4_max_size(int bits)
        res <<= bits;
        if (res > upper_limit)
                res = upper_limit;
+
+       if (res > MAX_LFS_FILESIZE)
+               res = MAX_LFS_FILESIZE;
+
        return res;
 }
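
/*
 * Editorial sketch, not part of the patch: a userspace illustration of the
 * two limits computed by ext4_max_size() and ext4_max_bitmap_size() above,
 * assuming 4 KiB blocks and a 64-bit blkcnt_t.  The kernel additionally
 * clamps both results to the VFS constant MAX_LFS_FILESIZE, omitted here.
 */
#include <stdio.h>

int main(void)
{
        int bits = 12;                          /* 4 KiB filesystem blocks */
        long long extent_max, bitmap_max, upper, meta;

        /* Extent format: ee_block is a 32-bit logical block number. */
        extent_max = ((1LL << 32) << bits) - 1;

        /* Bitmap (indirect) format: 12 direct + 1/2/3-level indirect. */
        bitmap_max = 12;
        bitmap_max += 1LL << (bits - 2);                /* single indirect */
        bitmap_max += 1LL << (2 * (bits - 2));          /* double indirect */
        bitmap_max += 1LL << (3 * (bits - 2));          /* triple indirect */
        bitmap_max <<= bits;

        /* i_blocks holds 48 bits of fs-sized blocks; reserve the metadata. */
        meta = 1 + 1 + (1LL << (bits - 2))
                 + 1 + (1LL << (bits - 2)) + (1LL << (2 * (bits - 2)));
        upper = ((1LL << 48) - 1 - meta) << bits;
        if (bitmap_max > upper)
                bitmap_max = upper;

        printf("extent-mapped max: %lld bytes (~16 TiB)\n", extent_max);
        printf("bitmap-mapped max: %lld bytes (~4 TiB)\n", bitmap_max);
        return 0;
}
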
 
@@ -1570,7 +1752,7 @@ static ext4_fsblk_t descriptor_loc(struct super_block *sb,
                                ext4_fsblk_t logical_sb_block, int nr)
 {
        struct ext4_sb_info *sbi = EXT4_SB(sb);
-       unsigned long bg, first_meta_bg;
+       ext4_group_t bg, first_meta_bg;
        int has_super = 0;
 
        first_meta_bg = le32_to_cpu(sbi->s_es->s_first_meta_bg);
@@ -1584,8 +1766,39 @@ static ext4_fsblk_t descriptor_loc(struct super_block *sb,
        return (has_super + ext4_group_first_block_no(sb, bg));
 }
 
+/**
+ * ext4_get_stripe_size: Get the stripe size.
+ * @sbi: In memory super block info
+ *
+ * If a stripe size was specified as a mount option and it does not exceed
+ * the blocks per group, use it.  Otherwise fall back to the superblock
+ * values (stripe width, then stride), again only if they fit within a
+ * group.  Return 0 if nothing suitable is found; the allocator needs the
+ * stripe size to be smaller than the blocks per group.
+ *
+ */
+static unsigned long ext4_get_stripe_size(struct ext4_sb_info *sbi)
+{
+       unsigned long stride = le16_to_cpu(sbi->s_es->s_raid_stride);
+       unsigned long stripe_width =
+                       le32_to_cpu(sbi->s_es->s_raid_stripe_width);
+
+       if (sbi->s_stripe && sbi->s_stripe <= sbi->s_blocks_per_group)
+               return sbi->s_stripe;
+
+       if (stripe_width <= sbi->s_blocks_per_group)
+               return stripe_width;
+
+       if (stride <= sbi->s_blocks_per_group)
+               return stride;
+
+       return 0;
+}
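
/*
 * Editorial sketch, not part of the patch: the selection order implemented
 * by ext4_get_stripe_size() above, with plain integers standing in for the
 * superblock fields.  A candidate is used only if it does not exceed the
 * blocks per group, since the allocator needs the stripe to fit in one group.
 */
#include <stdio.h>

static unsigned long pick_stripe(unsigned long mount_opt,
                                 unsigned long stripe_width,
                                 unsigned long stride,
                                 unsigned long blocks_per_group)
{
        if (mount_opt && mount_opt <= blocks_per_group)
                return mount_opt;       /* -o stripe=N wins when usable */
        if (stripe_width <= blocks_per_group)
                return stripe_width;    /* then s_raid_stripe_width */
        if (stride <= blocks_per_group)
                return stride;          /* then s_raid_stride */
        return 0;                       /* nothing usable */
}

int main(void)
{
        /* e.g. mkfs recorded stride=16, stripe-width=64; no mount option */
        printf("stripe = %lu blocks\n", pick_stripe(0, 64, 16, 32768));
        return 0;
}
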
 
 static int ext4_fill_super (struct super_block *sb, void *data, int silent)
+                               __releases(kernel_sem)
+                               __acquires(kernel_sem)
 {
        struct buffer_head * bh;
        struct ext4_super_block *es = NULL;
@@ -1599,7 +1812,6 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
        unsigned long def_mount_opts;
        struct inode *root;
        int blocksize;
-       int hblock;
        int db_count;
        int i;
        int needs_recovery;
@@ -1624,6 +1836,11 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
                goto out_fail;
        }
 
+       if (!sb_set_blocksize(sb, blocksize)) {
+               printk(KERN_ERR "EXT4-fs: bad blocksize %d.\n", blocksize);
+               goto out_fail;
+       }
+
        /*
         * The ext4 superblock will not be buffer aligned for other than 1kB
         * block sizes.  We need to calculate the offset from buffer start.
@@ -1674,10 +1891,10 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 
        if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_PANIC)
                set_opt(sbi->s_mount_opt, ERRORS_PANIC);
-       else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_RO)
-               set_opt(sbi->s_mount_opt, ERRORS_RO);
-       else
+       else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_CONTINUE)
                set_opt(sbi->s_mount_opt, ERRORS_CONT);
+       else
+               set_opt(sbi->s_mount_opt, ERRORS_RO);
 
        sbi->s_resuid = le16_to_cpu(es->s_def_resuid);
        sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
@@ -1689,6 +1906,11 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
         * User -o noextents to turn it off
         */
        set_opt(sbi->s_mount_opt, EXTENTS);
+       /*
+        * Turn on the mballoc feature by default in ext4.
+        * Use -o nomballoc to turn it off.
+        */
+       set_opt(sbi->s_mount_opt, MBALLOC);
 
        if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
                            NULL, 0))
@@ -1723,6 +1945,19 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
                       sb->s_id, le32_to_cpu(features));
                goto failed_mount;
        }
+       if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_HUGE_FILE)) {
+               /*
+                * A filesystem with the huge_file feature can only be
+                * mounted read-write on a kernel built with CONFIG_LSF.
+                */
+               if (sizeof(root->i_blocks) < sizeof(u64) &&
+                               !(sb->s_flags & MS_RDONLY)) {
+                       printk(KERN_ERR "EXT4-fs: %s: Filesystem with huge "
+                                       "files cannot be mounted read-write "
+                                       "without CONFIG_LSF.\n", sb->s_id);
+                       goto failed_mount;
+               }
+       }
        blocksize = BLOCK_SIZE << le32_to_cpu(es->s_log_block_size);
 
        if (blocksize < EXT4_MIN_BLOCK_SIZE ||
@@ -1733,20 +1968,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
                goto failed_mount;
        }
 
-       hblock = bdev_hardsect_size(sb->s_bdev);
        if (sb->s_blocksize != blocksize) {
-               /*
-                * Make sure the blocksize for the filesystem is larger
-                * than the hardware sectorsize for the machine.
-                */
-               if (blocksize < hblock) {
-                       printk(KERN_ERR "EXT4-fs: blocksize %d too small for "
-                              "device blocksize %d.\n", blocksize, hblock);
+
+               /* Validate the filesystem blocksize */
+               if (!sb_set_blocksize(sb, blocksize)) {
+                       printk(KERN_ERR "EXT4-fs: bad block size %d.\n",
+                                       blocksize);
                        goto failed_mount;
                }
 
                brelse (bh);
-               sb_set_blocksize(sb, blocksize);
                logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
                offset = do_div(logical_sb_block, blocksize);
                bh = sb_bread(sb, logical_sb_block);
@@ -1764,6 +1995,7 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
                }
        }
 
+       sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(sb->s_blocksize_bits);
        sb->s_maxbytes = ext4_max_size(sb->s_blocksize_bits);
 
        if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV) {
@@ -1838,6 +2070,17 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 
        if (EXT4_BLOCKS_PER_GROUP(sb) == 0)
                goto cantfind_ext4;
+
+       /* ensure blocks_count calculation below doesn't sign-extend */
+       if (ext4_blocks_count(es) + EXT4_BLOCKS_PER_GROUP(sb) <
+           le32_to_cpu(es->s_first_data_block) + 1) {
+               printk(KERN_WARNING "EXT4-fs: bad geometry: block count %llu, "
+                      "first data block %u, blocks per group %lu\n",
+                       ext4_blocks_count(es),
+                       le32_to_cpu(es->s_first_data_block),
+                       EXT4_BLOCKS_PER_GROUP(sb));
+               goto failed_mount;
+       }
        blocks_count = (ext4_blocks_count(es) -
                        le32_to_cpu(es->s_first_data_block) +
                        EXT4_BLOCKS_PER_GROUP(sb) - 1);
@@ -1900,6 +2143,8 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
        sbi->s_rsv_window_head.rsv_goal_size = 0;
        ext4_rsv_window_add(sb, &sbi->s_rsv_window_head);
 
+       sbi->s_stripe = ext4_get_stripe_size(sbi);
+
        /*
         * set up enough so that it can read an inode
         */
@@ -1944,6 +2189,21 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
                goto failed_mount4;
        }
 
+       if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
+               jbd2_journal_set_features(sbi->s_journal,
+                               JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+                               JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+       } else if (test_opt(sb, JOURNAL_CHECKSUM)) {
+               jbd2_journal_set_features(sbi->s_journal,
+                               JBD2_FEATURE_COMPAT_CHECKSUM, 0, 0);
+               jbd2_journal_clear_features(sbi->s_journal, 0, 0,
+                               JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+       } else {
+               jbd2_journal_clear_features(sbi->s_journal,
+                               JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+                               JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+       }
+
        /* We have now updated the journal if required, so we can
         * validate the data journaling mode. */
        switch (test_opt(sb, DATA_FLAGS)) {
@@ -2044,6 +2304,7 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
                "writeback");
 
        ext4_ext_init(sb);
+       ext4_mb_init(sb, needs_recovery);
 
        lock_kernel();
        return 0;
@@ -2673,7 +2934,7 @@ static int ext4_statfs (struct dentry * dentry, struct kstatfs * buf)
        if (test_opt(sb, MINIX_DF)) {
                sbi->s_overhead_last = 0;
        } else if (sbi->s_blocks_last != ext4_blocks_count(es)) {
-               unsigned long ngroups = sbi->s_groups_count, i;
+               ext4_group_t ngroups = sbi->s_groups_count, i;
                ext4_fsblk_t overhead = 0;
                smp_rmb();
 
@@ -2909,7 +3170,7 @@ static ssize_t ext4_quota_read(struct super_block *sb, int type, char *data,
                               size_t len, loff_t off)
 {
        struct inode *inode = sb_dqopt(sb)->files[type];
-       sector_t blk = off >> EXT4_BLOCK_SIZE_BITS(sb);
+       ext4_lblk_t blk = off >> EXT4_BLOCK_SIZE_BITS(sb);
        int err = 0;
        int offset = off & (sb->s_blocksize - 1);
        int tocopy;
@@ -2947,7 +3208,7 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
                                const char *data, size_t len, loff_t off)
 {
        struct inode *inode = sb_dqopt(sb)->files[type];
-       sector_t blk = off >> EXT4_BLOCK_SIZE_BITS(sb);
+       ext4_lblk_t blk = off >> EXT4_BLOCK_SIZE_BITS(sb);
        int err = 0;
        int offset = off & (sb->s_blocksize - 1);
        int tocopy;
@@ -3002,7 +3263,6 @@ out:
                i_size_write(inode, off+len-towrite);
                EXT4_I(inode)->i_disksize = inode->i_size;
        }
-       inode->i_version++;
        inode->i_mtime = inode->i_ctime = CURRENT_TIME;
        ext4_mark_inode_dirty(handle, inode);
        mutex_unlock(&inode->i_mutex);
@@ -3027,9 +3287,15 @@ static struct file_system_type ext4dev_fs_type = {
 
 static int __init init_ext4_fs(void)
 {
-       int err = init_ext4_xattr();
+       int err;
+
+       err = init_ext4_mballoc();
        if (err)
                return err;
+
+       err = init_ext4_xattr();
+       if (err)
+               goto out2;
        err = init_inodecache();
        if (err)
                goto out1;
@@ -3041,6 +3307,8 @@ out:
        destroy_inodecache();
 out1:
        exit_ext4_xattr();
+out2:
+       exit_ext4_mballoc();
        return err;
 }
 
@@ -3049,6 +3317,7 @@ static void __exit exit_ext4_fs(void)
        unregister_filesystem(&ext4dev_fs_type);
        destroy_inodecache();
        exit_ext4_xattr();
+       exit_ext4_mballoc();
 }
 
 MODULE_AUTHOR("Remy Card, Stephen Tweedie, Andrew Morton, Andreas Dilger, Theodore Ts'o and others");
index 86387302c2a9b45591e71c8e6900fdde92c09eaf..d7962139c0108a329386b86d61088cc4dedb3eb1 100644 (file)
@@ -480,7 +480,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
                ea_bdebug(bh, "refcount now=0; freeing");
                if (ce)
                        mb_cache_entry_free(ce);
-               ext4_free_blocks(handle, inode, bh->b_blocknr, 1);
+               ext4_free_blocks(handle, inode, bh->b_blocknr, 1, 1);
                get_bh(bh);
                ext4_forget(handle, 1, inode, bh, bh->b_blocknr);
        } else {
@@ -821,7 +821,7 @@ inserted:
                        new_bh = sb_getblk(sb, block);
                        if (!new_bh) {
 getblk_failed:
-                               ext4_free_blocks(handle, inode, block, 1);
+                               ext4_free_blocks(handle, inode, block, 1, 1);
                                error = -EIO;
                                goto cleanup;
                        }
index ed35383d0b6c4f9b53fcc536da045cacfaf6c123..276ffd6b6fdd61cee59073bf6df95624fead672a 100644 (file)
@@ -1276,6 +1276,11 @@ void file_update_time(struct file *file)
                sync_it = 1;
        }
 
+       if (IS_I_VERSION(inode)) {
+               inode_inc_iversion(inode);
+               sync_it = 1;
+       }
+
        if (sync_it)
                mark_inode_dirty_sync(inode);
 }
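
/*
 * Editorial sketch, not part of the patch: with "-o i_version" ext4 sets
 * MS_I_VERSION, and file_update_time() above then bumps inode->i_version on
 * every change, giving clients (for example the NFSv4 change attribute) a
 * counter that detects modification without relying on timestamps.  The
 * counter semantics in miniature, using a toy structure:
 */
#include <stdint.h>
#include <stdio.h>

struct toy_inode {
        uint64_t i_version;
};

static void toy_update_time(struct toy_inode *inode)
{
        inode->i_version++;             /* analogous to inode_inc_iversion() */
}

int main(void)
{
        struct toy_inode ino = { 0 };
        uint64_t before = ino.i_version;

        toy_update_time(&ino);
        printf("changed: %s\n", ino.i_version != before ? "yes" : "no");
        return 0;
}
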
index 3fccde7ba008cb7b760f8128f9b571038e83b67c..1b7f282c1ae9e9940ca57009e3d3c6f73054433b 100644 (file)
@@ -232,7 +232,8 @@ __flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
  * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
  */
 static int __process_buffer(journal_t *journal, struct journal_head *jh,
-                       struct buffer_head **bhs, int *batch_count)
+                       struct buffer_head **bhs, int *batch_count,
+                       transaction_t *transaction)
 {
        struct buffer_head *bh = jh2bh(jh);
        int ret = 0;
@@ -250,6 +251,7 @@ static int __process_buffer(journal_t *journal, struct journal_head *jh,
                transaction_t *t = jh->b_transaction;
                tid_t tid = t->t_tid;
 
+               transaction->t_chp_stats.cs_forced_to_close++;
                spin_unlock(&journal->j_list_lock);
                jbd_unlock_bh_state(bh);
                jbd2_log_start_commit(journal, tid);
@@ -279,6 +281,7 @@ static int __process_buffer(journal_t *journal, struct journal_head *jh,
                bhs[*batch_count] = bh;
                __buffer_relink_io(jh);
                jbd_unlock_bh_state(bh);
+               transaction->t_chp_stats.cs_written++;
                (*batch_count)++;
                if (*batch_count == NR_BATCH) {
                        spin_unlock(&journal->j_list_lock);
@@ -322,6 +325,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
        if (!journal->j_checkpoint_transactions)
                goto out;
        transaction = journal->j_checkpoint_transactions;
+       if (transaction->t_chp_stats.cs_chp_time == 0)
+               transaction->t_chp_stats.cs_chp_time = jiffies;
        this_tid = transaction->t_tid;
 restart:
        /*
@@ -346,7 +351,8 @@ restart:
                                retry = 1;
                                break;
                        }
-                       retry = __process_buffer(journal, jh, bhs,&batch_count);
+                       retry = __process_buffer(journal, jh, bhs, &batch_count,
+                                                transaction);
                        if (!retry && lock_need_resched(&journal->j_list_lock)){
                                spin_unlock(&journal->j_list_lock);
                                retry = 1;
@@ -602,15 +608,15 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 
        /*
         * There is one special case to worry about: if we have just pulled the
-        * buffer off a committing transaction's forget list, then even if the
-        * checkpoint list is empty, the transaction obviously cannot be
-        * dropped!
+        * buffer off a running or committing transaction's checkpoint list,
+        * then even if the checkpoint list is empty, the transaction obviously
+        * cannot be dropped!
         *
-        * The locking here around j_committing_transaction is a bit sleazy.
+        * The locking here around t_state is a bit sleazy.
         * See the comment at the end of jbd2_journal_commit_transaction().
         */
-       if (transaction == journal->j_committing_transaction) {
-               JBUFFER_TRACE(jh, "belongs to committing transaction");
+       if (transaction->t_state != T_FINISHED) {
+               JBUFFER_TRACE(jh, "belongs to running/committing transaction");
                goto out;
        }
 
index 6986f334c643291181cd56c01c42a8820ad110f0..da8d0eb3b7b9c8933091561e7e44852fd814de66 100644 (file)
@@ -20,6 +20,8 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/jiffies.h>
+#include <linux/crc32.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -92,19 +94,23 @@ static int inverted_lock(journal_t *journal, struct buffer_head *bh)
        return 1;
 }
 
-/* Done it all: now write the commit record.  We should have
+/*
+ * Done it all: now submit the commit record.  We should have
  * cleaned up our previous buffers by now, so if we are in abort
  * mode we can now just skip the rest of the journal write
  * entirely.
  *
  * Returns 1 if the journal needs to be aborted or 0 on success
  */
-static int journal_write_commit_record(journal_t *journal,
-                                       transaction_t *commit_transaction)
+static int journal_submit_commit_record(journal_t *journal,
+                                       transaction_t *commit_transaction,
+                                       struct buffer_head **cbh,
+                                       __u32 crc32_sum)
 {
        struct journal_head *descriptor;
+       struct commit_header *tmp;
        struct buffer_head *bh;
-       int i, ret;
+       int ret;
        int barrier_done = 0;
 
        if (is_journal_aborted(journal))
@@ -116,21 +122,33 @@ static int journal_write_commit_record(journal_t *journal,
 
        bh = jh2bh(descriptor);
 
-       /* AKPM: buglet - add `i' to tmp! */
-       for (i = 0; i < bh->b_size; i += 512) {
-               journal_header_t *tmp = (journal_header_t*)bh->b_data;
-               tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
-               tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
-               tmp->h_sequence = cpu_to_be32(commit_transaction->t_tid);
+       tmp = (struct commit_header *)bh->b_data;
+       tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
+       tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
+       tmp->h_sequence = cpu_to_be32(commit_transaction->t_tid);
+
+       if (JBD2_HAS_COMPAT_FEATURE(journal,
+                                   JBD2_FEATURE_COMPAT_CHECKSUM)) {
+               tmp->h_chksum_type      = JBD2_CRC32_CHKSUM;
+               tmp->h_chksum_size      = JBD2_CRC32_CHKSUM_SIZE;
+               tmp->h_chksum[0]        = cpu_to_be32(crc32_sum);
        }
 
-       JBUFFER_TRACE(descriptor, "write commit block");
+       JBUFFER_TRACE(descriptor, "submit commit block");
+       lock_buffer(bh);
+
        set_buffer_dirty(bh);
-       if (journal->j_flags & JBD2_BARRIER) {
+       set_buffer_uptodate(bh);
+       bh->b_end_io = journal_end_buffer_io_sync;
+
+       if (journal->j_flags & JBD2_BARRIER &&
+               !JBD2_HAS_COMPAT_FEATURE(journal,
+                                        JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
                set_buffer_ordered(bh);
                barrier_done = 1;
        }
-       ret = sync_dirty_buffer(bh);
+       ret = submit_bh(WRITE, bh);
+
        /* is it possible for another commit to fail at roughly
         * the same time as this one?  If so, we don't want to
         * trust the barrier flag in the super, but instead want
@@ -151,14 +169,72 @@ static int journal_write_commit_record(journal_t *journal,
                clear_buffer_ordered(bh);
                set_buffer_uptodate(bh);
                set_buffer_dirty(bh);
-               ret = sync_dirty_buffer(bh);
+               ret = submit_bh(WRITE, bh);
        }
-       put_bh(bh);             /* One for getblk() */
-       jbd2_journal_put_journal_head(descriptor);
+       *cbh = bh;
+       return ret;
+}
+
+/*
+ * This function, together with journal_submit_commit_record(),
+ * allows the commit record to be written asynchronously.
+ */
+static int journal_wait_on_commit_record(struct buffer_head *bh)
+{
+       int ret = 0;
+
+       clear_buffer_dirty(bh);
+       wait_on_buffer(bh);
+
+       if (unlikely(!buffer_uptodate(bh)))
+               ret = -EIO;
+       put_bh(bh);            /* One for getblk() */
+       jbd2_journal_put_journal_head(bh2jh(bh));
 
-       return (ret == -EIO);
+       return ret;
 }
 
+/*
+ * Wait for all submitted IO to complete.
+ */
+static int journal_wait_on_locked_list(journal_t *journal,
+                                      transaction_t *commit_transaction)
+{
+       int ret = 0;
+       struct journal_head *jh;
+
+       while (commit_transaction->t_locked_list) {
+               struct buffer_head *bh;
+
+               jh = commit_transaction->t_locked_list->b_tprev;
+               bh = jh2bh(jh);
+               get_bh(bh);
+               if (buffer_locked(bh)) {
+                       spin_unlock(&journal->j_list_lock);
+                       wait_on_buffer(bh);
+                       if (unlikely(!buffer_uptodate(bh)))
+                               ret = -EIO;
+                       spin_lock(&journal->j_list_lock);
+               }
+               if (!inverted_lock(journal, bh)) {
+                       put_bh(bh);
+                       spin_lock(&journal->j_list_lock);
+                       continue;
+               }
+               if (buffer_jbd(bh) && jh->b_jlist == BJ_Locked) {
+                       __jbd2_journal_unfile_buffer(jh);
+                       jbd_unlock_bh_state(bh);
+                       jbd2_journal_remove_journal_head(bh);
+                       put_bh(bh);
+               } else {
+                       jbd_unlock_bh_state(bh);
+               }
+               put_bh(bh);
+               cond_resched_lock(&journal->j_list_lock);
+       }
+       return ret;
+}
+
 static void journal_do_submit_data(struct buffer_head **wbuf, int bufs)
 {
        int i;
@@ -274,7 +350,21 @@ write_out_data:
        journal_do_submit_data(wbuf, bufs);
 }
 
-static inline void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
+static __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
+{
+       struct page *page = bh->b_page;
+       char *addr;
+       __u32 checksum;
+
+       addr = kmap_atomic(page, KM_USER0);
+       checksum = crc32_be(crc32_sum,
+               (void *)(addr + offset_in_page(bh->b_data)), bh->b_size);
+       kunmap_atomic(addr, KM_USER0);
+
+       return checksum;
+}
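
/*
 * Editorial sketch, not part of the patch: how the running transaction
 * checksum chains across journal blocks.  The bitwise CRC-32 below
 * (polynomial 0x04c11db7, MSB first, no final inversion, caller-supplied
 * seed) is intended to mirror the semantics of the kernel's crc32_be();
 * treat that equivalence as an assumption.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t crc32_be_sketch(uint32_t crc, const unsigned char *p,
                                size_t len)
{
        int i;

        while (len--) {
                crc ^= (uint32_t)*p++ << 24;
                for (i = 0; i < 8; i++)
                        crc = (crc & 0x80000000) ?
                                (crc << 1) ^ 0x04c11db7 : (crc << 1);
        }
        return crc;
}

int main(void)
{
        unsigned char descriptor[4096], data[4096];
        uint32_t crc = ~0u;     /* same initial seed as the commit path */

        memset(descriptor, 0xd2, sizeof(descriptor));
        memset(data, 0x5a, sizeof(data));

        /* Chain the checksum across blocks, as the commit code does above. */
        crc = crc32_be_sketch(crc, descriptor, sizeof(descriptor));
        crc = crc32_be_sketch(crc, data, sizeof(data));

        printf("running checksum: 0x%08x\n", (unsigned int)crc);
        return 0;
}
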
+
+static void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
                                   unsigned long long block)
 {
        tag->t_blocknr = cpu_to_be32(block & (u32)~0);
@@ -290,6 +380,7 @@ static inline void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
  */
 void jbd2_journal_commit_transaction(journal_t *journal)
 {
+       struct transaction_stats_s stats;
        transaction_t *commit_transaction;
        struct journal_head *jh, *new_jh, *descriptor;
        struct buffer_head **wbuf = journal->j_wbuf;
@@ -305,6 +396,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
        int tag_flag;
        int i;
        int tag_bytes = journal_tag_bytes(journal);
+       struct buffer_head *cbh = NULL; /* For transactional checksums */
+       __u32 crc32_sum = ~0;
 
        /*
         * First job: lock down the current transaction and wait for
@@ -337,6 +430,11 @@ void jbd2_journal_commit_transaction(journal_t *journal)
        spin_lock(&journal->j_state_lock);
        commit_transaction->t_state = T_LOCKED;
 
+       stats.u.run.rs_wait = commit_transaction->t_max_wait;
+       stats.u.run.rs_locked = jiffies;
+       stats.u.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
+                                               stats.u.run.rs_locked);
+
        spin_lock(&commit_transaction->t_handle_lock);
        while (commit_transaction->t_updates) {
                DEFINE_WAIT(wait);
@@ -407,6 +505,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
         */
        jbd2_journal_switch_revoke_table(journal);
 
+       stats.u.run.rs_flushing = jiffies;
+       stats.u.run.rs_locked = jbd2_time_diff(stats.u.run.rs_locked,
+                                              stats.u.run.rs_flushing);
+
        commit_transaction->t_state = T_FLUSH;
        journal->j_committing_transaction = commit_transaction;
        journal->j_running_transaction = NULL;
@@ -440,38 +542,15 @@ void jbd2_journal_commit_transaction(journal_t *journal)
        journal_submit_data_buffers(journal, commit_transaction);
 
        /*
-        * Wait for all previously submitted IO to complete.
+        * Wait for all previously submitted IO to complete if the
+        * commit record is to be written synchronously.
         */
        spin_lock(&journal->j_list_lock);
-       while (commit_transaction->t_locked_list) {
-               struct buffer_head *bh;
+       if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
+               JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT))
+               err = journal_wait_on_locked_list(journal,
+                                               commit_transaction);
 
-               jh = commit_transaction->t_locked_list->b_tprev;
-               bh = jh2bh(jh);
-               get_bh(bh);
-               if (buffer_locked(bh)) {
-                       spin_unlock(&journal->j_list_lock);
-                       wait_on_buffer(bh);
-                       if (unlikely(!buffer_uptodate(bh)))
-                               err = -EIO;
-                       spin_lock(&journal->j_list_lock);
-               }
-               if (!inverted_lock(journal, bh)) {
-                       put_bh(bh);
-                       spin_lock(&journal->j_list_lock);
-                       continue;
-               }
-               if (buffer_jbd(bh) && jh->b_jlist == BJ_Locked) {
-                       __jbd2_journal_unfile_buffer(jh);
-                       jbd_unlock_bh_state(bh);
-                       jbd2_journal_remove_journal_head(bh);
-                       put_bh(bh);
-               } else {
-                       jbd_unlock_bh_state(bh);
-               }
-               put_bh(bh);
-               cond_resched_lock(&journal->j_list_lock);
-       }
        spin_unlock(&journal->j_list_lock);
 
        if (err)
@@ -498,6 +577,12 @@ void jbd2_journal_commit_transaction(journal_t *journal)
         */
        commit_transaction->t_state = T_COMMIT;
 
+       stats.u.run.rs_logging = jiffies;
+       stats.u.run.rs_flushing = jbd2_time_diff(stats.u.run.rs_flushing,
+                                                stats.u.run.rs_logging);
+       stats.u.run.rs_blocks = commit_transaction->t_outstanding_credits;
+       stats.u.run.rs_blocks_logged = 0;
+
        descriptor = NULL;
        bufs = 0;
        while (commit_transaction->t_buffers) {
@@ -639,6 +724,15 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 start_journal_io:
                        for (i = 0; i < bufs; i++) {
                                struct buffer_head *bh = wbuf[i];
+                               /*
+                                * Compute checksum.
+                                */
+                               if (JBD2_HAS_COMPAT_FEATURE(journal,
+                                       JBD2_FEATURE_COMPAT_CHECKSUM)) {
+                                       crc32_sum =
+                                           jbd2_checksum_data(crc32_sum, bh);
+                               }
+
                                lock_buffer(bh);
                                clear_buffer_dirty(bh);
                                set_buffer_uptodate(bh);
@@ -646,6 +740,7 @@ start_journal_io:
                                submit_bh(WRITE, bh);
                        }
                        cond_resched();
+                       stats.u.run.rs_blocks_logged += bufs;
 
                        /* Force a new descriptor to be generated next
                            time round the loop. */
@@ -654,6 +749,23 @@ start_journal_io:
                }
        }
 
+       /* Done it all: now write the commit record asynchronously. */
+
+       if (JBD2_HAS_INCOMPAT_FEATURE(journal,
+               JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+               err = journal_submit_commit_record(journal, commit_transaction,
+                                                &cbh, crc32_sum);
+               if (err)
+                       __jbd2_journal_abort_hard(journal);
+
+               spin_lock(&journal->j_list_lock);
+               err = journal_wait_on_locked_list(journal,
+                                               commit_transaction);
+               spin_unlock(&journal->j_list_lock);
+               if (err)
+                       __jbd2_journal_abort_hard(journal);
+       }
+
        /* Lo and behold: we have just managed to send a transaction to
            the log.  Before we can commit it, wait for the IO so far to
            complete.  Control buffers being written are on the
@@ -753,8 +865,14 @@ wait_for_iobuf:
 
        jbd_debug(3, "JBD: commit phase 6\n");
 
-       if (journal_write_commit_record(journal, commit_transaction))
-               err = -EIO;
+       if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
+               JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+               err = journal_submit_commit_record(journal, commit_transaction,
+                                               &cbh, crc32_sum);
+               if (err)
+                       __jbd2_journal_abort_hard(journal);
+       }
+       err = journal_wait_on_commit_record(cbh);
 
        if (err)
                jbd2_journal_abort(journal, err);
@@ -816,6 +934,7 @@ restart_loop:
                cp_transaction = jh->b_cp_transaction;
                if (cp_transaction) {
                        JBUFFER_TRACE(jh, "remove from old cp transaction");
+                       cp_transaction->t_chp_stats.cs_dropped++;
                        __jbd2_journal_remove_checkpoint(jh);
                }
 
@@ -867,10 +986,10 @@ restart_loop:
        }
        spin_unlock(&journal->j_list_lock);
        /*
-        * This is a bit sleazy.  We borrow j_list_lock to protect
-        * journal->j_committing_transaction in __jbd2_journal_remove_checkpoint.
-        * Really, __jbd2_journal_remove_checkpoint should be using j_state_lock but
-        * it's a bit hassle to hold that across __jbd2_journal_remove_checkpoint
+        * This is a bit sleazy.  We use j_list_lock to protect the
+        * transition of a transaction into T_FINISHED state and the call
+        * to __jbd2_journal_drop_transaction(); otherwise we could race
+        * with other checkpointing code processing the transaction...
         */
        spin_lock(&journal->j_state_lock);
        spin_lock(&journal->j_list_lock);
@@ -890,6 +1009,36 @@ restart_loop:
 
        J_ASSERT(commit_transaction->t_state == T_COMMIT);
 
+       commit_transaction->t_start = jiffies;
+       stats.u.run.rs_logging = jbd2_time_diff(stats.u.run.rs_logging,
+                                               commit_transaction->t_start);
+
+       /*
+        * File the transaction for history
+        */
+       stats.ts_type = JBD2_STATS_RUN;
+       stats.ts_tid = commit_transaction->t_tid;
+       stats.u.run.rs_handle_count = commit_transaction->t_handle_count;
+       spin_lock(&journal->j_history_lock);
+       memcpy(journal->j_history + journal->j_history_cur, &stats,
+                       sizeof(stats));
+       if (++journal->j_history_cur == journal->j_history_max)
+               journal->j_history_cur = 0;
+
+       /*
+        * Calculate overall stats
+        */
+       journal->j_stats.ts_tid++;
+       journal->j_stats.u.run.rs_wait += stats.u.run.rs_wait;
+       journal->j_stats.u.run.rs_running += stats.u.run.rs_running;
+       journal->j_stats.u.run.rs_locked += stats.u.run.rs_locked;
+       journal->j_stats.u.run.rs_flushing += stats.u.run.rs_flushing;
+       journal->j_stats.u.run.rs_logging += stats.u.run.rs_logging;
+       journal->j_stats.u.run.rs_handle_count += stats.u.run.rs_handle_count;
+       journal->j_stats.u.run.rs_blocks += stats.u.run.rs_blocks;
+       journal->j_stats.u.run.rs_blocks_logged += stats.u.run.rs_blocks_logged;
+       spin_unlock(&journal->j_history_lock);
+
        commit_transaction->t_state = T_FINISHED;
        J_ASSERT(commit_transaction == journal->j_committing_transaction);
        journal->j_commit_sequence = commit_transaction->t_tid;
index 6ddc5531587c7c6598180f105e46645f13d74337..96ba846992e934f476511bbfd40152868f211d8b 100644 (file)
@@ -36,6 +36,7 @@
 #include <linux/poison.h>
 #include <linux/proc_fs.h>
 #include <linux/debugfs.h>
+#include <linux/seq_file.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -640,6 +641,312 @@ struct journal_head *jbd2_journal_get_descriptor_buffer(journal_t *journal)
        return jbd2_journal_add_journal_head(bh);
 }
 
+struct jbd2_stats_proc_session {
+       journal_t *journal;
+       struct transaction_stats_s *stats;
+       int start;
+       int max;
+};
+
+static void *jbd2_history_skip_empty(struct jbd2_stats_proc_session *s,
+                                       struct transaction_stats_s *ts,
+                                       int first)
+{
+       if (ts == s->stats + s->max)
+               ts = s->stats;
+       if (!first && ts == s->stats + s->start)
+               return NULL;
+       while (ts->ts_type == 0) {
+               ts++;
+               if (ts == s->stats + s->max)
+                       ts = s->stats;
+               if (ts == s->stats + s->start)
+                       return NULL;
+       }
+       return ts;
+}
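
/*
 * Editorial sketch, not part of the patch: the history file walks a snapshot
 * of the journal's circular stats array, starting at the oldest slot
 * (j_history_cur) and skipping entries whose ts_type is still zero (never
 * filled), which is what jbd2_history_skip_empty() implements.  The same
 * walk over a plain integer ring:
 */
#include <stdio.h>

#define RING_SIZE 8

int main(void)
{
        int ring[RING_SIZE] = { 0, 0, 13, 14, 15, 0, 11, 12 }; /* 0 == empty */
        int start = 6;          /* current write index == oldest entry */
        int i;

        for (i = 0; i < RING_SIZE; i++) {
                int idx = (start + i) % RING_SIZE;

                if (ring[idx] == 0)
                        continue;       /* skip slots never written */
                printf("%d ", ring[idx]);
        }
        printf("\n");                   /* prints: 11 12 13 14 15 */
        return 0;
}
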
+
+static void *jbd2_seq_history_start(struct seq_file *seq, loff_t *pos)
+{
+       struct jbd2_stats_proc_session *s = seq->private;
+       struct transaction_stats_s *ts;
+       int l = *pos;
+
+       if (l == 0)
+               return SEQ_START_TOKEN;
+       ts = jbd2_history_skip_empty(s, s->stats + s->start, 1);
+       if (!ts)
+               return NULL;
+       l--;
+       while (l) {
+               ts = jbd2_history_skip_empty(s, ++ts, 0);
+               if (!ts)
+                       break;
+               l--;
+       }
+       return ts;
+}
+
+static void *jbd2_seq_history_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+       struct jbd2_stats_proc_session *s = seq->private;
+       struct transaction_stats_s *ts = v;
+
+       ++*pos;
+       if (v == SEQ_START_TOKEN)
+               return jbd2_history_skip_empty(s, s->stats + s->start, 1);
+       else
+               return jbd2_history_skip_empty(s, ++ts, 0);
+}
+
+static int jbd2_seq_history_show(struct seq_file *seq, void *v)
+{
+       struct transaction_stats_s *ts = v;
+       if (v == SEQ_START_TOKEN) {
+               seq_printf(seq, "%-4s %-5s %-5s %-5s %-5s %-5s %-5s %-6s %-5s "
+                               "%-5s %-5s %-5s %-5s %-5s\n", "R/C", "tid",
+                               "wait", "run", "lock", "flush", "log", "hndls",
+                               "block", "inlog", "ctime", "write", "drop",
+                               "close");
+               return 0;
+       }
+       if (ts->ts_type == JBD2_STATS_RUN)
+               seq_printf(seq, "%-4s %-5lu %-5u %-5u %-5u %-5u %-5u "
+                               "%-6lu %-5lu %-5lu\n", "R", ts->ts_tid,
+                               jiffies_to_msecs(ts->u.run.rs_wait),
+                               jiffies_to_msecs(ts->u.run.rs_running),
+                               jiffies_to_msecs(ts->u.run.rs_locked),
+                               jiffies_to_msecs(ts->u.run.rs_flushing),
+                               jiffies_to_msecs(ts->u.run.rs_logging),
+                               ts->u.run.rs_handle_count,
+                               ts->u.run.rs_blocks,
+                               ts->u.run.rs_blocks_logged);
+       else if (ts->ts_type == JBD2_STATS_CHECKPOINT)
+               seq_printf(seq, "%-4s %-5lu %48s %-5u %-5lu %-5lu %-5lu\n",
+                               "C", ts->ts_tid, " ",
+                               jiffies_to_msecs(ts->u.chp.cs_chp_time),
+                               ts->u.chp.cs_written, ts->u.chp.cs_dropped,
+                               ts->u.chp.cs_forced_to_close);
+       else
+               J_ASSERT(0);
+       return 0;
+}
+
+static void jbd2_seq_history_stop(struct seq_file *seq, void *v)
+{
+}
+
+static struct seq_operations jbd2_seq_history_ops = {
+       .start  = jbd2_seq_history_start,
+       .next   = jbd2_seq_history_next,
+       .stop   = jbd2_seq_history_stop,
+       .show   = jbd2_seq_history_show,
+};
+
+static int jbd2_seq_history_open(struct inode *inode, struct file *file)
+{
+       journal_t *journal = PDE(inode)->data;
+       struct jbd2_stats_proc_session *s;
+       int rc, size;
+
+       s = kmalloc(sizeof(*s), GFP_KERNEL);
+       if (s == NULL)
+               return -ENOMEM;
+       size = sizeof(struct transaction_stats_s) * journal->j_history_max;
+       s->stats = kmalloc(size, GFP_KERNEL);
+       if (s->stats == NULL) {
+               kfree(s);
+               return -ENOMEM;
+       }
+       spin_lock(&journal->j_history_lock);
+       memcpy(s->stats, journal->j_history, size);
+       s->max = journal->j_history_max;
+       s->start = journal->j_history_cur % s->max;
+       spin_unlock(&journal->j_history_lock);
+
+       rc = seq_open(file, &jbd2_seq_history_ops);
+       if (rc == 0) {
+               struct seq_file *m = file->private_data;
+               m->private = s;
+       } else {
+               kfree(s->stats);
+               kfree(s);
+       }
+       return rc;
+}
+
+static int jbd2_seq_history_release(struct inode *inode, struct file *file)
+{
+       struct seq_file *seq = file->private_data;
+       struct jbd2_stats_proc_session *s = seq->private;
+
+       kfree(s->stats);
+       kfree(s);
+       return seq_release(inode, file);
+}
+
+static struct file_operations jbd2_seq_history_fops = {
+       .owner          = THIS_MODULE,
+       .open           = jbd2_seq_history_open,
+       .read           = seq_read,
+       .llseek         = seq_lseek,
+       .release        = jbd2_seq_history_release,
+};
+
+static void *jbd2_seq_info_start(struct seq_file *seq, loff_t *pos)
+{
+       return *pos ? NULL : SEQ_START_TOKEN;
+}
+
+static void *jbd2_seq_info_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+       return NULL;
+}
+
+static int jbd2_seq_info_show(struct seq_file *seq, void *v)
+{
+       struct jbd2_stats_proc_session *s = seq->private;
+
+       if (v != SEQ_START_TOKEN)
+               return 0;
+       seq_printf(seq, "%lu transactions, each up to %u blocks\n",
+                       s->stats->ts_tid,
+                       s->journal->j_max_transaction_buffers);
+       if (s->stats->ts_tid == 0)
+               return 0;
+       seq_printf(seq, "average: \n  %ums waiting for transaction\n",
+           jiffies_to_msecs(s->stats->u.run.rs_wait / s->stats->ts_tid));
+       seq_printf(seq, "  %ums running transaction\n",
+           jiffies_to_msecs(s->stats->u.run.rs_running / s->stats->ts_tid));
+       seq_printf(seq, "  %ums transaction was being locked\n",
+           jiffies_to_msecs(s->stats->u.run.rs_locked / s->stats->ts_tid));
+       seq_printf(seq, "  %ums flushing data (in ordered mode)\n",
+           jiffies_to_msecs(s->stats->u.run.rs_flushing / s->stats->ts_tid));
+       seq_printf(seq, "  %ums logging transaction\n",
+           jiffies_to_msecs(s->stats->u.run.rs_logging / s->stats->ts_tid));
+       seq_printf(seq, "  %lu handles per transaction\n",
+           s->stats->u.run.rs_handle_count / s->stats->ts_tid);
+       seq_printf(seq, "  %lu blocks per transaction\n",
+           s->stats->u.run.rs_blocks / s->stats->ts_tid);
+       seq_printf(seq, "  %lu logged blocks per transaction\n",
+           s->stats->u.run.rs_blocks_logged / s->stats->ts_tid);
+       return 0;
+}
+
+static void jbd2_seq_info_stop(struct seq_file *seq, void *v)
+{
+}
+
+static struct seq_operations jbd2_seq_info_ops = {
+       .start  = jbd2_seq_info_start,
+       .next   = jbd2_seq_info_next,
+       .stop   = jbd2_seq_info_stop,
+       .show   = jbd2_seq_info_show,
+};
+
+static int jbd2_seq_info_open(struct inode *inode, struct file *file)
+{
+       journal_t *journal = PDE(inode)->data;
+       struct jbd2_stats_proc_session *s;
+       int rc, size;
+
+       s = kmalloc(sizeof(*s), GFP_KERNEL);
+       if (s == NULL)
+               return -ENOMEM;
+       size = sizeof(struct transaction_stats_s);
+       s->stats = kmalloc(size, GFP_KERNEL);
+       if (s->stats == NULL) {
+               kfree(s);
+               return -ENOMEM;
+       }
+       spin_lock(&journal->j_history_lock);
+       memcpy(s->stats, &journal->j_stats, size);
+       s->journal = journal;
+       spin_unlock(&journal->j_history_lock);
+
+       rc = seq_open(file, &jbd2_seq_info_ops);
+       if (rc == 0) {
+               struct seq_file *m = file->private_data;
+               m->private = s;
+       } else {
+               kfree(s->stats);
+               kfree(s);
+       }
+       return rc;
+}
+
+static int jbd2_seq_info_release(struct inode *inode, struct file *file)
+{
+       struct seq_file *seq = file->private_data;
+       struct jbd2_stats_proc_session *s = seq->private;
+       kfree(s->stats);
+       kfree(s);
+       return seq_release(inode, file);
+}
+
+static struct file_operations jbd2_seq_info_fops = {
+       .owner          = THIS_MODULE,
+       .open           = jbd2_seq_info_open,
+       .read           = seq_read,
+       .llseek         = seq_lseek,
+       .release        = jbd2_seq_info_release,
+};
+
+static struct proc_dir_entry *proc_jbd2_stats;
+
+static void jbd2_stats_proc_init(journal_t *journal)
+{
+       char name[BDEVNAME_SIZE];
+
+       snprintf(name, sizeof(name) - 1, "%s", bdevname(journal->j_dev, name));
+       journal->j_proc_entry = proc_mkdir(name, proc_jbd2_stats);
+       if (journal->j_proc_entry) {
+               struct proc_dir_entry *p;
+               p = create_proc_entry("history", S_IRUGO,
+                               journal->j_proc_entry);
+               if (p) {
+                       p->proc_fops = &jbd2_seq_history_fops;
+                       p->data = journal;
+                       p = create_proc_entry("info", S_IRUGO,
+                                               journal->j_proc_entry);
+                       if (p) {
+                               p->proc_fops = &jbd2_seq_info_fops;
+                               p->data = journal;
+                       }
+               }
+       }
+}
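
/*
 * Editorial usage sketch, not part of the patch: with the code above, each
 * journal should export "info" and "history" under /proc/fs/jbd2/<device>/.
 * The device name below is only an example; substitute the block device
 * that backs the journal being inspected.
 */
#include <stdio.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/proc/fs/jbd2/sda1/info", "r");  /* example path */

        if (!f) {
                perror("fopen");
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
        fclose(f);
        return 0;
}
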
+
+static void jbd2_stats_proc_exit(journal_t *journal)
+{
+       char name[BDEVNAME_SIZE];
+
+       snprintf(name, sizeof(name) - 1, "%s", bdevname(journal->j_dev, name));
+       remove_proc_entry("info", journal->j_proc_entry);
+       remove_proc_entry("history", journal->j_proc_entry);
+       remove_proc_entry(name, proc_jbd2_stats);
+}
+
+static void journal_init_stats(journal_t *journal)
+{
+       int size;
+
+       if (!proc_jbd2_stats)
+               return;
+
+       journal->j_history_max = 100;
+       size = sizeof(struct transaction_stats_s) * journal->j_history_max;
+       journal->j_history = kzalloc(size, GFP_KERNEL);
+       if (!journal->j_history) {
+               journal->j_history_max = 0;
+               return;
+       }
+       spin_lock_init(&journal->j_history_lock);
+}
+
 /*
  * Management for journal control blocks: functions to create and
  * destroy journal_t structures, and to initialise and read existing
@@ -681,6 +988,9 @@ static journal_t * journal_init_common (void)
                kfree(journal);
                goto fail;
        }
+
+       journal_init_stats(journal);
+
        return journal;
 fail:
        return NULL;
@@ -735,6 +1045,7 @@ journal_t * jbd2_journal_init_dev(struct block_device *bdev,
        journal->j_fs_dev = fs_dev;
        journal->j_blk_offset = start;
        journal->j_maxlen = len;
+       jbd2_stats_proc_init(journal);
 
        bh = __getblk(journal->j_dev, start, journal->j_blocksize);
        J_ASSERT(bh != NULL);
@@ -773,6 +1084,7 @@ journal_t * jbd2_journal_init_inode (struct inode *inode)
 
        journal->j_maxlen = inode->i_size >> inode->i_sb->s_blocksize_bits;
        journal->j_blocksize = inode->i_sb->s_blocksize;
+       jbd2_stats_proc_init(journal);
 
        /* journal descriptor can store up to n blocks -bzzz */
        n = journal->j_blocksize / sizeof(journal_block_tag_t);
@@ -1153,6 +1465,8 @@ void jbd2_journal_destroy(journal_t *journal)
                brelse(journal->j_sb_buffer);
        }
 
+       if (journal->j_proc_entry)
+               jbd2_stats_proc_exit(journal);
        if (journal->j_inode)
                iput(journal->j_inode);
        if (journal->j_revoke)
@@ -1264,6 +1578,32 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat,
        return 1;
 }
 
+/*
+ * jbd2_journal_clear_features () - Clear a given journal feature in the
+ *                                 superblock
+ * @journal: Journal to act on.
+ * @compat: bitmask of compatible features
+ * @ro: bitmask of features that force read-only mount
+ * @incompat: bitmask of incompatible features
+ *
+ * Clear the given journal features from the
+ * journal superblock.
+ */
+void jbd2_journal_clear_features(journal_t *journal, unsigned long compat,
+                               unsigned long ro, unsigned long incompat)
+{
+       journal_superblock_t *sb;
+
+       jbd_debug(1, "Clear features 0x%lx/0x%lx/0x%lx\n",
+                 compat, ro, incompat);
+
+       sb = journal->j_superblock;
+
+       sb->s_feature_compat    &= ~cpu_to_be32(compat);
+       sb->s_feature_ro_compat &= ~cpu_to_be32(ro);
+       sb->s_feature_incompat  &= ~cpu_to_be32(incompat);
+}
+EXPORT_SYMBOL(jbd2_journal_clear_features);
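
/*
 * Editorial sketch, not part of the patch: the feature words in the journal
 * superblock are stored big-endian on disk, so clearing a feature means
 * masking with the byte-swapped bit, as jbd2_journal_clear_features() does
 * with ~cpu_to_be32().  Illustrated with htobe32()/be32toh() from <endian.h>
 * (glibc); the bit values used here are examples, not the real JBD2
 * constants.
 */
#include <endian.h>
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_INCOMPAT_ASYNC_COMMIT   0x00000004      /* example bit only */
#define EXAMPLE_INCOMPAT_REVOKE         0x00000001      /* example bit only */

int main(void)
{
        /* pretend on-disk incompat word with revoke + async commit set */
        uint32_t on_disk = htobe32(EXAMPLE_INCOMPAT_REVOKE |
                                   EXAMPLE_INCOMPAT_ASYNC_COMMIT);

        on_disk &= ~htobe32(EXAMPLE_INCOMPAT_ASYNC_COMMIT);
        printf("incompat after clear: 0x%08x\n",
               (unsigned int)be32toh(on_disk));
        return 0;
}
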
 
 /**
  * int jbd2_journal_update_format () - Update on-disk journal structure.
@@ -1633,7 +1973,7 @@ static int journal_init_jbd2_journal_head_cache(void)
        jbd2_journal_head_cache = kmem_cache_create("jbd2_journal_head",
                                sizeof(struct journal_head),
                                0,              /* offset */
-                               0,              /* flags */
+                               SLAB_TEMPORARY, /* flags */
                                NULL);          /* ctor */
        retval = 0;
        if (jbd2_journal_head_cache == 0) {
@@ -1900,6 +2240,28 @@ static void __exit jbd2_remove_debugfs_entry(void)
 
 #endif
 
+#ifdef CONFIG_PROC_FS
+
+#define JBD2_STATS_PROC_NAME "fs/jbd2"
+
+static void __init jbd2_create_jbd_stats_proc_entry(void)
+{
+       proc_jbd2_stats = proc_mkdir(JBD2_STATS_PROC_NAME, NULL);
+}
+
+static void __exit jbd2_remove_jbd_stats_proc_entry(void)
+{
+       if (proc_jbd2_stats)
+               remove_proc_entry(JBD2_STATS_PROC_NAME, NULL);
+}
+
+#else
+
+#define jbd2_create_jbd_stats_proc_entry() do {} while (0)
+#define jbd2_remove_jbd_stats_proc_entry() do {} while (0)
+
+#endif
+
 struct kmem_cache *jbd2_handle_cache;
 
 static int __init journal_init_handle_cache(void)
@@ -1907,7 +2269,7 @@ static int __init journal_init_handle_cache(void)
        jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
                                sizeof(handle_t),
                                0,              /* offset */
-                               0,              /* flags */
+                               SLAB_TEMPORARY, /* flags */
                                NULL);          /* ctor */
        if (jbd2_handle_cache == NULL) {
                printk(KERN_EMERG "JBD: failed to create handle cache\n");
@@ -1955,6 +2317,7 @@ static int __init journal_init(void)
        if (ret != 0)
                jbd2_journal_destroy_caches();
        jbd2_create_debugfs_entry();
+       jbd2_create_jbd_stats_proc_entry();
        return ret;
 }
 
@@ -1966,6 +2329,7 @@ static void __exit journal_exit(void)
                printk(KERN_EMERG "JBD: leaked %d journal_heads!\n", n);
 #endif
        jbd2_remove_debugfs_entry();
+       jbd2_remove_jbd_stats_proc_entry();
        jbd2_journal_destroy_caches();
 }
 
index d0ce627539ef11710993c0c4236dc4821e0eacdf..921680663fa2771319837ad1ff24d4c8f9d8c55e 100644 (file)
@@ -21,6 +21,7 @@
 #include <linux/jbd2.h>
 #include <linux/errno.h>
 #include <linux/slab.h>
+#include <linux/crc32.h>
 #endif
 
 /*
@@ -316,6 +317,37 @@ static inline unsigned long long read_tag_block(int tag_bytes, journal_block_tag
        return block;
 }
 
+/*
+ * calc_chksums calculates the checksums for the blocks described in the
+ * descriptor block.
+ */
+static int calc_chksums(journal_t *journal, struct buffer_head *bh,
+                       unsigned long *next_log_block, __u32 *crc32_sum)
+{
+       int i, num_blks, err;
+       unsigned long io_block;
+       struct buffer_head *obh;
+
+       num_blks = count_tags(journal, bh);
+       /* Calculate checksum of the descriptor block. */
+       *crc32_sum = crc32_be(*crc32_sum, (void *)bh->b_data, bh->b_size);
+
+       for (i = 0; i < num_blks; i++) {
+               io_block = (*next_log_block)++;
+               wrap(journal, *next_log_block);
+               err = jread(&obh, journal, io_block);
+               if (err) {
+                       printk(KERN_ERR "JBD: IO error %d recovering block "
+                               "%lu in log\n", err, io_block);
+                       return 1;
+               } else {
+                       *crc32_sum = crc32_be(*crc32_sum, (void *)obh->b_data,
+                                    obh->b_size);
+               }
+       }
+       return 0;
+}
+
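calc_chksums() above folds the descriptor block and every block it describes into one running CRC: the sum is seeded with ~0 (see crc32_sum in do_one_pass() below), updated with crc32_be() per block, and later compared against the value stored in the commit block. A rough userspace sketch of accumulating a big-endian CRC-32 over several buffers, assuming the 0x04C11DB7 polynomial that the kernel's crc32_be() uses (bitwise and unoptimized, for illustration only):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise big-endian CRC-32 (poly 0x04C11DB7), analogous to crc32_be(). */
static uint32_t crc32_be_update(uint32_t crc, const void *buf, size_t len)
{
        const uint8_t *p = buf;
        size_t i;
        int bit;

        for (i = 0; i < len; i++) {
                crc ^= (uint32_t)p[i] << 24;
                for (bit = 0; bit < 8; bit++)
                        crc = (crc & 0x80000000u) ?
                              (crc << 1) ^ 0x04C11DB7u : crc << 1;
        }
        return crc;
}

int main(void)
{
        const char *blocks[] = { "descriptor block", "data block 0", "data block 1" };
        uint32_t crc = ~0u;     /* same seed the recovery code uses */
        size_t i;

        for (i = 0; i < 3; i++)
                crc = crc32_be_update(crc, blocks[i], strlen(blocks[i]));
        printf("running checksum: 0x%08x\n", crc);
        return 0;
}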
 static int do_one_pass(journal_t *journal,
                        struct recovery_info *info, enum passtype pass)
 {
@@ -328,6 +360,7 @@ static int do_one_pass(journal_t *journal,
        unsigned int            sequence;
        int                     blocktype;
        int                     tag_bytes = journal_tag_bytes(journal);
+       __u32                   crc32_sum = ~0; /* Transactional Checksums */
 
        /* Precompute the maximum metadata descriptors in a descriptor block */
        int                     MAX_BLOCKS_PER_DESC;
@@ -419,12 +452,26 @@ static int do_one_pass(journal_t *journal,
                switch(blocktype) {
                case JBD2_DESCRIPTOR_BLOCK:
                        /* If it is a valid descriptor block, replay it
-                        * in pass REPLAY; otherwise, just skip over the
-                        * blocks it describes. */
+                        * in pass REPLAY; if journal_checksums is enabled,
+                        * also calculate checksums in PASS_SCAN; otherwise
+                        * just skip over the blocks it describes. */
                        if (pass != PASS_REPLAY) {
+                               if (pass == PASS_SCAN &&
+                                   JBD2_HAS_COMPAT_FEATURE(journal,
+                                           JBD2_FEATURE_COMPAT_CHECKSUM) &&
+                                   !info->end_transaction) {
+                                       if (calc_chksums(journal, bh,
+                                                       &next_log_block,
+                                                       &crc32_sum)) {
+                                               put_bh(bh);
+                                               break;
+                                       }
+                                       put_bh(bh);
+                                       continue;
+                               }
                                next_log_block += count_tags(journal, bh);
                                wrap(journal, next_log_block);
-                               brelse(bh);
+                               put_bh(bh);
                                continue;
                        }
 
@@ -516,9 +563,96 @@ static int do_one_pass(journal_t *journal,
                        continue;
 
                case JBD2_COMMIT_BLOCK:
-                       /* Found an expected commit block: not much to
-                        * do other than move on to the next sequence
+                       /*     How to differentiate between an interrupted
+                        *          commit and journal corruption?
+                        *
+                        * {nth transaction}
+                        *        Checksum Verification Failed
+                        *                       |
+                        *               ____________________
+                        *              |                    |
+                        *      async_commit             sync_commit
+                        *              |                    |
+                        *              | GO TO NEXT    "Journal Corruption"
+                        *              | TRANSACTION
+                        *              |
+                        * {(n+1)th transaction}
+                        *              |
+                        *       _______|______________
+                        *      |                     |
+                        * Commit block found   Commit block not found
+                        *      |                     |
+                        * "Journal Corruption"       |
+                        *               _____________|_________
+                        *              |                       |
+                        *      nth trans corrupt       OR   nth trans
+                        *      and (n+1)th interrupted     interrupted
+                        *      before commit block
+                        *      could reach the disk.
+                        *      (We cannot distinguish between the above
+                        *       conditions, so assume an
+                        *       "Interrupted Commit".)
+                        */
+
+                       /* Found an expected commit block: if checksums
+                        * are present verify them in PASS_SCAN; else not
+                        * much to do other than move on to the next sequence
                         * number. */
+                       if (pass == PASS_SCAN &&
+                           JBD2_HAS_COMPAT_FEATURE(journal,
+                                   JBD2_FEATURE_COMPAT_CHECKSUM)) {
+                               int chksum_err, chksum_seen;
+                               struct commit_header *cbh =
+                                       (struct commit_header *)bh->b_data;
+                               unsigned found_chksum =
+                                       be32_to_cpu(cbh->h_chksum[0]);
+
+                               chksum_err = chksum_seen = 0;
+
+                               if (info->end_transaction) {
+                                       printk(KERN_ERR "JBD: Transaction %u "
+                                               "found to be corrupt.\n",
+                                               next_commit_ID - 1);
+                                       brelse(bh);
+                                       break;
+                               }
+
+                               if (crc32_sum == found_chksum &&
+                                   cbh->h_chksum_type == JBD2_CRC32_CHKSUM &&
+                                   cbh->h_chksum_size ==
+                                               JBD2_CRC32_CHKSUM_SIZE)
+                                      chksum_seen = 1;
+                               else if (!(cbh->h_chksum_type == 0 &&
+                                            cbh->h_chksum_size == 0 &&
+                                            found_chksum == 0 &&
+                                            !chksum_seen))
+                               /*
+                                * If the fs was mounted by an old kernel and
+                                * is later mounted by a kernel with
+                                * journal_checksum enabled, the journal can
+                                * have the checksum feature flag set while the
+                                * individual commit blocks carry no checksums
+                                * (i.e. chksum == 0).  This extra check avoids
+                                * flagging such transactions as corrupt.
+                                */
+                                               chksum_err = 1;
+
+                               if (chksum_err) {
+                                       info->end_transaction = next_commit_ID;
+
+                                       if (!JBD2_HAS_COMPAT_FEATURE(journal,
+                                          JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)){
+                                               printk(KERN_ERR
+                                                      "JBD: Transaction %u "
+                                                      "found to be corrupt.\n",
+                                                      next_commit_ID);
+                                               brelse(bh);
+                                               break;
+                                       }
+                               }
+                               crc32_sum = ~0;
+                       }
                        brelse(bh);
                        next_commit_ID++;
                        continue;
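The nested conditions above boil down to one question per commit block: either it carries a matching CRC-32 checksum, or it carries no checksum at all (type, size and value all zero, as written by a kernel without journal checksums); anything else marks the transaction corrupt, which is fatal for sync commits and tolerated for async commits. A hedged restatement of that predicate as a standalone sketch; the struct, macro values and helper name are stand-ins for the commit_header fields used above:

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>          /* ntohl()/htonl() as stand-ins for be32 helpers */

#define CRC32_CHKSUM            1   /* mirrors JBD2_CRC32_CHKSUM (assumed value) */
#define CRC32_CHKSUM_SIZE       4   /* mirrors JBD2_CRC32_CHKSUM_SIZE (assumed) */

/* Minimal stand-in for the on-disk commit_header fields used above. */
struct commit_hdr {
        uint8_t  chksum_type;
        uint8_t  chksum_size;
        uint32_t chksum_be;     /* big-endian, like h_chksum[0] */
};

/* 1 = acceptable, 0 = corrupt, following the logic in the hunk above. */
static int commit_block_checksum_ok(const struct commit_hdr *h, uint32_t crc)
{
        uint32_t found = ntohl(h->chksum_be);

        /* Written by a kernel without journal checksums: nothing to verify. */
        if (h->chksum_type == 0 && h->chksum_size == 0 && found == 0)
                return 1;

        return h->chksum_type == CRC32_CHKSUM &&
               h->chksum_size == CRC32_CHKSUM_SIZE &&
               found == crc;
}

int main(void)
{
        struct commit_hdr h = { CRC32_CHKSUM, CRC32_CHKSUM_SIZE, htonl(0xdeadbeef) };

        printf("%d\n", commit_block_checksum_ok(&h, 0xdeadbeef));   /* 1 */
        printf("%d\n", commit_block_checksum_ok(&h, 0x12345678));   /* 0 */
        return 0;
}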
@@ -554,9 +688,10 @@ static int do_one_pass(journal_t *journal,
         * transaction marks the end of the valid log.
         */
 
-       if (pass == PASS_SCAN)
-               info->end_transaction = next_commit_ID;
-       else {
+       if (pass == PASS_SCAN) {
+               if (!info->end_transaction)
+                       info->end_transaction = next_commit_ID;
+       } else {
                /* It's really bad news if different passes end up at
                 * different places (but possible due to IO errors). */
                if (info->end_transaction != next_commit_ID) {
index 3595fd432d5b55b25806e0bf48d03a2756de040a..df36f42e19e11687b32bd08741c5969db325e116 100644 (file)
@@ -171,13 +171,15 @@ int __init jbd2_journal_init_revoke_caches(void)
 {
        jbd2_revoke_record_cache = kmem_cache_create("jbd2_revoke_record",
                                           sizeof(struct jbd2_revoke_record_s),
-                                          0, SLAB_HWCACHE_ALIGN, NULL);
+                                          0,
+                                          SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
+                                          NULL);
        if (jbd2_revoke_record_cache == 0)
                return -ENOMEM;
 
        jbd2_revoke_table_cache = kmem_cache_create("jbd2_revoke_table",
                                           sizeof(struct jbd2_revoke_table_s),
-                                          0, 0, NULL);
+                                          0, SLAB_TEMPORARY, NULL);
        if (jbd2_revoke_table_cache == 0) {
                kmem_cache_destroy(jbd2_revoke_record_cache);
                jbd2_revoke_record_cache = NULL;
index b1fcf2b3dca3e9154c2f1aa4ded8c796e1cbaac3..b9b0b6f899b91b2000cc4f34e72c97931670fe53 100644 (file)
@@ -54,11 +54,13 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
        spin_lock_init(&transaction->t_handle_lock);
 
        /* Set up the commit timer for the new transaction. */
-       journal->j_commit_timer.expires = transaction->t_expires;
+       journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
        add_timer(&journal->j_commit_timer);
 
        J_ASSERT(journal->j_running_transaction == NULL);
        journal->j_running_transaction = transaction;
+       transaction->t_max_wait = 0;
+       transaction->t_start = jiffies;
 
        return transaction;
 }
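round_jiffies() pushes the commit timer's expiry onto a whole-second boundary so that periodic timers across the system tend to fire together instead of scattering wakeups. A rough userspace approximation of that rounding (HZ is assumed to be 250 here; the real helper also applies a small per-CPU skew and may round down when the target is only just past a second boundary):

#include <stdio.h>

#define HZ 250          /* assumed tick rate for the sketch */

/* Push a timeout to the next whole-second boundary, in ticks. */
static unsigned long round_to_second(unsigned long j)
{
        unsigned long rem = j % HZ;

        if (rem == 0)
                return j;
        return j - rem + HZ;    /* round up to the next full second */
}

int main(void)
{
        printf("%lu -> %lu\n", 1234UL, round_to_second(1234UL));  /* 1250 */
        printf("%lu -> %lu\n", 1250UL, round_to_second(1250UL));  /* 1250 */
        return 0;
}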
@@ -85,6 +87,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle)
        int nblocks = handle->h_buffer_credits;
        transaction_t *new_transaction = NULL;
        int ret = 0;
+       unsigned long ts = jiffies;
 
        if (nblocks > journal->j_max_transaction_buffers) {
                printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n",
@@ -217,6 +220,12 @@ repeat_locked:
        /* OK, account for the buffers that this operation expects to
         * use and add the handle to the running transaction. */
 
+       if (time_after(transaction->t_start, ts)) {
+               ts = jbd2_time_diff(ts, transaction->t_start);
+               if (ts > transaction->t_max_wait)
+                       transaction->t_max_wait = ts;
+       }
+
        handle->h_transaction = transaction;
        transaction->t_outstanding_credits += nblocks;
        transaction->t_updates++;
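The new t_start/t_max_wait fields record the longest time any handle had to wait before joining a running transaction: ts is sampled on entry to start_this_handle(), and if the transaction it finally joins started after that, the difference is folded into the per-transaction maximum. The same bookkeeping as a plain-C sketch (the kernel uses time_after() and jbd2_time_diff(), which also cope with jiffies wrap-around; this sketch ignores that):

#include <stdio.h>

struct txn_stats {
        unsigned long t_start;          /* when the transaction started */
        unsigned long t_max_wait;       /* worst wait seen by any handle */
};

/* Fold one handle's wait time into the transaction's maximum. */
static void account_wait(struct txn_stats *t, unsigned long handle_start)
{
        if (t->t_start > handle_start) {            /* the handle had to wait */
                unsigned long waited = t->t_start - handle_start;

                if (waited > t->t_max_wait)
                        t->t_max_wait = waited;
        }
}

int main(void)
{
        struct txn_stats t = { .t_start = 1100, .t_max_wait = 0 };

        account_wait(&t, 1000);         /* waited 100 ticks */
        account_wait(&t, 1090);         /* waited 10 ticks, max unchanged */
        printf("max wait: %lu ticks\n", t.t_max_wait);   /* 100 */
        return 0;
}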
@@ -232,6 +241,8 @@ out:
        return ret;
 }
 
+static struct lock_class_key jbd2_handle_key;
+
 /* Allocate a new handle.  This should probably be in a slab... */
 static handle_t *new_handle(int nblocks)
 {
@@ -242,6 +253,9 @@ static handle_t *new_handle(int nblocks)
        handle->h_buffer_credits = nblocks;
        handle->h_ref = 1;
 
+       lockdep_init_map(&handle->h_lockdep_map, "jbd2_handle",
+                                               &jbd2_handle_key, 0);
+
        return handle;
 }
 
@@ -284,7 +298,11 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
                jbd2_free_handle(handle);
                current->journal_info = NULL;
                handle = ERR_PTR(err);
+               goto out;
        }
+
+       lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+out:
        return handle;
 }
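Together with the new_handle() hunk above and the jbd2_journal_stop() hunk below, this teaches lockdep to treat a journal handle as a pseudo-lock, so ordering problems between handles and real locks can be reported. A condensed view of the three call sites added by this patch (copied from the hunks in this file; not meant to compile on its own):

/* One lock class shared by all journal handles. */
static struct lock_class_key jbd2_handle_key;

/* new_handle(): attach a lockdep map to the handle. */
        lockdep_init_map(&handle->h_lockdep_map, "jbd2_handle",
                                                &jbd2_handle_key, 0);

/* jbd2_journal_start(): the handle is now "held"... */
        lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);

/* ...and jbd2_journal_stop(): the handle is "released" again. */
        lock_release(&handle->h_lockdep_map, 1, _THIS_IP_);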
 
@@ -1164,7 +1182,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
        }
 
        /* That test should have eliminated the following case: */
-       J_ASSERT_JH(jh, jh->b_frozen_data == 0);
+       J_ASSERT_JH(jh, jh->b_frozen_data == NULL);
 
        JBUFFER_TRACE(jh, "file as BJ_Metadata");
        spin_lock(&journal->j_list_lock);
@@ -1410,6 +1428,8 @@ int jbd2_journal_stop(handle_t *handle)
                spin_unlock(&journal->j_state_lock);
        }
 
+       lock_release(&handle->h_lockdep_map, 1, _THIS_IP_);
+
        jbd2_free_handle(handle);
        return err;
 }
@@ -1512,7 +1532,7 @@ void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh)
 
        J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
        if (jh->b_jlist != BJ_None)
-               J_ASSERT_JH(jh, transaction != 0);
+               J_ASSERT_JH(jh, transaction != NULL);
 
        switch (jh->b_jlist) {
        case BJ_None:
@@ -1581,11 +1601,11 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
        if (buffer_locked(bh) || buffer_dirty(bh))
                goto out;
 
-       if (jh->b_next_transaction != 0)
+       if (jh->b_next_transaction != NULL)
                goto out;
 
        spin_lock(&journal->j_list_lock);
-       if (jh->b_transaction != 0 && jh->b_cp_transaction == 0) {
+       if (jh->b_transaction != NULL && jh->b_cp_transaction == NULL) {
                if (jh->b_jlist == BJ_SyncData || jh->b_jlist == BJ_Locked) {
                        /* A written-back ordered data buffer */
                        JBUFFER_TRACE(jh, "release data");
@@ -1593,7 +1613,7 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
                        jbd2_journal_remove_journal_head(bh);
                        __brelse(bh);
                }
-       } else if (jh->b_cp_transaction != 0 && jh->b_transaction == 0) {
+       } else if (jh->b_cp_transaction != NULL && jh->b_transaction == NULL) {
                /* written-back checkpointed metadata buffer */
                if (jh->b_jlist == BJ_None) {
                        JBUFFER_TRACE(jh, "remove from checkpoint list");
@@ -1953,7 +1973,7 @@ void __jbd2_journal_file_buffer(struct journal_head *jh,
 
        J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
        J_ASSERT_JH(jh, jh->b_transaction == transaction ||
-                               jh->b_transaction == 0);
+                               jh->b_transaction == NULL);
 
        if (jh->b_transaction && jh->b_jlist == jlist)
                return;
index a4b07730b2e1d0abb257a126fce7f3911ae1d434..0c095ce7723d87db01bbb5d980e0fd4643764eb7 100644 (file)
@@ -64,7 +64,7 @@ int o2cb_sys_init(void)
 {
        int ret;
 
-       o2cb_kset = kset_create_and_add("o2cb", NULL, fs_kobj);
+       o2cb_kset = kset_create_and_add("o2cb", NULL, NULL);
        if (!o2cb_kset)
                return -ENOMEM;
 
index c4d3d17923f155004fec85fd04f836a06aa585f3..1c177f29e1b72be0aa63f38496ed81727d01f653 100644 (file)
@@ -446,6 +446,7 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
        }
        return seg;
 }
+EXPORT_SYMBOL(iov_shorten);
 
 ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
                unsigned long nr_segs, size_t len, loff_t *ppos, iov_fn_t fn)
index 6673ee82cb4c09667ed06805a1505870c156220e..4faf8c4722c35804fcbbae88f06e2f3137584876 100644 (file)
@@ -16,23 +16,3 @@ EXTRA_CFLAGS += -DSMBFS_PARANOIA
 #EXTRA_CFLAGS += -DDEBUG_SMB_TIMESTAMP
 #EXTRA_CFLAGS += -Werror
 
-#
-# Maintainer rules
-#
-
-# getopt.c not included. It is intentionally separate
-SRC = proc.c dir.c cache.c sock.c inode.c file.c ioctl.c smbiod.c request.c \
-       symlink.c
-
-proto:
-       -rm -f proto.h
-       @echo >  proto2.h "/*"
-       @echo >> proto2.h " *  Autogenerated with cproto on: " `date`
-       @echo >> proto2.h " */"
-       @echo >> proto2.h ""
-       @echo >> proto2.h "struct smb_request;"
-       @echo >> proto2.h "struct sock;"
-       @echo >> proto2.h "struct statfs;"
-       @echo >> proto2.h ""
-       cproto -E "gcc -E" -e -v -I $(TOPDIR)/include -DMAKING_PROTO -D__KERNEL__ $(SRC) >> proto2.h
-       mv proto2.h proto.h
index 47a6b086eee27fd3ef5dc9a9a04a09db8c5488ee..5c60bfc1a84ddd92e67a20c0bd82ed194654bfdf 100644 (file)
@@ -310,6 +310,8 @@ static inline int constant_fls(int x)
                _find_first_zero_bit_le(p,sz)
 #define ext2_find_next_zero_bit(p,sz,off)      \
                _find_next_zero_bit_le(p,sz,off)
+#define ext2_find_next_bit(p, sz, off) \
+               _find_next_bit_le(p, sz, off)
 
 /*
  * Minix is defined to use little-endian byte ordering.
index b0828d43e110c93f27b65adf593fc0c40e58a676..ea3070ff13a528872b55915195c707ae1079f764 100644 (file)
@@ -110,7 +110,7 @@ struct tagtable {
        int     (*parse)(struct tag *);
 };
 
-#define __tag __attribute_used__ __attribute__((__section__(".taglist.init")))
+#define __tag __used __attribute__((__section__(".taglist.init")))
 #define __tagtable(tag, fn)                                            \
        static struct tagtable __tagtable_##fn __tag = { tag, fn }
 
index 1697404afa052fdec4d1d1528e91fcc012df7408..63cf822431a295b6d6ec2b05e4656eb20675ebaa 100644 (file)
@@ -14,5 +14,7 @@
        generic_find_first_zero_le_bit((unsigned long *)(addr), (size))
 #define ext2_find_next_zero_bit(addr, size, off) \
        generic_find_next_zero_le_bit((unsigned long *)(addr), (size), (off))
+#define ext2_find_next_bit(addr, size, off) \
+       generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 
 #endif /* _ASM_GENERIC_BITOPS_EXT2_NON_ATOMIC_H_ */
index b9c7e5d2d2ad67631e4318665fe900ac36634097..80e3bf13b2b97ed613c4126413bfbde8766dbda4 100644 (file)
@@ -20,6 +20,8 @@
 #define generic___test_and_clear_le_bit(nr, addr) __test_and_clear_bit(nr, addr)
 
 #define generic_find_next_zero_le_bit(addr, size, offset) find_next_zero_bit(addr, size, offset)
+#define generic_find_next_le_bit(addr, size, offset) \
+                       find_next_bit(addr, size, offset)
 
 #elif defined(__BIG_ENDIAN)
 
@@ -42,6 +44,8 @@
 
 extern unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
                unsigned long size, unsigned long offset);
+extern unsigned long generic_find_next_le_bit(const unsigned long *addr,
+               unsigned long size, unsigned long offset);
 
 #else
 #error "Please fix <asm/byteorder.h>"
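ext2_find_next_bit() and the generic_find_next_le_bit() helper it maps to are the set-bit counterparts of the existing find-next-zero-bit routines: they scan an on-disk little-endian bitmap for the next 1 bit, which the new ext4 multiblock allocator needs when walking block bitmaps. A rough userspace sketch of the operation on a byte array (the kernel versions operate on unsigned long words and also handle big-endian hosts; the function name here is invented):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Find the next set bit at or after 'offset' in a little-endian bitmap. */
static size_t find_next_bit_le(const uint8_t *bitmap, size_t size, size_t offset)
{
        size_t i;

        for (i = offset; i < size; i++)
                if (bitmap[i / 8] & (1u << (i % 8)))
                        return i;
        return size;            /* no set bit found */
}

int main(void)
{
        uint8_t map[2] = { 0x00, 0x12 };        /* bits 9 and 12 set */

        printf("%zu\n", find_next_bit_le(map, 16, 0));   /* 9 */
        printf("%zu\n", find_next_bit_le(map, 16, 10));  /* 12 */
        return 0;
}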
index 9f584cc5c5fb48f9491b1cdbe8ceca5e01d844ca..76df771be58566171cd5b74ab16489ae1c5f2cea 100644 (file)
@@ -9,10 +9,46 @@
 /* Align . to an 8-byte boundary, which equals the maximum function alignment. */
 #define ALIGN_FUNCTION()  . = ALIGN(8)
 
+/* The actual configuration determines whether the init/exit sections
+ * are handled as text/data or whether they can be discarded (which
+ * often happens at runtime).
+ */
+#ifdef CONFIG_HOTPLUG
+#define DEV_KEEP(sec)    *(.dev##sec)
+#define DEV_DISCARD(sec)
+#else
+#define DEV_KEEP(sec)
+#define DEV_DISCARD(sec) *(.dev##sec)
+#endif
+
+#ifdef CONFIG_HOTPLUG_CPU
+#define CPU_KEEP(sec)    *(.cpu##sec)
+#define CPU_DISCARD(sec)
+#else
+#define CPU_KEEP(sec)
+#define CPU_DISCARD(sec) *(.cpu##sec)
+#endif
+
+#if defined(CONFIG_MEMORY_HOTPLUG)
+#define MEM_KEEP(sec)    *(.mem##sec)
+#define MEM_DISCARD(sec)
+#else
+#define MEM_KEEP(sec)
+#define MEM_DISCARD(sec) *(.mem##sec)
+#endif
+
+
 /* .data section */
 #define DATA_DATA                                                      \
        *(.data)                                                        \
        *(.data.init.refok)                                             \
+       *(.ref.data)                                                    \
+       DEV_KEEP(init.data)                                             \
+       DEV_KEEP(exit.data)                                             \
+       CPU_KEEP(init.data)                                             \
+       CPU_KEEP(exit.data)                                             \
+       MEM_KEEP(init.data)                                             \
+       MEM_KEEP(exit.data)                                             \
        . = ALIGN(8);                                                   \
        VMLINUX_SYMBOL(__start___markers) = .;                          \
        *(__markers)                                                    \
                *(__ksymtab_strings)                                    \
        }                                                               \
                                                                        \
+       /* __*init sections */                                          \
+       __init_rodata : AT(ADDR(__init_rodata) - LOAD_OFFSET) {         \
+               *(.ref.rodata)                                          \
+               DEV_KEEP(init.rodata)                                   \
+               DEV_KEEP(exit.rodata)                                   \
+               CPU_KEEP(init.rodata)                                   \
+               CPU_KEEP(exit.rodata)                                   \
+               MEM_KEEP(init.rodata)                                   \
+               MEM_KEEP(exit.rodata)                                   \
+       }                                                               \
+                                                                       \
        /* Built-in module parameters. */                               \
        __param : AT(ADDR(__param) - LOAD_OFFSET) {                     \
                VMLINUX_SYMBOL(__start___param) = .;                    \
                VMLINUX_SYMBOL(__stop___param) = .;                     \
                VMLINUX_SYMBOL(__end_rodata) = .;                       \
        }                                                               \
-                                                                       \
        . = ALIGN((align));
 
 /* RODATA provided for backward compatibility.
 #define TEXT_TEXT                                                      \
                ALIGN_FUNCTION();                                       \
                *(.text)                                                \
+               *(.ref.text)                                            \
                *(.text.init.refok)                                     \
-               *(.exit.text.refok)
+               *(.exit.text.refok)                                     \
+       DEV_KEEP(init.text)                                             \
+       DEV_KEEP(exit.text)                                             \
+       CPU_KEEP(init.text)                                             \
+       CPU_KEEP(exit.text)                                             \
+       MEM_KEEP(init.text)                                             \
+       MEM_KEEP(exit.text)
+
 
 /* sched.text is aligned to function alignment to ensure we have the same
  * address even at the second ld pass when generating System.map */
                *(.kprobes.text)                                        \
                VMLINUX_SYMBOL(__kprobes_text_end) = .;
 
+/* init and exit section handling */
+#define INIT_DATA                                                      \
+       *(.init.data)                                                   \
+       DEV_DISCARD(init.data)                                          \
+       DEV_DISCARD(init.rodata)                                        \
+       CPU_DISCARD(init.data)                                          \
+       CPU_DISCARD(init.rodata)                                        \
+       MEM_DISCARD(init.data)                                          \
+       MEM_DISCARD(init.rodata)
+
+#define INIT_TEXT                                                      \
+       *(.init.text)                                                   \
+       DEV_DISCARD(init.text)                                          \
+       CPU_DISCARD(init.text)                                          \
+       MEM_DISCARD(init.text)
+
+#define EXIT_DATA                                                      \
+       *(.exit.data)                                                   \
+       DEV_DISCARD(exit.data)                                          \
+       DEV_DISCARD(exit.rodata)                                        \
+       CPU_DISCARD(exit.data)                                          \
+       CPU_DISCARD(exit.rodata)                                        \
+       MEM_DISCARD(exit.data)                                          \
+       MEM_DISCARD(exit.rodata)
+
+#define EXIT_TEXT                                                      \
+       *(.exit.text)                                                   \
+       DEV_DISCARD(exit.text)                                          \
+       CPU_DISCARD(exit.text)                                          \
+       MEM_DISCARD(exit.text)
+
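With the macros above, anything the compiler places in the .devinit/.devexit (and cpu/mem) sections is either pulled into the final image or routed to the discard list purely by configuration: DEV_KEEP() keeps the section when CONFIG_HOTPLUG is enabled, DEV_DISCARD() lists it for discarding otherwise. A small illustration of the compiler side of this, i.e. what it means to put a function into such a section (the macro and function names are invented for the example):

#include <stdio.h>

/* Illustrative only: route a function into a hotplug-init style section.
 * Whether ".devinit.text" survives the final kernel link is decided by the
 * DEV_KEEP()/DEV_DISCARD() macros above, depending on CONFIG_HOTPLUG.
 */
#define __example_devinit __attribute__((__section__(".devinit.text")))

static int __example_devinit example_probe(void)
{
        return 42;
}

int main(void)
{
        /* In userspace the section is simply kept; inspect it with objdump -t. */
        printf("%d\n", example_probe());
        return 0;
}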
                /* DWARF debug sections.
                Symbols in the DWARF debugging sections are relative to
                the beginning of the section so we begin them at 0.  */
index e58d3298fa109eb4ba733d580e39d3b06580dbcc..5b6665c754c914213f4e6fa8fb76bc497e5bbf1c 100644 (file)
@@ -24,7 +24,7 @@
 extern void ia64_bad_param_for_setreg (void);
 extern void ia64_bad_param_for_getreg (void);
 
-register unsigned long ia64_r13 asm ("r13") __attribute_used__;
+register unsigned long ia64_r13 asm ("r13") __used;
 
 #define ia64_setreg(regnum, val)                                               \
 ({                                                                             \
index 2976b5d68e96cbb5189cb1a348312939a699dab8..83d1f286230b6da68492a6b64fedaaaad8e5746e 100644 (file)
@@ -410,6 +410,8 @@ static inline int ext2_find_next_zero_bit(const void *vaddr, unsigned size,
        res = ext2_find_first_zero_bit (p, size - 32 * (p - addr));
        return (p - addr) * 32 + res;
 }
+#define ext2_find_next_bit(addr, size, off) \
+       generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 
 #endif /* __KERNEL__ */
 
index f8dfb7ba2e2573566a58cf97d01942adc48d4785..f43afe1fc3b33940dc1c46178045830f4bf01b44 100644 (file)
@@ -294,6 +294,8 @@ found_middle:
        return result + ffz(__swab32(tmp));
 }
 
+#define ext2_find_next_bit(addr, size, off) \
+       generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 #include <asm-generic/bitops/minix.h>
 
 #endif /* __KERNEL__ */
index 733b4af7f4f1e7e49269b17856bbf7e47924d7b6..220d9a781ab9c0f64030f15e45543472a950de2f 100644 (file)
@@ -359,6 +359,8 @@ static __inline__ int test_le_bit(unsigned long nr,
 unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
                                    unsigned long size, unsigned long offset);
 
+unsigned long generic_find_next_le_bit(const unsigned long *addr,
+                                   unsigned long size, unsigned long offset);
 /* Bitmap functions for the ext2 filesystem */
 
 #define ext2_set_bit(nr,addr) \
@@ -378,6 +380,8 @@ unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
 #define ext2_find_next_zero_bit(addr, size, off) \
        generic_find_next_zero_le_bit((unsigned long*)addr, size, off)
 
+#define ext2_find_next_bit(addr, size, off) \
+       generic_find_next_le_bit((unsigned long *)addr, size, off)
 /* Bitmap functions for the minix filesystem.  */
 
 #define minix_test_and_set_bit(nr,addr) \
index 34d9a6357c38941c70d7b638b2d2f19045291872..dba6fecad0be25353c535ddc1ac3feb64341dbcd 100644 (file)
@@ -772,6 +772,8 @@ static inline int sched_find_first_bit(unsigned long *b)
        test_and_clear_bit((nr)^(__BITOPS_WORDSIZE - 8), (unsigned long *)addr)
 #define ext2_test_bit(nr, addr)      \
        test_bit((nr)^(__BITOPS_WORDSIZE - 8), (unsigned long *)addr)
+#define ext2_find_next_bit(addr, size, off) \
+       generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 
 #ifndef __s390x__
 
index ddb18ad23303ae8a4f26dafdf38da0e3dd83c8fc..b2e4124070aef289a724a90049bba93adfcffa64 100644 (file)
@@ -65,6 +65,6 @@ extern struct sh_machine_vector sh_mv;
 #define get_system_type()      sh_mv.mv_name
 
 #define __initmv \
-       __attribute_used__ __attribute__((__section__ (".machvec.init")))
+       __used __section(.machvec.init)
 
 #endif /* _ASM_SH_MACHVEC_H */
index c6577d3dc46d594e33bc4a572f71d44bb02a88f2..c50e5d35fe84fefe26978011966a5d39f3d2ac7b 100644 (file)
@@ -68,7 +68,7 @@ struct thread_info {
 #define init_stack             (init_thread_union.stack)
 
 /* how to get the current stack pointer from C */
-register unsigned long current_stack_pointer asm("r15") __attribute_used__;
+register unsigned long current_stack_pointer asm("r15") __used;
 
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
index ef58fd2a6eb089f49086f1dfec56d451af127f8b..a516e9192f1135a2a6b310caf5aaf0525df43220 100644 (file)
@@ -85,7 +85,7 @@ struct thread_info {
 
 
 /* how to get the current stack pointer from C */
-register unsigned long current_stack_pointer asm("esp") __attribute_used__;
+register unsigned long current_stack_pointer asm("esp") __used;
 
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
index bd694f779346687e787a61b6547f15a9af28f0c7..ad99ce9f916960063e521d05017dace2f551f79f 100644 (file)
@@ -34,7 +34,6 @@ header-y += atmsap.h
 header-y += atmsvc.h
 header-y += atm_zatm.h
 header-y += auto_fs4.h
-header-y += auxvec.h
 header-y += ax25.h
 header-y += b1lli.h
 header-y += baycom.h
@@ -73,7 +72,7 @@ header-y += gen_stats.h
 header-y += gigaset_dev.h
 header-y += hdsmart.h
 header-y += hysdn_if.h
-header-y += i2c-dev.h
+header-y += i2o-dev.h
 header-y += i8k.h
 header-y += if_arcnet.h
 header-y += if_bonding.h
@@ -158,7 +157,6 @@ header-y += veth.h
 header-y += video_decoder.h
 header-y += video_encoder.h
 header-y += videotext.h
-header-y += vt.h
 header-y += x25.h
 
 unifdef-y += acct.h
@@ -173,6 +171,7 @@ unifdef-y += atm.h
 unifdef-y += atm_tcp.h
 unifdef-y += audit.h
 unifdef-y += auto_fs.h
+unifdef-y += auxvec.h
 unifdef-y += binfmts.h
 unifdef-y += capability.h
 unifdef-y += capi.h
@@ -214,7 +213,7 @@ unifdef-y += hdreg.h
 unifdef-y += hiddev.h
 unifdef-y += hpet.h
 unifdef-y += i2c.h
-unifdef-y += i2o-dev.h
+unifdef-y += i2c-dev.h
 unifdef-y += icmp.h
 unifdef-y += icmpv6.h
 unifdef-y += if_addr.h
@@ -349,6 +348,7 @@ unifdef-y += videodev.h
 unifdef-y += virtio_config.h
 unifdef-y += virtio_blk.h
 unifdef-y += virtio_net.h
+unifdef-y += vt.h
 unifdef-y += wait.h
 unifdef-y += wanrouter.h
 unifdef-y += watchdog.h
index da0d83fbadc0f30e11ebcb82171e75069431a463..e98801f06dcc04585a658cd6cae2f306154e8511 100644 (file)
@@ -192,6 +192,8 @@ int sync_dirty_buffer(struct buffer_head *bh);
 int submit_bh(int, struct buffer_head *);
 void write_boundary_block(struct block_device *bdev,
                        sector_t bblock, unsigned blocksize);
+int bh_uptodate_or_lock(struct buffer_head *bh);
+int bh_submit_read(struct buffer_head *bh);
 
 extern int buffer_heads_over_limit;
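bh_uptodate_or_lock() and bh_submit_read() give ext4/jbd2 a compact read-only-if-needed idiom for buffer heads. As I read the helpers (hedged; fs/buffer.c in this tree is authoritative): the first returns non-zero when the buffer is already up to date, otherwise locks it and returns 0; the second submits and waits for a read on such a locked buffer, returning 0 on success or a negative error. A typical call shape, with error handling trimmed:

/* Hedged usage sketch; bh is assumed to come from sb_getblk() or similar. */
if (!bh_uptodate_or_lock(bh)) {         /* not up to date: bh is now locked */
        if (bh_submit_read(bh) < 0) {   /* read the block and wait */
                brelse(bh);             /* drop our reference */
                return -EIO;
        }
}
/* bh->b_data can be used from here on */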
 
index 2d8c0f48f55e6236a4c25b8510feafb48670ff10..e5eb795f78a1c9b22f2c9c474b80c32a0cfff810 100644 (file)
@@ -7,10 +7,8 @@
 
 #if __GNUC_MINOR__ >= 3
 # define __used                        __attribute__((__used__))
-# define __attribute_used__    __used                          /* deprecated */
 #else
 # define __used                        __attribute__((__unused__))
-# define __attribute_used__    __used                          /* deprecated */
 #endif
 
 #if __GNUC_MINOR__ >= 4
index ee7ca5de970cb6cfadd4c04f5df4444ed07645a8..0ab3a32323307d1445392ea571550698c0cd70d1 100644 (file)
@@ -15,7 +15,6 @@
 #endif
 
 #define __used                 __attribute__((__used__))
-#define __attribute_used__     __used                  /* deprecated */
 #define __must_check           __attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
 #define __always_inline                inline __attribute__((always_inline))
index c68b67b86ef1d5cccbed6f7f76f5a94c5a692c80..d0e17e1657dca11b86f151084a10bc87204c80a1 100644 (file)
@@ -126,10 +126,6 @@ extern void __chk_io_ptr(const volatile void __iomem *);
  * Mark functions that are referenced only in inline assembly as __used so
  * the code is emitted even though it appears to be unreferenced.
  */
-#ifndef __attribute_used__
-# define __attribute_used__    /* deprecated */
-#endif
-
 #ifndef __used
 # define __used                        /* unimplemented */
 #endif
@@ -175,4 +171,9 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 #define __cold
 #endif
 
+/* Simple shorthand for a section definition */
+#ifndef __section
+# define __section(S) __attribute__ ((__section__(#S)))
+#endif
+
 #endif /* __LINUX_COMPILER_H */
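With __attribute_used__ removed, __used is the single way to force the compiler to emit a symbol that nothing in C visibly references (typically because inline assembly or the linker script uses it), and the new __section() macro is shorthand for the open-coded attribute, as the __initmv and __tag conversions elsewhere in this merge show. A userspace illustration of both (the macros are re-defined locally; the variable and section names are invented):

#include <stdio.h>

#define __used          __attribute__((__used__))
#define __section(S)    __attribute__((__section__(#S)))

/* Emitted even with no C reference, and placed in a custom section;
 * think of a table that assembly or a linker script walks.
 */
static const int example_tag __used __section(.example.tags) = 0x1234;

int main(void)
{
        printf("tag lives at %p\n", (const void *)&example_tag);
        return 0;
}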
index e831759b2fb5f971cfabdb94dfa9863b4f57cc4b..278e3ef0533699f2e9a3845bdc12715ce6be30cb 100644 (file)
@@ -76,7 +76,7 @@
                typeof(desc) _desc                                      \
                             __attribute__((aligned(sizeof(Elf##size##_Word)))); \
        } _ELFNOTE_PASTE(_note_, unique)                                \
-               __attribute_used__                                      \
+               __used                                                  \
                __attribute__((section(".note." name),                  \
                               aligned(sizeof(Elf##size##_Word)),       \
                               unused)) = {                             \
index 97dd409d5f4a1194fa6034242793d03290968f59..1852313fc7c73605887a1447ae33d18645c298c1 100644 (file)
@@ -20,6 +20,8 @@
 #include <linux/blkdev.h>
 #include <linux/magic.h>
 
+#include <linux/ext4_fs_i.h>
+
 /*
  * The second extended filesystem constants/structures
  */
 #define ext4_debug(f, a...)    do {} while (0)
 #endif
 
+#define EXT4_MULTIBLOCK_ALLOCATOR      1
+
+/* prefer goal again. length */
+#define EXT4_MB_HINT_MERGE             1
+/* blocks already reserved */
+#define EXT4_MB_HINT_RESERVED          2
+/* metadata is being allocated */
+#define EXT4_MB_HINT_METADATA          4
+/* first blocks in the file */
+#define EXT4_MB_HINT_FIRST             8
+/* search for the best chunk */
+#define EXT4_MB_HINT_BEST              16
+/* data is being allocated */
+#define EXT4_MB_HINT_DATA              32
+/* don't preallocate (for tails) */
+#define EXT4_MB_HINT_NOPREALLOC                64
+/* allocate for locality group */
+#define EXT4_MB_HINT_GROUP_ALLOC       128
+/* allocate goal blocks or none */
+#define EXT4_MB_HINT_GOAL_ONLY         256
+/* goal is meaningful */
+#define EXT4_MB_HINT_TRY_GOAL          512
+
+struct ext4_allocation_request {
+       /* target inode for block we're allocating */
+       struct inode *inode;
+       /* logical block in target inode */
+       ext4_lblk_t logical;
+       /* phys. target (a hint) */
+       ext4_fsblk_t goal;
+       /* the closest logical allocated block to the left */
+       ext4_lblk_t lleft;
+       /* phys. block for ^^^ */
+       ext4_fsblk_t pleft;
+       /* the closest logical allocated block to the right */
+       ext4_lblk_t lright;
+       /* phys. block for ^^^ */
+       ext4_fsblk_t pright;
+       /* how many blocks we want to allocate */
+       unsigned long len;
+       /* flags. see above EXT4_MB_HINT_* */
+       unsigned long flags;
+};
+
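ext4_allocation_request bundles everything the new multiblock allocator needs to place an extent well: the logical block, a physical goal, the nearest already-mapped neighbours on both sides, the desired length and the EXT4_MB_HINT_* flags. A hedged sketch of how a caller might fill it in before handing it to ext4_mb_new_blocks(), declared further down in this header (the surrounding handle/inode/goal variables are assumed context, and as I read the mballoc API the allocator may trim ar.len to what it actually found):

/* Hedged caller sketch; handle, inode, iblock, goal, lleft/pleft etc. are
 * assumed to be set up by the surrounding block-mapping code.
 */
struct ext4_allocation_request ar = {
        .inode   = inode,
        .logical = iblock,              /* file-relative block we need */
        .goal    = goal,                /* physical placement hint */
        .lleft   = lleft,  .pleft  = pleft,
        .lright  = lright, .pright = pright,
        .len     = max_blocks,          /* how many blocks we would like */
        .flags   = EXT4_MB_HINT_DATA | EXT4_MB_HINT_TRY_GOAL,
};

newblock = ext4_mb_new_blocks(handle, &ar, &err);
if (!newblock)
        goto out;                       /* err holds the failure reason */
/* ar.len may have been reduced to the number of blocks actually allocated */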
 /*
  * Special inodes numbers
  */
  * Macro-instructions used to manage several block sizes
  */
 #define EXT4_MIN_BLOCK_SIZE            1024
-#define        EXT4_MAX_BLOCK_SIZE             4096
-#define EXT4_MIN_BLOCK_LOG_SIZE                  10
+#define        EXT4_MAX_BLOCK_SIZE             65536
+#define EXT4_MIN_BLOCK_LOG_SIZE                10
 #ifdef __KERNEL__
 # define EXT4_BLOCK_SIZE(s)            ((s)->s_blocksize)
 #else
@@ -118,6 +164,11 @@ struct ext4_group_desc
        __le32  bg_block_bitmap_hi;     /* Blocks bitmap block MSB */
        __le32  bg_inode_bitmap_hi;     /* Inodes bitmap block MSB */
        __le32  bg_inode_table_hi;      /* Inodes table block MSB */
+       __le16  bg_free_blocks_count_hi;/* Free blocks count MSB */
+       __le16  bg_free_inodes_count_hi;/* Free inodes count MSB */
+       __le16  bg_used_dirs_count_hi;  /* Directories count MSB */
+       __le16  bg_itable_unused_hi;    /* Unused inodes count MSB */
+       __u32   bg_reserved2[3];
 };
 
 #define EXT4_BG_INODE_UNINIT   0x0001 /* Inode table/bitmap not in use */
@@ -178,8 +229,9 @@ struct ext4_group_desc
 #define EXT4_NOTAIL_FL                 0x00008000 /* file tail should not be merged */
 #define EXT4_DIRSYNC_FL                        0x00010000 /* dirsync behaviour (directories only) */
 #define EXT4_TOPDIR_FL                 0x00020000 /* Top of directory hierarchies*/
-#define EXT4_RESERVED_FL               0x80000000 /* reserved for ext4 lib */
+#define EXT4_HUGE_FILE_FL               0x00040000 /* Set to each huge file */
 #define EXT4_EXTENTS_FL                        0x00080000 /* Inode uses extents */
+#define EXT4_RESERVED_FL               0x80000000 /* reserved for ext4 lib */
 
 #define EXT4_FL_USER_VISIBLE           0x000BDFFF /* User visible flags */
 #define EXT4_FL_USER_MODIFIABLE                0x000380FF /* User modifiable flags */
@@ -237,6 +289,7 @@ struct ext4_new_group_data {
 #endif
 #define EXT4_IOC_GETRSVSZ              _IOR('f', 5, long)
 #define EXT4_IOC_SETRSVSZ              _IOW('f', 6, long)
+#define EXT4_IOC_MIGRATE               _IO('f', 7)
 
 /*
  * ioctl commands in 32 bit emulation
@@ -275,18 +328,18 @@ struct ext4_mount_options {
 struct ext4_inode {
        __le16  i_mode;         /* File mode */
        __le16  i_uid;          /* Low 16 bits of Owner Uid */
-       __le32  i_size;         /* Size in bytes */
+       __le32  i_size_lo;      /* Size in bytes */
        __le32  i_atime;        /* Access time */
        __le32  i_ctime;        /* Inode Change time */
        __le32  i_mtime;        /* Modification time */
        __le32  i_dtime;        /* Deletion Time */
        __le16  i_gid;          /* Low 16 bits of Group Id */
        __le16  i_links_count;  /* Links count */
-       __le32  i_blocks;       /* Blocks count */
+       __le32  i_blocks_lo;    /* Blocks count */
        __le32  i_flags;        /* File flags */
        union {
                struct {
-                       __u32  l_i_reserved1;
+                       __le32  l_i_version;
                } linux1;
                struct {
                        __u32  h_i_translator;
@@ -297,12 +350,12 @@ struct ext4_inode {
        } osd1;                         /* OS dependent 1 */
        __le32  i_block[EXT4_N_BLOCKS];/* Pointers to blocks */
        __le32  i_generation;   /* File version (for NFS) */
-       __le32  i_file_acl;     /* File ACL */
-       __le32  i_dir_acl;      /* Directory ACL */
+       __le32  i_file_acl_lo;  /* File ACL */
+       __le32  i_size_high;
        __le32  i_obso_faddr;   /* Obsoleted fragment address */
        union {
                struct {
-                       __le16  l_i_reserved1;  /* Obsoleted fragment number/size which are removed in ext4 */
+                       __le16  l_i_blocks_high; /* were l_i_reserved1 */
                        __le16  l_i_file_acl_high;
                        __le16  l_i_uid_high;   /* these 2 fields */
                        __le16  l_i_gid_high;   /* were reserved2[0] */
@@ -328,9 +381,9 @@ struct ext4_inode {
        __le32  i_atime_extra;  /* extra Access time      (nsec << 2 | epoch) */
        __le32  i_crtime;       /* File Creation time */
        __le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
+       __le32  i_version_hi;   /* high 32 bits for 64-bit version */
 };
 
-#define i_size_high    i_dir_acl
 
 #define EXT4_EPOCH_BITS 2
 #define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
@@ -402,9 +455,12 @@ do {                                                                              \
                                       raw_inode->xtime ## _extra);            \
 } while (0)
 
+#define i_disk_version osd1.linux1.l_i_version
+
 #if defined(__KERNEL__) || defined(__linux__)
 #define i_reserved1    osd1.linux1.l_i_reserved1
 #define i_file_acl_high        osd2.linux2.l_i_file_acl_high
+#define i_blocks_high  osd2.linux2.l_i_blocks_high
 #define i_uid_low      i_uid
 #define i_gid_low      i_gid
 #define i_uid_high     osd2.linux2.l_i_uid_high
@@ -461,7 +517,10 @@ do {                                                                              \
 #define EXT4_MOUNT_USRQUOTA            0x100000 /* "old" user quota */
 #define EXT4_MOUNT_GRPQUOTA            0x200000 /* "old" group quota */
 #define EXT4_MOUNT_EXTENTS             0x400000 /* Extents support */
-
+#define EXT4_MOUNT_JOURNAL_CHECKSUM    0x800000 /* Journal checksums */
+#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT        0x1000000 /* Journal Async Commit */
+#define EXT4_MOUNT_I_VERSION            0x2000000 /* i_version support */
+#define EXT4_MOUNT_MBALLOC             0x4000000 /* Buddy allocation support */
 /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
 #define clear_opt(o, opt)              o &= ~EXT4_MOUNT_##opt
@@ -481,6 +540,7 @@ do {                                                                               \
 #define ext4_test_bit                  ext2_test_bit
 #define ext4_find_first_zero_bit       ext2_find_first_zero_bit
 #define ext4_find_next_zero_bit                ext2_find_next_zero_bit
+#define ext4_find_next_bit             ext2_find_next_bit
 
 /*
  * Maximal mount counts between two filesystem checks
@@ -671,6 +731,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 #define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER    0x0001
 #define EXT4_FEATURE_RO_COMPAT_LARGE_FILE      0x0002
 #define EXT4_FEATURE_RO_COMPAT_BTREE_DIR       0x0004
+#define EXT4_FEATURE_RO_COMPAT_HUGE_FILE        0x0008
 #define EXT4_FEATURE_RO_COMPAT_GDT_CSUM                0x0010
 #define EXT4_FEATURE_RO_COMPAT_DIR_NLINK       0x0020
 #define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE     0x0040
@@ -682,6 +743,7 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
 #define EXT4_FEATURE_INCOMPAT_META_BG          0x0010
 #define EXT4_FEATURE_INCOMPAT_EXTENTS          0x0040 /* extents support */
 #define EXT4_FEATURE_INCOMPAT_64BIT            0x0080
+#define EXT4_FEATURE_INCOMPAT_MMP               0x0100
 #define EXT4_FEATURE_INCOMPAT_FLEX_BG          0x0200
 
 #define EXT4_FEATURE_COMPAT_SUPP       EXT2_FEATURE_COMPAT_EXT_ATTR
@@ -696,7 +758,8 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
                                         EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
                                         EXT4_FEATURE_RO_COMPAT_DIR_NLINK | \
                                         EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE | \
-                                        EXT4_FEATURE_RO_COMPAT_BTREE_DIR)
+                                        EXT4_FEATURE_RO_COMPAT_BTREE_DIR |\
+                                        EXT4_FEATURE_RO_COMPAT_HUGE_FILE)
 
 /*
  * Default values for user and/or group using reserved blocks
@@ -767,6 +830,26 @@ struct ext4_dir_entry_2 {
 #define EXT4_DIR_ROUND                 (EXT4_DIR_PAD - 1)
 #define EXT4_DIR_REC_LEN(name_len)     (((name_len) + 8 + EXT4_DIR_ROUND) & \
                                         ~EXT4_DIR_ROUND)
+#define EXT4_MAX_REC_LEN               ((1<<16)-1)
+
+static inline unsigned ext4_rec_len_from_disk(__le16 dlen)
+{
+       unsigned len = le16_to_cpu(dlen);
+
+       if (len == EXT4_MAX_REC_LEN)
+               return 1 << 16;
+       return len;
+}
+
+static inline __le16 ext4_rec_len_to_disk(unsigned len)
+{
+       if (len == (1 << 16))
+               return cpu_to_le16(EXT4_MAX_REC_LEN);
+       else if (len > (1 << 16))
+               BUG();
+       return cpu_to_le16(len);
+}
+
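With EXT4_MAX_BLOCK_SIZE raised to 65536, a directory entry's rec_len can legitimately span a full 64 KiB block, which does not fit in the on-disk __le16 field; the two helpers above encode that one special length as EXT4_MAX_REC_LEN (65535) and reject anything larger. A tiny userspace round-trip of the same encoding (plain uint16_t instead of __le16, so endianness is ignored here):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_REC_LEN 65535u              /* mirrors EXT4_MAX_REC_LEN */

static unsigned rec_len_from_disk(uint16_t dlen)
{
        return dlen == MAX_REC_LEN ? 65536u : dlen;
}

static uint16_t rec_len_to_disk(unsigned len)
{
        assert(len <= 65536u);          /* the kernel BUG()s above this */
        return len == 65536u ? MAX_REC_LEN : (uint16_t)len;
}

int main(void)
{
        assert(rec_len_from_disk(rec_len_to_disk(65536u)) == 65536u);
        assert(rec_len_from_disk(rec_len_to_disk(12u)) == 12u);
        printf("rec_len round-trip ok\n");
        return 0;
}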
 /*
  * Hash Tree Directory indexing
  * (c) Daniel Phillips, 2001
@@ -810,7 +893,7 @@ struct ext4_iloc
 {
        struct buffer_head *bh;
        unsigned long offset;
-       unsigned long block_group;
+       ext4_group_t block_group;
 };
 
 static inline struct ext4_inode *ext4_raw_inode(struct ext4_iloc *iloc)
@@ -835,7 +918,7 @@ struct dir_private_info {
 
 /* calculate the first block number of the group */
 static inline ext4_fsblk_t
-ext4_group_first_block_no(struct super_block *sb, unsigned long group_no)
+ext4_group_first_block_no(struct super_block *sb, ext4_group_t group_no)
 {
        return group_no * (ext4_fsblk_t)EXT4_BLOCKS_PER_GROUP(sb) +
                le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
@@ -866,21 +949,24 @@ extern unsigned int ext4_block_group(struct super_block *sb,
                        ext4_fsblk_t blocknr);
 extern ext4_grpblk_t ext4_block_group_offset(struct super_block *sb,
                        ext4_fsblk_t blocknr);
-extern int ext4_bg_has_super(struct super_block *sb, int group);
-extern unsigned long ext4_bg_num_gdb(struct super_block *sb, int group);
+extern int ext4_bg_has_super(struct super_block *sb, ext4_group_t group);
+extern unsigned long ext4_bg_num_gdb(struct super_block *sb,
+                       ext4_group_t group);
 extern ext4_fsblk_t ext4_new_block (handle_t *handle, struct inode *inode,
                        ext4_fsblk_t goal, int *errp);
 extern ext4_fsblk_t ext4_new_blocks (handle_t *handle, struct inode *inode,
                        ext4_fsblk_t goal, unsigned long *count, int *errp);
+extern ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
+                       ext4_fsblk_t goal, unsigned long *count, int *errp);
 extern void ext4_free_blocks (handle_t *handle, struct inode *inode,
-                       ext4_fsblk_t block, unsigned long count);
+                       ext4_fsblk_t block, unsigned long count, int metadata);
 extern void ext4_free_blocks_sb (handle_t *handle, struct super_block *sb,
                                 ext4_fsblk_t block, unsigned long count,
                                unsigned long *pdquot_freed_blocks);
 extern ext4_fsblk_t ext4_count_free_blocks (struct super_block *);
 extern void ext4_check_blocks_bitmap (struct super_block *);
 extern struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
-                                                   unsigned int block_group,
+                                                   ext4_group_t block_group,
                                                    struct buffer_head ** bh);
 extern int ext4_should_retry_alloc(struct super_block *sb, int *retries);
 extern void ext4_init_block_alloc_info(struct inode *);
@@ -911,15 +997,32 @@ extern unsigned long ext4_count_dirs (struct super_block *);
 extern void ext4_check_inodes_bitmap (struct super_block *);
 extern unsigned long ext4_count_free (struct buffer_head *, unsigned);
 
+/* mballoc.c */
+extern long ext4_mb_stats;
+extern long ext4_mb_max_to_scan;
+extern int ext4_mb_init(struct super_block *, int);
+extern int ext4_mb_release(struct super_block *);
+extern ext4_fsblk_t ext4_mb_new_blocks(handle_t *,
+                               struct ext4_allocation_request *, int *);
+extern int ext4_mb_reserve_blocks(struct super_block *, int);
+extern void ext4_mb_discard_inode_preallocations(struct inode *);
+extern int __init init_ext4_mballoc(void);
+extern void exit_ext4_mballoc(void);
+extern void ext4_mb_free_blocks(handle_t *, struct inode *,
+               unsigned long, unsigned long, int, unsigned long *);
+
 
 /* inode.c */
 int ext4_forget(handle_t *handle, int is_metadata, struct inode *inode,
                struct buffer_head *bh, ext4_fsblk_t blocknr);
-struct buffer_head * ext4_getblk (handle_t *, struct inode *, long, int, int *);
-struct buffer_head * ext4_bread (handle_t *, struct inode *, int, int, int *);
+struct buffer_head *ext4_getblk(handle_t *, struct inode *,
+                                               ext4_lblk_t, int, int *);
+struct buffer_head *ext4_bread(handle_t *, struct inode *,
+                                               ext4_lblk_t, int, int *);
 int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
-       sector_t iblock, unsigned long maxblocks, struct buffer_head *bh_result,
-       int create, int extend_disksize);
+                               ext4_lblk_t iblock, unsigned long maxblocks,
+                               struct buffer_head *bh_result,
+                               int create, int extend_disksize);
 
 extern void ext4_read_inode (struct inode *);
 extern int  ext4_write_inode (struct inode *, int);
@@ -943,6 +1046,9 @@ extern int ext4_ioctl (struct inode *, struct file *, unsigned int,
                       unsigned long);
 extern long ext4_compat_ioctl (struct file *, unsigned int, unsigned long);
 
+/* migrate.c */
+extern int ext4_ext_migrate(struct inode *, struct file *, unsigned int,
+                      unsigned long);
 /* namei.c */
 extern int ext4_orphan_add(handle_t *, struct inode *);
 extern int ext4_orphan_del(handle_t *, struct inode *);
@@ -965,6 +1071,12 @@ extern void ext4_abort (struct super_block *, const char *, const char *, ...)
 extern void ext4_warning (struct super_block *, const char *, const char *, ...)
        __attribute__ ((format (printf, 3, 4)));
 extern void ext4_update_dynamic_rev (struct super_block *sb);
+extern int ext4_update_compat_feature(handle_t *handle, struct super_block *sb,
+                                       __u32 compat);
+extern int ext4_update_rocompat_feature(handle_t *handle,
+                                       struct super_block *sb, __u32 rocompat);
+extern int ext4_update_incompat_feature(handle_t *handle,
+                                       struct super_block *sb, __u32 incompat);
 extern ext4_fsblk_t ext4_block_bitmap(struct super_block *sb,
                                      struct ext4_group_desc *bg);
 extern ext4_fsblk_t ext4_inode_bitmap(struct super_block *sb,
@@ -1017,6 +1129,29 @@ static inline void ext4_r_blocks_count_set(struct ext4_super_block *es,
        es->s_r_blocks_count_hi = cpu_to_le32(blk >> 32);
 }
 
+static inline loff_t ext4_isize(struct ext4_inode *raw_inode)
+{
+       return ((loff_t)le32_to_cpu(raw_inode->i_size_high) << 32) |
+               le32_to_cpu(raw_inode->i_size_lo);
+}
+
+static inline void ext4_isize_set(struct ext4_inode *raw_inode, loff_t i_size)
+{
+       raw_inode->i_size_lo = cpu_to_le32(i_size);
+       raw_inode->i_size_high = cpu_to_le32(i_size >> 32);
+}
+
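ext4_isize()/ext4_isize_set() split a 64-bit file size across the on-disk i_size_lo and i_size_high fields (the latter reusing the old i_dir_acl slot). The same lo/hi packing in plain C, as a quick sanity check (uint32_t instead of the little-endian __le32 wrappers):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t isize = 0x123456789abcULL;     /* a file size above 4 GiB */
        uint32_t lo = (uint32_t)isize;
        uint32_t hi = (uint32_t)(isize >> 32);
        uint64_t back = ((uint64_t)hi << 32) | lo;

        assert(back == isize);
        printf("lo=0x%08x hi=0x%08x -> 0x%llx\n", lo, hi,
               (unsigned long long)back);
        return 0;
}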
+static inline
+struct ext4_group_info *ext4_get_group_info(struct super_block *sb,
+                                                       ext4_group_t group)
+{
+        struct ext4_group_info ***grp_info;
+        long indexv, indexh;
+        grp_info = EXT4_SB(sb)->s_group_info;
+        indexv = group >> (EXT4_DESC_PER_BLOCK_BITS(sb));
+        indexh = group & ((EXT4_DESC_PER_BLOCK(sb)) - 1);
+        return grp_info[indexv][indexh];
+}
 
 
 #define ext4_std_error(sb, errno)                              \
@@ -1048,7 +1183,7 @@ extern const struct inode_operations ext4_fast_symlink_inode_operations;
 extern int ext4_ext_tree_init(handle_t *handle, struct inode *);
 extern int ext4_ext_writepage_trans_blocks(struct inode *, int);
 extern int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
-                       ext4_fsblk_t iblock,
+                       ext4_lblk_t iblock,
                        unsigned long max_blocks, struct buffer_head *bh_result,
                        int create, int extend_disksize);
 extern void ext4_ext_truncate(struct inode *, struct page *);
@@ -1056,19 +1191,10 @@ extern void ext4_ext_init(struct super_block *);
 extern void ext4_ext_release(struct super_block *);
 extern long ext4_fallocate(struct inode *inode, int mode, loff_t offset,
                          loff_t len);
-static inline int
-ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
-                       unsigned long max_blocks, struct buffer_head *bh,
-                       int create, int extend_disksize)
-{
-       if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
-               return ext4_ext_get_blocks(handle, inode, block, max_blocks,
-                                       bh, create, extend_disksize);
-       return ext4_get_blocks_handle(handle, inode, block, max_blocks, bh,
-                                       create, extend_disksize);
-}
-
-
+extern int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode,
+                       sector_t block, unsigned long max_blocks,
+                       struct buffer_head *bh, int create,
+                       int extend_disksize);
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_EXT4_FS_H */
index d2045a26195d6ebf576b922dd050b3c1a433d60d..697da4bce6c513b003e9118d95e8bda34eeeb81a 100644 (file)
@@ -124,20 +124,6 @@ struct ext4_ext_path {
 #define EXT4_EXT_CACHE_GAP     1
 #define EXT4_EXT_CACHE_EXTENT  2
 
-/*
- * to be called by ext4_ext_walk_space()
- * negative retcode - error
- * positive retcode - signal for ext4_ext_walk_space(), see below
- * callback must return valid extent (passed or newly created)
- */
-typedef int (*ext_prepare_callback)(struct inode *, struct ext4_ext_path *,
-                                       struct ext4_ext_cache *,
-                                       void *);
-
-#define EXT_CONTINUE   0
-#define EXT_BREAK      1
-#define EXT_REPEAT     2
-
 
 #define EXT_MAX_BLOCK  0xffffffff
 
@@ -226,6 +212,8 @@ static inline int ext4_ext_get_actual_len(struct ext4_extent *ext)
                (le16_to_cpu(ext->ee_len) - EXT_INIT_MAX_LEN));
 }
 
+extern ext4_fsblk_t idx_pblock(struct ext4_extent_idx *);
+extern void ext4_ext_store_pblock(struct ext4_extent *, ext4_fsblk_t);
 extern int ext4_extent_tree_init(handle_t *, struct inode *);
 extern int ext4_ext_calc_credits_for_insert(struct inode *, struct ext4_ext_path *);
 extern int ext4_ext_try_to_merge(struct inode *inode,
@@ -233,8 +221,11 @@ extern int ext4_ext_try_to_merge(struct inode *inode,
                                 struct ext4_extent *);
 extern unsigned int ext4_ext_check_overlap(struct inode *, struct ext4_extent *, struct ext4_ext_path *);
 extern int ext4_ext_insert_extent(handle_t *, struct inode *, struct ext4_ext_path *, struct ext4_extent *);
-extern int ext4_ext_walk_space(struct inode *, unsigned long, unsigned long, ext_prepare_callback, void *);
-extern struct ext4_ext_path * ext4_ext_find_extent(struct inode *, int, struct ext4_ext_path *);
-
+extern struct ext4_ext_path *ext4_ext_find_extent(struct inode *, ext4_lblk_t,
+                                                       struct ext4_ext_path *);
+extern int ext4_ext_search_left(struct inode *, struct ext4_ext_path *,
+                                               ext4_lblk_t *, ext4_fsblk_t *);
+extern int ext4_ext_search_right(struct inode *, struct ext4_ext_path *,
+                                               ext4_lblk_t *, ext4_fsblk_t *);
 #endif /* _LINUX_EXT4_EXTENTS */
 
index 86ddfe2089f30ed1d2d14e7f5d976fde318b03f9..d5508d3cf29096db958efa7be8719d69fb155ac0 100644 (file)
@@ -27,6 +27,12 @@ typedef int ext4_grpblk_t;
 /* data type for filesystem-wide blocks number */
 typedef unsigned long long ext4_fsblk_t;
 
+/* data type for file logical block number */
+typedef __u32 ext4_lblk_t;
+
+/* data type for block group number */
+typedef unsigned long ext4_group_t;
+
 struct ext4_reserve_window {
        ext4_fsblk_t    _rsv_start;     /* First byte reserved */
        ext4_fsblk_t    _rsv_end;       /* Last byte reserved or 0 */
@@ -48,7 +54,7 @@ struct ext4_block_alloc_info {
         * most-recently-allocated block in this file.
         * We use this for detecting linearly ascending allocation requests.
         */
-       __u32 last_alloc_logical_block;
+       ext4_lblk_t last_alloc_logical_block;
        /*
         * Was i_next_alloc_goal in ext4_inode_info
         * is the *physical* companion to i_next_alloc_block.
@@ -67,7 +73,7 @@ struct ext4_block_alloc_info {
  */
 struct ext4_ext_cache {
        ext4_fsblk_t    ec_start;
-       __u32           ec_block;
+       ext4_lblk_t     ec_block;
        __u32           ec_len; /* must be 32bit to return holes */
        __u32           ec_type;
 };
@@ -79,7 +85,6 @@ struct ext4_inode_info {
        __le32  i_data[15];     /* unconverted */
        __u32   i_flags;
        ext4_fsblk_t    i_file_acl;
-       __u32   i_dir_acl;
        __u32   i_dtime;
 
        /*
@@ -89,13 +94,13 @@ struct ext4_inode_info {
         * place a file's data blocks near its inode block, and new inodes
         * near to their parent directory's inode.
         */
-       __u32   i_block_group;
+       ext4_group_t    i_block_group;
        __u32   i_state;                /* Dynamic state flags for ext4 */
 
        /* block reservation info */
        struct ext4_block_alloc_info *i_block_alloc_info;
 
-       __u32   i_dir_start_lookup;
+       ext4_lblk_t             i_dir_start_lookup;
 #ifdef CONFIG_EXT4DEV_FS_XATTR
        /*
         * Extended attributes can be read independently of the main file
@@ -134,16 +139,16 @@ struct ext4_inode_info {
        __u16 i_extra_isize;
 
        /*
-        * truncate_mutex is for serialising ext4_truncate() against
+        * i_data_sem is for serialising ext4_truncate() against
         * ext4_getblock().  In the 2.4 ext2 design, great chunks of inode's
         * data tree are chopped off during truncate. We can't do that in
         * ext4 because whenever we perform intermediate commits during
         * truncate, the inode and all the metadata blocks *must* be in a
         * consistent state which allows truncation of the orphans to restart
         * during recovery.  Hence we must fix the get_block-vs-truncate race
-        * by other means, so we have truncate_mutex.
+        * by other means, so we have i_data_sem.
         */
-       struct mutex truncate_mutex;
+       struct rw_semaphore i_data_sem;
        struct inode vfs_inode;
 
        unsigned long i_ext_generation;
@@ -153,6 +158,10 @@ struct ext4_inode_info {
         * struct timespec i_{a,c,m}time in the generic inode.
         */
        struct timespec i_crtime;
+
+       /* mballoc */
+       struct list_head i_prealloc_list;
+       spinlock_t i_prealloc_lock;
 };
 
 #endif /* _LINUX_EXT4_FS_I */
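The i_data_sem change above swaps the old mutex for a read/write semaphore, so block-mapping lookups can run concurrently while truncate still gets exclusive access. A minimal sketch of that locking pattern, using hypothetical function names that are not part of this patch:

    #include <linux/rwsem.h>

    /* writer side: truncate-style paths need exclusive access */
    static void example_truncate(struct ext4_inode_info *ei)
    {
            down_write(&ei->i_data_sem);
            /* ... chop blocks off the inode's data tree ... */
            up_write(&ei->i_data_sem);
    }

    /* reader side: get_block-style lookups only need shared access */
    static void example_get_block(struct ext4_inode_info *ei)
    {
            down_read(&ei->i_data_sem);
            /* ... walk the block map / extent tree ... */
            up_read(&ei->i_data_sem);
    }

The semaphore itself would be initialised with init_rwsem() when the in-core inode is allocated (not shown in the hunks above).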
index b40e827cd4954f7ecead22be34f2259c8f81b915..abaae2c8cccf3d7bad74769998816f0b081e7d5d 100644 (file)
@@ -35,9 +35,10 @@ struct ext4_sb_info {
        unsigned long s_itb_per_group;  /* Number of inode table blocks per group */
        unsigned long s_gdb_count;      /* Number of group descriptor blocks */
        unsigned long s_desc_per_block; /* Number of group descriptors per block */
-       unsigned long s_groups_count;   /* Number of groups in the fs */
+       ext4_group_t s_groups_count;    /* Number of groups in the fs */
        unsigned long s_overhead_last;  /* Last calculated overhead */
        unsigned long s_blocks_last;    /* Last seen block count */
+       loff_t s_bitmap_maxbytes;       /* max bytes for bitmap files */
        struct buffer_head * s_sbh;     /* Buffer containing the super block */
        struct ext4_super_block * s_es; /* Pointer to the super block in the buffer */
        struct buffer_head ** s_group_desc;
@@ -90,6 +91,58 @@ struct ext4_sb_info {
        unsigned long s_ext_blocks;
        unsigned long s_ext_extents;
 #endif
+
+       /* for buddy allocator */
+       struct ext4_group_info ***s_group_info;
+       struct inode *s_buddy_cache;
+       long s_blocks_reserved;
+       spinlock_t s_reserve_lock;
+       struct list_head s_active_transaction;
+       struct list_head s_closed_transaction;
+       struct list_head s_committed_transaction;
+       spinlock_t s_md_lock;
+       tid_t s_last_transaction;
+       unsigned short *s_mb_offsets, *s_mb_maxs;
+
+       /* tunables */
+       unsigned long s_stripe;
+       unsigned long s_mb_stream_request;
+       unsigned long s_mb_max_to_scan;
+       unsigned long s_mb_min_to_scan;
+       unsigned long s_mb_stats;
+       unsigned long s_mb_order2_reqs;
+       unsigned long s_mb_group_prealloc;
+       /* where last allocation was done - for stream allocation */
+       unsigned long s_mb_last_group;
+       unsigned long s_mb_last_start;
+
+       /* history to debug policy */
+       struct ext4_mb_history *s_mb_history;
+       int s_mb_history_cur;
+       int s_mb_history_max;
+       int s_mb_history_num;
+       struct proc_dir_entry *s_mb_proc;
+       spinlock_t s_mb_history_lock;
+       int s_mb_history_filter;
+
+       /* stats for buddy allocator */
+       spinlock_t s_mb_pa_lock;
+       atomic_t s_bal_reqs;    /* number of reqs with len > 1 */
+       atomic_t s_bal_success; /* we found long enough chunks */
+       atomic_t s_bal_allocated;       /* in blocks */
+       atomic_t s_bal_ex_scanned;      /* total extents scanned */
+       atomic_t s_bal_goals;   /* goal hits */
+       atomic_t s_bal_breaks;  /* too long searches */
+       atomic_t s_bal_2orders; /* 2^order hits */
+       spinlock_t s_bal_lock;
+       unsigned long s_mb_buddies_generated;
+       unsigned long long s_mb_generation_time;
+       atomic_t s_mb_lost_chunks;
+       atomic_t s_mb_preallocated;
+       atomic_t s_mb_discarded;
+
+       /* locality groups */
+       struct ext4_locality_group *s_locality_groups;
 };
 
 #endif /* _LINUX_EXT4_FS_SB */
index 21398a5d688d3e188325822c781ae19e8c67d6b9..a516b6716870d01f2238929a463e83e698a396ac 100644 (file)
@@ -124,6 +124,7 @@ extern int dir_notify_enable;
 #define MS_SHARED      (1<<20) /* change to shared */
 #define MS_RELATIME    (1<<21) /* Update atime relative to mtime/ctime. */
 #define MS_KERNMOUNT   (1<<22) /* this is a kern_mount call */
+#define MS_I_VERSION   (1<<23) /* Update inode I_version field */
 #define MS_ACTIVE      (1<<30)
 #define MS_NOUSER      (1<<31)
 
@@ -173,6 +174,7 @@ extern int dir_notify_enable;
                                        ((inode)->i_flags & (S_SYNC|S_DIRSYNC)))
 #define IS_MANDLOCK(inode)     __IS_FLG(inode, MS_MANDLOCK)
 #define IS_NOATIME(inode)   __IS_FLG(inode, MS_RDONLY|MS_NOATIME)
+#define IS_I_VERSION(inode)   __IS_FLG(inode, MS_I_VERSION)
 
 #define IS_NOQUOTA(inode)      ((inode)->i_flags & S_NOQUOTA)
 #define IS_APPEND(inode)       ((inode)->i_flags & S_APPEND)
@@ -599,7 +601,7 @@ struct inode {
        uid_t                   i_uid;
        gid_t                   i_gid;
        dev_t                   i_rdev;
-       unsigned long           i_version;
+       u64                     i_version;
        loff_t                  i_size;
 #ifdef __NEED_I_SIZE_ORDERED
        seqcount_t              i_size_seqcount;
@@ -1394,6 +1396,21 @@ static inline void inode_dec_link_count(struct inode *inode)
        mark_inode_dirty(inode);
 }
 
+/**
+ * inode_inc_iversion - increments i_version
+ * @inode: inode that needs to be updated
+ *
+ * Every time the inode is modified, the i_version field will be incremented.
+ * The filesystem has to be mounted with the i_version flag
+ */
+
+static inline void inode_inc_iversion(struct inode *inode)
+{
+       spin_lock(&inode->i_lock);
+       inode->i_version++;
+       spin_unlock(&inode->i_lock);
+}
+
 extern void touch_atime(struct vfsmount *mnt, struct dentry *dentry);
 static inline void file_accessed(struct file *file)
 {
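Taken together, the MS_I_VERSION mount flag, the IS_I_VERSION() test and the widened 64-bit i_version field let a filesystem keep a per-inode change counter. A hedged usage sketch (the helper name below is made up, not part of this patch):

    #include <linux/fs.h>

    /* bump the change counter only on mounts that asked for it */
    static void example_update_iversion(struct inode *inode)
    {
            if (IS_I_VERSION(inode))
                    inode_inc_iversion(inode);  /* takes i_lock internally */
    }

A filesystem would call a helper like this from its write/setattr paths so that NFS-style clients can detect modifications cheaply.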
index 5141381a75279bbb0a6b15c729e93d015302088b..2efbda01674186d166671dbf2455063970c52bfa 100644 (file)
 
 /* These are for everybody (although not all archs will actually
    discard it in modules) */
-#define __init         __attribute__ ((__section__ (".init.text"))) __cold
-#define __initdata     __attribute__ ((__section__ (".init.data")))
-#define __exitdata     __attribute__ ((__section__(".exit.data")))
-#define __exit_call    __attribute_used__ __attribute__ ((__section__ (".exitcall.exit")))
+#define __init         __section(.init.text) __cold
+#define __initdata     __section(.init.data)
+#define __exitdata     __section(.exit.data)
+#define __exit_call    __used __section(.exitcall.exit)
 
 /* modpost check for section mismatches during the kernel build.
  * A section mismatch happens when there are references from a
  * when early init has completed so all such references are potential bugs.
  * For exit sections the same issue exists.
  * The following markers are used for the cases where the reference to
- * the init/exit section (code or data) is valid and will teach modpost
- * not to issue a warning.
+ * the *init / *exit section (code or data) is valid and will teach
+ * modpost not to issue a warning.
  * The markers follow same syntax rules as __init / __initdata. */
-#define __init_refok     noinline __attribute__ ((__section__ (".text.init.refok")))
-#define __initdata_refok          __attribute__ ((__section__ (".data.init.refok")))
-#define __exit_refok     noinline __attribute__ ((__section__ (".exit.text.refok")))
+#define __ref            __section(.ref.text) noinline
+#define __refdata        __section(.ref.data)
+#define __refconst       __section(.ref.rodata)
+
+/* backward compatibility note
+ *  A few places hardcode the old section names:
+ *  .text.init.refok
+ *  .data.init.refok
+ *  .exit.text.refok
+ *  They should be converted to use the defines from this file
+ */
+
+/* compatibility defines */
+#define __init_refok     __ref
+#define __initdata_refok __refdata
+#define __exit_refok     __ref
+
 
 #ifdef MODULE
-#define __exit         __attribute__ ((__section__(".exit.text"))) __cold
+#define __exitused
 #else
-#define __exit         __attribute_used__ __attribute__ ((__section__(".exit.text"))) __cold
+#define __exitused  __used
 #endif
 
+#define __exit          __section(.exit.text) __exitused __cold
+
+/* Used for HOTPLUG */
+#define __devinit        __section(.devinit.text) __cold
+#define __devinitdata    __section(.devinit.data)
+#define __devinitconst   __section(.devinit.rodata)
+#define __devexit        __section(.devexit.text) __exitused __cold
+#define __devexitdata    __section(.devexit.data)
+#define __devexitconst   __section(.devexit.rodata)
+
+/* Used for HOTPLUG_CPU */
+#define __cpuinit        __section(.cpuinit.text) __cold
+#define __cpuinitdata    __section(.cpuinit.data)
+#define __cpuinitconst   __section(.cpuinit.rodata)
+#define __cpuexit        __section(.cpuexit.text) __exitused __cold
+#define __cpuexitdata    __section(.cpuexit.data)
+#define __cpuexitconst   __section(.cpuexit.rodata)
+
+/* Used for MEMORY_HOTPLUG */
+#define __meminit        __section(.meminit.text) __cold
+#define __meminitdata    __section(.meminit.data)
+#define __meminitconst   __section(.meminit.rodata)
+#define __memexit        __section(.memexit.text) __exitused __cold
+#define __memexitdata    __section(.memexit.data)
+#define __memexitconst   __section(.memexit.rodata)
+
 /* For assembly routines */
 #define __INIT         .section        ".init.text","ax"
-#define __INIT_REFOK   .section        ".text.init.refok","ax"
 #define __FINIT                .previous
+
 #define __INITDATA     .section        ".init.data","aw"
-#define __INITDATA_REFOK .section      ".data.init.refok","aw"
+
+#define __DEVINIT        .section      ".devinit.text", "ax"
+#define __DEVINITDATA    .section      ".devinit.data", "aw"
+
+#define __CPUINIT        .section      ".cpuinit.text", "ax"
+#define __CPUINITDATA    .section      ".cpuinit.data", "aw"
+
+#define __MEMINIT        .section      ".meminit.text", "ax"
+#define __MEMINITDATA    .section      ".meminit.data", "aw"
+
+/* silence warnings when references are OK */
+#define __REF            .section       ".ref.text", "ax"
+#define __REFDATA        .section       ".ref.data", "aw"
+#define __REFCONST       .section       ".ref.rodata", "aw"
+/* backward compatibility */
+#define __INIT_REFOK     __REF
+#define __INITDATA_REFOK __REFDATA
 
 #ifndef __ASSEMBLY__
 /*
@@ -108,7 +164,7 @@ void prepare_namespace(void);
  */
 
 #define __define_initcall(level,fn,id) \
-       static initcall_t __initcall_##fn##id __attribute_used__ \
+       static initcall_t __initcall_##fn##id __used \
        __attribute__((__section__(".initcall" level ".init"))) = fn
 
 /*
@@ -142,11 +198,11 @@ void prepare_namespace(void);
 
 #define console_initcall(fn) \
        static initcall_t __initcall_##fn \
-       __attribute_used__ __attribute__((__section__(".con_initcall.init")))=fn
+       __used __section(.con_initcall.init) = fn
 
 #define security_initcall(fn) \
        static initcall_t __initcall_##fn \
-       __attribute_used__ __attribute__((__section__(".security_initcall.init"))) = fn
+       __used __section(.security_initcall.init) = fn
 
 struct obs_kernel_param {
        const char *str;
@@ -163,8 +219,7 @@ struct obs_kernel_param {
 #define __setup_param(str, unique_id, fn, early)                       \
        static char __setup_str_##unique_id[] __initdata __aligned(1) = str; \
        static struct obs_kernel_param __setup_##unique_id      \
-               __attribute_used__                              \
-               __attribute__((__section__(".init.setup")))     \
+               __used __section(.init.setup)                   \
                __attribute__((aligned((sizeof(long)))))        \
                = { __setup_str_##unique_id, fn, early }
 
@@ -242,7 +297,7 @@ void __init parse_early_param(void);
 #endif
 
 /* Data marked not to be saved by software suspend */
-#define __nosavedata __attribute__ ((__section__ (".data.nosave")))
+#define __nosavedata __section(.data.nosave)
 
 /* This means "can be init if no module support, otherwise module load
    may call it." */
@@ -254,43 +309,6 @@ void __init parse_early_param(void);
 #define __initdata_or_module __initdata
 #endif /*CONFIG_MODULES*/
 
-#ifdef CONFIG_HOTPLUG
-#define __devinit
-#define __devinitdata
-#define __devexit
-#define __devexitdata
-#else
-#define __devinit __init
-#define __devinitdata __initdata
-#define __devexit __exit
-#define __devexitdata __exitdata
-#endif
-
-#ifdef CONFIG_HOTPLUG_CPU
-#define __cpuinit
-#define __cpuinitdata
-#define __cpuexit
-#define __cpuexitdata
-#else
-#define __cpuinit      __init
-#define __cpuinitdata __initdata
-#define __cpuexit __exit
-#define __cpuexitdata  __exitdata
-#endif
-
-#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_ACPI_HOTPLUG_MEMORY) \
-       || defined(CONFIG_ACPI_HOTPLUG_MEMORY_MODULE)
-#define __meminit
-#define __meminitdata
-#define __memexit
-#define __memexitdata
-#else
-#define __meminit      __init
-#define __meminitdata __initdata
-#define __memexit __exit
-#define __memexitdata  __exitdata
-#endif
-
 /* Functions marked as __devexit may be discarded at kernel link time, depending
    on config options.  Newer versions of binutils detect references from
    retained sections to discarded sections and flag an error.  Pointers to
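The new __ref/__refdata/__refconst markers replace the old *_refok names: they tell modpost that a reference from ordinary code into an .init/.exit section has been reviewed and is intentional. A small illustration under hypothetical names (not from this patch):

    #include <linux/init.h>

    static int __init example_setup(void)
    {
            /* lives in .init.text and is discarded after boot */
            return 0;
    }

    /* __ref: the call into init code below is deliberate, so modpost
     * will not warn about a section mismatch for this function */
    static int __ref example_probe(void)
    {
            return example_setup();
    }

Without the __ref annotation, modpost would normally report the call from example_probe() into the discarded section as a mismatch.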
index 06ef114570518e338ead34f6fce5c00b0420fc7b..2cbf6fdb17993409c40c4cc67db62a633ec858db 100644 (file)
@@ -149,6 +149,28 @@ typedef struct journal_header_s
        __be32          h_sequence;
 } journal_header_t;
 
+/*
+ * Checksum types.
+ */
+#define JBD2_CRC32_CHKSUM   1
+#define JBD2_MD5_CHKSUM     2
+#define JBD2_SHA1_CHKSUM    3
+
+#define JBD2_CRC32_CHKSUM_SIZE 4
+
+#define JBD2_CHECKSUM_BYTES (32 / sizeof(u32))
+/*
+ * Commit block header for storing transactional checksums:
+ */
+struct commit_header {
+       __be32          h_magic;
+       __be32          h_blocktype;
+       __be32          h_sequence;
+       unsigned char   h_chksum_type;
+       unsigned char   h_chksum_size;
+       unsigned char   h_padding[2];
+       __be32          h_chksum[JBD2_CHECKSUM_BYTES];
+};
 
 /*
  * The block tag: used to describe a single buffer in the journal.
@@ -242,31 +264,25 @@ typedef struct journal_superblock_s
        ((j)->j_format_version >= 2 &&                                  \
         ((j)->j_superblock->s_feature_incompat & cpu_to_be32((mask))))
 
-#define JBD2_FEATURE_INCOMPAT_REVOKE   0x00000001
-#define JBD2_FEATURE_INCOMPAT_64BIT    0x00000002
+#define JBD2_FEATURE_COMPAT_CHECKSUM   0x00000001
+
+#define JBD2_FEATURE_INCOMPAT_REVOKE           0x00000001
+#define JBD2_FEATURE_INCOMPAT_64BIT            0x00000002
+#define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT     0x00000004
 
 /* Features known to this kernel version: */
-#define JBD2_KNOWN_COMPAT_FEATURES     0
+#define JBD2_KNOWN_COMPAT_FEATURES     JBD2_FEATURE_COMPAT_CHECKSUM
 #define JBD2_KNOWN_ROCOMPAT_FEATURES   0
 #define JBD2_KNOWN_INCOMPAT_FEATURES   (JBD2_FEATURE_INCOMPAT_REVOKE | \
-                                        JBD2_FEATURE_INCOMPAT_64BIT)
+                                       JBD2_FEATURE_INCOMPAT_64BIT | \
+                                       JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)
 
 #ifdef __KERNEL__
 
 #include <linux/fs.h>
 #include <linux/sched.h>
 
-#define JBD2_ASSERTIONS
-#ifdef JBD2_ASSERTIONS
-#define J_ASSERT(assert)                                               \
-do {                                                                   \
-       if (!(assert)) {                                                \
-               printk (KERN_EMERG                                      \
-                       "Assertion failure in %s() at %s:%d: \"%s\"\n", \
-                       __FUNCTION__, __FILE__, __LINE__, # assert);    \
-               BUG();                                                  \
-       }                                                               \
-} while (0)
+#define J_ASSERT(assert)       BUG_ON(!(assert))
 
 #if defined(CONFIG_BUFFER_DEBUG)
 void buffer_assertion_failure(struct buffer_head *bh);
@@ -282,10 +298,6 @@ void buffer_assertion_failure(struct buffer_head *bh);
 #define J_ASSERT_JH(jh, expr)  J_ASSERT(expr)
 #endif
 
-#else
-#define J_ASSERT(assert)       do { } while (0)
-#endif         /* JBD2_ASSERTIONS */
-
 #if defined(JBD2_PARANOID_IOFAIL)
 #define J_EXPECT(expr, why...)         J_ASSERT(expr)
 #define J_EXPECT_BH(bh, expr, why...)  J_ASSERT_BH(bh, expr)
@@ -406,9 +418,23 @@ struct handle_s
        unsigned int    h_sync:         1;      /* sync-on-close */
        unsigned int    h_jdata:        1;      /* force data journaling */
        unsigned int    h_aborted:      1;      /* fatal error on handle */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+       struct lockdep_map      h_lockdep_map;
+#endif
 };
 
 
+/*
+ * Some stats for checkpoint phase
+ */
+struct transaction_chp_stats_s {
+       unsigned long           cs_chp_time;
+       unsigned long           cs_forced_to_close;
+       unsigned long           cs_written;
+       unsigned long           cs_dropped;
+};
+
 /* The transaction_t type is the guts of the journaling mechanism.  It
  * tracks a compound transaction through its various states:
  *
@@ -456,6 +482,8 @@ struct transaction_s
        /*
         * Transaction's current state
         * [no locking - only kjournald2 alters this]
+        * [j_list_lock] guards transition of a transaction into T_FINISHED
+        * state and subsequent call of __jbd2_journal_drop_transaction()
         * FIXME: needs barriers
         * KLUDGE: [use j_state_lock]
         */
@@ -543,6 +571,21 @@ struct transaction_s
         */
        spinlock_t              t_handle_lock;
 
+       /*
+        * Longest time some handle had to wait for running transaction
+        */
+       unsigned long           t_max_wait;
+
+       /*
+        * When transaction started
+        */
+       unsigned long           t_start;
+
+       /*
+        * Checkpointing stats [j_checkpoint_sem]
+        */
+       struct transaction_chp_stats_s t_chp_stats;
+
        /*
         * Number of outstanding updates running on this transaction
         * [t_handle_lock]
@@ -574,6 +617,39 @@ struct transaction_s
 
 };
 
+struct transaction_run_stats_s {
+       unsigned long           rs_wait;
+       unsigned long           rs_running;
+       unsigned long           rs_locked;
+       unsigned long           rs_flushing;
+       unsigned long           rs_logging;
+
+       unsigned long           rs_handle_count;
+       unsigned long           rs_blocks;
+       unsigned long           rs_blocks_logged;
+};
+
+struct transaction_stats_s {
+       int                     ts_type;
+       unsigned long           ts_tid;
+       union {
+               struct transaction_run_stats_s run;
+               struct transaction_chp_stats_s chp;
+       } u;
+};
+
+#define JBD2_STATS_RUN         1
+#define JBD2_STATS_CHECKPOINT  2
+
+static inline unsigned long
+jbd2_time_diff(unsigned long start, unsigned long end)
+{
+       if (end >= start)
+               return end - start;
+
+       return end + (MAX_JIFFY_OFFSET - start);
+}
+
 /**
  * struct journal_s - The journal_s type is the concrete type associated with
  *     journal_t.
@@ -635,6 +711,12 @@ struct transaction_s
  * @j_wbufsize: maximum number of buffer_heads allowed in j_wbuf, the
  *     number that will fit in j_blocksize
  * @j_last_sync_writer: most recent pid which did a synchronous write
+ * @j_history: Buffer storing the transaction statistics history
+ * @j_history_max: Maximum number of transactions in the statistics history
+ * @j_history_cur: Current number of transactions in the statistics history
+ * @j_history_lock: Protects the transaction statistics history
+ * @j_proc_entry: procfs entry for the jbd statistics directory
+ * @j_stats: Overall statistics
  * @j_private: An opaque pointer to fs-private information.
  */
 
@@ -826,6 +908,19 @@ struct journal_s
 
        pid_t                   j_last_sync_writer;
 
+       /*
+        * Journal statistics
+        */
+       struct transaction_stats_s *j_history;
+       int                     j_history_max;
+       int                     j_history_cur;
+       /*
+        * Protects the transaction statistics history
+        */
+       spinlock_t              j_history_lock;
+       struct proc_dir_entry   *j_proc_entry;
+       struct transaction_stats_s j_stats;
+
        /*
         * An opaque pointer to fs-private information.  ext3 puts its
         * superblock pointer here
@@ -932,6 +1027,8 @@ extern int    jbd2_journal_check_available_features
                   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int        jbd2_journal_set_features
                   (journal_t *, unsigned long, unsigned long, unsigned long);
+extern void       jbd2_journal_clear_features
+                  (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int        jbd2_journal_create     (journal_t *);
 extern int        jbd2_journal_load       (journal_t *journal);
 extern void       jbd2_journal_destroy    (journal_t *);
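The transaction statistics above are sampled in jiffies, and jbd2_time_diff() keeps the difference correct even when the counter wraps between the two samples. The same arithmetic in stand-alone form (plain userspace C, with a small stand-in value for MAX_JIFFY_OFFSET):

    #include <stdio.h>

    #define MAX_JIFFY_OFFSET 0xffffUL   /* stand-in for the kernel constant */

    static unsigned long time_diff(unsigned long start, unsigned long end)
    {
            if (end >= start)
                    return end - start;
            /* the counter wrapped between the two samples */
            return end + (MAX_JIFFY_OFFSET - start);
    }

    int main(void)
    {
            printf("%lu\n", time_diff(10, 30));         /* plain case: 20 */
            printf("%lu\n", time_diff(0xfff0, 0x10));   /* wrapped case */
            return 0;
    }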
index c97bdb7eb957cdd71e625ac9cf66862132d153da..ac481e2094fd01b79e12279a0b631092b7b6f8e4 100644 (file)
@@ -178,7 +178,7 @@ void *__symbol_get_gpl(const char *symbol);
 #define __CRC_SYMBOL(sym, sec)                                 \
        extern void *__crc_##sym __attribute__((weak));         \
        static const unsigned long __kcrctab_##sym              \
-       __attribute_used__                                      \
+       __used                                                  \
        __attribute__((section("__kcrctab" sec), unused))       \
        = (unsigned long) &__crc_##sym;
 #else
@@ -193,7 +193,7 @@ void *__symbol_get_gpl(const char *symbol);
        __attribute__((section("__ksymtab_strings")))           \
        = MODULE_SYMBOL_PREFIX #sym;                            \
        static const struct kernel_symbol __ksymtab_##sym       \
-       __attribute_used__                                      \
+       __used                                                  \
        __attribute__((section("__ksymtab" sec), unused))       \
        = { (unsigned long)&sym, __kstrtab_##sym }
 
@@ -446,11 +446,14 @@ static inline void __module_get(struct module *module)
        __mod ? __mod->name : "kernel";         \
 })
 
-/* For kallsyms to ask for address resolution.  NULL means not found. */
-const char *module_address_lookup(unsigned long addr,
-                                 unsigned long *symbolsize,
-                                 unsigned long *offset,
-                                 char **modname);
+/* For kallsyms to ask for address resolution.  namebuf should be at
+ * least KSYM_NAME_LEN long: a pointer to namebuf is returned if
+ * found, otherwise NULL. */
+char *module_address_lookup(unsigned long addr,
+                           unsigned long *symbolsize,
+                           unsigned long *offset,
+                           char **modname,
+                           char *namebuf);
 int lookup_module_symbol_name(unsigned long addr, char *symname);
 int lookup_module_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name);
 
@@ -516,10 +519,11 @@ static inline void module_put(struct module *module)
 #define module_name(mod) "kernel"
 
 /* For kallsyms to ask for address resolution.  NULL means not found. */
-static inline const char *module_address_lookup(unsigned long addr,
-                                               unsigned long *symbolsize,
-                                               unsigned long *offset,
-                                               char **modname)
+static inline char *module_address_lookup(unsigned long addr,
+                                         unsigned long *symbolsize,
+                                         unsigned long *offset,
+                                         char **modname,
+                                         char *namebuf)
 {
        return NULL;
 }
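With the new signature the caller owns the name buffer, so the resolved symbol name no longer points into module memory that may already be gone by the time it is printed. A sketch of the expected calling convention (illustrative only; example_resolve is a made-up name):

    #include <linux/kallsyms.h>
    #include <linux/module.h>

    static void example_resolve(unsigned long addr)
    {
            char namebuf[KSYM_NAME_LEN];
            unsigned long size, offset;
            char *modname;
            char *name;

            name = module_address_lookup(addr, &size, &offset,
                                         &modname, namebuf);
            if (name)
                    printk(KERN_DEBUG "%s+%#lx/%#lx [%s]\n",
                           name, offset, size, modname ? modname : "?");
    }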
index 13410b20600f4a33284da6252cc7beeb99c9f105..8126e55c5bdcbd3c68d4c6ffc4c83797e3b1ca7c 100644 (file)
@@ -18,7 +18,7 @@
 #define __module_cat(a,b) ___module_cat(a,b)
 #define __MODULE_INFO(tag, name, info)                                   \
 static const char __module_cat(name,__LINE__)[]                                  \
-  __attribute_used__                                                     \
+  __used                                                                 \
   __attribute__((section(".modinfo"),unused)) = __stringify(tag) "=" info
 #else  /* !MODULE */
 #define __MODULE_INFO(tag, name, info)
@@ -72,7 +72,7 @@ struct kparam_array
        BUILD_BUG_ON_ZERO((perm) < 0 || (perm) > 0777 || ((perm) & 2)); \
        static const char __param_str_##name[] = prefix #name;          \
        static struct kernel_param const __param_##name                 \
-       __attribute_used__                                              \
+       __used                                                          \
     __attribute__ ((unused,__section__ ("__param"),aligned(sizeof(void *)))) \
        = { __param_str_##name, perm, set, get, { arg } }
 
index 0dd93bb62fbe9f4af33c458f24db96ca79d8c0c2..ae1006322f808d90b233188108efac1f59f23359 100644 (file)
@@ -867,7 +867,7 @@ enum pci_fixup_pass {
 
 /* Anonymous variables would be nice... */
 #define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, hook) \
-       static const struct pci_fixup __pci_fixup_##name __attribute_used__ \
+       static const struct pci_fixup __pci_fixup_##name __used         \
        __attribute__((__section__(#section))) = { vendor, device, hook };
 #define DECLARE_PCI_FIXUP_EARLY(vendor, device, hook)                  \
        DECLARE_PCI_FIXUP_SECTION(.pci_fixup_early,                     \
index 288444b4cd8ae8a7e1b74614b9fd4745c635de7a..0d0bbf218f1f363d2c4732fe7cde4d9ac0909529 100644 (file)
@@ -1,3 +1,11 @@
+config ARCH
+       string
+       option env="ARCH"
+
+config KERNELVERSION
+       string
+       option env="KERNELVERSION"
+
 config DEFCONFIG_LIST
        string
        depends on !UML
index 7fe2628553172eadc0d49a3cbaf9d779b80b5a7c..a26cb2e170237d849c9f87038a6def48ee948dd6 100644 (file)
@@ -46,7 +46,8 @@ int core_kernel_text(unsigned long addr)
            addr <= (unsigned long)_etext)
                return 1;
 
-       if (addr >= (unsigned long)_sinittext &&
+       if (system_state == SYSTEM_BOOTING &&
+           addr >= (unsigned long)_sinittext &&
            addr <= (unsigned long)_einittext)
                return 1;
        return 0;
index 2fc25810509e187441297e4d2b8873e3d2db8ea3..7dadc71ce5162926529172e5a9ff71b86590a4a4 100644 (file)
@@ -233,10 +233,11 @@ static unsigned long get_symbol_pos(unsigned long addr,
 int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize,
                                unsigned long *offset)
 {
+       char namebuf[KSYM_NAME_LEN];
        if (is_ksym_addr(addr))
                return !!get_symbol_pos(addr, symbolsize, offset);
 
-       return !!module_address_lookup(addr, symbolsize, offset, NULL);
+       return !!module_address_lookup(addr, symbolsize, offset, NULL, namebuf);
 }
 
 /*
@@ -251,8 +252,6 @@ const char *kallsyms_lookup(unsigned long addr,
                            unsigned long *offset,
                            char **modname, char *namebuf)
 {
-       const char *msym;
-
        namebuf[KSYM_NAME_LEN - 1] = 0;
        namebuf[0] = 0;
 
@@ -268,10 +267,8 @@ const char *kallsyms_lookup(unsigned long addr,
        }
 
        /* see if it's in a module */
-       msym = module_address_lookup(addr, symbolsize, offset, modname);
-       if (msym)
-               return strncpy(namebuf, msym, KSYM_NAME_LEN - 1);
-
+       return module_address_lookup(addr, symbolsize, offset, modname,
+                                    namebuf);
        return NULL;
 }
 
index 1bb4c5e0d56e330b1de92b5e20fddbd446e9d314..f6a4e721fd4907339dfbbca63d5081244697973f 100644 (file)
@@ -65,6 +65,9 @@
 static DEFINE_MUTEX(module_mutex);
 static LIST_HEAD(modules);
 
+/* Waiting for a module to finish initializing? */
+static DECLARE_WAIT_QUEUE_HEAD(module_wq);
+
 static BLOCKING_NOTIFIER_HEAD(module_notify_list);
 
 int register_module_notifier(struct notifier_block * nb)
@@ -84,8 +87,11 @@ EXPORT_SYMBOL(unregister_module_notifier);
 static inline int strong_try_module_get(struct module *mod)
 {
        if (mod && mod->state == MODULE_STATE_COMING)
+               return -EBUSY;
+       if (try_module_get(mod))
                return 0;
-       return try_module_get(mod);
+       else
+               return -ENOENT;
 }
 
 static inline void add_taint_module(struct module *mod, unsigned flag)
@@ -539,11 +545,21 @@ static int already_uses(struct module *a, struct module *b)
 static int use_module(struct module *a, struct module *b)
 {
        struct module_use *use;
-       int no_warn;
+       int no_warn, err;
 
        if (b == NULL || already_uses(a, b)) return 1;
 
-       if (!strong_try_module_get(b))
+       /* If we're interrupted or time out, we fail. */
+       if (wait_event_interruptible_timeout(
+                   module_wq, (err = strong_try_module_get(b)) != -EBUSY,
+                   30 * HZ) <= 0) {
+               printk("%s: gave up waiting for init of module %s.\n",
+                      a->name, b->name);
+               return 0;
+       }
+
+       /* If strong_try_module_get() returned a different error, we fail. */
+       if (err)
                return 0;
 
        DEBUGP("Allocating new usage for %s.\n", a->name);
@@ -722,7 +738,7 @@ sys_delete_module(const char __user *name_user, unsigned int flags)
                mutex_lock(&module_mutex);
        }
        /* Store the name of the last unloaded module for diagnostic purposes */
-       sprintf(last_unloaded_module, mod->name);
+       strlcpy(last_unloaded_module, mod->name, sizeof(last_unloaded_module));
        free_module(mod);
 
  out:
@@ -816,7 +832,7 @@ static inline void module_unload_free(struct module *mod)
 
 static inline int use_module(struct module *a, struct module *b)
 {
-       return strong_try_module_get(b);
+       return strong_try_module_get(b) == 0;
 }
 
 static inline void module_unload_init(struct module *mod)
@@ -1214,6 +1230,7 @@ void module_remove_modinfo_attrs(struct module *mod)
 int mod_sysfs_init(struct module *mod)
 {
        int err;
+       struct kobject *kobj;
 
        if (!module_sysfs_initialized) {
                printk(KERN_ERR "%s: module sysfs not initialized\n",
@@ -1221,6 +1238,15 @@ int mod_sysfs_init(struct module *mod)
                err = -EINVAL;
                goto out;
        }
+
+       kobj = kset_find_obj(module_kset, mod->name);
+       if (kobj) {
+               printk(KERN_ERR "%s: module is already loaded\n", mod->name);
+               kobject_put(kobj);
+               err = -EINVAL;
+               goto out;
+       }
+
        mod->mkobj.mod = mod;
 
        memset(&mod->mkobj.kobj, 0, sizeof(mod->mkobj.kobj));
@@ -1277,6 +1303,17 @@ static void mod_kobject_remove(struct module *mod)
        kobject_put(&mod->mkobj.kobj);
 }
 
+/*
+ * link the module while the whole machine is stopped with interrupts off
+ * - this defends against kallsyms not taking locks
+ */
+static int __link_module(void *_mod)
+{
+       struct module *mod = _mod;
+       list_add(&mod->list, &modules);
+       return 0;
+}
+
 /*
  * unlink the module with the whole machine is stopped with interrupts off
  * - this defends against kallsyms not taking locks
@@ -1326,7 +1363,7 @@ void *__symbol_get(const char *symbol)
 
        preempt_disable();
        value = __find_symbol(symbol, &owner, &crc, 1);
-       if (value && !strong_try_module_get(owner))
+       if (value && strong_try_module_get(owner) != 0)
                value = 0;
        preempt_enable();
 
@@ -1889,7 +1926,7 @@ static struct module *load_module(void __user *umod,
        set_license(mod, get_modinfo(sechdrs, infoindex, "license"));
 
        if (strcmp(mod->name, "ndiswrapper") == 0)
-               add_taint(TAINT_PROPRIETARY_MODULE);
+               add_taint_module(mod, TAINT_PROPRIETARY_MODULE);
        if (strcmp(mod->name, "driverloader") == 0)
                add_taint_module(mod, TAINT_PROPRIETARY_MODULE);
 
@@ -2019,6 +2056,11 @@ static struct module *load_module(void __user *umod,
                printk(KERN_WARNING "%s: Ignoring obsolete parameters\n",
                       mod->name);
 
+       /* Now sew it into the lists so we can get lockdep and oops
+         * info during argument parsing.  No one should access us, since
+         * strong_try_module_get() will fail. */
+       stop_machine_run(__link_module, mod, NR_CPUS);
+
        /* Size of section 0 is 0, so this works well if no params */
        err = parse_args(mod->name, mod->args,
                         (struct kernel_param *)
@@ -2027,7 +2069,7 @@ static struct module *load_module(void __user *umod,
                         / sizeof(struct kernel_param),
                         NULL);
        if (err < 0)
-               goto arch_cleanup;
+               goto unlink;
 
        err = mod_sysfs_setup(mod,
                              (struct kernel_param *)
@@ -2035,7 +2077,7 @@ static struct module *load_module(void __user *umod,
                              sechdrs[setupindex].sh_size
                              / sizeof(struct kernel_param));
        if (err < 0)
-               goto arch_cleanup;
+               goto unlink;
        add_sect_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
        add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
 
@@ -2050,7 +2092,8 @@ static struct module *load_module(void __user *umod,
        /* Done! */
        return mod;
 
- arch_cleanup:
+ unlink:
+       stop_machine_run(__unlink_module, mod, NR_CPUS);
        module_arch_cleanup(mod);
  cleanup:
        kobject_del(&mod->mkobj.kobj);
@@ -2075,17 +2118,6 @@ static struct module *load_module(void __user *umod,
        goto free_hdr;
 }
 
-/*
- * link the module with the whole machine is stopped with interrupts off
- * - this defends against kallsyms not taking locks
- */
-static int __link_module(void *_mod)
-{
-       struct module *mod = _mod;
-       list_add(&mod->list, &modules);
-       return 0;
-}
-
 /* This is where the real work happens */
 asmlinkage long
 sys_init_module(void __user *umod,
@@ -2110,10 +2142,6 @@ sys_init_module(void __user *umod,
                return PTR_ERR(mod);
        }
 
-       /* Now sew it into the lists.  They won't access us, since
-           strong_try_module_get() will fail. */
-       stop_machine_run(__link_module, mod, NR_CPUS);
-
        /* Drop lock so they can recurse */
        mutex_unlock(&module_mutex);
 
@@ -2132,6 +2160,7 @@ sys_init_module(void __user *umod,
                mutex_lock(&module_mutex);
                free_module(mod);
                mutex_unlock(&module_mutex);
+               wake_up(&module_wq);
                return ret;
        }
 
@@ -2146,6 +2175,7 @@ sys_init_module(void __user *umod,
        mod->init_size = 0;
        mod->init_text_size = 0;
        mutex_unlock(&module_mutex);
+       wake_up(&module_wq);
 
        return 0;
 }
@@ -2210,14 +2240,13 @@ static const char *get_ksymbol(struct module *mod,
        return mod->strtab + mod->symtab[best].st_name;
 }
 
-/* For kallsyms to ask for address resolution.  NULL means not found.
-   We don't lock, as this is used for oops resolution and races are a
-   lesser concern. */
-/* FIXME: Risky: returns a pointer into a module w/o lock */
-const char *module_address_lookup(unsigned long addr,
-                                 unsigned long *size,
-                                 unsigned long *offset,
-                                 char **modname)
+/* For kallsyms to ask for address resolution.  NULL means not found.  Careful
+ * not to lock to avoid deadlock on oopses, simply disable preemption. */
+char *module_address_lookup(unsigned long addr,
+                           unsigned long *size,
+                           unsigned long *offset,
+                           char **modname,
+                           char *namebuf)
 {
        struct module *mod;
        const char *ret = NULL;
@@ -2232,8 +2261,13 @@ const char *module_address_lookup(unsigned long addr,
                        break;
                }
        }
+       /* Make a copy in here where it's safe */
+       if (ret) {
+               strncpy(namebuf, ret, KSYM_NAME_LEN - 1);
+               ret = namebuf;
+       }
        preempt_enable();
-       return ret;
+       return (char *)ret;
 }
 
 int lookup_module_symbol_name(unsigned long addr, char *symname)
index 67f65ee7211de8fa79311ca4141fdd96b9426765..42fe5e6126c0948e5b6859f5fcc75ebb71e2b039 100644 (file)
@@ -376,8 +376,6 @@ int param_get_string(char *buffer, struct kernel_param *kp)
 
 extern struct kernel_param __start___param[], __stop___param[];
 
-#define MAX_KBUILD_MODNAME KOBJ_NAME_LEN
-
 struct param_attribute
 {
        struct module_attribute mattr;
@@ -587,7 +585,7 @@ static void __init param_sysfs_builtin(void)
 {
        struct kernel_param *kp, *kp_begin = NULL;
        unsigned int i, name_len, count = 0;
-       char modname[MAX_KBUILD_MODNAME + 1] = "";
+       char modname[MODULE_NAME_LEN + 1] = "";
 
        for (i=0; i < __stop___param - __start___param; i++) {
                char *dot;
@@ -595,12 +593,12 @@ static void __init param_sysfs_builtin(void)
 
                kp = &__start___param[i];
                max_name_len =
-                       min_t(size_t, MAX_KBUILD_MODNAME, strlen(kp->name));
+                       min_t(size_t, MODULE_NAME_LEN, strlen(kp->name));
 
                dot = memchr(kp->name, '.', max_name_len);
                if (!dot) {
                        DEBUGP("couldn't find period in first %d characters "
-                              "of %s\n", MAX_KBUILD_MODNAME, kp->name);
+                              "of %s\n", MODULE_NAME_LEN, kp->name);
                        continue;
                }
                name_len = dot - kp->name;
index 14fb355e3caacc1311adfcaf52435e974c715dce..c4ecb2994ba3855110cdb596b6500a99b2404b24 100644 (file)
@@ -79,6 +79,38 @@ config HEADERS_CHECK
          exported to $(INSTALL_HDR_PATH) (usually 'usr/include' in
          your build tree), to make sure they're suitable.
 
+config DEBUG_SECTION_MISMATCH
+       bool "Enable full Section mismatch analysis"
+       default n
+       help
+         The section mismatch analysis checks if there are illegal
+         references from one section to another section.
+         At link time or at runtime Linux drops some sections, and
+         any use of code/data previously in these sections will
+         most likely result in an oops.
+         In the code, functions and variables are annotated with
+         __init, __devinit etc. (see the full list in include/linux/init.h),
+         which results in the code/data being placed in specific sections.
+         The section mismatch analysis is always done after a full
+         kernel build, but enabling this option will in addition
+         do the following:
+         - Add the option -fno-inline-functions-called-once to gcc.
+           When inlining a function annotated __init in a non-init
+           function, we would lose the section information and thus
+           the analysis would not catch the illegal reference.
+           This option tells gcc to inline less, but it will also
+           result in a larger kernel.
+         - Run the section mismatch analysis for each module/built-in.o.
+           When we run the section mismatch analysis on vmlinux.o, we
+           lose valuable information about where the mismatch was
+           introduced.
+           Running the analysis for each module/built-in.o file
+           will show where the mismatch happens much closer to the
+           source. The drawback is that we will report the same
+           mismatch at least twice.
+         - Enable verbose reporting from modpost to help solve
+           the section mismatches reported.
+
 config DEBUG_KERNEL
        bool "Kernel debugging"
        help
index bda0d71a25144c110df24920b6320470e9c213a8..78ccd73a884188f9e9348df1e4ad6e0e527e24c0 100644 (file)
@@ -178,4 +178,47 @@ found_middle_swap:
 
 EXPORT_SYMBOL(generic_find_next_zero_le_bit);
 
+unsigned long generic_find_next_le_bit(const unsigned long *addr, unsigned
+               long size, unsigned long offset)
+{
+       const unsigned long *p = addr + BITOP_WORD(offset);
+       unsigned long result = offset & ~(BITS_PER_LONG - 1);
+       unsigned long tmp;
+
+       if (offset >= size)
+               return size;
+       size -= result;
+       offset &= (BITS_PER_LONG - 1UL);
+       if (offset) {
+               tmp = ext2_swabp(p++);
+               tmp &= (~0UL << offset);
+               if (size < BITS_PER_LONG)
+                       goto found_first;
+               if (tmp)
+                       goto found_middle;
+               size -= BITS_PER_LONG;
+               result += BITS_PER_LONG;
+       }
+
+       while (size & ~(BITS_PER_LONG - 1)) {
+               tmp = *(p++);
+               if (tmp)
+                       goto found_middle_swap;
+               result += BITS_PER_LONG;
+               size -= BITS_PER_LONG;
+       }
+       if (!size)
+               return result;
+       tmp = ext2_swabp(p);
+found_first:
+       tmp &= (~0UL >> (BITS_PER_LONG - size));
+       if (tmp == 0UL)         /* Are any bits set? */
+               return result + size; /* Nope. */
+found_middle:
+       return result + __ffs(tmp);
+
+found_middle_swap:
+       return result + __ffs(ext2_swab(tmp));
+}
+EXPORT_SYMBOL(generic_find_next_le_bit);
 #endif /* __BIG_ENDIAN */
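generic_find_next_le_bit() is the set-bit counterpart of the existing generic_find_next_zero_le_bit(): it scans a little-endian (on-disk style) bitmap on a big-endian host. A typical iteration over such a bitmap could look like this sketch (the function name is hypothetical):

    #include <linux/bitops.h>

    /* visit every set bit in a little-endian bitmap of nbits bits */
    static void example_walk_le_bitmap(const unsigned long *map,
                                       unsigned long nbits)
    {
            unsigned long bit;

            for (bit = generic_find_next_le_bit(map, nbits, 0);
                 bit < nbits;
                 bit = generic_find_next_le_bit(map, nbits, bit + 1)) {
                    /* ... handle the set bit ... */
            }
    }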
index de9836eee8bbba866312ce4e5c22f05caef98486..67fb4530a6ff88afb7e3a86945dc2837c87c72f5 100644 (file)
@@ -83,10 +83,12 @@ ifneq ($(strip $(obj-y) $(obj-m) $(obj-n) $(obj-) $(lib-target)),)
 builtin-target := $(obj)/built-in.o
 endif
 
+modorder-target := $(obj)/modules.order
+
 # We keep a list of all modules in $(MODVERDIR)
 
 __build: $(if $(KBUILD_BUILTIN),$(builtin-target) $(lib-target) $(extra-y)) \
-        $(if $(KBUILD_MODULES),$(obj-m)) \
+        $(if $(KBUILD_MODULES),$(obj-m) $(modorder-target)) \
         $(subdir-ym) $(always)
        @:
 
@@ -101,6 +103,10 @@ ifneq ($(KBUILD_CHECKSRC),0)
   endif
 endif
 
+# Do section mismatch analysis for each module/built-in.o
+ifdef CONFIG_DEBUG_SECTION_MISMATCH
+  cmd_secanalysis = ; scripts/mod/modpost $@
+endif
 
 # Compile C sources (.c)
 # ---------------------------------------------------------------------------
@@ -266,7 +272,8 @@ ifdef builtin-target
 quiet_cmd_link_o_target = LD      $@
 # If the list of objects to link is empty, just create an empty built-in.o
 cmd_link_o_target = $(if $(strip $(obj-y)),\
-                     $(LD) $(ld_flags) -r -o $@ $(filter $(obj-y), $^),\
+                     $(LD) $(ld_flags) -r -o $@ $(filter $(obj-y), $^) \
+                     $(cmd_secanalysis),\
                      rm -f $@; $(AR) rcs $@)
 
 $(builtin-target): $(obj-y) FORCE
@@ -275,6 +282,19 @@ $(builtin-target): $(obj-y) FORCE
 targets += $(builtin-target)
 endif # builtin-target
 
+#
+# Rule to create modules.order file
+#
+# Create commands to either record .ko file or cat modules.order from
+# a subdirectory
+modorder-cmds =                                                \
+       $(foreach m, $(modorder),                       \
+               $(if $(filter %/modules.order, $m),     \
+                       cat $m;, echo kernel/$m;))
+
+$(modorder-target): $(subdir-ym) FORCE
+       $(Q)(cat /dev/null; $(modorder-cmds)) > $@
+
 #
 # Rule to compile a set of .o files into one .a file
 #
@@ -301,7 +321,7 @@ $($(subst $(obj)/,,$(@:.o=-objs)))    \
 $($(subst $(obj)/,,$(@:.o=-y)))), $^)
  
 quiet_cmd_link_multi-y = LD      $@
-cmd_link_multi-y = $(LD) $(ld_flags) -r -o $@ $(link_multi_deps)
+cmd_link_multi-y = $(LD) $(ld_flags) -r -o $@ $(link_multi_deps) $(cmd_secanalysis)
 
 quiet_cmd_link_multi-m = LD [M]  $@
 cmd_link_multi-m = $(cmd_link_multi-y)
index 3c5e88bfecf1bc3d00e2de16fc4c31720406a8fe..8e440233c27dacf8e3083417047d1f8024f25131 100644 (file)
@@ -25,6 +25,11 @@ lib-y := $(filter-out $(obj-y), $(sort $(lib-y) $(lib-m)))
 # o if we encounter foo/ in $(obj-m), remove it from $(obj-m) 
 #   and add the directory to the list of dirs to descend into: $(subdir-m)
 
+# Determine modorder.
+# Unfortunately, we don't have information about ordering between -y
+# and -m subdirs.  Just put -y's first.
+modorder       := $(patsubst %/,%/modules.order, $(filter %/, $(obj-y)) $(obj-m:.o=.ko))
+
 __subdir-y     := $(patsubst %/,%,$(filter %/, $(obj-y)))
 subdir-y       += $(__subdir-y)
 __subdir-m     := $(patsubst %/,%,$(filter %/, $(obj-m)))
@@ -64,6 +69,7 @@ real-objs-m := $(foreach m, $(obj-m), $(if $(strip $($(m:.o=-objs)) $($(m:.o=-y)
 extra-y                := $(addprefix $(obj)/,$(extra-y))
 always         := $(addprefix $(obj)/,$(always))
 targets                := $(addprefix $(obj)/,$(targets))
+modorder       := $(addprefix $(obj)/,$(modorder))
 obj-y          := $(addprefix $(obj)/,$(obj-y))
 obj-m          := $(addprefix $(obj)/,$(obj-m))
 lib-y          := $(addprefix $(obj)/,$(lib-y))
index f0ff248f5e6f55e07d71b66104cb794e7310d959..efa5d940e6324caa0aa120343043b89475502e74 100644 (file)
@@ -21,7 +21,7 @@ quiet_cmd_modules_install = INSTALL $@
 
 # Modules built outside the kernel source tree go into extra by default
 INSTALL_MOD_DIR ?= extra
-ext-mod-dir = $(INSTALL_MOD_DIR)$(subst $(KBUILD_EXTMOD),,$(@D))
+ext-mod-dir = $(INSTALL_MOD_DIR)$(subst $(patsubst %/,%,$(KBUILD_EXTMOD)),,$(@D))
 
 modinst_dir = $(if $(KBUILD_EXTMOD),$(ext-mod-dir),kernel/$(@D))
 
index d988f5d21e3df12b50e31b9eac86bc21e82b867c..65e707e1ffc3a564e1ed1924cab806ff88e09286 100644 (file)
@@ -62,6 +62,7 @@ modpost = scripts/mod/modpost                    \
  $(if $(KBUILD_EXTMOD),-i,-o) $(kernelsymfile)   \
  $(if $(KBUILD_EXTMOD),-I $(modulesymfile))      \
  $(if $(KBUILD_EXTMOD),-o $(modulesymfile))      \
+ $(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S)      \
  $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w)
 
 quiet_cmd_modpost = MODPOST $(words $(filter-out vmlinux FORCE, $^)) modules
index 0e4bd5459df4e08e90770516e942ba9236da3a22..35bdc68b6e66604fc31f184fff4ad1c0e6c5398e 100644 (file)
@@ -30,6 +30,7 @@
  *             !Ifilename
  *             !Dfilename
  *             !Ffilename
+ *             !Pfilename
  *
  */
 
@@ -57,6 +58,7 @@ FILEONLY *symbolsonly;
 typedef void FILELINE(char * file, char * line);
 FILELINE * singlefunctions;
 FILELINE * entity_system;
+FILELINE * docsection;
 
 #define MAXLINESZ     2048
 #define MAXFILES      250
@@ -65,6 +67,7 @@ FILELINE * entity_system;
 #define DOCBOOK       "-docbook"
 #define FUNCTION      "-function"
 #define NOFUNCTION    "-nofunction"
+#define NODOCSECTIONS "-no-doc-sections"
 
 char *srctree;
 
@@ -231,13 +234,14 @@ void docfunctions(char * filename, char * type)
 
        for (i=0; i <= symfilecnt; i++)
                symcnt += symfilelist[i].symbolcnt;
-       vec = malloc((2 + 2 * symcnt + 2) * sizeof(char*));
+       vec = malloc((2 + 2 * symcnt + 3) * sizeof(char *));
        if (vec == NULL) {
                perror("docproc: ");
                exit(1);
        }
        vec[idx++] = KERNELDOC;
        vec[idx++] = DOCBOOK;
+       vec[idx++] = NODOCSECTIONS;
        for (i=0; i < symfilecnt; i++) {
                struct symfile * sym = &symfilelist[i];
                for (j=0; j < sym->symbolcnt; j++) {
@@ -286,13 +290,37 @@ void singfunc(char * filename, char * line)
        exec_kernel_doc(vec);
 }
 
+/*
+ * Insert specific documentation section from a file.
+ * Call kernel-doc with the following parameters:
+ * kernel-doc -docbook -function "doc section" filename
+ */
+void docsect(char *filename, char *line)
+{
+       char *vec[6]; /* kerneldoc -docbook -function "section" file NULL */
+       char *s;
+
+       for (s = line; *s; s++)
+               if (*s == '\n')
+                       *s = '\0';
+
+       vec[0] = KERNELDOC;
+       vec[1] = DOCBOOK;
+       vec[2] = FUNCTION;
+       vec[3] = line;
+       vec[4] = filename;
+       vec[5] = NULL;
+       exec_kernel_doc(vec);
+}
+
 /*
  * Parse file, calling action specific functions for:
  * 1) Lines containing !E
  * 2) Lines containing !I
  * 3) Lines containing !D
  * 4) Lines containing !F
- * 5) Default lines - lines not matching the above
+ * 5) Lines containing !P
+ * 6) Default lines - lines not matching the above
  */
 void parse_file(FILE *infile)
 {
@@ -326,6 +354,15 @@ void parse_file(FILE *infile)
                                                s++;
                                        singlefunctions(line +2, s);
                                        break;
+                               case 'P':
+                                       /* filename */
+                                       while (*s && !isspace(*s)) s++;
+                                       *s++ = '\0';
+                                       /* DOC: section name */
+                                       while (isspace(*s))
+                                               s++;
+                                       docsection(line + 2, s);
+                                       break;
                                default:
                                        defaultline(line);
                        }
@@ -372,6 +409,7 @@ int main(int argc, char *argv[])
                externalfunctions = find_export_symbols;
                symbolsonly       = find_export_symbols;
                singlefunctions   = noaction2;
+               docsection        = noaction2;
                parse_file(infile);
 
                /* Rewind to start from beginning of file again */
@@ -381,6 +419,7 @@ int main(int argc, char *argv[])
                externalfunctions = extfunc;
                symbolsonly       = printline;
                singlefunctions   = singfunc;
+               docsection        = docsect;
 
                parse_file(infile);
        }
@@ -394,6 +433,7 @@ int main(int argc, char *argv[])
                externalfunctions = adddep;
                symbolsonly       = adddep;
                singlefunctions   = adddep2;
+               docsection        = adddep2;
                parse_file(infile);
                printf("\n");
        }
index 1e1a8f620c473708d3477048105c24e05de83428..235d3938529db23f34c53996ec65bca70172d34a 100644 (file)
@@ -6,7 +6,19 @@
 # e.g., to decode an i386 oops on an x86_64 system, use:
 # AFLAGS=--32 decodecode < 386.oops
 
-T=`mktemp`
+cleanup() {
+       rm -f $T $T.s $T.o
+       exit 1
+}
+
+die() {
+       echo "$@"
+       exit 1
+}
+
+trap cleanup EXIT
+
+T=`mktemp` || die "cannot create temp file"
 code=
 
 while read i ; do
@@ -20,6 +32,7 @@ esac
 done
 
 if [ -z "$code" ]; then
+       rm $T
        exit
 fi
 
@@ -48,4 +61,4 @@ echo -n "     .byte 0x" > $T.s
 echo $code >> $T.s
 as $AFLAGS -o $T.o $T.s
 objdump -S $T.o
-rm $T.o $T.s
+rm $T $T.s $T.o
index a5121a6d8949eb6bce96cca1904801f75a8bbe57..cc767b388baf7b30ef04b61650911069818d7ba0 100644 (file)
@@ -9,7 +9,10 @@
 # gcc-2.95.3, `030301' for gcc-3.3.1, etc.
 #
 
-if [[ $1 = "-p" ]] ; then with_patchlevel=1; shift; fi
+if [ "$1" = "-p" ] ; then
+       with_patchlevel=1;
+       shift;
+fi
 
 compiler="$*"
 
index 511023b430a8fc8cb688de2614e73d3dba7c47f4..dca5e0dd09bf1fdc57b525f23befeba074e41c10 100644 (file)
@@ -440,17 +440,21 @@ void error_with_pos(const char *fmt, ...)
 
 static void genksyms_usage(void)
 {
-       fputs("Usage:\n" "genksyms [-dDwqhV] > /path/to/.tmp_obj.ver\n" "\n"
+       fputs("Usage:\n" "genksyms [-adDTwqhV] > /path/to/.tmp_obj.ver\n" "\n"
 #ifdef __GNU_LIBRARY__
+             "  -a, --arch            Select architecture\n"
              "  -d, --debug           Increment the debug level (repeatable)\n"
              "  -D, --dump            Dump expanded symbol defs (for debugging only)\n"
+             "  -T, --dump-types file Dump expanded types into file (for debugging only)\n"
              "  -w, --warnings        Enable warnings\n"
              "  -q, --quiet           Disable warnings (default)\n"
              "  -h, --help            Print this message\n"
              "  -V, --version         Print the release version\n"
 #else                          /* __GNU_LIBRARY__ */
+             "  -a                    Select architecture\n"
              "  -d                    Increment the debug level (repeatable)\n"
              "  -D                    Dump expanded symbol defs (for debugging only)\n"
+             "  -T file               Dump expanded types into file (for debugging only)\n"
              "  -w                    Enable warnings\n"
              "  -q                    Disable warnings (default)\n"
              "  -h                    Print this message\n"
@@ -477,10 +481,10 @@ int main(int argc, char **argv)
                {0, 0, 0, 0}
        };
 
-       while ((o = getopt_long(argc, argv, "a:dwqVDT:k:p:",
+       while ((o = getopt_long(argc, argv, "a:dwqVDT:h",
                                &long_opts[0], NULL)) != EOF)
 #else                          /* __GNU_LIBRARY__ */
-       while ((o = getopt(argc, argv, "a:dwqVDT:k:p:")) != EOF)
+       while ((o = getopt(argc, argv, "a:dwqVDT:h")) != EOF)
 #endif                         /* __GNU_LIBRARY__ */
                switch (o) {
                case 'a':
index 1ad6f7fc490a1233b46bcb20fd8ea7e0562f02ba..32e8c5a227c3a28bab34514da98e1491d057bef4 100644 (file)
@@ -24,22 +24,25 @@ oldconfig: $(obj)/conf
 silentoldconfig: $(obj)/conf
        $< -s $(Kconfig)
 
-# Create new linux.po file
+# Create new linux.pot file
 # Adjust charset to UTF-8 in .po file to accept UTF-8 in Kconfig files
 # The symlink is used to repair a deficiency in arch/um
-update-po-config: $(obj)/kxgettext
-       xgettext --default-domain=linux                  \
+update-po-config: $(obj)/kxgettext $(obj)/gconf.glade.h
+       $(Q)echo "  GEN config"
+       $(Q)xgettext --default-domain=linux              \
            --add-comments --keyword=_ --keyword=N_      \
            --from-code=UTF-8                            \
            --files-from=scripts/kconfig/POTFILES.in     \
            --output $(obj)/config.pot
        $(Q)sed -i s/CHARSET/UTF-8/ $(obj)/config.pot
        $(Q)ln -fs Kconfig.i386 arch/um/Kconfig.arch
-       (for i in `ls arch/`;                            \
-       do                                               \
-           $(obj)/kxgettext arch/$$i/Kconfig;           \
-       done ) >> $(obj)/config.pot
-       msguniq --sort-by-file --to-code=UTF-8 $(obj)/config.pot \
+       $(Q)(for i in `ls arch/`;                        \
+           do                                           \
+               echo "  GEN $$i";                        \
+               $(obj)/kxgettext arch/$$i/Kconfig        \
+                    >> $(obj)/config.pot;               \
+           done )
+       $(Q)msguniq --sort-by-file --to-code=UTF-8 $(obj)/config.pot \
            --output $(obj)/linux.pot
        $(Q)rm -f arch/um/Kconfig.arch
        $(Q)rm -f $(obj)/config.pot
@@ -93,12 +96,6 @@ HOST_LOADLIBES   = $(shell $(CONFIG_SHELL) $(check-lxdialog) -ldflags $(HOSTCC))
 
 HOST_EXTRACFLAGS += -DLOCALE
 
-PHONY += $(obj)/dochecklxdialog
-$(obj)/dochecklxdialog:
-       $(Q)$(CONFIG_SHELL) $(check-lxdialog) -check $(HOSTCC) $(HOST_LOADLIBES)
-
-always := dochecklxdialog
-
 
 # ===========================================================================
 # Shared Makefile for the various kconfig executables:
@@ -142,8 +139,17 @@ gconf-objs := gconf.o kconfig_load.o zconf.tab.o
 endif
 
 clean-files    := lkc_defs.h qconf.moc .tmp_qtcheck \
-                  .tmp_gtkcheck zconf.tab.c lex.zconf.c zconf.hash.c
+                  .tmp_gtkcheck zconf.tab.c lex.zconf.c zconf.hash.c gconf.glade.h
 clean-files     += mconf qconf gconf
+clean-files     += config.pot linux.pot
+
+# Check that we have the required ncurses stuff installed for lxdialog (menuconfig)
+PHONY += $(obj)/dochecklxdialog
+$(addprefix $(obj)/,$(lxdialog)): $(obj)/dochecklxdialog
+$(obj)/dochecklxdialog:
+       $(Q)$(CONFIG_SHELL) $(check-lxdialog) -check $(HOSTCC) $(HOST_EXTRACFLAGS) $(HOST_LOADLIBES)
+
+always := dochecklxdialog
 
 # Add environment specific flags
 HOST_EXTRACFLAGS += $(shell $(CONFIG_SHELL) $(srctree)/$(src)/check.sh $(HOSTCC) $(HOSTCFLAGS))
@@ -248,6 +254,9 @@ $(obj)/%.moc: $(src)/%.h
 $(obj)/lkc_defs.h: $(src)/lkc_proto.h
        sed < $< > $@ 's/P(\([^,]*\),.*/#define \1 (\*\1_p)/'
 
+# Extract gconf menu items for I18N support
+$(obj)/gconf.glade.h: $(obj)/gconf.glade
+       intltool-extract --type=gettext/glade $(obj)/gconf.glade
 
 ###
 # The following requires flex/bison/gperf
index cc94e46a79e8cf37cc5c074e74c11bf23989ea8f..9674573969902fab7bc90a3875f653d5bae6ba20 100644 (file)
@@ -1,5 +1,12 @@
+scripts/kconfig/lxdialog/checklist.c
+scripts/kconfig/lxdialog/inputbox.c
+scripts/kconfig/lxdialog/menubox.c
+scripts/kconfig/lxdialog/textbox.c
+scripts/kconfig/lxdialog/util.c
+scripts/kconfig/lxdialog/yesno.c
 scripts/kconfig/mconf.c
 scripts/kconfig/conf.c
 scripts/kconfig/confdata.c
 scripts/kconfig/gconf.c
+scripts/kconfig/gconf.glade.h
 scripts/kconfig/qconf.cc
index 8d6f17490c5efe753e34b28e38158997f6226567..fda63136ae6809512c6d84ab5c11d1497eee3bc5 100644 (file)
@@ -3,12 +3,13 @@
  * Released under the terms of the GNU GPL v2.0.
  */
 
+#include <locale.h>
 #include <ctype.h>
-#include <stdlib.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
-#include <unistd.h>
 #include <time.h>
+#include <unistd.h>
 #include <sys/stat.h>
 
 #define LKC_DIRECT_LINK
@@ -40,7 +41,7 @@ static char nohelp_text[] = N_("Sorry, no help available for this option yet.\n"
 static const char *get_help(struct menu *menu)
 {
        if (menu_has_help(menu))
-               return menu_get_help(menu);
+               return _(menu_get_help(menu));
        else
                return nohelp_text;
 }
@@ -78,7 +79,7 @@ static int conf_askvalue(struct symbol *sym, const char *def)
        tristate val;
 
        if (!sym_has_value(sym))
-               printf("(NEW) ");
+               printf(_("(NEW) "));
 
        line[0] = '\n';
        line[1] = 0;
@@ -160,7 +161,7 @@ static int conf_askvalue(struct symbol *sym, const char *def)
                }
        case set_random:
                do {
-                       val = (tristate)(random() % 3);
+                       val = (tristate)(rand() % 3);
                } while (!sym_tristate_within_range(sym, val));
                switch (val) {
                case no: line[0] = 'n'; break;
@@ -183,7 +184,7 @@ int conf_string(struct menu *menu)
        const char *def;
 
        while (1) {
-               printf("%*s%s ", indent - 1, "", menu->prompt->text);
+               printf("%*s%s ", indent - 1, "", _(menu->prompt->text));
                printf("(%s) ", sym->name);
                def = sym_get_string_value(sym);
                if (sym_get_string_value(sym))
@@ -216,7 +217,7 @@ static int conf_sym(struct menu *menu)
        tristate oldval, newval;
 
        while (1) {
-               printf("%*s%s ", indent - 1, "", menu->prompt->text);
+               printf("%*s%s ", indent - 1, "", _(menu->prompt->text));
                if (sym->name)
                        printf("(%s) ", sym->name);
                type = sym_get_type(sym);
@@ -306,7 +307,7 @@ static int conf_choice(struct menu *menu)
                case no:
                        return 1;
                case mod:
-                       printf("%*s%s\n", indent - 1, "", menu_get_prompt(menu));
+                       printf("%*s%s\n", indent - 1, "", _(menu_get_prompt(menu)));
                        return 0;
                case yes:
                        break;
@@ -316,7 +317,7 @@ static int conf_choice(struct menu *menu)
        while (1) {
                int cnt, def;
 
-               printf("%*s%s\n", indent - 1, "", menu_get_prompt(menu));
+               printf("%*s%s\n", indent - 1, "", _(menu_get_prompt(menu)));
                def_sym = sym_get_choice_value(sym);
                cnt = def = 0;
                line[0] = 0;
@@ -324,7 +325,7 @@ static int conf_choice(struct menu *menu)
                        if (!menu_is_visible(child))
                                continue;
                        if (!child->sym) {
-                               printf("%*c %s\n", indent, '*', menu_get_prompt(child));
+                               printf("%*c %s\n", indent, '*', _(menu_get_prompt(child)));
                                continue;
                        }
                        cnt++;
@@ -333,14 +334,14 @@ static int conf_choice(struct menu *menu)
                                printf("%*c", indent, '>');
                        } else
                                printf("%*c", indent, ' ');
-                       printf(" %d. %s", cnt, menu_get_prompt(child));
+                       printf(" %d. %s", cnt, _(menu_get_prompt(child)));
                        if (child->sym->name)
                                printf(" (%s)", child->sym->name);
                        if (!sym_has_value(child->sym))
-                               printf(" (NEW)");
+                               printf(_(" (NEW)"));
                        printf("\n");
                }
-               printf("%*schoice", indent - 1, "");
+               printf(_("%*schoice"), indent - 1, "");
                if (cnt == 1) {
                        printf("[1]: 1\n");
                        goto conf_childs;
@@ -375,7 +376,7 @@ static int conf_choice(struct menu *menu)
                        break;
                case set_random:
                        if (is_new)
-                               def = (random() % cnt) + 1;
+                               def = (rand() % cnt) + 1;
                case set_default:
                case set_yes:
                case set_mod:
@@ -399,9 +400,9 @@ static int conf_choice(struct menu *menu)
                        continue;
                }
                sym_set_choice_value(sym, child->sym);
-               if (child->list) {
+               for (child = child->list; child; child = child->next) {
                        indent += 2;
-                       conf(child->list);
+                       conf(child);
                        indent -= 2;
                }
                return 1;
@@ -433,7 +434,7 @@ static void conf(struct menu *menu)
                        if (prompt)
                                printf("%*c\n%*c %s\n%*c\n",
                                        indent, '*',
-                                       indent, '*', prompt,
+                                       indent, '*', _(prompt),
                                        indent, '*');
                default:
                        ;
@@ -495,12 +496,16 @@ static void check_conf(struct menu *menu)
 
 int main(int ac, char **av)
 {
-       int i = 1;
+       int opt;
        const char *name;
        struct stat tmpstat;
 
-       if (ac > i && av[i][0] == '-') {
-               switch (av[i++][1]) {
+       setlocale(LC_ALL, "");
+       bindtextdomain(PACKAGE, LOCALEDIR);
+       textdomain(PACKAGE);
+
+       while ((opt = getopt(ac, av, "osdD:nmyrh")) != -1) {
+               switch (opt) {
                case 'o':
                        input_mode = ask_new;
                        break;
@@ -513,12 +518,7 @@ int main(int ac, char **av)
                        break;
                case 'D':
                        input_mode = set_default;
-                       defconfig_file = av[i++];
-                       if (!defconfig_file) {
-                               printf(_("%s: No default config file specified\n"),
-                                       av[0]);
-                               exit(1);
-                       }
+                       defconfig_file = optarg;
                        break;
                case 'n':
                        input_mode = set_no;
@@ -531,19 +531,22 @@ int main(int ac, char **av)
                        break;
                case 'r':
                        input_mode = set_random;
-                       srandom(time(NULL));
+                       srand(time(NULL));
                        break;
                case 'h':
-               case '?':
-                       fprintf(stderr, "See README for usage info\n");
+                       printf(_("See README for usage info\n"));
                        exit(0);
+                       break;
+               default:
+                       fprintf(stderr, _("See README for usage info\n"));
+                       exit(1);
                }
        }
-       name = av[i];
-       if (!name) {
+       if (ac == optind) {
                printf(_("%s: Kconfig file missing\n"), av[0]);
                exit(1);
        }
+       name = av[optind];
        conf_parse(name);
        //zconfdump(stdout);
        switch (input_mode) {
@@ -551,9 +554,9 @@ int main(int ac, char **av)
                if (!defconfig_file)
                        defconfig_file = conf_get_default_confname();
                if (conf_read(defconfig_file)) {
-                       printf("***\n"
+                       printf(_("***\n"
                                "*** Can't find default configuration \"%s\"!\n"
-                               "***\n", defconfig_file);
+                               "***\n"), defconfig_file);
                        exit(1);
                }
                break;
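
conf's main() now parses its flags with getopt() instead of hand-rolled av[i] indexing: the -D argument arrives in optarg, the Kconfig file name is whatever is left at av[optind], and set_random switches from random()/srandom() to the ISO C rand()/srand(). A stripped-down, standalone sketch of that pattern, keeping only the option letters from the hunk and simplifying everything else:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    int main(int ac, char **av)
    {
            const char *defconfig_file = NULL;
            const char *name;
            int opt;

            /* "D:" means -D carries a value; getopt() leaves it in optarg. */
            while ((opt = getopt(ac, av, "osdD:nmyrh")) != -1) {
                    switch (opt) {
                    case 'D':
                            defconfig_file = optarg;
                            break;
                    case 'r':
                            /* rand()/srand() are plain ISO C, unlike random()/srandom() */
                            srand(time(NULL));
                            break;
                    case 'h':
                            printf("See README for usage info\n");
                            return 0;
                    default:
                            /* 'o', 's', 'd', 'n', 'm', 'y' would select an input mode here */
                            break;
                    }
            }

            /* Non-option arguments start at optind. */
            if (ac == optind) {
                    fprintf(stderr, "%s: Kconfig file missing\n", av[0]);
                    return 1;
            }
            name = av[optind];

            printf("Kconfig file: %s\n", name);
            if (defconfig_file)
                    printf("defconfig: %s\n", defconfig_file);
            return 0;
    }
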
index e0f402f3b75d94d765875a6299f1f154e23a1c32..ee5fe943d58db0eda374b91a8181e5991d8cf23b 100644 (file)
@@ -232,8 +232,7 @@ load:
                                        sym->type = S_BOOLEAN;
                        }
                        if (sym->flags & def_flags) {
-                               conf_warning("trying to reassign symbol %s", sym->name);
-                               break;
+                               conf_warning("override: reassigning to symbol %s", sym->name);
                        }
                        switch (sym->type) {
                        case S_BOOLEAN:
@@ -272,8 +271,7 @@ load:
                                        sym->type = S_OTHER;
                        }
                        if (sym->flags & def_flags) {
-                               conf_warning("trying to reassign symbol %s", sym->name);
-                               break;
+                               conf_warning("override: reassigning to symbol %s", sym->name);
                        }
                        if (conf_set_sym_val(sym, def, def_flags, p))
                                continue;
@@ -297,14 +295,12 @@ load:
                                }
                                break;
                        case yes:
-                               if (cs->def[def].tri != no) {
-                                       conf_warning("%s creates inconsistent choice state", sym->name);
-                                       cs->flags &= ~def_flags;
-                               } else
-                                       cs->def[def].val = sym;
+                               if (cs->def[def].tri != no)
+                                       conf_warning("override: %s changes choice state", sym->name);
+                               cs->def[def].val = sym;
                                break;
                        }
-                       cs->def[def].tri = E_OR(cs->def[def].tri, sym->def[def].tri);
+                       cs->def[def].tri = EXPR_OR(cs->def[def].tri, sym->def[def].tri);
                }
        }
        fclose(in);
@@ -316,7 +312,7 @@ load:
 
 int conf_read(const char *name)
 {
-       struct symbol *sym;
+       struct symbol *sym, *choice_sym;
        struct property *prop;
        struct expr *e;
        int i, flags;
@@ -357,9 +353,9 @@ int conf_read(const char *name)
                 */
                prop = sym_get_choice_prop(sym);
                flags = sym->flags;
-               for (e = prop->expr; e; e = e->left.expr)
-                       if (e->right.sym->visible != no)
-                               flags &= e->right.sym->flags;
+               expr_list_for_each_sym(prop->expr, e, choice_sym)
+                       if (choice_sym->visible != no)
+                               flags &= choice_sym->flags;
                sym->flags &= flags | ~SYMBOL_DEF_USER;
        }
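
Besides turning the reassignment errors into "override:" warnings, the loader now merges choice states with EXPR_OR(), the renamed tristate helper defined in the expr.h hunk further down this page. The macros encode three-valued logic as min/max over no=0 < mod=1 < yes=2. A tiny standalone check of that, with the macro bodies copied from the expr.h hunk and the rest purely illustrative:

    #include <assert.h>
    #include <stdio.h>

    typedef enum tristate { no, mod, yes } tristate;

    /* Copied from the expr.h hunk: OR is max, AND is min, NOT mirrors around mod. */
    #define EXPR_OR(dep1, dep2)     (((dep1)>(dep2))?(dep1):(dep2))
    #define EXPR_AND(dep1, dep2)    (((dep1)<(dep2))?(dep1):(dep2))
    #define EXPR_NOT(dep)           (2-(dep))

    int main(void)
    {
            assert(EXPR_OR(no, mod) == mod);    /* anything ORed with mod is at least mod */
            assert(EXPR_OR(mod, yes) == yes);
            assert(EXPR_AND(yes, mod) == mod);  /* a module dependency caps the result at mod */
            assert(EXPR_NOT(yes) == no);
            assert(EXPR_NOT(mod) == mod);       /* "not module" is still module */

            /* The per-entry merge the loader above performs:
             * cs->def[def].tri = EXPR_OR(cs->def[def].tri, sym->def[def].tri); */
            tristate choice = no, value = mod;
            choice = EXPR_OR(choice, value);
            printf("merged choice state: %d\n", choice);    /* prints 1 (mod) */
            return 0;
    }
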
 
index 6f98dbfe70cf4df2e57432e838465c33e4f4888e..579ece4fa584506a882e9e2bdd3e7fbb28d3b0c1 100644 (file)
@@ -87,7 +87,7 @@ struct expr *expr_copy(struct expr *org)
                break;
        case E_AND:
        case E_OR:
-       case E_CHOICE:
+       case E_LIST:
                e->left.expr = expr_copy(org->left.expr);
                e->right.expr = expr_copy(org->right.expr);
                break;
@@ -217,7 +217,7 @@ int expr_eq(struct expr *e1, struct expr *e2)
                expr_free(e2);
                trans_count = old_count;
                return res;
-       case E_CHOICE:
+       case E_LIST:
        case E_RANGE:
        case E_NONE:
                /* panic */;
@@ -648,7 +648,7 @@ struct expr *expr_transform(struct expr *e)
        case E_EQUAL:
        case E_UNEQUAL:
        case E_SYMBOL:
-       case E_CHOICE:
+       case E_LIST:
                break;
        default:
                e->left.expr = expr_transform(e->left.expr);
@@ -932,7 +932,7 @@ struct expr *expr_trans_compare(struct expr *e, enum expr_type type, struct symb
                break;
        case E_SYMBOL:
                return expr_alloc_comp(type, e->left.sym, sym);
-       case E_CHOICE:
+       case E_LIST:
        case E_RANGE:
        case E_NONE:
                /* panic */;
@@ -955,14 +955,14 @@ tristate expr_calc_value(struct expr *e)
        case E_AND:
                val1 = expr_calc_value(e->left.expr);
                val2 = expr_calc_value(e->right.expr);
-               return E_AND(val1, val2);
+               return EXPR_AND(val1, val2);
        case E_OR:
                val1 = expr_calc_value(e->left.expr);
                val2 = expr_calc_value(e->right.expr);
-               return E_OR(val1, val2);
+               return EXPR_OR(val1, val2);
        case E_NOT:
                val1 = expr_calc_value(e->left.expr);
-               return E_NOT(val1);
+               return EXPR_NOT(val1);
        case E_EQUAL:
                sym_calc_value(e->left.sym);
                sym_calc_value(e->right.sym);
@@ -1000,9 +1000,9 @@ int expr_compare_type(enum expr_type t1, enum expr_type t2)
                if (t2 == E_OR)
                        return 1;
        case E_OR:
-               if (t2 == E_CHOICE)
+               if (t2 == E_LIST)
                        return 1;
-       case E_CHOICE:
+       case E_LIST:
                if (t2 == 0)
                        return 1;
        default:
@@ -1034,12 +1034,18 @@ void expr_print(struct expr *e, void (*fn)(void *, struct symbol *, const char *
                expr_print(e->left.expr, fn, data, E_NOT);
                break;
        case E_EQUAL:
-               fn(data, e->left.sym, e->left.sym->name);
+               if (e->left.sym->name)
+                       fn(data, e->left.sym, e->left.sym->name);
+               else
+                       fn(data, NULL, "<choice>");
                fn(data, NULL, "=");
                fn(data, e->right.sym, e->right.sym->name);
                break;
        case E_UNEQUAL:
-               fn(data, e->left.sym, e->left.sym->name);
+               if (e->left.sym->name)
+                       fn(data, e->left.sym, e->left.sym->name);
+               else
+                       fn(data, NULL, "<choice>");
                fn(data, NULL, "!=");
                fn(data, e->right.sym, e->right.sym->name);
                break;
@@ -1053,11 +1059,11 @@ void expr_print(struct expr *e, void (*fn)(void *, struct symbol *, const char *
                fn(data, NULL, " && ");
                expr_print(e->right.expr, fn, data, E_AND);
                break;
-       case E_CHOICE:
+       case E_LIST:
                fn(data, e->right.sym, e->right.sym->name);
                if (e->left.expr) {
                        fn(data, NULL, " ^ ");
-                       expr_print(e->left.expr, fn, data, E_CHOICE);
+                       expr_print(e->left.expr, fn, data, E_LIST);
                }
                break;
        case E_RANGE:
index a195986eec6f83f0d7b76627782815855fc845fe..9d4cba1c001d9ec5f48a365a2d864e1b2f3f467a 100644 (file)
@@ -25,14 +25,13 @@ struct file {
 
 #define FILE_BUSY              0x0001
 #define FILE_SCANNED           0x0002
-#define FILE_PRINTED           0x0004
 
 typedef enum tristate {
        no, mod, yes
 } tristate;
 
 enum expr_type {
-       E_NONE, E_OR, E_AND, E_NOT, E_EQUAL, E_UNEQUAL, E_CHOICE, E_SYMBOL, E_RANGE
+       E_NONE, E_OR, E_AND, E_NOT, E_EQUAL, E_UNEQUAL, E_LIST, E_SYMBOL, E_RANGE
 };
 
 union expr_data {
@@ -45,9 +44,12 @@ struct expr {
        union expr_data left, right;
 };
 
-#define E_OR(dep1, dep2)       (((dep1)>(dep2))?(dep1):(dep2))
-#define E_AND(dep1, dep2)      (((dep1)<(dep2))?(dep1):(dep2))
-#define E_NOT(dep)             (2-(dep))
+#define EXPR_OR(dep1, dep2)    (((dep1)>(dep2))?(dep1):(dep2))
+#define EXPR_AND(dep1, dep2)   (((dep1)<(dep2))?(dep1):(dep2))
+#define EXPR_NOT(dep)          (2-(dep))
+
+#define expr_list_for_each_sym(l, e, s) \
+       for (e = (l); e && (s = e->right.sym); e = e->left.expr)
 
 struct expr_value {
        struct expr *expr;
@@ -86,7 +88,6 @@ struct symbol {
 #define SYMBOL_CHECK           0x0008
 #define SYMBOL_CHOICE          0x0010
 #define SYMBOL_CHOICEVAL       0x0020
-#define SYMBOL_PRINTED         0x0040
 #define SYMBOL_VALID           0x0080
 #define SYMBOL_OPTIONAL                0x0100
 #define SYMBOL_WRITE           0x0200
@@ -105,7 +106,8 @@ struct symbol {
 #define SYMBOL_HASHMASK                0xff
 
 enum prop_type {
-       P_UNKNOWN, P_PROMPT, P_COMMENT, P_MENU, P_DEFAULT, P_CHOICE, P_SELECT, P_RANGE
+       P_UNKNOWN, P_PROMPT, P_COMMENT, P_MENU, P_DEFAULT, P_CHOICE,
+       P_SELECT, P_RANGE, P_ENV
 };
 
 struct property {
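
Alongside the E_CHOICE to E_LIST rename, the header gains expr_list_for_each_sym(), which conf_read() above now uses instead of open-coding the e->left.expr / e->right.sym walk. A small self-contained sketch of that traversal; the structures are pared-down stand-ins with just enough fields for the macro, and the three-node list is built by hand purely for illustration:

    #include <stdio.h>

    struct symbol { const char *name; };

    struct expr;
    union expr_data { struct expr *expr; struct symbol *sym; };
    struct expr { union expr_data left, right; };

    /* Copied from the expr.h hunk. */
    #define expr_list_for_each_sym(l, e, s) \
            for (e = (l); e && (s = e->right.sym); e = e->left.expr)

    int main(void)
    {
            struct symbol a = { "FOO" }, b = { "BAR" }, c = { "BAZ" };

            /* Hand-built E_LIST chain: each node carries one value symbol in
             * right.sym and links to the next node through left.expr. */
            struct expr n3 = { .left = { .expr = NULL }, .right = { .sym = &c } };
            struct expr n2 = { .left = { .expr = &n3 },  .right = { .sym = &b } };
            struct expr n1 = { .left = { .expr = &n2 },  .right = { .sym = &a } };

            struct expr *e;
            struct symbol *sym;

            expr_list_for_each_sym(&n1, e, sym)
                    printf("choice value: %s\n", sym->name);

            return 0;
    }
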
index 262908cfc2ac66a057eb18d4a7c1853f96f34f95..199b22bb49e2a7940bae8f84e38348e6aca30480 100644 (file)
@@ -119,8 +119,6 @@ const char *dbg_print_flags(int val)
                strcat(buf, "choice/");
        if (val & SYMBOL_CHOICEVAL)
                strcat(buf, "choiceval/");
-       if (val & SYMBOL_PRINTED)
-               strcat(buf, "printed/");
        if (val & SYMBOL_VALID)
                strcat(buf, "valid/");
        if (val & SYMBOL_OPTIONAL)
@@ -457,14 +455,18 @@ static void text_insert_help(struct menu *menu)
 {
        GtkTextBuffer *buffer;
        GtkTextIter start, end;
-       const char *prompt = menu_get_prompt(menu);
+       const char *prompt = _(menu_get_prompt(menu));
        gchar *name;
        const char *help;
 
-       help = _(menu_get_help(menu));
+       help = menu_get_help(menu);
+
+       /* Gettextize if the help text not empty */
+       if ((help != 0) && (help[0] != 0))
+               help = _(help);
 
        if (menu->sym && menu->sym->name)
-               name = g_strdup_printf(_(menu->sym->name));
+               name = g_strdup_printf(menu->sym->name);
        else
                name = g_strdup("");
 
@@ -1171,7 +1173,7 @@ static gchar **fill_row(struct menu *menu)
        bzero(row, sizeof(row));
 
        row[COL_OPTION] =
-           g_strdup_printf("%s %s", menu_get_prompt(menu),
+           g_strdup_printf("%s %s", _(menu_get_prompt(menu)),
                            sym && sym_has_value(sym) ? "(NEW)" : "");
 
        if (show_all && !menu_is_visible(menu))
@@ -1221,7 +1223,7 @@ static gchar **fill_row(struct menu *menu)
 
                if (def_menu)
                        row[COL_VALUE] =
-                           g_strdup(menu_get_prompt(def_menu));
+                           g_strdup(_(menu_get_prompt(def_menu)));
        }
        if (sym->flags & SYMBOL_CHOICEVAL)
                row[COL_BTNRAD] = GINT_TO_POINTER(TRUE);
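
gconf now gettextizes prompts with _() but deliberately leaves an empty help string alone, since gettext("") returns the catalog's header entry rather than an empty translation. A standalone version of that guard; PACKAGE and LOCALEDIR below are placeholder values assumed for the sketch, not taken from this page:

    #include <libintl.h>
    #include <locale.h>
    #include <stdio.h>

    /* Placeholder values so the sketch builds on its own. */
    #define PACKAGE   "linux"
    #define LOCALEDIR "/usr/share/locale"
    #define _(text)   gettext(text)

    /* Translate help text only when there is something to translate:
     * gettext("") would hand back the catalog header, not an empty string. */
    static const char *translate_help(const char *help)
    {
            if (help != NULL && help[0] != '\0')
                    return _(help);
            return help;
    }

    int main(void)
    {
            setlocale(LC_ALL, "");
            bindtextdomain(PACKAGE, LOCALEDIR);
            textdomain(PACKAGE);

            printf("%s", translate_help("Sorry, no help available for this option yet.\n"));
            printf("[%s]\n", translate_help(""));   /* stays empty */
            return 0;
    }
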
index a065d5a57c01c131694d6da73ce5591063b34e9b..bed0f4e2d2f7e6632518c8577d39afd74462afdc 100644 (file)
@@ -1275,6 +1275,11 @@ YY_RULE_SETUP
 case 32:
 YY_RULE_SETUP
 {
+               while (zconfleng) {
+                       if ((zconftext[zconfleng-1] != ' ') && (zconftext[zconfleng-1] != '\t'))
+                               break;
+                       zconfleng--;
+               }
                append_string(zconftext, zconfleng);
                if (!first_ts)
                        first_ts = last_ts;
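
The regenerated lexer trims trailing blanks from each help-text line before appending it, so help strings no longer carry invisible trailing spaces. The same trimming as a tiny standalone helper; the function name is mine, not the shipped lexer's:

    #include <stdio.h>
    #include <string.h>

    /* Drop trailing spaces/tabs by shrinking the length, as the shipped
     * lexer does with zconftext/zconfleng before append_string(). */
    static size_t trim_trailing_blanks(const char *text, size_t len)
    {
            while (len) {
                    if (text[len - 1] != ' ' && text[len - 1] != '\t')
                            break;
                    len--;
            }
            return len;
    }

    int main(void)
    {
            const char *line = "  bool \"Example option\"   \t";
            size_t len = trim_trailing_blanks(line, strlen(line));

            printf("[%.*s]\n", (int)len, line);     /* prints [  bool "Example option"] */
            return 0;
    }
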
index 8a07ee4f6bd486a8cb18fec1297ef1f21dc08e82..4bc68f20a73c4ab8ac9364133ee57e5cd5655375 100644 (file)
@@ -44,6 +44,7 @@ extern "C" {
 
 #define T_OPT_MODULES          1
 #define T_OPT_DEFCONFIG_LIST   2
+#define T_OPT_ENV              3
 
 struct kconf_id {
        int name;
@@ -74,6 +75,7 @@ void kconfig_load(void);
 
 /* menu.c */
 void menu_init(void);
+void menu_warn(struct menu *menu, const char *fmt, ...);
 struct menu *menu_add_menu(void);
 void menu_end_menu(void);
 void menu_add_entry(struct symbol *sym);
@@ -103,6 +105,8 @@ void str_printf(struct gstr *gs, const char *fmt, ...);
 const char *str_get(struct gstr *gs);
 
 /* symbol.c */
+extern struct expr *sym_env_list;
+
 void sym_init(void);
 void sym_clear_all_valid(void);
 void sym_set_all_changed(void);
@@ -110,6 +114,7 @@ void sym_set_changed(struct symbol *sym);
 struct symbol *sym_check_deps(struct symbol *sym);
 struct property *prop_alloc(enum prop_type type, struct symbol *sym);
 struct symbol *prop_get_symbol(struct property *prop);
+struct property *sym_get_env_prop(struct symbol *sym);
 
 static inline tristate sym_get_tristate_value(struct symbol *sym)
 {
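
The lkc.h additions (T_OPT_ENV, the now-exported menu_warn(), sym_env_list and sym_get_env_prop()), together with P_ENV in expr.h, are the plumbing for a new "option env=..." Kconfig attribute that lets a symbol take its value from an environment variable. The prop_add_env() implementation is not part of this page, so the following is only a rough illustration of the idea, and every name in it is hypothetical:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for a string symbol that may be seeded from an
     * environment variable, in the spirit of the new P_ENV property. */
    struct env_symbol {
            const char *name;       /* Kconfig symbol name  */
            const char *env;        /* environment variable */
            char value[128];        /* resolved value       */
    };

    static void resolve_from_env(struct env_symbol *sym, const char *fallback)
    {
            const char *v = getenv(sym->env);

            snprintf(sym->value, sizeof(sym->value), "%s", v ? v : fallback);
    }

    int main(void)
    {
            struct env_symbol arch = { "EXAMPLE_ARCH", "EXAMPLE_ARCH", "" };

            resolve_from_env(&arch, "x86");
            printf("%s = \"%s\"\n", arch.name, arch.value);
            return 0;
    }
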
index 9681476b96e7fcf371840d78e17aa85657d327a1..62e1e02126e61d5cde4945c9d7f922153ea6f90e 100644 (file)
@@ -36,14 +36,16 @@ trap "rm -f $tmp" 0 1 2 3 15
 
 # Check if we can link to ncurses
 check() {
-       echo "main() {}" | $cc -xc - -o $tmp 2> /dev/null
+       echo -e " #include CURSES_LOC \n main() {}" |
+           $cc -xc - -o $tmp 2> /dev/null
        if [ $? != 0 ]; then
-               echo " *** Unable to find the ncurses libraries."          1>&2
-               echo " *** make menuconfig require the ncurses libraries"  1>&2
-               echo " *** "                                               1>&2
-               echo " *** Install ncurses (ncurses-devel) and try again"  1>&2
-               echo " *** "                                               1>&2
-               exit 1
+           echo " *** Unable to find the ncurses libraries or the"       1>&2
+           echo " *** required header files."                            1>&2
+           echo " *** 'make menuconfig' requires the ncurses libraries." 1>&2
+           echo " *** "                                                  1>&2
+           echo " *** Install ncurses (ncurses-devel) and try again."    1>&2
+           echo " *** "                                                  1>&2
+           exit 1
        fi
 }
 
index cf697080ddddd4bf47df0057f860ef681255ea29..b2a878c936d63c793bca031b9b5d52f62b9e2e19 100644 (file)
@@ -97,8 +97,8 @@ static void print_buttons(WINDOW * dialog, int height, int width, int selected)
        int x = width / 2 - 11;
        int y = height - 2;
 
-       print_button(dialog, "Select", y, x, selected == 0);
-       print_button(dialog, " Help ", y, x + 14, selected == 1);
+       print_button(dialog, gettext("Select"), y, x, selected == 0);
+       print_button(dialog, gettext(" Help "), y, x + 14, selected == 1);
 
        wmove(dialog, y, x + 1 + 14 * selected);
        wrefresh(dialog);
index 7e17eba75ae8d8e7b23cef47fe19096fa33870db..b5211fce0d946ab90eaf1ba239f3b20b40039b11 100644 (file)
 #include <string.h>
 #include <stdbool.h>
 
+#ifndef KBUILD_NO_NLS
+# include <libintl.h>
+#else
+# define gettext(Msgid) ((const char *) (Msgid))
+#endif
+
 #ifdef __sun__
 #define CURS_MACROS
 #endif
@@ -187,10 +193,9 @@ int item_is_tag(char tag);
 int on_key_esc(WINDOW *win);
 int on_key_resize(void);
 
-void init_dialog(const char *backtitle);
+int init_dialog(const char *backtitle);
 void set_dialog_backtitle(const char *backtitle);
-void reset_dialog(void);
-void end_dialog(void);
+void end_dialog(int x, int y);
 void attr_clear(WINDOW * win, int height, int width, chtype attr);
 void dialog_clear(void);
 void print_autowrap(WINDOW * win, const char *prompt, int width, int y, int x);
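
dialog.h now includes <libintl.h>, with a pass-through gettext() when the build sets KBUILD_NO_NLS, so the lxdialog widgets can call gettext() unconditionally; it also changes init_dialog() to return an error code and end_dialog() to take the cursor coordinates to restore. The fallback idiom on its own, in a runnable form:

    #include <stdio.h>

    /* Same idiom as the dialog.h hunk: with NLS disabled at build time the
     * gettext() call collapses to a cast, so call sites need no #ifdefs. */
    #ifndef KBUILD_NO_NLS
    # include <libintl.h>
    #else
    # define gettext(Msgid) ((const char *) (Msgid))
    #endif

    int main(void)
    {
            /* Builds and runs both with and without -DKBUILD_NO_NLS. */
            printf("%s\n", gettext("Select"));
            printf("%s\n", gettext(" Help "));
            return 0;
    }

With no message catalog bound, gettext() simply hands the msgid back, so the program prints the untranslated strings either way.
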
index 05e72066b35987dfe7037f6abd2fe29a03cb016a..4946bd02b46d4324939b8ba7e03db1174359d645 100644 (file)
@@ -31,8 +31,8 @@ static void print_buttons(WINDOW * dialog, int height, int width, int selected)
        int x = width / 2 - 11;
        int y = height - 2;
 
-       print_button(dialog, "  Ok  ", y, x, selected == 0);
-       print_button(dialog, " Help ", y, x + 14, selected == 1);
+       print_button(dialog, gettext("  Ok  "), y, x, selected == 0);
+       print_button(dialog, gettext(" Help "), y, x + 14, selected == 1);
 
        wmove(dialog, y, x + 1 + 14 * selected);
        wrefresh(dialog);
index 0d83159d90127a1dad38be99edff1a0973353a33..fa9d633f293c7c3a107b956e15774afd27560b12 100644 (file)
@@ -157,9 +157,9 @@ static void print_buttons(WINDOW * win, int height, int width, int selected)
        int x = width / 2 - 16;
        int y = height - 2;
 
-       print_button(win, "Select", y, x, selected == 0);
-       print_button(win, " Exit ", y, x + 12, selected == 1);
-       print_button(win, " Help ", y, x + 24, selected == 2);
+       print_button(win, gettext("Select"), y, x, selected == 0);
+       print_button(win, gettext(" Exit "), y, x + 12, selected == 1);
+       print_button(win, gettext(" Help "), y, x + 24, selected == 2);
 
        wmove(win, y, x + 1 + 12 * selected);
        wrefresh(win);
index fabfc1ad789d674f5f91637cba70cbfc335431d0..c704712d02271431fa9c42d0cd95be9c2ab9d988 100644 (file)
@@ -114,7 +114,7 @@ do_resize:
 
        print_title(dialog, title, width);
 
-       print_button(dialog, " Exit ", height - 2, width / 2 - 4, TRUE);
+       print_button(dialog, gettext(" Exit "), height - 2, width / 2 - 4, TRUE);
        wnoutrefresh(dialog);
        getyx(dialog, cur_y, cur_x);    /* Save cursor position */
 
index a1bddefe73d07485ab008b382a71905ede31ee71..86d95cca46a7cfc57922e50d8fd91b74fc8da745 100644 (file)
@@ -266,31 +266,41 @@ void dialog_clear(void)
 /*
  * Do some initialization for dialog
  */
-void init_dialog(const char *backtitle)
+int init_dialog(const char *backtitle)
 {
-       dlg.backtitle = backtitle;
-       color_setup(getenv("MENUCONFIG_COLOR"));
-}
+       int height, width;
+
+       initscr();              /* Init curses */
+       getmaxyx(stdscr, height, width);
+       if (height < 19 || width < 80) {
+               endwin();
+               return -ERRDISPLAYTOOSMALL;
+       }
 
-void set_dialog_backtitle(const char *backtitle)
-{
        dlg.backtitle = backtitle;
-}
+       color_setup(getenv("MENUCONFIG_COLOR"));
 
-void reset_dialog(void)
-{
-       initscr();              /* Init curses */
        keypad(stdscr, TRUE);
        cbreak();
        noecho();
        dialog_clear();
+
+       return 0;
+}
+
+void set_dialog_backtitle(const char *backtitle)
+{
+       dlg.backtitle = backtitle;
 }
 
 /*
  * End using dialog functions.
  */
-void end_dialog(void)
+void end_dialog(int x, int y)
 {
+       /* move cursor back to original position */
+       move(y, x);
+       refresh();
        endwin();
 }
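
init_dialog() now owns the curses bring-up: it calls initscr(), measures the screen with getmaxyx() and refuses to run below 19 lines by 80 columns, returning -ERRDISPLAYTOOSMALL instead of relying on mconf's old TIOCGWINSZ probe; end_dialog() additionally moves the cursor back to the caller-supplied position. A minimal standalone version of the size check (link with -lncurses); the 19x80 limit is taken from the hunk, while the error value below is only a stand-in:

    #include <stdio.h>
    #include <curses.h>

    #define ERRDISPLAYTOOSMALL 2    /* stand-in value; the real constant lives in dialog.h */

    /* Same shape as the reworked init_dialog(): start curses, measure the
     * terminal, refuse to run below 19 lines by 80 columns. */
    static int init_screen(void)
    {
            int height, width;

            initscr();
            getmaxyx(stdscr, height, width);
            if (height < 19 || width < 80) {
                    endwin();
                    return -ERRDISPLAYTOOSMALL;
            }

            keypad(stdscr, TRUE);
            cbreak();
            noecho();
            return 0;
    }

    int main(void)
    {
            if (init_screen()) {
                    fprintf(stderr, "Your display is too small to run Menuconfig!\n");
                    fprintf(stderr, "It must be at least 19 lines by 80 columns.\n");
                    return 1;
            }

            mvprintw(0, 0, "terminal is %d x %d - press any key", LINES, COLS);
            refresh();
            getch();
            endwin();
            return 0;
    }
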
 
index ee0a04e3e012ecd4ba119cd153d96a08ded7b2a2..4e6e8090c20b573868e3cfe15015a6b6f54b2f69 100644 (file)
@@ -29,8 +29,8 @@ static void print_buttons(WINDOW * dialog, int height, int width, int selected)
        int x = width / 2 - 10;
        int y = height - 2;
 
-       print_button(dialog, " Yes ", y, x, selected == 0);
-       print_button(dialog, "  No  ", y, x + 13, selected == 1);
+       print_button(dialog, gettext(" Yes "), y, x, selected == 0);
+       print_button(dialog, gettext("  No  "), y, x + 13, selected == 1);
 
        wmove(dialog, y, x + 1 + 13 * selected);
        wrefresh(dialog);
index 47e226fdedd7c8564447c76ab2481dd0ee608a44..50e61c411bc0f89d8f9c5ec9d7c7ea6992a43371 100644 (file)
@@ -8,17 +8,13 @@
  * i18n, 2005, Arnaldo Carvalho de Melo <acme@conectiva.com.br>
  */
 
-#include <sys/ioctl.h>
-#include <sys/wait.h>
 #include <ctype.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <limits.h>
-#include <signal.h>
 #include <stdarg.h>
 #include <stdlib.h>
 #include <string.h>
-#include <termios.h>
 #include <unistd.h>
 #include <locale.h>
 
@@ -275,8 +271,6 @@ search_help[] = N_(
        "\n");
 
 static int indent;
-static struct termios ios_org;
-static int rows = 0, cols = 0;
 static struct menu *current_menu;
 static int child_count;
 static int single_menu_mode;
@@ -290,51 +284,16 @@ static void show_textbox(const char *title, const char *text, int r, int c);
 static void show_helptext(const char *title, const char *text);
 static void show_help(struct menu *menu);
 
-static void init_wsize(void)
-{
-       struct winsize ws;
-       char *env;
-
-       if (!ioctl(STDIN_FILENO, TIOCGWINSZ, &ws)) {
-               rows = ws.ws_row;
-               cols = ws.ws_col;
-       }
-
-       if (!rows) {
-               env = getenv("LINES");
-               if (env)
-                       rows = atoi(env);
-               if (!rows)
-                       rows = 24;
-       }
-       if (!cols) {
-               env = getenv("COLUMNS");
-               if (env)
-                       cols = atoi(env);
-               if (!cols)
-                       cols = 80;
-       }
-
-       if (rows < 19 || cols < 80) {
-               fprintf(stderr, N_("Your display is too small to run Menuconfig!\n"));
-               fprintf(stderr, N_("It must be at least 19 lines by 80 columns.\n"));
-               exit(1);
-       }
-
-       rows -= 4;
-       cols -= 5;
-}
-
 static void get_prompt_str(struct gstr *r, struct property *prop)
 {
        int i, j;
        struct menu *submenu[8], *menu;
 
-       str_printf(r, "Prompt: %s\n", prop->text);
-       str_printf(r, "  Defined at %s:%d\n", prop->menu->file->name,
+       str_printf(r, _("Prompt: %s\n"), _(prop->text));
+       str_printf(r, _("  Defined at %s:%d\n"), prop->menu->file->name,
                prop->menu->lineno);
        if (!expr_is_yes(prop->visible.expr)) {
-               str_append(r, "  Depends on: ");
+               str_append(r, _("  Depends on: "));
                expr_gstr_print(prop->visible.expr, r);
                str_append(r, "\n");
        }
@@ -342,13 +301,13 @@ static void get_prompt_str(struct gstr *r, struct property *prop)
        for (i = 0; menu != &rootmenu && i < 8; menu = menu->parent)
                submenu[i++] = menu;
        if (i > 0) {
-               str_printf(r, "  Location:\n");
+               str_printf(r, _("  Location:\n"));
                for (j = 4; --i >= 0; j += 2) {
                        menu = submenu[i];
-                       str_printf(r, "%*c-> %s", j, ' ', menu_get_prompt(menu));
+                       str_printf(r, "%*c-> %s", j, ' ', _(menu_get_prompt(menu)));
                        if (menu->sym) {
                                str_printf(r, " (%s [=%s])", menu->sym->name ?
-                                       menu->sym->name : "<choice>",
+                                       menu->sym->name : _("<choice>"),
                                        sym_get_string_value(menu->sym));
                        }
                        str_append(r, "\n");
@@ -378,7 +337,7 @@ static void get_symbol_str(struct gstr *r, struct symbol *sym)
        if (hit)
                str_append(r, "\n");
        if (sym->rev_dep.expr) {
-               str_append(r, "  Selected by: ");
+               str_append(r, _("  Selected by: "));
                expr_gstr_print(sym->rev_dep.expr, r);
                str_append(r, "\n");
        }
@@ -394,7 +353,7 @@ static struct gstr get_relations_str(struct symbol **sym_arr)
        for (i = 0; sym_arr && (sym = sym_arr[i]); i++)
                get_symbol_str(&res, sym);
        if (!i)
-               str_append(&res, "No matches found.\n");
+               str_append(&res, _("No matches found.\n"));
        return res;
 }
 
@@ -474,6 +433,7 @@ static void build_conf(struct menu *menu)
                        switch (prop->type) {
                        case P_MENU:
                                child_count++;
+                               prompt = _(prompt);
                                if (single_menu_mode) {
                                        item_make("%s%*c%s",
                                                  menu->data ? "-->" : "++>",
@@ -489,7 +449,7 @@ static void build_conf(struct menu *menu)
                        case P_COMMENT:
                                if (prompt) {
                                        child_count++;
-                                       item_make("   %*c*** %s ***", indent + 1, ' ', prompt);
+                                       item_make("   %*c*** %s ***", indent + 1, ' ', _(prompt));
                                        item_set_tag(':');
                                        item_set_data(menu);
                                }
@@ -497,7 +457,7 @@ static void build_conf(struct menu *menu)
                        default:
                                if (prompt) {
                                        child_count++;
-                                       item_make("---%*c%s", indent + 1, ' ', prompt);
+                                       item_make("---%*c%s", indent + 1, ' ', _(prompt));
                                        item_set_tag(':');
                                        item_set_data(menu);
                                }
@@ -541,10 +501,10 @@ static void build_conf(struct menu *menu)
                        item_set_data(menu);
                }
 
-               item_add_str("%*c%s", indent + 1, ' ', menu_get_prompt(menu));
+               item_add_str("%*c%s", indent + 1, ' ', _(menu_get_prompt(menu)));
                if (val == yes) {
                        if (def_menu) {
-                               item_add_str(" (%s)", menu_get_prompt(def_menu));
+                               item_add_str(" (%s)", _(menu_get_prompt(def_menu)));
                                item_add_str("  --->");
                                if (def_menu->list) {
                                        indent += 2;
@@ -556,7 +516,7 @@ static void build_conf(struct menu *menu)
                }
        } else {
                if (menu == current_menu) {
-                       item_make("---%*c%s", indent + 1, ' ', menu_get_prompt(menu));
+                       item_make("---%*c%s", indent + 1, ' ', _(menu_get_prompt(menu)));
                        item_set_tag(':');
                        item_set_data(menu);
                        goto conf_childs;
@@ -599,17 +559,17 @@ static void build_conf(struct menu *menu)
                                tmp = indent - tmp + 4;
                                if (tmp < 0)
                                        tmp = 0;
-                               item_add_str("%*c%s%s", tmp, ' ', menu_get_prompt(menu),
+                               item_add_str("%*c%s%s", tmp, ' ', _(menu_get_prompt(menu)),
                                             (sym_has_value(sym) || !sym_is_changable(sym)) ?
-                                            "" : " (NEW)");
+                                            "" : _(" (NEW)"));
                                item_set_tag('s');
                                item_set_data(menu);
                                goto conf_childs;
                        }
                }
-               item_add_str("%*c%s%s", indent + 1, ' ', menu_get_prompt(menu),
+               item_add_str("%*c%s%s", indent + 1, ' ', _(menu_get_prompt(menu)),
                          (sym_has_value(sym) || !sym_is_changable(sym)) ?
-                         "" : " (NEW)");
+                         "" : _(" (NEW)"));
                if (menu->prompt->type == P_MENU) {
                        item_add_str("  --->");
                        return;
@@ -647,7 +607,7 @@ static void conf(struct menu *menu)
                        item_set_tag('S');
                }
                dialog_clear();
-               res = dialog_menu(prompt ? prompt : _("Main Menu"),
+               res = dialog_menu(prompt ? _(prompt) : _("Main Menu"),
                                  _(menu_instructions),
                                  active_menu, &s_scroll);
                if (res == 1 || res == KEY_ESC || res == -ERRDISPLAYTOOSMALL)
@@ -694,7 +654,7 @@ static void conf(struct menu *menu)
                        if (sym)
                                show_help(submenu);
                        else
-                               show_helptext("README", _(mconf_readme));
+                               show_helptext(_("README"), _(mconf_readme));
                        break;
                case 3:
                        if (item_is_tag('t')) {
@@ -752,13 +712,13 @@ static void show_help(struct menu *menu)
                str_append(&help, nohelp_text);
        }
        get_symbol_str(&help, sym);
-       show_helptext(menu_get_prompt(menu), str_get(&help));
+       show_helptext(_(menu_get_prompt(menu)), str_get(&help));
        str_free(&help);
 }
 
 static void conf_choice(struct menu *menu)
 {
-       const char *prompt = menu_get_prompt(menu);
+       const char *prompt = _(menu_get_prompt(menu));
        struct menu *child;
        struct symbol *active;
 
@@ -772,7 +732,7 @@ static void conf_choice(struct menu *menu)
                for (child = menu->list; child; child = child->next) {
                        if (!menu_is_visible(child))
                                continue;
-                       item_make("%s", menu_get_prompt(child));
+                       item_make("%s", _(menu_get_prompt(child)));
                        item_set_data(child);
                        if (child->sym == active)
                                item_set_selected(1);
@@ -780,7 +740,7 @@ static void conf_choice(struct menu *menu)
                                item_set_tag('X');
                }
                dialog_clear();
-               res = dialog_checklist(prompt ? prompt : _("Main Menu"),
+               res = dialog_checklist(prompt ? _(prompt) : _("Main Menu"),
                                        _(radiolist_instructions),
                                         15, 70, 6);
                selected = item_activate_selected();
@@ -826,10 +786,10 @@ static void conf_string(struct menu *menu)
                        heading = _(inputbox_instructions_string);
                        break;
                default:
-                       heading = "Internal mconf error!";
+                       heading = _("Internal mconf error!");
                }
                dialog_clear();
-               res = dialog_inputbox(prompt ? prompt : _("Main Menu"),
+               res = dialog_inputbox(prompt ? _(prompt) : _("Main Menu"),
                                      heading, 10, 75,
                                      sym_get_string_value(menu->sym));
                switch (res) {
@@ -900,13 +860,9 @@ static void conf_save(void)
        }
 }
 
-static void conf_cleanup(void)
-{
-       tcsetattr(1, TCSAFLUSH, &ios_org);
-}
-
 int main(int ac, char **av)
 {
+       int saved_x, saved_y;
        char *mode;
        int res;
 
@@ -923,11 +879,13 @@ int main(int ac, char **av)
                        single_menu_mode = 1;
        }
 
-       tcgetattr(1, &ios_org);
-       atexit(conf_cleanup);
-       init_wsize();
-       reset_dialog();
-       init_dialog(NULL);
+       getyx(stdscr, saved_y, saved_x);
+       if (init_dialog(NULL)) {
+               fprintf(stderr, N_("Your display is too small to run Menuconfig!\n"));
+               fprintf(stderr, N_("It must be at least 19 lines by 80 columns.\n"));
+               return 1;
+       }
+
        set_config_filename(conf_get_configname());
        do {
                conf(&rootmenu);
@@ -941,7 +899,7 @@ int main(int ac, char **av)
                else
                        res = -1;
        } while (res == KEY_ESC);
-       end_dialog();
+       end_dialog(saved_x, saved_y);
 
        switch (res) {
        case 0:
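
Most of the mconf churn above wraps run-time strings in _() while static initializers such as nohelp_text and the help blurbs stay marked with N_(): an initializer cannot call a function, so N_() only tags the literal for xgettext and the actual translation happens later, at the point of use. A short runnable illustration of the split, with the macro definitions written out here so the sketch is self-contained:

    #include <libintl.h>
    #include <locale.h>
    #include <stdio.h>

    /* _()  translates at run time; N_() only marks a literal for xgettext,
     * which is all a static initializer can do. */
    #define _(text)  gettext(text)
    #define N_(text) (text)

    static const char nohelp_text[] =
            N_("Sorry, no help available for this option yet.\n");

    int main(void)
    {
            setlocale(LC_ALL, "");
            /* bindtextdomain()/textdomain() would select a real catalog here;
             * without one, gettext() just hands the msgid back untranslated. */
            fputs(_(nohelp_text), stdout);
            return 0;
    }
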
index f9d0d91a3fe44e2407761a60bb5a31dc22926b41..fdad17367f612428d61ab78e8ebb0df864672454 100644 (file)
@@ -15,7 +15,7 @@ static struct menu **last_entry_ptr;
 struct file *file_list;
 struct file *current_file;
 
-static void menu_warn(struct menu *menu, const char *fmt, ...)
+void menu_warn(struct menu *menu, const char *fmt, ...)
 {
        va_list ap;
        va_start(ap, fmt);
@@ -172,6 +172,9 @@ void menu_add_option(int token, char *arg)
                else if (sym_defconfig_list != current_entry->sym)
                        zconf_error("trying to redefine defconfig symbol");
                break;
+       case T_OPT_ENV:
+               prop_add_env(arg);
+               break;
        }
 }
 
@@ -239,9 +242,11 @@ void menu_finalize(struct menu *parent)
                        for (menu = parent->list; menu; menu = menu->next) {
                                if (menu->sym) {
                                        current_entry = parent;
-                                       menu_set_type(menu->sym->type);
+                                       if (sym->type == S_UNKNOWN)
+                                               menu_set_type(menu->sym->type);
                                        current_entry = menu;
-                                       menu_set_type(sym->type);
+                                       if (menu->sym->type == S_UNKNOWN)
+                                               menu_set_type(sym->type);
                                        break;
                                }
                        }
@@ -326,12 +331,42 @@ void menu_finalize(struct menu *parent)
                                            "values not supported");
                        }
                        current_entry = menu;
-                       menu_set_type(sym->type);
+                       if (menu->sym->type == S_UNKNOWN)
+                               menu_set_type(sym->type);
+                       /* Non-tristate choice values of tristate choices must
+                        * depend on the choice being set to Y. The choice
+                        * values' dependencies were propagated to their
+                        * properties above, so the change here must be re-
+                        * propagated. */
+                       if (sym->type == S_TRISTATE && menu->sym->type != S_TRISTATE) {
+                               basedep = expr_alloc_comp(E_EQUAL, sym, &symbol_yes);
+                               basedep = expr_alloc_and(basedep, menu->dep);
+                               basedep = expr_eliminate_dups(basedep);
+                               menu->dep = basedep;
+                               for (prop = menu->sym->prop; prop; prop = prop->next) {
+                                       if (prop->menu != menu)
+                                               continue;
+                                       dep = expr_alloc_and(expr_copy(basedep),
+                                                            prop->visible.expr);
+                                       dep = expr_eliminate_dups(dep);
+                                       dep = expr_trans_bool(dep);
+                                       prop->visible.expr = dep;
+                                       if (prop->type == P_SELECT) {
+                                               struct symbol *es = prop_get_symbol(prop);
+                                               dep2 = expr_alloc_symbol(menu->sym);
+                                               dep = expr_alloc_and(dep2,
+                                                                    expr_copy(dep));
+                                               dep = expr_alloc_or(es->rev_dep.expr, dep);
+                                               dep = expr_eliminate_dups(dep);
+                                               es->rev_dep.expr = dep;
+                                       }
+                               }
+                       }
                        menu_add_symbol(P_CHOICE, sym, NULL);
                        prop = sym_get_choice_prop(sym);
                        for (ep = &prop->expr; *ep; ep = &(*ep)->left.expr)
                                ;
-                       *ep = expr_alloc_one(E_CHOICE, NULL);
+                       *ep = expr_alloc_one(E_LIST, NULL);
                        (*ep)->right.sym = menu->sym;
                }
                if (menu->list && (!menu->prompt || !menu->prompt->text)) {
@@ -394,9 +429,9 @@ bool menu_is_visible(struct menu *menu)
 const char *menu_get_prompt(struct menu *menu)
 {
        if (menu->prompt)
-               return _(menu->prompt->text);
+               return menu->prompt->text;
        else if (menu->sym)
-               return _(menu->sym->name);
+               return menu->sym->name;
        return NULL;
 }
 
index b9bb32dfd62866527a8a414164e82c92721eea83..5d0fd38b089b937614dee19073ec9a2657b26d22 100644 (file)
@@ -114,7 +114,7 @@ void ConfigItem::updateMenu(void)
 
        sym = menu->sym;
        prop = menu->prompt;
-       prompt = QString::fromLocal8Bit(menu_get_prompt(menu));
+       prompt = _(menu_get_prompt(menu));
 
        if (prop) switch (prop->type) {
        case P_MENU:
@@ -208,7 +208,7 @@ void ConfigItem::updateMenu(void)
                break;
        }
        if (!sym_has_value(sym) && visible)
-               prompt += " (NEW)";
+               prompt += _(" (NEW)");
 set_prompt:
        setText(promptColIdx, prompt);
 }
@@ -346,7 +346,7 @@ ConfigList::ConfigList(ConfigView* p, const char *name)
 
        for (i = 0; i < colNr; i++)
                colMap[i] = colRevMap[i] = -1;
-       addColumn(promptColIdx, "Option");
+       addColumn(promptColIdx, _("Option"));
 
        reinit();
 }
@@ -360,14 +360,14 @@ void ConfigList::reinit(void)
        removeColumn(nameColIdx);
 
        if (showName)
-               addColumn(nameColIdx, "Name");
+               addColumn(nameColIdx, _("Name"));
        if (showRange) {
                addColumn(noColIdx, "N");
                addColumn(modColIdx, "M");
                addColumn(yesColIdx, "Y");
        }
        if (showData)
-               addColumn(dataColIdx, "Value");
+               addColumn(dataColIdx, _("Value"));
 
        updateListAll();
 }
@@ -803,7 +803,7 @@ void ConfigList::contextMenuEvent(QContextMenuEvent *e)
                        QAction *action;
 
                        headerPopup = new QPopupMenu(this);
-                       action = new QAction(NULL, "Show Name", 0, this);
+                       action = new QAction(NULL, _("Show Name"), 0, this);
                          action->setToggleAction(TRUE);
                          connect(action, SIGNAL(toggled(bool)),
                                  parent(), SLOT(setShowName(bool)));
@@ -811,7 +811,7 @@ void ConfigList::contextMenuEvent(QContextMenuEvent *e)
                                  action, SLOT(setOn(bool)));
                          action->setOn(showName);
                          action->addTo(headerPopup);
-                       action = new QAction(NULL, "Show Range", 0, this);
+                       action = new QAction(NULL, _("Show Range"), 0, this);
                          action->setToggleAction(TRUE);
                          connect(action, SIGNAL(toggled(bool)),
                                  parent(), SLOT(setShowRange(bool)));
@@ -819,7 +819,7 @@ void ConfigList::contextMenuEvent(QContextMenuEvent *e)
                                  action, SLOT(setOn(bool)));
                          action->setOn(showRange);
                          action->addTo(headerPopup);
-                       action = new QAction(NULL, "Show Data", 0, this);
+                       action = new QAction(NULL, _("Show Data"), 0, this);
                          action->setToggleAction(TRUE);
                          connect(action, SIGNAL(toggled(bool)),
                                  parent(), SLOT(setShowData(bool)));
@@ -1041,7 +1041,12 @@ void ConfigInfoView::menuInfo(void)
                if (showDebug())
                        debug = debug_info(sym);
 
-               help = print_filter(_(menu_get_help(menu)));
+               help = menu_get_help(menu);
+               /* Gettextize if the help text not empty */
+               if (help.isEmpty())
+                       help = print_filter(menu_get_help(menu));
+               else
+                       help = print_filter(_(menu_get_help(menu)));
        } else if (menu->prompt) {
                head += "<big><b>";
                head += print_filter(_(menu->prompt->text));
@@ -1083,7 +1088,11 @@ QString ConfigInfoView::debug_info(struct symbol *sym)
                        debug += "</a><br>";
                        break;
                case P_DEFAULT:
-                       debug += "default: ";
+               case P_SELECT:
+               case P_RANGE:
+               case P_ENV:
+                       debug += prop_get_type_name(prop->type);
+                       debug += ": ";
                        expr_print(prop->expr, expr_print_help, &debug, E_NONE);
                        debug += "<br>";
                        break;
@@ -1094,16 +1103,6 @@ QString ConfigInfoView::debug_info(struct symbol *sym)
                                debug += "<br>";
                        }
                        break;
-               case P_SELECT:
-                       debug += "select: ";
-                       expr_print(prop->expr, expr_print_help, &debug, E_NONE);
-                       debug += "<br>";
-                       break;
-               case P_RANGE:
-                       debug += "range: ";
-                       expr_print(prop->expr, expr_print_help, &debug, E_NONE);
-                       debug += "<br>";
-                       break;
                default:
                        debug += "unknown property: ";
                        debug += prop_get_type_name(prop->type);
@@ -1167,7 +1166,7 @@ void ConfigInfoView::expr_print_help(void *data, struct symbol *sym, const char
 QPopupMenu* ConfigInfoView::createPopupMenu(const QPoint& pos)
 {
        QPopupMenu* popup = Parent::createPopupMenu(pos);
-       QAction* action = new QAction(NULL,"Show Debug Info", 0, popup);
+       QAction* action = new QAction(NULL, _("Show Debug Info"), 0, popup);
          action->setToggleAction(TRUE);
          connect(action, SIGNAL(toggled(bool)), SLOT(setShowDebug(bool)));
          connect(this, SIGNAL(showDebugChanged(bool)), action, SLOT(setOn(bool)));
@@ -1189,11 +1188,11 @@ ConfigSearchWindow::ConfigSearchWindow(ConfigMainWindow* parent, const char *nam
 
        QVBoxLayout* layout1 = new QVBoxLayout(this, 11, 6);
        QHBoxLayout* layout2 = new QHBoxLayout(0, 0, 6);
-       layout2->addWidget(new QLabel("Find:", this));
+       layout2->addWidget(new QLabel(_("Find:"), this));
        editField = new QLineEdit(this);
        connect(editField, SIGNAL(returnPressed()), SLOT(search()));
        layout2->addWidget(editField);
-       searchButton = new QPushButton("Search", this);
+       searchButton = new QPushButton(_("Search"), this);
        searchButton->setAutoDefault(FALSE);
        connect(searchButton, SIGNAL(clicked()), SLOT(search()));
        layout2->addWidget(searchButton);
@@ -1313,58 +1312,58 @@ ConfigMainWindow::ConfigMainWindow(void)
        menu = menuBar();
        toolBar = new QToolBar("Tools", this);
 
-       backAction = new QAction("Back", QPixmap(xpm_back), "Back", 0, this);
+       backAction = new QAction("Back", QPixmap(xpm_back), _("Back"), 0, this);
          connect(backAction, SIGNAL(activated()), SLOT(goBack()));
          backAction->setEnabled(FALSE);
-       QAction *quitAction = new QAction("Quit", "&Quit", CTRL+Key_Q, this);
+       QAction *quitAction = new QAction("Quit", _("&Quit"), CTRL+Key_Q, this);
          connect(quitAction, SIGNAL(activated()), SLOT(close()));
-       QAction *loadAction = new QAction("Load", QPixmap(xpm_load), "&Load", CTRL+Key_L, this);
+       QAction *loadAction = new QAction("Load", QPixmap(xpm_load), _("&Load"), CTRL+Key_L, this);
          connect(loadAction, SIGNAL(activated()), SLOT(loadConfig()));
-       saveAction = new QAction("Save", QPixmap(xpm_save), "&Save", CTRL+Key_S, this);
+       saveAction = new QAction("Save", QPixmap(xpm_save), _("&Save"), CTRL+Key_S, this);
          connect(saveAction, SIGNAL(activated()), SLOT(saveConfig()));
        conf_set_changed_callback(conf_changed);
        // Set saveAction's initial state
        conf_changed();
-       QAction *saveAsAction = new QAction("Save As...", "Save &As...", 0, this);
+       QAction *saveAsAction = new QAction("Save As...", _("Save &As..."), 0, this);
          connect(saveAsAction, SIGNAL(activated()), SLOT(saveConfigAs()));
-       QAction *searchAction = new QAction("Find", "&Find", CTRL+Key_F, this);
+       QAction *searchAction = new QAction("Find", _("&Find"), CTRL+Key_F, this);
          connect(searchAction, SIGNAL(activated()), SLOT(searchConfig()));
-       QAction *singleViewAction = new QAction("Single View", QPixmap(xpm_single_view), "Split View", 0, this);
+       QAction *singleViewAction = new QAction("Single View", QPixmap(xpm_single_view), _("Single View"), 0, this);
          connect(singleViewAction, SIGNAL(activated()), SLOT(showSingleView()));
-       QAction *splitViewAction = new QAction("Split View", QPixmap(xpm_split_view), "Split View", 0, this);
+       QAction *splitViewAction = new QAction("Split View", QPixmap(xpm_split_view), _("Split View"), 0, this);
          connect(splitViewAction, SIGNAL(activated()), SLOT(showSplitView()));
-       QAction *fullViewAction = new QAction("Full View", QPixmap(xpm_tree_view), "Full View", 0, this);
+       QAction *fullViewAction = new QAction("Full View", QPixmap(xpm_tree_view), _("Full View"), 0, this);
          connect(fullViewAction, SIGNAL(activated()), SLOT(showFullView()));
 
-       QAction *showNameAction = new QAction(NULL, "Show Name", 0, this);
+       QAction *showNameAction = new QAction(NULL, _("Show Name"), 0, this);
          showNameAction->setToggleAction(TRUE);
          connect(showNameAction, SIGNAL(toggled(bool)), configView, SLOT(setShowName(bool)));
          connect(configView, SIGNAL(showNameChanged(bool)), showNameAction, SLOT(setOn(bool)));
          showNameAction->setOn(configView->showName());
-       QAction *showRangeAction = new QAction(NULL, "Show Range", 0, this);
+       QAction *showRangeAction = new QAction(NULL, _("Show Range"), 0, this);
          showRangeAction->setToggleAction(TRUE);
          connect(showRangeAction, SIGNAL(toggled(bool)), configView, SLOT(setShowRange(bool)));
          connect(configView, SIGNAL(showRangeChanged(bool)), showRangeAction, SLOT(setOn(bool)));
          showRangeAction->setOn(configList->showRange);
-       QAction *showDataAction = new QAction(NULL, "Show Data", 0, this);
+       QAction *showDataAction = new QAction(NULL, _("Show Data"), 0, this);
          showDataAction->setToggleAction(TRUE);
          connect(showDataAction, SIGNAL(toggled(bool)), configView, SLOT(setShowData(bool)));
          connect(configView, SIGNAL(showDataChanged(bool)), showDataAction, SLOT(setOn(bool)));
          showDataAction->setOn(configList->showData);
-       QAction *showAllAction = new QAction(NULL, "Show All Options", 0, this);
+       QAction *showAllAction = new QAction(NULL, _("Show All Options"), 0, this);
          showAllAction->setToggleAction(TRUE);
          connect(showAllAction, SIGNAL(toggled(bool)), configView, SLOT(setShowAll(bool)));
          connect(showAllAction, SIGNAL(toggled(bool)), menuView, SLOT(setShowAll(bool)));
          showAllAction->setOn(configList->showAll);
-       QAction *showDebugAction = new QAction(NULL, "Show Debug Info", 0, this);
+       QAction *showDebugAction = new QAction(NULL, _("Show Debug Info"), 0, this);
          showDebugAction->setToggleAction(TRUE);
          connect(showDebugAction, SIGNAL(toggled(bool)), helpText, SLOT(setShowDebug(bool)));
          connect(helpText, SIGNAL(showDebugChanged(bool)), showDebugAction, SLOT(setOn(bool)));
          showDebugAction->setOn(helpText->showDebug());
 
-       QAction *showIntroAction = new QAction(NULL, "Introduction", 0, this);
+       QAction *showIntroAction = new QAction(NULL, _("Introduction"), 0, this);
          connect(showIntroAction, SIGNAL(activated()), SLOT(showIntro()));
-       QAction *showAboutAction = new QAction(NULL, "About", 0, this);
+       QAction *showAboutAction = new QAction(NULL, _("About"), 0, this);
          connect(showAboutAction, SIGNAL(activated()), SLOT(showAbout()));
 
        // init tool bar
@@ -1379,7 +1378,7 @@ ConfigMainWindow::ConfigMainWindow(void)
 
        // create config menu
        QPopupMenu* config = new QPopupMenu(this);
-       menu->insertItem("&File", config);
+       menu->insertItem(_("&File"), config);
        loadAction->addTo(config);
        saveAction->addTo(config);
        saveAsAction->addTo(config);
@@ -1388,12 +1387,12 @@ ConfigMainWindow::ConfigMainWindow(void)
 
        // create edit menu
        QPopupMenu* editMenu = new QPopupMenu(this);
-       menu->insertItem("&Edit", editMenu);
+       menu->insertItem(_("&Edit"), editMenu);
        searchAction->addTo(editMenu);
 
        // create options menu
        QPopupMenu* optionMenu = new QPopupMenu(this);
-       menu->insertItem("&Option", optionMenu);
+       menu->insertItem(_("&Option"), optionMenu);
        showNameAction->addTo(optionMenu);
        showRangeAction->addTo(optionMenu);
        showDataAction->addTo(optionMenu);
@@ -1404,7 +1403,7 @@ ConfigMainWindow::ConfigMainWindow(void)
        // create help menu
        QPopupMenu* helpMenu = new QPopupMenu(this);
        menu->insertSeparator();
-       menu->insertItem("&Help", helpMenu);
+       menu->insertItem(_("&Help"), helpMenu);
        showIntroAction->addTo(helpMenu);
        showAboutAction->addTo(helpMenu);
 
@@ -1452,14 +1451,14 @@ void ConfigMainWindow::loadConfig(void)
        if (s.isNull())
                return;
        if (conf_read(QFile::encodeName(s)))
-               QMessageBox::information(this, "qconf", "Unable to load configuration!");
+               QMessageBox::information(this, "qconf", _("Unable to load configuration!"));
        ConfigView::updateListAll();
 }
 
 void ConfigMainWindow::saveConfig(void)
 {
        if (conf_write(NULL))
-               QMessageBox::information(this, "qconf", "Unable to save configuration!");
+               QMessageBox::information(this, "qconf", _("Unable to save configuration!"));
 }
 
 void ConfigMainWindow::saveConfigAs(void)
@@ -1468,7 +1467,7 @@ void ConfigMainWindow::saveConfigAs(void)
        if (s.isNull())
                return;
        if (conf_write(QFile::encodeName(s)))
-               QMessageBox::information(this, "qconf", "Unable to save configuration!");
+               QMessageBox::information(this, "qconf", _("Unable to save configuration!"));
 }
 
 void ConfigMainWindow::searchConfig(void)
@@ -1612,11 +1611,11 @@ void ConfigMainWindow::closeEvent(QCloseEvent* e)
                e->accept();
                return;
        }
-       QMessageBox mb("qconf", "Save configuration?", QMessageBox::Warning,
+       QMessageBox mb("qconf", _("Save configuration?"), QMessageBox::Warning,
                        QMessageBox::Yes | QMessageBox::Default, QMessageBox::No, QMessageBox::Cancel | QMessageBox::Escape);
-       mb.setButtonText(QMessageBox::Yes, "&Save Changes");
-       mb.setButtonText(QMessageBox::No, "&Discard Changes");
-       mb.setButtonText(QMessageBox::Cancel, "Cancel Exit");
+       mb.setButtonText(QMessageBox::Yes, _("&Save Changes"));
+       mb.setButtonText(QMessageBox::No, _("&Discard Changes"));
+       mb.setButtonText(QMessageBox::Cancel, _("Cancel Exit"));
        switch (mb.exec()) {
        case QMessageBox::Yes:
                conf_write(NULL);
@@ -1631,7 +1630,7 @@ void ConfigMainWindow::closeEvent(QCloseEvent* e)
 
 void ConfigMainWindow::showIntro(void)
 {
-       static char str[] = "Welcome to the qconf graphical kernel configuration tool for Linux.\n\n"
+       static const QString str = _("Welcome to the qconf graphical kernel configuration tool for Linux.\n\n"
                "For each option, a blank box indicates the feature is disabled, a check\n"
                "indicates it is enabled, and a dot indicates that it is to be compiled\n"
                "as a module.  Clicking on the box will cycle through the three states.\n\n"
@@ -1641,15 +1640,15 @@ void ConfigMainWindow::showIntro(void)
                "options must be enabled to support the option you are interested in, you can\n"
                "still view the help of a grayed-out option.\n\n"
                "Toggling Show Debug Info under the Options menu will show the dependencies,\n"
-               "which you can then match by examining other options.\n\n";
+               "which you can then match by examining other options.\n\n");
 
        QMessageBox::information(this, "qconf", str);
 }
 
 void ConfigMainWindow::showAbout(void)
 {
-       static char str[] = "qconf is Copyright (C) 2002 Roman Zippel <zippel@linux-m68k.org>.\n\n"
-               "Bug reports and feature request can also be entered at http://bugzilla.kernel.org/\n";
+       static const QString str = _("qconf is Copyright (C) 2002 Roman Zippel <zippel@linux-m68k.org>.\n\n"
+               "Bug reports and feature request can also be entered at http://bugzilla.kernel.org/\n");
 
        QMessageBox::information(this, "qconf", str);
 }
@@ -1707,7 +1706,7 @@ static const char *progname;
 
 static void usage(void)
 {
-       printf("%s <config>\n", progname);
+       printf(_("%s <config>\n"), progname);
        exit(0);
 }
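
The _() wrappers introduced throughout qconf.cc are the usual gettext translation markers. As a hedged sketch (the concrete kconfig header supplying the macro is assumed, not shown in this hunk), such a marker pair is conventionally defined as:

    /* Hedged sketch: conventional gettext marker pair; the macro actually
     * used by the kconfig frontends is assumed to be equivalent. */
    #include <libintl.h>

    #define _(text)  gettext(text)   /* translated at run time      */
    #define N_(text) (text)          /* marked for extraction only  */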
 
index c35dcc5d61894ba51d7e265484a5dc9296ae57a4..3929e5b35e79db7dc0d03083aabaa988e10e1992 100644 (file)
@@ -34,6 +34,8 @@ struct symbol *sym_defconfig_list;
 struct symbol *modules_sym;
 tristate modules_val;
 
+struct expr *sym_env_list;
+
 void sym_add_default(struct symbol *sym, const char *def)
 {
        struct property *prop = prop_alloc(P_DEFAULT, sym);
@@ -45,7 +47,6 @@ void sym_init(void)
 {
        struct symbol *sym;
        struct utsname uts;
-       char *p;
        static bool inited = false;
 
        if (inited)
@@ -54,20 +55,6 @@ void sym_init(void)
 
        uname(&uts);
 
-       sym = sym_lookup("ARCH", 0);
-       sym->type = S_STRING;
-       sym->flags |= SYMBOL_AUTO;
-       p = getenv("ARCH");
-       if (p)
-               sym_add_default(sym, p);
-
-       sym = sym_lookup("KERNELVERSION", 0);
-       sym->type = S_STRING;
-       sym->flags |= SYMBOL_AUTO;
-       p = getenv("KERNELVERSION");
-       if (p)
-               sym_add_default(sym, p);
-
        sym = sym_lookup("UNAME_RELEASE", 0);
        sym->type = S_STRING;
        sym->flags |= SYMBOL_AUTO;
@@ -117,6 +104,15 @@ struct property *sym_get_choice_prop(struct symbol *sym)
        return NULL;
 }
 
+struct property *sym_get_env_prop(struct symbol *sym)
+{
+       struct property *prop;
+
+       for_all_properties(sym, prop, P_ENV)
+               return prop;
+       return NULL;
+}
+
 struct property *sym_get_default_prop(struct symbol *sym)
 {
        struct property *prop;
@@ -199,7 +195,7 @@ static void sym_calc_visibility(struct symbol *sym)
        tri = no;
        for_all_prompts(sym, prop) {
                prop->visible.tri = expr_calc_value(prop->visible.expr);
-               tri = E_OR(tri, prop->visible.tri);
+               tri = EXPR_OR(tri, prop->visible.tri);
        }
        if (tri == mod && (sym->type != S_TRISTATE || modules_val == no))
                tri = yes;
@@ -247,8 +243,7 @@ static struct symbol *sym_calc_choice(struct symbol *sym)
 
        /* just get the first visible value */
        prop = sym_get_choice_prop(sym);
-       for (e = prop->expr; e; e = e->left.expr) {
-               def_sym = e->right.sym;
+       expr_list_for_each_sym(prop->expr, e, def_sym) {
                sym_calc_visibility(def_sym);
                if (def_sym->visible != no)
                        return def_sym;
@@ -303,7 +298,7 @@ void sym_calc_value(struct symbol *sym)
                if (sym_is_choice_value(sym) && sym->visible == yes) {
                        prop = sym_get_choice_prop(sym);
                        newval.tri = (prop_get_symbol(prop)->curr.val == sym) ? yes : no;
-               } else if (E_OR(sym->visible, sym->rev_dep.tri) != no) {
+               } else if (EXPR_OR(sym->visible, sym->rev_dep.tri) != no) {
                        sym->flags |= SYMBOL_WRITE;
                        if (sym_has_value(sym))
                                newval.tri = sym->def[S_DEF_USER].tri;
@@ -312,7 +307,7 @@ void sym_calc_value(struct symbol *sym)
                                if (prop)
                                        newval.tri = expr_calc_value(prop->expr);
                        }
-                       newval.tri = E_OR(E_AND(newval.tri, sym->visible), sym->rev_dep.tri);
+                       newval.tri = EXPR_OR(EXPR_AND(newval.tri, sym->visible), sym->rev_dep.tri);
                } else if (!sym_is_choice(sym)) {
                        prop = sym_get_default_prop(sym);
                        if (prop) {
@@ -347,6 +342,9 @@ void sym_calc_value(struct symbol *sym)
                ;
        }
 
+       if (sym->flags & SYMBOL_AUTO)
+               sym->flags &= ~SYMBOL_WRITE;
+
        sym->curr = newval;
        if (sym_is_choice(sym) && newval.tri == yes)
                sym->curr.val = sym_calc_choice(sym);
@@ -361,12 +359,14 @@ void sym_calc_value(struct symbol *sym)
        }
 
        if (sym_is_choice(sym)) {
+               struct symbol *choice_sym;
                int flags = sym->flags & (SYMBOL_CHANGED | SYMBOL_WRITE);
+
                prop = sym_get_choice_prop(sym);
-               for (e = prop->expr; e; e = e->left.expr) {
-                       e->right.sym->flags |= flags;
+               expr_list_for_each_sym(prop->expr, e, choice_sym) {
+                       choice_sym->flags |= flags;
                        if (flags & SYMBOL_CHANGED)
-                               sym_set_changed(e->right.sym);
+                               sym_set_changed(choice_sym);
                }
        }
 }
@@ -849,7 +849,7 @@ struct property *prop_alloc(enum prop_type type, struct symbol *sym)
 struct symbol *prop_get_symbol(struct property *prop)
 {
        if (prop->expr && (prop->expr->type == E_SYMBOL ||
-                          prop->expr->type == E_CHOICE))
+                          prop->expr->type == E_LIST))
                return prop->expr->left.sym;
        return NULL;
 }
@@ -859,6 +859,8 @@ const char *prop_get_type_name(enum prop_type type)
        switch (type) {
        case P_PROMPT:
                return "prompt";
+       case P_ENV:
+               return "env";
        case P_COMMENT:
                return "comment";
        case P_MENU:
@@ -876,3 +878,32 @@ const char *prop_get_type_name(enum prop_type type)
        }
        return "unknown";
 }
+
+void prop_add_env(const char *env)
+{
+       struct symbol *sym, *sym2;
+       struct property *prop;
+       char *p;
+
+       sym = current_entry->sym;
+       sym->flags |= SYMBOL_AUTO;
+       for_all_properties(sym, prop, P_ENV) {
+               sym2 = prop_get_symbol(prop);
+               if (strcmp(sym2->name, env))
+                       menu_warn(current_entry, "redefining environment symbol from %s",
+                                 sym2->name);
+               return;
+       }
+
+       prop = prop_alloc(P_ENV, sym);
+       prop->expr = expr_alloc_symbol(sym_lookup(env, 1));
+
+       sym_env_list = expr_alloc_one(E_LIST, sym_env_list);
+       sym_env_list->right.sym = sym;
+
+       p = getenv(env);
+       if (p)
+               sym_add_default(sym, p);
+       else
+               menu_warn(current_entry, "environment variable %s undefined", env);
+}
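
Two things change in this file: the tristate helpers are renamed from E_OR()/E_AND() to EXPR_OR()/EXPR_AND(), and the new prop_add_env() ties a symbol to an environment variable (presumably declared with the option env="..." keyword that the zconf.gperf hunk below adds), replacing the hard-coded ARCH/KERNELVERSION handling removed from sym_init(). A self-contained sketch of the tristate semantics the renamed macros are assumed to keep, i.e. max/min over no < mod < yes:

    /* Hedged sketch: tristate OR/AND as max/min over no < mod < yes;
     * EXPR_OR()/EXPR_AND() are assumed to expand to exactly this. */
    #include <stdio.h>

    typedef enum { no, mod, yes } tristate;

    #define EXPR_OR(d1, d2)  (((d1) > (d2)) ? (d1) : (d2))
    #define EXPR_AND(d1, d2) (((d1) < (d2)) ? (d1) : (d2))

    int main(void)
    {
        printf("%d\n", EXPR_OR(mod, no));   /* prints 1 (mod) */
        printf("%d\n", EXPR_AND(yes, mod)); /* prints 1 (mod) */
        return 0;
    }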
index e1cad924c0a41ab6fa1e8f917c0c01198419b549..f8e73c039dc880984f96aa8cadd4e43edaa6baba 100644 (file)
@@ -29,6 +29,8 @@ struct file *file_lookup(const char *name)
 /* write a dependency file as used by kbuild to track dependencies */
 int file_write_dep(const char *name)
 {
+       struct symbol *sym, *env_sym;
+       struct expr *e;
        struct file *file;
        FILE *out;
 
@@ -45,8 +47,25 @@ int file_write_dep(const char *name)
                        fprintf(out, "\t%s\n", file->name);
        }
        fprintf(out, "\ninclude/config/auto.conf: \\\n"
-                    "\t$(deps_config)\n\n"
-                    "$(deps_config): ;\n");
+                    "\t$(deps_config)\n\n");
+
+       expr_list_for_each_sym(sym_env_list, e, sym) {
+               struct property *prop;
+               const char *value;
+
+               prop = sym_get_env_prop(sym);
+               env_sym = prop_get_symbol(prop);
+               if (!env_sym)
+                       continue;
+               value = getenv(env_sym->name);
+               if (!value)
+                       value = "";
+               fprintf(out, "ifneq \"$(%s)\" \"%s\"\n", env_sym->name, value);
+               fprintf(out, "include/config/auto.conf: FORCE\n");
+               fprintf(out, "endif\n");
+       }
+
+       fprintf(out, "\n$(deps_config): ;\n");
        fclose(out);
        rename("..config.tmp", name);
        return 0;
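
The hunk above makes file_write_dep() emit an ifneq guard per env-backed symbol, so include/config/auto.conf is forced stale whenever the variable's value differs from the one recorded at configuration time. Assuming a single env symbol ARCH whose value was x86 when the dependency file was written (values chosen purely for illustration), the emitted fragment would read roughly:

    include/config/auto.conf: \
        $(deps_config)

    ifneq "$(ARCH)" "x86"
    include/config/auto.conf: FORCE
    endif

    $(deps_config): ;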
index 93538e567bda8e31943b857a3b50e567fa5c691a..25ef5d01c0afa961a4b5c57c0154d11b83f69e7f 100644 (file)
@@ -35,10 +35,10 @@ int,                T_TYPE,         TF_COMMAND, S_INT
 hex,           T_TYPE,         TF_COMMAND, S_HEX
 string,                T_TYPE,         TF_COMMAND, S_STRING
 select,                T_SELECT,       TF_COMMAND
-enable,                T_SELECT,       TF_COMMAND
 range,         T_RANGE,        TF_COMMAND
 option,                T_OPTION,       TF_COMMAND
 on,            T_ON,           TF_PARAM
 modules,       T_OPT_MODULES,  TF_OPTION
 defconfig_list,        T_OPT_DEFCONFIG_LIST,TF_OPTION
+env,           T_OPT_ENV,      TF_OPTION
 %%
index ab28b18153a7a6bb4fcb7d7b0c24b919610c94d3..5c73d51339d81b30e88cdd1010d65f273f2e62e1 100644 (file)
@@ -1,4 +1,4 @@
-/* ANSI-C code produced by gperf version 3.0.2 */
+/* ANSI-C code produced by gperf version 3.0.3 */
 /* Command-line: gperf  */
 /* Computed positions: -k'1,3' */
 
@@ -53,9 +53,9 @@ kconf_id_hash (register const char *str, register unsigned int len)
       49, 49, 49, 49, 49, 49, 49, 49, 49, 49,
       49, 49, 49, 49, 49, 49, 49, 49, 49, 49,
       49, 49, 49, 49, 49, 49, 49, 49, 49, 49,
-      49, 49, 49, 49, 49, 49, 49, 18, 11,  5,
+      49, 49, 49, 49, 49, 49, 49, 49, 11,  5,
        0,  0,  5, 49,  5, 20, 49, 49,  5, 20,
-       5,  0, 30, 49,  0, 15,  0, 10, 49, 49,
+       5,  0, 30, 49,  0, 15,  0, 10,  0, 49,
       25, 49, 49, 49, 49, 49, 49, 49, 49, 49,
       49, 49, 49, 49, 49, 49, 49, 49, 49, 49,
       49, 49, 49, 49, 49, 49, 49, 49, 49, 49,
@@ -89,6 +89,7 @@ kconf_id_hash (register const char *str, register unsigned int len)
 struct kconf_id_strings_t
   {
     char kconf_id_strings_str2[sizeof("on")];
+    char kconf_id_strings_str3[sizeof("env")];
     char kconf_id_strings_str5[sizeof("endif")];
     char kconf_id_strings_str6[sizeof("option")];
     char kconf_id_strings_str7[sizeof("endmenu")];
@@ -107,7 +108,6 @@ struct kconf_id_strings_t
     char kconf_id_strings_str21[sizeof("string")];
     char kconf_id_strings_str22[sizeof("if")];
     char kconf_id_strings_str23[sizeof("int")];
-    char kconf_id_strings_str24[sizeof("enable")];
     char kconf_id_strings_str26[sizeof("select")];
     char kconf_id_strings_str27[sizeof("modules")];
     char kconf_id_strings_str28[sizeof("tristate")];
@@ -123,6 +123,7 @@ struct kconf_id_strings_t
 static struct kconf_id_strings_t kconf_id_strings_contents =
   {
     "on",
+    "env",
     "endif",
     "option",
     "endmenu",
@@ -141,7 +142,6 @@ static struct kconf_id_strings_t kconf_id_strings_contents =
     "string",
     "if",
     "int",
-    "enable",
     "select",
     "modules",
     "tristate",
@@ -157,6 +157,9 @@ static struct kconf_id_strings_t kconf_id_strings_contents =
 #define kconf_id_strings ((const char *) &kconf_id_strings_contents)
 #ifdef __GNUC__
 __inline
+#ifdef __GNUC_STDC_INLINE__
+__attribute__ ((__gnu_inline__))
+#endif
 #endif
 struct kconf_id *
 kconf_id_lookup (register const char *str, register unsigned int len)
@@ -174,7 +177,8 @@ kconf_id_lookup (register const char *str, register unsigned int len)
     {
       {-1}, {-1},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str2,            T_ON,           TF_PARAM},
-      {-1}, {-1},
+      {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str3,            T_OPT_ENV,      TF_OPTION},
+      {-1},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str5,            T_ENDIF,        TF_COMMAND},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str6,            T_OPTION,       TF_COMMAND},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str7,    T_ENDMENU,      TF_COMMAND},
@@ -194,8 +198,7 @@ kconf_id_lookup (register const char *str, register unsigned int len)
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str21,           T_TYPE,         TF_COMMAND, S_STRING},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str22,           T_IF,           TF_COMMAND|TF_PARAM},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str23,           T_TYPE,         TF_COMMAND, S_INT},
-      {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str24,           T_SELECT,       TF_COMMAND},
-      {-1},
+      {-1}, {-1},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str26,           T_SELECT,       TF_COMMAND},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str27,   T_OPT_MODULES,  TF_OPTION},
       {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str28,   T_TYPE,         TF_COMMAND, S_TRISTATE},
index 187d38ccadd57f489b89cbbf16798cdf5f08ef49..4cea5c85cd0a9297f029d58cf97c4771825f4641 100644 (file)
@@ -217,6 +217,11 @@ n  [A-Za-z0-9_]
                append_string("\n", 1);
        }
        [^ \t\n].* {
+               while (yyleng) {
+                       if ((yytext[yyleng-1] != ' ') && (yytext[yyleng-1] != '\t'))
+                               break;
+                       yyleng--;
+               }
                append_string(yytext, yyleng);
                if (!first_ts)
                        first_ts = last_ts;
index 1d1401807e95809a43e2b72c84ee17d4be2854e8..ec54f12f57b04c10b6f24f7aa561afccb3fba0b7 100755 (executable)
@@ -46,21 +46,24 @@ use strict;
 # Note: This only supports 'c'.
 
 # usage:
-# kernel-doc [ -docbook | -html | -text | -man ]
+# kernel-doc [ -docbook | -html | -text | -man ] [ -no-doc-sections ]
 #           [ -function funcname [ -function funcname ...] ] c file(s)s > outputfile
 # or
 #           [ -nofunction funcname [ -function funcname ...] ] c file(s)s > outputfile
 #
 #  Set output format using one of -docbook -html -text or -man.  Default is man.
 #
+#  -no-doc-sections
+#      Do not output DOC: sections
+#
 #  -function funcname
-#      If set, then only generate documentation for the given function(s).  All
-#      other functions are ignored.
+#      If set, then only generate documentation for the given function(s) or
+#      DOC: section titles.  All other functions and DOC: sections are ignored.
 #
 #  -nofunction funcname
-#      If set, then only generate documentation for the other function(s).
-#      Cannot be used together with -function
-#      (yes, that's a bug -- perl hackers can fix it 8))
+#      If set, then only generate documentation for the other function(s)/DOC:
+#      sections. Cannot be used together with -function (yes, that's a bug --
+#      perl hackers can fix it 8))
 #
 #  c files - list of 'c' files to process
 #
@@ -182,10 +185,10 @@ my $blankline_html = $local_lt . "p" . $local_gt; # was "<p>"
 my %highlights_xml = ( "([^=])\\\"([^\\\"<]+)\\\"", "\$1<quote>\$2</quote>",
                        $type_constant, "<constant>\$1</constant>",
                        $type_func, "<function>\$1</function>",
-                       $type_struct, "<structname>\$1</structname>",
+                       $type_struct_xml, "<structname>\$1</structname>",
                        $type_env, "<envar>\$1</envar>",
                        $type_param, "<parameter>\$1</parameter>" );
-my $blankline_xml = "</para><para>\n";
+my $blankline_xml = $local_lt . "/para" . $local_gt . $local_lt . "para" . $local_gt . "\n";
 
 # gnome, docbook format
 my %highlights_gnome = ( $type_constant, "<replaceable class=\"option\">\$1</replaceable>",
@@ -211,7 +214,7 @@ my $blankline_text = "";
 
 
 sub usage {
-    print "Usage: $0 [ -v ] [ -docbook | -html | -text | -man ]\n";
+    print "Usage: $0 [ -v ] [ -docbook | -html | -text | -man ] [ -no-doc-sections ]\n";
     print "         [ -function funcname [ -function funcname ...] ]\n";
     print "         [ -nofunction funcname [ -nofunction funcname ...] ]\n";
     print "         c source file(s) > outputfile\n";
@@ -225,6 +228,7 @@ if ($#ARGV==-1) {
 
 my $verbose = 0;
 my $output_mode = "man";
+my $no_doc_sections = 0;
 my %highlights = %highlights_man;
 my $blankline = $blankline_man;
 my $modulename = "Kernel API";
@@ -329,12 +333,14 @@ while ($ARGV[0] =~ m/^-(.*)/) {
        usage();
     } elsif ($cmd eq '-filelist') {
            $filelist = shift @ARGV;
+    } elsif ($cmd eq '-no-doc-sections') {
+           $no_doc_sections = 1;
     }
 }
 
 # get kernel version from env
 sub get_kernel_version() {
-    my $version;
+    my $version = 'unknown kernel version';
 
     if (defined($ENV{'KERNELVERSION'})) {
        $version = $ENV{'KERNELVERSION'};
@@ -373,6 +379,29 @@ sub dump_section {
     }
 }
 
+##
+# dump DOC: section after checking that it should go out
+#
+sub dump_doc_section {
+    my $name = shift;
+    my $contents = join "\n", @_;
+
+    if ($no_doc_sections) {
+        return;
+    }
+
+    if (($function_only == 0) ||
+       ( $function_only == 1 && defined($function_table{$name})) ||
+       ( $function_only == 2 && !defined($function_table{$name})))
+    {
+       dump_section $name, $contents;
+       output_blockhead({'sectionlist' => \@sectionlist,
+                         'sections' => \%sections,
+                         'module' => $modulename,
+                         'content-only' => ($function_only != 0), });
+    }
+}
+
 ##
 # output function
 #
@@ -394,7 +423,7 @@ sub output_highlight {
 #      confess "output_highlight got called with no args?\n";
 #   }
 
-    if ($output_mode eq "html") {
+    if ($output_mode eq "html" || $output_mode eq "xml") {
        $contents = local_unescape($contents);
        # convert data read & converted thru xml_escape() into &xyz; format:
        $contents =~ s/\\\\\\/&/g;
@@ -564,8 +593,8 @@ sub output_function_html(%) {
     print "<hr>\n";
 }
 
-# output intro in html
-sub output_intro_html(%) {
+# output DOC: block header in html
+sub output_blockhead_html(%) {
     my %args = %{$_[0]};
     my ($parameter, $section);
     my $count;
@@ -871,7 +900,7 @@ sub output_typedef_xml(%) {
 }
 
 # output in XML DocBook
-sub output_intro_xml(%) {
+sub output_blockhead_xml(%) {
     my %args = %{$_[0]};
     my ($parameter, $section);
     my $count;
@@ -882,15 +911,23 @@ sub output_intro_xml(%) {
     # print out each section
     $lineprefix="   ";
     foreach $section (@{$args{'sectionlist'}}) {
-       print "<refsect1>\n <title>$section</title>\n <para>\n";
+       if (!$args{'content-only'}) {
+               print "<refsect1>\n <title>$section</title>\n";
+       }
        if ($section =~ m/EXAMPLE/i) {
            print "<example><para>\n";
+       } else {
+           print "<para>\n";
        }
        output_highlight($args{'sections'}{$section});
        if ($section =~ m/EXAMPLE/i) {
            print "</para></example>\n";
+       } else {
+           print "</para>";
+       }
+       if (!$args{'content-only'}) {
+               print "\n</refsect1>\n";
        }
-       print " </para>\n</refsect1>\n";
     }
 
     print "\n\n";
@@ -1137,7 +1174,7 @@ sub output_typedef_man(%) {
     }
 }
 
-sub output_intro_man(%) {
+sub output_blockhead_man(%) {
     my %args = %{$_[0]};
     my ($parameter, $section);
     my $count;
@@ -1294,7 +1331,7 @@ sub output_struct_text(%) {
     output_section_text(@_);
 }
 
-sub output_intro_text(%) {
+sub output_blockhead_text(%) {
     my %args = %{$_[0]};
     my ($parameter, $section);
 
@@ -1325,9 +1362,9 @@ sub output_declaration {
 
 ##
 # generic output function - calls the right one based on current output mode.
-sub output_intro {
+sub output_blockhead {
     no strict 'refs';
-    my $func = "output_intro_".$output_mode;
+    my $func = "output_blockhead_".$output_mode;
     &$func(@_);
     $section_counter++;
 }
@@ -1926,9 +1963,7 @@ sub process_file($) {
        } elsif ($state == 4) {
                # Documentation block
                if (/$doc_block/) {
-                       dump_section($section, xml_escape($contents));
-                       output_intro({'sectionlist' => \@sectionlist,
-                                     'sections' => \%sections });
+                       dump_doc_section($section, xml_escape($contents));
                        $contents = "";
                        $function = "";
                        %constants = ();
@@ -1946,9 +1981,7 @@ sub process_file($) {
                }
                elsif (/$doc_end/)
                {
-                       dump_section($section, xml_escape($contents));
-                       output_intro({'sectionlist' => \@sectionlist,
-                                     'sections' => \%sections });
+                       dump_doc_section($section, xml_escape($contents));
                        $contents = "";
                        $function = "";
                        %constants = ();
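
The kernel-doc changes above rename output_intro to output_blockhead and route DOC: blocks through the new dump_doc_section(), which honours -function/-nofunction by section title as well as the new -no-doc-sections switch. For reference, a DOC: block in a C source file looks like this (the title is illustrative):

    /**
     * DOC: Theory of Operation
     *
     * Free-form overview text.  kernel-doc emits this as a standalone
     * section; -no-doc-sections suppresses it, and -function "Theory of
     * Operation" selects it by title.
     */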
index e0f54b9d8feca95bf693ec7cc9ad294bd70ab1df..e65d8b33faa4cae649044fb48d8fc87fa012d574 100644 (file)
@@ -25,8 +25,11 @@ cat << EOF > $2/Makefile
 VERSION = $3
 PATCHLEVEL = $4
 
-KERNELSRC    := $1
-KERNELOUTPUT := $2
+lastword = \$(word \$(words \$(1)),\$(1))
+makedir := \$(dir \$(call lastword,\$(MAKEFILE_LIST)))
+
+MAKEARGS := -C $1
+MAKEARGS += O=\$(if \$(patsubst /%,,\$(makedir)),\$(CURDIR)/)\$(patsubst %/,%,\$(makedir))
 
 MAKEFLAGS += --no-print-directory
 
@@ -35,10 +38,11 @@ MAKEFLAGS += --no-print-directory
 all    := \$(filter-out all Makefile,\$(MAKECMDGOALS))
 
 all:
-       \$(MAKE) -C \$(KERNELSRC) O=\$(KERNELOUTPUT) \$(all)
+       \$(MAKE) \$(MAKEARGS) \$(all)
 
 Makefile:;
 
 \$(all) %/: all
        @:
+
 EOF
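
The generated wrapper Makefile no longer hard-codes KERNELSRC/KERNELOUTPUT; it derives the output directory from its own location via MAKEFILE_LIST, prefixing $(CURDIR) only when that path is relative. With a hypothetical source tree of /usr/src/linux-2.6 passed as $1, the changed portion of the generated file would expand to roughly:

    lastword = $(word $(words $(1)),$(1))
    makedir := $(dir $(call lastword,$(MAKEFILE_LIST)))

    MAKEARGS := -C /usr/src/linux-2.6
    MAKEARGS += O=$(if $(patsubst /%,,$(makedir)),$(CURDIR)/)$(patsubst %/,%,$(makedir))

    all:
        $(MAKE) $(MAKEARGS) $(all)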
index 93ac52adb4980b3d23a0b40acd6800e48f75b349..f8efc93eb7004c9ee0004c979acd617ca7d2e8f8 100644 (file)
@@ -2,7 +2,7 @@
  *
  * Copyright 2003       Kai Germaschewski
  * Copyright 2002-2004  Rusty Russell, IBM Corporation
- * Copyright 2006       Sam Ravnborg
+ * Copyright 2006-2008  Sam Ravnborg
  * Based in part on module-init-tools/depmod.c,file2alias
  *
  * This software may be used and distributed according to the terms
@@ -28,12 +28,17 @@ static int vmlinux_section_warnings = 1;
 /* Only warn about unresolved symbols */
 static int warn_unresolved = 0;
 /* How a symbol is exported */
+static int sec_mismatch_count = 0;
+static int sec_mismatch_verbose = 1;
+
 enum export {
        export_plain,      export_unused,     export_gpl,
        export_unused_gpl, export_gpl_future, export_unknown
 };
 
-void fatal(const char *fmt, ...)
+#define PRINTF __attribute__ ((format (printf, 1, 2)))
+
+PRINTF void fatal(const char *fmt, ...)
 {
        va_list arglist;
 
@@ -46,7 +51,7 @@ void fatal(const char *fmt, ...)
        exit(1);
 }
 
-void warn(const char *fmt, ...)
+PRINTF void warn(const char *fmt, ...)
 {
        va_list arglist;
 
@@ -57,7 +62,7 @@ void warn(const char *fmt, ...)
        va_end(arglist);
 }
 
-void merror(const char *fmt, ...)
+PRINTF void merror(const char *fmt, ...)
 {
        va_list arglist;
 
@@ -72,7 +77,8 @@ static int is_vmlinux(const char *modname)
 {
        const char *myname;
 
-       if ((myname = strrchr(modname, '/')))
+       myname = strrchr(modname, '/');
+       if (myname)
                myname++;
        else
                myname = modname;
@@ -83,14 +89,13 @@ static int is_vmlinux(const char *modname)
 
 void *do_nofail(void *ptr, const char *expr)
 {
-       if (!ptr) {
+       if (!ptr)
                fatal("modpost: Memory allocation failure: %s.\n", expr);
-       }
+
        return ptr;
 }
 
 /* A list of all modules we processed */
-
 static struct module *modules;
 
 static struct module *find_module(char *modname)
@@ -113,7 +118,8 @@ static struct module *new_module(char *modname)
        p = NOFAIL(strdup(modname));
 
        /* strip trailing .o */
-       if ((s = strrchr(p, '.')) != NULL)
+       s = strrchr(p, '.');
+       if (s != NULL)
                if (strcmp(s, ".o") == 0)
                        *s = '\0';
 
@@ -154,7 +160,7 @@ static inline unsigned int tdb_hash(const char *name)
        unsigned   i;   /* Used to cycle through random values. */
 
        /* Set the initial value from the key size. */
-       for (value = 0x238F13AF * strlen(name), i=0; name[i]; i++)
+       for (value = 0x238F13AF * strlen(name), i = 0; name[i]; i++)
                value = (value + (((unsigned char *)name)[i] << (i*5 % 24)));
 
        return (1103515243 * value + 12345);
@@ -198,7 +204,7 @@ static struct symbol *find_symbol(const char *name)
        if (name[0] == '.')
                name++;
 
-       for (s = symbolhash[tdb_hash(name) % SYMBOL_HASH_SIZE]; s; s=s->next) {
+       for (s = symbolhash[tdb_hash(name) % SYMBOL_HASH_SIZE]; s; s = s->next) {
                if (strcmp(s->name, name) == 0)
                        return s;
        }
@@ -223,9 +229,10 @@ static const char *export_str(enum export ex)
        return export_list[ex].str;
 }
 
-static enum export export_no(const char * s)
+static enum export export_no(const char *s)
 {
        int i;
+
        if (!s)
                return export_unknown;
        for (i = 0; export_list[i].export != export_unknown; i++) {
@@ -315,7 +322,7 @@ void *grab_file(const char *filename, unsigned long *size)
   * spaces in the beginning of the line is trimmed away.
   * Return a pointer to a static buffer.
   **/
-char* get_next_line(unsigned long *pos, void *file, unsigned long size)
+char *get_next_line(unsigned long *pos, void *file, unsigned long size)
 {
        static char line[4096];
        int skip = 1;
@@ -323,8 +330,7 @@ char* get_next_line(unsigned long *pos, void *file, unsigned long size)
        signed char *p = (signed char *)file + *pos;
        char *s = line;
 
-       for (; *pos < size ; (*pos)++)
-       {
+       for (; *pos < size ; (*pos)++) {
                if (skip && isspace(*p)) {
                        p++;
                        continue;
@@ -386,7 +392,9 @@ static int parse_elf(struct elf_info *info, const char *filename)
 
        /* Check if file offset is correct */
        if (hdr->e_shoff > info->size) {
-               fatal("section header offset=%u in file '%s' is bigger then filesize=%lu\n", hdr->e_shoff, filename, info->size);
+               fatal("section header offset=%lu in file '%s' is bigger than "
+                     "filesize=%lu\n", (unsigned long)hdr->e_shoff,
+                     filename, info->size);
                return 0;
        }
 
@@ -407,7 +415,10 @@ static int parse_elf(struct elf_info *info, const char *filename)
                const char *secname;
 
                if (sechdrs[i].sh_offset > info->size) {
-                       fatal("%s is truncated. sechdrs[i].sh_offset=%u > sizeof(*hrd)=%ul\n", filename, (unsigned int)sechdrs[i].sh_offset, sizeof(*hdr));
+                       fatal("%s is truncated. sechdrs[i].sh_offset=%lu > "
+                             "sizeof(*hrd)=%zu\n", filename,
+                             (unsigned long)sechdrs[i].sh_offset,
+                             sizeof(*hdr));
                        return 0;
                }
                secname = secstrings + sechdrs[i].sh_name;
@@ -434,9 +445,9 @@ static int parse_elf(struct elf_info *info, const char *filename)
                info->strtab       = (void *)hdr +
                                     sechdrs[sechdrs[i].sh_link].sh_offset;
        }
-       if (!info->symtab_start) {
+       if (!info->symtab_start)
                fatal("%s has no symtab?\n", filename);
-       }
+
        /* Fix endianness in symbols */
        for (sym = info->symtab_start; sym < info->symtab_stop; sym++) {
                sym->st_shndx = TO_NATIVE(sym->st_shndx);
@@ -505,11 +516,13 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
 #endif
 
                if (memcmp(symname, MODULE_SYMBOL_PREFIX,
-                          strlen(MODULE_SYMBOL_PREFIX)) == 0)
-                       mod->unres = alloc_symbol(symname +
-                                                 strlen(MODULE_SYMBOL_PREFIX),
-                                                 ELF_ST_BIND(sym->st_info) == STB_WEAK,
-                                                 mod->unres);
+                          strlen(MODULE_SYMBOL_PREFIX)) == 0) {
+                       mod->unres =
+                         alloc_symbol(symname +
+                                      strlen(MODULE_SYMBOL_PREFIX),
+                                      ELF_ST_BIND(sym->st_info) == STB_WEAK,
+                                      mod->unres);
+               }
                break;
        default:
                /* All exported symbols */
@@ -578,69 +591,303 @@ static char *get_modinfo(void *modinfo, unsigned long modinfo_len,
  **/
 static int strrcmp(const char *s, const char *sub)
 {
-        int slen, sublen;
+       int slen, sublen;
 
        if (!s || !sub)
                return 1;
 
        slen = strlen(s);
-        sublen = strlen(sub);
+       sublen = strlen(sub);
 
        if ((slen == 0) || (sublen == 0))
                return 1;
 
-        if (sublen > slen)
-                return 1;
+       if (sublen > slen)
+               return 1;
 
-        return memcmp(s + slen - sublen, sub, sublen);
+       return memcmp(s + slen - sublen, sub, sublen);
 }
 
-/*
- * Functions used only during module init is marked __init and is stored in
- * a .init.text section. Likewise data is marked __initdata and stored in
- * a .init.data section.
- * If this section is one of these sections return 1
- * See include/linux/init.h for the details
+static const char *sym_name(struct elf_info *elf, Elf_Sym *sym)
+{
+       if (sym)
+               return elf->strtab + sym->st_name;
+       else
+               return "";
+}
+
+static const char *sec_name(struct elf_info *elf, int shndx)
+{
+       Elf_Shdr *sechdrs = elf->sechdrs;
+       return (void *)elf->hdr +
+               elf->sechdrs[elf->hdr->e_shstrndx].sh_offset +
+               sechdrs[shndx].sh_name;
+}
+
+static const char *sech_name(struct elf_info *elf, Elf_Shdr *sechdr)
+{
+       return (void *)elf->hdr +
+               elf->sechdrs[elf->hdr->e_shstrndx].sh_offset +
+               sechdr->sh_name;
+}
+
+/* if sym is empty or point to a string
+ * like ".[0-9]+" then return 1.
+ * This is the optional prefix added by ld to some sections
  */
-static int init_section(const char *name)
+static int number_prefix(const char *sym)
 {
-       if (strcmp(name, ".init") == 0)
-               return 1;
-       if (strncmp(name, ".init.", strlen(".init.")) == 0)
+       if (*sym++ == '\0')
                return 1;
+       if (*sym != '.')
+               return 0;
+       do {
+               char c = *sym++;
+               if (c < '0' || c > '9')
+                       return 0;
+       } while (*sym);
+       return 1;
+}
+
+/* The pattern is an array of simple patterns.
+ * "foo" will match an exact string equal to "foo"
+ * "*foo" will match a string that ends with "foo"
+ * "foo*" will match a string that begins with "foo"
+ * "foo$" will match a string equal to "foo" or "foo.1"
+ *   where the '1' can be any number including several digits.
+ *   The $ syntax is for sections where ld append a dot number
+ *   to make section name unique.
+ */
+int match(const char *sym, const char * const pat[])
+{
+       const char *p;
+       while (*pat) {
+               p = *pat++;
+               const char *endp = p + strlen(p) - 1;
+
+               /* "*foo" */
+               if (*p == '*') {
+                       if (strrcmp(sym, p + 1) == 0)
+                               return 1;
+               }
+               /* "foo*" */
+               else if (*endp == '*') {
+                       if (strncmp(sym, p, strlen(p) - 1) == 0)
+                               return 1;
+               }
+               /* "foo$" */
+               else if (*endp == '$') {
+                       if (strncmp(sym, p, strlen(p) - 1) == 0) {
+                               if (number_prefix(sym + strlen(p) - 1))
+                                       return 1;
+                       }
+               }
+               /* no wildcards */
+               else {
+                       if (strcmp(p, sym) == 0)
+                               return 1;
+               }
+       }
+       /* no match */
        return 0;
 }
 
+/* sections that we do not want to do full section mismatch check on */
+static const char *section_white_list[] =
+       { ".debug*", ".stab*", ".note*", ".got*", ".toc*", NULL };
+
 /*
- * Functions used only during module exit is marked __exit and is stored in
- * a .exit.text section. Likewise data is marked __exitdata and stored in
- * a .exit.data section.
- * If this section is one of these sections return 1
- * See include/linux/init.h for the details
- **/
-static int exit_section(const char *name)
+ * Is this section one we do not want to check?
+ * This is often debug sections.
+ * If we are going to check this section then
+ * test if section name ends with a dot and a number.
+ * This is used to find sections where the linker have
+ * appended a dot-number to make the name unique.
+ * The cause of this is often a section specified in assembler
+ * without "ax" / "aw" and the same section used in .c
+ * code where gcc add these.
+ */
+static int check_section(const char *modname, const char *sec)
 {
-       if (strcmp(name, ".exit.text") == 0)
-               return 1;
-       if (strcmp(name, ".exit.data") == 0)
+       const char *e = sec + strlen(sec) - 1;
+       if (match(sec, section_white_list))
                return 1;
-       return 0;
 
+       if (*e && isdigit(*e)) {
+               /* consume all digits */
+               while (*e && e != sec && isdigit(*e))
+                       e--;
+               if (*e == '.') {
+                       warn("%s (%s): unexpected section name.\n"
+                            "The (.[number]+) following section name are "
+                            "ld generated and not expected.\n"
+                            "Did you forget to use \"ax\"/\"aw\" "
+                            "in a .S file?\n"
+                            "Note that for example <linux/init.h> contains\n"
+                            "section definitions for use in .S files.\n\n",
+                            modname, sec);
+               }
+       }
+       return 0;
 }
 
-/*
- * Data sections are named like this:
- * .data | .data.rel | .data.rel.*
- * Return 1 if the specified section is a data section
+
+
+#define ALL_INIT_DATA_SECTIONS \
+       ".init.data$", ".devinit.data$", ".cpuinit.data$", ".meminit.data$"
+#define ALL_EXIT_DATA_SECTIONS \
+       ".exit.data$", ".devexit.data$", ".cpuexit.data$", ".memexit.data$"
+
+#define ALL_INIT_TEXT_SECTIONS \
+       ".init.text$", ".devinit.text$", ".cpuinit.text$", ".meminit.text$"
+#define ALL_EXIT_TEXT_SECTIONS \
+       ".exit.text$", ".devexit.text$", ".cpuexit.text$", ".memexit.text$"
+
+#define ALL_INIT_SECTIONS ALL_INIT_DATA_SECTIONS, ALL_INIT_TEXT_SECTIONS
+#define ALL_EXIT_SECTIONS ALL_EXIT_DATA_SECTIONS, ALL_EXIT_TEXT_SECTIONS
+
+#define DATA_SECTIONS ".data$", ".data.rel$"
+#define TEXT_SECTIONS ".text$"
+
+#define INIT_SECTIONS      ".init.data$", ".init.text$"
+#define DEV_INIT_SECTIONS  ".devinit.data$", ".devinit.text$"
+#define CPU_INIT_SECTIONS  ".cpuinit.data$", ".cpuinit.text$"
+#define MEM_INIT_SECTIONS  ".meminit.data$", ".meminit.text$"
+
+#define EXIT_SECTIONS      ".exit.data$", ".exit.text$"
+#define DEV_EXIT_SECTIONS  ".devexit.data$", ".devexit.text$"
+#define CPU_EXIT_SECTIONS  ".cpuexit.data$", ".cpuexit.text$"
+#define MEM_EXIT_SECTIONS  ".memexit.data$", ".memexit.text$"
+
+/* init data sections */
+static const char *init_data_sections[] = { ALL_INIT_DATA_SECTIONS, NULL };
+
+/* all init sections */
+static const char *init_sections[] = { ALL_INIT_SECTIONS, NULL };
+
+/* All init and exit sections (code + data) */
+static const char *init_exit_sections[] =
+       {ALL_INIT_SECTIONS, ALL_EXIT_SECTIONS, NULL };
+
+/* data section */
+static const char *data_sections[] = { DATA_SECTIONS, NULL };
+
+/* sections that may refer to an init/exit section with no warning */
+static const char *initref_sections[] =
+{
+       ".text.init.refok*",
+       ".exit.text.refok*",
+       ".data.init.refok*",
+       NULL
+};
+
+
+/* symbols in .data that may refer to init/exit sections */
+static const char *symbol_white_list[] =
+{
+       "*driver",
+       "*_template", /* scsi uses *_template a lot */
+       "*_timer",    /* arm uses ops structures named _timer a lot */
+       "*_sht",      /* scsi also used *_sht to some extent */
+       "*_ops",
+       "*_probe",
+       "*_probe_one",
+       "*_console",
+       NULL
+};
+
+static const char *head_sections[] = { ".head.text*", NULL };
+static const char *linker_symbols[] =
+       { "__init_begin", "_sinittext", "_einittext", NULL };
+
+enum mismatch {
+       NO_MISMATCH,
+       TEXT_TO_INIT,
+       DATA_TO_INIT,
+       TEXT_TO_EXIT,
+       DATA_TO_EXIT,
+       XXXINIT_TO_INIT,
+       XXXEXIT_TO_EXIT,
+       INIT_TO_EXIT,
+       EXIT_TO_INIT,
+       EXPORT_TO_INIT_EXIT,
+};
+
+struct sectioncheck {
+       const char *fromsec[20];
+       const char *tosec[20];
+       enum mismatch mismatch;
+};
+
+const struct sectioncheck sectioncheck[] = {
+/* Do not reference init/exit code/data from
+ * normal code and data
  */
-static int data_section(const char *name)
 {
-       if ((strcmp(name, ".data") == 0) ||
-           (strcmp(name, ".data.rel") == 0) ||
-           (strncmp(name, ".data.rel.", strlen(".data.rel.")) == 0))
-               return 1;
-       else
-               return 0;
+       .fromsec = { TEXT_SECTIONS, NULL },
+       .tosec   = { ALL_INIT_SECTIONS, NULL },
+       .mismatch = TEXT_TO_INIT,
+},
+{
+       .fromsec = { DATA_SECTIONS, NULL },
+       .tosec   = { ALL_INIT_SECTIONS, NULL },
+       .mismatch = DATA_TO_INIT,
+},
+{
+       .fromsec = { TEXT_SECTIONS, NULL },
+       .tosec   = { ALL_EXIT_SECTIONS, NULL },
+       .mismatch = TEXT_TO_EXIT,
+},
+{
+       .fromsec = { DATA_SECTIONS, NULL },
+       .tosec   = { ALL_EXIT_SECTIONS, NULL },
+       .mismatch = DATA_TO_EXIT,
+},
+/* Do not reference init code/data from devinit/cpuinit/meminit code/data */
+{
+       .fromsec = { DEV_INIT_SECTIONS, CPU_INIT_SECTIONS, MEM_INIT_SECTIONS, NULL },
+       .tosec   = { INIT_SECTIONS, NULL },
+       .mismatch = XXXINIT_TO_INIT,
+},
+/* Do not reference exit code/data from devexit/cpuexit/memexit code/data */
+{
+       .fromsec = { DEV_EXIT_SECTIONS, CPU_EXIT_SECTIONS, MEM_EXIT_SECTIONS, NULL },
+       .tosec   = { EXIT_SECTIONS, NULL },
+       .mismatch = XXXEXIT_TO_EXIT,
+},
+/* Do not use exit code/data from init code */
+{
+       .fromsec = { ALL_INIT_SECTIONS, NULL },
+       .tosec   = { ALL_EXIT_SECTIONS, NULL },
+       .mismatch = INIT_TO_EXIT,
+},
+/* Do not use init code/data from exit code */
+{
+       .fromsec = { ALL_EXIT_SECTIONS, NULL },
+       .tosec   = { ALL_INIT_SECTIONS, NULL },
+       .mismatch = EXIT_TO_INIT,
+},
+/* Do not export init/exit functions or data */
+{
+       .fromsec = { "__ksymtab*", NULL },
+       .tosec   = { ALL_INIT_SECTIONS, ALL_EXIT_SECTIONS, NULL },
+       .mismatch = EXPORT_TO_INIT_EXIT
+}
+};
+
+static int section_mismatch(const char *fromsec, const char *tosec)
+{
+       int i;
+       int elems = sizeof(sectioncheck) / sizeof(struct sectioncheck);
+       const struct sectioncheck *check = &sectioncheck[0];
+
+       for (i = 0; i < elems; i++) {
+               if (match(fromsec, check->fromsec) &&
+                   match(tosec, check->tosec))
+                       return check->mismatch;
+               check++;
+       }
+       return NO_MISMATCH;
 }
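
The match() helper and the sectioncheck[] table above replace the earlier ad-hoc init/exit section tests with data-driven pattern lists. A self-contained sketch of the pattern forms its comment documents ("*foo" suffix, "foo*" prefix, exact match); this is a simplified re-statement for illustration only, not the modpost implementation, and it omits the "foo$" dot-number form:

    /* Hedged sketch: simplified re-statement of the match() pattern forms
     * for illustration; not the modpost code. */
    #include <assert.h>
    #include <string.h>

    static int pat_match(const char *sym, const char *const pat[])
    {
        for (; *pat; pat++) {
            const char *p = *pat;
            size_t pl = strlen(p), sl = strlen(sym);

            if (p[0] == '*') {                 /* "*foo": suffix match */
                if (sl >= pl - 1 &&
                    !strcmp(sym + sl - (pl - 1), p + 1))
                    return 1;
            } else if (p[pl - 1] == '*') {     /* "foo*": prefix match */
                if (!strncmp(sym, p, pl - 1))
                    return 1;
            } else if (!strcmp(sym, p)) {      /* exact match */
                return 1;
            }
        }
        return 0;
    }

    int main(void)
    {
        static const char *white[] = { ".debug*", "*driver", NULL };

        assert( pat_match(".debug_info", white)); /* prefix form */
        assert( pat_match("uart_driver", white)); /* suffix form */
        assert(!pat_match(".rodata",     white)); /* no match    */
        return 0;
    }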
 
 /**
@@ -669,7 +916,8 @@ static int data_section(const char *name)
  *   the pattern is identified by:
  *   tosec   = init or exit section
  *   fromsec = data section
- *   atsym = *driver, *_template, *_sht, *_ops, *_probe, *probe_one, *_console, *_timer
+ *   atsym = *driver, *_template, *_sht, *_ops, *_probe,
+ *           *probe_one, *_console, *_timer
  *
  * Pattern 3:
  *   Whitelist all refereces from .text.head to .init.data
@@ -684,77 +932,36 @@ static int data_section(const char *name)
  *   This pattern is identified by
  *   refsymname = __init_begin, _sinittext, _einittext
  *
- * Pattern 5:
- *   Xtensa uses literal sections for constants that are accessed PC-relative.
- *   Literal sections may safely reference their text sections.
- *   (Note that the name for the literal section omits any trailing '.text')
- *   tosec = <section>[.text]
- *   fromsec = <section>.literal
  **/
-static int secref_whitelist(const char *modname, const char *tosec,
-                           const char *fromsec, const char *atsym,
-                           const char *refsymname)
+static int secref_whitelist(const char *fromsec, const char *fromsym,
+                           const char *tosec, const char *tosym)
 {
-       int len;
-       const char **s;
-       const char *pat2sym[] = {
-               "driver",
-               "_template", /* scsi uses *_template a lot */
-               "_timer",    /* arm uses ops structures named _timer a lot */
-               "_sht",      /* scsi also used *_sht to some extent */
-               "_ops",
-               "_probe",
-               "_probe_one",
-               "_console",
-               NULL
-       };
-
-       const char *pat3refsym[] = {
-               "__init_begin",
-               "_sinittext",
-               "_einittext",
-               NULL
-       };
-
        /* Check for pattern 0 */
-       if ((strncmp(fromsec, ".text.init.refok", strlen(".text.init.refok")) == 0) ||
-           (strncmp(fromsec, ".exit.text.refok", strlen(".exit.text.refok")) == 0) ||
-           (strncmp(fromsec, ".data.init.refok", strlen(".data.init.refok")) == 0))
-               return 1;
+       if (match(fromsec, initref_sections))
+               return 0;
 
        /* Check for pattern 1 */
-       if ((strcmp(tosec, ".init.data") == 0) &&
-           (strncmp(fromsec, ".data", strlen(".data")) == 0) &&
-           (strncmp(atsym, "__param", strlen("__param")) == 0))
-               return 1;
+       if (match(tosec, init_data_sections) &&
+           match(fromsec, data_sections) &&
+           (strncmp(fromsym, "__param", strlen("__param")) == 0))
+               return 0;
 
        /* Check for pattern 2 */
-       if ((init_section(tosec) || exit_section(tosec)) && data_section(fromsec))
-               for (s = pat2sym; *s; s++)
-                       if (strrcmp(atsym, *s) == 0)
-                               return 1;
+       if (match(tosec, init_exit_sections) &&
+           match(fromsec, data_sections) &&
+           match(fromsym, symbol_white_list))
+               return 0;
 
        /* Check for pattern 3 */
-       if ((strcmp(fromsec, ".text.head") == 0) &&
-               ((strcmp(tosec, ".init.data") == 0) ||
-               (strcmp(tosec, ".init.text") == 0)))
-       return 1;
+       if (match(fromsec, head_sections) &&
+           match(tosec, init_sections))
+               return 0;
 
        /* Check for pattern 4 */
-       for (s = pat3refsym; *s; s++)
-               if (strcmp(refsymname, *s) == 0)
-                       return 1;
-
-       /* Check for pattern 5 */
-       if (strrcmp(tosec, ".text") == 0)
-               len = strlen(tosec) - strlen(".text");
-       else
-               len = strlen(tosec);
-       if ((strncmp(tosec, fromsec, len) == 0) && (strlen(fromsec) > len) &&
-           (strcmp(fromsec + len, ".literal") == 0))
-               return 1;
+       if (match(tosym, linker_symbols))
+               return 0;
 
-       return 0;
+       return 1;
 }
 
 /**
@@ -764,10 +971,13 @@ static int secref_whitelist(const char *modname, const char *tosec,
  * In other cases the symbol needs to be looked up in the symbol table
  * based on section and address.
  *  **/
-static Elf_Sym *find_elf_symbol(struct elf_info *elf, Elf_Addr addr,
+static Elf_Sym *find_elf_symbol(struct elf_info *elf, Elf64_Sword addr,
                                Elf_Sym *relsym)
 {
        Elf_Sym *sym;
+       Elf_Sym *near = NULL;
+       Elf64_Sword distance = 20;
+       Elf64_Sword d;
 
        if (relsym->st_name != 0)
                return relsym;
@@ -778,8 +988,20 @@ static Elf_Sym *find_elf_symbol(struct elf_info *elf, Elf_Addr addr,
                        continue;
                if (sym->st_value == addr)
                        return sym;
+               /* Find a symbol nearby - addr are maybe negative */
+               d = sym->st_value - addr;
+               if (d < 0)
+                       d = addr - sym->st_value;
+               if (d < distance) {
+                       distance = d;
+                       near = sym;
+               }
        }
-       return NULL;
+       /* We need a close match */
+       if (distance < 20)
+               return near;
+       else
+               return NULL;
 }
 
 static inline int is_arm_mapping_symbol(const char *str)
@@ -812,121 +1034,245 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
  * The ELF format may have a better way to detect what type of symbol
  * it is, but this works for now.
  **/
-static void find_symbols_between(struct elf_info *elf, Elf_Addr addr,
-                                const char *sec,
-                                Elf_Sym **before, Elf_Sym **after)
+static Elf_Sym *find_elf_symbol2(struct elf_info *elf, Elf_Addr addr,
+                                const char *sec)
 {
        Elf_Sym *sym;
-       Elf_Ehdr *hdr = elf->hdr;
-       Elf_Addr beforediff = ~0;
-       Elf_Addr afterdiff = ~0;
-       const char *secstrings = (void *)hdr +
-                                elf->sechdrs[hdr->e_shstrndx].sh_offset;
-
-       *before = NULL;
-       *after = NULL;
+       Elf_Sym *near = NULL;
+       Elf_Addr distance = ~0;
 
        for (sym = elf->symtab_start; sym < elf->symtab_stop; sym++) {
                const char *symsec;
 
                if (sym->st_shndx >= SHN_LORESERVE)
                        continue;
-               symsec = secstrings + elf->sechdrs[sym->st_shndx].sh_name;
+               symsec = sec_name(elf, sym->st_shndx);
                if (strcmp(symsec, sec) != 0)
                        continue;
                if (!is_valid_name(elf, sym))
                        continue;
                if (sym->st_value <= addr) {
-                       if ((addr - sym->st_value) < beforediff) {
-                               beforediff = addr - sym->st_value;
-                               *before = sym;
-                       }
-                       else if ((addr - sym->st_value) == beforediff) {
-                               *before = sym;
+                       if ((addr - sym->st_value) < distance) {
+                               distance = addr - sym->st_value;
+                               near = sym;
+                       } else if ((addr - sym->st_value) == distance) {
+                               near = sym;
                        }
                }
+       }
+       return near;
+}
+
+/*
+ * Convert a section name to the function/data attribute
+ * .init.text => __init
+ * .cpuinit.data => __cpudata
+ * .memexitconst => __memconst
+ * etc.
+*/
+static char *sec2annotation(const char *s)
+{
+       if (match(s, init_exit_sections)) {
+               char *p = malloc(20);
+               char *r = p;
+
+               *p++ = '_';
+               *p++ = '_';
+               if (*s == '.')
+                       s++;
+               while (*s && *s != '.')
+                       *p++ = *s++;
+               *p = '\0';
+               if (*s == '.')
+                       s++;
+               if (strstr(s, "rodata") != NULL)
+                       strcat(p, "const ");
+               else if (strstr(s, "data") != NULL)
+                       strcat(p, "data ");
                else
-               {
-                       if ((sym->st_value - addr) < afterdiff) {
-                               afterdiff = sym->st_value - addr;
-                               *after = sym;
-                       }
-                       else if ((sym->st_value - addr) == afterdiff) {
-                               *after = sym;
-                       }
-               }
+                       strcat(p, " ");
+               return r; /* we leak her but we do not care */
+       } else {
+               return "";
        }
 }
 
-/**
+static int is_function(Elf_Sym *sym)
+{
+       if (sym)
+               return ELF_ST_TYPE(sym->st_info) == STT_FUNC;
+       else
+               return 0;
+}
+
+/*
  * Print a warning about a section mismatch.
  * Try to find symbols near it so user can find it.
  * Check whitelist before warning - it may be a false positive.
- **/
-static void warn_sec_mismatch(const char *modname, const char *fromsec,
-                             struct elf_info *elf, Elf_Sym *sym, Elf_Rela r)
+ */
+static void report_sec_mismatch(const char *modname, enum mismatch mismatch,
+                                const char *fromsec,
+                                unsigned long long fromaddr,
+                                const char *fromsym,
+                                int from_is_func,
+                                const char *tosec, const char *tosym,
+                                int to_is_func)
 {
-       const char *refsymname = "";
-       Elf_Sym *before, *after;
-       Elf_Sym *refsym;
-       Elf_Ehdr *hdr = elf->hdr;
-       Elf_Shdr *sechdrs = elf->sechdrs;
-       const char *secstrings = (void *)hdr +
-                                sechdrs[hdr->e_shstrndx].sh_offset;
-       const char *secname = secstrings + sechdrs[sym->st_shndx].sh_name;
-
-       find_symbols_between(elf, r.r_offset, fromsec, &before, &after);
-
-       refsym = find_elf_symbol(elf, r.r_addend, sym);
-       if (refsym && strlen(elf->strtab + refsym->st_name))
-               refsymname = elf->strtab + refsym->st_name;
-
-       /* check whitelist - we may ignore it */
-       if (secref_whitelist(modname, secname, fromsec,
-                            before ? elf->strtab + before->st_name : "",
-                            refsymname))
+       const char *from, *from_p;
+       const char *to, *to_p;
+       from = from_is_func ? "function" : "variable";
+       from_p = from_is_func ? "()" : "";
+       to = to_is_func ? "function" : "variable";
+       to_p = to_is_func ? "()" : "";
+
+       fprintf(stderr, "WARNING: %s(%s+0x%llx): Section mismatch in"
+                       " reference from the %s %s%s to the %s %s:%s%s\n",
+                        modname, fromsec, fromaddr, from, fromsym, from_p,
+                       to, tosec, tosym, to_p);
+
+       sec_mismatch_count++;
+       if (!sec_mismatch_verbose)
                return;
 
-       if (before && after) {
-               warn("%s(%s+0x%llx): Section mismatch: reference to %s:%s "
-                    "(between '%s' and '%s')\n",
-                    modname, fromsec, (unsigned long long)r.r_offset,
-                    secname, refsymname,
-                    elf->strtab + before->st_name,
-                    elf->strtab + after->st_name);
-       } else if (before) {
-               warn("%s(%s+0x%llx): Section mismatch: reference to %s:%s "
-                    "(after '%s')\n",
-                    modname, fromsec, (unsigned long long)r.r_offset,
-                    secname, refsymname,
-                    elf->strtab + before->st_name);
-       } else if (after) {
-               warn("%s(%s+0x%llx): Section mismatch: reference to %s:%s "
-                    "before '%s' (at offset -0x%llx)\n",
-                    modname, fromsec, (unsigned long long)r.r_offset,
-                    secname, refsymname,
-                    elf->strtab + after->st_name);
-       } else {
-               warn("%s(%s+0x%llx): Section mismatch: reference to %s:%s\n",
-                    modname, fromsec, (unsigned long long)r.r_offset,
-                    secname, refsymname);
+       switch (mismatch) {
+       case TEXT_TO_INIT:
+               fprintf(stderr,
+               "The function %s %s() references\n"
+               "the %s %s%s%s.\n"
+               "This is often because %s lacks a %s\n"
+               "annotation or the annotation of %s is wrong.\n",
+               sec2annotation(fromsec), fromsym,
+               to, sec2annotation(tosec), tosym, to_p,
+               fromsym, sec2annotation(tosec), tosym);
+               break;
+       case DATA_TO_INIT: {
+               const char **s = symbol_white_list;
+               fprintf(stderr,
+               "The variable %s references\n"
+               "the %s %s%s%s\n"
+               "If the reference is valid then annotate the\n"
+               "variable with __init* (see linux/init.h) "
+               "or name the variable:\n",
+               fromsym, to, sec2annotation(tosec), tosym, to_p);
+               while (*s)
+                       fprintf(stderr, "%s, ", *s++);
+               fprintf(stderr, "\n");
+               break;
+       }
+       case TEXT_TO_EXIT:
+               fprintf(stderr,
+               "The function %s() references a %s in an exit section.\n"
+               "Often the %s %s%s has valid usage outside the exit section\n"
+               "and the fix is to remove the %sannotation of %s.\n",
+               fromsym, to, to, tosym, to_p, sec2annotation(tosec), tosym);
+               break;
+       case DATA_TO_EXIT: {
+               const char **s = symbol_white_list;
+               fprintf(stderr,
+               "The variable %s references\n"
+               "the %s %s%s%s\n"
+               "If the reference is valid then annotate the\n"
+               "variable with __exit* (see linux/init.h) or "
+               "name the variable:\n",
+               fromsym, to, sec2annotation(tosec), tosym, to_p);
+               while (*s)
+                       fprintf(stderr, "%s, ", *s++);
+               fprintf(stderr, "\n");
+               break;
+       }
+       case XXXINIT_TO_INIT:
+       case XXXEXIT_TO_EXIT:
+               fprintf(stderr,
+               "The %s %s%s%s references\n"
+               "a %s %s%s%s.\n"
+               "If %s is only used by %s then\n"
+               "annotate %s with a matching annotation.\n",
+               from, sec2annotation(fromsec), fromsym, from_p,
+               to, sec2annotation(tosec), tosym, to_p,
+               fromsym, tosym, fromsym);
+               break;
+       case INIT_TO_EXIT:
+               fprintf(stderr,
+               "The %s %s%s%s references\n"
+               "a %s %s%s%s.\n"
+               "This is often seen when error handling "
+               "in the init function\n"
+               "uses functionality in the exit path.\n"
+               "The fix is often to remove the %sannotation of\n"
+               "%s%s so it may be used outside an exit section.\n",
+               from, sec2annotation(fromsec), fromsym, from_p,
+               to, sec2annotation(tosec), tosym, to_p,
+               sec2annotation(tosec), tosym, to_p);
+               break;
+       case EXIT_TO_INIT:
+               fprintf(stderr,
+               "The %s %s%s%s references\n"
+               "a %s %s%s%s.\n"
+               "This is often seen when error handling "
+               "in the exit function\n"
+               "uses functionality in the init path.\n"
+               "The fix is often to remove the %sannotation of\n"
+               "%s%s so it may be used outside an init section.\n",
+               from, sec2annotation(fromsec), fromsym, from_p,
+               to, sec2annotation(tosec), tosym, to_p,
+               sec2annotation(tosec), tosym, to_p);
+               break;
+       case EXPORT_TO_INIT_EXIT:
+               fprintf(stderr,
+               "The symbol %s is exported and annotated %s\n"
+               "Fix this by removing the %sannotation of %s "
+               "or drop the export.\n",
+               tosym, sec2annotation(tosec), sec2annotation(tosec), tosym);
+               break;
+       case NO_MISMATCH:
+               /* Kept so the compiler warns if a new mismatch type lacks a case */
+               break;
+       }
+       fprintf(stderr, "\n");
+}
+
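
For illustration only (not part of the patch): a minimal example of the situation the TEXT_TO_INIT case above describes. The function names are made up; the quoted warning is the shape produced by the format string in report_sec_mismatch().

#include <linux/init.h>

static int __init setup_hw(void)   /* placed in .init.text, discarded after boot */
{
        return 0;
}

int probe_card(void)               /* plain .text, may run after init memory is freed */
{
        /* modpost: "Section mismatch in reference from the function
         * probe_card() to the function .init.text:setup_hw()" */
        return setup_hw();
}
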
+static void check_section_mismatch(const char *modname, struct elf_info *elf,
+                                   Elf_Rela *r, Elf_Sym *sym, const char *fromsec)
+{
+       const char *tosec;
+       enum mismatch mismatch;
+
+       tosec = sec_name(elf, sym->st_shndx);
+       mismatch = section_mismatch(fromsec, tosec);
+       if (mismatch != NO_MISMATCH) {
+               Elf_Sym *to;
+               Elf_Sym *from;
+               const char *tosym;
+               const char *fromsym;
+
+               from = find_elf_symbol2(elf, r->r_offset, fromsec);
+               fromsym = sym_name(elf, from);
+               to = find_elf_symbol(elf, r->r_addend, sym);
+               tosym = sym_name(elf, to);
+
+               /* check whitelist - we may ignore it */
+               if (secref_whitelist(fromsec, fromsym, tosec, tosym)) {
+                       report_sec_mismatch(modname, mismatch,
+                          fromsec, r->r_offset, fromsym,
+                          is_function(from), tosec, tosym,
+                          is_function(to));
+               }
        }
 }
 
 static unsigned int *reloc_location(struct elf_info *elf,
-                                          int rsection, Elf_Rela *r)
+                                   Elf_Shdr *sechdr, Elf_Rela *r)
 {
        Elf_Shdr *sechdrs = elf->sechdrs;
-       int section = sechdrs[rsection].sh_info;
+       int section = sechdr->sh_info;
 
        return (void *)elf->hdr + sechdrs[section].sh_offset +
                (r->r_offset - sechdrs[section].sh_addr);
 }
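
Background for the addend_*_rel() helpers that follow (illustrative, not from the patch): SHT_RELA entries carry the addend in the relocation record itself, while SHT_REL entries leave it implicit in the bytes being patched, so section_rel() must recover it per architecture before the referenced address can be classified. A sketch of the difference, using standalone struct and function names:

#include <stdint.h>
#include <string.h>

struct demo_rela { uint64_t r_offset; uint64_t r_info; int64_t r_addend; }; /* SHT_RELA: explicit addend */
struct demo_rel  { uint64_t r_offset; uint64_t r_info; };                   /* SHT_REL: addend is implicit */

/* For a REL-style 32-bit data relocation the implicit addend is simply the
 * value currently stored at the patched location (compare addend_386_rel()). */
static int32_t implicit_addend(const unsigned char *location)
{
        int32_t a;

        memcpy(&a, location, sizeof(a));
        return a;
}
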
 
-static int addend_386_rel(struct elf_info *elf, int rsection, Elf_Rela *r)
+static int addend_386_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
 {
        unsigned int r_typ = ELF_R_TYPE(r->r_info);
-       unsigned int *location = reloc_location(elf, rsection, r);
+       unsigned int *location = reloc_location(elf, sechdr, r);
 
        switch (r_typ) {
        case R_386_32:
@@ -942,19 +1288,21 @@ static int addend_386_rel(struct elf_info *elf, int rsection, Elf_Rela *r)
        return 0;
 }
 
-static int addend_arm_rel(struct elf_info *elf, int rsection, Elf_Rela *r)
+static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
 {
        unsigned int r_typ = ELF_R_TYPE(r->r_info);
 
        switch (r_typ) {
        case R_ARM_ABS32:
                /* From ARM ABI: (S + A) | T */
-               r->r_addend = (int)(long)(elf->symtab_start + ELF_R_SYM(r->r_info));
+               r->r_addend = (int)(long)
+                             (elf->symtab_start + ELF_R_SYM(r->r_info));
                break;
        case R_ARM_PC24:
                /* From ARM ABI: ((S + A) | T) - P */
-               r->r_addend = (int)(long)(elf->hdr + elf->sechdrs[rsection].sh_offset +
-                                         (r->r_offset - elf->sechdrs[rsection].sh_addr));
+               r->r_addend = (int)(long)(elf->hdr +
+                             sechdr->sh_offset +
+                             (r->r_offset - sechdr->sh_addr));
                break;
        default:
                return 1;
@@ -962,10 +1310,10 @@ static int addend_arm_rel(struct elf_info *elf, int rsection, Elf_Rela *r)
        return 0;
 }
 
-static int addend_mips_rel(struct elf_info *elf, int rsection, Elf_Rela *r)
+static int addend_mips_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
 {
        unsigned int r_typ = ELF_R_TYPE(r->r_info);
-       unsigned int *location = reloc_location(elf, rsection, r);
+       unsigned int *location = reloc_location(elf, sechdr, r);
        unsigned int inst;
 
        if (r_typ == R_MIPS_HI16)
@@ -985,6 +1333,108 @@ static int addend_mips_rel(struct elf_info *elf, int rsection, Elf_Rela *r)
        return 0;
 }
 
+static void section_rela(const char *modname, struct elf_info *elf,
+                         Elf_Shdr *sechdr)
+{
+       Elf_Sym  *sym;
+       Elf_Rela *rela;
+       Elf_Rela r;
+       unsigned int r_sym;
+       const char *fromsec;
+
+       Elf_Rela *start = (void *)elf->hdr + sechdr->sh_offset;
+       Elf_Rela *stop  = (void *)start + sechdr->sh_size;
+
+       fromsec = sech_name(elf, sechdr);
+       fromsec += strlen(".rela");
+       /* if the from section (by name) is known good, skip it */
+       if (check_section(modname, fromsec))
+               return;
+
+       for (rela = start; rela < stop; rela++) {
+               r.r_offset = TO_NATIVE(rela->r_offset);
+#if KERNEL_ELFCLASS == ELFCLASS64
+               if (elf->hdr->e_machine == EM_MIPS) {
+                       unsigned int r_typ;
+                       r_sym = ELF64_MIPS_R_SYM(rela->r_info);
+                       r_sym = TO_NATIVE(r_sym);
+                       r_typ = ELF64_MIPS_R_TYPE(rela->r_info);
+                       r.r_info = ELF64_R_INFO(r_sym, r_typ);
+               } else {
+                       r.r_info = TO_NATIVE(rela->r_info);
+                       r_sym = ELF_R_SYM(r.r_info);
+               }
+#else
+               r.r_info = TO_NATIVE(rela->r_info);
+               r_sym = ELF_R_SYM(r.r_info);
+#endif
+               r.r_addend = TO_NATIVE(rela->r_addend);
+               sym = elf->symtab_start + r_sym;
+               /* Skip special sections */
+               if (sym->st_shndx >= SHN_LORESERVE)
+                       continue;
+               check_section_mismatch(modname, elf, &r, sym, fromsec);
+       }
+}
+
+static void section_rel(const char *modname, struct elf_info *elf,
+                        Elf_Shdr *sechdr)
+{
+       Elf_Sym *sym;
+       Elf_Rel *rel;
+       Elf_Rela r;
+       unsigned int r_sym;
+       const char *fromsec;
+
+       Elf_Rel *start = (void *)elf->hdr + sechdr->sh_offset;
+       Elf_Rel *stop  = (void *)start + sechdr->sh_size;
+
+       fromsec = sech_name(elf, sechdr);
+       fromsec += strlen(".rel");
+       /* if the from section (by name) is known good, skip it */
+       if (check_section(modname, fromsec))
+               return;
+
+       for (rel = start; rel < stop; rel++) {
+               r.r_offset = TO_NATIVE(rel->r_offset);
+#if KERNEL_ELFCLASS == ELFCLASS64
+               if (elf->hdr->e_machine == EM_MIPS) {
+                       unsigned int r_typ;
+                       r_sym = ELF64_MIPS_R_SYM(rel->r_info);
+                       r_sym = TO_NATIVE(r_sym);
+                       r_typ = ELF64_MIPS_R_TYPE(rel->r_info);
+                       r.r_info = ELF64_R_INFO(r_sym, r_typ);
+               } else {
+                       r.r_info = TO_NATIVE(rel->r_info);
+                       r_sym = ELF_R_SYM(r.r_info);
+               }
+#else
+               r.r_info = TO_NATIVE(rel->r_info);
+               r_sym = ELF_R_SYM(r.r_info);
+#endif
+               r.r_addend = 0;
+               switch (elf->hdr->e_machine) {
+               case EM_386:
+                       if (addend_386_rel(elf, sechdr, &r))
+                               continue;
+                       break;
+               case EM_ARM:
+                       if (addend_arm_rel(elf, sechdr, &r))
+                               continue;
+                       break;
+               case EM_MIPS:
+                       if (addend_mips_rel(elf, sechdr, &r))
+                               continue;
+                       break;
+               }
+               sym = elf->symtab_start + r_sym;
+               /* Skip special sections */
+               if (sym->st_shndx >= SHN_LORESERVE)
+                       continue;
+               check_section_mismatch(modname, elf, &r, sym, fromsec);
+       }
+}
+
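
An illustrative aside on the r_info handling in section_rela()/section_rel() above (not part of the patch): a generic ELF64 r_info packs the symbol index into the upper 32 bits and the relocation type into the lower 32, which is what the ELF_R_SYM()/ELF_R_INFO() macros assume. MIPS64 objects use a different sub-field layout, which is why the EM_MIPS branch re-extracts the fields with ELF64_MIPS_R_SYM()/ELF64_MIPS_R_TYPE() and repacks them into the generic form. A sketch of the generic packing, with demo_* names standing in for the standard macros:

#include <stdint.h>

/* Equivalent to the standard ELF64_R_SYM/ELF64_R_TYPE/ELF64_R_INFO macros. */
static inline uint32_t demo_r_sym(uint64_t r_info)  { return (uint32_t)(r_info >> 32); }
static inline uint32_t demo_r_type(uint64_t r_info) { return (uint32_t)(r_info & 0xffffffff); }
static inline uint64_t demo_r_info(uint32_t sym, uint32_t type)
{
        return ((uint64_t)sym << 32) | type;
}
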
 /**
  * A module includes a number of sections that are discarded
  * either when loaded or when used as built-in.
@@ -998,257 +1448,21 @@ static int addend_mips_rel(struct elf_info *elf, int rsection, Elf_Rela *r)
  * be discarded and warns about it.
  **/
 static void check_sec_ref(struct module *mod, const char *modname,
-                         struct elf_info *elf,
-                         int section(const char*),
-                         int section_ref_ok(const char *))
+                          struct elf_info *elf)
 {
        int i;
-       Elf_Sym  *sym;
-       Elf_Ehdr *hdr = elf->hdr;
        Elf_Shdr *sechdrs = elf->sechdrs;
-       const char *secstrings = (void *)hdr +
-                                sechdrs[hdr->e_shstrndx].sh_offset;
 
        /* Walk through all sections */
-       for (i = 0; i < hdr->e_shnum; i++) {
-               const char *name = secstrings + sechdrs[i].sh_name;
-               const char *secname;
-               Elf_Rela r;
-               unsigned int r_sym;
+       for (i = 0; i < elf->hdr->e_shnum; i++) {
                /* We want to process only relocation sections and not .init */
-               if (sechdrs[i].sh_type == SHT_RELA) {
-                       Elf_Rela *rela;
-                       Elf_Rela *start = (void *)hdr + sechdrs[i].sh_offset;
-                       Elf_Rela *stop  = (void*)start + sechdrs[i].sh_size;
-                       name += strlen(".rela");
-                       if (section_ref_ok(name))
-                               continue;
-
-                       for (rela = start; rela < stop; rela++) {
-                               r.r_offset = TO_NATIVE(rela->r_offset);
-#if KERNEL_ELFCLASS == ELFCLASS64
-                               if (hdr->e_machine == EM_MIPS) {
-                                       unsigned int r_typ;
-                                       r_sym = ELF64_MIPS_R_SYM(rela->r_info);
-                                       r_sym = TO_NATIVE(r_sym);
-                                       r_typ = ELF64_MIPS_R_TYPE(rela->r_info);
-                                       r.r_info = ELF64_R_INFO(r_sym, r_typ);
-                               } else {
-                                       r.r_info = TO_NATIVE(rela->r_info);
-                                       r_sym = ELF_R_SYM(r.r_info);
-                               }
-#else
-                               r.r_info = TO_NATIVE(rela->r_info);
-                               r_sym = ELF_R_SYM(r.r_info);
-#endif
-                               r.r_addend = TO_NATIVE(rela->r_addend);
-                               sym = elf->symtab_start + r_sym;
-                               /* Skip special sections */
-                               if (sym->st_shndx >= SHN_LORESERVE)
-                                       continue;
-
-                               secname = secstrings +
-                                       sechdrs[sym->st_shndx].sh_name;
-                               if (section(secname))
-                                       warn_sec_mismatch(modname, name,
-                                                         elf, sym, r);
-                       }
-               } else if (sechdrs[i].sh_type == SHT_REL) {
-                       Elf_Rel *rel;
-                       Elf_Rel *start = (void *)hdr + sechdrs[i].sh_offset;
-                       Elf_Rel *stop  = (void*)start + sechdrs[i].sh_size;
-                       name += strlen(".rel");
-                       if (section_ref_ok(name))
-                               continue;
-
-                       for (rel = start; rel < stop; rel++) {
-                               r.r_offset = TO_NATIVE(rel->r_offset);
-#if KERNEL_ELFCLASS == ELFCLASS64
-                               if (hdr->e_machine == EM_MIPS) {
-                                       unsigned int r_typ;
-                                       r_sym = ELF64_MIPS_R_SYM(rel->r_info);
-                                       r_sym = TO_NATIVE(r_sym);
-                                       r_typ = ELF64_MIPS_R_TYPE(rel->r_info);
-                                       r.r_info = ELF64_R_INFO(r_sym, r_typ);
-                               } else {
-                                       r.r_info = TO_NATIVE(rel->r_info);
-                                       r_sym = ELF_R_SYM(r.r_info);
-                               }
-#else
-                               r.r_info = TO_NATIVE(rel->r_info);
-                               r_sym = ELF_R_SYM(r.r_info);
-#endif
-                               r.r_addend = 0;
-                               switch (hdr->e_machine) {
-                               case EM_386:
-                                       if (addend_386_rel(elf, i, &r))
-                                               continue;
-                                       break;
-                               case EM_ARM:
-                                       if(addend_arm_rel(elf, i, &r))
-                                               continue;
-                                       break;
-                               case EM_MIPS:
-                                       if (addend_mips_rel(elf, i, &r))
-                                               continue;
-                                       break;
-                               }
-                               sym = elf->symtab_start + r_sym;
-                               /* Skip special sections */
-                               if (sym->st_shndx >= SHN_LORESERVE)
-                                       continue;
-
-                               secname = secstrings +
-                                       sechdrs[sym->st_shndx].sh_name;
-                               if (section(secname))
-                                       warn_sec_mismatch(modname, name,
-                                                         elf, sym, r);
-                       }
-               }
+               if (sechdrs[i].sh_type == SHT_RELA)
+                       section_rela(modname, elf, &elf->sechdrs[i]);
+               else if (sechdrs[i].sh_type == SHT_REL)
+                       section_rel(modname, elf, &elf->sechdrs[i]);
        }
 }
 
-/*
- * Identify sections from which references to either a
- * .init or a .exit section is OK.
- *
- * [OPD] Keith Ownes <kaos@sgi.com> commented:
- * For our future {in}sanity, add a comment that this is the ppc .opd
- * section, not the ia64 .opd section.
- * ia64 .opd should not point to discarded sections.
- * [.rodata] like for .init.text we ignore .rodata references -same reason
- */
-static int initexit_section_ref_ok(const char *name)
-{
-       const char **s;
-       /* Absolute section names */
-       const char *namelist1[] = {
-               "__bug_table",          /* used by powerpc for BUG() */
-               "__ex_table",
-               ".altinstructions",
-               ".cranges",             /* used by sh64 */
-               ".fixup",
-               ".machvec",             /* ia64 + powerpc uses these */
-               ".machine.desc",
-               ".opd",                 /* See comment [OPD] */
-               "__dbe_table",
-               ".parainstructions",
-               ".pdr",
-               ".plt",                 /* seen on ARCH=um build on x86_64. Harmless */
-               ".smp_locks",
-               ".stab",
-               ".m68k_fixup",
-               ".xt.prop",             /* xtensa informational section */
-               ".xt.lit",              /* xtensa informational section */
-               NULL
-       };
-       /* Start of section names */
-       const char *namelist2[] = {
-               ".debug",
-               ".eh_frame",
-               ".note",                /* ignore ELF notes - may contain anything */
-               ".got",                 /* powerpc - global offset table */
-               ".toc",                 /* powerpc - table of contents */
-               NULL
-       };
-       /* part of section name */
-       const char *namelist3 [] = {
-               ".unwind",  /* Sample: IA_64.unwind.exit.text */
-               NULL
-       };
-
-       for (s = namelist1; *s; s++)
-               if (strcmp(*s, name) == 0)
-                       return 1;
-       for (s = namelist2; *s; s++)
-               if (strncmp(*s, name, strlen(*s)) == 0)
-                       return 1;
-       for (s = namelist3; *s; s++)
-               if (strstr(name, *s) != NULL)
-                       return 1;
-       return 0;
-}
-
-
-/*
- * Identify sections from which references to a .init section is OK.
- *
- * Unfortunately references to read only data that referenced .init
- * sections had to be excluded. Almost all of these are false
- * positives, they are created by gcc. The downside of excluding rodata
- * is that there really are some user references from rodata to
- * init code, e.g. drivers/video/vgacon.c:
- *
- * const struct consw vga_con = {
- *        con_startup:            vgacon_startup,
- *
- * where vgacon_startup is __init.  If you want to wade through the false
- * positives, take out the check for rodata.
- */
-static int init_section_ref_ok(const char *name)
-{
-       const char **s;
-       /* Absolute section names */
-       const char *namelist1[] = {
-               "__dbe_table",          /* MIPS generate these */
-               "__ftr_fixup",          /* powerpc cpu feature fixup */
-               "__fw_ftr_fixup",       /* powerpc firmware feature fixup */
-               "__param",
-               ".data.rel.ro",         /* used by parisc64 */
-               ".init",
-               ".text.lock",
-               NULL
-       };
-       /* Start of section names */
-       const char *namelist2[] = {
-               ".init.",
-               ".pci_fixup",
-               ".rodata",
-               NULL
-       };
-
-       if (initexit_section_ref_ok(name))
-               return 1;
-
-       for (s = namelist1; *s; s++)
-               if (strcmp(*s, name) == 0)
-                       return 1;
-       for (s = namelist2; *s; s++)
-               if (strncmp(*s, name, strlen(*s)) == 0)
-                       return 1;
-
-       /* If section name ends with ".init" we allow references
-        * as is the case with .initcallN.init, .early_param.init, .taglist.init etc
-        */
-       if (strrcmp(name, ".init") == 0)
-               return 1;
-       return 0;
-}
-
-/*
- * Identify sections from which references to a .exit section is OK.
- */
-static int exit_section_ref_ok(const char *name)
-{
-       const char **s;
-       /* Absolute section names */
-       const char *namelist1[] = {
-               ".exit.data",
-               ".exit.text",
-               ".exitcall.exit",
-               ".rodata",
-               NULL
-       };
-
-       if (initexit_section_ref_ok(name))
-               return 1;
-
-       for (s = namelist1; *s; s++)
-               if (strcmp(*s, name) == 0)
-                       return 1;
-       return 0;
-}
-
 static void read_symbols(char *modname)
 {
        const char *symname;
@@ -1288,10 +1502,9 @@ static void read_symbols(char *modname)
                handle_modversions(mod, &info, sym, symname);
                handle_moddevtable(mod, &info, sym, symname);
        }
-       if (is_vmlinux(modname) && vmlinux_section_warnings) {
-               check_sec_ref(mod, modname, &info, init_section, init_section_ref_ok);
-               check_sec_ref(mod, modname, &info, exit_section, exit_section_ref_ok);
-       }
+       if (!is_vmlinux(modname) || vmlinux_section_warnings)
+               check_sec_ref(mod, modname, &info);
 
        version = get_modinfo(info.modinfo, info.modinfo_len, "version");
        if (version)
@@ -1365,7 +1578,7 @@ static void check_for_gpl_usage(enum export exp, const char *m, const char *s)
        }
 }
 
-static void check_for_unused(enum export exp, const char* m, const char* s)
+static void check_for_unused(enum export exp, const char *m, const char *s)
 {
        const char *e = is_vmlinux(m) ?"":".ko";
 
@@ -1398,7 +1611,7 @@ static void check_exports(struct module *mod)
                if (!mod->gpl_compatible)
                        check_for_gpl_usage(exp->export, basename, exp->name);
                check_for_unused(exp->export, basename, exp->name);
-        }
+       }
 }
 
 /**
@@ -1458,13 +1671,12 @@ static int add_versions(struct buffer *b, struct module *mod)
 
        buf_printf(b, "\n");
        buf_printf(b, "static const struct modversion_info ____versions[]\n");
-       buf_printf(b, "__attribute_used__\n");
+       buf_printf(b, "__used\n");
        buf_printf(b, "__attribute__((section(\"__versions\"))) = {\n");
 
        for (s = mod->unres; s; s = s->next) {
-               if (!s->module) {
+               if (!s->module)
                        continue;
-               }
                if (!s->crc_valid) {
                        warn("\"%s\" [%s.ko] has no CRC!\n",
                                s->name, mod->name);
@@ -1485,13 +1697,12 @@ static void add_depends(struct buffer *b, struct module *mod,
        struct module *m;
        int first = 1;
 
-       for (m = modules; m; m = m->next) {
+       for (m = modules; m; m = m->next)
                m->seen = is_vmlinux(m->name);
-       }
 
        buf_printf(b, "\n");
        buf_printf(b, "static const char __module_depends[]\n");
-       buf_printf(b, "__attribute_used__\n");
+       buf_printf(b, "__used\n");
        buf_printf(b, "__attribute__((section(\".modinfo\"))) =\n");
        buf_printf(b, "\"depends=");
        for (s = mod->unres; s; s = s->next) {
@@ -1503,7 +1714,8 @@ static void add_depends(struct buffer *b, struct module *mod,
                        continue;
 
                s->module->seen = 1;
-               if ((p = strrchr(s->module->name, '/')) != NULL)
+               p = strrchr(s->module->name, '/');
+               if (p)
                        p++;
                else
                        p = s->module->name;
@@ -1575,7 +1787,7 @@ static void read_dump(const char *fname, unsigned int kernel)
        void *file = grab_file(fname, &size);
        char *line;
 
-        if (!file)
+       if (!file)
                /* No symbol versions, silently ignore */
                return;
 
@@ -1598,11 +1810,10 @@ static void read_dump(const char *fname, unsigned int kernel)
                crc = strtoul(line, &d, 16);
                if (*symname == '\0' || *modname == '\0' || *d != '\0')
                        goto fail;
-
-               if (!(mod = find_module(modname))) {
-                       if (is_vmlinux(modname)) {
+               mod = find_module(modname);
+               if (!mod) {
+                       if (is_vmlinux(modname))
                                have_vmlinux = 1;
-                       }
                        mod = new_module(NOFAIL(strdup(modname)));
                        mod->skip = 1;
                }
@@ -1653,38 +1864,40 @@ int main(int argc, char **argv)
 {
        struct module *mod;
        struct buffer buf = { };
-       char fname[SZ];
        char *kernel_read = NULL, *module_read = NULL;
        char *dump_write = NULL;
        int opt;
        int err;
 
-       while ((opt = getopt(argc, argv, "i:I:mso:aw")) != -1) {
-               switch(opt) {
-                       case 'i':
-                               kernel_read = optarg;
-                               break;
-                       case 'I':
-                               module_read = optarg;
-                               external_module = 1;
-                               break;
-                       case 'm':
-                               modversions = 1;
-                               break;
-                       case 'o':
-                               dump_write = optarg;
-                               break;
-                       case 'a':
-                               all_versions = 1;
-                               break;
-                       case 's':
-                               vmlinux_section_warnings = 0;
-                               break;
-                       case 'w':
-                               warn_unresolved = 1;
-                               break;
-                       default:
-                               exit(1);
+       while ((opt = getopt(argc, argv, "i:I:msSo:aw")) != -1) {
+               switch (opt) {
+               case 'i':
+                       kernel_read = optarg;
+                       break;
+               case 'I':
+                       module_read = optarg;
+                       external_module = 1;
+                       break;
+               case 'm':
+                       modversions = 1;
+                       break;
+               case 'o':
+                       dump_write = optarg;
+                       break;
+               case 'a':
+                       all_versions = 1;
+                       break;
+               case 's':
+                       vmlinux_section_warnings = 0;
+                       break;
+               case 'S':
+                       sec_mismatch_verbose = 0;
+                       break;
+               case 'w':
+                       warn_unresolved = 1;
+                       break;
+               default:
+                       exit(1);
                }
        }
 
@@ -1693,9 +1906,8 @@ int main(int argc, char **argv)
        if (module_read)
                read_dump(module_read, 0);
 
-       while (optind < argc) {
+       while (optind < argc)
                read_symbols(argv[optind++]);
-       }
 
        for (mod = modules; mod; mod = mod->next) {
                if (mod->skip)
@@ -1706,6 +1918,8 @@ int main(int argc, char **argv)
        err = 0;
 
        for (mod = modules; mod; mod = mod->next) {
+               char fname[strlen(mod->name) + 10];
+
                if (mod->skip)
                        continue;
 
@@ -1723,6 +1937,12 @@ int main(int argc, char **argv)
 
        if (dump_write)
                write_dump(dump_write);
+       if (sec_mismatch_count && !sec_mismatch_verbose)
+               fprintf(stderr, "modpost: Found %d section mismatch(es).\n"
+                       "To see additional details select \"Enable full "
+                       "Section mismatch analysis\"\n"
+                       "in the Kernel Hacking menu "
+                       "(CONFIG_SECTION_MISMATCH).\n", sec_mismatch_count);
 
        return err;
 }
index 0ffed17ec20ca2037be0d8bf1467477c912952e3..999f15e0e0083252e7991910eeab55494131b525 100644 (file)
@@ -17,6 +17,7 @@
 #define Elf_Shdr    Elf32_Shdr
 #define Elf_Sym     Elf32_Sym
 #define Elf_Addr    Elf32_Addr
+#define Elf_Sword   Elf64_Sword
 #define Elf_Section Elf32_Half
 #define ELF_ST_BIND ELF32_ST_BIND
 #define ELF_ST_TYPE ELF32_ST_TYPE
@@ -31,6 +32,7 @@
 #define Elf_Shdr    Elf64_Shdr
 #define Elf_Sym     Elf64_Sym
 #define Elf_Addr    Elf64_Addr
+#define Elf_Sword   Elf64_Sxword
 #define Elf_Section Elf64_Half
 #define ELF_ST_BIND ELF64_ST_BIND
 #define ELF_ST_TYPE ELF64_ST_TYPE
index 7c434e037e7f658bdc088cbc760d4eaba36f785d..5e326078a4a2c351bc2a0c1319f8bdee76d554c0 100644 (file)
@@ -89,9 +89,8 @@ clean-dirs += $(objtree)/tar-install/
 # Help text displayed when executing 'make help'
 # ---------------------------------------------------------------------------
 help: FORCE
-       @echo '  rpm-pkg         - Build the kernel as an RPM package'
-       @echo '  binrpm-pkg      - Build an rpm package containing the compiled kernel'
-       @echo '                    and modules'
+       @echo '  rpm-pkg         - Build both source and binary RPM kernel packages'
+       @echo '  binrpm-pkg      - Build only the binary kernel package'
        @echo '  deb-pkg         - Build the kernel as an deb package'
        @echo '  tar-pkg         - Build the kernel as an uncompressed tarball'
        @echo '  targz-pkg       - Build the kernel as a gzip compressed tarball'
index aa0ccdbd1f477b442941fec0f89a2038268bc4fc..28574ae551703157d6cea9e525c6cf43edb85da8 100644 (file)
@@ -69,8 +69,8 @@ cp -v -- "${objtree}/vmlinux" "${tmpdir}/boot/vmlinux-${KERNELRELEASE}"
 # Install arch-specific kernel image(s)
 #
 case "${ARCH}" in
-       i386|x86_64)
-               [ -f "${objtree}/arch/$ARCH/boot/bzImage" ] && cp -v -- "${objtree}/arch/$ARCH/boot/bzImage" "${tmpdir}/boot/vmlinuz-${KERNELRELEASE}"
+       x86|i386|x86_64)
+               [ -f "${objtree}/arch/x86/boot/bzImage" ] && cp -v -- "${objtree}/arch/x86/boot/bzImage" "${tmpdir}/boot/vmlinuz-${KERNELRELEASE}"
                ;;
        alpha)
                [ -f "${objtree}/arch/alpha/boot/vmlinux.gz" ] && cp -v -- "${objtree}/arch/alpha/boot/vmlinux.gz" "${tmpdir}/boot/vmlinuz-${KERNELRELEASE}"
index 67e4b1868e500dcd7331265ffd5027ee52703d3b..ece46ef0ba54e8f8ca48b008991f2aec217f0b1f 100755 (executable)
@@ -65,7 +65,7 @@ sourcedir=${1-/usr/src/linux}
 patchdir=${2-.}
 stopvers=${3-default}
 
-if [ "$1" == -h -o "$1" == --help -o ! -r "$sourcedir/Makefile" ]; then
+if [ "$1" = -h -o "$1" = --help -o ! -r "$sourcedir/Makefile" ]; then
 cat << USAGE
 usage: $PNAME [-h] [ sourcedir [ patchdir [ stopversion ] [ -acxx ] ] ]
   source directory defaults to /usr/src/linux,
@@ -182,10 +182,12 @@ reversePatch () {
 }
 
 # set current VERSION, PATCHLEVEL, SUBLEVEL, EXTRAVERSION
-TMPFILE=`mktemp .tmpver.XXXXXX` || { echo "cannot make temp file" ; exit 1; }
+# force $TMPFILEs below to be in local directory: a slash character prevents
+# the dot command from using the search path.
+TMPFILE=`mktemp ./.tmpver.XXXXXX` || { echo "cannot make temp file" ; exit 1; }
 grep -E "^(VERSION|PATCHLEVEL|SUBLEVEL|EXTRAVERSION)" $sourcedir/Makefile > $TMPFILE
 tr -d [:blank:] < $TMPFILE > $TMPFILE.1
-source $TMPFILE.1
+. $TMPFILE.1
 rm -f $TMPFILE*
 if [ -z "$VERSION" -o -z "$PATCHLEVEL" -o -z "$SUBLEVEL" ]
 then
@@ -202,11 +204,7 @@ echo "Current kernel version is $VERSION.$PATCHLEVEL.$SUBLEVEL${EXTRAVERSION} ($
 EXTRAVER=
 if [ x$EXTRAVERSION != "x" ]
 then
-       if [ ${EXTRAVERSION:0:1} == "." ]; then
-               EXTRAVER=${EXTRAVERSION:1}
-       else
-               EXTRAVER=$EXTRAVERSION
-       fi
+       EXTRAVER=${EXTRAVERSION#.}
        EXTRAVER=${EXTRAVER%%[[:punct:]]*}
        #echo "$PNAME: changing EXTRAVERSION from $EXTRAVERSION to $EXTRAVER"
 fi
@@ -251,16 +249,16 @@ while :                           # incrementing SUBLEVEL (s in v.p.s)
 do
     CURRENTFULLVERSION="$VERSION.$PATCHLEVEL.$SUBLEVEL"
     EXTRAVER=
-    if [ $stopvers == $CURRENTFULLVERSION ]; then
+    if [ $stopvers = $CURRENTFULLVERSION ]; then
         echo "Stopping at $CURRENTFULLVERSION base as requested."
         break
     fi
 
-    SUBLEVEL=$((SUBLEVEL + 1))
+    SUBLEVEL=$(($SUBLEVEL + 1))
     FULLVERSION="$VERSION.$PATCHLEVEL.$SUBLEVEL"
     #echo "#___ trying $FULLVERSION ___"
 
-    if [ $((SUBLEVEL)) -gt $((STOPSUBLEVEL)) ]; then
+    if [ $(($SUBLEVEL)) -gt $(($STOPSUBLEVEL)) ]; then
        echo "Stopping since sublevel ($SUBLEVEL) is beyond stop-sublevel ($STOPSUBLEVEL)"
        exit 1
     fi
@@ -297,7 +295,7 @@ fi
 if [ x$gotac != x ]; then
  # Our great user wants the -ac patches
        # They could have done -ac (get latest) or -acxx where xx=version they want
-       if [ $gotac == "-ac" ]; then
+       if [ $gotac = "-ac" ]; then
          # They want the latest version
                HIGHESTPATCH=0
                for PATCHNAMES in $patchdir/patch-${CURRENTFULLVERSION}-ac*\.*
index 82e4993f0a7368797989b2acd169fe203766fb7a..52f032e409a38b9c9eef8f5b340038d66805eb40 100644 (file)
@@ -12,11 +12,36 @@ cd "${1:-.}" || usage
 if head=`git rev-parse --verify HEAD 2>/dev/null`; then
        # Do we have an untagged version?
        if git name-rev --tags HEAD | grep -E '^HEAD[[:space:]]+(.*~[0-9]*|undefined)$' > /dev/null; then
-               printf '%s%s' -g `echo "$head" | cut -c1-8`
+               git describe | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}'
        fi
 
        # Are there uncommitted changes?
-       if git diff-index HEAD | read dummy; then
+       git update-index --refresh --unmerged > /dev/null
+       if git diff-index --name-only HEAD | grep -v "^scripts/package" \
+           | read dummy; then
                printf '%s' -dirty
        fi
+
+       # All done with git
+       exit
+fi
+
+# Check for mercurial and a mercurial repo.
+if hgid=`hg id 2>/dev/null`; then
+       tag=`printf '%s' "$hgid" | cut -d' ' -f2`
+
+       # Do we have an untagged version?
+       if [ -z "$tag" -o "$tag" = tip ]; then
+               id=`printf '%s' "$hgid" | sed 's/[+ ].*//'`
+               printf '%s%s' -hg "$id"
+       fi
+
+       # Are there uncommitted changes?
+       # These are represented by + after the changeset id.
+       case "$hgid" in
+               *+|*+\ *) printf '%s' -dirty ;;
+       esac
+
+       # All done with mercurial
+       exit
 fi