aboutsummaryrefslogtreecommitdiff
path: root/docs/devel/code-provenance.rst
diff options
context:
space:
mode:
Diffstat (limited to 'docs/devel/code-provenance.rst')
-rw-r--r--docs/devel/code-provenance.rst338
1 files changed, 338 insertions, 0 deletions
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
new file mode 100644
index 0000000..b5aae2e
--- /dev/null
+++ b/docs/devel/code-provenance.rst
@@ -0,0 +1,338 @@
+.. _code-provenance:
+
+Code provenance
+===============
+
+Certifying patch submissions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU community **mandates** all contributors to certify provenance of
+patch submissions they make to the project. To put it another way,
+contributors must indicate that they are legally permitted to contribute to
+the project.
+
+Certification is achieved with a low overhead by adding a single line to the
+bottom of every git commit::
+
+ Signed-off-by: YOUR NAME <YOUR@EMAIL>
+
+The addition of this line asserts that the author of the patch is contributing
+in accordance with the clauses specified in the
+`Developer's Certificate of Origin <https://developercertificate.org>`__:
+
+.. _dco:
+
+ Developer's Certificate of Origin 1.1
+
+ By making a contribution to this project, I certify that:
+
+ (a) The contribution was created in whole or in part by me and I
+ have the right to submit it under the open source license
+ indicated in the file; or
+
+ (b) The contribution is based upon previous work that, to the best
+ of my knowledge, is covered under an appropriate open source
+ license and I have the right under that license to submit that
+ work with modifications, whether created in whole or in part
+ by me, under the same open source license (unless I am
+ permitted to submit under a different license), as indicated
+ in the file; or
+
+ (c) The contribution was provided directly to me by some other
+ person who certified (a), (b) or (c) and I have not modified
+ it.
+
+ (d) I understand and agree that this project and the contribution
+ are public and that a record of the contribution (including all
+ personal information I submit with it, including my sign-off) is
+ maintained indefinitely and may be redistributed consistent with
+ this project or the open source license(s) involved.
+
+The name used with "Signed-off-by" does not need to be your legal name, nor
+birth name, nor appear on any government ID. It is the identity you choose to
+be known by in the community, but should not be anonymous, nor misrepresent
+whom you are.
+
+It is generally expected that the name and email addresses used in one of the
+``Signed-off-by`` lines, matches that of the git commit ``Author`` field.
+It's okay if you subscribe or contribute to the list via more than one
+address, but using multiple addresses in one commit just confuses
+things.
+
+If the person sending the mail is not one of the patch authors, they are
+nonetheless expected to add their own ``Signed-off-by`` to comply with the
+DCO clause (c).
+
+Multiple authorship
+~~~~~~~~~~~~~~~~~~~
+
+It is not uncommon for a patch to have contributions from multiple authors. In
+this scenario, git commits will usually be expected to have a ``Signed-off-by``
+line for each contributor involved in creation of the patch. Some edge cases:
+
+ * The non-primary author's contributions were so trivial that they can be
+ considered not subject to copyright. In this case the secondary authors
+ need not include a ``Signed-off-by``.
+
+ This case most commonly applies where QEMU reviewers give short snippets
+ of code as suggested fixes to a patch. The reviewers don't need to have
+ their own ``Signed-off-by`` added unless their code suggestion was
+ unusually large, but it is common to add ``Suggested-by`` as a credit
+ for non-trivial code.
+
+ * Both contributors work for the same employer and the employer requires
+ copyright assignment.
+
+ It can be said that in this case a ``Signed-off-by`` is indicating that
+ the person has permission to contribute from their employer who is the
+ copyright holder. It is nonetheless still preferable to include a
+ ``Signed-off-by`` for each contributor, as in some countries employees are
+ not able to assign copyright to their employer, and it also covers any
+ time invested outside working hours.
+
+When multiple ``Signed-off-by`` tags are present, they should be strictly kept
+in order of authorship, from oldest to newest.
+
+Other commit tags
+~~~~~~~~~~~~~~~~~
+
+While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
+that are commonly used during QEMU development:
+
+ * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
+ mailing list, if they consider the patch acceptable, they should send an
+ email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
+ review a patch should add this even if they are also adding their
+ ``Signed-off-by`` to the same commit.
+
+ * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
+ touches their subsystem, but intends to allow a different maintainer to
+ queue it and send a pull request, they would send a mail containing a
+ ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
+ only implies review of the maintainers' own areas of responsibility. If a
+ maintainer wants to indicate they have done a full review they should use
+ a ``Reviewed-by`` tag.
+
+ * **``Tested-by``**: when a QEMU community member has functionally tested the
+ behaviour of the patch in some manner, they should send an email reply
+ containing a ``Tested-by`` tag.
+
+ * **``Reported-by``**: when a QEMU community member reports a problem via the
+ mailing list, or some other informal channel that is not the issue tracker,
+ it is good practice to credit them by including a ``Reported-by`` tag on
+ any patch fixing the issue. When the problem is reported via the GitLab
+ issue tracker, however, it is sufficient to just include a link to the
+ issue.
+
+ * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial
+ suggestions for how to change a patch, it is good practice to credit them
+ by including a ``Suggested-by`` tag.
+
+Subsystem maintainer requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a subsystem maintainer accepts a patch from a contributor, in addition to
+the normal code review points, they are expected to validate the presence of
+suitable ``Signed-off-by`` tags.
+
+At the time they queue the patch in their subsystem tree, the maintainer
+**must** also then add their own ``Signed-off-by`` to indicate that they have
+done the aforementioned validation. This is in addition to any of their own
+``Reviewed-by`` tags the subsystem maintainer may wish to include.
+
+When the maintainer modifies the patch after pulling into their tree, they
+should record their contribution. This is typically done via a note in the
+commit message, just prior to the maintainer's ``Signed-off-by``::
+
+ Signed-off-by: Cory Contributor <cory.contributor@example.com>
+ [Comment rephrased for clarity]
+ Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>
+
+
+Tools for adding ``Signed-off-by``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are a variety of ways tools can support adding ``Signed-off-by`` tags
+for patches, avoiding the need for contributors to manually type in this
+repetitive text each time.
+
+git commands
+^^^^^^^^^^^^
+
+When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
+append a suitable line matching the configured git author details.
+
+If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
+be used to append a suitable line in the emails it creates, without modifying
+the local commits. Alternatively to modify all the local commits on a branch::
+
+ git rebase master -x 'git commit --amend --no-edit -s'
+
+emacs
+^^^^^
+
+In the file ``$HOME/.emacs.d/abbrev_defs`` add:
+
+.. code:: elisp
+
+ (define-abbrev-table 'global-abbrev-table
+ '(
+ ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
+ ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
+ ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
+ ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
+ ))
+
+with this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+vim
+^^^
+
+In the file ``$HOME/.vimrc`` add::
+
+ iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
+ iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
+ iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
+ iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
+
+with this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+Re-starting abandoned work
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a variety of reasons there are some patches that get submitted to QEMU but
+never merged. An unrelated contributor may decide (months or years later) to
+continue working from the abandoned patch and re-submit it with extra changes.
+
+The general principles when picking up abandoned work are:
+
+ * Continue to credit the original author for their work, by maintaining their
+ original ``Signed-off-by``
+ * Indicate where the original patch was obtained from (mailing list, bug
+ tracker, author's git repo, etc) when sending it for review
+ * Acknowledge the extra work of the new contributor by including their
+ ``Signed-off-by`` in the patch in addition to the orignal author's
+ * Indicate who is responsible for what parts of the patch. This is typically
+ done via a note in the commit message, just prior to the new contributor's
+ ``Signed-off-by``::
+
+ Signed-off-by: Some Person <some.person@example.com>
+ [Rebased and added support for 'foo']
+ Signed-off-by: New Person <new.person@mycorp.test>
+
+In complicated cases, or if otherwise unsure, ask for advice on the project
+mailing list.
+
+It is also recommended to attempt to contact the original author to let them
+know you are interested in taking over their work, in case they still intended
+to return to the work, or had any suggestions about the best way to continue.
+
+Inclusion of generated files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Files in patches contributed to QEMU are generally expected to be provided
+only in the preferred format for making modifications. The implication of
+this is that the output of code generators or compilers is usually not
+appropriate to contribute to QEMU.
+
+For reasons of practicality there are some exceptions to this rule, where
+generated code is permitted, provided it is also accompanied by the
+corresponding preferred source format. This is done where it is impractical
+to expect those building QEMU to run the code generation or compilation
+process. A non-exhaustive list of examples is:
+
+ * Images: where an bitmap image is created from a vector file it is common
+ to include the rendered bitmaps at desired resolution(s), since subtle
+ changes in the rasterization process / tools may affect quality. The
+ original vector file is expected to accompany any generated bitmaps.
+
+ * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
+ firmwares. When such binary ROMs are contributed, the corresponding source
+ must also be provided, either directly, or through a git submodule link.
+
+ * Dockerfiles: the majority of the dockerfiles are automatically generated
+ from a canonical list of build dependencies maintained in tree, together
+ with the libvirt-ci git submodule link. The generated dockerfiles are
+ included in tree because it is desirable to be able to directly build
+ container images from a clean git checkout.
+
+ * eBPF: QEMU includes some generated eBPF machine code, since the required
+ eBPF compilation tools are not broadly available on all targetted OS
+ distributions. The corresponding eBPF C code for the binary is also
+ provided. This is a time-limited exception until the eBPF toolchain is
+ sufficiently broadly available in distros.
+
+In all cases above, the existence of generated files must be acknowledged
+and justified in the commit that introduces them.
+
+Tools which perform changes to existing code with deterministic algorithmic
+manipulation, driven by user specified inputs, are not generally considered
+to be "generators".
+
+For instance, using Coccinelle to convert code from one pattern to another
+pattern, or fixing documentation typos with a spell checker, or transforming
+code using sed / awk / etc, are not considered to be acts of code
+generation. Where an automated manipulation is performed on code, however,
+this should be declared in the commit message.
+
+At times contributors may use or create scripts/tools to generate an initial
+boilerplate code template which is then filled in to produce the final patch.
+The output of such a tool would still be considered the "preferred format",
+since it is intended to be a foundation for further human authored changes.
+Such tools are acceptable to use, provided there is clearly defined copyright
+and licensing for their output. Note in particular the caveats applying to AI
+content generators below.
+
+Use of AI content generators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+ **Current QEMU project policy is to DECLINE any contributions which are
+ believed to include or derive from AI generated content. This includes
+ ChatGPT, Claude, Copilot, Llama and similar tools.**
+
+The increasing prevalence of AI-assisted software development results in a
+number of difficult legal questions and risks for software projects, including
+QEMU. Of particular concern is content generated by `Large Language Models
+<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
+
+The QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the `Developer's Certificate of
+Origin (DCO) <dco>`.
+
+To satisfy the DCO, the patch contributor has to fully understand the
+copyright and license status of content they are contributing to QEMU. With AI
+content generators, the copyright and license status of the output is
+ill-defined with no generally accepted, settled legal foundation.
+
+Where the training material is known, it is common for it to include large
+volumes of material under restrictive licensing/copyright terms. Even where
+the training material is all known to be under open source licenses, it is
+likely to be under a variety of terms, not all of which will be compatible
+with QEMU's licensing requirements.
+
+How contributors could comply with DCO terms (b) or (c) for the output of AI
+content generators commonly available today is unclear. The QEMU project is
+not willing or able to accept the legal risks of non-compliance.
+
+The QEMU project thus requires that contributors refrain from using AI content
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.
+
+This policy does not apply to other uses of AI, such as researching APIs or
+algorithms, static analysis, or debugging, provided their output is not to be
+included in contributions.
+
+Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
+ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
+generation agents which are built on top of such tools.
+
+This policy may evolve as AI tools mature and the legal situation is
+clarifed. In the meanwhile, requests for exceptions to this policy will be
+evaluated by the QEMU project on a case by case basis. To be granted an
+exception, a contributor will need to demonstrate clarity of the license and
+copyright status for the tool's output in relation to its training model and
+code, to the satisfaction of the project maintainers.