Social Coding – Simple Things to Keep in Mind (updated)

The current trend of social coding has finally arrived at ERNW! From now on, you will commonly find our publicly released tools and scripts on GitHub. Therefore, I would like to share some thoughts/guidelines which you should keep in mind if you want to be a social coder:

GitHub and other repository hosts are great if you want to share open-source tools with the community, as they offer a common platform with defined workflows for extending and fixing the work, which leads to better software for everyone. What some should note is that public really means public, especially with decentralized version control systems (DVCSs) like git, bazaar or mercurial.

Back in earlier days, if you shared your code with others, you probably created a source code package of a defined version of your code. Others got the files you published, nothing more (and nothing less). Beginning with websites like SourceForge, a broader range of public VCSs came up (mostly driven by CVS or SVN). From then on, others were able to view your commit history (if you granted them access), including all the mistakes you made before the published state of your code (for example, accidentally committed sensitive data).

Those mistakes can still happen today. The difference with the DVCSs used nowadays is that most of the time you have less control over your commit history. (The same would have been possible in SVN if someone had copied your history commit by commit, but then you might have noticed it because of the high network traffic.) With a DVCS, everyone gets a full copy (clone) of your repository, including its history, even on a simple “checkout” (as it is called in SVN). This means they can search your history locally and have all the time they need to do it. Even if you delete your repository (or modify the history), the original state lives on with everyone who cloned it beforehand (that’s one reason why DMCA takedowns are not that powerful/useful against git repositories). Most of the public hosting platforms even include a search over all repositories (which is really useful if you want to find a tool or the reason why a local tool doesn’t work).
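
To make the point about history a bit more concrete, here is a minimal sketch of how a secret that was "removed" in a later commit can still be found in a local clone. The throwaway repository, the file name settings.conf and the fake password are all made up for illustration; only standard git commands are used.

```shell
#!/bin/sh
# Throwaway demo repository; file name, identity and secret are invented.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: a config file that accidentally contains a secret.
echo 'password=hunter2' > settings.conf
git add settings.conf
git commit -q -m "Add settings"

# Commit 2: the "fix" removes the secret from the current state only.
echo 'password=' > settings.conf
git commit -q -am "Remove password"

# The working tree looks clean now ...
current=$(cat settings.conf)

# ... but the pickaxe search (-S) still finds the commits that touched the
# string, and -p prints the diff line that introduced it.
leaked=$(git log --all -p -S'hunter2' -- settings.conf \
    | grep '^+password=hunter2' | head -n 1)

echo "current: $current"
echo "history: $leaked"
```

Anyone who cloned the repository between the two commits, or afterwards, can run the same search offline and at their own pace.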

This results in some important rules if you want to publish work which may have started as a private project that was never designed to be publicly available:

  • Clean your files…
  • Be careful what you add to your repository (especially with something like git add <directory>, which recursively adds everything)
  • Most DVCSs require you to set a name and email address for your commits. This information is part of each commit and therefore part of the history, but it is not necessarily linked to your public account. This means that if you initially committed something with a private email address, you may want to rewrite the history to use a more public one (GIT_AUTHOR_EMAIL and GIT_COMMITTER_EMAIL can be set in the same way as the name below):
    • git filter-branch --env-filter '
      export GIT_AUTHOR_NAME="Timo Schmid";
      export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"' -- --all
      • WARNING: This also changes your commit hashes, so do it before anyone has cloned the repository (or ask your colleagues to re-clone it)
    • If you already have a GitHub account, you could use the noreply email address GitHub provides for your account. Commits made with it are still attributed to your account, while your private address stays out of the history.
  • If it isn’t necessary to publish the history, create a new repository containing only the cleaned files. This makes it a lot easier to verify that no sensitive information leaves the building.
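
The last two points can be sketched roughly as follows: first list the identities recorded in an existing history, then export only the current files into a fresh repository without any history. This is just one possible way to do it. The directory names "old" and "public", the identities and the file tool.sh are invented for the demo.

```shell
#!/bin/sh
# Sketch only: "old" stands in for a private repository, "public" for the
# fresh one you would publish. All names and addresses are invented.
set -e
work=$(mktemp -d)
git init -q "$work/old"
cd "$work/old"
echo 'echo hello' > tool.sh
git add tool.sh
git -c user.name="Private Me" -c user.email="me@private.example" \
    commit -q -m "Initial version"

# 1) Which identities are recorded in the history? Author and committer
#    fields are both part of every commit.
identities=$(git log --all --format='%an <%ae>%n%cn <%ce>' | sort -u)
echo "$identities"

# 2) Export only the tracked files of the current commit (no .git
#    directory, no history) and commit them with a public identity.
mkdir "$work/public"
git archive HEAD | tar -xf - -C "$work/public"
cd "$work/public"
git init -q .
git add .
git -c user.name="Public Me" -c user.email="me@public.example" \
    commit -q -m "Initial public release"

public_identities=$(git log --all --format='%an <%ae>%n%cn <%ce>' | sort -u)
public_commits=$(git rev-list --count HEAD)
echo "$public_identities"
echo "commits in public repo: $public_commits"
```

Note that the export only gets rid of the history. The files themselves still have to be cleaned before the first public commit.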

So keep these checkpoints in mind. We would be happy to see someone find our tools useful and maybe even contribute to their development.


My colleague Benjamin Schendemann recently mentioned a tool called gitrob. It can scan git repositories for typical filenames (same as above, but automated) that tend to contain critical data like passwords or other credentials (for example, settings or config files). This makes it even easier for others to spot your mistakes 😉
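
To give a rough idea of what such a filename scan does, here is a small sketch you could run against your own repository before publishing. It is not gitrob's implementation: the helper names is_suspicious and scan_paths and the pattern list are my own assumptions, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: flags file names that often hold credentials.
# The pattern list is an illustrative assumption, not gitrob's rule set.
is_suspicious() {
    case "$(basename "$1")" in
        .env|*.pem|*.key|id_rsa|id_dsa|*password*|*credentials*|settings.*|config.*|*.conf)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Reads one path per line from stdin and prints the suspicious ones.
scan_paths() {
    while IFS= read -r file; do
        if is_suspicious "$file"; then
            echo "$file"
        fi
    done
}

# Example input. In a repository you could feed it `git ls-files`, or
# `git log --all --name-only --format=` to cover deleted files as well.
flagged=$(printf '%s\n' README.md src/main.c deploy/id_rsa config.yml docs/intro.txt | scan_paths)
echo "$flagged"
```

A match only means "have a look at this file", not that it really contains a secret, and a clean result proves nothing about files with unusual names.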