Modular Repositories With Git

Posted by Krithik Chandrashekar on Feb 23, 2017 2:04:00 PM

 

Submodules

Modular programming is a design technique that has proven broadly beneficial for software development. It leads to code that is easier to extend, reuse, and debug. Taking the approach a step further means incorporating entire projects as modules of another project, a scenario that becomes commonplace when you include open source libraries in your own projects.

This strategy generally works best when combined with a formal version control system such as git or svn. Depending on how it is set up in your version control system of choice, you could end up with a reasonably maintainable structure or an unsightly mess that is difficult to work with. The latter is especially likely if you do not clearly understand the shortcomings of a particular approach, or are unaware that an alternative exists.

With git specifically, the default and most popular approach seems to be the submodule route. The setup is straightforward. Say, for instance, you want to include a submodule within an existing git repository:

     $ git submodule add <submodule_source_remote_url_here>

This clones the submodule into a directory named after its git repository and adds a .gitmodules file with details about the submodule. Both changes are staged and need to be committed. When someone later makes a fresh clone of this repository, the submodule directory will be empty unless they pass the --recursive option, which brings in the files for the submodule.

     $ git clone --recursive <url_main_project>
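To see what these two commands leave on disk, here is a small sketch that runs them against throwaway local repositories instead of a real remote. The `new_repo` helper, the folder names, and the `protocol.file.allow=always` override (needed on recent git versions only because the submodule source is a local path) are assumptions of this demo.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Throwaway identity so the demo commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

new_repo() {  # hypothetical helper: repository with one commit on "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "initial commit"
}

new_repo "$work/lib" "library"
new_repo "$work/app" "application"

# Add the library as a submodule in the folder "lib" and commit the result.
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "add lib as a submodule"
cat .gitmodules

# Clone the main project twice: once plainly, once with --recursive.
git clone -q "$work/app" "$work/plain"
git -c protocol.file.allow=always clone -q --recursive "$work/app" "$work/full"

echo "plain clone:     $(ls -A "$work/plain/lib" | wc -l) entries in lib/"
echo "recursive clone: $(ls -A "$work/full/lib" | wc -l) entries in lib/"
```

The plain clone ends up with an empty lib folder, while the recursive clone contains the library's files.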

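When changes are committed independently to the submodule's own repo, they can be pulled in with the --remote option. Without it, the command only checks out the submodule commit that the main repository already records.

     $ git submodule update --remote --recursive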
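The difference between a plain update and one with `--remote` is easy to miss, so here is a hedged sketch that compares the two on throwaway local repositories. The `new_repo` helper, the folder names, and the `protocol.file.allow=always` override (only needed because the submodule source is a local path) are assumptions of this demo.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

new_repo() {  # hypothetical helper: repository with one commit on "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "initial commit"
}

new_repo "$work/lib" "library"
new_repo "$work/app" "application"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "add lib as a submodule"

# The library moves on independently of the main project.
echo "v2" >> "$work/lib/README"
git -C "$work/lib" commit -q -am "library v2"

# A plain update only checks out the commit the main project records.
git -c protocol.file.allow=always submodule --quiet update --recursive
echo "after plain update:    $(git -C lib log -1 --format=%s)"

# With --remote, the submodule is moved to the tip of its remote branch.
git -c protocol.file.allow=always submodule --quiet update --remote --recursive
echo "after --remote update: $(git -C lib log -1 --format=%s)"

# The main project must commit the new submodule pointer to keep it.
git add lib
git commit -q -m "move lib to its latest commit"
```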

One drawback of including external open source projects as submodules is that changes made to files in the submodule folder remain local. They cannot be pushed directly to the remote source repository unless you have write permissions on it, which you usually do not.

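For this reason, it can be beneficial to host your own fork of the source repository on the git host of your choice (GitHub, Bitbucket, GitLab, etc.), where you do have permission to make changes, and use that fork as the submodule source. Changes are then committed and pushed from inside the submodule folder, after which the main project records the new submodule commit.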

    $ cd <submodule_folder>

    $ git add .

    $ git commit -m "write something meaningful"

    $ git push

    $ cd ../

    $ git add <submodule_folder>

    $ git commit -m "update submodule reference"

    $ git push
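The sequence above can be tried end to end with a local bare repository standing in for the hosted fork. This is only a sketch: `fork.git`, the `new_repo` helper, and the `protocol.file.allow=always` override are assumptions of the demo, and the final push of the main project is left out because the throwaway main project has no remote.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

new_repo() {  # hypothetical helper: repository with one commit on "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "initial commit"
}

# A bare repository plays the role of the fork you have write access to.
new_repo "$work/lib-seed" "library"
git clone -q --bare "$work/lib-seed" "$work/fork.git"

new_repo "$work/app" "application"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/fork.git" lib
git commit -q -m "add lib as a submodule"

# Change a file inside the submodule, then commit and push it to the fork.
cd lib
echo "local fix" >> README
git add .
git commit -q -m "apply local fix"
git push -q origin HEAD:master   # explicit refspec also works on a detached HEAD

# Back in the main project, record the new submodule commit.
cd ..
git add lib
git commit -q -m "update submodule reference"

echo "fork is at:   $(git -C "$work/fork.git" rev-parse --short master)"
echo "main records: $(git rev-parse --short HEAD:lib)"
```

Both lines should print the same commit, which shows that the main project now points at the change that was pushed to the fork.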

  

Subtrees

A less commonly used alternative is subtrees. Working with subtrees adds a few more steps to the process compared to submodules, but it might be the better solution to this problem. With subtrees, the main repository no longer holds just a reference to the module's source; it includes all of the module's files instead. Managing subtrees comes down to just managing remotes.

In your main project

  • Main Project Remote: The remote for the project that you are working on that will use the module
  • (Module) Sub-tree Project Remote: This will point to the locally hosted fork of the original repository

The first remote should already be present if you cloned the repository from its remote URL. It needs to be added explicitly if you initialized an existing non-git folder as a git project. The second remote always needs to be added explicitly.

    $ git clone <url_main_project>

    $ git remote add -f <submodule_remote_name> <url_module_remote>

    $ git subtree add --prefix=<submodule_folder_name> <submodule_remote_name> master --squash

This should bring in all files from the module repository. The '--squash' argument collapses the module's history into a single commit, which keeps the subtree's history from getting mixed into the main repository's git history.
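As a rough illustration of the result, the sketch below adds a local throwaway module as a subtree and lists what the main project ends up tracking. The `new_repo` helper, the remote name `lib_remote`, and the folder names are assumptions of this demo. Note that `git subtree` ships in git's contrib area and may need to be installed separately on some systems.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

new_repo() {  # hypothetical helper: repository with one commit on "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "initial commit"
}

# The module gets a second commit so there is some history to squash.
new_repo "$work/lib" "library"
echo "more" >> "$work/lib/README"
git -C "$work/lib" commit -q -am "second library commit"

new_repo "$work/app" "application"
cd "$work/app"

# Register the module as a remote, then import its files under lib/.
git remote add -f lib_remote "$work/lib"
git subtree add --prefix=lib lib_remote master --squash

# The module's files are now ordinary tracked files of the main project.
git ls-files
git log --oneline
```

Unlike the submodule setup, no .gitmodules file is created, and a plain `git clone` of the main project already contains the module's files.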

   

In your module project

  • (Module) Project Remote:  This will point to the locally hosted fork of the original repository
  • (Module) Project Source Remote: This will point to the source repository from which it was forked

 The second remote for the module project is optional and exists only if you are using an external project as a module.

This way, you can first pull upstream changes from the source into the fork and then have the option to propagate the same changes to any projects that make use of this module.

    $ git remote add -f <source-remote-name> <source-url>

    $ git pull <source-remote-name> <working-branch>
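A minimal sketch of keeping the fork in sync, using local throwaway repositories in place of the real source and fork. The names `source`, `fork.git`, `source_remote`, and the `new_repo` helper are assumptions of this demo, and `--ff-only` is added here only to make the pull fail loudly instead of creating a merge if the fork has diverged.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

new_repo() {  # hypothetical helper: repository with one commit on "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "initial commit"
}

# "source" is the original project, "fork.git" the fork hosted by you,
# and "fork" your working clone of that fork.
new_repo "$work/source" "library"
git clone -q --bare "$work/source" "$work/fork.git"
git clone -q "$work/fork.git" "$work/fork"

# The original project receives a new commit after the fork was made.
echo "upstream fix" >> "$work/source/README"
git -C "$work/source" commit -q -am "upstream fix"

# In the module project: add the source as a second remote and pull from it.
cd "$work/fork"
git remote add -f source_remote "$work/source"
git pull -q --ff-only source_remote master

# Publish the synced branch to the fork so dependent projects can pick it up.
git push -q origin master
git log --oneline
```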

 

Pulling changes from module to main project

If you have made commits to your module project and want to bring those changes into your main project, pull them into the subtree:

    $ git subtree pull --prefix=<submodule_folder_name> <submodule_remote_name> master --squash
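Here is a hedged sketch of that pull on throwaway local repositories. The `new_repo` helper, the remote name `lib_remote`, and the folder names are assumptions of this demo. `GIT_MERGE_AUTOEDIT=no` is set only so that the merge never opens an editor when the script runs unattended.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no

new_repo() {  # hypothetical helper: repository with one commit on "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "initial commit"
}

new_repo "$work/lib" "library"
new_repo "$work/app" "application"
cd "$work/app"
git remote add -f lib_remote "$work/lib"
git subtree add --prefix=lib lib_remote master --squash

# A new commit lands in the module project.
echo "module change" >> "$work/lib/README"
git -C "$work/lib" commit -q -am "module change"

# Bring that commit into the main project's copy of the module.
git subtree pull --prefix=lib lib_remote master --squash
cat lib/README
```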

 

Pushing changes from main project to module project

    $ git subtree push --prefix=<submodule_folder_name> <submodule_remote_name> master
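The opposite direction can be sketched the same way: a change committed to the module's folder in the main project is split out and pushed to the module's remote. A bare repository, `lib.git`, stands in for the hosted fork. That name, the `new_repo` helper, and the remote name `lib_remote` are assumptions of this demo.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no

new_repo() {  # hypothetical helper: repository with one commit on "master"
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" commit -q -m "initial commit"
}

# A bare repository plays the role of the module's hosted fork.
new_repo "$work/lib-seed" "library"
git clone -q --bare "$work/lib-seed" "$work/lib.git"

new_repo "$work/app" "application"
cd "$work/app"
git remote add -f lib_remote "$work/lib.git"
git subtree add --prefix=lib lib_remote master --squash

# Edit the module from inside the main project and commit as usual.
echo "fix made in the main project" >> lib/README
git commit -q -am "fix the module from the main project"

# Extract the commits that touch lib/ and push them to the module's remote.
git subtree push --prefix=lib lib_remote master
git -C "$work/lib.git" log --oneline master
```

Only the commits that touch the module's folder are sent to the module's remote. The rest of the main project's history stays where it is.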

 

 
