Git Troubleshooting & Tips
This troubleshooting guide is mainly intended for data stewards. It is not a Git tutorial, but rather a starting point.
ARCitect – first aid checklist
This checklist should help to identify common issues with an ARC. It may help data stewards and users to identify the overall status of the ARC and a user’s setup.
Git installation
Check whether git and git-lfs are installed and executable.
- Open a command line via ARCitect —> Tools —> Command Window
- Type and execute git --version and git-lfs --version
If, instead of showing the versions of both tools, this returns an error, something may be wrong with the Git installation or with the storage location of the ARC.
Commits
Check commits and compare those of the local ARC vs. the ARC in the DataHUB.
- Local: ARCitect —> History Menu
- DataHUB: ARC —> right sidebar —> Number of Commits
The commits (incl. commit message, date and committer details) in the DataHUB should be the same as in the local ARC. If not, this might help to identify at what time point (i.e. between which commits) something unexpected happened.
Check the ARC size and compare that of the local ARC vs. the ARC in the DataHUB.
- Local: Open the ARC folder via ARCitect —> Explorer. Right-click the folder name and inspect the size via Properties (Windows) or Get Info (macOS)
- DataHUB: ARC —> right sidebar —> Project storage
Note that your local ARC is approximately double the size of the ARC in the DataHUB. This is due to Git’s version control mechanism, which keeps a copy of the file contents in the hidden .git folder in addition to the files you see. If the sizes differ much more than that, the ARC synchronization was likely not successful.
Large files
Check whether large files are properly uploaded into LFS.
- Local: Inside ARCitect, large files should be flagged with LFS in the file tree
- DataHUB:
  - ARC —> right sidebar —> Project storage: The size of LFS should be high, that of Repository should be low
  - Just like in ARCitect, large files should be flagged with LFS in the file tree
If large files are unexpectedly not flagged as LFS, please check the Git LFS section below for details and troubleshooting.
Remote
Check the remote connection.
- ARCitect —> DataHUB sync —> Remote
Make sure that the remote URL is correct and matches the URL of the ARC in the DataHUB. This may not be the case if the local ARC’s folder name was changed, or if the ARC’s URL in the DataHUB has changed due to moving the ARC or adapting its URL.
Branch
Check the current branch.
- ARCitect —> Commit Menu —> Dropdown “Branch”
For most use cases, the main branch should be selected.
If the branch dropdown does not display main, something may be wrong with the status of your ARC. Please contact a data steward for help.
Status
Check the current Git status.
- Open the ARC in a command line via ARCitect —> Tools —> Command Window
- Run git status
See git status below for details.
Config
Check the Git configuration.
- Open the ARC in a command line via ARCitect —> Tools —> Command Window
- Run git config --list
See Git configuration below for details.
Gitignore
Check whether a .gitignore file exists in the ARC.
If no .gitignore exists, this can lead to unexpected behavior with temporary, hidden files.
This article explains how to add a .gitignore file.
Storage
Identify the storage location.
- Is the ARC stored on a mounted external hard drive, network or server?
- Is the ARC stored in a cloud folder (e.g. Dropbox, iCloud, Sciebo, OneDrive)?
If so, create a new test ARC both in the same location and in a “local-only” location (i.e. non-cloud, non-server) to check whether the issue persists. If the issue does not occur in the local-only location, something may be tricky with your cloud or network connection. Please contact a data steward for help.
ARC intactness
Identify whether the ARC is intact, with all expected files being present.
- Was the ARC only partially moved or copied from one location to another (i.e. without the hidden .git folder)?
- Was the ARC downloaded from the DataHUB without LFS objects and then uploaded to another remote?
If so, make sure to move or copy the complete ARC folder, and to download the ARC including all LFS objects (not recommended for large ARCs).
Debugging
- (if required) Install Git on the user’s machine
- Navigate to the ARC in trouble (via one of the options below)
  - On macOS: type cd followed by a space into a terminal, then drag & drop the ARC folder from Finder into it
  - On macOS: right-click the ARC folder —> “Services” —> “New Terminal at Folder”
  - On Windows: open the folder via Explorer; type “cmd” or “powershell” into the address field at the top of Explorer
  - On Linux / macOS terminal: cd path/to/ARC
  - From inside ARCitect: Tools —> Command Window
- Try some of the Git commands and debugging steps below
Error messages
This is a list of common error messages that occur if there is a problem with the setup or the ARC synchronization.
The errors are displayed during synchronization via ARCitect (pop-up windows in the Commit or DataHUB sync menus) or during ARC Commander’s arc sync.
error message | possible reason | possible solution |
---|---|---|
remote: HTTP Basic: Access denied fatal: Authentication failed for 'https://git.nfdi4plants.org/UserName/ARCname' | Your computer is not “linked” to your DataHUB account | Access Denied |
error: failed to push some refs to 'https://git.nfdi4plants.org/UserName/ARCname' hint: Your push was rejected due to missing or corrupt local objects. | You tried to upload LFS-tracked files that are not present on your computer | Missing LFS Objects |
remote: GitLab: LFS objects are missing. Ensure LFS is properly set up or try a manual "git lfs push --all" | You tried to upload LFS-tracked files that are not present on your computer | Missing LFS Objects |
LFS: PUT "<https://git.nfdi4plants.org/.../...>" read tcp ... i/o timeout | You ran into a time out, likely due to very large single files | Prevent LFS time out error |
error: failed to push some refs to 'https://git.nfdi4plants.org/UserName/ARCname' hint: Updates were rejected because the remote contains work that you do not have locally. | Your local ARC is out of sync with the remote. | ARC not in sync with the DataHUB |
ERROR: Can not sync with remote as no remote repository address was specified. | There is no URL specified for your ARC’s remote | Git remote |
ERROR: GIT: fatal: repository 'https://git.nfdi4plants.org/UserName/ARCname.git' not found | The remote URL does not exist | Git remote |
ERROR: GIT: fatal: detected dubious ownership | This is an error typically seen when working on mounted network drives | Dubious ownership |
fatal: credential-cache unavailable; no unix socket support | Likely happens on Windows, if a gitconfig contains credential.helper=cache | Adjust the Git Credential helper setting |
fatal: Need to specify how to reconcile divergent branches. | Your ARC contains multiple branches that progressed independently and need to be merged | Contact a data steward. |
error: unable to create file <path/to/file> : Filename too long | Likely occurs on Windows, if your ARC or files in your ARC are stored in a deeply nested folder, i.e. a folder in a folder in a folder … | Allow very long file names |
UNC paths are not supported. Defaulting to Windows directory. | Might be due to working on a network drive or server. | Please contact a data steward for support. |
Your two favorite Git commands: status and log
git status
Run git status to get a good summary of the ARC, including
- the branch you are on
- files that were staged (added) but not yet committed
- tracked files that were modified but not staged, and untracked files
- typically anything buggy, e.g. an unfinished merge
If everything is clean and committed, this should print something like
Your branch is up to date with …
nothing to commit, working tree clean
Note that “up to date” refers to the state of the remote as of your last fetch or pull.
git log
To compare the status of the local clone with that of the remote (i.e. the DataHUB) with a bit more confidence and detail, use git log.
This displays the commit history (messages) of the ARC reverse-chronologically, i.e. top-most = latest. So if the top commit message of the local ARC is different from the last commit message displayed in the DataHUB, the ARC is out of sync.
If you like it prettier, remember “a dog”: git log --all --decorate --oneline --graph.
Hit q to close the log.
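The log commands can be tried safely on a throwaway repository, as in the minimal sketch below. The scratch folder, the demo identity and the commit message are hypothetical; inside a real ARC, skip the setup lines and run only the git log lines.

```shell
# Setup (hypothetical): throwaway repository with one commit; skip inside your ARC
cd "$(mktemp -d)" && git init -q
git -c user.name="Demo User" -c user.email="demo@example.org" commit -q --allow-empty -m "Initial commit"

# Full history, newest commit on top (press q to leave the pager in an interactive terminal)
git log
# "a dog": --all --decorate --oneline --graph, a compact graph of all branches
git log --all --decorate --oneline --graph
# Only the latest commit (short hash, message, date), to compare with the DataHUB
git log -1 --format='%h %s (%ad)'
```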
Git configuration
The gitconfig contains the settings and preferences of your Git installation. There are three levels of gitconfig (see the table below). Depending on the tool (ARCitect, ARC Commander) and the operating system (macOS, Linux, Windows), Git settings may be read from different config files. If a setting is defined at several levels, the most specific one wins (local over global over system).
flag | meaning |
---|---|
--global | current user on that computer |
--system | system-wide (all users) |
--local | current repository (ARC) |
Checking the git config
The following command lists all configurations, where they originate from (--show-origin) and what their scope is (--show-scope): git config --list --show-origin --show-scope
In order to show only e.g. the global gitconfig, use git config --global --list.
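A minimal sketch of both commands. The fallback message is an illustrative addition, because git config --global --list fails if no global gitconfig exists yet; --show-scope assumes Git 2.26 or newer.

```shell
# List every setting with the file it comes from and its scope (system, global, local)
git config --list --show-origin --show-scope
# Only the global (per-user) settings; print a notice instead of failing if there is no global gitconfig yet
git config --global --list || echo "No global gitconfig found"
```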
Recommended git configurations
When executed inside an ARC folder, the output of git config --list should contain the following configurations:
configuration | explanation |
---|---|
user.name | Should display the user’s DataHUB account name |
user.email | Should display the user’s DataHUB account email address |
credential.helper | Whether and how DataHUB credentials are stored. Should be credential.helper=store (Windows, Linux) or credential.helper=osxkeychain (macOS) |
core.longpaths=true | Allows very long file paths, e.g. from long file names or deeply nested folder structures. |
init.defaultbranch=main | Ensures that newly created ARCs work on a main branch |
filter.lfs.process=git-lfs filter-process , filter.lfs.required=true , filter.lfs.clean=git-lfs clean -- %f , filter.lfs.smudge=git-lfs smudge -- %f | These four settings are required for LFS |
lfs.activitytimeout=0 | Circumvents a time out error when trying to push ARCs with very large files to the DataHUB. |
Changing git config
Editing the respective gitconfig is ideally done via the command line.
Adapt user name and email
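A minimal sketch; the name and email address are hypothetical placeholders to be replaced by those of your DataHUB account. The --global flag stores them for the current user; inside an ARC, --local would set them for that ARC only.

```shell
# Hypothetical placeholders: replace with your DataHUB account name and email address
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.org"
# Verify the result
git config --global --get user.name
git config --global --get user.email
```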
Set main as default branch
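A minimal sketch, assuming Git 2.28 or newer (older versions ignore init.defaultBranch). The setting only affects repositories created afterwards; existing ARCs keep their current branch.

```shell
# Newly initialized repositories (and thus new ARCs) start on a branch called "main"
git config --global init.defaultBranch main
git config --global --get init.defaultBranch
```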
Git Credential Helper
The gitconfig contains a setting called credential.helper, which defines whether and how Git credentials are saved on your machine.
On Windows, you might run into the error fatal: credential-cache unavailable; no unix socket support if it is set to credential.helper=cache.
This can be solved by either of the following:
- Remove “credential.helper=cache” via git config --global --unset credential.helper.
- Overwrite the setting with “store” instead of “cache” via git config --global credential.helper store.
Allow very long file names
Users (especially on Windows) may run into errors with long overall file names (i.e. the full path). Setting core.longpaths=true (see the recommended configurations above) should fix it.
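A minimal sketch of the command. The setting is mainly relevant for Git for Windows, where paths are otherwise limited to about 260 characters; on macOS and Linux it is stored but has no effect.

```shell
# Allow paths beyond the classic Windows limit of 260 characters
git config --global core.longpaths true
git config --global --get core.longpaths
```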
Git remote
For ARCs, the “remote” is the DataHUB. The remote address (ARC URL) is stored in the Git configuration of the local ARC. Display the URL to which the local ARC is connected via git remote -v.
Adding a remote during arc sync
A default remote is usually added by ARC Commander or ARCitect.
If the ARC does not yet exist in the DataHUB, and you created it via ARC Commander and synced it via arc sync, you will see an error stating that the repository was not found, followed by lines mentioning that the ARC was created automatically.
There is no need to worry: the ARC was created in the DataHUB during this process.
If you only see the error ERROR: GIT: fatal: repository 'https://git.nfdi4plants.org/UserName/ARCname.git/' not found, but not the following lines mentioning that the ARC was created automatically, make sure to use the “force” option, i.e. arc sync --force ...
Adding a remote via git
If the command above does not display any remote, you can add one via git remote add, followed by a remote name and the ARC URL.
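A minimal sketch on a throwaway repository. The remote name origin (Git's conventional default) and the URL, which follows the UserName/ARCname pattern from the error messages above, are assumptions to adapt; inside your ARC, skip the setup line.

```shell
# Setup (hypothetical): throwaway repository standing in for your ARC; skip inside your ARC
cd "$(mktemp -d)" && git init -q

# Connect the local repository to its DataHUB address under the name "origin"
git remote add origin https://git.nfdi4plants.org/UserName/ARCname.git
# Confirm: lists the fetch and push URL of every remote
git remote -v
```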
Editing a remote
You can edit a remote via git remote set-url, followed by the remote name and the new URL.
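A minimal sketch on a throwaway repository that first gets an outdated remote. The remote name origin and the new URL with the hypothetical group GroupName are assumptions; inside your ARC, run only the set-url and remote -v lines.

```shell
# Setup (hypothetical): throwaway repository with an outdated remote; skip inside your ARC
cd "$(mktemp -d)" && git init -q
git remote add origin https://git.nfdi4plants.org/UserName/ARCname.git

# Point "origin" to the new address, e.g. after the ARC was moved to a group (hypothetical URL)
git remote set-url origin https://git.nfdi4plants.org/GroupName/ARCname.git
git remote -v
```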
Branches
As of now, the DataPLANT tools focus on working on a single branch (main).
It can still happen that your ARC has multiple branches, e.g. by accident (see git config —> init.defaultbranch) or because a Git-savvy collaborator created them.
To display the branches of the local ARC, use git branch.
If you also want to display branches that exist on the remote (but not locally), use git branch --all.
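A minimal sketch on a throwaway repository with one commit, assuming Git 2.28 or newer for git init -b. Remote-tracking branches only appear after the ARC was fetched or pulled from a remote.

```shell
# Setup (hypothetical): throwaway repository on "main" with one commit; skip inside your ARC
cd "$(mktemp -d)" && git init -q -b main
git -c user.name="Demo User" -c user.email="demo@example.org" commit -q --allow-empty -m "Initial commit"

# Local branches; the current one is marked with *
git branch
# Local plus remote-tracking branches (as known since the last fetch or pull)
git branch --all
# Only the name of the current branch
git branch --show-current
```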
Git LFS
Git LFS (Large File Storage) is the system running in the background that simplifies working with Git and ARCs containing large data files. ARC Commander and ARCitect offer options to download (clone) an ARC without its large files, which speeds up the process and avoids wasting storage if you are only interested in, e.g., the metadata.
In order to properly upload large(r) files to the DataHUB via “pure git” (i.e. on the command line) or via ARC Commander or ARCitect, Git-LFS needs to be initiated on every computer (and user account) before using these tools.
Initiating git-lfs
Checking whether LFS (large file storage) works properly for your ARCs
- In ARCitect, large files (as defined by the threshold in the Commit menu) are flagged as LFS in the file tree
- In the DataHUB, LFS files are also flagged as LFS. In addition, you can click on “Project Storage” in the right sidebar of your ARC in the DataHUB. Here, the major part of your data should be stored in “LFS”, while only a minor part is stored in “Repository”.
Via command line
- If you have git-lfs installed and know how to use the command line, simply run git lfs install.
- You can check for the proper configuration via git config --list --show-origin --show-scope. Amongst others, the config should contain the four filter.lfs settings listed in the recommended configurations above (filter.lfs.clean, filter.lfs.smudge, filter.lfs.process and filter.lfs.required).
Manually
In your home folder (Windows: C:/Users/<UserName>, macOS: /Users/<UserName>), create or edit the file called .gitconfig so that it contains the four filter.lfs settings listed in the recommended configurations above.
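For orientation, this is roughly how the LFS part of a .gitconfig looks after git lfs install (indented with a tab). Other sections of the file, e.g. [user], stay as they are.

```ini
[filter "lfs"]
	clean = git-lfs clean -- %f
	smudge = git-lfs smudge -- %f
	process = git-lfs filter-process
	required = true
```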
Prevent LFS Time out error
When users try to upload very large files (i.e. not a large overall push size, but single very large files), they might run into a time out error. Setting lfs.activitytimeout=0 (see the recommended configurations above) should fix it.
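A minimal sketch of the command. As far as the git-lfs configuration documentation describes it, lfs.activitytimeout is the idle time in seconds after which a transfer is aborted, and only a non-zero value enables it, so 0 should switch this timeout off.

```shell
# 0 switches off the idle timeout for LFS transfers of very large single files
git config --global lfs.activitytimeout 0
git config --global --get lfs.activitytimeout
```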
Missing LFS objects
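The following errors from the error message table above are related to missing LFS objects:
- error: failed to push some refs to 'https://git.nfdi4plants.org/UserName/ARCname' hint: Your push was rejected due to missing or corrupt local objects.
- remote: GitLab: LFS objects are missing. Ensure LFS is properly set up or try a manual "git lfs push --all"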
Possible reasons why this happens:
- you downloaded (cloned) an ARC without the large files (i.e. only the pointer files) and tried to upload it to another location on the DataHUB (i.e. a new remote due to a transfer to another user or group, or a renamed ARC)
- you moved a pointer file (instead of the actual large file) from one ARC on your computer to another ARC and tried to upload it
In this case, you would have to download all LFS objects from the original remote first. Ask a data steward for help.
Step-by-step track large file(s) via LFS
This is done in small steps plus logging. Note that this works in shells like the macOS terminal, a Linux terminal or Git Bash (available for Windows). It likely does not work in Windows PowerShell and definitely not in the Windows command prompt.
- Track files via LFS (this adds them to .gitattributes)
- Git add the .gitattributes file first
- Git add the large files
- Git commit (and write what’s happening to a log file)
- Git push (and write what’s happening to a log file)
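The five steps can be sketched as follows. The setup lines create a throwaway repository with a hypothetical 1 MiB file dataset/sample.raw, so the sketch can be tried without touching a real ARC; the file pattern, the commit message and the log file locations are assumptions to adapt. Step 5 needs a configured DataHUB remote and is therefore commented out.

```shell
# Setup (hypothetical): throwaway repository with LFS and one 1 MiB file; skip inside your ARC
work=$(mktemp -d)
git init -q "$work/arc" && cd "$work/arc"
git lfs install --local
git config user.name "Demo User" && git config user.email "demo@example.org"
mkdir -p dataset && head -c 1048576 /dev/urandom > dataset/sample.raw

# 1. Track files via LFS (adds a pattern line to .gitattributes)
git lfs track "dataset/*.raw"
# 2. Add the .gitattributes file first
git add .gitattributes
# 3. Add the large file(s)
git add dataset/sample.raw
# 4. Commit and write the output to a log file outside the ARC, so the log itself is not committed
git commit -m "Track raw data via LFS" 2>&1 | tee "$work/commit.log"
# 5. Push and log the output (needs a configured DataHUB remote, so not run here)
# git push 2>&1 | tee "$work/push.log"
```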
Check the status of LFS-tracked files
List LFS-tracked files
To get a list of LFS-tracked files including the size of the original file, run git lfs ls-files --size.
This will display the object ID (oid), the relative path to the file and the object size. The oid is also stored in the pointer file at the file’s position.
Debug LFS-tracked files
To get a report of all LFS-tracked files including their status, use git lfs ls-files --debug.
Amongst others, this report will print for every LFS file whether it is downloaded to the local ARC (checkout: true; download: true) or not (checkout: false; download: false).
Common issues and error messages
ARC files opened in multiple programs
A common source of issues is multiple programs working on the ARC in parallel.
- In particular, working on the ARC with multiple programs that have Git integration may lead to confusion. For instance, while you sync the ARC using ARCitect or ARC Commander, the changes may still be displayed as uncommitted in VSCode, RStudio, PyCharm or other third-party software.
- Many programs produce hidden temporary files. By default, these files are not shown or synced by ARCitect or ARC Commander. They might still lead to confusion, e.g. not being able to commit changes. This is especially the case for office software (Excel, Word, LibreOffice, etc.) when, e.g., one of the ISA files (isa.investigation.xlsx, isa.study.xlsx, isa.assay.xlsx) or another office file stored in the ARC is open. However, ARCs opened in Windows Explorer or macOS Finder have also occasionally led to issues.
- Before syncing an ARC, close all ARC files and Explorer / Finder windows
- Avoid editing, deleting or moving files while the ARC is being synced to the DataHUB
ARC not in sync with the DataHUB
Your local ARC is likely out of sync with the remote. This happens if you or an invited colleague work on the same ARC from a different location (e.g. the DataHUB or another computer). Before working on your ARC, make sure to update the local clone via one of these options:
- ARCitect —> Versioning —> Pull
- arc sync
- git pull (this also prints a message if changes need to be merged)
Access denied
Sometimes you run into permission issues such as
remote: HTTP Basic: Access denied fatal: Authentication failed for 'https://git.nfdi4plants.org/UserName/ARCname'
This is due to missing or outdated DataHUB credentials on your computer. It usually helps to just retrieve new ones. If not, you might have to remove existing credentials stored on your computer.
Authenticate the computer
Option 1: via ARC Commander
Option 2: “by hand”
- Log in to the DataHUB
- Create a new Personal Access Token (PAT) with the scope api
- Run a Git command (e.g. arc sync, git pull) to trigger the prompt for Git credentials
- Provide your DataHUB username
- Use the token instead of your password
Delete stored credentials
If (new) authentication alone does not help, you might need to delete existing tokens or passwords first.
- Run git config --get-regexp "credential" to find out whether and how credentials are stored
- This typically displays one of the following:
  credential.helper store
  credential.helper osxkeychain (only on macOS)
- If credential.helper store is displayed, the credentials are typically stored in ~/.git-credentials, a hidden text file in the user’s home folder. Edit this file and delete the row(s) containing “git.nfdi4plants.org” (https://<UserName>:<Token>@git.nfdi4plants.org).
- On macOS (if credential.helper osxkeychain is displayed), open the app “Keychain Access”, then search for and delete passwords for “git.nfdi4plants.org”.
Dubious ownership
The error ERROR: GIT: fatal: detected dubious ownership typically occurs when working on a mounted network drive (file share, file server, NAS). Very simplified: the user on the computer and the owner of the network drive differ, and Git tries to protect you from working in a folder you do not own.
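You can add the path to the ARC to the list of safe directories via the command git config --global --add safe.directory path/to/ARC.
Alternatively, you can circumvent this error by adding all directories to your list of safe directories via the command git config --global --add safe.directory '*'. Note that this switches off the ownership check for all folders.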
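A minimal sketch of both variants; /mnt/share/MyARC is a hypothetical path to replace with the location of your ARC. The safe.directory setting assumes Git 2.35.2 or newer, and since --add appends a new entry, running it repeatedly creates duplicates.

```shell
# Trust one specific ARC folder (hypothetical path on a mounted network drive)
git config --global --add safe.directory /mnt/share/MyARC
# Alternatively, trust all folders; this switches the ownership check off entirely
git config --global --add safe.directory '*'
# Show all entries
git config --global --get-all safe.directory
```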
Get more log
To help troubleshooting, add (some or all of) the variables GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=1 before your git command to get more information, e.g. GIT_CURL_VERBOSE=1 GIT_TRACE=1 git pull.
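A minimal sketch, demonstrated with git status on a throwaway repository so it runs without a remote. The prefix syntax works in bash-like shells (macOS and Linux terminals, Git Bash) but not in the Windows command prompt or PowerShell. The trace output goes to stderr, so it is redirected into a log file outside the repository; check the log for tokens or passwords before sharing it.

```shell
# Setup (hypothetical): throwaway repository; skip inside your ARC
cd "$(mktemp -d)" && git init -q
log=$(mktemp)   # log file outside the repository

# The variables apply to this single command; the trace output (stderr) is written to the log file
GIT_TRACE=1 GIT_CURL_VERBOSE=1 GIT_TRACE_PACKET=1 git status 2> "$log"
# In your ARC, prefix the failing command instead (needs a configured remote), e.g.
# GIT_TRACE=1 GIT_CURL_VERBOSE=1 GIT_TRACE_PACKET=1 git push 2> git-trace.log
```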