#!/bin/sh

usage() {
cat << HOOOEY

=head1 NAME

replication_ABC.sh(1) - ZFS replication from one server to a second, and from there to a third

=head1 SYNOPSIS 

On server A:

 env NEVER_DELETE_REMOTE_SNAPSHOTS=1 replication_ABC.sh tank/thefs serverB tank/repltarget/thefs ABC_thefs '-i /data/ssh/replicationkey' zfssendopt='' zfsrecvopt=''

On server B:

 env SECOND_TO_THIRD=1 replication_ABC.sh tank/repltarget/thefs serverC tank/repl3/thefs ABC_thefs '-i /data/ssh/replicationkey' zfssendopt='' zfsrecvopt=''

=head1 CMD LINE ARGUMENTS

Each replication relationship is described by seven required arguments. The first four form a tuple that defines the replication flow; the last three pass options through to ssh, zfs send and zfs recv.

	1) lfs: local file system source, must be a zfs dataset  eg: tank/usr/home

	2) rhost: remote host replication target running an rsh compatible transport  eg: tardis.co.uk

	3) rfs: remote file system target, must be an existing dataset.  eg: dozer/repltarget/tank_usr_home

	4) flowtag: a string that describes the flow  eg: ash-home-lab_to_tardis_transatlantic_flow NOTE: use the SAME flow tag on AB and BC.

	5) sshopt: in quotation marks, options to pass to ssh, e.g. the ssh key to authorize login

	6) zfssendopt: in quotation marks, options to pass to zfs send, e.g. zfssendopt=''

	7) zfsrecvopt: in quotation marks, options to pass to zfs recv, e.g. zfsrecvopt='-x encryption'

=head1 DESCRIPTION

zfs replication script with contributions from many folks.

=head2 resuming using token

If the source and target support zfs send and receive with a resume
token, this script will reach out to the target and find a resume
token if there is one. However if the version of zfs on the source or
the target is too old to support resume token functionality, you must
set SUPPORT_RESUME_TOKEN=0 or it will error out.

=head2 A->B->C replication

We use the SAME "tag" for A-B and B-C. This is necessary because we
want to use the most recent available snapshot on B as the older half
of the delta when going from B to C.

If running on A, set NEVER_DELETE_REMOTE_SNAPSHOTS=1. This is
important so that snapshots on B will be available for use in the BC
copy. See HOW (NOT TO) EXPIRE SNAPSHOTS ON B below.

If running on B, set SECOND_TO_THIRD=1. This has two main effects:

1. Never create a new snapshot on B. (If you did this, you would in
effect change B such that incremental A-B replication would break, or
the A-B replication would destroy the new snapshot when running with
zfs recv -F). Instead, find the most recent already available snapshot
on B.

2. Do not delete snapshots on B. See HOW (NOT TO) EXPIRE SNAPSHOTS ON
B below.

=head1 HOW (NOT TO) EXPIRE SNAPSHOTS ON B

The middle zfs server has a problem. Either replication script might
delete snapshots that the other one might need. The simplest example
is, suppose a level 0 copy is in progress from B to C. Meanwhile, many
incrementals happen from A to B. If, after the level 0 finishes, an AB
incremental expires the (very old) snapshot that was used for the
level 0, then the next BC copy will not be able to be incremental and
will have to redo the level 0 again. A scenario where the BC script
deletes a snapshot needed by the AB script is also possible, though
harder to intuitively imagine.

For this version of this script and documentation, the solution is
that snapshots on B are never automatically deleted. An admin must
delete them manually, after checking out of band that neither the AB
nor the BC script still needs the snapshot.

=head1 OLDER DOCUMENTATION from Ash and Mohan

 The code had some bugs and some incompleteness.
 It has been substantially modified for NBER use. - Mohan 2021/05/27

 Shared with no implied warranty or suitability for purpose; this will probably lose data.

 aeria zfs mover 
 ash@aeria.net

 ./mover.sh barrel/tmp          toohey.aeria.lab  z/repli/reaver/tmp        reaver-tmp_to_toohey_flow '-i <sshkey>' zfssendopt='...' zfsrecvopt='-x encryption'
            (local file system) (remote host)     (remote file system)      (flowtag)			(sshopt)	(zfssendopt)	(zfsrecvopt)

 NBER example:
 To replicate freenas2:tank/homedirs to datacenter1:tank/freenas2_replication/homedirs passing no options for zfs send but passing '-x encryption' for zfs receive do the following:
 sh ./replication_from_truenas_to_another.sh tank/homedirs datacenter1.nber.org tank/freenas2_replication/homedirs freenas2_homedirs_to_datacenter1_freenas2_replication '-i /data/ssh/replication' zfssendopt='' zfsrecvopt='-x encryption'

 reference: https://www.slideshare.net/MatthewAhrens/openzfs-send-and-receive
zfs restartable delta sigma replication

 Initially the mover will transmit a bulk snapshot to prime later instances of incremental sends.
 Snapshots are created and destroyed automatically and retained only slightly longer than needed.
 The mover is restartable; zfs resume tokens are queried for restarts; no extra configuration is required.
 Each flow is reentrant safe.  Feel free to schedule it in a tight loop or from cron.
 Not BUGS exactly:
 Recursion is not supported.
 This script must manage all the snapshots on both zfs datasets. External removal of snapshots is discouraged.
 Snapshots are retained longer than strictly required because paranoia abounds.
 Flowtag is used as part of a regex on remote zfs list operations to identify snapshots that belong to this replication context.
 If a flowtag is a superstring of another, behaviour is undefined.
 No user properties, throttling, rebuffering, readonly status or holds are managed.

=head1 FIXME

 Needs more argument checks; always more. Invariants. Paranoia.
 Push is not always nice; needs a pull mode to run in a 'bunker' where network access is asymmetric due to NAT.
 Bulk transport mode; get ssh and perhaps userland out of the way. Perform 3rd party orchestration remotely away from the push host.

=head1 CONTRIBUTORS

 Ash Gokhale from Klara Systems. (Ash Gokhale <ashfixit@gmail.com>)
 Mohan Ramanujan from NBER (mohan@nber.org)
 Alex Aminoff from NBER (aminoff@nber.org)
 ... with advice from others at Klara Systems

=cut

HOOOEY

}
if [ $# -ne 7 ]; then 
	usage 
	# wrong argument count is a usage error, so do not report success
	exit 1
fi

SUPPORT_RESUME_TOKEN="${SUPPORT_RESUME_TOKEN:=1}"
SECOND_TO_THIRD="${SECOND_TO_THIRD:=0}"
NEVER_DELETE_REMOTE_SNAPSHOTS="${NEVER_DELETE_REMOTE_SNAPSHOTS:=0}"
MAXOLDSNAPSHOTS=8

#local dataset to send
lfs=${1}
#remote host to receive
rhost=${2}
#remote zfs dataset
rfs=${3} 
#tracking tag which we use to brand snapshots for our exclusive use
#please use a descriptive name that describes the replication relationship
mytag=${4} 
thedate=`date +"%s"`
sshopt=${5}
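# strip the leading "zfssendopt=" / "zfsrecvopt=" label; parameter expansion
# avoids echo mangling values that look like echo options (e.g. -e or -n)
zfssendopt=${6#zfssendopt=}
zfsrecvopt=${7#zfsrecvopt=}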

# special case so that we can sub this in for old replication script
# for AB copy to datacenter1

send_verbose_arg=" -v "

# I advocate any rsh compatible pipe transport,  ssh is ok I guess; netcat transport orchestration would be better 
rsh=" ssh  "  
#gild the ssh opts
#rsh=" $rsh -o CompressionLevel=9  -o Compression=yes"  
rsh=" $rsh ${sshopt} "		# The external ssh parameters added here - mohan
rsh=" $rsh -o ConnectionAttempts=5 "  
#rsh=" $rsh -o ForwardX11=no -o LogLevel=INFO "  
#rsh=" $rsh -v  "  

echo "=================================================================================================="
echo "`date`"
echo "Running replication of ${lfs} from `hostname` to ${rhost} ${rfs}"
echo "the time is now: ${thedate}. we are sending $lfs.$mytag to $rhost:$rfs "

if [ `uname -r | cut -d. -f 1` -eq 9 ]; then
    echo Setting SUPPORT_RESUME_TOKEN=0 because FreeBSD version is 9 at NBER
    SUPPORT_RESUME_TOKEN=0
fi
if [ "$rhost" = 'datacenter1.nber.org' ]; then
    echo Setting NEVER_DELETE_REMOTE_SNAPSHOTS=1 because remote host is datacenter1 at NBER
    NEVER_DELETE_REMOTE_SNAPSHOTS=1
fi
holdtag=REPL
if [ $SECOND_TO_THIRD -eq 1 ]; then
    holdtag=REPLBC
fi

echo "  does this NAS support a resume token:${SUPPORT_RESUME_TOKEN}"
echo "  copy existing snapshots only to a third NAS:${SECOND_TO_THIRD}"
echo "  never delete remote snapshots:${NEVER_DELETE_REMOTE_SNAPSHOTS}"
echo "External arguments passed:"
echo "local dataset: lfs=${1}"
echo "remote host to receive: rhost=${2}"
echo "remote zfs dataset: rfs=${3}"
echo "snapshot tag: mytag=${4}"
echo "Options to pass to ssh: sshopt=${5}"
echo "Options to pass to zfs send: zfssendopt=${zfssendopt}"
echo "Options to pass to zfs recv: zfsrecvopt=${zfsrecvopt}"
echo "--------------------------------------------------------------------------------------------------"


lockfile_cleanup() {
	rm $mover_lockfile $remote_flow_snapnumbers_file $local_flow_snapnumbers_file || echo "can't kill lockfile"
}

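hold_cleanup() {
    # thishold is "<holdtag> <snapshot>", set by snapshot_now; empty means no hold was taken
    if [ -n "$thishold" ]; then
        echo cleaning up hold $thishold
        # the hold was taken with -r, so release it recursively as well
        zfs release -r $thishold
    fi
}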

catch_trap() {
    echo "caught trap pid $$  $* for $mytag -  cleaning up locks and dying"
    hold_cleanup
    lockfile_cleanup
    exit 99
}
child_trap (){
    # save the status first: running the test below overwrites $?
    rc=$?
    if [ "$rc" != "0" ]; then
	# trap context elides some of the normal shell context
	echo "got abnormal exit code $rc from $! $*"
	catch_trap
    else
	echo -n "."
    fi
}

# exit statuses must be in 0-255; negative values are rejected by some shells
mover_lockfile=`mktemp /tmp/.mover-$mytag.lockXXX` ||  exit 4
remote_flow_snapnumbers_file=`mktemp /tmp/.mover$mytag-rfs.XXX` || exit 5
local_flow_snapnumbers_file=`mktemp /tmp/.mover$mytag-lfs.XXX` || exit 6

echo tracking remote flow in $remote_flow_snapnumbers_file, local flow  in $local_flow_snapnumbers_file

# KILL cannot be trapped, and 2 is just INT again; CHLD gets its own handler
trap catch_trap TERM INT BUS FPE
trap child_trap CHLD

snapshot_now () {
    #XX parameterise and armour
    ##xxx we might not shoot a snap unless there are no snaps to send or
    # resumable replication can proceed
    if [ $SECOND_TO_THIRD -eq 1 ]; then
        echo "Must not create a new snapshot when SECOND_TO_THIRD is set"
        echo "Instead we get most recent lfs snapshot"
        get_lfs_snaps
        lastlocalsnap=`tail -1 $local_flow_snapnumbers_file`
        if [ "x$lastlocalsnap" = "x" ]; then
            echo "FATAL ERROR: running in SECOND TO THIRD mode but there are no local snaps on this filesystem to use"
            exit 20
        fi
        echo "lastlocalsnap:${lastlocalsnap}"
        nowsnapname="${lfs}@${lastlocalsnap}"
        echo "nowsnapname:$nowsnapname"
    else 
        echo "Taking new snapshot=> nowsnapname=${lfs}@$mytag.${thedate}"
        nowsnapname="${lfs}@$mytag.${thedate}"
        zfs snapshot -r $nowsnapname || exit 10
    fi
    zfs hold  -r $holdtag $nowsnapname || exit 10
    thishold="$holdtag $nowsnapname"
    #update the local snapshot flow catalog
    get_lfs_snaps
}

get_rfs_snaps () {
	# parameters @$rfs , $mytag
	# side effect updates rfscount
	echo "Remote snapshots in flow:"
	is_rfs=`${rsh} ${rhost} "zfs list -H ${rfs} >& /dev/null"; echo $?`
	if [ "${is_rfs}" = '1' ]; then
		echo "rfs does not exist"
		rfscount=0
	else
		$rsh $rhost "zfs list -Hr  -t all -o name ${rfs}" | grep $mytag | cut -f2 -d@ >  $remote_flow_snapnumbers_file
		echo "Remote_flow_snapnumbers_file:" ; cat $remote_flow_snapnumbers_file
		# read via stdin so wc prints only the count, whatever its padding
		rfscount=`wc -l < $remote_flow_snapnumbers_file | tr -d ' '`
	fi
}

get_rfs_resume_token (){
    if [ $SUPPORT_RESUME_TOKEN -eq 0 ]; then
        resume_token=""
        return
    fi
	in_host=$1
	in_remote_dataset=$2
	is_rfs=`${rsh} ${in_host} "zfs list -H ${in_remote_dataset} >& /dev/null"; echo $?`
	if [ "${is_rfs}" = '1' ]; then
		echo "rfs does not exist therefore no resume_token"
		resume_token=""
	else
		resume_token=`$rsh $in_host "zfs get -H -o value receive_resume_token $in_remote_dataset"`
		if [ "${resume_token}" = '-' ]; then
			resume_token=""
			echo "No resume_token found."
		else
			echo "Resume token from ${in_host}:${in_remote_dataset} = ${resume_token}"
		fi
	fi
}

get_lfs_snaps() {
	echo "Current local snapshots:"
	zfs list -Hr -t all -o name  ${lfs}  | grep $mytag |  cut -f2 -d@ > $local_flow_snapnumbers_file
	cat $local_flow_snapnumbers_file
}
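
# Sketch only, not called by this script: the documentation notes that
# behaviour is undefined when one flowtag is a superstring of another,
# because the snapshot lists are built with a plain "grep $mytag".
# One possible stricter filter is shown here. It assumes snapshot names
# have the form <dataset>@<flowtag>.<epoch seconds>, which is how
# snapshot_now names them. The function name is hypothetical.
filter_flow_snaps_sketch() {
    # $1 = flow tag; stdin = dataset and snapshot names, one per line
    awk -v tag="$1" '
        {
            at = index($0, "@")
            if (at == 0) next                     # a dataset, not a snapshot
            snap = substr($0, at + 1)
            prefix = tag "."
            if (substr(snap, 1, length(prefix)) != prefix) next
            epoch = substr(snap, length(prefix) + 1)
            if (epoch ~ /^[0-9]+$/) print snap    # keep only <tag>.<digits>
        }'
}
# Possible use: zfs list -Hr -t all -o name "$lfs" | filter_flow_snaps_sketch "$mytag"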

get_lfs_snaps
get_rfs_snaps 

if [ $rfscount -eq 0 ]; then
    echo "no remote snapshots found for $rhost: $rfs full  initial bulk tx from $lfs to $rfs "
    # checking SUPPORT_RESUME_TOKEN is inside function call
    get_rfs_resume_token  $rhost $rfs	
    if [ ${#resume_token} -le 30 ]; then 
	arg_resume_token=""
	echo "Begin Level 0 replication."
        if [ $SECOND_TO_THIRD -eq 1 ]; then
            echo "no token found we shall use last local snapshot"
        else
	    echo "no token found we shall need a new initial transmit snapshot"
        fi
	snapshot_now
	## side effect: generates $nowsnapname
        echo "Command used to run: zfs send ${zfssendopt} $send_verbose_arg ${nowsnapname}  | $rsh $rhost \"$zfs_recv_buffer zfs recv ${zfsrecvopt} -sF $rfs\""
	zfs send ${zfssendopt} $send_verbose_arg ${nowsnapname}  | $rsh $rhost "$zfs_recv_buffer zfs recv ${zfsrecvopt} -sF $rfs"
    else # resume token processing
	arg_resume_token="-t $resume_token"
	# we don't use nowsnap, but rather the old snapshot,
        # which we really hope is around because
	# we have no idea about its name a priori from the token data
	# so please never delete our flowtag snapshots unless you are willing to give up replication
        echo "Command used: zfs send ${zfssendopt} $send_verbose_arg $arg_resume_token    | $rsh $rhost \" $zfs_recv_buffer zfs recv  ${zfsrecvopt} -sF $rfs\""
	zfs send ${zfssendopt} $send_verbose_arg $arg_resume_token    | $rsh $rhost " $zfs_recv_buffer zfs recv  ${zfsrecvopt} -sF $rfs"
    fi 
else 
    echo "remote snapshots exist."
fi #no remote snapshots 

#echo generating fresh catch up  snapshot to operate on
#snapshot_now
# The snapshot_now has to happen if level 0 is already done. - mohan
if [ $rfscount -gt 0 ]; then
    echo generating fresh catch up snapshot to operate on or using existing if second to third
    snapshot_now
fi

if  [ $rfscount -eq 0 ]; then 
    #if we got here and have no remote snapshots 
    #something is stale after an initial bulk action  
    # check the remote end for news.
    echo refresh remote snapshots after bulk action 
    get_rfs_snaps
fi 

echo ""
echo "Checking local and remote snapshots:"
echo "Local snapshots:"
cat $local_flow_snapnumbers_file
echo "Remote snapshots:"
cat $remote_flow_snapnumbers_file
echo "joining local and remote snaps"
join $local_flow_snapnumbers_file $remote_flow_snapnumbers_file

echo  -n "Last common snapshot in flow: "
lastcommon=`join $local_flow_snapnumbers_file $remote_flow_snapnumbers_file | tail -1`
echo $lastcommon
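
# Sketch only, not called by this script: join(1) expects both inputs to
# be sorted on the join field. The lists here come out of "zfs list" in
# name order, which is usually fine for <flowtag>.<epoch> names. If that
# assumption is ever in doubt, the last common snapshot can also be found
# with a whole-line, fixed-string grep that does not care about sort order.
# The function name is hypothetical; "last" means last in the order of the
# first file.
last_common_snap_sketch() {
    # $1 = local snapshot list file, $2 = remote snapshot list file
    # -F: names are literal strings, -x: match whole lines only
    grep -Fx -f "$2" "$1" | tail -1
}
# Possible use:
#   lastcommon=`last_common_snap_sketch $local_flow_snapnumbers_file $remote_flow_snapnumbers_file`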

expire_local() {
    if [ $SECOND_TO_THIRD -eq 1 ]; then
        echo "Never delete local snapshots if we are in second to third mode"
        return
    fi
    
	echo "expire local versions before $lfs@$lastcommon"
	# -F -x: match the snapshot name literally and as a whole line
	ln=`grep -nFx -- "$lastcommon" $local_flow_snapnumbers_file | cut -d: -f1`
	echo "Row of last common snapshot: ln=$ln"
        echo "Entries in local_flow_snapnumbers_file="; cat $local_flow_snapnumbers_file
	ln=$(($ln - 1))
	echo "Line ${ln} is the event horizon."
	if [ $ln -eq 0 ]; then
		echo "not enough snapshots found,"
		echo "there should always be at least two snaps in the mag, unless maybe restartability is working"
	else
		head -$ln $local_flow_snapnumbers_file
	fi
	########################## delete  old local  versions
	if [ $ln -gt $MAXOLDSNAPSHOTS ]; then
	    # destroy them 4 at a time, to avoid buildup
	    # XX not clear if we ever need to kill 2 because we have restartable transmits.
            # Unless of course you are on FreeBSD 9 and SUPPORT_RESUME_TOKEN=0
	    for i in `head -$ln $local_flow_snapnumbers_file | head -4  `; do
		echo "Deleting Local:   $lfs@$i"
                #			echo "  $lfs@$i"
		# -d is a deferred destroy to avoid stalling replication
                holdlines=`zfs holds $lfs@$i | wc -l`
                if [ $holdlines -gt 1 ]; then
                    echo "  Releasing hold tag $holdtag"
                    zfs release -r $holdtag $lfs@$i
                fi
	        zfs destroy -r -d $lfs@$i
	    done
	fi
}

expire_remote () {
    if [ $NEVER_DELETE_REMOTE_SNAPSHOTS -eq 1 ]; then
        echo "not expiring remote snapshots"
        return
    fi
	echo  expire needs $rfs, $lastcommon 
	echo "Remote snapshots lastcommon=$lastcommon"
	###########################delete old remote versions
	# -F -x: match the snapshot name literally and as a whole line
	rln=`grep -nFx -- "$lastcommon" $remote_flow_snapnumbers_file | cut -d: -f1`
	echo "There are rln=${rln} current remote snapshots:"
	echo "cat remote_flow_snapnumbers_file =";cat $remote_flow_snapnumbers_file
	rln=$(($rln - 1))
	echo "old remote versions:"
	if [ $rln -gt  6 ]; then
	    for  i in `head -$rln $remote_flow_snapnumbers_file | head -4  `; do
		echo "Deleting Remote:   "$rfs@$i
                #			echo "   "$rfs@$i
		# -d is a deferred destroy to avoid stalling replication
		$rsh $rhost "zfs destroy  -d $rfs@$i"
	    done
	fi
}
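
# Sketch only, not called by this script: a dry-run view of the expiry
# rule used by expire_local and expire_remote above. Given a snapshot list
# (oldest first) and the last common snapshot, it prints the names that
# would be picked for deletion: nothing unless more than <threshold>
# snapshots are older than the last common one, and then only the oldest
# <batch> of them. The function name and its parameters are hypothetical.
expire_candidates_sketch() {
    # $1 = snapshot list file, $2 = last common snapshot,
    # $3 = threshold (e.g. $MAXOLDSNAPSHOTS), $4 = batch size (e.g. 4)
    awk -v last="$2" -v keep="$3" -v batch="$4" '
        $0 == last { found = NR; exit }      # stop at the last common snapshot
        { older[NR] = $0 }                   # everything before it is older
        END {
            if (!found) exit                 # last common not listed: expire nothing
            n = found - 1
            if (n <= keep) exit              # not enough old snapshots yet
            for (i = 1; i <= batch && i <= n; i++) print older[i]
        }' "$1"
}
# Possible use:
#   expire_candidates_sketch $local_flow_snapnumbers_file "$lastcommon" $MAXOLDSNAPSHOTS 4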

echo "newer local versions after lastcommon=${lastcommon}"
len=`wc -l < $local_flow_snapnumbers_file | tr -d ' '`
lln=`grep -nFx -- "$lastcommon" $local_flow_snapnumbers_file | cut -d: -f1`
# number of local snapshots strictly newer than the last common one
# (0 if the last common snapshot is not in the local list at all)
lln=$(($len - ${lln:-$len}))
if [ ${lln} -ge 1 ]; then
	tail -$lln $local_flow_snapnumbers_file
else
	echo "no local snapshots newer than lastcommon."
fi

latestlocal=`tail -1 $local_flow_snapnumbers_file`
if [ "${lastcommon}" != "${latestlocal}" ]; then
	echo " tx with common baseline:  $lfs@$lastcommon with delta $lfs@$latestlocal" 
	get_rfs_resume_token $rhost $rfs
	if [ ${#resume_token} -le 30 ]; then 
		echo incremental proceeding
		echo "Sending incremental delta from $lfs@$lastcommon up to $lfs@$latestlocal"
                echo "Command used: zfs send  ${zfssendopt} $send_verbose_arg -i  $lfs@$lastcommon $lfs@$latestlocal  |  $rsh $rhost \"$zfs_recv_buffer zfs recv ${zfsrecvopt} -sF $rfs\""
		zfs send  ${zfssendopt} $send_verbose_arg -i  $lfs@$lastcommon $lfs@$latestlocal  |  $rsh $rhost "$zfs_recv_buffer zfs recv ${zfsrecvopt} -sF $rfs"
	else
		echo resume proceeding
		arg_resume_token="-t $resume_token"
		echo "Sending resume_token= $resume_token"

                echo "Command used: zfs send ${zfssendopt} $send_verbose_arg $arg_resume_token |  $rsh $rhost \"$zfs_recv_buffer zfs recv ${zfsrecvopt} -s $rfs\""
		zfs send ${zfssendopt} $send_verbose_arg $arg_resume_token |  $rsh $rhost "$zfs_recv_buffer zfs recv ${zfsrecvopt} -s $rfs"
	fi
else
	echo "lastcommon=${lastcommon} and latestlocal=${latestlocal} are the same.  We are up to date."
fi
expire_remote 
expire_local
hold_cleanup
lockfile_cleanup
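# nowsnapname already has the form <lfs>@<snapshot>, so do not glue it onto $rfs
logger "replication done ${lfs} to ${rhost}:${rfs} using snapshot ${nowsnapname} for flow $mytag"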
echo "This batch of replication completed.......`date`"
echo "=================================================================================================="
