Monday, July 15, 2019

DB2 Split Brain - it's still a thing...

Synopsis -
DB2 "Split Brain" is still very much a vulnerability in DB2 v11.5 (multiplatform).
The concept and meaning are discussed, along with the point that the vulnerability is actually multiplied by increased complexity in server frameworks and recent DB2 featuresets, coupled with DB2 Administrators deciding to enact HADR connectivity overrides in emergency outages while relying on potentially misleading diagnostic cues.
Brief practical and theoretical research follows on multiple DB2 Split Brain scenarios across DB2 versions, which remain possible even given multiple built-in preventative mechanisms.
A summary is given on how best to procedurally avoid Split Brain.

Main Article -
I was recently asked a DB2 interview question: "How does Split Brain occur?"
DB2 High Availability being a specialty of mine (a few years ago), I racked my brain for what I thought the simplest summary was, and my response was:
Split Brain is when a [disconnected] HADR Standby is forced to take over as Primary while the original Primary is still active, and application connections add log records to the new Primary, putting the two databases in an irreconcilable state where they cannot reconnect in HADR.

The interviewers had an even simpler summary - i.e. Split Brain occurs (or can occur) when HADR databases are no longer the same.

Reflecting on it, their simpler answer is indeed more valid than my own.  This is not only because two HADR databases in Standard / Primary mode whose transaction logs have diverged cannot be reconnected as a Primary + Standby. (Divergent transaction logs are different from a Primary transaction log which is merely further along than the Standby has replayed - the latter situation can still potentially be brought to peer state, so long as Primary and Standby are HADR connected and log replay/catch-up is allowed to progress before any [forced] takeover.)
It is also true because (in HADR_SYNCMODEs other than SYNC or NEARSYNC), a vulnerability opens up every time a HADR Standby falls behind the Primary in log apply, or is in a 'Catch-up' state.
If that catch up state is not successfully brought back to a peer state, transactions from the original Primary side will be lost, and a loss of connectivity to (or outage of) the Primary server before all HADR logs are received by the Standby, means that peer state can never be achieved if the Standby has to be forced into a takeover as Primary state.
If applications start connecting and writing logs to the new Primary while the old Primary is disconnected from Standby, but not from applications/batch connections, the two databases become irreconcilable, with some transaction log records essentially unrecoverable except with a 3rd party log extraction tool (and most likely practically unusable after the fact in a high activity OLTP environment - it would be more expedient for selective application resubmission after assessing referential integrity to find missing/incorrect row/column values).
Bringing both databases back to a connected HADR Primary + Standby state after this forced takeover situation (where the two were not in peer state at the time of takeover) requires the old Primary database to be restored/replaced from an online backup of the new Primary and started in rollforward pending state as HADR Standby, in the same manner as the original Standby was established.  Once in peer state, the original Primary can perform an unforced takeover and the servers can resume their normal roles, which is usually required where the Standby server has been configured with lower capacity (CPU, memory) or is physically located remotely in a DR site.
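
The difference between a Standby that is merely behind and a pair that has diverged can be illustrated with a toy model. This is only a sketch: it treats each transaction log as a plain list of record labels (t1, t2, ... are made-up names), which is far simpler than DB2's real log format, and the function names are hypothetical.

```python
from typing import Sequence


def shared_prefix_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Count how many leading log records the two streams have in common."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count


def classify(primary_log: Sequence[str], standby_log: Sequence[str]) -> str:
    """Classify a Primary candidate / Standby candidate pairing.

    'peer'     : both streams are identical
    'catch-up' : the Standby is a strict prefix of the Primary, so replay can close the gap
    'diverged' : the Standby holds records the Primary lacks, so it cannot rejoin as-is
    """
    shared = shared_prefix_length(primary_log, standby_log)
    if shared == len(standby_log):
        return "peer" if shared == len(primary_log) else "catch-up"
    return "diverged"


# Toy scenario: t4 was committed on the old Primary but never shipped,
# then t5 was written on the new Primary after a forced takeover.
before_outage = ["t1", "t2", "t3"]
old_primary = before_outage + ["t4"]
new_primary = before_outage + ["t5"]

print(classify(old_primary, before_outage))  # Standby merely behind
print(classify(new_primary, old_primary))    # old Primary as Standby candidate
```

In this model the second pairing is the Split Brain case: the old Primary would have to be discarded and rebuilt from a backup of the new Primary, and t4 is lost unless it is resubmitted.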

So, with all that said, I was immediately intrigued by their follow-up statement that DB2 has protections against Split Brain such that it doesn't (cannot) occur in the more recent versions of DB2 (e.g. v10 and later).
I had to do some research to confirm this, because it seemed to me that regardless of the DB2 version, Split Brain will always be a potential situation which can only be avoided by never performing a HADR 'takeover by force', and thus never having two connectable databases at the same time. That still allows for Reads on Standby (RoS), which only allows Uncommitted Read (unlogged Selects) and is completely safe.
I hasten to add that by 'connectable', I definitely do not mean that DB2 will allow databases with divergent log streams to re-connect or re-start in HADR mode.  DB2 already has multiple safeguards including detection when a Standby candidate has log records which the Primary candidate does not contain, preventing a HADR Start as Primary to connect to a divergent started Standby (e.g.  with SQL1768N rc7).   'Connectable' includes the obvious non-HADR 'Standard' mode database where manual intervention is performed by DB2 admins under pressure to get the database online again for users. 
As with most things, the inevitability of human error and manual intervention means that none of the built-in mechanisms to prevent data inconsistency is foolproof.

It may actually be that conceptually, our ideas of what 'Split Brain' means, are 'divergent' :)
It is not an uncommon phenomenon for IBM's official terminology to differ from usage in the global DB2 support community.
My idea incorporates the broad definition of data inconsistency regardless of the current state of HADR connectivity, whereby the log streams have diverged - i.e. each database has log records which the other lacks, e.g. due to a takeover by force outside of peer state (or more specifically, outside of peer window), along with subsequent committed transactions on the new HADR Primary.
One way a Split Brain can occur even from a Peer State, is where the original Primary server is still running with connected applications, but the Standby loses connectivity to Primary and an over-enthusiastic DB2 support person thinks that HADR error actually means the Primary server is down and a HADR takeover [by force] is required.  Subsequently, there are two HADR Primary DB2 databases running and connectable - if any application connects to the new Primary and writes a transaction log, that becomes a Split Brain scenario.
Normally, you might expect there to be other factors preventing Split Brain from occurring in an enterprise/multi-server environment, purely because applications typically connect to DB2 not locally, but remotely via separate middleware such as WebSphere Application Server (WAS), and that normally restricts the application-to-DB2 network route to a single port number on a single IP address on a specific network adapter.
Additionally, reliance on the Automatic Client Reroute mechanism should serve as a preventative measure, whereby an attempted connection on Server A will be automatically redirected to Server B if DB2 HADR is currently set to Standby on Server A (and vice versa, direct connections to Server B redirect to A when Server B is Standby).
Unfortunately, in security vulnerability parlance, this merely widens the 'attack surface' - increasing the number of moving parts (points of failure) which can go wrong.
It is especially true when there are multiple network adapters in play (for firewall security zone demarcation as well as load balancing), where some are designated for remote admin and some are internal application comms - if external zone admin routes fail but internal zone application comms remain, it can be difficult for the support teams or dashboard/monitoring tools to confirm whether applications are still connected to DB2, because they cannot connect in order to run basic diagnostic commands.
Unless there is an explicit process or mechanism preventing applications connecting to one database of a DB2 HADR pair without confirmation that it is the only connectable (Primary / Standard mode) database, the Split Brain vulnerability exists.
I assert this because that situation occurred for our support team years ago, (luckily only on a Disaster Recovery test takeover scenario), whereby the HADR takeover by force was issued on the DR test Standby after confirmation from the system support team that the old Primary was stopped, but some application activity occurred on the original Primary even after the HADR takeover by force on the DR test Standby, creating the Split Brain.
In this scenario, two related points of failure existed -
1) After the shutdown step on the four Primary DB2 servers was given, the Admin Network Adapters indicated that all four were not connectable, which was taken as the signal to proceed as though all four were stopped.  Unfortunately, it turned out that at least one Primary DB2 server was still running, just that the Admin network adapter was stopped, and some batch applications were still connected and processing in DB2 through the internal adapter - some through direct ip/port, not all through WebSphere Application Server (WAS) middleware.
2) The middleware/WebSphere team had to switch over their registered application server ip addresses, because in a real DR scenario, DB2's Automatic Client Reroute (ACR) cannot automatically redirect when the Primary server is down.  There were multiple load balanced application servers to switch over and this took time, allowing transaction processing unbeknown to the system support teams but later discovered by the application team.
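
The first point of failure suggests probing every network route to the old Primary, not just the admin adapter, before treating it as stopped. The following is a minimal sketch of that idea, assuming the probe runs from a host that can actually see each adapter; the function names are hypothetical, and the addresses in the closing comment are placeholders rather than anything from our environment.

```python
import socket
from typing import Iterable, List, Tuple

Route = Tuple[str, int]  # (ip address or hostname, DB2 listener port)


def reachable_routes(routes: Iterable[Route], timeout: float = 2.0) -> List[Route]:
    """Return the routes on which a TCP connection could still be opened."""
    still_open = []
    for host, port in routes:
        try:
            # A successful connect means something is still listening on this route.
            with socket.create_connection((host, port), timeout=timeout):
                still_open.append((host, port))
        except OSError:
            # Refused, unreachable or timed out: this route did not answer.
            pass
    return still_open


def no_route_answers(routes: Iterable[Route], timeout: float = 2.0) -> bool:
    """True only when none of the supplied routes accepted a connection."""
    return not reachable_routes(routes, timeout)


# Example call with placeholder addresses (admin adapter and internal adapter):
#   no_route_answers([("192.0.2.10", 50000), ("10.0.0.10", 50000)])
```

A silent port is still not proof that DB2 is stopped or that no batch job is attached locally, so a check like this can only veto a forced takeover; it cannot justify one on its own.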

To practically test my assertion on current DB2 version 11.5, I went so far as to create a pair of virtualised DB2 11.5 on SLES12x64 servers, pairing the SAMPLE db in HADR via a host-only network adapter for internal DB2 connectivity in addition to the NAT adapter for external/remote (admin) connectivity.
(sles12x64a db2inst1 SAMPLE HADR Primary, sles12x64b db2inst1 SAMPLE HADR Standby)
The Split Brain scenario is still possible purely by virtue of the ability to perform db2 takeover by force on the Standby server while the old Primary is still running but network disconnected from the Standby.
It is also possible when HADR is stopped after any takeover and databases become connectable as Standard mode.
Thankfully, if connectivity between Primary and Standby exists at the time of a forced takeover, there is a DB2 mechanism which prevents subsequent connections on the old Primary:
i.e. start HADR, then attempt connect to old Primary after a forced takeover on still-connected Standby:
db2inst1@sles12x64b:~> db2 start HADR on db sample as Standby
DB20000I  The START HADR ON DATABASE command completed successfully.

db2inst1@sles12x64a:~> db2 start HADR on db sample as Primary
DB20000I  The START HADR ON DATABASE command completed successfully.

db2inst1@sles12x64b:~> db2 takeover db sample by force
SQL0104N  An unexpected token "db" was found following "TAKEOVER".  Expected
tokens may include:  "HADR".  SQLSTATE=42601
db2inst1@sles12x64b:~> db2 takeover HADR on db sample by force
DB20000I  The TAKEOVER HADR ON DATABASE command completed successfully.
db2inst1@sles12x64b:~> db2 connect to sample
   Database Connection Information
  Database server        = DB2/LINUXX8664
 SQL authorization ID   = DB2INST1
 Local database alias   = SAMPLE

db2inst1@sles12x64a:~> db2 connect to sample
SQL1776N  The command cannot be issued on an HADR database. Reason code = "6".

Normally, those two databases cannot now be HADR reconciled because both were essentially in PRIMARY state, and to start a database as Standby, it needs to be in rollforward pending state.
However, DB2 has a utility called db2rfpen to force reset a database into rollforward pending state.
We will assume for convenience & purpose of this testing that the takeover by force occurred within Peer Window (otherwise we already have Split Brain).  We will also assume for the same reasons that the old Primary did not have any remaining connected transactions commit after the takeover by force.  The negation of either of these assumptions would indicate a Split Brain scenario has already occurred in terms of divergence of log streams & committed transactions in the database, even if those databases are currently preventing new connections.  As stated above, DB2's internal safeguards will at least ensure a successful HADR Start/reconnect will not occur if the log streams are divergent, but that doesn't prevent connections and transactions if HADR is then stopped and the databases are in standard mode.
Reconciling such divergent databases requires choosing one to be discarded and overwritten with a fresh full backup of the other database chosen as the best new Primary.
In order to reconcile and restart HADR in this otherwise cleanly forced takeover scenario, since the Standby issued a takeover by force and potentially had transaction logs subsequently applied to it, and the old connected Primary was prevented from accepting new connections, the logical database to choose to start as Standby would be the old Primary.
Attempting to restart SAMPLE on the old Primary right now gives us the following error:
db2inst1@sles12x64a:~> db2 deactivate db sample
DB20000I  The DEACTIVATE DATABASE command completed successfully.
db2inst1@sles12x64a:~> db2 stop HADR on db sample
DB20000I  The STOP HADR ON DATABASE command completed successfully.
db2inst1@sles12x64a:~> db2 start HADR on db sample as Standby
SQL1767N  Start HADR cannot complete. Reason code = "1".

SQL1767N   rc1 The database was not in roll forward-pending or roll forward-in-progress state when the START HADR AS Standby command was issued.

Not to worry, a quick and dirty db2rfpen followed by a repeat of start HADR as Standby has that sorted:
db2inst1@sles12x64a:~> db2rfpen on sample
                    ____    D B 2 R F P E N    ____                     
                 IBM - Reset ROLLFORWARD Pending State                  
  The db2rfpen tool is a utility to switch on the database rollforward  
  pending state.                                                        
  It will also reset the database role to STANDARD if the database is   
  identified using the database_alias option.                           
  In a non-HADR environment, this tool should only be used under the     
  advisement of DB2 service.                                            
  In an HADR environment, this tool can be used to reset the database   
  role to STANDARD.                                                     
  SYNTAX: db2rfpen on < database_alias | -path log_file_header_path >  
Primary Global LFH file    = /home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGCTL.GLFH.1
Secondary Global LFH file  = /home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGCTL.GLFH.2
Path to LFH files          = /home/db2inst1/db2inst1/NODE0000/SQL00001/MEMBER0000
Original rollforward pending state is Off.
Setting rollforward pending State to On.
Setting backup end time to: 1562854483
db2inst1@sles12x64a:~> db2 start HADR on db sample as Standby
DB20000I  The START HADR ON DATABASE command completed successfully.

Thanks to lack of log divergence, the start as Primary on the other side also works and reconnects the DB2 HADR pair.
db2inst1@sles12x64b:~> db2 start HADR on db sample as Primary
DB20000I  The START HADR ON DATABASE command completed successfully.
db2inst1@sles12x64b:~> db2pd -d sample -HADR
Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:00:08 -- Date 2019-07-15-
                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER
                           HADR_FLAGS = TCP_PROTOCOL
                  PRIMARY_MEMBER_HOST = sles12x64b
                     PRIMARY_INSTANCE = db2inst1
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = sles12x64a
                     STANDBY_INSTANCE = db2inst1
                       STANDBY_MEMBER = 0
             HADR_CONNECT_STATUS_TIME = 07/15/2019 15:47:52.622282 (1563169672)
          HEARTBEAT_INTERVAL(seconds) = 30
                     HEARTBEAT_MISSED = 0
                   HEARTBEAT_EXPECTED = 0
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 5
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
                  LOG_HADR_WAIT_COUNT = 0
            PRIMARY_LOG_FILE,PAGE,POS = S0000009.LOG, 0, 85596001
            STANDBY_LOG_FILE,PAGE,POS = S0000008.LOG, 0, 81520001
                  HADR_LOG_GAP(bytes) = 0
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000008.LOG, 0, 81520001
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
                     PRIMARY_LOG_TIME = 07/12/2019 00:14:43.000000 (1562854483)
                     STANDBY_LOG_TIME = 07/12/2019 00:14:43.000000 (1562854483)
              STANDBY_REPLAY_LOG_TIME = 07/12/2019 00:14:43.000000 (1562854483)
         STANDBY_RECV_BUF_SIZE(pages) = 512
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 13000
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 120
                      PEER_WINDOW_END = 07/15/2019 15:49:55.000000 (1563169795)
             READS_ON_STANDBY_ENABLED = N
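
Reading this output by eye under pressure is exactly where mistakes creep in, so a captured snapshot can also be checked mechanically. The sketch below only parses text in the "NAME = value" layout shown above. The helper names are hypothetical, and it is an aid for reading a saved snapshot, not a substitute for DB2's own takeover checks.

```python
import re
from typing import Dict, Optional


def parse_db2pd_hadr(text: str) -> Dict[str, str]:
    """Turn 'NAME = value' lines from a db2pd -hadr snapshot into a dict."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if separator:  # skip banner lines that carry no ' = '
            fields[name.strip()] = value.strip()
    return fields


def trailing_epoch(value: str) -> Optional[int]:
    """Extract the epoch seconds that db2pd prints in parentheses after a timestamp."""
    match = re.search(r"\((\d+)\)\s*$", value)
    return int(match.group(1)) if match else None


def snapshot_in_peer(fields: Dict[str, str], at_epoch: int) -> bool:
    """True if the snapshot shows PEER state, no log gap, and an unexpired peer window."""
    window_end = trailing_epoch(fields.get("PEER_WINDOW_END", ""))
    return (
        fields.get("HADR_STATE") == "PEER"
        and fields.get("HADR_LOG_GAP(bytes)") == "0"
        and window_end is not None
        and at_epoch <= window_end
    )


sample = """
                            HADR_ROLE = PRIMARY
                           HADR_STATE = PEER
                  HADR_LOG_GAP(bytes) = 0
                      PEER_WINDOW_END = 07/15/2019 15:49:55.000000 (1563169795)
"""
snapshot = parse_db2pd_hadr(sample)
print(snapshot["HADR_ROLE"], snapshot_in_peer(snapshot, at_epoch=1563169700))
```

The `at_epoch` argument is the moment the operator is asking about; a snapshot that is older than its own PEER_WINDOW_END says nothing reliable about the present.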

Now, if anyone had stopped HADR on the old primary db and activated it as standard mode, we'd have transaction connectivity re-enabled and the Split Brain would effectively be irreconcilable.

There is a large number of permutations whereby Split Brain can be artificially induced, many of which can only be partially simulated on two virtual servers, so I have limited my practical experimentation to the above for reasons of expediency, and turned today to the theoretical, using the good old internet and IBM knowledge base/manuals.

As a result of my brief online research :
Renowned DB2 expert Steve Pearson perhaps explains it best in this article, with a clear and specific definitional delineation of Split Brain prevention. The initial Q&A has a number of follow-ups which still make relevant points, even though they were written in 2006 when HADR was a new concept.
to wit:
Server A (HADR Primary), Server B (HADR Standby) -
Server A Fails, Server B takes over as new Primary.
Server A restarts, but DB2 is still in HADR Primary state on Server A - what prevents applications connecting?
DB2 will NOT ALLOW new connections to a restarted HADR Primary until it successfully reconnects to a HADR Standby.

That was true back in DB2 v8.2, v9.1 HADR, just as it remains true today in v11.5, however it doesn't negate the scenario where Server A was never stopped & restarted, nor does it prevent panicked support teams simply issuing a db2 deactivate db <dbname> and db2 stop HADR on db <dbname> on a restarted HADR Primary, which puts the database back in standard mode.  This makes both databases connectable by all and sundry, without due regard for the new Primary still running along happily with application connections on the other server, or the transaction logs now completely divergent.

In the earliest versions of DB2 HADR (e.g. v8.2, v9.1), the only IBM definition of Split Brain is simply given in the context of issuing a 'start HADR on db <dbname> as Primary by force' command:

Caution: Use the START HADR command with the AS PRIMARY BY FORCE option with caution. If the Standby database has been changed to a Primary and the original Primary database is restarted by issuing the START HADR command with the AS PRIMARY BY FORCE option, both copies of your database will be operating independently as primaries. (This is sometimes referred to as split brain or dual Primary.) In this case, each Primary database can accept connections and perform transactions, and neither receives and replays the updates made by the other. As a result, the two copies of the database will become inconsistent with each other.
As of DB2 v9.7, that particular vulnerability was removed when takeover by force is issued against a Standby on an active connected HADR pair.  The old HADR Primary is set to an obsolete state whereby it cannot be restarted as HADR Primary, even by force, giving an SQL1776N rc6: This database is an old primary database. It cannot be started because the standby has become the new primary through forced takeover.
However, as the example below shows, it does not prevent Split Brain from transactions connecting after stopping hadr (db back to Standard mode):
db2inst1@sles12x64b:~> db2pd -d sample -hadr
Database Member 0 -- Database SAMPLE -- Standby -- Up 0 days 00:00:03 -- Date 2019-07-16-
                            HADR_ROLE = STANDBY
db2inst1@sles12x64b:~> db2 takeover hadr on db sample by force
DB20000I  The TAKEOVER HADR ON DATABASE command completed successfully.
db2inst1@sles12x64b:~> db2pd -d sample -hadr
Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:03:29 -- Date 2019-07-16-
                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = DISCONNECTED
db2inst1@sles12x64b:~> db2 connect to sample
   Database Connection Information
 Database server        = DB2/LINUXX8664
 SQL authorization ID   = DB2INST1
 Local database alias   = SAMPLE

db2inst1@sles12x64a:~> db2 deactivate db sample
DB20000I  The DEACTIVATE DATABASE command completed successfully.
db2inst1@sles12x64a:~> db2 takeover hadr on db sample by force
SQL1776N  The command cannot be issued on an HADR database. Reason code = "6".
db2inst1@sles12x64a:~> db2 stop hadr on db sample
DB20000I  The STOP HADR ON DATABASE command completed successfully.
db2inst1@sles12x64a:~> db2 connect to sample
   Database Connection Information
 Database server        = DB2/LINUXX8664
 SQL authorization ID   = DB2INST1
 Local database alias   = SAMPLE
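
One procedural guard against the sequence above is to have whatever runbook or wrapper script issues STOP HADR first look at the last message DB2 returned. The sketch below is only an illustration: the function names are hypothetical, and the meanings in the table are the ones quoted or described in this article, not a full message reference.

```python
import re
from typing import Optional, Tuple

CODE_RE = re.compile(r"\b((?:SQL|DB2)\d{4,5}[A-Z])\b")
REASON_RE = re.compile(r'Reason code = "?(\d+)"?')

# Meanings as quoted or described in this article (not an exhaustive list).
KNOWN = {
    ("SQL1776N", 6): "old primary; the standby became the new primary through forced takeover",
    ("SQL1767N", 1): "database was not in rollforward pending state for START HADR AS STANDBY",
    ("SQL1768N", 7): "HADR start refused; described above as a log stream divergence check",
}


def parse_clp_message(line: str) -> Optional[Tuple[str, Optional[int]]]:
    """Extract (message code, reason code) from one line of CLP output."""
    code = CODE_RE.search(line)
    if code is None:
        return None
    reason = REASON_RE.search(line)
    return code.group(1), int(reason.group(1)) if reason else None


def is_superseded_old_primary(line: str) -> bool:
    """True when DB2 has flagged this database as an old primary (SQL1776N rc 6)."""
    return parse_clp_message(line) == ("SQL1776N", 6)


last_message = 'SQL1776N  The command cannot be issued on an HADR database. Reason code = "6".'
if is_superseded_old_primary(last_message):
    print("Do not STOP HADR here:", KNOWN[("SQL1776N", 6)])
```

A wrapper that refuses to go any further once it sees this marker would have stopped the transcript above at the failed takeover, before the database was put back into standard mode.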

Before DB2 9.5, there was no integration of DB2 HADR with Tivoli System Automation for Multiplatforms (TSAMP). That integration later became DB2 HA, was better integrated in v9.7, and was followed in v9.8 by DB2 pureScale, which combines the best features of High Availability Clustering with Partitioning while still allowing HADR features to exist in hybrid scenarios, including multiple Standby/replay targets.

Going to IBM's Knowledge Centre for the latest DB2 LUW (v11.5 as of this article), the issue of Split Brain is now covered in at least 10 locations, mostly dealing with methods for DB2 Administrators to use to avoid Split Brain scenarios, as well as describing the built-in features designed to prevent it (so long as they are not manually overridden). Additional scenarios for Split Brain, which never existed before concepts such as hybrid cluster+HADR and multiple standby HADR, are now covered in v11.5.
This alone is sufficient indication that IBM still acknowledges Split Brain as a vulnerability even with all the safeguards in place from the automation in PureScale and built-in connection prereq checking for HADR commands and log shipping/replay.

So in summary, in an active connected HADR pair, Split Brain was always, and is still prevented in the situation as described by Steve Pearson, and since v9.7, also prevented where an active, connected HADR pair experiences a forced takeover on the Standby.  Even so, the potential for Split Brain (through disconnection and manual intervention) even in DB2 v11.5 still exists in my descriptions above.   It was unquestionably mitigated by the PureScale / TSA scripted mechanisms to only force a HADR takeover in the event of an actual cluster Primary server failure, or forcing a takeover/failover and node restart in the event of a cluster quorum failure, taking out a lot of the human error factor. However, the vulnerability remains wherever a hybrid or unmanaged HADR pair exists, and steps are not taken to ensure only one database is connectable by applications at a time.
If you take one point from this entire article, to avoid Split Brain, it would be to never succumb to the pressure of forcing a HADR database 'back online' to transaction processing as fast as possible at any cost, without first absolutely ensuring that the other database server(s) have well and truly been shut down and disconnected from all application connections on all network routes, and that a subsequent restart of those offline database servers must not have any attempts made to force the database to standard mode.  Instead, in re-establishing HADR pairing, those databases must only ever be overwritten from fresh backups of the new Primary (if the takeover was forced/in non-peer state), or restarted as standby and allowed to run log catchup if the takeover was unforced and the interim log activity with time elapsed is not going to take longer than a backup/restore.
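
That closing rule can be written down as a small decision function, which may be easier to follow in a runbook than a paragraph. This is a sketch of the rule as stated above and nothing more; the names and the three yes/no inputs are hypothetical simplifications.

```python
from enum import Enum


class Plan(Enum):
    RESTART_AS_STANDBY = "restart the old primary as standby and let log catch-up run"
    REINIT_FROM_BACKUP = "overwrite the old primary from a fresh backup of the new primary"


def reintegration_plan(forced: bool, in_peer_state: bool, catchup_faster_than_restore: bool) -> Plan:
    """Choose how to bring the old primary back into the HADR pair.

    forced: the takeover was issued BY FORCE
    in_peer_state: the pair was in peer state when the takeover happened
    catchup_faster_than_restore: replaying the interim logs is expected to take
        no longer than a backup/restore
    """
    if forced or not in_peer_state:
        # The log streams may have diverged, so never try to salvage the old copy.
        return Plan.REINIT_FROM_BACKUP
    if not catchup_faster_than_restore:
        # Nothing diverged, but a restore is simply the quicker route.
        return Plan.REINIT_FROM_BACKUP
    return Plan.RESTART_AS_STANDBY


print(reintegration_plan(forced=True, in_peer_state=False, catchup_faster_than_restore=True).value)
print(reintegration_plan(forced=False, in_peer_state=True, catchup_faster_than_restore=True).value)
```

Note that forcing the old database to standard mode does not appear as an outcome at all, which is the point of the rule.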

My appreciation to anyone who made it all the way to the end of this article, it appears I have not reduced my verbosity as a result of temporal progression.
-Paul (Morte) Descovich.

Sunday, December 9, 2018

Oculus VR for Unity - review of Sample Framework unitypackage v1.31

Before I reviewed the full Oculus Sample Framework unitypackage, I reviewed the core Oculus Integration Utilities which are included in the Sample Framework:

For 1.32
This week's update to 1.32 brings the scripts and the plugins to 1.32 and the SDK to 1.33:
Debug.Log() Console output:
Unity v2018.2.20f1, Oculus Utilities v1.32.0, OVRPlugin v1.32.0, SDK v1.33.0.
OVRManager:Awake() (at Assets/Oculus/VR/Scripts/OVRManager.cs:1078)

Oculus appear to be shifting focus towards putting assets inside the Avatar plugin dlls, accessible via the CAPI class, and removing some meshes, etc. from individual use.

This version 1.32 includes some assets (but not the scenes) under Oculus/SampleFramework/Core, including the locomotion and teleport scripts, which is a good thing and raises my star rating from 3 to 4 :)  (The full Oculus Sample Framework would get a full 5 if it ever comes back to the Asset Store, as would the OVR Updated Interaction unity package, which should still be available here along with useful touch controller meshes.)
Also helping raise the star rating is the inclusion of LeftControllerPf and RightControllerPf, which are essentially standalone animated prefabs of the Oculus Touch Controllers. They are used by attaching them under a customised TrackedRemote prefab: set the Controller var to LTouch/RTouch, drag the ...ControllerPf prefabs to replace both Model vars, then add those "Touch" TrackedRemote prefabs under each of the LeftHandAnchor/RightHandAnchor sub GameObjects of OVRCameraRig or OVRPlayerController.

Also new are the social/network sample scenes under Oculus/Platform/Samples.

Review of v1.31
Test Developing with Rift CV1, I have downloaded and tested some core sample scenes in the latest Oculus Integration Utilities v1.31 for Unity - most scenes have something useful to use, even if not everything works, or is incompatible as-is with Unity 2018 (requires some code tweaking for deprecated methods)
- something of note is the Oculus/Avatar/Samples/SocialStarter/Assets/MainScene, which does the loading of a custom avatar, complete with autofading linerender rays & pointer spheres which match those in the default Oculus Home user interface, however the code which creates those rays and pointers is not externally exposed to developers
- it appears to be an internal function of the Oculus.Avatar.CAPI class, of which we only can see headers, because it is an API inside the OVRPlugin.

Unfortunately that does not help developers who need lightweight high performance/low overhead code for mobile platforms, like Gear VR which I am attempting. Importing ONLY the VR subtree hierarchy from this unitypackage works for SOME scenes which have no platform or avatar code dependencies, but if you start using certain prefabs or scripts, a web of cross-dependencies on the AudioManager,Avatar,LipSync,Platform,Spatializer,VoiceMod subtrees will soon emerge, necessitating a monolithic import of bloat, even if you remove the various 'Sample' subtrees.

As official companion unity packages to the Oculus Integration/Utilities, I have tested the line render technique shown in the Oculus Sample OVR Updated Controller Interaction (separate unitypackage) and certain locomotion scenes in Oculus Sample Framework (separate unitypackage)
- both also designed for earlier versions of Unity, in order to test locomotion/teleport/rotation as well as UI interaction, but that brought additional complexity because of differing versions of OVR scripts in the sample scene prefabs which are incompatible with the same named scripts in the separate Oculus/VR/Scripts hierarchy.

Additionally, the Oculus Unity UI interaction for Touch/GearVR/Go/Malibu controllers seems broken when dealing with anything more complex than a toggle, button, input field, or slider bar. (i.e. Things like a Unity UI.DropDown result in reverting to selection via a non-visible Gaze Cursor.)
There is a user-submitted hack for this here:
(thx @jhocking !)
It also bears repeating to say Unity UI Navigation must be disabled on all in-scene UI components to avoid the 'stuck focus' bug, (selects the previous UI element on click rather than the UI element currently being pointed to).

I will try the hack, and try to use something other than a dropdown or scroll field, but just be aware not everything is integrating well....
Update: I updated the hack to be implemented via CoRoutine only on Event DropDown Clicked, rather than every frame in Update.  It currently works for GazePointer and correctly switches between Controller raycasts so long as activeController for the EventSystem's OVRInputModule is being correctly set.

A number of issues and lack of obvious functions here are mitigated by another unitypackage "Oculus Sample Framework" v1.31, which very handily also contains the contents of THIS unitypackage as well as all the samples in a single download.

And now for the review of the Sample Framework v1.31:

The v1.31 version Oculus Sample Framework unitypackage combines updated/tweaked Sample Framework scenes and assets for a number of scenarios, with the core OVRPlugin 1.31 & Integration contained in the separately importable Oculus Integration Utilities v1.31 package.

Compared to the previous v1.30.1 of the Sample Framework, v1.31 shows an interesting migration by Oculus towards implementing separate LocalAvatar prefabs with animated hand mesh superimposed over controllers, and away from static controller mesh representations attached to the PlayerController/OVRCameraRig.

Highly customisable Locomotion with Teleport is still supported in the example scene for SampleScenes/FirstPerson/Locomotion2/TeleportAvatar, but the latest PlayerController prefab in the core v1.31 Integration package already supports very basic linear motion with snap rotation (without teleport).

The GazePointerRing is used as a pointer in that scene, rather than the nice Oculus Dashboard/Home ray pointer alpha shaded linerender, but you can see and copy how a straight laser linerender and a curved multisegment linerender from controller is handled via the teleport mechanism.

To see how a custom/personalised LocalAvatar works with those nice alpha shaded ray pointers and animated hands, see the Oculus/Avatar/Samples/SocialStarter/MainScene, which loads a custom avatar (so long as you set up the Oculus app id via the Oculus Platform/Oculus Avatar editor inspectors).
Bug: "There are 2 audio listeners in the scene. Please ensure there is always exactly one audio listener in the scene."
Fix: disable the Audio Listener on the "Room Camera" & "Main Camera" GOs, since OVRPlayerController contains an OVRCameraRig with its own Audio Listener.
Bug: the HelpPanel GO on the avatar's left hand has a null material.
Fix: drag the Help (Rift) & GearHelp (GearVR) materials onto the PlayerController script of the OVRPlayerController GO.
(Dragging the Help material directly onto the HelpPanel GO does not fix it: in Play mode it gets set back to null.)
Basic UI interaction is also demonstrated in the Locomotion2/TeleportAvatar scene.

The overall download size of >500MB seems large, but it is possible to take just the specific scripts and scene prefabs you need, along with the core /Oculus/Avatar (2MB), Platform (23MB), VR (20MB) asset subtrees, for a more reasonable size. Note that the Oculus/VR subtree contains the actual OVRPlugin core dlls, so it is not an optional import :D

Start here for Oculus Unity doco:

and here for guides on adding linerender to controller using Sample Framework (for GearVR but also works for Rift touch and other rotational pointer type controllers like Go, Malibu)

Happy VR coding with this asset, thx Oculus.

Sunday, January 7, 2018

Augmented Reality Vuforia Demo for Android & Unity


Demonstration of Augmented Reality using Vuforia libraries, on 2 separate builds for Android (Android Studio , Unity 2017.2)

3D AR object tracking using ImageTarget & Virtual Buttons

Unity native AR integration requires Android v7 / iOS ARKit,
however adding Vuforia AR libraries allows backlevel Android support (back to v4.1 as of Unity 2017.2)

Please see the transcript for links & background info.

Transcript + timecodes

Hi, This is Morte from PCAndVR,
with a video demonstrating Augmented Reality using Vuforia libraries, on builds for Android (using Android Studio & Unity 2017.2)

If you're not familiar with Augmented Reality (AR), it means using a device to synchronise virtual objects into the sensory field (e.g. visual/auditory) through which we experience the real world.
A key feature of AR is locking a virtual reference frame into a live camera image of either a specific image target, a 3D object, or a surface (ground plane).  This reference frame becomes a stable centre point for generated 3D objects to exist such that they appear synchronised with the real world in the device's camera image.
Because an AR object or image target can be tracked in 3D space, it actually provides a form of head tracking for Mixed Reality (XR) if you use a mobile Virtual Reality (VR) head mount with a camera opening such as Samsung's Gear VR or custom Google Cardboard headsets.

Without positional head tracking, VR headsets can only provide 3DoF (degrees of freedom) rotation from a single point, with no translational movement in virtual space, adding to motion sickness.
AR devices can be something as simple as a recent model mobile phone or tablet with camera.
Head mounted AR devices can be more useful for hands-free operation and keeping virtual objects in the field of view, but typically there are weight and portability issues with tethered head mount devices, and image quality limitations on current untethered glass-frame devices.

This video will demo 3D AR object tracking using ImageTarget & Virtual Buttons, as they are reliable and simple to implement.
First up, we have the Android Studio built app, which largely contains the same feature set as the Unity app, albeit using simpler models and older image targets.
The 1st Android Studio example uses the stones image target.  The added 3D object is a teapot, originating from the famous 1975 Utah teapot 3D model.

Using the app we can show the teapot from various distances, angles, and even zoom in on the inside of the spout, with the teapot visually appearing fairly stable against the stones image target.

The image targets are printed on A4 paper from files in the supplied library, and are designed to be high contrast, with numerous well defined edges and non-repeating features.  This allows detection and tracking even if only a portion of the image is visible in the camera field.

The 2nd Android Studio example uses the wood background image target, with four virtual buttons.   Each button gets mapped with a virtual rectangle surrounding it to detect occlusion by real world objects, effectively meaning the button has been pressed.  In this case, it changes the colour of the teapot to match the colour of the virtual button covered by my fingers: red, blue, yellow, and green.

Next, we move to the Unity build of the same type of example features, with image targets and non-interactive 3D objects, and virtual buttons to allow interactive 3D object behaviour using the same image targets.

Here we have a standing astronaut, which can jitter if the AR device is not held steady, an effect worsened by the model's height and the relatively small target image size.
Next is an oxygen cylinder, which is significantly more stable because the 3D model is smaller and positioned closer to the target surface,
and an animated drone which, while small, can jitter in the hovering position above the target surface.

I coded a custom 3D image of my own showing an empty gamelevel platform, using the stones image target and downloaded .xml file, but I could have used any suitable custom image with the Vuforia web xml generator.

Finally, the Unity build's virtual buttons feature shows the astronaut waving after a button press,
the oxygen cylinder displaying an infographic dropdown visual,
the drone emitting a blue projection field,
and the fissure changing from white steam to dark red.

In conclusion, we saw some limitations of smaller, less complex image targets in terms of feature detection & target tracking,
with resultant visual instability of mapped 3D objects, worsening further above the surface of the target,
and objects disappearing at low incident angles to the target, where feature detection is not possible.  
Both these limitations can be mitigated somewhat through use of 3D object targets or larger complex multiple image targets on 3D objects,
since the detection should still clearly see features of one target even if those of another become occluded or at a low angle.

For further background info & links, please see the transcript for this video.

And that's it, so thanks for watching!

Background info

If you are thinking of doing your own Vuforia AR builds for Android,
the 1st demo uses an app built on Android Studio using the Vuforia Android SDK:

A number of books & online tutorials exist, but can be confusing due to the deprecated legacy Android Vuforia library API which used QualComm. & QCAR. naming.
Now Vuforia library naming is standalone, and Android native builds require Android Studio, replacing Eclipse.
However, it still required significant manual tweaking and customisation for me to convert deprecated code from the legacy Eclipse format into the Gradle-based Android Studio framework.
It is really not worth pursuing Vuforia libraries for Android Studio since the Vuforia Unity SDK makes the entire process simple and modularised, utilising drag n drop gameobjects.

It is already far easier to go straight to Unity for Augmented, Virtual, & Mixed Reality (XR) features.
Unity native AR integration requires Android v7 / iOS ARKit,
however adding Vuforia AR library support allows backlevel Android support (back to v4.1 as of Unity 2017.2)
Don't bother with the downloadable legacy Vuforia library .unitypackage for Unity 5.x.x or earlier, there are too many deprecations to contend with.

Aside from using the image targets from the supplied library, creating your own image targets for use in an app consists of uploading an image to your Vuforia web account, and processing it for the appropriate platform.
Internally, Vuforia converts images into feature points, downloaded from the site as an xml file used in the app as a form of 2D vector UV mapping. 
These feature points are used for tracking in conjunction with the Vuforia image detection & processing algorithms, and any chosen 3D objects can be overlaid onscreen, locked to the detected target so that movement of the device's camera shows the chosen 3D objects from the appropriate angle and distance, matching the real world target.
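As a rough illustration of why the tall astronaut jittered more than the low oxygen cylinder in the demo, here is a minimal Python sketch. It assumes the jitter comes from a small angular error in the estimated target pose, so that a point sitting higher above the target plane is displaced further sideways. The function name, the model heights and the 0.5 degree error are hypothetical values chosen for illustration; they are not measured and are not part of the Vuforia API.

```python
import math


def apparent_jitter_mm(height_mm: float, tilt_error_deg: float) -> float:
    """Sideways displacement of a point height_mm above the target plane
    when the estimated target pose is tilted by tilt_error_deg."""
    return height_mm * math.tan(math.radians(tilt_error_deg))


if __name__ == "__main__":
    # Hypothetical heights above the printed image target (assumptions).
    models = [("oxygen cylinder", 30.0), ("astronaut head", 200.0)]
    for name, height_mm in models:
        shift = apparent_jitter_mm(height_mm, tilt_error_deg=0.5)
        print(f"{name}: {shift:.2f} mm of apparent shift")
```

Under this simple model the same pose error moves the top of a tall model proportionally further than a short one, which matches the stability difference seen between the astronaut and the cylinder.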

This is a general introduction to Vuforia:

This link helps you get started with Vuforia in Unity:

Some tips & tricks for Vuforia on Unity:

Vuforia's legal doco allows for free use of a subset of features during app development:

Vuforia requires setting up a Licence for each Unity project/application you develop

If you use the supplied example code from the .unitypackage in Asset Store (for Unity 2017.2 or later), be sure to use the link here in the notes to understand how to successfully apply custom image targets without them being overridden by the sample code:

Google Daydream View VR requires specific compatible Android v7 phones such as Google's Pixel 2 or Samsung Note 8/Galaxy S8, but reference headsets do not allow rear camera visibility, so no AR or XR on those.

Sunday, November 12, 2017

Marblesared on Google Play (finally)

Well it took a while but finally it is there for actual sale in the Google Play store, with a free demo, and a separate closed beta release for all you beta testers out there.


It took this long partly due to multi-factored complications with bundle package naming restrictions.
 - The original beta was called com.pcandvr.marblesared.
 - My automated bundle naming code needed to distinguish alpha/beta/production bundles when publishing, so I couldn't leave that name as is for beta, & couldn't customise the naming just for production due to other Google Play restrictions.
 - Google Play does not allow changing a 'free' app to a 'paid' app, so I couldn't make com.pcandvr.marblesared the prod paid bundle.
 - Google Play lets you 'unpublish' an app bundle so that it no longer appears in Google Play to users as something they can download, but it does not let you delete the app bundle.  Ever.  Users who do not uninstall the app are free to continue to use that app on their devices.
 - Google Play does not let you re-use old app bundle names for new apps, even after you unpublish the old app.  That is understandable - users should never face the issue of an update to an old app resulting in a different new app on their device.

So, ultimately, I ended up with three new app bundle names to meet a repeatable naming scheme:
com.pcandvr.marblesared_beta       (free)
com.pcandvr.marblesared_demo     (free)
com.pcandvr.marblesared_prod      (paid)
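A minimal sketch of a repeatable naming scheme like the one above, in Python. The base id, the three suffixes and the free/paid split come from the list; the function and table names are hypothetical and this is not the actual publishing code.

```python
BASE_ID = "com.pcandvr.marblesared"

# Release channel -> whether the bundle is a paid app on Google Play.
CHANNELS = {"beta": False, "demo": False, "prod": True}


def bundle_id(channel: str) -> str:
    """Return the bundle name for a release channel, e.g. beta/demo/prod."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown release channel: {channel!r}")
    return f"{BASE_ID}_{channel}"


def is_paid(channel: str) -> bool:
    """True if the bundle for this channel is sold rather than free."""
    return CHANNELS[channel]


if __name__ == "__main__":
    for channel in CHANNELS:
        price = "paid" if is_paid(channel) else "free"
        print(f"{bundle_id(channel)} ({price})")
```

Keeping the channel suffix in one table means a bundle never has to change its name or switch from free to paid after it is first published.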

Some annoying news for beta testers - you'll need to uninstall your current Marblesared app, and reinstall the new beta here:

I looked briefly again at Apple iTunes/App Store & Microsoft UWP
Both Apple & Microsoft Developer programs require payment of 100USD (150AUD) every year.
Steam was a 100USD fee per title, but it was a once-off cost per title.

With current lack of sales of Marblesared, across, Steam, & Patreon,
coupled with the recent news of Microsoft dropping Windows Phone as a delivery platform,
balanced against significant losses already for Steam & PCAndVR business set-up,
those are additional costs I simply can't justify just at the moment - I will reconsider when I have more titles to publish, or a separate income stream to sustain that additional cost factor.

Wednesday, November 8, 2017

Simple, low cost cooling solution for Samsung Gear VR S6

After more than a year of my Samsung Galaxy S6 phone overheating after only about 10 minutes of play in my Innovator Edition Gear VR (SM-R321) headset, regardless of setting flight mode, wireless on/off, screen brightness, power saving, do-not-disturb mode, per these tips

I was losing enthusiasm and running out of $$$ to waste on the followup S7 & S8 and newer Gear VR headset.
I did get the latest Gear VR (SM-R324) headset + Touch Controller bundle for a reasonable price on ebay,
but I was still left with the problem of overheating, which apparently is especially bad on the Galaxy S6.
I was also quite dispirited in the wallet area upon learning that Google Daydream View would require a Google Pixel 2 or equivalent latest OLED phone (like the Samsung Galaxy S8).

I have read about numerous solutions specific to certain Gear VR models, and/or phone models, including clip-on fan mounts, heatsinks which required pulling the back cover and battery out of the phone, custom drilled or 3d-printed fan mounts and crazy looking aluminium foil heatsink rigs, and highly risky frozen coolant gel packs strapped to the phone (these people don't understand the concept of condensation occurring even outside a sealed cold gel pack, simply due to ambient humidity).

I decided the best & safest route was a very simple air-cooling solution with a fan small enough to fit just inside the Gear VR faceplate, but not so small that airflow would be insignificant & ineffective.
Being powered by the same USB connector as the Gear VR itself would be nice, but I decided against attempting to pull apart the headset for fear of irreversible damage.

Instead, I used three cheap items to make this a battery pack free-swivel-around-play-till-you-drop solution,
which can also be a mains-powered play-without-moving-too-much solution.
I am most impressed with the fact not only does the new Gear VR Touch Controller work with the Galaxy S6,
there is also no hardware dependency on the Gear VR headset - it works just fine on my old Innovator Edition SM-R321 model.  

So, on top of the Samsung Gear VR + Galaxy S6 + Touch Controller + micro USB charger cable

I got the following items:
- standard USB-A male to female 1.8m cable
- 60x60x10mm 5V USB Cooler Fan
- dual port 1A + 2.1A power pack

For the mains powered option, I got a dual-port USB mains adapter with at least 2A output on each port.
The official Samsung USB 5V @ 2A mains adapter seems to be the only thing which actually puts the S6 into 'fast charging' mode.

I wanted this solution to be VERY simple & easy to mount/dismount the phone from the Gear VR headset, and to plug/unplug the power source.
I also wanted the power source to be mounted on the headset somehow but the lightest power packs were too low on the mAh rating, and the best power-per-buck  vs low weight tradeoff got me a 6600mAh pack which even at 216g was still a bit too heavy to consider strapping to the actual headset, especially if I wanted to avoid neck strain and/or lie back on a pillow or lounge.
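For anyone weighing up pack capacity against weight, a rough runtime estimate can be sketched in Python as below. Only the 6600mAh rating comes from the pack described above. The 3.7V nominal cell voltage, the 85% conversion efficiency and the 200mA fan draw are assumptions for illustration, not measured values, so treat the result as a ballpark only.

```python
def runtime_hours(capacity_mah: float, load_ma: float,
                  cell_v: float = 3.7, out_v: float = 5.0,
                  efficiency: float = 0.85) -> float:
    """Estimate hours of runtime for a USB power pack.

    capacity_mah is the rating at the internal cell voltage (cell_v),
    load_ma is the current drawn at the USB output voltage (out_v).
    cell_v and efficiency defaults are assumptions, not pack specs.
    """
    stored_wh = capacity_mah / 1000.0 * cell_v   # energy held in the cells
    usable_wh = stored_wh * efficiency           # after boost-converter losses
    load_w = load_ma / 1000.0 * out_v            # power drawn at the USB port
    return usable_wh / load_w


if __name__ == "__main__":
    # Hypothetical fan-only load of 200mA at 5V (assumption).
    print(f"fan only: {runtime_hours(6600, 200):.1f} h")
```

Powering the phone from the same pack adds its charging current to load_ma, so the runtime drops accordingly.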
Since the overheating on the Galaxy S6 occurs primarily just under the camera lens, I positioned the fan so it would hang down just below the lens protrusion with the S6 mounted, and swing back up over to rest on the top of the Gear VR when unmounting the S6.

I achieved this of course with the amazingly high tech adhesive technology of ..... (wait for it)

sticky tape.

And here's the result, giving endless VR playtime, no overheating, cool-to-the-touch even after one hr solid play (eyestrain prevented further testing).

You can see in the side-on shots, the spinning fan pushes itself away from the phone face, which is fine, since that allows the airflow to disperse across the phone backplate for more effective thermal displacement.

I will definitely be able to enjoy VR again now - although Gear VR does have that unfortunate degree-of-freedom limitation where physical motion-in-space of the headset and controller are not represented, as they are with both the Oculus Rift VR headset and new touch controllers... 
Also the CPU/GPU limit on phones doesn't allow the same level of detail you can generate with nVidia GTX high end cards.  
I only have a GTX 970 & Intel 3770K and they are perfectly adequate for Oculus Rift VR.

Tuesday, October 31, 2017

Marblesared Build 2017/10/31 - All Hallow's Eve edition

Marblesared Splash image blog link

Hey Players,
Here is the All-Hallows Eve edition 31/10/2017 build of "Marblesared" (Marbles Are Red) for PC and Android.

I can give the link or Steam Key for the full game to beta testers or reviewers on request,
or to Patreon Patrons.
Otherwise it is available on Steam,, Facebook Gameroom for USD2.99.
Free Demo version with 6x4 levels x skills rather than 33x4 also available.

Link to Marblesared blog page

Thursday, October 5, 2017

Marblesared Early Release on Steam for Windows - 20171005

Marblesared Splash image blog link

Hey Players,
Marblesared has been [Early] Released on Steam today for the Windows PC platform!
Marblesared Steam

No game updates per se,
Steam's early release is the same as the 20170920 version I announced ...  on 20th Sep :-)

If anyone wants a Steam key for the PC version, lemme know, you deserve it for being beta testers.

You don't need a Steam key for the Android platform, it is still sitting in free beta mode for now in Google Play.

One day soon I hope to have a MacOSX build (which I can add to Steam) & iOS build
- but with OSX 10.13 High Sierra released on 25th Sep & 10.13.1 beta bug fixes already out to devs,
and iOS11 upgrades being pushed out to iDevices as we speak, with numerous battery power drain issues noted,
the question of build stability is up in the air.
Thanks to Microsoft libraries for IL2CPP & UWP SDK not liking Unity3D at present,
Microsoft Win10 / Universal Windows Platform (UWP) builds & MS App Store remain a pipe dream, even tho normal Windows 32bit builds work just fine on Windows 10.

You've Played Hard - you can give Rolling Balls a well earned rest for now :-)