346 - Chasing disk errors | Scoins.net | DJS

346 - Chasing disk errors

I have been having issues with backing up data since the last Apple macOS upgrade, from 10.15 Catalina to 11.1 Big Sur. Eventually I noticed that my most recent back-up was not at all recent, and it could all too easily have been that I really needed yesterday's backup to rescue one of those cumulative errors that occur. 

When your system offers peace of mind by organising backups for you, you grab this with both hands. You only have to experience data loss once to realise you never ever want to experience this again. Imagine losing all your books, or all your music or photos; imagine losing all of these at the same time. Of course you want a backup and of course you want to be told this is taken care of, for it to be something you need not worry about, with the backing up happening all of the time in the background. So, all those worries appeased, you are cheerfully not worrying, having no need to check. Until you are forced to, at which point it would be very easy to go into some ascending panic helix.

I've been through this recently. I have two largish external drives, of 1TB and 4TB, and both are used only for backups. The system on OSX is called Time Machine, which one's keyboard neatly compresses to ™. But, unbeknownst to me and as far as I can tell unheralded, the upgrade from what really was OSx to OSxi required and expected that the formatting for the backup disks change. You don't care what the formats are called, but one's internal drive is formatted as default unless you push it to something else, while the external drive should be in APFS (Apple File System, Apple's own format). Of course, reformatting a drive is exactly the same as erasing it and indeed that's what the reformat process is called. So it is something you don't want to do lightly and you do it with a lot of paranoia. Check what backups you have just in case, reformat a disk to APFS, back up to it, test that the backup is there (possibly more than once), then cycle and repeat down the chain of backups. Often this process will include tests to see whether your disk might be damaged, where 'repair' usually means fixing some of the records that the checks use to confirm the information is complete. OSX provides Disk Utility, within which is First Aid, which is nice and simple to run and gives a level of detail that manages to assure without drowning one in a surfeit of technogabble.

Which was all fine. Backup system restored and working. The smaller (1TB is small?) hard drive basically holds one complete backup and so soon declares itself unable to store the relatively smaller files that are the changes since the previous 'last' backup. Not a problem. But, after about ten days and having had my paranoia detector reset, I go back to look at the back-ups and find I have only a week's worth. That is, backups are somehow failing to occur. A lengthy cycle of testing and checking ensues, with the paranoia setting moving steadily higher and, after at least a day, I find there is a fault on the internal drive. At this point, you think either "So what" or "Oh, shit". One has, of course, the ability to go hunt across the internet for recommendations, advice and recorded experience. You have no easy way of testing whether someone else's experience applies to you and, if it does, no way of knowing how completely it applies. So you do every test you can discover short of binning your data. There are several extraordinary (i.e., not ordinary) ways of booting up a Mac, some of which, depending on the OS running, do a full system check as part of the boot process. If you're feeling really brave you can load Terminal (which looks to me like screens used to look in the 70s and to any Apple-only user like having landed on the Moon) and type fsck (file system consistency check) while also being aware that a mistype might well have disastrous consequences. Not least, system commands like fsck can often produce several screenfuls of response, much of which looks, at face value, as if disaster has already struck. Indeed, this is a general problem with what I think of as true software diagnostics: they tell all. 
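
With hindsight, the 'only a week's worth' discovery is something a few lines of script could have flagged for me. What follows is a minimal sketch, not what I actually ran. It assumes the backup snapshots are named by timestamp in the form YYYY-MM-DD-HHMMSS (possibly with a suffix such as .backup), and the function names are my own invention for illustration.

```python
from datetime import datetime, timedelta

# Assumption: snapshot names look like 2021-05-17-101500 (YYYY-MM-DD-HHMMSS).
STAMP_FORMAT = "%Y-%m-%d-%H%M%S"


def newest_backup(names):
    """Return the datetime of the most recent parseable snapshot name, or None."""
    stamps = []
    for name in names:
        # Keep only the last path component and drop any suffix like ".backup".
        leaf = name.rstrip("/").split("/")[-1].split(".")[0]
        try:
            stamps.append(datetime.strptime(leaf, STAMP_FORMAT))
        except ValueError:
            continue  # not a timestamped snapshot name, so ignore it
    return max(stamps) if stamps else None


def backup_is_stale(names, now, max_age=timedelta(days=1)):
    """True if there is no snapshot at all, or the newest is older than max_age."""
    latest = newest_backup(names)
    return latest is None or (now - latest) > max_age
```

Fed a list of snapshot names (typed in by hand, or copied from wherever your backup tool lists them), backup_is_stale says whether the newest one is older than you would like. The one-day default is arbitrary; pick whatever matches your own level of paranoia.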

Having spent time in the past doing this sort of hunting, and on occasion even been paid for so doing, I know that the moment the 'error' flag has gone up, a small part of you is resigned to the possibility that this is the end. End of an era for this data, for example; end of this particular disk; end of this machine, perhaps. With suitable care, something might be rescued, but that is the approach: that the disaster has already happened and sufficient care might result in a rescue. So one's attitude is not 'put me back immediately' but more 'it's all gone wrong, how much can I rescue'.

So apparently I have a fault on the internal drive. The wide reading suggests this might be an imposed error from contamination and the wise advice is to try reinstalling the OS and, if that doesn't fix the problem, to reformat the disk. Of course, several of these solutions include bits of process that mean you cannot also access the internet, so you need a second device acting as rescue manual and research tool. It very soon becomes clear that people who do this for a living must be working on several such problems simultaneously, or that they are being paid for sitting doing nothing for large parts of their day, because mostly the machine is doing the work. While it might in truth be very very fast, the vast amounts of detail it is sifting through mean that, from the outside, it is unbelievably slow. Hours, even. Reinstalling the OS might well take several hours. So very soon you realise you need to write down (pens and pencils, remember them?) what you've been doing, and what you're intending to do on <result fork>.

On Day 3 of this I'm committed to a reformat of the internal drive. I've checked that I have backups so often I'm beginning to doubt myself. I've written down the plan. I've run the 'repair' function on the internal drive, which has two partitions and reports these separately; one is labelled Data, and the repair says there is a problem, which I've now also looked up several times, so that I have multiple tabs on the search engine window, with some duplication. I've identified an error code; I found how to discover which code I have, and I've looked up what I'm supposed to be able to do about it. That the error code is five digits long is worrying enough ("Argh! So very much can go wrong!") but equally that might mean that the design is so good that 50,000 helpful codes have been generated to identify the ways you can screw up. Perhaps St Peter has a similar code set for denying people access to heaven...

Decision to 'just do this', feeling a bit like committing to something dire. Re-installing the OS (several hours pass) produces no improvement. Bite hard and reformat the disk, but the expected APFS is not among the alternatives listed, so I pick the previous best default decision. Disk check says all good (now). Reconnect the backup drives to start reinstallation of data and the system says the two drives are empty. Try not to panic, plug these instead into the laptop, which shows that the drives have no visible content but that the free space (when you look in the right place, which in itself might be a whole chapter of a book describing this tale) matches what there was of the data, so something is there and I just can't see it. Eventually I conclude that perhaps the APFS formatting on the hard disks is not readable on the laptop, which is running an older OS. Go to bed to think on the problem.
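
The free-space reasoning here is simple arithmetic: capacity minus free space should come out close to the size of the data you believe is on the drive. A rough sketch of that check follows, with the caveats that I did this by eye rather than by script, that the function names and the 5% tolerance are my own assumptions, and that the mount path in the note afterwards is hypothetical.

```python
import shutil


def used_bytes(path):
    """Bytes in use on the volume holding path: total capacity minus free space."""
    usage = shutil.disk_usage(path)
    return usage.total - usage.free


def roughly_matches(used, expected, tolerance=0.05):
    """True if used is within tolerance (as a fraction) of the expected size."""
    if expected <= 0:
        return False
    return abs(used - expected) <= tolerance * expected
```

So roughly_matches(used_bytes("/Volumes/Backup"), expected) coming back True on a drive that shows no files would be the same hint I got: something is there, even if this OS cannot show it.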

The following early morning I try reinstalling the OS by loading the system software from the internet, which adds time. When it starts at the 'where do you want to install this?' point I'm taken aback by the OSX being El Capitan (version 10, not 11, 11.2 or 11.3). Long pause and wonder if this is a commitment to having to cycle through upgrades. It is around this point that my wonderful wireless keyboard and mouse need to be replaced with cabled versions. I let it start and then spend ages trying to find (i) how to nudge it upwards from OSx towards OSxi and (ii) why the check on which OS would load didn't move to a later version. A lengthy search comes up with a suggestion that the internal clock has to be nudged back to 2017 or 2016. So the OS (or an OS) has loaded and no progress has been made, so I change the clock (that's in Terminal, which feels like a terminal decision itself), eventually realise I have to disconnect the internet or the clock will reset, and have to do that again only with cable and wi-fi disconnected; load up the OS and it works. In the sense that I now have a computer with working software and no disk errors on the internal drive. But the back-up disks are still blank as far as this OS is concerned. Sleep on this yet again, after a fruitless search of what-might-be on both connected machines. 

So I am pretty sure that I have a back-up, even two of them. But the computer says they aren't there. This is probably because OSX (the real version 10) wants Mac OS Extended (Journaled) formatting, not APFS. So that means that I need a route to upgrading from OSx to OSxi. Maybe the subtle difference is that there is no longer a declared disk fault; so maybe I ought to repeat the external reinstall of the OS. Very early on yet another morning, that is what I do. And then do it again but with the date changed to 2017. For reasons I do not understand, this results in being offered Big Sur (only), which is OSxi. So I tell it to install that and somewhere during this process the internal drive's formatting becomes APFS too. That looks like a success and, another four hours later, I have a working iMac with Big Sur (that's OS 11.3) loaded. Is this a win? Do I have data available? Connect one backup disk and this time the two gadgets can see each other. So I can go into Time Machine and reload the data. Except I can't; ™ refuses to do it.

By now there is an element of the unexpected long haul about this. There's a small part of one's mind wondering if this is what it's like when a war advertised as 'over by Christmas' runs to four years, or six, or a pandemic that will be 'over by the summer' runs into the next summer. How much more can go wrong? More internet searching shows that the 'migration assistant' would be another way of reloading the data, and that might be somehow more thorough; it's worth a try, so off we go and yet another 4-6 hours later I have a restored disk full of data. Yippee, perhaps, though my feeling at the time was much more like the feeling one has reaching yet another false peak on a mountain ascent.

Is this a solved problem? Well it is and it isn't. My backups have proved they do exist, but that doesn't mean that everything is as it was (minus the disk fault) before I began this escapade. Many passwords are to be re-entered and I discover a new problem, that my carefully maintained password file is in the system that is demanding a password be found before it will let me in to look at the same file I want. This is a new version of those occasions in life where, in order to fix a problem, one's faulty thinking relies upon the problem not being there. 

Example: Oh the car's not working, I'll just nip to the garage...duh! 

Example: I've found the blockage in the sink U-bend, I'm just washing it out.... Oh, I should have used the sink plug.

Example: The lights are off, I'll just go to the fuse-board - and you try to turn the lights on to see which circuit needs attention. So where do you keep a torch?

So I'm adding one more example: Where do you keep your backup back-up list of passwords? How does anyone access your computer after you've died? What, you haven't thought of that?

I worked it out, but it's the passwords that you've changed that are the hard ones to remember.

It turns out that there are subsequent hassles. I've had to re-install a chunk of Microsoft Office, which refused to let me at the Excel files (including, specifically, the back-up password list file, now on paper). It took ages to sort out the password to my web-host and this somehow required the next upload to be the whole site (about an hour?). My emailer, Apple Mail, demanded that it reload all the Mail (but I think what it did was re-index it) which slowed everything down while it tidied away some 30,000 saved mails. I've been exploring all sorts of software listed as on the machine that I clearly don't use (file list option 'last accessed' date to find out) and have done something of a clear-out (decluttering, it's called) while trying to find unresolved issues in advance of actually needing stuff to work. I'm sure more will arise over the next couple of weeks, but for now I think the system is working properly. That is a poor measure, since my sensitivity to what 'properly' means has shifted. 

DJS 20210525

Top pic not mine, from Google images.

Just because you're paranoid, doesn't mean they aren't after you. Joseph Heller, Catch-22

Yes, I'm paranoid — but am I paranoid enough? David Foster Wallace, Infinite Jest

Some people think this is paranoia, but it isn't. Paranoids only think everyone is out to get them. Wizards know it. Terry Pratchett, Sourcery

I thought General Patton said the one about being paranoid enough. I found a similar quote attributed to Tom Clancy. 

Gen. George S. Patton has many good quotes attached to his name. Among them:

“If everybody is thinking alike, then somebody isn't thinking.”             Too right; the boss surrounded by yes-men is doomed.

“Don't tell people how to do things, tell them what to do and let them surprise you with their results.” 

“Lead me, follow me, or get the hell out of my way.” 

“Better to fight for something than live for nothing.”

“The test of success is not what you do when you are on top. Success is how high you bounce when you hit the bottom.”

“Do your duty as you see it, and damn the consequences.”                      Been there, more than once.

“An active mind cannot exist in an inactive body.”   I don't agree completely, but I have long felt that an active body has a more easily active mind.

“Coward: someone who in a bad situation thinks with his feet”

“You have to make the mind run the body. Never let the body tell the mind what to do… the body is never tired if the mind is not tired.” 

“A leader is a man who can adapt principles to circumstances.”


Evening of 25th, spent trying to find out why Mail is really struggling to download Mail from somewhere. Weird, because there were very few mails on either the scoins.net server or the me.com server. When these 500 mails eventually arrived, they turned out to be old stuff, and at the same time a very few (maybe two) mails were briefly seen on my laptop but never appeared on the bigger machine, which is what rules, so those seen briefly have vanished. Also weird. Overnight OS xi upgraded from 11.3 to 11.4.

Internal names (bracketed) are taken from wines and then apples, while the public names are big cats (Cheetah to Mountain Lion) and then places in California.

Mavericks is a surfing location, Yosemite is a national park, El Capitan is a rock formation within that park, the Sierra (Nevada) mountains run into the High Sierra region in the NE corner of the state, while the Mojave is a desert in the SE. Catalina is an island down south and Big Sur is a mountainous section of the central coast. I marked a google Earth view showing these locations. And mapoftheweek inserted below.

List here uses a short return, opt-enter, which may come 'undone' on your display.

OS X 10 beta: Kodiak - 13 September 2000
OS X 10.0: Cheetah - 24 March 2001
OS X 10.1: Puma - 25 September 2001
OS X 10.2: Jaguar - 24 August 2002
OS X 10.3: Panther (Pinot) - 24 October 2003
OS X 10.4: Tiger (Merlot) - 29 April 2005
[OS X 10.4.4: Tiger (Chardonnay)]
OS X 10.5: Leopard (Chablis) - 26 October 2007
OS X 10.6: Snow Leopard - 28 August 2009
OS X 10.7: Lion (Barolo) - 20 July 2011
OS X 10.8: Mountain Lion (Zinfandel) - 25 July 2012
OS X 10.9: Mavericks (Cabernet) - 22 October 2013
OS X 10.10: Yosemite (Syrah) - 16 October 2014
OS X 10.11: El Capitan (Gala) - 30 September 2015
macOS 10.12: Sierra (Fuji) - 20 September 2016
macOS 10.13: High Sierra (Lobo) - 25 September 2017
macOS 10.14: Mojave (Liberty) - 24 September 2018
macOS 10.15: Catalina (Jazz) - 7 October 2019
macOS 11: Big Sur - 12 November 2020
       wikipedia on the topic  No more name changes?

macOS 11.1 - 14 December 2020
macOS 11.2 - 1 February 2021
macOS 11.3 - 26 April 2021
macOS 11.4 - 24 May 2021
macOS 11.5 - available in beta form already

The switch from Intel chip to the Apple M1 chip applies here, but new enough Intel machines (like mine, late 2015) will run OS11.

Email: David@Scoins.net      © David Scoins 2021