May 01, 2013
 

Some notes for my fellows at the University of Liverpool. Quite often we have to manage a lot of content: get it, digitize it, and organize it, so that we can finally write something about it.

Let’s start from where we want to end up:

All content should be digitized and indexable. This means that all articles and books must somehow end up as PDFs whose text is actually searchable, i.e. they need to be OCR’d if they were scanned.

So let’s first see how we get to those searchable PDFs.

1) Hardcover books: Cut off the spine using e.g. a QCM-8200M heavy-duty desktop cutter. Scan the pages with a fast scanner, e.g. a Fujitsu ScanSnap S1500M. Run OCR on them using e.g. the Abbyy command-line version, wrapped into a recursive script like the one I published on pdfocrwrapper.cvs.sourceforge.net.

2) DRM’d versions (VitalSource Bookshelf): Unfortunately, these files are not indexable, so we need to export them. Fortunately, that’s relatively easy: just use http://mnott.de/index.php/archives/371 as a reference (works on Mac). If you have a version that was not yet OCR’d, you’ll end up with a PDF as if it had been scanned, which you can then run through Abbyy as well.

3) ePub: ePubs are not that nice, since you’ll really want PDFs. There are a number of options for converting them; the best one I found is http://epub2pdf.com/

4) Adobe Digital Editions: Here, again, your PDFs may come with a DRM lock. If so, try http://apprenticealf.wordpress.com/2010/11/18/dedrm-applescript-for-mac-os-x-10-5-10-6/

5) Books that are not available through the library: Buy them. Alternatively, a Google search for the author and title, with keywords like “download pdf” appended, often helps. Make sure, though, to stay legal and pay for what you use; the same is true for previews, e.g. when using http://mybooksmgr.com/content/books-manager-1.2.4-setup.jar
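
Item 1 mentions wrapping the OCR step into a recursive script. That script is not reproduced here; as a rough illustration only, the Python sketch below walks a folder tree and lists one OCR command per scanned PDF that has no OCR’d sibling yet. The command name ocr-cli, its options, and the _ocr file-name suffix are hypothetical placeholders, not the Abbyy command-line interface; substitute your own.

```python
import os
from pathlib import Path

# Hypothetical placeholder command: NOT the real Abbyy CLI name or options.
OCR_COMMAND = ["ocr-cli", "--input", "{src}", "--output", "{dst}"]


def plan_ocr_jobs(root, suffix="_ocr"):
    """Walk `root` recursively and return one command (as an argument list)
    per PDF that has no OCR'd sibling yet, e.g. book.pdf -> book_ocr.pdf."""
    jobs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            # Skip non-PDFs and files that are themselves OCR output.
            if path.suffix.lower() != ".pdf" or path.stem.endswith(suffix):
                continue
            target = path.with_name(path.stem + suffix + ".pdf")
            if target.exists():
                continue  # already processed on an earlier run
            jobs.append([arg.format(src=path, dst=target) for arg in OCR_COMMAND])
    return jobs
```

Running the planned commands would then be a loop such as for cmd in plan_ocr_jobs(os.path.expanduser("~/Scans")): subprocess.run(cmd, check=True). Because already-converted files are skipped, the walk can be rerun safely after adding new scans.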

Now that we have what we need as searchable PDFs, our Mac will index them automatically (I assume much the same is true for Windows).

Once we have the content, the first thing to do is to put it into a library manager. As I’m using LaTeX, I naturally use BibDesk on the Mac.

For reading PDFs, there’s probably no better application on the Mac than Skim. It also works well with GoodReader on the iPad, and both can sync via Dropbox.

For writing, I use LaTeX with TeXlipse as the frontend. The formatting and automated referencing are so powerful that I even write forum posts with it, using a scratch project. I’ve created some simple scripts that do word counting and conversion to HTML for me, for copy-pasting into the forum.
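
The word-counting scripts are not published here. A minimal sketch of the idea, assuming plain LaTeX source and a deliberately rough heuristic: comments, inline math, and citation/reference commands are dropped, other command names are removed while their arguments are still counted.

```python
import re


def latex_word_count(source: str) -> int:
    """Rough word count for LaTeX source (a heuristic, not a LaTeX parser)."""
    text = re.sub(r"(?<!\\)%.*", "", source)            # strip % comments, keep \%
    text = re.sub(r"\$[^$]*\$", " ", text)               # strip inline math $...$
    # Drop \cite{...}, \ref{...}, \label{...}, \eqref{...} including their argument
    text = re.sub(r"\\(?:cite|ref|label|eqref)\*?(?:\[[^\]]*\])?\{[^}]*\}", " ", text)
    text = re.sub(r"\\[a-zA-Z]+\*?", " ", text)          # drop other command names
    text = re.sub(r"[{}\[\]]", " ", text)                # drop grouping braces/brackets
    # A word is a run of letters/digits, optionally joined by apostrophes or hyphens
    return len(re.findall(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*", text))
```

For example, latex_word_count(r"\section{Intro} Hello world") counts the section title along with the body text, which is usually what a forum word limit means.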

HTH,

M

 Posted at 13:09
May 01, 2013
 

Here are two AppleScripts I quickly hacked together to export from VitalSource Bookshelf:

The first one handles a digitized ebook, i.e. content that is already OCR’d:

-- -----------------------------------
--
-- BookShelf 2 PDF DRM Remover
--
-- © Matthias Nott
--
-- -----------------------------------
--
-- Count "Chapters" using Command-Down
--
-- If you have a scanned file rather than an ebook,
-- find out the number of pages you are allowed to
-- print at once and use BookshelfScan2PDF.
--
-- -----------------------------------


--
-- Ask for the first chapter (coerced to an integer so that
-- the loops below count numerically)
--
set firstchapter to (text returned of ¬
	(display dialog ¬
		"Enter first chapter to print" with title ¬
		"First Chapter" default answer ¬
		"1" buttons {"OK", "Cancel"} ¬
		default button 1 cancel button 2)) as integer

--
-- Ask for the last chapter (coerced to an integer as well)
--
set lastchapter to (text returned of ¬
	(display dialog ¬
		"Enter last chapter to print" with title ¬
		"Last Chapter" default answer ¬
		"1" buttons {"OK", "Cancel"} ¬
		default button 1 cancel button 2)) as integer

--
-- Bring Application to Front
--
tell application "VitalSource Bookshelf"
	activate
end tell


tell application "System Events"
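	--
	-- Jump to root chapter
	-- (menu labels are from a German-localized Bookshelf,
	-- "Buch" = "Book"; adjust them to your UI language)
	--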
	tell application process "VitalSource Bookshelf"
		click menu item "Buch – Home" of menu "Buch" of menu bar item "Buch" of menu bar 1
	end tell
	
	--
	-- Jump to first chapter
	--	
	repeat with i from 1 to firstchapter - 1 by 1
		tell application "System Events" to keystroke (ASCII character 31) using command down
	end repeat
	
	--
	-- Iterate over all chapters
	--	
	repeat with i from firstchapter to lastchapter by 1
		--
		-- Set the File Name by Chapter Number
		--
		set fileName to i
		if i < 10 then
			set fileName to "00" & i
		else if i < 100 then
			set fileName to "0" & i
		else
			set fileName to i
		end if
		
		--
		-- Start Print Dialog and press Enter
		--
		tell application "System Events" to keystroke "p" using command down
		tell application "System Events" to keystroke return
		
		--
		-- Press the "PDF" button ("Drucken" is the title of the
		-- German-localized system print dialog, i.e. "Print")
		--
		activate application "VitalSource Bookshelf"
		tell application "System Events"
			tell process "Bookshelf"
				click menu button "PDF" of window "Drucken"
			end tell
		end tell
		
		--
		-- Go down twice (ASCII character 31 = Down Arrow) to get
		-- the Save as PDF option, then hit return
		--
		tell application "System Events" to keystroke (ASCII character 31)
		tell application "System Events" to keystroke (ASCII character 31)
		tell application "System Events" to keystroke return
		
		--
		-- Enter the filename
		--
		tell application "System Events" to keystroke (fileName as string)
		tell application "System Events" to keystroke return
		
		--
		-- Pause for 4 seconds to allow for printing
		--
		delay 4
		
		--
		-- Go to next chapter
		--
		tell application "System Events" to keystroke (ASCII character 31) using command down
	end repeat
	
	
end tell



The second one handles a book that was not yet OCR’d in VitalSource, i.e. a scanned document:

-- -----------------------------------
--
-- BookShelf 2 PDF DRM Remover
--
-- © Matthias Nott
--
-- -----------------------------------
--
-- This prints an entire book to PDF
-- by chunks - use it if your DRM
-- document is a scanned document.
--
-- -----------------------------------

--
-- Bring Application to Front
--		
activate application "VitalSource Bookshelf"
tell application "System Events"
	--
	-- Jump to root chapter
	--
	tell application process "VitalSource Bookshelf"
		click menu item "Buch – Home" of menu "Buch" of menu bar item "Buch" of menu bar 1
	end tell
	
	
	--
	-- Start Print Dialog
	--
	tell application "System Events" to keystroke "p" using command down
end tell

--
-- Get the Printing Information:
--
-- fromPage will contain the first Page to print
-- toPage will contain the last Page to print
-- nPages will contain the number of Pages we are allowed to print at once
-- firstToPage will contain the number of the last page of the first print run
--		
activate application "VitalSource Bookshelf"
tell application "System Events"
	tell process "Bookshelf"
		tell application "System Events" to keystroke "p" using command down
		
		set fromPage to (get value of text field 1 of window "Print")
		set nPages to (get value of text field 3 of window "Print")
		
		-- Click spinner to get last page
		click button 2 of incrementor 1 of window "Print"
		set toPage to (get value of text field 1 of window "Print")
		
		-- Click spinner to get back to first page
		click button 1 of incrementor 1 of window "Print"
		
		set firstToPage to (get value of text field 2 of window "Print")
	end tell
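end tell

-- The text fields return strings; convert them so that the
-- arithmetic and the "repeat ... by nPages" loop below work on numbers
set fromPage to fromPage as integer
set toPage to toPage as integer
set nPages to nPages as integer
set firstToPage to firstToPage as integer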

-- -----------------------------------
-- Print first chunk
-- -----------------------------------
set chunk to 1
set fileName to "001"

activate application "VitalSource Bookshelf"
tell application "System Events" to keystroke return
--
-- Press the "PDF" button
--		
activate application "VitalSource Bookshelf"
tell application "System Events"
	tell process "Bookshelf"
		click menu button "PDF" of window "Drucken"
	end tell
end tell

--
-- Go down twice to get the Save as PDF option, then hit return
--
tell application "System Events" to keystroke (ASCII character 31)
tell application "System Events" to keystroke (ASCII character 31)
tell application "System Events" to keystroke return

--
-- Enter the filename
--
tell application "System Events" to keystroke (fileName as string)
tell application "System Events" to keystroke return

--
-- Pause for 4 seconds to allow for printing
--
delay 4

-- -----------------------------------
-- Print rest of book
-- -----------------------------------

set curPage to firstToPage

repeat with i from firstToPage + 1 to toPage by nPages
	
	
	--
	-- Set the File Name by Chunk Number
	--
	set chunk to chunk + 1
	
	set fileName to chunk
	if chunk < 10 then
		set fileName to "00" & chunk
	else if chunk < 100 then
		set fileName to "0" & chunk
	else
		set fileName to chunk
	end if
	
	--
	-- Start Print Dialog and set values
	--
	tell application "System Events" to keystroke "p" using command down
	
	-- Last page of this chunk (informational only; not used below)
	set newToPage to i + nPages - 1
	
	activate application "VitalSource Bookshelf"
	tell application "System Events"
		tell process "Bookshelf"
			keystroke (i as string)
			keystroke tab
			keystroke tab
			keystroke (nPages as string)
			keystroke tab
			-- "Fortfahren" is the German label of the "Continue" button
			click button "Fortfahren" of window "Print"
		end tell
	end tell
	
	
	--
	-- Press the "PDF" button
	--		
	activate application "VitalSource Bookshelf"
	tell application "System Events"
		tell process "Bookshelf"
			click menu button "PDF" of window "Drucken"
		end tell
	end tell
	
	--
	-- Go down twice to get the Save as PDF option, then hit return
	--
	tell application "System Events" to keystroke (ASCII character 31)
	tell application "System Events" to keystroke (ASCII character 31)
	tell application "System Events" to keystroke return
	
	--
	-- Enter the filename
	--
	tell application "System Events" to keystroke (fileName as string)
	tell application "System Events" to keystroke return
	
	--
	-- Pause for 4 seconds to allow for printing
	--
	delay 4
	
	
end repeat
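
To make the chunking in the second script easier to follow, here is the same page arithmetic as a small Python sketch (plan_chunks is a hypothetical helper, not part of the script). It assumes the first run prints the range the Print window proposes (fromPage to firstToPage) and that each later run prints nPages pages starting at i, clipped at toPage. The zero-padded file names also keep the chunks in print order when sorted by name.

```python
def plan_chunks(from_page: int, first_to_page: int, to_page: int, n_pages: int):
    """Return (file_name, first_page, last_page) for every print run."""
    # First run: accept the proposed range, i.e. fromPage..firstToPage
    chunks = [("001", from_page, first_to_page)]
    # Later runs mirror "repeat with i from firstToPage + 1 to toPage by nPages"
    for number, start in enumerate(range(first_to_page + 1, to_page + 1, n_pages), start=2):
        last = min(start + n_pages - 1, to_page)        # the final run may be shorter
        chunks.append((f"{number:03d}", start, last))   # same padding as "00" & chunk
    return chunks
```

For a 120-page book where 50 pages may be printed at once, plan_chunks(1, 50, 120, 50) yields the runs 001 (1 to 50), 002 (51 to 100) and 003 (101 to 120).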



That leaves you with a bunch of PDFs, which you can easily concatenate with an Automator workflow.

 Posted at 12:34
Mar 27, 2013
 

If we look at cloud and big data in a more general context, the first thing to understand is that this is not about any particular vendor. The moment you remove lock-ins, we’re talking open source: see Flume, Hadoop, Hive, and Impala, to name just a few projects from the Hadoop ecosystem. Nor is it about the sheer amount of data that we can handle. It is not about tera-, peta- or exabytes.

Likewise, it is not really about knowing where you’re shopping and what you buy. Those are rather “mundane” applications.

It is really about generating decision relevant information from that data.

And it is then about being able to implement products based on these decisions at marginal cost. Consider 3D printing. Today, we can print not only toys. We can print houses. We can print nylon stockings. We can even print the functional equivalent of kidneys. In the future, a very small number of experts will be required to convert the observed “templates” into mathematical models that drive reproduction machines.

For quite a while now, we have been able to create almost any genetic code, in case anyone argues that biological processes are too complicated. They are quite complicated. So are the “printers.” Three years ago, the Craig Venter Institute created the first bacterium with a synthetic genome, “the first species to have its parents be a computer” (see http://news.bbc.co.uk/2/hi/science/nature/8695992.stm).

Against this background, the current cloud strategies of the big players pale in comparison to what is actually happening. Cloud is not about credit card transactions. It is not about flight data. It is not about where someone is or when he goes to the restroom (Netflix). Those are specific applications that allow some people today to pick some relatively low-hanging fruit.

It is about aggregating all available information. It is about everything you can know, historical as well as currently generated; or what do you think Google Glass is really about? Today, we have the necessary resources to do that. And we do have the algorithms to make sense of unstructured data.

In terms of strategy theory, cloud and big data hammer the last nail into “Inside-Out’s” coffin. You reduce information cost to zero and at the same time create choice.

That is going to have massive impacts in the future. Very likely, learning processes are going to change fundamentally. The way you learn and the speed at which you learn are going to change dramatically, and already do: about two hundred years ago, it was unlikely for the average person in the UK to come across as much information in a lifetime as you can read today in a single edition of the New York Times. What’s more, production processes are going to be largely simplified. Software development processes can suddenly be applied to hardware if you can allow for mistakes, resulting in massively shorter release cycles and a much higher rate of innovation. You might think that the information age, and the way it works, is specific to, well, intangible goods. It no longer is.

Intellectually, what you can know, understand, and hence utilize will depend much less on your IQ. Nor will your genes determine when you want to do what, or when you want to go to sleep. In a more general sense, “IQ” is a very extensible concept (individually as well as socially). For example, for quite a while people thought that we use only about 10 % of our brains. That’s nonsense. We always use everything that is made available to us. Whether we use it sensibly is a different matter altogether. And by that I don’t mean time-wasting activities like watching “Britain’s Got Talent” (it surely has). I also mean the massive reserve our brain keeps available for extreme situations, the one that enables “fight or flight.” Since the number of lurking sabre-toothed tigers has decreased remarkably these days, that potential should not sit unused most of the time, and it needn’t: we can understand those processes both from the point of view of psychology (making resources available using, e.g., hypnosis, NLP, meditation, and others) and chemically (which brings us to coffee, Coca-Cola, Red Bull, and their successors). As one result, I am firmly convinced that we will live to see sleep become a very optional activity.

Now, what does that mean for strategy?

As in the Coca-Cola example, the bargaining power of suppliers is massively reduced, since they merely deliver very basic products that our “3D printers” can turn into anything we like.

A new entrant can be deterred only by investment in knowing the “production formula” of the goods we want to “print,” e.g. secured by patents (a temporary factor only), which in turn can be countered by “generics” (“open source”).

Substitutes are ubiquitous the moment you can add value in a different way. Just think of running a car rental company. If you can add value by providing a different kind of service, the actual task (here: getting from A to B) can be accomplished by different means (helicopters or boats rather than cars) or with different levels of service (driver, luxury car, etc.).

Existing competitors for the same products or services are likely all to operate at the production possibility frontier (PPF), if we assume that production processes are optimal and information is freely available.

Buyers have strong bargaining power, as we are suppliers of goods that anyone else with the same information can produce as well.

With this little thought experiment, let me get some “Tea, Earl Grey, hot” and go back to more boring things.

Best regards,

Matthias

 Posted at 23:01
Feb 28, 2013
 

My APC Smart-UPS 3000 RM recently reported that I needed to replace the battery. Sure thing: I ordered a replacement from eBay, plugged it in, and thought everything would be just fine.

Well, it wasn’t. The battery did not charge, and neither resetting nor power-cycling the UPS helped. Today, I was lucky enough to find this video on YouTube:

http://www.youtube.com/watch?v=pAwMRn15z60

Fortunately, I also had an old 9-pin male-to-female serial cable, which I happily plugged first into the back of the UPS and then into my server’s serial port. At which point everything powered down.

It of course helps to RTFM before the fact; doing so afterwards, I found this wiring scheme:

[Image: APC-Serial wiring scheme]

And since I had a spare female serial connector (it helps never to throw these things away; I used to make such cables myself about 20 years ago) and, of course, a soldering iron, I modified my cable accordingly.

Since my server runs Linux, I used minicom instead of HyperTerminal, with the following /etc/minicom/minirc.dfl:

pu port /dev/ttyS1
pu baudrate 2400
pu rtscts No
pu xonxoff No

On Ubuntu, you can install minicom with apt-get install minicom. Then run minicom as root, and it should connect you to the UPS. If not, use minicom -o -s to modify the settings.
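
If you would rather talk to the UPS from a script than from minicom, the same port settings can be applied with Python’s standard termios module. A minimal sketch for Linux or macOS; 8 data bits, no parity and 1 stop bit (8N1) are an assumption here, since the minirc.dfl above only sets the speed and flow control.

```python
import termios


def configure_apc_port(fd: int) -> None:
    """Put an open serial port into 2400 baud, 8N1, raw mode, with neither
    hardware (RTS/CTS) nor software (XON/XOFF) flow control."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)
    iflag &= ~(termios.IXON | termios.IXOFF | termios.IXANY)    # xonxoff No
    iflag &= ~(termios.ICRNL | termios.INLCR | termios.IGNCR)   # pass CR/LF through unchanged
    oflag &= ~termios.OPOST                                      # raw output
    cflag &= ~(termios.PARENB | termios.CSTOPB | termios.CSIZE)  # no parity, 1 stop bit
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL        # 8 data bits, ignore modem lines
    if hasattr(termios, "CRTSCTS"):
        cflag &= ~termios.CRTSCTS                                # rtscts No
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)     # no line editing or echo
    cc[termios.VMIN] = 0
    cc[termios.VTIME] = 10                                       # read() waits up to 1 s
    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, termios.B2400, termios.B2400, cc])
```

Usage would be along the lines of fd = os.open("/dev/ttyS1", os.O_RDWR | os.O_NOCTTY), then configure_apc_port(fd), then os.write(fd, b"Y") and os.read(fd, 64) for the keystrokes and replies.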

In essence, the procedure from the YouTube video above goes like this: use a terminal program to connect to your APC UPS, then

  1. Press Shift+Y to get the SM (smart mode) prompt
  2. Press 1 twice, about 2 seconds apart, to get the PROG prompt
  3. Press 0 to get a status value; anything other than 8C is not what you want
  4. Press - as often as it takes to set the value to 8C
  5. Press Shift+R to save and exit

The YouTube video also says to use + to confirm the value; that did not work for me, since + is simply the opposite of -, i.e. it cycles the other way round.

In the comments on the YouTube video, I found a note that if you have an APC network adapter plugged into the back of your UPS, pressing 1 twice at a 2-second interval will not give you the PROG prompt. This was the case for me: I removed the network card while the UPS was running and got the PROG prompt immediately afterwards. After saving the value (Shift+R), I simply plugged the network card back in; after a couple of seconds, I was able to reconnect via telnet or HTTP as usual.
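
The keystroke sequence can also be scripted. The sketch below is hypothetical: it assumes send/read_line functions wrapping an already configured serial port, assumes the UPS answers the second 1, the 0 and every - with one line each, and checks only for the PROG prompt and the 8C value described above. It does not implement the APC protocol in general, so keep an eye on the values yourself.

```python
import time
from typing import Callable


def set_battery_constant(send: Callable[[bytes], None],
                         read_line: Callable[[], str],
                         target: str = "8C",
                         max_steps: int = 255,
                         pause: Callable[[float], None] = time.sleep) -> int:
    """Walk through the PROG procedure and return how many '-' presses
    were needed until the UPS reported `target`."""
    send(b"Y")                  # Shift+Y: switch the UPS to smart mode
    read_line()                 # ignore whatever prompt comes back
    send(b"1")
    pause(2)                    # the two 1s must be about 2 seconds apart
    send(b"1")
    if read_line().strip() != "PROG":
        raise RuntimeError("no PROG prompt (network card still plugged in?)")
    send(b"0")                  # show the current value
    value = read_line().strip()
    steps = 0
    while value != target:
        if steps >= max_steps:
            raise RuntimeError(f"gave up before reaching {target}, last value {value}")
        send(b"-")              # cycle to the next value (+ cycles the other way)
        value = read_line().strip()
        steps += 1
    send(b"R")                  # Shift+R: save and exit
    read_line()
    return steps
```

Passing in send and read_line instead of opening the port inside the function keeps the sequence testable without a UPS attached.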

So bottom line: Make sure to build (or order) the correct serial cable.

 

 

 Posted at 13:40
Jan 30, 2013
 

Moving to VirtualBox is a great idea if you’re coming from VMware Server 2.0, which is no longer supported. VirtualBox is much more modern, also allows for a headless setup, and is a lot faster. The only drawback I saw is that VMware Server allows memory overcommitment (though I did have issues with that; see my OOM post), which means I had to upgrade my server from 32 GB to 64 GB.

One problem I ran into was this:

cp: not writing through dangling symlink `/etc/initramfs-tools/modules'

when doing the occasional apt-get upgrade. It turns out that once you move to VirtualBox, you can no longer run the VMware Tools you may have installed. Following this post, the fix is easy:

$ cd /etc/initramfs-tools/
$ sudo rm modules
$ sudo ln -s modules.AfterVMwareToolsInstall modules
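
The same repair can be expressed as a small Python sketch (repoint_if_dangling is a hypothetical helper) that only touches the link if it really dangles and otherwise leaves it alone. Like the shell commands, it needs root when pointed at /etc/initramfs-tools.

```python
from pathlib import Path


def repoint_if_dangling(link: Path, new_target: str) -> bool:
    """If `link` is a symlink whose target no longer exists, replace it with a
    symlink to `new_target` (a relative target resolves against the link's
    directory). Returns True if the link was repaired."""
    if not link.is_symlink() or link.exists():
        return False          # not a symlink, or it still resolves: leave it alone
    link.unlink()             # rm modules
    link.symlink_to(new_target)  # ln -s <new_target> modules
    return True
```

Usage would be repoint_if_dangling(Path("/etc/initramfs-tools/modules"), "modules.AfterVMwareToolsInstall").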

Hope this helps someone.

 

 Posted at 09:07
Jan 30, 2013
 

Today is one of those days when you start with a clear plan of what you want to do, and then end up fixing something entirely different. Since I’ve built most of my virtual machines from a small set of templates, at some point I got bored of keeping them updated manually (and there are a lot of them). So one day I switched them all to automatic updates.

Today was the day they all decided to throw errors. Basically, I had made the boot partition too small, and the automatic update process didn’t remove older kernel images.

It is easy to fix with this one-liner from the command shell (I found it somewhere on the net but have lost track of where), though I still have to find a way to automate it: you first have to make sure you’re running the latest kernel, i.e. reboot.

dpkg -l 'linux-*' | sed '/^ii/!d;/'"$(uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/")"'/d;s/^[^ ]* [^ ]* \([^ ]*\).*/\1/;/[0-9]/!d' | xargs sudo apt-get -y purge
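
The one-liner is dense, so here is what its sed expressions do, re-expressed as a Python sketch that only computes the package list and purges nothing: keep installed (ii) lines, derive the running version from uname -r by stripping the trailing flavour (3.2.0-35-generic becomes 3.2.0-35), drop every line mentioning that version, take the package name, and skip names without digits (meta-packages such as linux-generic). One deliberate difference: the version is matched as a literal string, whereas sed treats its dots as wildcards.

```python
import re


def old_kernel_packages(dpkg_lines, running_release):
    """Return installed linux-* packages that belong to other kernels."""
    # Same substitution as the inner sed: "3.2.0-35-generic" -> "3.2.0-35"
    running_version = re.sub(r"(.*)-([^0-9]+)", r"\1", running_release, count=1)
    packages = []
    for line in dpkg_lines:
        if not line.startswith("ii"):
            continue                      # /^ii/!d : installed packages only
        if running_version in line:
            continue                      # never touch the running kernel
        fields = line.split()
        if len(fields) < 2:
            continue
        name = fields[1]                  # second column is the package name
        if not re.search(r"[0-9]", name):
            continue                      # /[0-9]/!d : skip meta-packages
        packages.append(name)
    return packages
```

Fed with the output of dpkg -l 'linux-*' and os.uname().release, it lets you review the list before handing it to apt-get purge.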
 Posted at 09:00
Aug 08, 2012
 

Right. I do know CVS is old and all, but I like it. I have a ton of stuff on my CVS server, and I have written all kinds of scripts that process that stuff. So no, I won’t switch any time soon.

Having said that, after recently installing Mountain Lion, I found that cvs was no longer working. So here’s what I did (I assume you already have a CVS root and just want to get it working locally on your Mac):

  1. Install Xcode. This breaks any preexisting cvs installation; cvs is now located in /Developer/usr/bin.
  2. Check that you have the following in /etc/services:
     cvspserver 2401/udp # cvspserver
     cvspserver 2401/tcp # cvspserver
  3. Create /etc/xinetd.d/cvspserver with something like:
     service cvspserver
     {
      port = 2401
      socket_type = stream
      protocol = tcp
      user = root
      wait = no
      type = UNLISTED
      server = /Developer/usr/bin/cvs
      server_args = -f --allow-root /pgm/cvs pserver
      disable = no
     }
  4. Make sure the server line points to the cvs executable, and server_args points to your CVS root (/pgm/cvs in my case).
  5. Make xinetd reload its configuration: killall -1 xinetd
  6. Test with (replace mnott with your username):
     export CVSROOT=:pserver:mnott@127.0.0.1:/pgm/cvs
     cvs login
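
Before running cvs login, it can help to check that xinetd is actually listening on the pserver port. A minimal Python sketch (port_open is a hypothetical helper); it only proves that something accepts TCP connections on port 2401, not that it is a working pserver.

```python
import socket


def port_open(host: str = "127.0.0.1", port: int = 2401, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:               # refused, unreachable, or timed out
        return False
```

If port_open() returns False, check the /etc/services entries and whether xinetd has reloaded its configuration.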
 Posted at 09:58
Aug 07, 2012
 

Apple Mail seems to use its own location for downloading attachments that you double-click on. On my machine, this was in

~/Library/Containers/com.apple.mail/Data/Library/Mail\ Downloads

I wanted those to go into

~/Downloads

And since Apple, as a company, keeps even the illiterate in mind, it hides the Library folder; and even if you unhide it in the Finder and create a “Shortcut”, it won’t work. So, to fix that, open Terminal.app and run:

mv ~/Library/Containers/com.apple.mail/Data/Library/Mail\ Downloads/* ~/Downloads
rm -rf ~/Library/Containers/com.apple.mail/Data/Library/Mail\ Downloads
ln -s ~/Downloads ~/Library/Containers/com.apple.mail/Data/Library/Mail\ Downloads
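
The three Terminal commands amount to the following Python sketch (redirect_folder is a hypothetical helper). Unlike rm -rf, it removes the old folder only once it is empty, and it refuses to overwrite anything that already exists in the target folder.

```python
import shutil
from pathlib import Path


def redirect_folder(folder: Path, target: Path) -> None:
    """Move everything in `folder` into `target`, then replace `folder`
    with a symlink to `target` (the mv / rm -rf / ln -s sequence)."""
    if folder.is_symlink():
        return                                   # already redirected earlier
    target.mkdir(parents=True, exist_ok=True)
    if folder.is_dir():
        for item in list(folder.iterdir()):
            dest = target / item.name
            if dest.exists() or dest.is_symlink():
                # a plain mv might overwrite it; stop instead
                raise FileExistsError(f"{dest} already exists")
            shutil.move(str(item), str(dest))    # mv .../Mail Downloads/* ~/Downloads
        folder.rmdir()                           # like rm -rf, but only if now empty
    else:
        folder.parent.mkdir(parents=True, exist_ok=True)
    folder.symlink_to(target, target_is_directory=True)  # ln -s ~/Downloads ...
```

Usage would be redirect_folder(Path.home() / "Library/Containers/com.apple.mail/Data/Library/Mail Downloads", Path.home() / "Downloads"); a second call is a no-op.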

Interestingly, Apple Mail’s preferences pointed to ~/Downloads all along. But that setting is only used when you choose “Save to Downloads Folder.”

Oh well…

 Posted at 13:58
Aug 06, 2012
 

Your Mileage May Vary (YMMV). But on my machine (TM), battery life on my 2010 MBP dropped substantially after installing ML (Mountain Lion). By substantially I mean really, really badly: from 4 hours previously down to about 1.5 hours.

While searching for that problem, I found the program gfxCardStatus, which lets me choose which graphics card is in use. Interestingly, my machine was using the NVIDIA GeForce GT 330M instead of the Intel HD Graphics. With gfxCardStatus, I can switch to Intel HD Graphics only, which is totally sufficient for my day-to-day work. And yes, the projected battery life went back to where it was before.

I have the impression that with ML (it wasn’t the case with Lion, at least not for me), something makes my MBP use the discrete graphics card all the time, for no reason.

Moreover, gfxCardStatus lets me allow the discrete graphics whenever I’m on mains power and use the integrated graphics when on battery.

Actually, I’m going to run on the integrated graphics most of the time anyway: when watching Flash videos, for example, the MBP finally heats up a lot less and stays silent.

I’m no gamer, so I’ve no need for that extra boost of graphics power anyway.

But, as I said, YMMV.

 Posted at 16:21
Dec 02, 2011
 

I’ve had increasing problems with two MacBook Pros, one i5 and one i7, both from 2010. After reading on the net about badly applied thermal paste, I decided to try replacing it. Using this excellent walkthrough from iFixit.com, I was able to eliminate the problems entirely. Monitoring the i7, I can see that the processor temperature dropped from 60 °C to hardly over 35 °C doing the exact same job. The fans are very quiet now. The whole process takes about 30 minutes the second time you do it; the first time may take you about an hour.

So here is the material I used, all from iFixit (the only additional tool was a head lamp with a magnifying glass that I recently bought in a roadside store in Taipei). First of all, I used a Magnetic Project Mat. This really helps keep all those tiny screws in place. Next, I used the Pro Tech Base Toolkit. Those screwdrivers are really excellent: a very good selection of bits, and excellent grip. Don’t even try with your $5 selection of screwdrivers from Home Depot… The spudger from the set comes in handy for disconnecting all those tiny ribbon cables. The anti-static wrist strap should be a no-brainer for you. Then I used the “pro” version of the Probe Set. Its tiny tip served very well to remove any remaining old thermal paste from between those resistors. By the way, there was so much excess paste on the NVIDIA chip that it had run over the resistors (see images below); one effect I had was frequent diagonal flashes across the screen when watching Flash movies. That’s also history now.

Next, I used Arctic Silver ArctiClean to remove the old thermal paste. The easiest way is to use Q-tips, as you can work very precisely with them and thus avoid spreading the old paste everywhere. The removal kit comes with a solvent for the old paste; once you’ve used it, you apply the second component (they are labelled “1” and “2”) to prepare the surface for the new thermal paste. Bigger bits of paste can be removed with one of the tweezers from the Pro Tech Base Toolkit. Following some guides on the net, for the first computer I applied the new thermal paste as a very tiny droplet of Arctic Silver Thermal Paste on the tip of my index finger, which I had wrapped in plastic film. For the second computer, I used Q-tips wrapped in plastic film, which allowed even more precise application.

Since the thermal paste is loaded with silver particles, you want to avoid at all costs spreading it outside the application area. I applied a very thin film both on the chips and on their counterparts on the heat pipes.

Total cost of material used (except the lamp): $141.75.

Here’s the first computer after removing the heat pipes:

As you can clearly see, there is much too much thermal paste; it is everywhere it shouldn’t be, i.e. around the chips but not really on them. No wonder the fans could not remove the heat efficiently!

Here’s the same view, after removing the old thermal paste:

Then, with the new paste applied:

And here are the products used:

The second computer had the exact same problem. This suggests that it is a general problem with MacBook Pros:

Again, after cleaning:

This shows how (and, in particular, how little) to apply:

After application of the thermal paste:

So that’s all for now. While putting everything back together, you can take the opportunity to remove the battery, pry out the white lamp that stupidly keeps flashing when the laptop is in standby mode, and put some black adhesive tape over it. I haven’t documented that, but you’ll figure it out. The lamp sits in front of the battery, so you’ll have to remove the battery first (for which you’ll need to remove the three very special screws; the right bit is part of the Pro Tech Base Toolkit); then you can use one of the metal spudgers from the same toolkit to remove the lamp. It is just glued in place and comes off pretty easily; it can be put back just as easily afterwards.

Hope this helps…

 Posted at 20:03