Here at Bugfender we obviously think logging is a good idea, so it might be surprising that we want to talk about what not to log. But that’s the topic of this post (along with an alternative to logging sensitive information that can still help with debugging). Some things are simply too sensitive to log (like access tokens), while others are subject to privacy or security rules (like medical information or credit card data, covered in the USA by HIPAA and by the PCI DSS industry standard respectively).
But Isn’t Bugfender Secure?
Of course! We take security very seriously at Bugfender. We carefully encrypt data in transit between the mobile device and Bugfender and when you are using our web app to view logs. Also, all log data is private and can only be viewed by authorized users on your account (though under special circumstances – such as customer support or maintenance – we may view logs if necessary).
But it’s still a good idea to limit where security-critical or private data is stored. Partly this is because it reduces the chance of exposure if a breach happens, and partly because developers or administrators may not think to secure logs as carefully as a password database or other stores of personal information. It’s simply too easy for log data to get pasted into an email or attached to a bug report. If you keep your logs free of sensitive information, they are easier to manage on a daily basis.
Regulatory compliance also gets more complex the more systems you involve. While our on-premise edition includes the technical safeguards for HIPAA compliance, the self-service cloud version does not yet meet all of the non-technical requirements. Further, we operate out of data centers in several locations (currently the US and Europe), which can complicate regulatory compliance. If you have specific security requirements please contact us (and keep reading for a tip that may handle some of your requirements). But in general, it’s simply easier to avoid regulatory issues by not logging sensitive information.
Information You Should Not Log
So what kind of sensitive information are we talking about not logging? Here is a brief list to help you get started.
Plaintext passwords – you should never store plaintext passwords on any system. Not only does it weaken the security of your system, but because users often reuse passwords, it can weaken the security of other systems too.
API Secrets and database passwords – API secrets are used to authenticate your app to your own or 3rd party APIs and should be treated similarly to plaintext passwords (though they need to be stored once so that you can use them). Always remember to use a public version of an API token if possible in your mobile apps (for example, Stripe calls these publishable keys).
Access tokens – any token or cookie that you give to a user to authenticate later sessions generally should not be logged or stored anywhere except your database and on clients that need them.
Personally identifiable information – as much as possible, it’s best to not store information about users other than user ids that you assign them. This is especially true for information such as phone numbers, addresses, and government ID numbers. Also, US regulations such as COPPA sometimes limit the sharing of personal information about children with 3rd parties.
Bank account or payment information – credit card numbers or other payment information should not be logged.
Information users have opted out of collection – this is jurisdiction and app specific, but remember not to log any information that a user has opted out of you collecting. This can include some forms of activity and session information if users have requested to not be tracked.
How Can I Debug If I Can’t Log?
Now, following these recommendations can make debugging a real pain sometimes. What if I need to know if an access token is being stored correctly or identify a user without logging their name or email address?
One solution is to log a hash of the data instead of the data itself. This allows you to verify that the client has the data you think it does without logging the actual data. Instead, you log the output of the hash function, the digest, and that lets you check the validity of the data.
Note that for this to work, you also have to know the data you want to verify as you are reviewing the logs. This isn’t encryption. You have to know the data so that you can independently hash it to compare with the digest logged by the client.
How Hashed Log Entries Work
Let’s say that you have a secret stored on the client (e.g., “some secret”). You don’t want to log it because that would give the secret away. So instead you replace that with the digest of the secret (e.g., “4a4184feb9301962444895ac72e15cc17a73b82b”) and log that instead.
Because you also know the secret, when you are reviewing the log you can independently hash the secret and compare that digest with what was logged. If the digests match, you know that the client stored the secret correctly. If not, well, then you get to go and do some debugging.
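The same round trip can be sketched in a few lines of Python. This is only an illustration: `hash_log_entry` and `matches_logged_digest` are hypothetical names, and plain SHA-256 is an assumption made for the sketch, not necessarily the hash used elsewhere in this post.

```python
import hashlib
import hmac


def hash_log_entry(value: str) -> str:
    """Return the hex SHA-256 digest of a value (hypothetical helper)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def matches_logged_digest(known_value: str, logged_digest: str) -> bool:
    """Hash the value you already know and compare it with the logged digest."""
    expected = hash_log_entry(known_value).encode("ascii")
    logged = logged_digest.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, logged)


# On the client: log the digest, never the secret itself.
logged = hash_log_entry("some secret")

# While reviewing the logs: re-hash the secret you expect and compare.
print(matches_logged_digest("some secret", logged))        # True
print(matches_logged_digest("some other secret", logged))  # False
```

If the comparison fails, the client is holding something other than the value you expected, which is the cue to start debugging.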
In terms of example iOS code, it would look something like this (we’ll see the implementation behind this in a bit):
BFLog(@"Logging started");
// HashLogEntry hashes the input string and returns the digest
NSString *digestOfSecretToBeLogged = HashLogEntry(@"SOME_SECRET");
BFLog(@"I'm logging the digest of a secret - here it is: %@", digestOfSecretToBeLogged);
And the log output looks like this:
2017-03-29 14:57:59.549 BugfenderBlog[9383:403] Logging started
2017-03-29 14:57:59.554 BugfenderBlog[9383:403] I'm logging the digest of a secret - here it is: f2bc123c08c3cc45c6c8843ddcbc3702f1dbabc068d58b903712b26024c5026d:9182162d616ab1e55505adb73439279777e1165c064cac6709d6f7a5cf840043143dc0e3f617b7f49edb239be6e4338cc9f7f97664bbc6ca1f14865cbb228161
You would see the same entries in the Bugfender dashboard.
You can then copy the hashed secret out of the log, and then check it (here using a small Python script):
./check_hash.py "SOME_SECRET" "f2bc123c08c3cc45c6c8843ddcbc3702f1dbabc068d58b903712b26024c5026d:9182162d616ab1e55505adb73439279777e1165c064cac6709d6f7a5cf840043143dc0e3f617b7f49edb239be6e4338cc9f7f97664bbc6ca1f14865cbb228161"
hash matches
The output “hash matches” indicates that the log contained the same secret passed into the script.
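For a rough idea of what such a checker could look like, here is a minimal sketch. It rests on assumptions that are not confirmed anywhere in this post: that the logged value has the form `salt:digest` in hex, and that the digest is SHA-512 computed over the salt followed by the secret. The function names are hypothetical, and because the real scheme may differ, this sketch will not necessarily reproduce the digest shown in the log output above.

```python
import hashlib
import hmac
import os
import sys
from typing import Optional


def make_log_digest(secret: str, salt: Optional[bytes] = None) -> str:
    """Build a 'salt:digest' string (assumed format: hex salt, hex SHA-512)."""
    salt = os.urandom(32) if salt is None else salt
    digest = hashlib.sha512(salt + secret.encode("utf-8")).hexdigest()
    return f"{salt.hex()}:{digest}"


def check_hash(secret: str, logged: str) -> bool:
    """Recompute the digest from the logged salt and compare with the log."""
    salt_hex, _, digest_hex = logged.partition(":")
    try:
        salt = bytes.fromhex(salt_hex)
    except ValueError:
        return False  # the salt part is not valid hex
    expected = hashlib.sha512(salt + secret.encode("utf-8")).hexdigest()
    return hmac.compare_digest(
        expected.encode("ascii"), digest_hex.strip().lower().encode("utf-8")
    )


# Command-line use: ./check_hash.py "<secret>" "<salt:digest from the log>"
if __name__ == "__main__" and len(sys.argv) == 3:
    ok = check_hash(sys.argv[1], sys.argv[2])
    print("hash matches" if ok else "hash does NOT match")
```

A random salt per log entry means the same secret produces a different logged value each time, so two log lines cannot be matched to each other just by comparing digests. The salt has to be logged next to the digest so the reviewer can redo the computation.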
Why Logging the Hash Is (Typically) Safe
The basic idea behind hashing is mapping an input value to an output digest with a hash function. That’s how the hash functions often used in the hash tables, maps, or dictionaries of your favorite programming language work. They let you replace a value – whether that value is a simple number, a long string of text, or even a large amount of binary data – with a fixed size digest (just a long number in hex).
Cryptographic hashes work the same way; they just have some additional properties that are useful in a security context:
- One-way – given the hash of a value it is infeasible to determine the input. “Infeasible” is the understated way cryptographers like to say “it would take a huge data center working for a long time.”
- Collision resistant – it’s infeasible (there’s that word again) to find two different inputs that map to the same hash digest.
- Small changes in the input result in large changes to the digest, making it look like the inputs are not related.
These properties make it hard for an attacker to take a digest and determine what the input was. That’s exactly what we need here: we need to be able to log the output in a way that doesn’t reveal what we’ve logged.
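Two of these properties are easy to see for yourself. The sketch below uses SHA-256 from Python's standard library purely as a convenient example of a cryptographic hash (an assumption of this sketch, not a statement about which hash the sample code uses), and `differing_bits` is a hypothetical helper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two hex digests of equal length."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Fixed size: one byte and one million bytes both give 64 hex characters.
print(len(sha256_hex(b"a")), len(sha256_hex(b"a" * 1_000_000)))  # 64 64

# Small input change, large digest change: only the last letter differs.
d1 = sha256_hex(b"some secret")
d2 = sha256_hex(b"some secres")
print(d1)
print(d2)
print(differing_bits(d1, d2), "of 256 bits differ")
```

The two digests should look unrelated even though the inputs are nearly identical, so a logged digest gives no hint about how close a wrong value was to the right one.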
Now, there is a big caveat. While it is “infeasible” to determine the input from the hash value, if you have some idea what the input might be you can basically guess to see if it’s what you think it is. This is how password cracking works. Because people tend to use similar passwords (“123456” anyone? qwerty? be honest!), software can be written that tries all of the common passwords and their variations against a database of hashed passwords. So don’t use this technique on easily guessable data.
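To make the caveat concrete, here is a small sketch of the guessing attack. The candidate list and the `guess_input` helper are made up for illustration, and unsalted SHA-256 is assumed to keep the example short.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(value: str) -> str:
    """Hex SHA-256 digest of a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def guess_input(logged_digest: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one matching the digest."""
    for candidate in candidates:
        if sha256_hex(candidate) == logged_digest:
            return candidate
    return None


# A tiny stand-in for the word lists that cracking tools ship with.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# A digest of a guessable value is recovered immediately...
print(guess_input(sha256_hex("qwerty"), common_passwords))  # qwerty

# ...while a value that is not on the list is not found.
print(guess_input(sha256_hex("kX9#vT2m-unlikely"), common_passwords))  # None
```

Anything drawn from a small set of possibilities, such as a short PIN, can be recovered this way by trying every possibility, so a digest of that kind of data is not safe to log.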
Password hashing also uses a variety of tricks to make it harder to crack passwords. I’m not going to cover how that works in detail here – feel free to read about it if you are interested.
Sample Code
Below is the sample code that matches the description above. Feel free to use it in any of your projects.