Sunny and cloudy days with Mathematica

I recently had a chat with a friend and fellow data scientist about two fun topics: weather and Wolfram Mathematica. On the topic of Mathematica, I was telling him that I had not used it in a long time. On the topic of weather, we were discussing what makes a day a good day and what makes it a bad day. You can imagine the criteria we came up with. Is it a sunny day or a cloudy day? What color is the sky, gray or blue? And then we started geeking out about how you would write a program that, given a set of pictures, classifies them into nice days and “not so nice” days.

This conversation prompted me to explore Mathematica and see how I could use it to solve this problem. To my surprise, it took just a few minutes and one line of code to come up with an answer. It had been a long time since I last used Mathematica, and it was amazing to see how far it has come in the intervening years. When folks think of data science languages, Python and R readily come to mind, but it was truly a pleasure to see the machine learning features that Mathematica provides out of the box and how easy it is to get started.

In the following paragraphs, I will give you a quick overview of a few of the machine learning features that come with Wolfram Mathematica version 11.3, building up to the finale: the one-liner program for weather classification mentioned above.

Let’s start with a very simple example: language identification. The Mathematica function LanguageIdentify takes a piece of text and identifies which human language it is written in:
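A minimal sketch of how LanguageIdentify can be called. The sample sentences are made up for illustration, and the function is mapped over the list one string at a time to keep the call form as simple as possible:

```wolfram
(* Made-up sample sentences in three languages. *)
samples = {
  "The sky is blue today.",
  "Le ciel est bleu aujourd'hui.",
  "Hoy el cielo es azul."
};

(* Apply LanguageIdentify to each string; each result is a language entity. *)
languages = LanguageIdentify /@ samples
```

Very short or ambiguous strings may be misidentified, so longer samples tend to give more reliable answers.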

 

There is similar functionality to identify images. Yes, you can pass an image as a parameter in Mathematica!
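As a rough sketch, image identification might look like this. The test image comes from the built-in ExampleData collection and is my choice for illustration. The first call may need to download the underlying model:

```wolfram
(* Load a standard test image that ships with the example data collection. *)
img = ExampleData[{"TestImage", "Mandrill"}];

(* Ask for the single most likely identification of what the image shows. *)
best = ImageIdentify[img]

(* Ask for the three most likely identifications across all categories. *)
topThree = ImageIdentify[img, All, 3]
```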

It is very easy to start doing sentiment analysis in Mathematica:
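A minimal sketch using the built-in "Sentiment" classifier. The example sentences are invented for illustration:

```wolfram
(* Classify the sentiment of a single sentence. *)
Classify["Sentiment", "What a beautiful, sunny day!"]

(* Request the probability assigned to each sentiment class instead of just the top label. *)
Classify["Sentiment", "Another gray and gloomy morning.", "Probabilities"]
```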

It can also identify famous people:
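A sketch of what such a call could look like, assuming the classifier in question is the built-in "NotablePerson" classifier. The file name is hypothetical, so substitute a portrait of your own:

```wolfram
(* "portrait.jpg" is a hypothetical file name; point it at a real photo of a well-known person. *)
portrait = Import["portrait.jpg"];

(* Ask the built-in classifier which notable person the image most likely shows. *)
Classify["NotablePerson", portrait]
```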

Other built-in classifiers are:

  • “Language” – Determine the language used for a string
  • “FacebookTopic” – Determine the topic or theme using a list of standard topics used by Facebook
  • “NameGender” – Take an educated guess at the gender associated with a given first name, such as Billy, Rose, or Joe.
  • “Profanity” – Indicate whether a given phrase contains profanity. I didn’t try it with images, but it would be interesting to try out.
  • “Spam” – Classify emails as spam or not based on their content.
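The classifiers in this list are all called the same way: the classifier name first, then the input. Here is a short sketch with made-up inputs:

```wolfram
(* Each call passes a built-in classifier name followed by the text to classify. *)
Classify["Language", "Guten Morgen, wie geht es dir?"]

Classify["FacebookTopic", "The forecast says clear skies and sunshine all weekend."]

(* Map the classifier over several first names, one call per name. *)
Classify["NameGender", #] & /@ {"Billy", "Rose", "Joe"}

Classify["Profanity", "Have a wonderful day!"]

Classify["Spam", "Congratulations, you have won a free prize! Click here to claim it."]
```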

Another pretty nifty function in Mathematica gives an idea of how “near” something is to something else. Here is an example of the function using numbers:
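A minimal sketch, assuming the function meant here is Nearest. The numbers and words are chosen only for illustration:

```wolfram
numbers = {1, 2, 3, 5, 8, 13, 21};

(* The single element closest to 12. *)
closest = Nearest[numbers, 12]          (* {13} *)

(* The three elements closest to 12, ordered from nearest to farthest. *)
closestThree = Nearest[numbers, 12, 3]  (* {13, 8, 5} *)

(* The same call works on strings, where closeness is based on edit distance. *)
closestWord = Nearest[{"sunny", "cloudy", "rainy"}, "sunni"]
```

When several elements are tied for closest, Nearest returns all of them.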

It also works with letters, colors, words, and images, to name a few other supported data types. Let’s take a moment to appreciate how simple Mathematica makes it to apply data science algorithms. If we were implementing this function in Python or R, we would already be knee-deep in talk of nearest-neighbor search, scikit-learn, and so on. I am guessing that Mathematica uses a state-of-the-art algorithm for this, but the details are hidden under the hood and we can just use the function.

This has been a long and winding road to the promised solution to the “sunny or cloudy day” problem. Here it is:
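A minimal sketch of such a call, split over a few lines for readability. All file names are hypothetical, and you would normally want more than three examples per class:

```wolfram
(* Hypothetical file names; replace them with your own labeled photos. *)
sunny = Import /@ {"sunny1.jpg", "sunny2.jpg", "sunny3.jpg"};
cloudy = Import /@ {"cloudy1.jpg", "cloudy2.jpg", "cloudy3.jpg"};

(* Build a list of image -> label rules to use as training data. *)
training = Join[Thread[sunny -> "Sunny"], Thread[cloudy -> "Cloudy"]];

(* Train on the labeled images and classify a previously unseen picture in one call. *)
Classify[training, Import["unseen-landscape.jpg"]]
```

If you plan to classify many pictures, calling Classify with only the training data returns a reusable classifier function that can then be applied to each new image.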

The Classify function takes two arguments here. The first is a mapping from pictures to the labels “Sunny” or “Cloudy”, and the second is a previously unseen, unlabeled landscape picture. The result is the label that Mathematica assigns to that picture.

That was so easy it was a little anticlimactic. But it is simple only because Mathematica is doing a lot of work under the covers. Humans are extremely good at classification. If you showed the images above to a six-year-old child, they would most likely pick up the pattern quickly from those images alone. Even the most advanced convolutional neural networks (CNNs) typically require thousands of images during training to solve the same problem. Once trained, however, they can expand their domain knowledge with just a few more pictures by building on these previous “memories”, an approach known as transfer learning. For example, I could give a CNN a few pictures of me and of a friend and it would be able to recognize and distinguish the two of us, because it already has thousands of pictures of other people in its “memory”. This is probably not unlike the six-year-old child, who has their own “database” of previously seen and classified images.

I don’t know the internal details of the Mathematica implementation, but I suspect that a similar method is being used here. Wolfram offers another product called Wolfram Alpha, an extensive knowledge base covering a wide variety of topics, and it probably includes a database of pictures that supports the function call above.

This article only scratched the surface of Mathematica’s capabilities, but hopefully it motivates you to do some exploring of your own. Feel free to reach out if you have any questions. Happy coding.