Predicting indoor location using machine learning and wifi information

06/03/2021

After trying out a Python project called whereami, I spent some time trying to replicate it in JavaScript.

The aim of the package is to predict indoor location using wifi information and machine learning.

The user first records some training data for each location; a random forest classifier then uses this data to predict the user's current location.

The package I built, whereami.js, relies on node-wifi and random-forest-classifier.

Recording data

To record data, the node-wifi module scans for nearby networks and returns an array of objects, one per network detected:

networks = [
    {
        ssid: '...',
        bssid: '...',
        mac: '...', // equal to bssid (for retrocompatibility)
        channel: <number>,
        frequency: <number>, // in MHz
        signal_level: <number>, // in dB
        quality: <number>, // same as signal level but in %
        security: 'WPA WPA2', // format depending on locale for open networks in Windows
        security_flags: '...', // encryption protocols (format currently depending on the OS)
        mode: '...' // network mode like Infra (format currently depending on the OS)
    },
    ...
];

For this particular project, I mainly need to keep each network's ssid, bssid, and quality. The farther you are from a router, the lower the quality of the connection, and the closer you are, the higher. Each spot in a home therefore sees a slightly different combination of quality values across the nearby networks, so by gathering the quality of every network detected, we can train a classifier and predict an indoor location.
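As a rough sketch of this step, the function below reduces a scan result to the three fields mentioned above. It assumes the array shape shown earlier; `toSample` is a hypothetical helper name, not part of node-wifi, and the `"ssid bssid"` key format is inferred from the recorded files shown later in this post.

```javascript
// Hypothetical helper: reduce a node-wifi scan result to { "ssid bssid": quality }.
function toSample(networks) {
  const sample = {};
  for (const { ssid, bssid, quality } of networks) {
    // Skip entries without an identifier or a numeric quality.
    if (!bssid || typeof quality !== "number") continue;
    // The bssid keeps access points that share an ssid apart.
    sample[`${ssid} ${bssid}`] = quality;
  }
  return sample;
}

// Example with a hand-written scan result (not real data).
const scan = [
  { ssid: "Fake wifi1", bssid: "14:0c:76:7a:68:90", quality: 12, channel: 6 },
  { ssid: "Fake wifi2", bssid: "48:4a:e9:05:a2:72", quality: 14, channel: 11 },
];
console.log(toSample(scan));
```

Everything else in the scan result (channel, frequency, security) is dropped, since only the quality per network is used as a feature.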

After scanning the networks and keeping the information I need, I save it in a JSON file named after the location. The data in each file looks like this:

// kitchen.json
[
  {
    "Fake wifi1 14:0c:76:7a:68:90": 12,
    "Fake wifi2 48:4a:e9:05:a2:72": 14,
    "Fake wifi3 48:4a:e9:05:a2:71": 14,
    "Fake wifi4 68:a3:78:6d:a3:20": 18,
    "Fake wifi5 00:07:cb:71:54:35": 18,
    "Fake wifi6 cc:2d:1b:61:18:f5": 24,
    "Fake wifi7 e8:1d:a8:0c:62:7c": 40,
    "Fake wifi8 7a:8a:20:b4:f1:28": 58,
    "Fake wifi9 78:8a:20:b4:f1:28": 60,
    "Fake wifi10 e8:1d:a8:0c:62:78": 80,
    "Fake wifi11 e8:1d:a8:0c:5b:c8": 116,
    "Fake wifi12 e8:1d:a8:0c:5b:cc": 102,
  },
  ...
];

For this project, I scan the networks 5 times per location to gather more training data.
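The recording step could look something like the sketch below, which appends one sample per scan to a file named after the location. `appendSample` is a hypothetical helper, and the temporary directory and made-up samples only stand in for real scans; whereami.js may well store its files differently.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: append one sample to <dir>/<location>.json.
function appendSample(dir, location, sample) {
  const file = path.join(dir, `${location}.json`);
  // Start from the existing samples, or an empty list on the first run.
  const samples = fs.existsSync(file)
    ? JSON.parse(fs.readFileSync(file, "utf8"))
    : [];
  samples.push(sample);
  fs.writeFileSync(file, JSON.stringify(samples, null, 2));
  return samples.length;
}

// Usage with a temporary directory and made-up samples standing in for 5 scans.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "whereami-"));
for (let i = 0; i < 5; i++) {
  appendSample(dir, "kitchen", { "Fake wifi1 14:0c:76:7a:68:90": 12 + i });
}
const recorded = JSON.parse(
  fs.readFileSync(path.join(dir, "kitchen.json"), "utf8")
);
console.log(recorded.length);
```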

Once the data is recorded for multiple locations, I use a random forest classifier to predict the current location.

Formatting

Between recording and predicting, the data needs to be formatted.

To improve the accuracy of the predictions, every data object needs to contain the same set of networks, both across all training sessions and in the live data to be predicted.

If a network was present when training one room but wasn't found when training another, this data shouldn't be used.

To clean up the training data, I use the live data as a reference and filter out the networks that aren't present in every object. Once the data is filtered, I sort the keys of each object alphabetically and add a key/value pair to represent the room.

In the end, the training data looks like:

[
  {
    "Fake wifi1 e8:1d:a8:0c:5b:c8": 114,
    "Fake wifi2 e8:1d:a8:0c:5b:cc": 102,
    "Fake wifi3 e8:1d:a8:0c:62:78": 80,
    "Fake wifi4 e8:1d:a8:0c:62:7c": 40,
    "Fake wifi5 cc:2d:1b:61:18:f5": 26,
    "Fake wifi6 48:4a:e9:05:a2:72": 14,
    room: 0,
  },
  ...
  {
    "Fake wifi1 e8:1d:a8:0c:5b:c8": 116,
    "Fake wifi2 e8:1d:a8:0c:5b:cc": 102,
    "Fake wifi3 e8:1d:a8:0c:62:78": 80,
    "Fake wifi4 e8:1d:a8:0c:62:7c": 40,
    "Fake wifi5 cc:2d:1b:61:18:f5": 24,
    "Fake wifi6 48:4a:e9:05:a2:72": 14,
    room: 1,
  },
  ...
  {
    "Fake wifi1 e8:1d:a8:0c:5b:c8": 114,
    "Fake wifi2 e8:1d:a8:0c:5b:cc": 104,
    "Fake wifi3 e8:1d:a8:0c:62:78": 80,
    "Fake wifi4 e8:1d:a8:0c:62:7c": 42,
    "Fake wifi5 cc:2d:1b:61:18:f5": 24,
    "Fake wifi6 48:4a:e9:05:a2:72": 18,
    room: 2,
  },
  ...
];
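The formatting step described above could be sketched as follows. This is only one way to do it, under a few assumptions: `formatData` is a hypothetical name, the input is an array holding one array of samples per room, and the room label is simply that room's index.

```javascript
// Hypothetical helper: keep only networks seen in the live sample and in
// every training sample, sort the keys, and label training rows with a room.
function formatData(samplesByRoom, liveSample) {
  const allSamples = samplesByRoom.flat();
  const shared = Object.keys(liveSample)
    .filter((key) => allSamples.every((sample) => key in sample))
    .sort();

  // Build a new object containing only the shared networks, in sorted order.
  const pick = (sample) =>
    Object.fromEntries(shared.map((key) => [key, sample[key]]));

  const trainingData = samplesByRoom.flatMap((samples, room) =>
    samples.map((sample) => ({ ...pick(sample), room }))
  );
  return { trainingData, formattedLiveData: pick(liveSample) };
}

// Made-up samples: index 0 and 1 stand for two rooms.
const samplesByRoom = [
  [{ "b 02": 40, "a 01": 114, "c 03": 12 }],
  [{ "a 01": 116, "b 02": 42 }],
];
const live = { "b 02": 41, "a 01": 115, "d 04": 9 };
const { trainingData, formattedLiveData } = formatData(samplesByRoom, live);
console.log(trainingData, formattedLiveData);
```

Here "c 03" is dropped because the second room never saw it, and "d 04" is dropped because it only appears in the live data.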

Now that the training data is ready, we can move on to predicting live data.

Predicting

The classifier is a random forest, provided by the random-forest-classifier module. Running the prediction is done with the following code:

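const { RandomForestClassifier } = require("random-forest-classifier");

const rf = new RandomForestClassifier({
  n_estimators: 10, // number of trees in the forest
});

// The index of each name matches the numeric "room" value in the training data.
const classes = ["bedroom", "bathroom", "kitchen"];

rf.fit(trainingData, null, "room", function (err, trees) {
  if (err) throw err;
  const pred = rf.predict([formattedLiveData], trees);
  return classes[pred[0]]; // the room predicted.
});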

The 1st argument of the fit function is the formatted training data. The 2nd argument is the list of features to use; we pass null to use all the features in the training objects. If we wanted to use only some properties, we would pass an array of their names. The 3rd argument is the target feature, here the room we want to predict.

In the callback, we call the predict function, passing the formatted live data and the trained trees. This returns the value of the room predicted, in this case 0, 1 or 2, as there are 3 rooms in the training dataset. Finally, we return the name of the room predicted.
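A value returned from inside a callback doesn't reach the code that called fit, so one possible way to use the prediction elsewhere is to wrap the call in a Promise. The sketch below is an assumption on my part rather than what whereami.js does: `predictRoom` is a hypothetical wrapper, and `fakeClassifier` is a stand-in with the same fit/predict shape as the code above, used only so the example runs without the real module.

```javascript
// Hypothetical wrapper: resolve with the room name instead of returning it
// from inside the callback, where the caller cannot read it.
function predictRoom(rf, trainingData, formattedLiveData, classes) {
  return new Promise((resolve, reject) => {
    rf.fit(trainingData, null, "room", (err, trees) => {
      if (err) return reject(err);
      const pred = rf.predict([formattedLiveData], trees);
      resolve(classes[pred[0]]);
    });
  });
}

// Stand-in classifier for illustration only: it has the same fit/predict
// shape as the code above but always predicts room 2.
const fakeClassifier = {
  fit(data, features, target, callback) {
    callback(null, []);
  },
  predict(samples, trees) {
    return samples.map(() => 2);
  },
};

const result = predictRoom(fakeClassifier, [], {}, [
  "bedroom",
  "bathroom",
  "kitchen",
]);
result.then((room) => console.log(room)); // logs "kitchen" with the stand-in
```

With the real module, the first argument would be the RandomForestClassifier instance created earlier.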

Applications

Ultimately, I would like to be able to use this kind of tool to build IoT projects.

If something similar could run on a mobile phone, which most of us carry around most of the time, indoor location prediction could be used to control appliances or interfaces.

For example, lights could be turned on or off as someone moves around their house, the TV could be paused when you leave the living room, and phone notifications could be muted automatically at night when you're in your bedroom.

At the moment, this cannot run on mobile the way it is built, and the Network Information API available in the browser doesn't give enough information to build something similar on the client side. However, it could run on a Raspberry Pi, so I could build some kind of small wearable to test this.

That's it for now! 🙂