EvolutionIQ HOWTO: Best Practices for Working Remotely / Working From Home (WFH)

author: Tomas Vykruta

Advantages of Remote Work

Working remotely benefits us individually, and benefits the company too. However, as is the case with all great things in life, it requires learning a set of new skills. Read this document carefully.

The benefits are very real and measurable:

  • Increased productivity – a shared, open floor plan environment is distracting. The most distracting thing at home is the fridge!
  • Increased happiness – working from home on occasion has been shown to make people happier!
  • Decreased commute time – zero commute time is tough to beat!

 

However, as superheroes have taught us throughout the years:

With great power comes great responsibility.

Transparency

When you are working from home, you will need to maintain a higher level of transparency. 

  • Provide thorough updates in daily standups
  • Respond to Slack messages promptly and thoughtfully
  • Read documents sent to you promptly and give feedback quickly

The quality bar for the above should be higher when you are working remotely compared to when you’re in the office. You’ll really appreciate this when your coworkers do it correctly.

Communication Skills

Related to transparency, aim to achieve a higher level of communication when working remotely.

  • Use a headset with a microphone for all meetings. Don’t force your counterpart to have to turn up their speaker and ask you to repeat yourself or speak louder.
  • If you’re hosting a meeting, always come prepared with an agenda
  • If you’re joining a meeting, make sure you’re read up on documents before the meeting.
  • Prefer communicating in public Slack channels; avoid direct messages
  • Sign in 1 minute early for meetings

 

Make Your Presence Felt. Be Extra Social. 

Give people kudos on Slack often. When a teammate shares a cool tip, writes a document or submits a change you care about, let them know. Emojis on Slack are just fine! Each interaction strengthens your relationships. We all want to be loved.

  • If you’re stepping out for lunch or an errand, let everyone know, and set an appropriate /away message in Slack.
  • Say “Good morning everyone” when you start the day, and “signing off! See you tomorrow” when your day ends. 

Dedicated, Quiet Physical and Digital Work Space

When working remotely, set yourself up for success.

  • Set up a dedicated desk with a monitor
  • May seem obvious – but clear the desk of clutter. Treat it like a professional work space, and it will treat you the same way.
  • Turn off the TV, social media tabs
  • Avoid working from distracting places like the beach or a park. Your home work space is more relaxing, quieter, consistent, and has a fridge and bathroom close to it.
  • Treat your digital space the same way: set up a dedicated work profile on your computer. Avoid digital distractions.

Have a Plan

Understand what you want to achieve, what you need to accomplish and how it fits into a larger business model. 

Time Commitment

It’s easy when you are operating remotely to completely lose track of time. It’s much harder to be self disciplined. Do the right thing. Don’t let your teammates down.

  • Set your alarm to wake up on time
  • Keep regular working hours
  • Do take occasional breaks
  • Avoid working overtime; burnout is a real thing

Set Expectations with your Loved Ones

Set expectations with your roommates / spouse / kids about your working hours and distractions. Setting expectations early on is much easier on everyone, and you’ll earn extra respect for taking your career seriously.

Share your agenda and calendar so they know when not to interrupt you.

Take Advantage of the Freedom

If you’re feeling burned out, take advantage of your remote location. Go outside for a solo walk, breathe in the fresh air, enjoy the sunshine and quiet time to think. 

Improve and Give Feedback

Like any skill in life, working remotely requires practice and discipline. There’s always room for improvement. Keep improving, and help your teammates improve too by giving them (sometimes critical) feedback.

Avoid Personal Slack Conversations

Prefer to use public channels, not personal conversations on Slack. If a conversation starts in a personal chat, don’t be afraid to copy and paste it and move it into a public channel.

DeepFraud AI named to CIO Insurance Outlook’s Top 10 of 2019!

A digital transformation is no longer an option for insurers. Changing business strategies and dynamic customer demands make it critical for them to embrace AI, IoT, and use them intelligently across the value chain for superior brand differentiation and profitability. By processing the real-time data collected by sensors through AI, carriers can understand their clients more deeply, resulting in the creation of new product categories, more personalized pricing, and real-time service delivery.

Thanks to AI, smart insurance companies are experiencing a transformation in business strategy from “detect and repair” to “predict and prevent.” Organizations are developing cutting-edge AI-based solutions to transform the insurance industry through audio, image, and video analysis, with which insurers can now automate several processes and settle claims quickly. For instance, an individual with a car insurance policy can file an instant claim after an accident by uploading pictures of the crash via a mobile app. AI-based algorithms, programmed to extract relevant claims-related data from images, can then accurately gauge the extent of the damage and automate the claims assessment process. Auto insurers can also launch a usage-based insurance model in which, if a policyholder drives recklessly, sensors swiftly detect this and transmit the data to the insurer’s IoT platform for a corresponding increase in the premium.

https://artificial-intelligence.insuranceciooutlook.com/vendor/deepfraud-the-onestop-solution-for-fraud-detection-triage-and-investigation-cid-434-mid-54.html

Strategies for Data Imbalance in Fraud Classifiers

For years, fraudsters would file claims and collect money from overstated or completely made up injuries and disabilities.

Experts predict online insurance fraud will soar to a whopping $32 billion in 2020.

To put this into perspective, that amount exceeds the profits posted in 2017 by some household-name, blue-chip companies, such as Coca-Cola ($2 billion), Warren Buffett’s Berkshire Hathaway ($24 billion) and JP Morgan Chase ($23.5 billion).

In addition to the implementation of chip card technology, companies have been investing massive amounts in other technologies for detecting fraudulent transactions.

Would Machine Learning & AI constitute great allies in this battle?

Classification Problems

In Machine Learning, problems like fraud detection are usually framed as classification problems: predicting a discrete class label given a data observation. Other examples of classification problems are Spam Detectors, Recommender Systems and Loan Default Prediction.

For payment fraud detection, the classification problem involves building models that can correctly classify transactions as either legitimate or fraudulent, based on transaction details such as amount, merchant, location, time and others.

Financial fraud still accounts for considerable amounts of money. Hackers and crooks around the world are constantly looking for new ways of committing financial fraud. Relying exclusively on rule-based, conventionally programmed systems for detecting financial fraud would not allow us to react quickly enough. This is where Machine Learning shines as a solution for this type of problem.

The main challenge in modeling fraud detection as a classification problem comes from the fact that in real-world data, the majority of transactions are not fraudulent. Investment in technology for fraud detection has increased over the years, so this shouldn’t be a surprise, but it brings us a problem: imbalanced data.

Imbalanced Data

Imagine that you are a teacher. The school director gives you the task of generating a report with predictions for each of the students’ final year result: pass or fail. You’re supposed to come up with these predictions by analyzing student data from previous years: grades, absences and engagement, together with the final result (the target variable), which could be either pass or fail. You must submit your report in a few minutes.

The “problem” here is that you are a very good teacher. As a result, almost none of your past students has failed your classes. Let’s say that 99% of your students have passed final year exams.

What would you do?

The fastest, most straightforward way to proceed in this case would be predicting that 100% of your students would pass. Accuracy in this case would be 99% when simulating past years. Not bad, right?

Would this “model” be correct and fault proof regardless of characteristics from all your future student populations?

Certainly not. Perhaps you wouldn’t even need a teacher to make these predictions, as anyone could simply guess that the whole class would pass based on data from previous years, and still achieve a good accuracy rate. The bottom line is that this prediction would have no value. And one of the most important missions of a Data Scientist is creating business value out of data.
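The teacher example can be checked with a few lines of Python. This is only a sketch on made-up data (100 hypothetical past students, 99 of whom passed); it shows that the "everyone passes" rule scores 99% accuracy while catching none of the students who fail.

```python
# Hypothetical labels: 1 = pass, 0 = fail; 99 of 100 past students passed.
actual = [1] * 99 + [0] * 1

# The "model" from the text: predict that everyone passes.
predicted = [1] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall for the "fail" class: share of failing students we actually caught.
fails_caught = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fail_recall = fails_caught / actual.count(0)

print(f"accuracy: {accuracy:.2f}")        # accuracy: 0.99
print(f"fail recall: {fail_recall:.2f}")  # fail recall: 0.00
```

High accuracy and zero recall on the rare class is exactly the pattern to watch for in fraud data.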

How would you predict their final year results?

We’ll take a look into a practical case of fraud detection and learn how to overcome the issue with imbalanced data.

Our Data

Our dataset contains transactions made by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly imbalanced, with the positive class (frauds) accounting for 0.172% of all transactions.

First 5 observations from our data, showing the first 10 variables.

It is important to note that, for confidentiality reasons, the data was anonymized: variables were renamed V1, V2, V3 through V28. Moreover, most of it was scaled, except for the Amount and Class variables, the latter being our binary target variable.

It’s always good to do some EDA (Exploratory Data Analysis) before getting our hands dirty with our prediction models and analysis. But since this is a unique case where most variables add no context, as they’ve been anonymized, we’ll skip directly to our problem: dealing with imbalanced data.

Only 0.17% of our data is positively labeled (fraud).

There are many ways of dealing with imbalanced data. We will focus on the following approaches:

  1. Oversampling: SMOTE
  2. Undersampling: RandomUnderSampler
  3. Combined Class Methods: SMOTE + ENN

Approach 1: Oversampling

Oversampling.

One popular way to deal with imbalanced data is by oversampling. To oversample means to artificially create observations in our data set belonging to the class that is underrepresented in our data.

One common technique is SMOTE (Synthetic Minority Over-sampling Technique). At a high level, SMOTE creates synthetic observations of the minority class (in this case, fraudulent transactions). At a lower level, SMOTE performs the following steps:

  • Finding the k-nearest-neighbors for minority class observations (finding similar observations)
  • Randomly choosing one of the k-nearest-neighbors and using it to create a similar, but randomly tweaked, new observation.
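The two steps above can be sketched in plain Python. This is a simplified illustration of the interpolation idea, not the imblearn implementation; the function name `smote_like_sample` and the toy points are made up for the example.

```python
import math
import random

def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point by interpolating between a minority
    observation and one of its k nearest minority neighbours."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Step 1: find the k nearest neighbours of the chosen observation.
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    # Step 2: pick one neighbour and move a random fraction of the way to it.
    neighbour = rng.choice(neighbours)
    gap = rng.random()
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))

# Hypothetical 2-D minority-class (fraud) observations.
fraud_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
synthetic = [
    smote_like_sample(fraud_points, k=2, rng=random.Random(seed))
    for seed in range(5)
]
```

Each synthetic point lies on the line segment between two real minority points, which is why it is "similar, but randomly tweaked".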

There are many SMOTE implementations out there. In our case, we will leverage the SMOTE class from the imblearn library. The imblearn library is a really useful toolbox for dealing with imbalanced data problems.

To learn more about the SMOTE technique, you can check out this link.

Approach 2: Undersampling

Undersampling.

Undersampling works by sampling the dominant class to reduce the number of samples. One simple way of undersampling is randomly selecting a handful of samples from the class that is overrepresented.
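Random undersampling as described here can be sketched in a few lines of standard-library Python. The helper `random_undersample` and the toy data are hypothetical; the sketch keeps every minority row and draws an equally sized random subset of the majority class without replacement.

```python
import random

def random_undersample(rows, labels, majority_label, rng=None):
    """Keep every minority row and a random subset of majority rows
    of the same size (sampling without replacement)."""
    rng = rng or random.Random()
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Hypothetical toy data: 95 legitimate (0) and 5 fraudulent (1) transactions.
labels = [0] * 95 + [1] * 5
rows = [[float(i)] for i in range(100)]
rows_bal, labels_bal = random_undersample(
    rows, labels, majority_label=0, rng=random.Random(42)
)
print(labels_bal.count(0), labels_bal.count(1))  # 5 5
```

The classes end up balanced, but 90 of the 95 legitimate rows are thrown away, which is the information loss that undersampling trades for balance.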

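The RandomUnderSampler class from the imblearn library is a fast and easy way to balance the data by randomly selecting a subset of data for the targeted classes. It works by randomly picking samples from the majority class, with or without replacement. (Cluster-based undersampling, which runs k-means on the majority class and keeps cluster centroids, is a different technique, available in imblearn as ClusterCentroids.)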

Approach 3: Combined Class Methods

Mixing Oversampling and Undersampling

SMOTE can generate noisy samples by interpolating new points between marginal outliers and inliers. This issue can be solved by cleaning the resulting space after over-sampling.

In this regard, we will use SMOTE together with edited nearest-neighbours (ENN). Here, ENN is used as the cleaning method after SMOTE over-sampling to obtain a cleaner space. This is easily achievable with imblearn’s SMOTEENN class.
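To make the cleaning step more concrete, here is a simplified sketch of the ENN rule in plain Python: a point is dropped when its label disagrees with the majority label of its k nearest neighbours. The function and data are hypothetical, and imblearn's own implementation offers more options than this.

```python
import math
from collections import Counter

def edited_nearest_neighbours(points, labels, k=3):
    """Drop every point whose label disagrees with the majority label
    of its k nearest neighbours (a simplified ENN rule)."""
    keep_points, keep_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        majority = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        if majority == y:
            keep_points.append(p)
            keep_labels.append(y)
    return keep_points, keep_labels

# Hypothetical data: two clusters plus one "fraud" point stranded
# inside the legitimate cluster (the kind of noise SMOTE can create).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # legitimate
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),   # fraud
          (0.05, 0.05)]                                     # noisy fraud sample
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
clean_points, clean_labels = edited_nearest_neighbours(points, labels, k=3)
print(clean_labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Only the stranded point is removed; the two well-separated clusters survive intact.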

Initial Results

Our model uses a Random Forest classifier to predict fraudulent transactions. Without doing anything to tackle the issue of imbalanced data, our model was able to achieve 100% precision for the negative class label.

This was expected since we’re dealing with imbalanced data, so for the model it’s easy to notice that predicting everything as negative class will reduce the error.

We have some good results for precision, considering both classes. However, recall is not as good as precision for the positive class (fraud).
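As a reminder of what these two metrics measure, here is a small sketch that computes precision and recall for the positive class from raw predictions. The labels are made up and are not the article's results; they simply reproduce the pattern of high precision with weaker recall.

```python
def precision_recall(actual, predicted, positive=1):
    """Precision and recall for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 1 = fraud, 0 = legitimate.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
precision, recall = precision_recall(actual, predicted)
print(precision, recall)  # 1.0 0.5
```

Every transaction flagged as fraud really was fraud (precision 1.0), yet half of the frauds went undetected (recall 0.5).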

Let’s add one more dimension to our analysis and check the Area Under the Receiver-Operating Characteristic (AUROC) metric. Intuitively, AUROC represents the likelihood of your model distinguishing observations from two classes. In other words, if you randomly select one observation from each class, what’s the probability that your model will be able to “rank” them correctly?
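The ranking interpretation can be computed directly by comparing every positive/negative pair. The sketch below uses made-up scores and a hypothetical `auroc` helper; it is meant to illustrate the definition, not to replace a library routine on large data.

```python
def auroc(scores, labels):
    """Probability that a random positive gets a higher score than a
    random negative; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical model scores (higher = more likely fraud).
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.6, 0.9]
print(auroc(scores, labels))  # 0.875
```

Of the 8 possible fraud/legitimate pairs, the model ranks 7 correctly, giving 7/8 = 0.875.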

Our AUROC score is already pretty decent. Were we able to improve it even further?

So, Have We Won?

After using oversampling, undersampling and combined class approaches for dealing with imbalanced data, we got the following results.

  • SMOTE

By using SMOTE to oversample our data, we got some mixed results. We were able to improve our recall for the positive class by 5%, meaning we reduced false negatives (missed frauds). However, that came at a price: our precision is now 5% worse than before, meaning more false positives. It is common to have a precision-recall trade-off in Machine Learning. In this specific case, it is important to analyze how this would impact us financially.

On one side, we have the money that would be lost to false negatives, that is, frauds the model fails to catch. On the other, false positives could lose us clients by wrongfully classifying their transactions as fraud, and would increase operational costs for cancelling credit cards, printing new ones and posting them to the clients.
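One way to reason about this trade-off is to put a price on each kind of error and compare totals. The sketch below is purely illustrative: every count and cost in it is hypothetical, and real figures would have to come from the business.

```python
def total_cost(false_negatives, false_positives,
               cost_per_missed_fraud, cost_per_false_alarm):
    """Total cost of errors: missed frauds plus wrongly flagged transactions."""
    return (false_negatives * cost_per_missed_fraud
            + false_positives * cost_per_false_alarm)

# All numbers below are hypothetical, chosen only to illustrate the comparison.
COST_PER_MISSED_FRAUD = 500.0  # assumed average loss per undetected fraud
COST_PER_FALSE_ALARM = 25.0    # assumed cost of reissuing a card, support, etc.

baseline = total_cost(20, 5, COST_PER_MISSED_FRAUD, COST_PER_FALSE_ALARM)
resampled = total_cost(15, 12, COST_PER_MISSED_FRAUD, COST_PER_FALSE_ALARM)
print(baseline, resampled)  # 10125.0 7800.0
```

With these assumed costs, accepting more false alarms in exchange for fewer missed frauds is cheaper overall; with a much higher cost per false alarm the conclusion could flip.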

In terms of AUROC, we got a slightly better score:

  • RandomUnderSampler

Undersampling proved to be a bad approach for this problem. While our recall score has improved, precision for the positive class has almost vanished.

The results above show us that it wouldn’t be a good strategy to use undersampling for dealing with our imbalanced data problem.

  • SMOTE + ENN

SMOTE + ENN proved to be the best approach in our scenario. While precision was penalized by 5% like with SMOTE, our recall score was increased by 7%.

As for the AUROC metric, the result was also better:

Recap

In this post, I showed three different approaches to deal with imbalanced data, all of them leveraging the imblearn library:

  1. Oversampling (using SMOTE)
  2. Undersampling (using RandomUnderSampler)
  3. Combined Approach (using SMOTE+ENN)

Key Takeaways

  • Imbalanced data can be a serious problem for building predictive models, as it can hurt our prediction capabilities and mask the fact that our model is not doing well
  • Imblearn provides some great functionality for dealing with imbalanced data
  • Depending on your data, SMOTE, RandomUnderSampler or SMOTE + ENN could be used. Each approach is different, and it is really a matter of understanding which of them makes the most sense for your situation.
  • It is important to consider the trade-off between precision and recall and to decide which of them to prioritize, taking possible business outcomes into account.