Geochemical Facies Analysis notes
August 11, 2018
Some notes on analyzing XRF data using scikit-learn for personal use.
PREPARE THE DATA
The data we are analyzing consists of X-ray fluorescence (XRF) measurements of cuttings from a lateral section of an unconventional well. It is provided as a comma-separated (.csv) file.
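As a rough sketch of the loading step, assuming pandas is used: the column names (Well, Depth, SiO2, Al2O3, Zr) and the three rows below are placeholders made up for illustration, not the real file, and the layout with two non-feature columns first is only a guess.

```python
import io
import pandas as pd

# Placeholder rows standing in for the real file; column names and values
# are invented for illustration only.
csv_text = """Well,Depth,SiO2,Al2O3,Zr
A-1H,10000,55.2,12.1,140
A-1H,10030,61.8,9.4,165
A-1H,10060,48.9,14.7,120
"""

# With the real file this would be pd.read_csv("<path to the .csv file>").
geochem_df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks before any modelling: column types and missing values.
print(geochem_df.dtypes)
print(geochem_df.isna().sum())
```

Checking dtypes early is worthwhile because a stray text entry in a numeric column makes pandas read the whole column as strings.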
One of the most important transformations to apply to the data is feature scaling. Many machine learning algorithms perform poorly when the features have very different scales. Note that depth increases monotonically along the well.
from sklearn.preprocessing import scale
data = geochem_df.iloc[:, 2:]  # dataframe op
data = scale(data)
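To see what scaling does, here is a small check on made-up numbers (one column in the range of a major oxide, one in the range of a trace element). After scale(), every column has mean 0 and standard deviation 1. StandardScaler is shown as an alternative, not as what these notes use: it gives the same result but keeps the fitted mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

# Made-up measurements: two columns on very different scales.
X = np.array([[55.2, 140.0],
              [61.8, 165.0],
              [48.9, 120.0],
              [58.3, 150.0]])

X_scaled = scale(X)  # same call as in the notes

# StandardScaler stores the per-column mean and standard deviation, so the
# identical transform can be applied to new samples later.
scaler = StandardScaler().fit(X)
X_scaled2 = scaler.transform(X)

col_means = X_scaled.mean(axis=0)  # approximately 0 for every column
col_stds = X_scaled.std(axis=0)    # approximately 1 for every column
print(col_means, col_stds)
```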
In geochemistry, certain element ratios can be used to glean information about physical, chemical or biological effects. In this example, the three ratios we focus on are Si/Zr, Si/Al, and Zr/Al.
geochem_df['Si/Zr'] = geochem_df['SiO2'] / geochem_df['Zr']
geochem_df['Si/Al'] = geochem_df['SiO2'] / geochem_df['Al2O3']
geochem_df['Zr/Al'] = geochem_df['Zr'] / geochem_df['Al2O3']
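One thing to watch with ratios is a zero denominator, which pandas turns into inf and which would then upset scaling and clustering. A minimal sketch of one way to guard against it follows; the helper safe_ratio is hypothetical (not part of pandas) and the values are made up, with the zero in Zr standing in for a below-detection-limit reading.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "SiO2": [55.2, 61.8, 48.9],
    "Al2O3": [12.1, 9.4, 14.7],
    "Zr": [140.0, 0.0, 120.0],
})

def safe_ratio(num, den):
    """Element-wise num / den, with NaN wherever the denominator is zero."""
    return num / den.replace(0, np.nan)

df["Si/Zr"] = safe_ratio(df["SiO2"], df["Zr"])
df["Si/Al"] = safe_ratio(df["SiO2"], df["Al2O3"])
df["Zr/Al"] = safe_ratio(df["Zr"], df["Al2O3"])
print(df)
```

Rows with NaN ratios can then be dropped or imputed explicitly. Also note that ratio columns only end up in the scaled array if they are added to the dataframe before the features are extracted and scaled.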
Next, determine the most influential factors and reduce the number of features to only those that contribute the most. Some features are highly correlated with one another, while others carry little useful information for machine learning.
- Extract and scale data.
- Determine the number of factors using Principal Component Analysis / Common Factor Analysis.
- Determine the number of clusters, k.
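The three steps above can be sketched end to end as follows. This is only an illustration on synthetic data, with several assumptions that are not from these notes: the 90% explained-variance cutoff for choosing the number of components, k-means as the clustering method, and the silhouette score as the criterion for k. It covers the PCA route only, not common factor analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import scale

# Synthetic stand-in for the XRF table: 3 groups of samples, 6 "elements".
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 6))
X = np.vstack([c + rng.normal(size=(50, 6)) for c in centers])

# 1. Extract and scale.
X_scaled = scale(X)

# 2. Keep the fewest principal components that explain at least 90% of the
#    variance (the 90% threshold is an arbitrary choice for this sketch).
pca = PCA().fit(X_scaled)
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cum_var, 0.90) + 1)
scores = PCA(n_components=n_components).fit_transform(X_scaled)

# 3. Try several values of k and keep the one with the highest silhouette.
sil = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scores)
    sil[k] = silhouette_score(scores, labels)
best_k = max(sil, key=sil.get)

print("components kept:", n_components)
print("silhouette by k:", sil)
print("chosen k:", best_k)
```

An elbow plot of KMeans inertia_ against k is a common alternative to the silhouette score. On real data the two do not always agree, so it is worth looking at both.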