Friday, 29 August 2025

Python AI & ML

 Machine Learning in Python
-----------------------------------
-In Machine Learning it is common to work with very large data sets.
-Machine Learning is about analyzing data and predicting the outcome.
-Python supports machine learning through powerful libraries such as Scikit-learn, TensorFlow, and Keras.

1) Scikit-Learn: A simple and efficient tool for data mining and data analysis. (Install: pip install -U scikit-learn)
2) TensorFlow: An open-source library for numerical computation and large-scale machine learning.
3) Keras: A high-level neural networks API, written in Python and capable of running on top of TensorFlow.
4) PyTorch: An open-source machine learning library based on the Torch library.

Basic Workflow:
---------------
-Here’s a typical workflow for a machine learning project in Python (a small end-to-end sketch follows the list):

1) Data Collection: Gather the data you need for your model.
2) Data Preprocessing: Clean and prepare your data for analysis.
3) Model Selection: Choose the appropriate machine learning model.
4) Training: Train your model using your dataset.
5) Evaluation: Evaluate your model’s performance.
6) Prediction: Use the model to make predictions on new data.
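
A minimal end-to-end sketch of these steps; the Iris dataset and the decision tree model are just placeholders chosen to illustrate the flow:

Ex: end-to-end workflow sketch
---
from sklearn.datasets import load_iris                     # 1) Data Collection
from sklearn.model_selection import train_test_split       # used in 2) Data Preprocessing
from sklearn.tree import DecisionTreeClassifier            # 3) Model Selection
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                                 # 4) Training

y_pred = model.predict(X_test)                              # 6) Prediction on unseen data
print("Accuracy:", accuracy_score(y_test, y_pred))          # 5) Evaluation
---------------------------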

Data Types
-----------
-To analyze data, it is important to know what type of data we are dealing with.
-We can split the data types into three main categories (a small encoding sketch follows the descriptions below):

1) Numerical
2) Categorical
3) Ordinal

1) Numerical:
 ---------------
 - Numerical data are numbers, and can be split into two numerical categories:

a) Discrete Data: counted data that are limited to integers. Example: the number of cars passing by.
b) Continuous Data: measured data that can be any number. Example: the price of an item, or the size of an item.

2) Categorical
------------------
-Categorical data are values that cannot be measured up against each other. Example: a color value, or any yes/no values.

3) Ordinal
-------------
-Ordinal data are like categorical data, but can be measured up against each other. Example: school grades where A is better than B and so on.
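
The type of a column decides how it is fed to a model: numerical values can be used directly, while categorical and ordinal values must first be encoded as numbers. A minimal sketch (assumes scikit-learn 1.2+, where OneHotEncoder takes the sparse_output parameter; the color and grade values are made up for illustration):

Ex: encoding categorical and ordinal data
---
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Categorical: no natural order, so one-hot encode
colors = [["red"], ["green"], ["blue"], ["green"]]
onehot = OneHotEncoder(sparse_output=False)
print(onehot.fit_transform(colors))

# Ordinal: has a natural order (C < B < A), so map to ordered integers
grades = [["C"], ["A"], ["B"], ["A"]]
ordinal = OrdinalEncoder(categories=[["C", "B", "A"]])
print(ordinal.fit_transform(grades))
---------------------------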
============================================
In Machine Learning (and in mathematics) there are a few values that often interest us:

1) Mean - The average value
2) Median - The mid point value
3) Mode - The most common value
4) std - Standard deviation
5) Percentile - A percentile gives the value below which a given percentage of the values fall.

Ex1:
---
import numpy
from scipy import stats

speed = [45,55,67,89,90]

x1 = numpy.mean(speed)      # mean - the average value
print(x1)

x2 = numpy.median(speed)    # median - the mid point value
print(x2)

x3 = stats.mode(speed)      # mode - the most common value
print(x3)

x4 = numpy.std(speed)       # std - the standard deviation
print(x4)
---------------------------
Ex2:
---
import numpy
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
x = numpy.percentile(ages, 75) # What is the 75th percentile? The answer is 43, meaning that 75% of the people are 43 or younger.
print(x)
---------------------------
Data Distribution
-----------------
1. uniform - a random array of a given size, with values between two given bounds.
2. normal - an array where the values are concentrated around a given mean value.

Ex1: uniform
---
import numpy
x = numpy.random.uniform(0.0, 5.0, 250)
print(x)

Ex2: uniform
---
import numpy
import matplotlib.pyplot as plt
x = numpy.random.uniform(0.0, 5.0, 250)
plt.hist(x, 5)
plt.show()

Ex3: uniform
---
import numpy
import matplotlib.pyplot as plt
x = numpy.random.uniform(0.0, 5.0, 100000)
plt.hist(x, 100)
plt.show()
---------------------------
Ex1: normal
---
import numpy
x = numpy.random.normal(5.0, 1.0, 100000)
print(x)

Ex2: normal
---
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(5.0, 1.0, 100000)
plt.hist(x, 100)
plt.show()

Ex3: normal using Scatterplot
---
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)
plt.scatter(x, y)
plt.show()
==============================
1. Scikit-Learn (Sklearn)
 ------------------------
-Scikit-learn (Sklearn) is the most useful and robust library for machine learning in Python.
-It provides a selection of efficient tools for machine learning and statistical modeling, via a consistent interface in Python, including:
1. Classification (predicting categorical labels: logistic regression, decision trees, random forests, support vector machines (SVMs), gradient boosting)
2. Regression (predicting continuous outputs)
3. Clustering (grouping data points into similar clusters)
4. Dimensionality reduction (reducing the number of features in your data)

Ex: Classification - Logistic Regression Algorithm
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardizing features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Training the logistic regression model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Making predictions on the testing set
y_pred = log_reg.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
--------------------------------
Ex: Classification - KNN Classifier Algorithm

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Initialize the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Make predictions on the test data
predictions = knn.predict(X_test)

# Evaluate the model
accuracy = knn.score(X_test, y_test)
print("Accuracy:", accuracy)
-------------------------------
Ex: Linear Regression Algorithm
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the California Housing dataset
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)

# Initialize the Linear Regression model
lr = LinearRegression()

# Train the model
lr.fit(X_train, y_train)

# Make predictions on the test data
predictions = lr.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
---------------------------------------
Ex: Clustering - KMeans Algorithm

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

# Load the Iris dataset
iris = load_iris()

# Initialize the KMeans clustering model
kmeans = KMeans(n_clusters=3)

# Fit the model to the data
kmeans.fit(iris.data)

# Get the cluster labels
cluster_labels = kmeans.labels_

print("Cluster Labels:", cluster_labels)
--------------------------------------
Ex: Dimensionality Reduction
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Load the digits dataset
digits = load_digits()

# Initialize PCA for dimensionality reduction
pca = PCA(n_components=2)

# Apply PCA to the data
reduced_data = pca.fit_transform(digits.data)

print("Original data shape:", digits.data.shape)
print("Reduced data shape:", reduced_data.shape)
---------------------------------------
***********************************************
TensorFlow python
----------------
- TensorFlow is a Google product and one of the most widely used deep learning tools for machine learning and deep neural network research.
- TensorFlow is basically a software library for numerical computation using data flow graphs, where:

1) nodes in the graph represent mathematical operations.
2) edges in the graph represent the multidimensional data arrays (called tensors) communicated between them.

Variables:
---------
-TensorFlow also has Variable nodes, which can hold mutable data.
-They are mainly used to hold and update the parameters of a training model (see ex2 below).
-Variables are in-memory buffers containing tensors.

ex1:
---
import tensorflow as tf
a = tf.constant(10)
b = tf.constant(32)
print(a + b)   # with TF 2.x eager execution, this prints the resulting tensor (value 42)
-------------------
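ex2: Variables (a minimal sketch, assuming TensorFlow 2.x with eager execution)
---
import tensorflow as tf

# Variables hold state that can be updated, e.g. the parameters of a model
w = tf.Variable(3.0)
b = tf.Variable(1.0)
x = tf.constant(2.0)

print(w * x + b)    # 3*2 + 1 = 7

w.assign_add(1.0)   # update the variable in place, as an optimizer would during training
print(w * x + b)    # 4*2 + 1 = 9
-------------------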
===============================================
Speech recognition using AI with Python
---------------------------------------
#ex1 : Analyze the audio file
import numpy as np
from scipy.io import wavfile

frequency_sampling, audio_signal = wavfile.read("1.wav")
print('\nSignal shape:', audio_signal.shape)
print('Signal Datatype:', audio_signal.dtype)
print('Signal duration:', round(audio_signal.shape[0] / float(frequency_sampling), 2), 'seconds')
---------------
#ex2: 
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

frequency_sampling, audio_signal = wavfile.read("1.wav")
audio_signal = audio_signal / np.power(2, 15)   # normalize, assuming 16-bit samples
signal = audio_signal[:100]                     # take the first 100 samples
time_axis = 1000 * np.arange(0, len(signal), 1) / float(frequency_sampling)
plt.plot(time_axis, signal, color='blue')
plt.xlabel('Time (milliseconds)')
plt.ylabel('Amplitude')
plt.title('Input audio signal')
plt.show()
------------------
NLTK (Natural Language Toolkit Package)
---------------------------------------
-The Natural Language Toolkit (NLTK) is a Python library for working with human language text.
-It is used for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

#ex1: download
import nltk

# Download essential datasets and models
nltk.download('punkt')  # Tokenizers for sentence and word tokenization
nltk.download('stopwords')  # List of common stop words
nltk.download('wordnet')  # WordNet lexical database for lemmatization
nltk.download('averaged_perceptron_tagger_eng')  # Part-of-speech tagger
nltk.download('maxent_ne_chunker_tab')  # Named Entity Recognition model
nltk.download('words')  # Word corpus for NER
nltk.download('punkt_tab')
------------------------------
Tokenization:
-------------
-Tokenization is one of the common preprocessing tasks. 
-It involves splitting text into smaller units called tokens.
-These tokens can be words, sentences, or even sub-word units, depending on the task.

#ex2 :  Word Tokenize
from nltk.tokenize import word_tokenize
sentence = "NLTK makes natural language processing easy."
tokens = word_tokenize(sentence)
print(tokens)
-------------------------
#ex3 : Sentence Tokenize
from nltk.tokenize import sent_tokenize
text = "Natural Language Processing (NLP) is cool! Let's explore it."

# Sentence Tokenization - splits the text into sentences
sentences = sent_tokenize(text)
print("Sentences:", sentences)
---------------------
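#Stop word removal filters very common words out of the tokens (a minimal sketch; uses the 'stopwords' corpus downloaded in ex1).
#ex : Stop word removal
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

text = "This is an example showing how to remove common stop words."
tokens = word_tokenize(text)
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]
print(filtered)
---------------------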
#Stemming and lemmatization are techniques used to reduce words to their base or root form.
#ex4
from nltk.stem import PorterStemmer, WordNetLemmatizer

word = "running"
stemmer = PorterStemmer()
stemmed_word = stemmer.stem(word)

lemmatizer = WordNetLemmatizer()
lemmatized_word = lemmatizer.lemmatize(word)

print("Stemmed Word:", stemmed_word)
print("Lemmatized Word:", lemmatized_word)
----------------------------------------
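#Part-of-speech (POS) tagging labels each token with its grammatical role; the chunking example below expects exactly such (word, tag) pairs.
#ex : POS tagging (a minimal sketch; uses the tagger resource downloaded in ex1)
import nltk
from nltk.tokenize import word_tokenize

sentence = "A clever fox was jumping over the wall."
tokens = word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)   # list of (word, tag) pairs, e.g. ('fox', 'NN')
print(tagged)
----------------------------------------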
#Chunking is the process of grouping words together based on their part-of-speech tags.
#Parsing is the process of analyzing the grammatical structure of a sentence.
#ex5
import nltk
sentence=[("a","DT"),("clever","JJ"),("fox","NN"),("was","VBP"),
          ("jumping","VBP"),("over","IN"),("the","DT"),("wall","NN")]
grammar = "NP:{<DT>?<JJ>*<NN>}"
parser_chunking = nltk.RegexpParser(grammar)
Output_chunk = parser_chunking.parse(sentence)
Output_chunk.draw()
------------------------
#chatbot
import nltk
import re

def chatbot():
    while True:
        user_input = input("User: ")
        user_input = user_input.lower()
        user_input = re.sub(r'[^\w\s]', '', user_input)
        tokens = nltk.word_tokenize(user_input)

        if 'hello' in tokens:
            print("Chatbot: Hi there!")
        elif 'bye' in tokens:
            print("Chatbot: Goodbye!")
            break
        else:
            print("Chatbot: Sorry, I didn't understand.")

chatbot()


