Archive

Posts Tagged ‘C/C++’

Fast memory copy (SSE4)

January 29th, 2011

Visual Studio 2008 supports the “Enable Intrinsic Functions” option (Project Settings -> C/C++ -> Optimization). Note that this option can increase code size.

Note also that the CRT implementation of memcpy is already written using SSE2 (movdqa).

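// Copies SizeInBytes bytes from piSrc to piDst using SSE4.1 streaming loads.
// Requirements: MSVC 32-bit x86 build (inline _asm), an SSE4.1-capable CPU,
// both buffers 16-byte aligned (MOVNTDQA/MOVDQA fault otherwise) and
// SizeInBytes a multiple of 64 (one cache line is copied per iteration).
int CopyMemSSE4(int* piDst, const int* piSrc, unsigned long SizeInBytes)
{
    // The loop body always runs at least once, so guard against an empty copy
    if (SizeInBytes == 0)
        return 0;

    _asm
    {
        // Initialize pointers to start of the USWC memory
        mov esi, piSrc
        mov edx, piSrc

        // Initialize pointer to end of the USWC memory
        add edx, SizeInBytes

        // Initialize pointer to start of the cacheable WB buffer
        mov edi, piDst

        // Start of Bulk Load loop
    inner_start:
        // Load data from USWC memory using streaming loads (SSE4.1).
        // On ordinary write-back memory the non-temporal hint may be ignored.
        MOVNTDQA xmm0, xmmword ptr [esi]
        MOVNTDQA xmm1, xmmword ptr [esi+16]
        MOVNTDQA xmm2, xmmword ptr [esi+32]
        MOVNTDQA xmm3, xmmword ptr [esi+48]

        // Copy data to the (aligned) destination buffer
        MOVDQA xmmword ptr [edi], xmm0
        MOVDQA xmmword ptr [edi+16], xmm1
        MOVDQA xmmword ptr [edi+32], xmm2
        MOVDQA xmmword ptr [edi+48], xmm3

        // Increment pointers by cache line size and test for end of loop
        add esi, 040h
        add edi, 040h
        cmp esi, edx
        jb inner_start
        // End of Bulk Load loop
    }

    return 0;
}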

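#include <malloc.h>   // _aligned_malloc, _aligned_free (MSVC CRT)
#include <string.h>   // memset

#define DATA_SIZE 0x01000000   // number of ints to copy (64 MB)

int main(int argc, char* argv[])
{
    unsigned long dwDataSizeInBytes = sizeof(int) * DATA_SIZE;

    // Align both buffers to a 64-byte cache line; this also satisfies the
    // 16-byte alignment that MOVNTDQA and MOVDQA require
    int *piSrc = (int *)_aligned_malloc(dwDataSizeInBytes, 64);
    int *piDst = (int *)_aligned_malloc(dwDataSizeInBytes, 64);
    if (piSrc == NULL || piDst == NULL)
    {
        _aligned_free(piSrc);   // _aligned_free(NULL) does nothing
        _aligned_free(piDst);
        return 1;
    }

    memset(piSrc, 255, dwDataSizeInBytes);
    memset(piDst, 0, dwDataSizeInBytes);

    CopyMemSSE4(piDst, piSrc, dwDataSizeInBytes);

    _aligned_free(piSrc);
    _aligned_free(piDst);
    return 0;
}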
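For comparison, here is a sketch of the same 64-bytes-per-iteration loop written with compiler intrinsics instead of inline assembly (MSVC does not accept _asm blocks in x64 builds). It assumes <smmintrin.h> and _mm_stream_load_si128, the intrinsic for MOVNTDQA; the function name CopyMemStreamLoad and the HAVE_SSE41 macro are hypothetical names chosen here, and when SSE4.1 is not enabled at compile time the sketch simply falls back to memcpy. The same alignment and size requirements as the assembly version apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__SSE4_1__) || (defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86)))
#include <smmintrin.h>   // SSE4.1: _mm_stream_load_si128 (also pulls in SSE2)
#define HAVE_SSE41 1     // hypothetical helper macro for this sketch
#endif

// Hypothetical helper: copies `bytes` bytes from src to dst.
// Assumes both pointers are 16-byte aligned and `bytes` is a multiple of 64.
void CopyMemStreamLoad(void* dst, const void* src, std::size_t bytes)
{
#ifdef HAVE_SSE41
    // Older headers declare the intrinsic's parameter as non-const __m128i*
    __m128i* s = static_cast<__m128i*>(const_cast<void*>(src));
    __m128i* d = static_cast<__m128i*>(dst);
    const std::size_t blocks = bytes / 16;          // number of 16-byte blocks

    for (std::size_t i = 0; i < blocks; i += 4)     // one 64-byte cache line per pass
    {
        // Streaming (non-temporal) loads, as MOVNTDQA in the asm version
        __m128i x0 = _mm_stream_load_si128(s + i);
        __m128i x1 = _mm_stream_load_si128(s + i + 1);
        __m128i x2 = _mm_stream_load_si128(s + i + 2);
        __m128i x3 = _mm_stream_load_si128(s + i + 3);

        // Ordinary aligned stores, as MOVDQA in the asm version
        _mm_store_si128(d + i,     x0);
        _mm_store_si128(d + i + 1, x1);
        _mm_store_si128(d + i + 2, x2);
        _mm_store_si128(d + i + 3, x3);
    }
#else
    std::memcpy(dst, src, bytes);                   // portable fallback
#endif
}
```

With GCC or Clang, compile with -msse4.1 to get the streaming-load path instead of the memcpy fallback.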


integer types with specified widths

January 27th, 2011

stdint.h is a header file in the C standard library, introduced in the C99 standard (section 7.18), that allows programmers to write more portable code. It provides a set of typedefs that specify exact-width integer types, together with macros that define the minimum and maximum allowable values for each type. This header is particularly useful for embedded programming, which often involves considerable manipulation of hardware-specific I/O registers requiring integer data of fixed widths, specific locations and exact alignments. The header is named stdint.h in C; C++ provides both stdint.h and cstdint.

stdint.h defines, among others:

int8_t
int16_t
int32_t
uint8_t
uint16_t
uint32_t

stdint.h is not shipped with some older C++ compilers, nor with Visual C++ versions prior to Visual Studio 2010.

Wikipedia
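A minimal sketch of what the exact-width types and limit macros provide, written in C++ with <cstdint> (the C header stdint.h offers the same names without the std:: prefix). The register layout (MODE in bits 0..3, ENABLE in bit 7) and the helper names SetMode and Enable are hypothetical, chosen only to illustrate fixed-width register manipulation.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Exact-width types have exactly N bits, so these checks hold on any platform
static_assert(sizeof(std::uint8_t)  * CHAR_BIT == 8,  "uint8_t is 8 bits");
static_assert(sizeof(std::int16_t)  * CHAR_BIT == 16, "int16_t is 16 bits");
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");

// Limit macros come from the same header
static_assert(INT16_MAX == 32767, "int16_t maximum");
static_assert(UINT8_MAX == 255,   "uint8_t maximum");

// Hypothetical 32-bit device register (illustration only):
//   bits 0..3 = MODE field, bit 7 = ENABLE flag
const std::uint32_t kModeMask  = 0x0Fu;
const std::uint32_t kEnableBit = UINT32_C(1) << 7;

// Replaces the MODE field, leaving all other bits untouched
std::uint32_t SetMode(std::uint32_t reg, std::uint32_t mode)
{
    return (reg & ~kModeMask) | (mode & kModeMask);
}

// Sets the ENABLE flag
std::uint32_t Enable(std::uint32_t reg)
{
    return reg | kEnableBit;
}
```

Because std::uint32_t is exactly 32 bits wide, masks such as ~kModeMask behave identically on every compiler, which is the portability benefit described above.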

Low Level Virtual Machine (LLVM)

October 6th, 2010

The Low Level Virtual Machine (LLVM) is a compiler infrastructure, written in C++, that is designed for compile-time, link-time, run-time, and “idle-time” optimization of programs written in arbitrary programming languages. Although LLVM was originally implemented for C and C++, its language-independent design (and its success) has since spawned a wide variety of front ends, including Objective-C, Fortran, Ada, Haskell, Java bytecode, Python, Ruby, ActionScript, GLSL, and others.

LLVM can provide the middle layers of a complete compiler system, taking intermediate form (IF) code from a compiler and emitting an optimized IF that can then be converted and linked into machine-dependent assembly code for a target platform. LLVM can accept the IF from the GCC toolchain, allowing it to be used with a wide array of existing compilers written for that project.

LLVM can also generate relocatable machine code at compile-time or link-time or even binary machine code at run-time.

LLVM supports a language-independent instruction set and type system. Each instruction is in static single assignment form (SSA), meaning that each variable (called a typed register) is assigned once and is frozen. This helps simplify the analysis of dependencies among variables. LLVM allows code to be compiled statically, as it is under the traditional GCC system, or left for late-compiling from the IF to machine code in a just-in-time compiler (JIT) in a fashion similar to Java. The type system consists of basic types such as integers or floats and five derived types: pointers, arrays, vectors, structures, and functions. A type construct in a concrete language can be represented by combining these basic types in LLVM. For example, a class in C++ can be represented by a combination of structures, functions and arrays of function pointers.
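To make the SSA idea concrete, here is a small sketch: the same computation written once with a reassigned variable and once in SSA style, where every value is defined exactly once. The comment shows roughly how the SSA version could look in LLVM's textual IF; it is hand-written for illustration, not actual compiler output, and the function names are hypothetical.

```cpp
#include <cassert>

// Ordinary code: the variable x is assigned three times
int Original(int a, int b)
{
    int x = a + b;
    x = x * 2;
    x = x - a;
    return x;
}

// SSA style: each value gets a fresh name and is never reassigned.
// Hand-written LLVM IR for the same function might look like:
//   define i32 @SsaStyle(i32 %a, i32 %b) {
//     %x1 = add i32 %a, %b
//     %x2 = mul i32 %x1, 2
//     %x3 = sub i32 %x2, %a
//     ret i32 %x3
//   }
int SsaStyle(int a, int b)
{
    const int x1 = a + b;
    const int x2 = x1 * 2;
    const int x3 = x2 - a;
    return x3;
}
```

Since x1, x2 and x3 each have a single definition, finding what any use depends on is a direct lookup, which is why SSA simplifies dependency analysis.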


Standard Template Library (STL) lectures

September 10th, 2010

In the following series, learn all about the STL from the great Stephan T. Lavavej, Microsoft’s keeper of the STL cloth (this means he manages the partnership between Microsoft and the owners of the STL, including, of course, bug fixes and enhancements to the STL that ships as part of Visual C++).

  • Part 1 (sequence containers)
  • Part 2 (associative containers)
  • Part 3 (smart pointers)
  • Part 4 (an extended example of using the STL to solve Nurikabe puzzles)
  • etc.

Alexander A. Stepanov

September 6th, 2010

Alexander Alexandrovich Stepanov (Russian: Александр Александрович Степанов) (born November 16, 1950 in Moscow) is the primary designer and implementer of the C++ Standard Template Library [1], which he started to develop around 1992 while employed at HP Labs. He had earlier worked at Bell Labs, close to Andrew Koenig, and tried to convince Bjarne Stroustrup to introduce something like Ada generics into C++.

Lecture “Greatest Common Measure: The Last 2500 Years” (part 1 and part 2)
Slides: English and Russian.

Lecture “Transformations and Their Orbits” (part 1 and part 2)

Elements of Programming (November 3, 2010): Alexander Stepanov and Paul McJones give a presentation on their book “Elements of Programming”. They explain why they wrote it and what the book attempts to convey. They describe programming as a mathematical discipline and argue that this view is extremely useful and should not be overlooked.

Stepanov’s homepage

97 Things Every Programmer Should Know

August 18th, 2010

Get 97 short and extremely useful tips from some of the most experienced and respected practitioners in the industry, including Uncle Bob Martin, Scott Meyers, Dan North, Linda Rising, Udi Dahan, Neal Ford, and many more. They encourage you to stretch yourself by learning new languages, looking at problems in new ways, following specific practices, taking responsibility for your work, and becoming as good at the entire craft of programming as you possibly can.

O’Reilly homepage

There is also the 97 Things Every Programmer Should Know project: pearls of wisdom for programmers collected from leading practitioners. You can read through the contributions appearing in the book.

Russian translation of these tips.

Levenshtein distance

July 28th, 2010

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The term edit distance is often used to refer specifically to Levenshtein distance.

The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, where the allowable edit operations are insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965. (Wikipedia article with code; Russian version)
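The definition above leads directly to the standard dynamic-programming computation (often called the Wagner-Fischer algorithm). A minimal sketch follows; it keeps only two rows of the (m+1) x (n+1) table, treats strings as sequences of bytes (so a multi-byte UTF-8 character counts as several edits), and the function name LevenshteinDistance is our own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns the minimum number of single-character insertions, deletions
// and substitutions needed to turn a into b.
std::size_t LevenshteinDistance(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);

    // Row 0: turning "" into the first j characters of b takes j insertions
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        // Column 0: turning the first i characters of a into "" takes i deletions
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({ prev[j] + 1,            // delete a[i-1]
                                 curr[j - 1] + 1,        // insert b[j-1]
                                 prev[j - 1] + cost });  // substitute (or match)
        }
        std::swap(prev, curr);  // the finished row becomes the previous row
    }
    return prev[b.size()];
}
```

For example, "kitten" to "sitting" needs three edits: substitute k with s, substitute e with i, and insert g. The two-row layout uses O(n) memory instead of the full O(m·n) table.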

QThread

June 18th, 2010

QThread was designed and is intended to be used as an interface or a control point to an operating system thread, not as a place to put code that you want to run in a thread. We object-oriented programmers subclass because we want to extend or specialize the base class functionality. The only valid reasons I can think of for subclassing QThread are to add functionality that QThread doesn’t have, e.g. providing a pointer to memory to use as the thread’s stack, or possibly adding real-time interfaces/support. Code to download a file, query a database, or do any other kind of processing should not be added to a subclass of QThread; it should be encapsulated in an object of its own.

You’re doing it wrong…

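// Producer and Consumer are QObject subclasses (not shown) that declare the
// signals and slots used below. QThread's default run() starts an event loop,
// which delivers the queued cross-thread signals to the objects moved into it.

// create the producer and consumer and plug them together
Producer producer;
Consumer consumer;

bool bOk = producer.connect(&consumer,
                            SIGNAL(consumed()),
                            SLOT(produce()));
Q_ASSERT(bOk);
bOk = consumer.connect(&producer,
                       SIGNAL(produced(QByteArray *)),
                       SLOT(consume(QByteArray *)));
Q_ASSERT(bOk);

// they both get their own thread
QThread producerThread;
producer.moveToThread(&producerThread);

QThread consumerThread;
consumer.moveToThread(&consumerThread);

// go!
producerThread.start();
consumerThread.start();

// The QThread objects must outlive the work; before they are destroyed,
// stop them with quit() and wait() for them to finish.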

Reference: Threading without the headache or QThread’s no longer abstract (see attached file)
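The separation argued for above (the thread object as a control point, the work in an object of its own) is not specific to Qt. As an analogy only, here is a sketch in standard C++ with std::thread rather than QThread; it is not Qt code, and the names SumWorker and RunOnThread are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// The work is encapsulated in its own object and knows nothing about threads,
// so it can be run directly, tested in isolation, or handed to a thread.
class SumWorker
{
public:
    explicit SumWorker(std::vector<int> data) : data_(std::move(data)) {}

    void Run() { result_ = std::accumulate(data_.begin(), data_.end(), 0L); }
    long Result() const { return result_; }

private:
    std::vector<int> data_;
    long result_ = 0;
};

// The thread object is only a control point: it starts the work and waits for it.
long RunOnThread(SumWorker& worker)
{
    std::thread thread(&SumWorker::Run, &worker);
    thread.join();            // after join() the result is safe to read
    return worker.Result();
}
```

Nothing about SumWorker changes if it is later run on a thread pool or directly on the calling thread, which is the benefit of keeping the work out of the thread class.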

Software optimization resources

April 11th, 2010

Unscrambling C Declarations

March 28th, 2010

Note: Based on some feedback, I should clarify that this does not cover C99 syntax.

Even though the C programming language has been around since the early 1970s, many programmers still have trouble understanding how C declarations are formed. This is not surprising, given the complexity that can arise when mixing pointer, array and function-pointer declarations.

In this posting we shall look at some complex declarations and try to understand them by considering how they are formed. The intent is not that you go off and write wonderfully complex declarations, but rather that you will be able to understand someone else’s code. Finally, we shall look at how most complex declarations can easily be simplified.

Here I’m going to focus on object declarations/definitions rather than functions. Also, in this posting I’m not going to examine structure, union or enumeration specifiers. They’ll keep for another day.
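A few object declarations of the kind described above, with one common way of simplifying them (typedef), are sketched below. The declarator syntax is the same in C; the std::is_same checks are C++-only and simply confirm at compile time that the simplified and unsimplified forms name identical types. The helper functions Twice and Square and the typedef names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

int Twice(int x)  { return 2 * x; }
int Square(int x) { return x * x; }

// Read from the identifier outwards: postfix [] and () bind tighter than prefix *
int* ap[10];                              // ap: array of 10 pointers to int
int (*pa)[10];                            // pa: pointer to an array of 10 ints
int (*fp)(int);                           // fp: pointer to a function taking int, returning int
int (*afp[2])(int) = { Twice, Square };   // afp: array of 2 pointers to such functions

// The same declarations simplified with typedefs
typedef int IntArray10[10];               // "array of 10 ints"
typedef int (*IntFn)(int);                // "pointer to function (int) returning int"

IntArray10* pa2;                          // same type as pa
IntFn afp2[2] = { Twice, Square };        // same type as afp

static_assert(std::is_same<decltype(ap), int* [10]>::value, "ap is an array of pointers");
static_assert(std::is_same<decltype(pa), decltype(pa2)>::value, "pa and pa2 match");
static_assert(std::is_same<decltype(afp), decltype(afp2)>::value, "afp and afp2 match");
```

The parentheses in int (*pa)[10] are what make pa a pointer to an array; without them, int *pa[10] would declare an array of pointers, exactly like ap.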
